Big data


An all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications (Wikipedia).

Purpose & benefits

Big data benefits the organisation through better decision making at all levels. It provides strategic, operational and tactical information to decision makers. Big data applications report, analyse and present data that has ideally been stored in a data warehouse, although it may have been stored in other, less structured and less scalable environments.


Big data refers to the huge and increasing volume of the data available, and the ways it can be processed. Data comes in many forms, structured or unstructured, and it may be generated by organizations themselves or obtained from third parties.

Analytics is the means for extracting value from this data by generating useful insights. Without analytics, businesses have no way of using their big data to establish competitive advantage.

from EY Big Data article


Big data is sometimes discussed under the heading of business intelligence, although the two terms are not strictly synonymous.

Implementation guide

Implementing big data is similar to any successful IT implementation: it requires planning and resources.

From IBM blog post on Big Data implementation:

1. Gather business requirements before gathering data. Begin big data implementations by first gathering, analyzing and understanding the business requirements; this is the first and most essential step in the big data analytics process. If you take away nothing else, remember this: Align big data projects with specific business goals.

2. “Implementing big data is a business decision not IT.” This is a wonderful quote that wraps up one of the most important best practices for implementing big data. Analytics solutions are most successful when approached from a business perspective and not from the IT/Engineering end. IT needs to get away from the model of “Build it and they will come” to “Solutions that fit defined business needs.”

3. Use an agile and iterative approach to implementation. Typically, big data projects start with a specific use-case and data set. Over the course of implementations, we have observed that organizational needs evolve as they understand the data – once they touch and feel and start harnessing its potential value. Use agile and iterative implementation techniques that deliver quick solutions based on current needs instead of a big bang application development. When it comes to the practicalities of big data analytics, the best practice is to start small by identifying specific, high-value opportunities, while not losing sight of the big picture. We achieve these objectives with our big data framework: Think Big, Act Small.

4. Evaluate data requirements. Whether a business is ready for big data analytics or not, carrying out a full evaluation of data coming into a business and how it can best be used to the business’s advantage is advised. This process usually requires input from your business stakeholders. Together we analyze what data needs to be retained, managed and made accessible, and what data can be discarded.

5. Ease skills shortage with standards and governance. Since big data has so much potential, there’s a growing shortage of professionals who can manage and mine information. Short of offering huge signing bonuses, the best way to overcome potential skills issues is standardizing big data efforts within an IT governance program.

6. Optimize knowledge transfer with a center of excellence. Establishing a Center of Excellence (CoE) to share solution knowledge, plan artifacts and ensure oversight for projects can help minimize mistakes. Whether big data is a new or expanding investment, the soft and hard costs can be shared across the enterprise. Another benefit of the CoE approach is that it will continue to drive the big data and overall information architecture maturity in a more structured and systematic way.

7. Embrace and plan your sandbox for prototype and performance. Allow data scientists to construct their data experiments and prototypes using their preferred languages and programming environments. Then, after a successful proof of concept, systematically reprogram and/or reconfigure these implementations with an “IT turn-over team.” Sometimes, it may be difficult to even know what you are looking for, because the technology is often breaking new ground and achieving results that were previously labeled “can’t be done.”

8. Align with the cloud operating model. Analytical sandboxes should be created on demand, and resource management needs to have control of the entire data flow, from pre-processing, integration and in-database summarization to post-processing and analytical modeling. A well-planned private and public cloud provisioning and security strategy plays an integral role in supporting these changing requirements. The advantage of a public cloud is that it can be provisioned and scaled up instantly. In those cases where the sensitivity of the data allows quick in-and-out prototyping, this can be very effective.

9. Associate big data with enterprise data. To unleash the value of big data, it needs to be associated with enterprise application data. Enterprises should establish new capabilities and leverage their prior investments in infrastructure, platform, business intelligence and data warehouses, rather than throwing them away. Investing in integration capabilities can enable knowledge workers to correlate different types and sources of data, to make associations, and to make meaningful discoveries.

10. Embed analytics and decision-making using intelligence into operational workflow/routine. For analytics to be a competitive advantage, organizations need to make “analytics” the way they do business; analytics needs to be a part of the corporate culture. Nowadays, the competitive advantage of data-driven organizations is no longer just a good ally, but a “must have” and a “must do.” The range of analytical capabilities emerging with big data and the fact that businesses can be modeled and forecasted is becoming a common practice. Analytics need not be left to silos of teams, but rather made a part of the day-to-day operational function of front-end staff.
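As an illustration of item 9, associating big data with enterprise data often comes down to joining high-volume event data to reference records on a shared key. The following is a minimal sketch only: the `customers` and `events` records, their field names and the `events_per_segment` helper are all hypothetical and do not come from the IBM post.

```python
from collections import defaultdict

# Hypothetical enterprise records (e.g. exported from a CRM or data warehouse).
customers = [
    {"customer_id": 1, "name": "Acme Ltd", "segment": "retail"},
    {"customer_id": 2, "name": "Globex", "segment": "wholesale"},
]

# Hypothetical high-volume event data (e.g. a web clickstream).
events = [
    {"customer_id": 1, "action": "view"},
    {"customer_id": 1, "action": "purchase"},
    {"customer_id": 2, "action": "view"},
    {"customer_id": 3, "action": "view"},  # no matching enterprise record
]


def events_per_segment(customers, events):
    """Count events per customer segment by joining on customer_id."""
    segment_by_id = {c["customer_id"]: c["segment"] for c in customers}
    counts = defaultdict(int)
    for event in events:
        # Events without a matching customer are grouped under "unknown".
        segment = segment_by_id.get(event["customer_id"], "unknown")
        counts[segment] += 1
    return dict(counts)


summary = events_per_segment(customers, events)
print(summary)
```

Keeping unmatched events under an explicit "unknown" key, rather than dropping them, makes gaps between the two data sources visible to the people reading the report.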

Also see the [Implementation guide] in the Information technology article.

Success factors

Involve users in the definition and creation of the data model and system, and ensure that the model and system align with and support operations.

Train the users in the data model and reporting tools.

Try to use global standards, e.g. SKOS and RDF, to make interfaces among different data sources easy and compatible.
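To make the point about standards concrete, the sketch below describes a shared vocabulary term as a SKOS concept and serializes it as RDF in N-Triples form, using only the Python standard library. The SKOS and RDF namespace URIs are the published W3C ones; the `http://example.org/vocab/` namespace, the concept identifiers and the `concept_triples` helper are assumptions made for illustration. A real project would more likely use a dedicated RDF library.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/vocab/"  # hypothetical namespace for this sketch


def concept_triples(concept_id, label, broader=None, lang="en"):
    """Return N-Triples lines describing one SKOS concept."""
    subject = f"<{BASE}{concept_id}>"
    # Escape backslashes and double quotes so the literal stays valid.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{subject} <{SKOS}prefLabel> "{escaped}"@{lang} .',
    ]
    if broader is not None:
        # Link the concept to its parent in the hierarchy.
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{broader}> .")
    return lines


triples = concept_triples("sales", "Sales", broader="finance")
for line in triples:
    print(line)
```

Because each data source then refers to the same concept URI, reports drawn from different systems can be matched on that URI instead of on free-text labels.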

See also the [Success factors] in the Information technology article.

Common pitfalls

Common pitfalls concern creating and understanding the data model: ensuring that the source of the data is trustworthy and has integrity, that the data represents the operational activities of the organization, and that it supports accurate reporting. Taking time to plan and create the data model is key.
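One way to guard the integrity of source data is to run basic checks before records reach the reporting layer. The sketch below is only an example of such checks (required fields present, key values unique); the `validate_records` helper and the `orders` sample data are hypothetical, and the checks a given organization needs will depend on its own data model.

```python
def validate_records(records, required_fields, key_field):
    """Return (index, problem) tuples for records that fail basic checks."""
    problems = []
    seen_keys = set()
    for index, record in enumerate(records):
        # Every required field must be present and non-empty.
        for field in required_fields:
            if record.get(field) in (None, ""):
                problems.append((index, f"missing {field}"))
        # The key field must be unique across the data set.
        key = record.get(key_field)
        if key is not None:
            if key in seen_keys:
                problems.append((index, f"duplicate {key_field}: {key}"))
            seen_keys.add(key)
    return problems


# Hypothetical sample data with a missing amount, an empty region
# and a repeated order_id.
orders = [
    {"order_id": "A1", "amount": 120.0, "region": "EU"},
    {"order_id": "A2", "amount": None, "region": "EU"},
    {"order_id": "A1", "amount": 75.5, "region": ""},
]

issues = validate_records(orders, ["order_id", "amount", "region"], "order_id")
for index, problem in issues:
    print(f"record {index}: {problem}")
```

Reporting the problems instead of silently dropping bad records lets the owners of the source system correct the data where it originates.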

Training the staff who will use the system to create reports is key and is often overlooked. Staff must understand the data model and how the system works and fits together so that accurate, reliable reports can be created.

Another pitfall is failing to obtain adequate resources: human (people trained and experienced in big data), financial (to obtain the hardware and software necessary for big data), and time (to plan, implement and maintain big data). This includes establishing the necessary processes and governance for big data.

Other pitfalls are those common to any technology implementation: ensure that it aligns with the organization's goals, objectives, and processes; involve users in the definition and creation of the data model and system; include change management activities in the roll-out and implementation.

See also the [Common pitfalls] in the Information technology article.

Related articles

Business analytics

Information technology

Knowledge management strategy

External articles and references

  1. EY: Ready for takeoff? Overcoming the practical and legal difficulties in identifying and realizing the value of data
  2. IBM blog post, 10 Big Data Implementation Best Practices