You have probably been seeing the term big data popping up in many places. It has entered the business vernacular.
Johns Hopkins University School of Medicine used data from Google Flu Trends to predict increases in flu-related emergency room visits well before the Centers for Disease Control issued its warning. Twitter updates tracked the spread of cholera in Haiti after the 2010 earthquake as accurately as official reports did, and weeks earlier.
Ford Motor Co. opened a Silicon Valley office in 2012 focused on big data, innovation and the user experience. Walmart records well over 1 million customer transactions every hour, saving them to databases estimated to have over 200 times the information in all of the books in the Library of Congress.
Big data have moved from technology circles into the business mainstream. But what does the term mean?
The term big data describes the volumes of data an enterprise generates internally, including Web-browsing trails, point-of-sale data, ATM records and other customer information, plus huge stores of data from new external sources: social networks such as Facebook, Twitter, YouTube and LinkedIn; sensor and even surveillance data; and massive public and private databases.
These data sets can be so large and complex that they become difficult to process using traditional database management tools and data processing applications. The data are often unstructured, unformatted and unwieldy. But they can hold important business information ready to be unleashed.
Ever-improving computer hardware, with virtually unlimited storage and continually faster processing speeds, can be combined with software tools such as artificial intelligence, machine learning and pattern recognition and applied to these vast troves of data.
Managers can measure and analyze more precisely than ever before, gaining much more insight into their businesses and a better understanding of their customers. This knowledge can translate into more accurate predictions, wiser decisions and stronger performance.
It is critical to note that big data require management vision and human insight. There needs to be an understanding of how the mathematics differs from reality and an understanding of the business theory underlying the analytical results. Most importantly, the assumptions underpinning the analysis call for thorough understanding and critical evaluation.
Senior decision makers have a duty to embrace evidence-based decision making. The data-driven CEO welcomes it, encourages it and creates a culture that supports it. For example, when the CEO has a gut feeling about a business trend, the data might not support that intuition. Senior executives who are genuinely data-driven will override their intuitions when the data do not agree with them.
Managers must understand the pitfalls and limitations, as well as the potential, of big data. Good data scientists should be in part pessimists, deeply concerned with what the information truly indicates and sensitive to what can go wrong with predictions and model designs.
Correlation is not causation. With such large data sets and well-honed measurement, there is substantial risk of false discoveries: when enough variables are compared, some will appear strongly related by chance alone. That elements of the data are highly correlated does not mean that there is a causal relationship between them. Indeed, the correlation might not be meaningful at all from a business viewpoint. Management must use skill and experience to avoid this trap.
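The risk of false discoveries can be made concrete with a small simulation. The Python sketch below is an illustration only, not a method from any particular analytics tool: it invents 100 hypothetical business metrics that are pure random noise, with 10 observations each, and counts how many pairs nonetheless show a strong correlation. The function names, the number of metrics, the number of observations and the 0.8 threshold are all assumptions chosen for the example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_correlations(n_metrics=100, n_obs=10, threshold=0.8, seed=0):
    """Count pairs of pure-noise metrics whose |r| exceeds the threshold.

    Every metric is independent random noise, so any strong correlation
    found here is a false discovery by construction.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    metrics = [[rng.gauss(0, 1) for _ in range(n_obs)]
               for _ in range(n_metrics)]
    pairs = list(itertools.combinations(metrics, 2))
    strong = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return strong, len(pairs)


strong, total = count_chance_correlations()
print(f"{strong} of {total} pairs of unrelated metrics have |r| > 0.8")
```

Because the metrics are unrelated by design, every pair the script reports is spurious. The more variables a data set contains, the more pairs there are to compare, and the more such coincidences an analyst should expect to find.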
Furthermore, any mathematical model inevitably is a simplification. Modeling is used very successfully in the physical sciences. It is much less successful in substantially more complex systems, such as economic and social systems, which are the ones that directly affect business.
Managers must understand the observable business theory or insight that explains the statistical inferences. Conclusions are much stronger and more valuable when there is this business insight. The numbers do not speak for themselves. Managers speak for them, giving them meaning.
It is human nature to use analysis to confirm one’s own biases and prejudices. It can be all too easy to use massive data troves to see what management wants to see without realizing it is doing so. Big data might provide raw material for biased fact-finding ostensibly based on statistics.
Big data are a powerful tool to support smart decision-making. Managers must employ that tool thoughtfully, fully aware of the obstacles to maximizing its utility. Management can then realize the value of big data to better serve customers and the enterprise as a whole.