The Era of Big Data
Posted: August 18, 2011 | Filed under: Measurement and Analytics | Tags: big data, breakthrough improvement, data mining, future of six sigma, structured data
It is clear that the quantities of pure, raw data flowing into and through organizations are growing, and that the rate of growth itself is accelerating. This is both an opportunity and a challenge for professionals working in performance improvement. The opportunity is to turn that raw data into insight: through smarter methods of mining and refining the data, to reveal hitherto unknown insights that could confer competitive advantage. Indeed, the skills of data mining, refining, linking, and communicating are themselves bases for advantage. The challenge is that building these skills is not a trivial exercise, nor is changing the habits and mindsets of managers who may lack the skills to operate in data-rich environments or the inclination to pay heed to the insights gleaned.
In its survey of big data last year (February 25th, 2010), The Economist wrote:
According to a 2008 study by International Data Corp (IDC), a market-research firm, around 1,200 exabytes of digital data will be generated this year. Other studies measure slightly different things. Hal Varian and the late Peter Lyman of the University of California in Berkeley, who pioneered the idea of counting the world’s bits, came up with a far smaller amount, around 5 exabytes in 2002, because they counted only the stock of original content.
What about the information that is actually consumed? Researchers at the University of California in San Diego (UCSD) examined the flow of data to American households. They found that in 2008 such households were bombarded with 3.6 zettabytes of information (or 34 gigabytes per person per day). The biggest data hogs were video games and television. In terms of bytes, written words are insignificant, amounting to less than 0.1% of the total. However, the amount of reading people do, previously in decline because of television, has almost tripled since 1980, thanks to all that text on the internet. In the past information consumption was largely passive, leaving aside the telephone. Today half of all bytes are received interactively, according to the UCSD. Future studies will extend beyond American households to quantify consumption globally and include business use as well.
Significantly, “information created by machines and used by other machines will probably grow faster than anything else,” explains Roger Bohn of the UCSD, one of the authors of the study on American households. “This is primarily ‘database to database’ information—people are only tangentially involved in most of it.”
Only 5% of the information that is created is “structured”, meaning it comes in a standard format of words or numbers that can be read by computers. The rest are things like photos and phone calls which are less easily retrievable and usable. But this is changing as content on the web is increasingly “tagged”, and facial-recognition and voice-recognition software can identify people and words in digital files.
On the continuum of performance improvement, there is a clear and important delineation between organizations that are either data-poor or have some data but lack managers and workers with the skills and mindset to use that data properly, and those organizations that are both data-rich and possess leadership with the skills and wherewithal to use tools that can take advantage of that data. For example, an organization that does very little proper measurement cannot even think about something as basic as plotting a control chart (or it may have the data, but too many managers are unwilling or unable to properly create and interpret such a chart). On the other hand are organizations with both the data and the skills to perform proper, basic analytics.
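To make the "basic analytics" benchmark concrete, here is a minimal sketch of the arithmetic behind an individuals (XmR) control chart, the kind of chart mentioned above. The data and function names are hypothetical; the limits follow the standard moving-range method with the d2 constant for subgroups of size two.

```python
def control_limits(values):
    """Return (center, lower, upper) 3-sigma limits for an individuals chart."""
    n = len(values)
    center = sum(values) / n
    # Average moving range between consecutive points
    mr_bar = sum(abs(values[i] - values[i - 1]) for i in range(1, n)) / (n - 1)
    sigma_est = mr_bar / 1.128  # d2 constant for subgroups of size 2
    return center, center - 3 * sigma_est, center + 3 * sigma_est

def out_of_control(values):
    """Indices of points falling outside the control limits."""
    _, lcl, ucl = control_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

if __name__ == "__main__":
    daily_defects = [12, 14, 11, 13, 12, 15, 13, 30, 12, 14]  # hypothetical data
    center, lcl, ucl = control_limits(daily_defects)
    print(f"center={center:.1f}, LCL={lcl:.1f}, UCL={ucl:.1f}")
    print("out-of-control points:", out_of_control(daily_defects))
```

The point is not the code itself but the prerequisite it exposes: an organization must first be collecting the measurements before even this simple signal-versus-noise test is possible.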
But big data goes beyond this distinction. The nature of the data collected, and its quantities, requires both new tools and old tools used in different ways. In many cases, the data are vastly different from the things companies were used to analyzing in the past. For example, much of the new data streams in from the external environment, especially real-time data from the population at large.
As background to an upcoming set of articles on the future of Lean Six Sigma and improvement in general, it is important to identify the key issues that any future vision of Lean Six Sigma needs to consider and incorporate. Understanding the nature of Big Data is one of those key issues. Firms like McKinsey & Company are gearing up major efforts, consisting of 500 or more consultants, specifically focused on building the specialized horsepower needed to help their clients take advantage of Big Data.
If you are interested, here’s a short article from The Economist on Big Data.