By: Christopher Surdak, author of Data Crush: How the Information Tidal Wave Is Driving New Business Opportunities (Amacom, 2014)
Big Data is receiving enormous amounts of press these days, and yet there’s a complete lack of consensus on what exactly constitutes “Big Data.”
After all, haven’t companies analyzed their data for decades? Haven’t they mined their existing data to gain new insights into how to improve operations, or how to serve customers better, or how to reduce defects?
Big Data Is Not…
Back in the 1980s I worked for General Electric as an engineering intern. Most of my time on that eight-month assignment was spent analyzing and trying to learn from defect data from one of its computer production lines. Thus, I’m quite certain that data analysis is nothing new.
In fact, any company that hasn’t been mining its existing transactional data for insights by this time has probably already gone out of business.
So the first part of our definition of “Big Data” is defining what it is not: “Big Data” is not the analysis of structured, transactional corporate data, the sort of stuff that is stored in ERP, CRM, SCM, and other corporate systems.
So, then, is “Big Data” the process of analyzing unstructured data from collaborative systems such as email, collaboration platforms like SharePoint, or corporate social platforms like Jive? Again, the answer is “no.”
Unstructured data generally doesn’t lend itself to statistical analysis; fifty thousand business emails might contain not a scratch of corporate intelligence, and yet one particular email or web post might be worth millions to a company if it can be found and acted upon. Unstructured data is more easily mined using search tools or through social processes.
What IS Big Data?
So, we have just defined two things that “Big Data” is not. So, what is it? Big Data really consists of two things.
First, it is the joint analysis of structured and unstructured data from within a company.
Second, it is the joint analysis of internal data sources and external data sources, both structured and unstructured, again to find new insights.
Naturally, both of these types of analysis have an element of “bigness” to them, implying that the sources of data are measured in terabytes, if not petabytes or even exabytes.
Companies that tap into this first class of “Big Data” analysis can then effectively take advantage of the second class, which is combining internal data sources, both structured and unstructured, with external data sources. Those external sources may also be structured, unstructured, or both, depending on the questions being asked.
Again, part of the key value of these analyses is that they haven’t been done before; in fact, they may not have even been possible prior to the last four or five years.
Big Data Put into Practice
A simple example might add some clarity here. Let’s pretend that you’re the owner of a soda vending machine company, with two hundred vending machines located throughout your local county. You have several dozen drivers who travel on regularly scheduled routes to check each vending machine and make sure it doesn’t run out of soda.
Over time, your drivers have noticed a great deal of variability regarding which soda machines sell a lot of their inventory and which sell very little. You have many years’ worth of data on these variations in inventory, but could never seem to find any logical patterns to explain why one machine might go weeks without using up its inventory and then suddenly become empty over the course of a day or two.
If we combine some non-traditional data with that at our business’ disposal, we may start to find some interesting trends that will help us understand the variations in demand that our data show.
For instance, if we were to combine our sales data with that of the local weather around each vending machine, we may find that temperature, humidity and precipitation all have an impact on soda sales.
Or we may find that when the local mall has a big promotional event like a music show, our machines at that mall run out of stock very quickly.
By combining our traditional data sets with nontraditional data sets, we can start to uncover underlying findings in the collective data sets that are not obvious when looking at them separately.
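To make the vending machine example concrete, here is a minimal sketch in Python of what combining internal sales data with external weather data might look like. Everything in it is an assumption made for illustration: the machine IDs, dates, sales counts, temperatures, and the function names join_sales_with_weather and pearson are all hypothetical, and a simple correlation is only one of many ways such combined data could be examined.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical internal data: (machine_id, date, cans_sold)
sales = [
    ("M1", "2014-07-01", 20), ("M1", "2014-07-02", 35),
    ("M1", "2014-07-03", 50), ("M1", "2014-07-04", 65),
    ("M2", "2014-07-01", 12), ("M2", "2014-07-02", 9),
    ("M2", "2014-07-03", 11), ("M2", "2014-07-04", 10),
]

# Hypothetical external data: date -> daily high temperature (degrees F)
weather = {
    "2014-07-01": 70, "2014-07-02": 80,
    "2014-07-03": 90, "2014-07-04": 100,
}


def join_sales_with_weather(sales, weather):
    """Pair each sales record with the same day's temperature.

    Returns {machine_id: [(temperature, cans_sold), ...]}.
    Days with no weather reading are skipped.
    """
    joined = defaultdict(list)
    for machine_id, date, cans_sold in sales:
        if date in weather:
            joined[machine_id].append((weather[date], cans_sold))
    return dict(joined)


def pearson(pairs):
    """Pearson correlation of (x, y) pairs, or None if it is undefined."""
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Report, per machine, how strongly sales track temperature.
joined = join_sales_with_weather(sales, weather)
for machine_id, pairs in sorted(joined.items()):
    print(machine_id, pearson(pairs))
```

In this made-up data, one machine’s sales rise steadily with temperature while the other’s do not, which is the kind of pattern that stays hidden when the sales records are examined on their own.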
These nontraditional data sets are typically very large, hence the term “Big Data,” and they typically capture relevant external factors that our traditional data collection lacks.
This, then, is the value of Big Data analysis.
Data Crush: How the Information Tidal Wave Is Driving New Business Opportunities by Christopher Surdak
© 2014 Christopher Surdak
All rights reserved.
Published by AMACOM Books
Division of American Management Association
1601 Broadway, New York, NY 10019
Christopher Surdak is an industry expert in collaboration, social media, information security, regulatory compliance, and Big Data with over 20 years of professional experience. He is a Technology Evangelist for Hewlett Packard, focusing on Information Governance.