I’ll try not to bore you with the description of Big Data’s volume, velocity and variety. You can find that just about anywhere (and just about everywhere). By now, we get it.
But what’s less commonly talked about is why Big Data is such a problem beyond size and computing power. The reasons behind the conversation are the truly interesting part, and they need to be understood. Here you go… three trends are driving the discussion, and they should be made painfully clear instead of lost in all the hype:
- We’re digitizing everything. This is big data’s volume. It comes from unlocking hidden data in the common things all around us, things we knew about before but never quantified, stored, compared or correlated. Suddenly, there’s enormous value in the patterns of what was, until recently, hidden from our view. Patterns offer understanding and a chance to predict what will happen next. Each of these is important; together they’re remarkably powerful.
- There’s no time to intervene. This is big data’s velocity. All of that digital data creates massive historical records, but also rich streams of information that flow constantly. When we take the patterns discovered in historical information and compare them to everything happening right now, we can either make better things happen or prevent the worst. This is the revenue-generating, life-saving stuff we hear about, but only if we have systems in place to see it happening in the moment and do something about it. We can’t afford enough human watchers to do this, so building big data systems is the only way to get to better outcomes when the data gives humans insufficient time to intervene. (A minimal sketch of this idea follows the list.)
- Variation creates instability. This is big data’s variety. Data was once defined by what we could store and relate in tables of columns and rows. A digitized world ignores those boundaries and is instead full of both structured and unstructured data. That creates a very big problem for systems built on the old definition, which describes just about every system around us. Suddenly, there’s data available that can’t be consumed or generated by a database. We either ignore that information, or it ends up in places and formats that older systems can’t read. Gone is the ability to correlate unstructured information with our vast historical (but highly structured) data. When we can’t analyze and correlate well, we introduce instability into our world. We’re missing the big picture unless we build systems that are flexible and don’t require reprogramming the logic for every unexpected change (and there will be many). (The second sketch after the list shows what that flexibility can look like.)
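To make the velocity point concrete, here’s a minimal sketch in Python. Everything in it is hypothetical for illustration (the readings, the three-sigma threshold, the `act` intervention); it’s not any vendor’s actual product, just the shape of a system that learns “normal” from history and reacts to a live stream the instant something deviates, with no human watcher in the loop:

```python
from collections import deque
import statistics

# Sketch of "no time to intervene": compare each incoming reading
# against a pattern learned from history and react immediately.
# All names and numbers here are hypothetical illustrations.

HISTORY_SIZE = 1000            # how much history defines "normal"
history = deque(maxlen=HISTORY_SIZE)

def learn(baseline_readings):
    """Seed the baseline from historical data."""
    history.extend(baseline_readings)

def on_event(reading, sigma=3.0):
    """Called for every new reading as it streams in."""
    if len(history) >= 30:     # need enough history to judge "normal"
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and abs(reading - mean) > sigma * stdev:
            act(reading, mean)  # intervene now, not after a report runs
    history.append(reading)

def act(reading, expected):
    # Placeholder for the automated intervention: throttle a pump,
    # block a transaction, page an operator, etc.
    print(f"anomaly: saw {reading:.1f}, expected ~{expected:.1f}")

learn([50.0 + (i % 7) for i in range(200)])    # fake historical pattern
for value in [52.0, 51.0, 53.0, 98.0, 52.5]:   # live stream, one spike
    on_event(value)
```

The point isn’t the three-sigma rule itself; it’s that the reaction happens inside the event stream, at machine speed, rather than in next week’s report.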
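And for the variety point, an equally small sketch (again Python, with made-up field names and records) of reading whatever arrives instead of rejecting anything that doesn’t fit a fixed table: pull out the fields you can correlate with your structured history, and keep the unexpected ones rather than losing them:

```python
import json

# Sketch of handling variety: instead of rejecting records that don't
# fit a fixed set of columns, ingest what arrives, extract the fields
# we know how to correlate, and retain the rest for later analysis.
# The field names and records below are hypothetical.

KNOWN_FIELDS = {"customer_id", "amount", "timestamp"}

def ingest(raw_line):
    record = json.loads(raw_line)
    structured = {k: v for k, v in record.items() if k in KNOWN_FIELDS}
    # Unexpected fields aren't an error; they're kept rather than
    # forcing a schema change (or silently dropping the data).
    extras = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
    return structured, extras

# Three records, each shaped differently; none of them breaks ingestion.
stream = [
    '{"customer_id": 7, "amount": 12.5, "timestamp": "2013-01-02T10:00"}',
    '{"customer_id": 7, "amount": 3.0, "tweet": "love this product"}',
    '{"customer_id": 9, "sensor": {"temp_c": 41.7}, "timestamp": "2013-01-02T10:05"}',
]
for line in stream:
    structured, extras = ingest(line)
    print(structured, "| extras:", extras)
```

A rigid row-and-column loader would have choked on two of those three records; here, every record lands, and the extras stay available to correlate with the structured history later.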
There you have it… the underlying reasons that Big Data matters and isn’t just hype (though there’s plenty of that). The digitization, the lack of time to intervene and the instability that Big Data creates lead us to develop whole new ways of managing information that go well beyond Hadoop and distributed computing. It’s why big data presents such an enormous challenge and opportunity for software vendors and their customers, but only if these three challenges, and not opportunism, are the drivers.
I’d love to get your feedback.
Thanks to TIBCO CTO Matt Quinn for the ideas in this piece.
Nice piece, Chris. Great to see everyone everywhere finally adopting Gartner’s “3Vs” of Big Data that we first introduced over a dozen years ago. For future reference, the professional courtesy of a citation, or just interest, here’s the original piece I wrote in 2001 on “The Three Dimensional Data Challenge”: http://goo.gl/wH3qG. -Doug Laney, VP Research, Gartner, @doug_laney