Thank you, Edward Snowden, for making my job easier. I’m withholding judgement on whether you made the world a safer or better place, but you’ve certainly awakened it to the concept of Big Data.
I’ve spent the better part of the past few years involved in Big Data projects and writing about its power and use cases. I’ve spoken at conferences and led workshops on the topic. But nothing I’ve done compares to the spotlight that Snowden cast on Big Data by disclosing the following: It’s no longer necessary to listen in on communications, whether by phone or Web, nor is it necessary to physically track someone.
Snooping on a call, reading a message or ‘tailing’ someone is so Cold War. Big Data is much bigger and more effective than traditional spy tradecraft.
It’s the Big, not the Data
In the end, it took the most secretive of organizations, the NSA, for the general public to understand something that’s been said many, many times but not understood: The larger the data set, the more visible the previously hidden patterns and correlations become. Given enough data, it becomes easier to see the ‘what’ even without knowing the ‘how’. Big Data ushers in an era of information on an unprecedented scale and that’s the real difference. And it isn’t just the scale that matters. Scale is merely a physical constraint. What makes Big Data reign supreme is our ability to apply algorithms that pull it apart and find correlations that might otherwise have gone unseen across any number of discrete data sources.
It’s the analytics, too
Without the analytics, Big Data would be, well, just Big. By looking for patterns across those discrete sources, human activity can be derived even where it can’t be observed. Credit card transactions combined with geolocation from a cell phone and digital video create a portrait of an individual that can then be matched against patterns that are known to be malicious or beneficial. Banks and marketers have known this for some time.
Fraud and offers
Organizations that are well-prepared for Big Data are already watching for fraud or engaging with customers and, in the case of governments, keeping a careful eye on citizenry and foreigners. Whether this is good or bad depends upon a person’s views on privacy, but Big Data also brings tangible, life-saving benefits, like earlier warning of disease outbreaks and better healthcare. In the case of the current flap over Snowden, our comfort level with government-monitored Big Data likely corresponds with how we feel about that government’s trustworthiness and likelihood of abuse. Is there oversight? Do we trust their intentions?
And there are those who say that privacy should be absolute regardless.
Predictive things
The tougher question is what we do about predictive analytics: the kind that shows that when X and Y happen, Z is n% likely to occur as well. That’s fine when it delivers security or health, but what about when it indicates that someone is likely to do something bad but hasn’t yet? Is that fair? How far is far enough when the issue is the individual versus the collective good? Big Data will certainly redefine long-held expectations about civil rights.
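For readers who want to see what “when X and Y happen, Z is n% likely” looks like in practice, here is a minimal Python sketch. The records, the flag names and the `conditional_likelihood` helper are all made up for illustration; real predictive systems work on far larger data and use far more careful statistics.

```python
def conditional_likelihood(records, given, outcome):
    """Share of records containing every flag in `given` that also contain `outcome`.

    Returns None when no record contains all of the `given` flags.
    """
    matching = [r for r in records if given <= r]  # subset test: all `given` flags present
    if not matching:
        return None
    hits = sum(1 for r in matching if outcome in r)
    return hits / len(matching)


# Hypothetical observations: each set lists the flags seen together for one person.
records = [
    {"X", "Y", "Z"},
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"Y"},
    {"X", "Y", "Z"},
]

n = conditional_likelihood(records, {"X", "Y"}, "Z")
print(f"Z occurred in {n:.0%} of the records where X and Y both happened")
# prints: Z occurred in 75% of the records where X and Y both happened
```

Note that the number describes the group of matching records, not any one person in it, which is exactly why the fairness question above is hard.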
Latent value
It’s clear by now that Big Data has many facets to consider. In their book Big Data, authors Cukier and Mayer-Schönberger argue that we’re just now beginning to unlock the “implicit, latent value of information.” They argue that we have access to data that was baked into systems and processes long before we thought about Big Data and a future ability to gather and correlate. Just as the Gatling gun forever changed warfare, our tactics need to change to correspond to changing technology and public sentiment.
Click here for an excellent NY Times review of their book.
Chris - I think quite a few data scientists would dispute your assertion that “The larger the data set, the more visible and correlated the patterns that were previously hidden”. In most cases simply adding more data isn’t helpful; how/what/when data is collected matters greatly in getting to insight.
I’d also suggest that it isn’t necessarily correlation that you are after, since correlation is a very different thing than causation, and actually weeding out spurious correlation is a bigger task (not smaller) as the scope/breadth of data grows.
Thanks, Tom, for your comment. Indeed, like many things, data has a bell curve of effectiveness, and getting the right amount along with the how/what/when is the crux move.
Also, if you want to go into causation versus correlation, I’d be happy to have you guest write.
I do not understand why WORDPRESS is rejecting my e-mail address?!! I registered it with them more than 1 1/2 years ago. Some of these servers are behaving like BIG DADDY/ BIG DATA. QUE PASA USA?? PERUCHO-OCHO