Most people working in technology are in some way part of the vast conversation taking place around Big Data. That has to be hundreds of thousands if not millions of individuals. “Powerful insights” is the commonly heard phrase. It should also be completely clear to the public by now that data in large enough sets, analyzed expertly, reveals patterns that humankind never saw before.
And yet there was surprise and dismay to find out that the National Security Agency (NSA) has been building out a world-class Big Data capability since 2007. We’re shocked they have access to and can see patterns in our phone calls? Come on, people, wake up.
Shame on us
Shame on us for not thinking the government would monitor us more in an age of Big Data. Throughout the past decade we posted, tweeted, liked, shared and otherwise participated in an absolute orgy of social media connection and sharing (with plenty of over-sharing). We showed the Powers That Be that we weren’t too concerned about our own privacy.
Before we get too outraged, know this: The NSA’s powerful capability that’s suddenly spinning up the press and certain people (especially those with a political agenda) was released as open source in 2011 and showed up in Defense Department budget documents a while back. It was hiding in plain sight (not buried in vast amounts of data, either).
If we didn’t know, it was because we didn’t care to know.
Accumulo
Sounding more like a superhero than a science tool, Accumulo is a mash-up of Apache’s Hadoop, ZooKeeper and Thrift, based on Google’s brilliant BigTable design. As GigaOm’s Derrick Harris says in Under the covers of NSA’s big data effort, Accumulo allows the NSA to do something perhaps even more important (and practical) than listening to individual conversations…it allows them to analyze, “…trillions of data points in order to build massive graphs that can detect the connections between them and the strength of the connections.” Doing this with Verizon’s call records, for example, allows the NSA to very quickly see, “…how a suspected terrorist’s network might spread and who might be involved.”
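To make the graph idea concrete, here is a minimal sketch in plain Python. It is not Accumulo's API and says nothing about how the NSA actually does this; the call records, the function names (build_graph, neighbors, within_hops) and the choice of "number of calls" as connection strength are all assumptions made for illustration. It only shows how caller/callee metadata, with no call content, is enough to build a weighted graph and walk outward from one person.

```python
from collections import Counter

# Hypothetical call metadata: (caller, callee, duration in seconds).
# Note that no call content is present, only who called whom and for how long.
CALLS = [
    ("A", "B", 120),
    ("A", "B", 45),
    ("B", "C", 300),
    ("A", "C", 10),
    ("C", "D", 60),
    ("B", "A", 30),
]

def build_graph(calls):
    """Count calls per unordered pair; the count is the assumed 'strength'."""
    weights = Counter()
    for caller, callee, _duration in calls:
        if caller == callee:
            continue  # ignore self-calls, they carry no connection
        weights[frozenset((caller, callee))] += 1
    return weights

def neighbors(weights, node):
    """Return {contact: strength} for everyone directly linked to node."""
    result = {}
    for pair, count in weights.items():
        if node in pair:
            (other,) = pair - {node}
            result[other] = count
    return result

def within_hops(weights, start, hops):
    """Breadth-first search: everyone reachable from start in at most `hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        reached = set()
        for node in frontier:
            reached |= set(neighbors(weights, node)) - seen
        seen |= reached
        frontier = reached
    return seen - {start}

graph = build_graph(CALLS)
print(neighbors(graph, "A"))       # direct contacts of A and their strengths
print(within_hops(graph, "A", 2))  # how far A's network spreads in two hops
```

Even on six made-up records, the second call shows the point Harris is making: a person who never spoke to A still turns up in A's network once the connections are chained together.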
Pretty colors and lines
A similar (if more simplistic) capability was on Facebook and LinkedIn a while back and we were all amazed at the colorful patterns that emerged from our relationships. I took the time to investigate and label my social graph without realizing how revealing that would be a couple of years later. I also enjoyed the technology without pausing much to think about how governments could use the same tools (albeit with more data and more power) to understand patterns leading up to and after terrorist attacks.
And we had a perfect chance to see it in action recently. Who, during the Boston Bombing craziness, would have objected to using the NSA’s power to figure out if the brothers acted alone? Let’s be honest…we would have turned the switches ourselves if we could (many tried, in an ugly effort at ‘social media’ investigation).
Patterns of change
What gets really interesting isn’t just the data in the moment, but analysis of how patterns change over time. Looking at my own social graph between May 2011 and today, these two years show a change in job, role and the effect of becoming an active blogger.
If this was an analysis done by someone who didn’t know me, the color patterns would demonstrate that something significant changed in Chris Taylor’s life. If this was a graph of my calls, it could also indicate radicalization, a shift to a new network, and/or a departure from my historical norms. It would certainly be revealing.
What does it reveal? It shows that my former ILOG co-workers became integrated into IBM (green, purple and blue of 2011 is now one fuchsia color) and that TIBCO isn’t quite as integrated yet with Nimbus (the two shades of orange and their spidery connections). My social graph reveals more than just data about me. Data about my data is just as interesting.
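The "patterns of change" idea can also be sketched in a few lines. This is only an illustration under stated assumptions: the contact names are invented, and Jaccard similarity is one simple measure I picked, not a claim about what any real analyst or agency uses. It compares who a person was connected to in two periods and reports how much the network shifted.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: 1.0 means identical, 0.0 means no overlap."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty networks are treated as unchanged
    return len(a & b) / len(a | b)

def contact_shift(before, after):
    """Summarize how a contact list changed between two snapshots."""
    before, after = set(before), set(after)
    return {
        "similarity": jaccard(before, after),
        "dropped": sorted(before - after),
        "added": sorted(after - before),
    }

# Hypothetical snapshots of one person's contacts, two years apart.
CONTACTS_2011 = {"ann", "bob", "cat", "dan"}
CONTACTS_2013 = {"cat", "dan", "eve", "fay", "gus"}

print(contact_shift(CONTACTS_2011, CONTACTS_2013))
```

A low similarity score says nothing about why the network changed. A new job and a new blog would look the same to this measure as something more worrying, which is exactly why the interpretation matters as much as the data.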
It’s the metadata, stupid
This reminds me of a talk I heard FedEx give at TUCON two years ago. The speaker said that the data about the package is more important than the package itself. In the same way, NSA’s knowledge of a phone call’s metadata is far more important than listening in on conversations, and far less manpower-intensive. The NSA spots patterns just as I can see the merging of ILOG with IBM; they just do it with massively more data and vastly more processing power.
Welcome to the Age of Big Metadata.
The U.S. Government has been quick to defend NSA’s PRISM by calling it “just metadata.” A news conference on Thursday with Senators Chambliss and Feinstein referred to the NSA’s work and its use of “metadata” several times in terms that made it seem like lesser data. It isn’t.
It sounded like this:
To my knowledge, we have not had any citizen who has registered a complaint relative to the gathering of this information. It is simply what we call metadata that is never utilized by any governmental agency unless they go back to the FISA court and show that there’s real cause as to why something within the metadata should be looked at.
Civil rights issue
Before we wring our hands too much, we’ve been warned about the privacy implications of Big Data in many ways. The best is probably still Alistair Croll’s remarkable article Big Data is our generation’s civil rights issue, and we don’t know it on Solve for Interesting where he says:
There are brilliant examples of how a quantified society can improve the way we live, love, work, and play. Big Data helps detect disease outbreaks, improve how students learn, reveal political partisanship, and save hundreds of millions of dollars for commuters—to pick just four examples. These are benefits we simply can’t ignore as we try to survive on a planet bursting with people and shaken by climate and energy crises.
But governments need to balance reliance on data with checks and balances about how this reliance erodes privacy and creates civil and moral issues we haven’t thought through.
As usual, Alistair, you nailed it, and long before the media began to circle.
Respectfully I’d be careful about making a number of these assumptions, both on the legal issues at hand and the technologies.
First - it is far from clear that domestic collection writ large is the actual purpose of the program, nor that all the information collected is actually reviewed. You can’t utilize information that no longer exists, but that does not mean what is collected will inherently be used.
Second - from a technology perspective Accumulo is just one of a wide range of technologies at work here. While I cannot share information beyond what has already been made public, what was open sourced is not the same stack as is in use.
It’s been interesting to observe the evolution of commentary on the hoo-ha about the NSA and government analysis of electronic Big Data traffic: (1) Democratic politicians: I’m shocked, positively shocked, that the government is eavesdropping on private conversations, (2) President Obama: There’s a tradeoff here between security and privacy; it’s not so easy to draw the line, (3) Metadata is legal; it’s different from eavesdropping on actual conversations, (4) We should be concerned even about metadata being analyzed. For me, I net it out this way: the world is moving to transparency. If you have something to hide, be worried. If not, this is good.
Brad, I’m in your camp. With pressure cooker bombers likely to be more common, the public will welcome ways to track down those most likely. The reaction after Boston was all of the proof we needed on that topic.
Jeanne wrote a blog on Sunday that covers the other half of this argument…what to do about it. With a surveillance society becoming a reality, the key is to create the oversight necessary to keep us safe.
Thanks for your comment.
I want to get a job monitoring data. That way your tax dollars can pay me to watch cat videos all day.
It’s interesting how the powers that be want your data, yet they won’t share their data back in an open, transparent way. It doesn’t matter what political party is in power here; they both play the same game with the same endpoint in mind.