Big Data must not be an elephant riding a bicycle

Forrester’s John Rymer sums up his opinion succinctly when he says, “Big Data: The worst category name ever.” The category certainly has challenges, both in its name and in how people conceive of it. Big Data as the hype would have it is what I call “the elephant riding the bicycle.” I’ll give you the seven things you need to consider, but first, let’s look at the hype.

Big Data hype is everywhere. Fresh from three conferences in the past four weeks, Big Data has been the single biggest topic of discussion with customers, new acquaintances, old friends, coworkers and industry analysts. It would be hard to overstate how much attention Big Data is getting.

But as with any hype cycle, there is an enormous amount of questioning going on around where and how Big Data delivers value. Think back to the early Internet and you’ll recall very similar conversations. In 1995, there was a growing feeling that something big was coming from the Web, but it was hard to sort the success from the sales pitch. People were spending money to create websites with little plan for why. A few learned detractors even published bold criticisms.

Despite the doubt [1] and slow start, we found our cyberspace groove and launched wave after wave of new ways to buy, sell, read, listen, watch and converse. For billions of people, the Internet is simply an expectation.

Faster this time around

But the pace of change is now much faster, and this time around the pump has been primed: we’ve become very quick at jumping on new technology and far better at marketing software and services. What took years to crank up in the ’90s takes much, much less time today. In what seems like no time, there are a wide variety of vendors selling brand-new products with Big Data labels and more than a few old products getting papered over with Big Data buzzwords. The market is confused, perhaps a bit skeptical, and for good reason.

Solutions are well ahead of success stories. There are far more companies selling Big Data solutions than companies providing details around how Big Data is creating value.

What about Hadoop?

Don’t conclude that Big Data lacks success stories; it doesn’t. Powerful Big Data analytics have already created enormous value for companies like Google, Facebook and Yahoo that needed to monetize their vast amounts of member and search data. Apache Hadoop, the leading open source solution, grew out of Google’s MapReduce and Google File System designs and was developed and scaled at Yahoo; Facebook is a heavy user as well.
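To make the pattern concrete, here is a minimal, hypothetical sketch of the MapReduce idea that Hadoop popularized: a map step that emits key/value pairs, a shuffle that groups them by key, and a reduce step that aggregates each group. It is illustrative only and runs on a single machine; real Hadoop jobs distribute these same steps across a cluster, and the record and output names below are invented.

```python
from collections import defaultdict

# Illustrative only: the MapReduce pattern on a single machine.
# Real Hadoop distributes map, shuffle and reduce across many nodes.

def map_phase(record):
    """Emit (word, 1) pairs for every word in a log line."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all values by key, as Hadoop's shuffle/sort step does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate each group; here, a simple count."""
    return key, sum(values)

records = ["user searched big data", "user clicked ad", "user searched hadoop"]
pairs = (pair for record in records for pair in map_phase(record))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # e.g. {'user': 3, 'searched': 2, ...}
```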

No one doubts the value those companies gained but they were in unique positions and employed scores of people in creating and maintaining Big Data solutions. It doesn’t make sense for everyone to go through that effort or throw money at the scarce data scientists needed to run the system. Most organizations shouldn’t follow that model and for one very good reason that has nothing to do with Hadoop. It has to do with elephants and bicycles.

Rymer weighed in on Hadoop with, “…too many people now seem to think that Hadoop is big data, when Hadoop is just one of the several big-data solutions available — and Hadoop isn’t good for many big-data scenarios.”

Elephants on bicycles

Unlike the Internet circumstances in 1995, Big Data has a higher barrier to entry and requires a broader perspective to connect, understand, anticipate and act on the intelligence gleaned from data that’s known for its high velocity, large volume and wide variety of sources and types.

Simply applying distributed storage and processing (like Hadoop) to extremely large data sets is like putting an elephant on a bicycle…it just doesn’t make business sense.

And there’s another challenge for those thinking about the Google, Facebook and Yahoo model: the need for operational decision making isn’t sufficiently supported by offline, batch ‘research projects’. Things are happening much faster than batch allows. More on that in a moment.

Deep foundation

If the data ecosystem isn’t sufficient for creating value from Big Data, value will be elusive. Data will be ‘dirty’ and siloed, and responses will come too late. There is a logical path forward emerging among forward-leaning organizations, analysts and a small set of software vendors. It is a balanced ecosystem with a deep foundation. It starts even before the data arrives, and it has its roots in how we talk about the challenge and where we want to create value.

As Rymer puts it, “We have yet to see a one-size-fits-all suite or solution for all of these scenarios.” It takes foundational strategy and technology to make Big Data ‘work’.

Here are our seven steps to avoid getting crushed by the elephant riding a bicycle:

1. Use case clarity

Those who haven’t figured out their use cases for Big Data are in danger of confusing the term with its purposes, a focal point of Rymer’s “‘crabby old guy’ rant”. This is the most likely reason companies that are finding success don’t like to use the words “Big Data”. Instead, they refer to their use cases with terms like predictive analytics, digital customer experience management, compliance, sense and respond, and behavioral insights.

The purpose matters more than the hype, and the hype will eventually go away. Those without a clear purpose will find themselves in an Emperor’s New Clothes scenario, playing either the king or his advisers.

2. Data enablement

You can’t have Big Data without data. While there are companies that simply crunch one-dimensional data, like Twitter feeds or customer shopping patterns, the typical enterprise needs to manage much more complex data sets coming from anywhere and everywhere. There is traditional data, with its absolute integrity, defined by complex schemas and sitting at rest in databases and log files. There is also the data that arrives in the operational moment, while business is happening and there’s no time to treat, store and recall it. This in-motion data is often unstructured and dynamic, and it presents challenges for an organization that can’t grab it in real time and make use of it.

This is where in-memory data grids matter. The ability to use what Rymer calls “elastic caching platforms” allows data at rest and data in motion to be married up in cache, where flexibility and speed matter and many operations and queries can take place at the speed of business.

If you’re skeptical, think of how much information is only important in the moment. We call this volatile data, and it ‘dies’ before it can be extracted, transformed, loaded into a database and then queried. Without that in-memory layer, volatile data has no value. Commodities/equities trading and healthcare are big users of volatile data, but as putting it to use gets easier, more and more organizations will demand this capability.
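As a rough illustration of why volatility matters, here is a hypothetical sketch of an in-memory store with a time-to-live, the kind of behavior an elastic caching platform provides: a streaming quote is only useful while it is fresh, and it is joined with reference data at rest the moment it arrives. The class, the TTL value and the sample data are all invented for the example.

```python
import time

# Hypothetical sketch: volatile, in-motion data is only useful while fresh,
# so it is held in memory with a time-to-live and joined with data at rest.

TTL_SECONDS = 2.0  # illustrative: a quote 'dies' after two seconds

class VolatileCache:
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}  # key -> (value, timestamp)

    def put(self, key, value):
        self.store[key] = (value, time.time())

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if time.time() - stamp > self.ttl:
            del self.store[key]  # expired: the moment has passed
            return None
        return value

# Data at rest (e.g. loaded once from a database).
reference = {"ACME": {"sector": "logistics", "risk_limit": 10_000}}

quotes = VolatileCache(TTL_SECONDS)
quotes.put("ACME", {"price": 101.4})

quote = quotes.get("ACME")
if quote is not None:
    # Join in-motion data with at-rest data while the quote is still live.
    decision = {**reference["ACME"], **quote}
    print("act on", decision)
```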

So far, we’ve only talked about typical data sources, but as the world adopts more sensors and live feeds, the amount, speed, types and volatility of data will increase rapidly. Elastic caching will be a critical piece.

3. Infrastructure pipes

We’ve written three recent success stories that stand out from the noisy crowd talking Big Data. Mercy Healthcare, The Nielsen Company and FedEx Services all reap powerful benefits from Big Data solutions. But they all have something basic in common: they each made significant investments in a service-oriented architecture (SOA) with interoperable services that allow their organizations to move large amounts of data quickly from outside in, inside out, and across any application, database or other data source. They do this through an enterprise service bus (ESB) that moves data automatically between disparate sources that publish and subscribe, rather than trying to connect each application or database separately.
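A toy publish/subscribe sketch can show why an ESB-style bus beats point-to-point wiring: publishers and subscribers only know about topic names, never about each other, so adding a new application or data source doesn’t require rewiring every existing one. The topics, handlers and message fields here are invented for illustration and stand in for what a real ESB would route.

```python
from collections import defaultdict

# Toy publish/subscribe bus, illustrating the decoupling an ESB provides.
# Publishers and subscribers know only topic names, never each other.

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

bus = Bus()

# Hypothetical consumers: an analytics store and an operational dashboard.
bus.subscribe("shipment.scanned", lambda m: print("analytics stores", m))
bus.subscribe("shipment.scanned", lambda m: print("dashboard updates", m))

# Any application can publish without knowing who listens.
bus.publish("shipment.scanned", {"package": "123", "hub": "MEM"})
```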

If you think about it for a moment, the need to connect to data precedes the ability to do any form of analysis. Without it, you have the bicycle, unable to support the heavy elephant of Big Data.

4. User-friendly analytics

Organizations that have their infrastructure house in order are ready to crunch data and find meaningful insight. Insight shows up as patterns in the data that can be discovered by using complex algorithms. The best tools have visual interfaces and are useful to business people who don’t need to understand every technical nuance but can manipulate data to discover insights. Drag and drop is the new black.

Done right, visualization is the starting point for creating interactive dashboards that aggregate, present and allow manipulation of large data sets across disparate data sources.
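Under the hood, even a drag-and-drop dashboard is doing something like the roll-up below; the point of user-friendly tooling is that a business person never has to write it. This is a minimal sketch using pandas, with invented column names and values, showing disparate records aggregated into a summary an interactive chart could present.

```python
import pandas as pd

# Minimal sketch: the kind of roll-up a visual dashboard performs behind
# a drag-and-drop interface. Column names and values are invented.

orders = pd.DataFrame({
    "region":  ["east", "east", "west", "west", "west"],
    "channel": ["web", "store", "web", "web", "store"],
    "revenue": [120.0, 80.0, 200.0, 150.0, 60.0],
})

# Group, aggregate and pivot: the raw material for an interactive chart.
summary = (orders
           .groupby(["region", "channel"])["revenue"]
           .sum()
           .unstack(fill_value=0.0))
print(summary)
```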

If a PhD is necessary to manage the front end of analytics, the system is unsustainable and won’t allow the typical enterprise to successfully mine data at the pace necessary for meaningful change.

5. Sense and respond

Once a pattern is understood, there needs to be a way to anticipate its occurrence to either maximize its benefit or take steps to prevent or mitigate the problem it presents. Automated event processing is the modern equivalent of what was studied and taught by U.S. Air Force Col. John Boyd, a fighter pilot who realized that decision-making occurs in recurring cycles. He called the process OODA, for Observe, Orient, Decide and Act. He was amazingly effective with his system and it is still taught today in air combat schools.

What Boyd had to do based on training and awareness, we now support with computerized systems that can handle far more data, far faster. Systems trained to watch for events in combination can also apply logic to a discovered pattern to call for more analysis, look for follow-on events, or respond immediately in complex ways.

And just like OODA, it feeds back into itself. Events that are streamed into an event processing engine from either data found across the ESB or coming ‘live’ from external feeds may not be understood in the moment, but are processed after-the-fact in the very same analytical tools that discovered data patterns in the first place. This creates a virtuous cycle of discovery-operation-improvement-operation and so on.
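As a rough sketch of the “events in combination” idea, here is a hypothetical rule that fires only when two related events arrive within a short sliding window, the essence of an observe-orient-decide-act loop done in software. The event names, window size and response are invented; a real event processing engine would express the same idea declaratively and at far greater scale.

```python
from collections import deque

# Hypothetical sketch of complex event processing: a rule that fires only
# when related events occur in combination within a sliding window.

WINDOW = 5  # illustrative: look back over the last five events

class PatternDetector:
    def __init__(self, first, second, respond):
        self.first, self.second, self.respond = first, second, respond
        self.recent = deque(maxlen=WINDOW)

    def observe(self, event):
        self.recent.append(event)
        # Orient and decide: has the two-event combination appeared?
        kinds = [e["kind"] for e in self.recent]
        if self.first in kinds and event["kind"] == self.second:
            self.respond(list(self.recent))  # act

detector = PatternDetector(
    first="login_failed",
    second="large_transfer",
    respond=lambda history: print("hold transfer, escalate:", history[-1]),
)

for event in [{"kind": "login_failed", "user": "a"},
              {"kind": "page_view", "user": "a"},
              {"kind": "large_transfer", "user": "a", "amount": 9_500}]:
    detector.observe(event)
```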

Making sense of the competitive landscape is truly a function of a Big Data solution, but event processing is the secret sauce. It reflects the highly proprietary choices each organization makes. It sets the stage for the highest value step of a Big Data solution.

6. Putting solutions in play

Whether preventing deadly illnesses, providing near-real-time global market research, or solving congestion in the logistics network, Big Data’s big value comes from the action that is taken. The most effective way to respond to opportunity and risk is to have control over both manual and automated processes. Business process management suites are the flexible and fast way to drive efficiency in execution and consistency in response.
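To make the “manual plus automated” point concrete, here is a hypothetical sketch of the routing decision a process layer makes once a pattern is detected: routine cases flow straight through an automated handler, while exceptions land in a human work queue. The thresholds, fields and queue are invented for illustration, not a description of any particular BPM suite.

```python
# Hypothetical sketch of the routing a process layer performs: automate the
# routine response, queue the exceptions for a person. Thresholds invented.

human_queue = []

def automated_reroute(shipment):
    print(f"rerouting {shipment['id']} around congestion automatically")

def route(shipment):
    if shipment["delay_hours"] <= 4 and not shipment["hazardous"]:
        automated_reroute(shipment)   # consistent, instant response
    else:
        human_queue.append(shipment)  # judgment call for a planner
        print(f"queued {shipment['id']} for manual review")

for s in [{"id": "PKG-1", "delay_hours": 2, "hazardous": False},
          {"id": "PKG-2", "delay_hours": 9, "hazardous": True}]:
    route(s)
```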

Social media has a powerful role as well, applying the Big Data ‘Big Filter’ to put the right information in the right hands at the right moment. Social software is maturing into this role, but users need to catch up. Expect social tools to become the way work is defined at the role level in any organization in the future.

Big Data challenges organizations to think across more functional silos than ever before, and responsive process management and collaboration will keep the Big Data wave from swamping the boat.

7. Go back and do it again

When the connect-understand-anticipate-act cycle has run its course, the next step is to analyze the outcomes, find new and/or better patterns and improve the system’s function and output. It is a virtuous cycle of change that the best organizations never stop running.

Not just our opinion

It would be hard to oversell the need for the infrastructure outlined above. A recent IDC report lists data integration (#3) as the biggest Big Data IT challenge, not the size or speed required by the system. That same report lists defining business requirements (#1) as the top business challenge, not skills or tools.

Rymer says, “Big data must include complex event processing platforms (#5), elastic caching platforms (#2), and the various not-only SQL (NoSQL) databases. We have yet to see a one-size-fits-all suite or solution for all of these scenarios.”

Likewise, a Gartner Big Data report by analyst Douglas Laney, released just last week, kicks off its analysis section with a warning that organizations need to ensure infrastructure adequacy (#3). Coming out of the late-2000s downturn, infrastructure adequacy isn’t by any means in place for current needs, much less the significantly expanded requirements for Big Data solutions.

O’Reilly’s Strata site wrote up an interview with an expert that stressed a new focus on user-friendly analysis (#4).

In an article from August 2012, Gartner’s W. Roy Schulte argues that organizations need to use complex event processing (#5) to keep up with real-time Big Data.

In a report from last summer, Forrester’s Sanchit Gogia warns that organizations overlook the key factors that support Big Data solutions, such as infrastructure (#3) and process (#6).

Forrester’s Clay Richardson uses the term “Big Process” when he observes that, “…even for organizations that might not be focused on big data quite yet, there is still the need to begin thinking from a big process perspective to better understand the relationships and impacts between operational data and business process performance.” (#6). Clay’s blog is appropriately titled, Big Data Ain’t Worth Diddly Without Big Process.

The challenges aren’t a mystery at this point in the Big Data hype cycle. The solutions aren’t either if you take a look at the market analysis and the success stories we’ve mentioned. Big Data, beyond being a questionable term, has real value for those who understand the nuances of working with data that has volume, velocity, variety and volatility.

We’ve included a great deal of information here and we welcome your comments.

—————-

Notes

1. Detractors are always easy to find…here are but three of the more famous ones:

“Airplanes are interesting toys but of no military value.” - Marechal Ferdinand Foch, World War I French General

“I think there is a world market for maybe five computers.” - Thomas Watson, chairman of IBM, 1943

“There is no reason anyone would want a computer in their home.” - Ken Olson, president, chairman and founder of Digital Equipment Corp., 1977


21 Responses to “Big Data must not be an elephant riding a bicycle”

  1. John
    October 29, 2012 at 11:55 am #

    Excellent read, thanks for sharing Chris. Big Data is like Web 2.0 a few years back. Everybody had to have it, yet nobody really had a commonly accepted definition of what it was, how to use it or what it meant to the business.

    • October 30, 2012 at 5:51 pm #

      Absolutely. We have lots of people coming to us asking how we can solve Big Data questions, but it takes stepping back to ask what needs to be achieved. Is it customer experience? Personalization and loyalty? Recommendations? Getting to a Big Data solution takes a holistic view and a broad approach.

      Thanks for your comment.

    • February 6, 2013 at 10:08 am #

      Metaphors unlimited…

      I just love this quote from Dan Ariely:

      Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.

      In a word, brilliant.

  2. November 10, 2012 at 10:24 pm #

    Great read. I look forward to the conversation changing in 2013 from how to get a handle on big data to what problems we can envision solving. I expect many of those problems to be addressed by solutions that are elastic (able to easily scale in real time), heterogeneous (combining nearly any computing resource, both local and distributed), simplified in their optimization (100% parallel computing with efficiently designed parallel code broadly available as services), stabilized (dramatic elimination of coding defects), fast and easy to implement (human language dialog with machines that generate the majority of code) and much more. If you’re curious, check out our vision at TxFormative.com.

    • November 10, 2012 at 10:30 pm #

      Ken, very thoughtful response and thank you. Would you be willing to guest post on your digital student project or on this topic? (or both).

      • November 10, 2012 at 11:15 pm #

        Hi Chris,

        I responded to your email directly before I saw your reply here. Glad to share my views, or shall I more accurately say my vision, of the upcoming revolution in computing. I call 2013 “The Year of the Super Apps.”

        We’re well on our way to being able to demo such an app. I want a vigorous conversation to quickly follow our unveiling.

        What’s the next step?

  3. November 13, 2012 at 2:40 am #

    An insightful article, and as a data old-timer I find it resonates clearly. You would think, with all the noise about Big Data, that ‘data’ had just been invented. I want to believe that organizations with a history of dealing with data (predominantly SQL databases) do understand the issues far better than the various Hadoop vendors do. Significant insights or use can be gained from any size of data - small, medium, large or Big - using modern AI/machine learning methods.

    One nit-pick with the article is that ‘User-friendly analytics’ is too narrow, or should be split between analytics and predictive computing. People (well, IBM mainly) tend to lump analytics and machine learning together, and maybe that makes sense, but I don’t believe it does, because it puts the old way of performing analytics into the same bucket as the new ways.

    Lastly, agree with your earlier comment about the digital customer experience, personalization, loyalty and recommendations as that is what we are addressing. Happy to provide a link to our site on request or by email.

    • November 13, 2012 at 4:28 am #

      Dinesh, thank you for your thoughtful response. I agree that insight comes from any size of data. More data usually makes prediction more accurate, but not always.

      Point taken on the user-friendly analytics. There are applications that create insight but aren’t necessarily predictive. I work extensively with event processing, which is very connected to the ‘second half’ of analytics.

      Thanks again.

  4. Tom Antony
    March 31, 2013 at 4:53 am #

    Good day. An excellent and insightful read, and I wonder how I did not come across this earlier.

    Having managed a “few” of the Big Data projects across industries/verticals, in my view points 3 (Infrastructure pipes) and 1 (Use case clarity) are critical in envisioning a solution to solve a business problem, while the others are also necessary. I would also submit that customer maturity with Big Data, in addition to a little patience, is also important.

    • March 31, 2013 at 8:32 am #

      Thanks, Tom. There is definitely room for customer maturity, and I would suggest that projects are being kicked off without a clear understanding of the desired outcome. Thanks for the comment.

