Big data piles up so high it reaches the cloud

“Big data is any data that when you pile it up reaches into the Cloud.” This was the opening statement from Jack Norris, CMO of MapR, at the Cloud Connect Conference in Santa Clara today. He was paraphrasing the analysts, but it was the ideal framing for the Big Data Track at a Cloud conference.

A new paradigm

According to Norris, big data and Cloud are a paradigm shift and an architectural change that involves putting data and computing power together as a massive processing unit.

Norris drilled into this by describing the challenge facing today’s enterprise: when data and computing are kept separate, getting data to where it is processed takes longer and longer as the data grows. More and more, organizations need to:

  • Process more quickly - Things are moving faster every day and competitive businesses need to keep up
  • Combine multiple data sources - Organizations need to blend data to gain insights. That data can’t be stored in one place and can even be outside the organization (such as in the cloud)
  • Expand analysis - There are limits on traditional systems and organizations need to go beyond the traditional SQL-based analysis of the past

These needs led organizations such as Google to grow their own tools, which now form a big data ecosystem. Norris used the picture at the right to describe this ecosystem.

Hadoop in the Cloud

The most interesting part of the big data story for this setting was how Hadoop and big data are used in the Cloud. For many companies going in this direction, Hadoop in the Cloud is a very flexible infrastructure. While we often hear about performance questions with Cloud, Norris cited the current MinuteSort record of 1.5 TB, set by Google working with MapR, as proof that Cloud performance is less and less of a question.

It takes more data

Norris made some of his strongest points with his contention that more data is now filling in the gaps where we used to rely on complex algorithms. Many things, like human behavior, have been deemed too complex to understand completely. Norris pointed to use cases like fraud detection, flu trends, and the Netflix recommendation engine to show that even the most complex behavior becomes predictable when enough data comes to the table.

Simple algorithms

If this concept is true, then the ability to add data in cost-effective ways becomes one of the most important enterprise strategies available. It’s easy to see where the Cloud plays a critical role in providing a place for that data to be sought, reached, and incorporated efficiently.

Norris provided the following examples of where Hadoop is being used in the Cloud:

  • Targeted advertising/clickstream analysis
  • Security for anti-virus, fraud detection, and image recognition
  • Pattern matching/recommendations
  • Data warehousing/BI
  • Bio-informatics like genome analysis
  • Financial simulation like Monte Carlo
  • File processing like image resizing and video encoding
  • Web indexing
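
To make the clickstream item in this list a little more concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps that Hadoop spreads across a cluster. It is plain Python, not the Hadoop API, and the sample records and function names are hypothetical; none of it comes from Norris's talk.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical clickstream records: (user_id, page). Illustrative only.
CLICKS = [
    ("u1", "/home"),
    ("u2", "/home"),
    ("u1", "/pricing"),
    ("u3", "/home"),
    ("u2", "/pricing"),
    ("u3", "/docs"),
]


def map_clicks(record):
    """Map step: emit a (page, 1) pair for one click record."""
    _user, page = record
    yield page, 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one page."""
    return key, sum(values)


def count_page_views(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_clicks(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


page_views = count_page_views(CLICKS)
print(page_views)
```

On a real cluster the map and reduce functions would run in parallel on the nodes that hold the data, which is the "data and computing together" idea described earlier in the talk.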

Big Data lessons

This was a very comprehensive talk and drew a sizable crowd for the last day of the event. Norris closed with “Big Data Lessons from the Cloud”:

  • Big Data requires a new approach
  • Hadoop is a paradigm shift
  • Easy to get started with Hadoop in the Cloud
  • Scale clusters up and down in the Cloud
  • Only pay for what you use
  • Expand data for analysis
  • Combine data sources
  • New application from new data source
  • New analytics
  • Wide variety of applications appropriate for Hadoop

Norris can be found on Twitter at @NorrisJack.


No Responses to “Big data piles up so high it reaches the cloud”

  1. DataH
    April 8, 2013 at 12:04 pm #

    Very insightful article Chris. Cloud computing is driving a new wave of innovation in the area of big data. The open source solution from HPCC Systems provides a single platform that is easy to install, manage and code. Designed by data scientists, HPCC Systems is a data intensive supercomputer that has evolved for more than a decade, with enterprise customers who need to process large volumes of data in a 24/7 environment. Its Thor Data Refinery Cluster, which is responsible for ingesting vast amounts of data, transforming, linking and indexing that data, and its Roxie Data Delivery Cluster are now offered on the Amazon Web Services (AWS) platform. Taking advantage of HPCC Systems in the cloud provides a powerful combination designed to make Big Data Analytics computing easier for developers and can be launched and configured with the click of a button through their Instant Cloud solution. More at http://hpccsystems.com
