We’re going to be hearing more in the coming months and years about machine learning and how it works alongside Big Data as a way to ‘turn the corner’ on the challenges of data’s volume, velocity and variety. Often called optimization or normalization, machine learning allows a computer to ‘learn’ a better model for solving a problem by exposure to known data sets, with the expectation that its processing power can find trends and occurrences in new and unseen datasets. The more involved humans are in the process, the more machine learning is called ‘supervised’ and the less, ‘unsupervised.’
Its advantages are simple:
- It performs more complex analysis than humans in many cases
- It can be faster, and closer to real time, than batch-driven analysis cycles
- Through automation, it can cycle through answers on the fly, while new data is being generated
Big Data Republic offers a good, succinct definition of machine learning in the form of a video store example:
By using machine learning technology, customers to Tobias’s online movie store get a more personalized, evolving service. Based on pages and products viewed, a customer to the site is presented with potential films he or she might like to purchase. This is based on the machine learning engine spotting correlations in the data of customers with similar demographics who have viewed similar pages, and recommending potential purchases from their purchase history.
As this is an automated system that “learns,” Tobias doesn’t need to be constantly tweaking the algorithms, and the machine learning tool continues to learn, so when a new natural purchasing trend emerges among customers, it makes recommendations based on having recognized these new patterns.
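The “spotting correlations in the data of customers with similar demographics who have viewed similar pages” idea can be sketched as a toy user-based collaborative filter. This is only an illustration of the general technique, not the actual engine behind any real store: the customer names, film titles, and viewing data below are all invented, and a production recommender would use far richer signals than raw overlap counts.

```python
# Toy user-based collaborative filtering, a minimal sketch of the
# "customers who viewed similar pages" recommendation idea.
# All names and data here are hypothetical.

from collections import Counter


def recommend(target, views, purchases, top_n=3):
    """Recommend films for `target` based on customers with similar viewing.

    views:     dict mapping customer -> set of films/pages viewed
    purchases: dict mapping customer -> set of films purchased
    """
    # Score every other customer by how many viewed titles they share
    # with the target customer.
    similarity = {
        other: len(views[target] & viewed)
        for other, viewed in views.items()
        if other != target
    }
    # Tally purchases made by similar customers, weighted by similarity,
    # skipping anything the target has already viewed or bought.
    scores = Counter()
    for other, sim in similarity.items():
        if sim == 0:
            continue
        for film in purchases.get(other, set()):
            already_seen = film in views[target] or film in purchases.get(target, set())
            if not already_seen:
                scores[film] += sim
    return [film for film, _ in scores.most_common(top_n)]


views = {
    "alice": {"Heat", "Ronin", "Drive"},
    "bob":   {"Heat", "Ronin", "Collateral"},
    "carol": {"Amelie", "Drive"},
}
purchases = {
    "bob":   {"Collateral", "Thief"},
    "carol": {"Amelie"},
}

print(recommend("alice", views, purchases))
```

Here “alice” shares two viewed titles with “bob” and one with “carol”, so bob’s purchases outrank carol’s in the recommendation list. Because the tallying is automatic, adding new customers and new viewing data shifts the recommendations without anyone re-tuning the logic, which is the “learning without constant tweaking” point the example above makes.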
As the world speeds up, the need for faster answers despite the growth and complexity of data leads us toward machine learning. It offers the opportunity, depending on the use case and desired outcome, to reduce the amount of analysis done by expensive and scarce data scientists. Expect to hear more.
Good morning Jeanne - an important topic, but this isn’t quite how it works. There are a couple of items here that need to be corrected.
First - the following isn’t accurate: “The more involved humans are in the process, the more machine learning is called ‘supervised’ and the less, ‘unsupervised.’” Supervised vs. unsupervised learning deals with how the model is initially trained and how the inputs and outputs are guided - people are involved in either approach.
Second - machine learning isn’t usually deployed “to reduce the amount of analysis done by expensive and scarce data scientists.” Rather, it is a tool used by data scientists. As I mentioned in a chat (online) with Chris, the market needs to understand that these techniques require quite a bit of human work before ever being run, as well as work after a run to validate the results and refine the approach.
Tom, those are esoteric arguments. Humans typically guide those inputs and outputs in either approach. It is a spectrum, as I described it.
Whether people are necessary before the work is run wasn’t the point. Anything that updates itself based on previous results lessens the ongoing analysis work.
These and other pieces I’ve written were vetted by machine learning experts. They would know, wouldn’t they?