Big data, crowdsourcing and machine learning tackle Parkinson’s

Beth Lee, 62, shows a bracelet identifying her as a Parkinson's patient.

Parkinson’s is a very tough disease to fight. People suffering from the disease often have significant tremors that keep them from being able to create accurate records of their daily challenges. Without this information, doctors are unable to fine-tune drug dosages and other treatment regimens that can significantly improve the lives of sufferers.

It was a perfect catch-22 until recently, when the Michael J. Fox Foundation announced the winner of its data analysis competition: LIONsolver, a company specializing in machine learning software, which was able to differentiate Parkinson’s patients from healthy individuals and also to show the trend in symptoms of the disease over time.

Crowdsourcing Big Data analysis

To set up the competition, the Foundation worked with Kaggle, an organization that specializes in crowdsourced big data analysis competitions. Crowdsourcing gets to the heart of very difficult Big Data problems by letting people the world over, from a myriad of backgrounds and with diverse experiences, devote time to challenges they choose personally and where they can bring the most value. It’s a genius idea for bringing some of the scarcest resources together with the most intractable problems.

Machine learning: a Big Data solution

To create a solution using machine learning, the winning group from LIONsolver first had to consult with doctors to figure out what symptoms look like in data form. From that information, they were able to create a training set of data that represented known disease symptoms. Using that training set combined with data streamed from mobile apps on devices carried by patients, LIONsolver’s software was able to learn each patient’s particular patterns and provide doctors with highly accurate information that is crucial to appropriate treatment.
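To make the idea of a training set concrete, here is a minimal sketch of supervised classification in Python. It is not LIONsolver's method, which this post does not describe in detail. The feature names, the numbers and the nearest-centroid technique are all assumptions chosen purely for illustration.

```python
from math import dist
from statistics import fmean

# Hypothetical labelled training set. Each row is an invented pair of
# (tremor_amplitude, tremor_frequency_hz) summarising a window of
# accelerometer readings. None of these values come from the real study.
TRAINING_SET = {
    "parkinsons": [(0.80, 5.1), (0.95, 4.8), (0.70, 5.5)],
    "control": [(0.10, 9.0), (0.15, 10.2), (0.05, 8.7)],
}


def fit_centroids(training_set):
    """Average each label's feature vectors into one centroid per label."""
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in training_set.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the new feature vector."""
    # Features are left unscaled here, so the frequency column dominates
    # the distance. A real pipeline would normalise the features first.
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(TRAINING_SET)
new_window = (0.75, 5.0)  # a new, unlabelled reading from the hypothetical app
prediction = classify(centroids, new_window)
print(prediction)
```

The labels in the training set are where the doctors' knowledge enters: the software can only separate the groups that people have already told it exist.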

Drake Pruitt, CEO of LIONsolver, describes the long-term benefit of this discovery:

We see this opportunity as part of an overall trend in healthcare toward applying forecasting and prediction to health record and wellness data, in order to help doctors and their patients achieve healthier lives with manageable healthcare costs. In short: More and more mobile devices are linking with monitoring services to analyze a growing amount of data. This analysis will provide a unique opportunity to take better care of patients, and to teach patients to take better care of themselves.

The takeaways

One of the biggest takeaways from this story is the evidence that Big Data isn’t just hype. Enormous opportunities exist for passive data collection through smartphones and other sensors. We’ve reached a point where the cost of data collection is significantly lower; in this case, collection required little more than an app running on a common device.

A second takeaway is the power of crowdsourced solutions in areas where resources like data scientists are hard to find and hire through conventional means.

The third significant lesson is the value of machine learning alongside Big Data. Classic data mining skills aren’t effective for every problem, and machine learning can be used to find patterns in data that humans can’t. Machines don’t feel the same incentives that scientists feel to prove their own theories and for this reason alone, can be far more effective.


Responses to “Big data, crowdsourcing and machine learning tackle Parkinson’s”

  1. August 1, 2013 at 5:04 am

    Good post, but you need to be very careful with the following assertion (a common mistake):

    “Machines don’t feel the same incentives that scientists feel to prove their own theories and for this reason alone, can be far more effective.”

    There is still a LOT of human judgement involved in machine learning. Choosing the data sets and the methods, dealing with overfitting issues and so on all require human experience, and that introduces bias back into the equation.

  2. KP Nuts
    August 2, 2013 at 7:26 am

    An interesting article that highlights a positive step forward in the diagnosis and treatment of Parkinson’s, but only to a point. I am actively engaged in using data-driven methods to improve the epidemiology for Parkinson’s Disease and I am a little uncomfortable with some of the statements and storytelling in this blog.

    In this article and in a subsequent exchange on a LinkedIn group (http://linkd.in/13GCFlt) a number of points were raised that need to be reset, especially for anyone reading the blog or the group exchange who has an interest in Parkinson’s but doesn’t have experience or an accurate understanding of the data science being referred to. The LinkedIn discussion was closed so I wasn’t able to comment there, and I’ve noticed that a reference to this blog has been tweeted too.

    Let’s first take the subject of Machine Learning (ML) not ‘feeling’ the same incentives as scientists. Interesting to anthropomorphise an algorithm, but let’s stick with it. Actually, let’s not. ML is a valuable method to distinguish between a number of pathologies for known conditions. No distinction has been made in the blog regarding supervised (SML) or unsupervised (UsML). Given the 100% success rate reported by LIONsolver I’d respectfully suggest that an SML algo was used. In which case the training data used to establish the known conditions, and the subsequent interpretation and further refinement of the model, require significant human intervention and create opportunity for more than the ‘up front’ bias that the blog’s author accepts. As a result the algorithm is an operational process rather than a cognitive one, despite the use of the term ‘learning’. If that’s the case, the argument about intuition, feeling, bias, etc, was pointless. UsML operates differently to SML but still needs significant human intervention, so the opportunity for bias is still available and the disagreement about feeling, bias, etc, was pointless.

    ML, as a generic family of algorithms, is often powerful but is constrained by its dependency on existing human knowledge and understanding, and by the intended purpose that the method is being used to discover, describe and predict. It cannot create a new diagnostic pattern or a new hypothesis autonomously, and that applies to unsupervised algos too. UsML can find patterns in data that are not labelled (known conditions) but it is often limited by the variables it works against and won’t often produce a significant and interesting outcome unless the data supports density estimation in line with the desired target outcome (you’ve got diabetes, Parkinson’s, etc). Human intuition and bias cannot be removed from either ML process, but do not confuse that with a God Complex (as defined by Archie Cochrane) whereby some scientists, statisticians, economists, politicians, etc, fit the data to their ‘theory’ because they know best. However, anyone feeling that they need to prove their own theory can use (and people have used) ML to do just that.

    I’ll put it really bluntly - ML is not bias immune at any stage (before, during and after), whether supervised or unsupervised.

    LIONsolver leaned heavily on the information that Dr. Michele Tagliati, Professor and Vice-Chairman in the Department of Neurology and Director of Movement Disorder at the Cedars-Sinai Medical Center, and colleagues provided to the study. That strongly suggests supervised methods were used to diagnose and prescribe, perhaps after unsupervised methods were used to classify and extract features from input data. Without that SME knowledge, the study (what LIONsolver called it) would have been of questionable value.

    LIONsolver’s own write-up clearly shows that they have done nothing more than show ML on the competition data, which only reflects known diagnoses and progression. Early indications show that this approach could have significant value to health service delivery and ultimately to PD sufferers. But there is a lot more to test. That’s not what has been implied by the blog above and I have to say that in some sections it reads more like one of the myriad marketing pieces for a vendor hyping the big data analytics agenda. And there’s no indication of the big data element in the story above: what’s the universe size? The crowdsourcing element is not big data. The ML element is not big data.

    By the way, sometimes ‘bias’ is needed to take a step forward. Sergey Brin is heavily engaged in another big data + analytics approach to Parkinson’s. But he has knowingly built bias into his research as his focus is on Parkinson’s sufferers that have a genetic marker or predisposition. Brin has the genetic marker. People that contract Parkinson’s but do not have a genetic basis (the vast majority) are excluded from that research. While there are a myriad genetic factors in this one field, expanding beyond the genetic linkage would be too big and too complex for big data, data mining and ML combined. Should everyone with a smartphone use the app? If so we then run into the similarity between PD and other neuro-degenerative diseases where ML would be far less accurate to discriminate between symptoms that could point to PD or MS for example, and there are so many more conditions that express similar symptoms at early and mid-stages that pretty soon the signals become less distinct. So in this case, to get any chance of finding an effective outcome, bias is good to reduce the problem area.

    The blog doesn’t cover whether the study is across the whole PD universe or a sub-universe like the one Brin has targeted. I can believe it would be the latter (conforms to a known probability distribution, ideal for SML, would encompass a small volume of telemetry, and can generate perfect 100% predictive results). Those that have a genetic pathogenesis for Parkinson’s could be more easily and accurately diagnosed through an SML algorithm prior to the onset of any motor symptoms, and monitoring their symptom control could ensure better prescription strategies. And it would be a containable universe as it would be limited to people who take the DNA test, find out they have a genetic marker, use the smartphone app and wait for the prognosis to change. Brin is also working with the MJF Foundation, so perhaps that has already happened.

    ML is compared to classic Data Mining (DM) in this way: “Classic data mining skills aren’t effective for every problem and machine learning can be used to find patterns in data that humans can’t”. This is a very poor comparison. Data mining is a huge consumer of unsupervised machine learning algorithms. In fact let me set the record straight. Unsupervised Machine Learning draws heavily on Data Mining methods, especially in preparation of the data. It’s a bit like saying pork has nothing in common with bacon.

    I generally don’t comment on misleading or ambiguous propositions in short blogs or pieces like this. It is common that so many people commenting in this space don’t know their supervised backside from their unsupervised elbow. Who has the time?

    Whilst I understand the redux challenge to compress facts and explanations into a short article, it’s important to reset the perspective being communicated when the subject matter is more than someone’s view of ‘how analytics works’ from a very distant position.

    Parkinson’s Disease is a far wider-ranging area than the specific area of study this article covers, and a lot more thought should have been given to the ‘people’ behind and around the subject of Parkinson’s disease, not the app people who are really at the centre of the piece.

    Or perhaps one day we’ll have “ML bloggers” who won’t have the incentives that human bloggers feel to prove their own industry hype? But who will unsupervise them?

    • August 2, 2013 at 8:08 am

      Thank you for your reply, KP Nuts (a delicious snack if there ever was one). I appreciate the time you took to write out your response and I assure you there’s no need to prove any industry hype. The goal instead was to highlight a great example of crowdsourcing and finding the right expertise to attack a very difficult problem that causes people to suffer. Again, thanks for the time and effort in adding to the discussion…that’s the best outcome possible.

  3. August 2, 2013 at 4:46 pm

    Kaggle is an impressive company. And this also reminds me of the Geoffrey Beene Alzheimer’s prize (http://geoffreybeenechallenge.org), offering a $100,000 prize to the team that can identify early indicators of Alzheimer’s. Also the X Prize’s Genomic Prize.

    If we can quantify the outcomes which we want, why don’t we crowdfund the prize awards for these outcomes, in an “I’ll pay $X for Outcome Y” fashion? This way we pay only for performance, we incentivize cost-effectiveness, and we harness the crowd for unorthodox solutions. More thoughts at prizl.org and prizl.wordpress.com.

Trackbacks/Pingbacks

  1. Suivre les oiseaux à la trace dans le monde entier « En Pratique « Business-analytics-info.fr - October 21, 2013

    […] an article (in English) on another example of crowdsourcing, in the healthcare sector […]
