We hear the term “unstructured data” often. It’s brought up as the enormous challenge of big data and often cited as the reason why traditional relational databases don’t meet the needs of big data. But that conversation doesn’t adequately describe the challenge organizations face with unstructured data.
To get your head around unstructured data, you have to consider the history of data itself. When we started digitizing our world in the 20th century, we first went after the low-hanging fruit of transactional data…accounting. It was a quick win to transfer spreadsheets of information in neat columns and rows.
Decades later, we’re digitizing everything in sight and sharing it across the enterprise, with our partners and with our personal connections. Despite everything that we’ve accomplished, there is still an enormous amount of enterprise information that sits in text documents and presentations, graphics, email, audio, video, web pages and various office software. Keep this in mind…it isn’t that unstructured data lacks any structure…it’s that unstructured data doesn’t fit the enterprise relational data model.
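To make that last point concrete, here is a minimal Python sketch using the standard library’s `email` module. The sample message, the `email_to_row` helper and the column names are all invented for illustration. The headers of an email drop neatly into columns, but the body, which is where the business meaning lives, ends up as one opaque text field.

```python
from email import message_from_string

# Invented sample message, for illustration only.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 invoice dispute

Bob, the customer says the March shipment arrived short and
wants a credit before they pay the Q3 invoice. Can we talk today?
"""

def email_to_row(raw_text):
    """Flatten an email into a dict shaped like one relational row."""
    msg = message_from_string(raw_text)
    return {
        # The headers are structured: each maps cleanly to a column.
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
        # The body is free text: it fits only as a single blob.
        "body": msg.get_payload().strip(),
    }

row = email_to_row(RAW_EMAIL)
print(row["sender"], "|", row["subject"])
print(row["body"])
```

The email clearly has structure. What the relational model cannot express without further parsing is the dispute, the shipment and the requested credit buried in the body.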
Even worse, much of our enterprise process exists as unstructured data, in the heads of workers and lacking any systematic approach for capture, management, communication, measurement and improvement. When the work activities themselves are unstructured, the day to day behavior of workers lacks cohesiveness and efficiency. But I digress. Let’s get back to data itself.
Why haven’t we fixed this?
What keeps us from successfully managing unstructured data? A few things:
- A lack of tools that easily manage unstructured data. Tools need to provide efficient text parsing and analytics, taxonomy and metadata management.
- Difficulty integrating unstructured data with existing information systems. The two are often seen as apples and oranges when it comes to analytics and decision making.
- Shortage of skills in existing staff
- Missing sense of urgency for managing unstructured data
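As a rough idea of what the first item means by text parsing combined with taxonomy and metadata management, here is a toy Python sketch. The `TAXONOMY` categories, their keywords and the `tag_document` function are hypothetical; real tools do far more than keyword matching.

```python
import re

# Hypothetical taxonomy: category name -> keywords that signal it.
TAXONOMY = {
    "billing": {"invoice", "credit", "payment"},
    "logistics": {"shipment", "delivery", "warehouse"},
}

def tag_document(text, taxonomy=TAXONOMY):
    """Return the sorted taxonomy categories whose keywords appear in the text."""
    # Crude parsing: lowercase the text and keep alphabetic words only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keywords in taxonomy.items() if words & keywords)

tags = tag_document("The March shipment arrived short; customer wants a credit.")
print(tags)
```

Tags like these are metadata that can sit in an ordinary database column next to the document, which is one simple way to start bridging the apples-and-oranges gap described in the second item.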
Despite our best efforts to corral the unstructured beast, this kind of data continues to grow and presents a real problem for organizations that want to automate and improve their ability to understand their business, anticipate what’s coming and act quickly on risk and opportunity. There are certainly tools that are maturing and providing the beginnings of a solution. The challenge, however, will be in finding the urgency and getting our organizations to see the value of getting data out of its various hiding places and into a place where it can be used and valued.
Chris,
You didn’t digress. You approached the root cause and returned to the symptom. The enterprise model that everyone fell to their knees to worship has been keeping its zealots on their knees in a rising tide of unstructured data. The reason it proliferates is that ERP systems never fit the process people need to do their actual job. Since they get paid to do their job, they have to get results. If the system doesn’t support the job, unstructured data abounds. You are right about the common misunderstanding of what unstructured means. Until business gets off its knees or other southern extremities and realizes the emperor has no clothes, more catchphrases will be created to distract from the real issue. You can’t run your business on a system that can’t even run the simple processes in your company that happen thousands of times per year, month or day. It’s either negligence by the customers not to realize this or abuse by the vendors and systems integrators not to acknowledge it.
Thanks for the comments. You feel strongly and expressed it well.
First, Steve is right in saying that unstructured data abounds because it’s about people doing their jobs. It’s not really about the form data comes in but about building systems to perform the tasks that the business requires. Companies that have embraced the notion that software is eating the world are actually hiring software engineers to build their business instead of hiring more workers to act as cogs in the machine.
Second, we Chris Taylors need to stick together, so thanks for the post.
Thanks, Chris Taylor. We’re in an age where more workers can actually impede getting work done. This idea was brought to software engineering years ago by The Mythical Man-Month, but there it was a communication challenge more than anything else. Now, the challenge is that more workers, poorly aligned, create more unstructured data, which is a fundamental problem for the organization. Great comments!
Unstructured data is the result of poorly documented and executed processes. To clean up unstructured data you must first clean up your processes. You didn’t digress. As Steve said, you identified the root cause of the problem.
I agree with Steve but I’ll rephrase the rationale in my own words: the reason there is so much unstructured data floating around is that it hasn’t been economically feasible to focus on that problem. Believe it or not, implementing an ERP system was intended to reduce the reliance on “unstructured data” for financial and supply chain processes. It has worked. To say that ERP systems don’t solve the problem is to forget what problem they originally intended to solve.
Just because there is an ecosystem of inefficiency around the ERP system doesn’t mean that the ERP system is not doing its job. It means that other processes haven’t been updated to interact with ERP systems effectively, resulting in people having to step in and make “point” solutions.
Again, the point is that unstructured data is the result of unstructured processes. To fix the unstructured data problem, fix the process problem.
Thanks, John, for that comment. While I agree that ERPs were meant to structure the data (and they do), the first generation became too popular, too fast and, because of that, failed to evolve. Software evolved to SaaS long before ERPs considered it, which is why Workday is gaining so much traction over SAP.
As someone who has worked in the healthcare documentation field for nearly 15 years, this is a subject near and dear to my heart. The vast majority of recorded information in our industry, whether dictated or in text form, is unstructured data, even with the advent of EMRs, which purported to capture ALL patient data in structured form. The simple reason is that it is more time-efficient for care providers to dictate a narrative patient encounter report than it is for them to point-and-click or hunt-and-peck. The good news is that advances in Natural Language Processing/Understanding have made it possible for unstructured, narrative patient encounter records to be analyzed and converted to structured data for a number of individual and population-based healthcare uses. Unstructured data is no longer a barrier to meaningful use, in either the “official” or “practical” sense. Our company offers a unified clinical documentation and analysis platform which can capture data at any point in the medical record lifecycle, from initial encounter to transcription, Clinical Documentation Improvement (CDI), Computer-Assisted Coding (CAC) for both ICD-9 and ICD-10, and powerful analytics to uncover Core Measures and other quality metrics. The best news of all: it’s NOT one of the “big dogs” of the industry, thus allowing us to provide much more nimble consultation, implementation, and competitive pricing.
Sorry for the plug, but the fact remains that we have the tools, now, to solve the unstructured data dilemma.
Jay, we don’t mind a plug when the commenter leaves behind other good information. Thanks.
I also agree with the assessment that unstructured work activities are at the root of the issue, which is why Business Process Management should be at the root of the solution. By creating good structured business processes and capturing the appropriate data along the way, we can (1) minimize the unstructured data that is collected in favor of discrete data and (2) make more sense out of the unstructured data that remains. Tools are already available for item 1 and are rapidly improving for item 2.
Thanks, Kevin. Absolutely.
Humans think in narrative. We have a rich history of telling stories, not of creating books, songs, movies, and stone tablets with bullet points. Doctors using EHRs have the ability to put in many, many “bullet points” of data that can be collected, collated, and analyzed, and may be very helpful to others not working at the patient’s bedside, but at the end of 5 pages of data points printed from the patient’s EHR, they complain that they still don’t understand the patient’s “story”. Why did the last doctor make that particular decision? What’s different about this 75-year-old woman who fell down her stairs from the other woman who fell down the stairs? (One was grieving her dead husband, didn’t take her meds, and collapsed. The other tripped over her grandchild’s toy.) What seems superfluous to data enthusiasts is the nuance of the story that makes all the data points make sense.
Interesting discussion.
Patients are notoriously unstructured. Highly structured systems capture the discrete data that is collected, but may miss seemingly unrelated symptoms whose significance is unclear to the physician. A skilled MD studying the whole chart, including labs and other structured data, will then read the narrative and, from all of that and years of practice, develop an idea about what is happening to the patient and how to treat them. Over time, we learn more about disease, and what was thought to be one kind of problem a decade ago might be diagnosed very differently now, so over-reliance on structured data could eliminate clues that we do not have the knowledge to sort out today. Since diagnosis is scientific and also a medical opinion, the narrative may hold the key to understanding a patient and is critical when a complex illness requires review by another practitioner.
Where EHRs excel is in longitudinal collection, display and rearrangement of data, such as serial blood glucose measurements or blood pressure readings, or in collections of lab data. Having spent a good deal of time working with charts, I would like to see more emphasis on devising ways to customize views or collections of data (labs, schedules of treatments, output from blood glucose monitors, patient vitals, etc.) that physicians could set up according to their specialty, to get to what they use most, more rapidly.