What’s the Best Way to Manage Big Data for Healthcare: Batch vs. Stream Processing?

The healthcare industry is on the verge of a “big data” revolution. By 2019, global data volume is projected to reach 40 zettabytes, triple what it is today. Ninety percent of the data in the world has been created in the last two years alone.

So what does big data mean for healthcare?

With this massive influx of patient and physician data from EHRs, surveys, personal data sources, and more, healthcare institutions need to find effective ways to leverage the information available to them.

Historically, the healthcare industry has lagged in technology adoption. In fact, many systems still rely on batch processing to handle large volumes of data, which comes with certain limitations, including the inability to process data quickly. Stream processing, on the other hand, can process data in near real time, giving health systems a much more accurate, up-to-date picture when the data is time sensitive.

Let’s take a look at batch vs. stream processing and explore the implications for the healthcare industry:

What Is Batch Processing?

Batch processing is an efficient way of processing high volumes of data in which a group of transactions is collected over a period of time and then processed together in a single run. Complex event processing (CEP), which uses event-by-event processing and aggregation, is by contrast a technique typically associated with stream processing rather than batch systems.

Big data is about volume, velocity, and variety. Batch processing addresses volume and variety in the big data architecture. The masses of structured and semi-structured historical data are typically stored in Hadoop with a batch processing system.

One of the primary challenges of batch processing is, however, the latency of the computation. In other words, data that comes in big batches and is cleansed through a batch processing system can be several hours, days, or sometimes weeks to a month old by the time it reaches healthcare professionals.

The end result is often outdated data, the product of a “too late” architecture.
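To make the latency point concrete, here is a minimal Python sketch of a batch job. The heart-rate readings, the `run_batch` function, and the nightly schedule are all hypothetical, chosen only for illustration. The sketch shows that every record waits until the scheduled run, so the oldest one is a full day old by the time it is processed.

```python
from datetime import datetime, timedelta

def run_batch(records, run_time):
    """Process every collected record in one run.

    Returns the average value and how old each record was at run time.
    """
    values = [value for _, value in records]
    average = sum(values) / len(values)
    staleness = [run_time - timestamp for timestamp, _ in records]
    return average, staleness

# Hypothetical heart-rate readings collected over one day (assumed data).
start = datetime(2024, 1, 1, 0, 0)
records = [(start + timedelta(hours=h), 70 + h) for h in (0, 6, 12, 18)]

# Assumed schedule: the batch job runs once, 24 hours after collection began.
run_time = start + timedelta(hours=24)

average, staleness = run_batch(records, run_time)
print(average)         # 79.0
print(max(staleness))  # 1 day, 0:00:00
```

In this sketch the result is correct, but the first reading is a day old when anyone sees it, which is the lag described above.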

What Is Stream Processing?

Stream processing is a model that computes on one data element, or a small window of recent data, at a time. The computations generally happen in near real time, taking seconds to minutes at most. Technology capable of stream processing can produce near real-time results because it flows data through the system and processes it as it arrives. With this type of processing, results reflect the current state of the data more accurately, which is significant for time-sensitive data.
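The idea of computing over “a small window of recent data” can be sketched in a few lines of Python. This is an illustrative sketch, not a production streaming engine: the `windowed_average` function, the window size of three, and the sample readings are assumptions made for the example.

```python
from collections import deque

def windowed_average(stream, window_size=3):
    """Yield the average of the most recent readings as each one arrives."""
    window = deque(maxlen=window_size)  # oldest reading drops out automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)

# Hypothetical heart-rate readings arriving one at a time (assumed data).
readings = [72, 75, 90, 120, 118]
results = list(windowed_average(readings))
print([round(r, 1) for r in results])  # [72.0, 73.5, 79.0, 95.0, 109.3]
```

Because a result is emitted for every incoming reading, the rising trend is visible as soon as the data arrives rather than after a scheduled run.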

In contrast to batch processing, stream processing analyzes and acts on real-time data using “continuous queries.”

“Essential to stream processing is streaming analytics, or the ability to continuously calculate mathematical or statistical analytics on the fly within the stream,” notes one article on stream processing. “[Such] solutions are designed to handle high volume in real time with a scalable, highly available and fault tolerant architecture. This enables analysis of data in motion.”
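As a rough illustration of calculating statistics “on the fly within the stream,” the sketch below keeps a running mean and variance using Welford's online algorithm, a standard technique that updates its estimates one value at a time without storing the full history. The `RunningStats` class name and the sample values are assumptions for this example, not part of any particular streaming product.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

# Hypothetical values arriving from a stream (assumed data).
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.mean)  # 5.0
```

Memory use stays constant no matter how many values pass through, which is what makes this style of calculation suitable for data in motion.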

Stream Processing in Healthcare

As the healthcare industry increasingly moves toward a value-based model, there’s more need for near real-time decision making to personalize patient marketing campaigns, improve patient outcomes, and create greater patient engagement.

This is where stream-processing architecture comes into play.

When everything is connected, from administrative data to historical records to near real-time data points, health institutions have the opportunity to deepen patient and physician connections and enhance their experiences.

Near real-time data processing allows healthcare systems to make better decisions based on more robust and better quality data. As a result, they can take immediate action on data analysis, which can be significant to the health of a patient as well as their experience with a hospital or health institution.

Near real-time output has the additional benefit of being more scalable (as a result of speeding up data processing). This means health systems can keep processing data until they are satisfied with its quality. What’s more, if there is a data quality issue, the root cause can be remediated much faster with stream processing than with a batch processing system.

Batch vs. Stream Processing

Though stream processing has its benefits, there’s room for both data processing methods in the healthcare industry. Batch processing is often a less complex and sometimes more cost-effective solution, and it can be a good fit for certain bulk data processing needs.

As outlined above, however, there is a lag time with batch processing systems whereas stream processing is able to process more data at a much faster pace. By analyzing data in near real time, stream processing solutions implicitly allow for higher quality data that can inform near real-time decision making.

Final Thoughts

As the healthcare industry continues to evolve in the years to come, one of the fundamental challenges will be how health systems make effective use of the mountains of data at their disposal. While there is no “one-size-fits-all” solution that will be the right fit for every organization, it’s important to consider the advantages and disadvantages of batch and stream processing systems to determine the right approach.

How is your organization harnessing the power of big data for healthcare?
