The amount of data being digitally collected and stored is vast and expanding rapidly. As a result, the science of data management and analysis is also advancing, enabling organizations to convert this vast resource into information and knowledge that can help patients before they become even slightly ill. Computer scientists coined the term big data to describe this evolving technology. Big data has been used successfully in astronomy, retail sales, search engines and politics. However, its most important application, healthcare, has yet to be fully explored and exploited for the benefit of humankind.

Global estimates suggest that by 2025, 463 exabytes of data will be created each day. While it is difficult to picture the overall volume of data in the world, one way to visualize it is that the 44 zettabytes of data in the current digital universe amount to 40 times more bytes than there are stars in the observable universe. While some of that new data doesn’t need to be stored long term, experts predict that about 7.5 ZB (1 zettabyte = 10²¹ bytes) of data will need a long-term home in 2025, up from about 1.1 ZB in 2019. This is an increase of about 582%, or nearly sevenfold.
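The arithmetic behind these storage figures can be checked with a short sketch. The helper names `percent_increase` and `implied_cagr` are illustrative assumptions, not part of any cited source. The growth rate is derived only from the two endpoints quoted above (1.1 ZB in 2019, 7.5 ZB in 2025) and assumes smooth compounding between them.

```python
# Unit sizes in bytes (SI definitions).
EB = 10**18  # one exabyte
ZB = 10**21  # one zettabyte


def percent_increase(old, new):
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


def implied_cagr(old, new, years):
    """Compound annual growth rate implied by two endpoint values.

    Assumes smooth compounding over the whole period (hypothetical helper).
    """
    return (new / old) ** (1.0 / years) - 1.0


# Figures quoted in the text above.
stored_2019_zb = 1.1
stored_2025_zb = 7.5
created_per_day_eb = 463

increase = percent_increase(stored_2019_zb, stored_2025_zb)
cagr = implied_cagr(stored_2019_zb, stored_2025_zb, 2025 - 2019)
created_per_day_zb = created_per_day_eb * EB / ZB

print(f"Long-term storage growth 2019-2025: {increase:.1f}%")
print(f"Implied annual growth rate: {cagr:.1%}")
print(f"Data created per day in 2025: {created_per_day_zb} ZB")
```

Under these assumptions, the increase works out to about 581.8%, which corresponds to long-term storage needs growing by somewhat under 40 percent per year.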

The current availability of qualitative and quantitative health data is unprecedented in human history (today it exceeds 2.5 quintillion bytes per day), and present estimates suggest that a single patient generates close to 80 megabytes each year in imaging and electronic medical record (EMR) data. The overall volume of all electronic data doubles every two years. However, healthcare data is expected to outpace this estimate and to be the business sector with the highest data growth, with a compound annual growth rate (CAGR) of 36 percent through 2025.

In addition, with further data demands such as tracking major health outbreaks, staying abreast of the latest information on treatments and vaccines, tracking patient diagnoses and treatments across multiple providers, and supporting the growth in telemedicine, the role of health data has never been more critical.

Healthcare organizations must manage this tremendous amount of patient data while meeting an increased demand for real-time access to complete patient records. At the same time, they must streamline their application portfolios by decommissioning legacy applications while keeping protected health data stored and accessible for compliance, research and reporting.

Very soon, advances in technology (computing, connectivity and storage capacity) will expand the availability of health data exponentially in two ways: the increasing computerization of medical processes, and the quantified self and mobile health. Both advances raise potential issues. The increasing computerization of medical processes and procedures would need to provide the means to link data to information and to knowledge in a virtuous exchange. The quantified self and mobile health (i.e., the possibility for each person, healthy or not, to measure their medical condition through smartphones, wearable sensors and any other digitally reducible phenotypes), by becoming the primary source of health data, would need to be reconciled with data access, usage, portability, privacy and, in Europe, GDPR compliance.

Clinical research, as patient-oriented research conducted with human subjects (or with any biomarker or material of human origin that can be linked with certainty to a unique individual), represents a broad scope of research. It generates increasing amounts of data and needs technologies and transparent rules that allow this data to be shared in a meaningful way within the scientific and industry communities.

One of the primary tasks of telemedicine is connecting patients and doctors beyond the clinic. However, with the involvement of social networks, this communication has expanded to new levels of social interaction. This has opened new possibilities for patient-to-patient communication about health, beyond the traditional doctor-to-patient paradigm. One-fourth of patients with chronic diseases, such as diabetes, cancer, and heart conditions, now use social networks to share experiences with other patients with similar conditions, thereby providing another potential source of big data.

In addition to biological information, geolocation and social apps offer a further way to understand the behaviors and social demographics of patients while avoiding resource-intensive and expensive studies based on large statistical samples. This advantage has already been exploited by several epidemiological studies in areas such as influenza outbreaks, the collective dynamics of smoking, and the misuse of antibiotics. Text messages and posts on online social networks are also a valuable source of health information, e.g., for the better management of mental health. Compared to traditional methods such as surveys, the analysis of fluctuations and regulation of emotions, thoughts and behaviors on social network platforms such as Twitter offers new opportunities for the real-time analysis of expressed mood and its context. For example, one study validated 2.73 × 10⁹ emotional tweets, collected over a 12-week period, against known patterns of variation in mood, and claimed to find some correlation between emotional tweets and global health estimates from the World Health Organization on anxiety and suicide rates. Social media and internet searches can also be combined with environmental data, such as air quality data, to predict sudden increases in asthma-related emergency visits.