Healthcare Real-World Data Impacts on Biopharma AI Innovation: Trends and Insights

GE Healthcare


The volume of healthcare real-world data (RWD) was projected to reach more than 2,314 exabytes[1] by 2020. To put that in perspective, a single hospital environment can contribute up to 40 petabytes of data per day.

Within this data, there are 26 individual data elements per patient. Furthermore, 90 percent of electronic medical records (EMR) are connected and accessible across the hospital environment. The data exists, which makes the next few statistics all the more puzzling.

Biopharmaceutical (biopharma) companies are aware of the vast quantities of data generated in hospital environments. What they lack is a consistent means of acquiring that data. Consider the following:

  • 80% of clinical trials do not meet their designated enrollment deadlines.[2]
  • 65% of biopharma companies feel they lack access to the real-world data that makes analysis valuable.[3]
  • 80% of biopharma companies already use, or plan to use, artificial intelligence (AI) and big data approaches to improve research and development (R&D) performance.[4]

This is troubling. Clinical trials are delayed by lackluster enrollment, and biopharma companies lack the data they need to guide their AI innovation. And if real-world data is the best fuel for AI, how do the 80 percent who use or plan to use AI get started without it?

The result is a limited capacity to deploy and integrate AI at the point of care, which ultimately constrains the quality of patient care and patient outcomes.

Primary Challenges for Biopharma in Acquiring Real-World Data

Biopharma companies have found that discovering sources of accurate, consistent, and unique real-world data is no easy task. Patients are relying on the due diligence of biopharma to create safe, sensitive, and transformative AI diagnostic solutions for complex conditions. Without quality real-world data, due diligence becomes more difficult.

In terms of data governance, healthcare institutions traditionally govern data for its primary use in patient care. Secondary uses, such as biopharma AI development or clinical trials, fall by the wayside because institutions lack the capacity to govern them.

Another overarching obstacle is data regulation, which varies by country and jurisdiction. These regulations can limit or prevent access to valuable real-world data, complicating both acquisition by biopharma and distribution by healthcare institutions.

Finally, there is data utilization. Clinical quality assurance (QA) processes such as Six Sigma assess acquired data for suitability and applicability. Once the data is available, how does biopharma extract the most value from it? Because this QA phase starts before the data is even processed, the fear of underutilizing data becomes an ironic obstacle for biopharma.

Integration Challenges for Biopharma When Acquiring Real-World Data

Another formidable challenge is the lack of data linking across systems. Relevant metadata could include admission details by modality and condition, imaging details by machine and software, and lab results across hematology and histology. It is also important to consider historical and prospective datasets, and the rarity or complexity of the condition in question.

This metadata can be used to train AI algorithms or to feed the AI during execution. Taking a step back, the real question is whether the data is connected in a way that is useful for AI, as opposed to useful for clinicians in a clinical environment.
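To make the linking problem concrete, here is a minimal Python sketch that groups admission, imaging, and lab records by patient and reports which patients are represented in every system. It assumes each source system already exposes a shared `patient_id` key, which real deployments rarely get for free; the class, function, and field names are hypothetical and the data is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedRecord:
    """Everything known about one patient, gathered from separate systems."""
    patient_id: str
    admissions: list = field(default_factory=list)
    images: list = field(default_factory=list)
    labs: list = field(default_factory=list)


def link_by_patient(admissions, images, labs):
    """Group rows from three source systems by a shared patient_id.

    Assumes identity matching across systems has already been done,
    so every row carries the same hypothetical 'patient_id' key.
    """
    linked = {}
    sources = {"admissions": admissions, "images": images, "labs": labs}
    for attr, rows in sources.items():
        for row in rows:
            pid = row["patient_id"]
            record = linked.setdefault(pid, LinkedRecord(patient_id=pid))
            getattr(record, attr).append(row)
    return linked


def fully_linked(linked):
    """IDs of patients with at least one record in every source system."""
    return sorted(
        pid for pid, r in linked.items() if r.admissions and r.images and r.labs
    )


# Illustrative data only.
admissions = [{"patient_id": "P1", "condition": "heart failure"},
              {"patient_id": "P2", "condition": "COPD"}]
images = [{"patient_id": "P1", "modality": "CT", "scanner": "vendor-A"}]
labs = [{"patient_id": "P1", "panel": "hematology"},
        {"patient_id": "P2", "panel": "histology"}]

linked = link_by_patient(admissions, images, labs)
print(fully_linked(linked))  # ['P1']
```

Patient P2 has admission and lab data but no imaging, so an image-based model would silently miss that patient. Surfacing such gaps before training is the point of checking linkage across systems.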

In software development, this is summed up as “garbage in, garbage out.” The better targeted the data is to a specific condition, the better an AI application can target its diagnoses and treatment recommendations. This ultimately guides biopharma's approach to design, deployment to market, and executive decision-making for medical AI solutions.

Three Experiences Shared by Healthcare Providers and Biopharma in Their AI Journey

Healthcare institutions and biopharma share three experiences.

First, of course, is real-world data: how to get the right data, anonymize it, run simulations, and analyze its overall quality.

Second is the design and deployment of AI algorithms in clinical settings. Healthcare providers are concerned about compatibility, integration, and ongoing maintenance of the host IT systems. Biopharma is equally concerned about meeting these expectations and ensuring algorithmic consistency across diverse IT architectures and data environments.

The final area is actual deployment and usage of AI and real-world data in clinical settings. How will the AI be installed on servers, medical devices, and clinician computers? After the AI is installed, how can biopharma and healthcare providers collaborate to collect, learn from, and compliantly repurpose this data in pursuit of Precision Health?

The journey starts with the raw data held by healthcare institutions and progresses to transforming that data into something consumable. AI applications and tools merely leverage the resulting data. The most difficult parts of the RWD journey are governance, legal access, and supporting secondary use cases. This applies whether the institution wants to monetize its data externally or acquire data to develop AI algorithms internally. What is the market value of that data, and how complete is the dataset in terms of dictionary linking and data lineage?
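One rough way to put a number on "how complete is the dataset" is to measure how often the fields an algorithm depends on are actually filled in. The Python sketch below assumes records are plain dictionaries and that dictionary linking and lineage are captured as hypothetical `diagnosis_code` and `source_system` fields; the field names and values are illustrative, not a standard or a valuation method.

```python
def field_completeness(records, required_fields):
    """Per-field share of records where the field is present and non-empty."""
    if not records:
        return {f: 0.0 for f in required_fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in required_fields
    }


def overall_completeness(records, required_fields):
    """Share of records in which every required field is filled."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(records)


# Hypothetical fields: a coded diagnosis (dictionary linking), an acquisition
# date (age of the data), and the originating system (lineage).
REQUIRED = ["diagnosis_code", "acquired_on", "source_system"]
records = [
    {"diagnosis_code": "I50.9", "acquired_on": "2019-03-02", "source_system": "PACS"},
    {"diagnosis_code": "", "acquired_on": "2001-07-15", "source_system": "PACS"},
    {"diagnosis_code": "J44.9", "acquired_on": None, "source_system": "EMR"},
]

print(field_completeness(records, REQUIRED))
print(overall_completeness(records, REQUIRED))
```

Here every field is filled in at least two of the three records, yet only one record is complete across all three fields. If an algorithm needs all of them, that second figure is the one that matters.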

This is critical with image data, where EMR categorization and linking answer important questions. Is this the right image for the right patient? What condition(s) does the patient have? Naturally, embedding this metadata within images puts patient privacy at risk, which makes these EMR processes vital for compliant real-world data storage. Without EMR categorization, an image is just that to an AI: an image. In this case, EMR categorization can be viewed as contextual real-world data.

Technical Challenges in Acquiring Images and EMR Metadata for Specific Patient Cohorts

Identifying rare diseases in specific patient cohorts is impossible without the right data. This lack of identification is also why data on rare patient cohorts is so scarce: a patient paradox, so to speak.

Every wearable data request, and every dataset used to train or feed an AI algorithm, is unique to its scenario and to the problem developers are trying to solve. Even when the algorithm and its claimed benefits change, the data sources often stay the same. What can differ is where the data lives in the EMR, as with data from wrist-worn wearables. This affects how clinicians and AI applications interact with those datasets, including whether to pull data from a picture archiving and communication system (PACS), DICOM archives, or similar systems to meet the unique requirements of each request.

The workflow is something radiologists and clinicians will notice. How are images captured, where are they sent, and how do I access them? How does the user interface, and augmentation of the interface by AI, impact productivity? Does the outcome change despite the same input, and vice versa?

The foundational consideration for healthcare AI software is, of course, hardware compatibility. AI software may have particular hardware or software requirements, such as a specific operating system and specialized processors like accelerated processing units (APUs) or graphics processing units (GPUs). Are the AI's performance and capacity to diagnose and help patients being squandered by subpar hardware or software, both at the server level and on the computers clinicians use?

This technical viewpoint is especially important for biopharma AI companies. If biopharma understands the healthcare IT environment, its workflows, and its people, and works around existing strengths, flaws, and preferences, then AI can enhance healthcare capabilities without significant changes in staff, habits, or equipment. The easier AI is to adopt, manage, and use, the faster industry-wide adoption can occur, and faster adoption is a key objective of GE Healthcare Edison™ in the pursuit of Precision Health.

Biopharma Challenged with Disease Characterization Across Regulatory Areas

Healthcare is delivered differently around the world. The European Medicines Agency (EMA), the US Food and Drug Administration (FDA), and the Israeli Ministry of Health each have their own framework, viewpoint, and characterization metrics.

Biopharma must implement compliance not only across regulatory catchment areas but also across healthcare departments. Cardiology in the US may be treated differently from cardiology in Europe, with different expectations and requirements at the regulatory level. Depending on how congruent jurisdictions are, this can help or hinder biopharma as it expands its business across borders.

For effective and compliant data usage, a clear definition of what makes the algorithm unique, a clear definition of what makes the RWD request unique, and a statement of work (SOW) are essential. Healthcare institutions have already collected massive quantities of data; what biopharma finds difficult is navigating the regulatory framework to acquire that data compliantly.

An SOW is integral to tackling complex diseases effectively with AI. Targeting algorithms toward complex diseases involves peripheral data, which an SOW makes explicit. The SOW also shows why biopharma needs access to specific data types, which helps in navigating regulatory requirements.

Finally, we return to the valuation of data. If data lacks EMR categorization and is decades old, its value drops significantly. As more metadata and EMR categorization are added, the relational value of the data grows, which in turn increases its value within the scope of the algorithm. Healthcare institutions create and store data for clinical care, but typically not for algorithm development. The two are slightly different datasets drawn from the same source. Biopharma can use extract, transform, and load (ETL) tools and services to convert the bulk of clinical data into an algorithm-ready state.
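As a minimal Python sketch of that ETL step, assume a CSV export with hypothetical `study_uid`, `dx_code`, and `age` columns and a hypothetical code-to-label dictionary. Extract reads the export, transform drops rows that cannot be labeled or linked to an imaging study and normalizes types, and load writes JSON Lines. This illustrates the pattern only; it is not Edison's pipeline.

```python
import csv
import io
import json

# Hypothetical mapping from local diagnosis codes to training labels.
LABEL_MAP = {"I50.9": "heart_failure", "J44.9": "copd"}


def extract(csv_text):
    """Extract: read a clinical export (CSV text) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, label_map=LABEL_MAP):
    """Transform: keep labeled rows linked to an imaging study, normalize types."""
    out = []
    for row in rows:
        code = (row.get("dx_code") or "").strip().upper()
        study = (row.get("study_uid") or "").strip()
        age = (row.get("age") or "").strip()
        label = label_map.get(code)
        if label is None or not study:
            continue  # unlabeled or unlinked rows are not algorithm-ready
        out.append({
            "study_uid": study,
            "label": label,
            "age_years": int(age) if age.isdigit() else None,
        })
    return out


def load(records):
    """Load: serialize to JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Illustrative export: one unmapped code and one row without a linked study.
export = (
    "study_uid,dx_code,age\n"
    "1.2.3,i50.9,67\n"
    "1.2.4,Z00.0,45\n"
    ",J44.9,70\n"
    "1.2.5,J44.9,\n"
)
print(load(transform(extract(export))))
```

Of the four exported rows, only two survive: the row with an unmapped code and the row with no linked study are dropped. A production pipeline would also validate codes against a real terminology and record the lineage of each output row.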

Why Biopharma Wants to Bring Healthcare AI to the Masses

Biopharma companies have embraced the mantra “prevention is better than cure.” The goal of healthcare AI is to detect rare diseases that are chronically undiagnosed, and to detect more common but equally devastating conditions early. Cardiology and pulmonary conditions, including rare diseases, are a major area of focus.

While developing algorithms today, many of these companies have already pursued not only licensing but also partnerships with healthcare institutions and other for-profit partners. There are clear signs that momentum is building, making the next few years an exciting time for healthcare AI.

Edison Managed Services Help to Protect Biopharma Against Privacy and Legal Pitfalls

The GE Healthcare Edison™ data acquisition process has been reviewed internally and externally by legal and privacy counsel. Each review is conducted through the lens of a particular region, following local privacy and data protection laws such as GDPR, HIPAA, and DISHA. GDPR centers on data control, whereas HIPAA is more prescriptive, a difference that biopharma must consider.

More than 180 data-sharing programs have already gone ahead. As each program begins, its processes are vetted by the legal team, and Edison then applies fair market value to those transactions before preparing data for our data partners and customers. Whether internal or external, artificial intelligence programs are very similar.

At a technical level, we consider the destination for the algorithm: where is it running? On a local server, data is ring-fenced and stays within the confines of the healthcare institution. In the cloud, data leaves this ring fence and is therefore subject to more stringent data protection. If a patient revokes access to their data, how does biopharma purge it to comply with a ‘right to be forgotten’ request? At the edge of the network, how is performance on clinicians’ computing devices affected as data moves back and forth? Does the edge of the network affect the core? Our goal is to allow healthcare institutions to adopt and successfully use AI in the long term, without the IT or regulatory burden.
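Honoring a ‘right to be forgotten’ request is easier when every stored record knows which patient it came from and what it was derived from. The Python sketch below assumes a hypothetical in-memory store with those two maps; it deletes a patient's records plus every crop, tile, or other derivative reachable through lineage links. It does not address models already trained on the data, which is a separate and harder problem.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetStore:
    """Hypothetical record store that keeps ownership and lineage side by side."""
    records: dict = field(default_factory=dict)       # record_id -> payload
    owner: dict = field(default_factory=dict)         # record_id -> patient_id
    derived_from: dict = field(default_factory=dict)  # record_id -> parent record_id

    def add(self, record_id, payload, patient_id=None, parent=None):
        """Store a record; derived records may carry only a parent link."""
        self.records[record_id] = payload
        if patient_id is not None:
            self.owner[record_id] = patient_id
        if parent is not None:
            self.derived_from[record_id] = parent

    def purge_patient(self, patient_id):
        """Delete a patient's records and everything derived from them.

        Returns the sorted IDs that were deleted, usable as an audit trail.
        """
        to_delete = {rid for rid, pid in self.owner.items() if pid == patient_id}
        # Follow lineage links until no new descendants are found.
        changed = True
        while changed:
            changed = False
            for rid, parent in self.derived_from.items():
                if parent in to_delete and rid not in to_delete:
                    to_delete.add(rid)
                    changed = True
        for rid in to_delete:
            self.records.pop(rid, None)
            self.owner.pop(rid, None)
            self.derived_from.pop(rid, None)
        return sorted(to_delete)


# Illustrative usage: crop-1 and tile-1 carry no patient ID, only lineage.
store = DatasetStore()
store.add("img-1", b"...", patient_id="P1")
store.add("img-2", b"...", patient_id="P2")
store.add("crop-1", b"...", parent="img-1")
store.add("tile-1", b"...", parent="crop-1")
deleted = store.purge_patient("P1")
print(deleted)               # ['crop-1', 'img-1', 'tile-1']
print(sorted(store.records))  # ['img-2']
```

Without the lineage map, the crop and tile would survive the purge because nothing ties them to patient P1 directly.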

We also do this at the data level. Privacy policies are ever-changing, so separating and segmenting data before it goes into the algorithm is worth considering. Is this possible, and should it be done in the hospital environment or at the data source? These steps reduce the privacy risks of sharing images, because each image has already been filtered in preparation for use with the algorithm. This is an evolving problem that needs to be addressed transparently throughout the AI development process, particularly as non-medical-grade devices start feeding data into the certified medical realm.
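One way to picture separating data before it reaches the algorithm is to split each record into an identity part that stays at the source and a clinical part that travels on, joined only by a keyed pseudonym. The Python sketch below uses an HMAC-SHA256 token for that; the identifier list, field names, and key are hypothetical, and real de-identification rules cover far more fields. Under GDPR, pseudonymized data of this kind is still personal data, so this reduces privacy risk rather than removing obligations.

```python
import hashlib
import hmac

# Hypothetical set of direct identifiers. Real de-identification rules
# (for example HIPAA Safe Harbor) cover many more fields.
DIRECT_IDENTIFIERS = {"mrn", "name", "address", "phone", "date_of_birth"}


def pseudonym(mrn, secret_key):
    """Keyed, stable token for a patient.

    The same MRN always yields the same token, so records stay linkable.
    Without the institution's key, the token cannot practically be traced
    back to the MRN; the key holder can re-link by recomputing it.
    Truncated to 16 hex characters for readability.
    """
    return hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def separate(record, secret_key):
    """Split a record into an identity part kept at the source and a
    clinical part that can be passed on to the algorithm."""
    token = pseudonym(record["mrn"], secret_key)
    identity = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
    clinical = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    identity["pseudo_id"] = token
    clinical["pseudo_id"] = token
    return identity, clinical


key = b"held-only-by-the-institution"  # illustrative; use a managed secret in practice
record = {"mrn": "123456", "name": "Jane Doe", "modality": "CT", "finding": "nodule"}
identity, clinical = separate(record, key)
print(sorted(clinical))  # ['finding', 'modality', 'pseudo_id']
```

Keeping the identity part in the hospital environment and sending only the clinical part onward is one answer to the "hospital or data source" question above; where the split happens is a design choice each institution would make.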

Greater Data Access Leads to Better Healthcare AI and Rare Disease Outcomes

Edison truly believes that expanding access to quality, compliant sources of real-world data allows more healthcare AI innovation to occur. AI built on that data can act as a safety net that catches clinician errors, identifies hidden elements in images and data, and delivers targeted treatment recommendations for rare disease patients.

Collaboration between biopharma, AI developers, healthcare institutions, and patients is essential to making RWD compliantly accessible to all. The result is greater productivity and certainty for clinicians, greater innovation from biopharma, and better outcomes for patients.

Find out more about Edison™ Digital Pharma Solutions and our real-world data acquisition services by attending an on-demand webinar, or schedule an initial consultation by emailing


[1] ICON plc. 2019. "Digital Disruption in Biopharma" (p. 7, "Big Data").

[2] Overstreet, Karen, Pamela DeSaro, Amelia Johnston, and Kevin Boylan. 2021. "Enrollment and Retention of Participants in NIH-Funded Clinical Trials." NIH.gov.

[4] ICON plc. 2019. "Digital Disruption in Biopharma" (p. 4, "Introduction").