
Developing Robust AI Medical Products Requires Facing Data-Driven Truth

By Zhenxue Jing, Ph.D.
Zhenxue Jing, Hologic

As more medtech companies take advantage of artificial intelligence to innovate and advance products, they must anticipate the need to obtain trustworthy, evidence-based, comprehensive data—and be prepared to do their own due diligence to verify the chain of evidence and meet increasingly stringent regulatory requirements.

Much already has been written about the promise of artificial intelligence to revolutionize the healthcare industry by making it easier for providers to do their jobs and help improve patient care. But far less digital ink has been given to the significant “truth” obstacle medical device manufacturers must overcome in order to successfully bring their products to market, or the increasing industry pressure to accumulate and track real-world evidence afterward.

In terms of AI development, data is everything. Getting and curating it is complex, and so is using it. Even as databases and the global market rapidly grow (the market is expected to reach $62 billion by 2027), amassing enough verified-as-fact data takes significant time, hundreds of thousands to millions of dollars, and steadfast rigor. So does monitoring the performance of a product once it has launched.[1]

Drawing on more than 15 years of experience developing leading global AI products for women's health, this article explains why qualified truth in data is critical and shares key considerations for how to attain it.

How and Why AI Works

First described in 1950, the use of machines to mimic the functions of human cognition did not evolve into full use for medical products until the early 2000s. That’s when the advent of machine learning (ML) made it possible to use known data samples to train statistical models to perform specific tasks, and medical technology manufacturers began developing products. Since then, AI medical products have been developed to serve three main purposes:

  1. To help clinicians in diagnostics and decision support
  2. To improve the efficiency and workflow within a hospital or healthcare setting
  3. To support patient management and segment treatment needs

To create an AI/ML algorithm, scientists use hundreds to thousands of data points to “teach” the machine to recognize patterns and make predictions. For example, for early computer-aided breast cancer detection products, the research and development teams shared numerous mammogram images with and without cancer to teach the machine to recognize the features, and thereby the differences, of each. From this process, a set of rules could be created and the machine was programmed to review a mammogram image and mark any potential areas of cancer for radiologist review.
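The rule-learning step described above can be sketched in miniature. The example below is illustrative only: it learns a single decision threshold from labeled scores, standing in for the far richer engineered image features real computer-aided detection products used, and every name and number in it is invented.

```python
# Illustrative sketch only: a toy supervised classifier standing in for
# early computer-aided detection. Real CAD systems used engineered image
# features (mass shapes, calcification clusters), not a single score.

def train_threshold(samples):
    """Learn a decision threshold from labeled (feature, is_cancer) pairs.

    Picks the cutoff that best separates the two classes on the training
    data -- the "set of rules" distilled from known examples.
    """
    best_t, best_acc = 0.0, 0.0
    for t in sorted(f for f, _ in samples):
        acc = sum((f >= t) == label for f, label in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def mark_for_review(feature, threshold):
    """Flag a region for radiologist review if its score crosses the rule."""
    return feature >= threshold

# Labeled training data: (suspiciousness score, biopsy-confirmed cancer?)
train = [(0.2, False), (0.3, False), (0.4, False),
         (0.7, True), (0.8, True), (0.9, True)]
t = train_threshold(train)
print(mark_for_review(0.85, t))  # → True: a high-scoring region is flagged
```

The key point is that the rule is derived from labeled examples, so a single mislabeled example shifts what the machine learns.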

With ML, the scientist actively trains the machine and can explain how that training was conducted, with what data, and why.

With the recent immense increases in computational power, AI product teams can now take advantage of a stronger, more robust subset of ML: deep learning (DL). DL uses the massive computational power of graphics processing units (GPUs) to train very complex statistical models containing hundreds of layers of parameters. Unlike its simpler predecessor, DL works more like an artificial brain, with neural networks that can digest and sort information into patterns. Rather than being taught, the machine automatically processes the data and develops algorithms to recognize patterns and make predictions. It is difficult to explain exactly how this works; what is known is that it does work when enough data is used to train a DL algorithm. As such, even more so than with ML, which has the benefit of a human supervisor, the accuracy of the machine's DL "black box" is fully dependent on the quality and accuracy of the data provided. Data accuracy is an assessment of the correctness of the information, which can include how representative the data is of the intended user groups.
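Structurally, a deep network is just many parameterized layers stacked on top of one another. The sketch below shows a two-layer forward pass with arbitrary placeholder weights; in a real DL system there would be hundreds of layers, and training would adjust every weight automatically from data rather than anyone setting it by hand.

```python
def relu(v):
    """A common activation: pass positives through, zero out negatives."""
    return [max(0.0, x) for x in v]

def dense(v, W, b):
    """One layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, v)) + bi
            for row, bi in zip(W, b)]

# Placeholder weights for illustration only. During training, these values
# would be learned automatically from data -- that is the "black box" part.
layers = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),   # layer 1: 2 inputs -> 2 units
    ([[1.0, -1.0], [0.3, 0.3]], [0.0, 0.0]),   # layer 2: 2 units -> 2 outputs
]

def forward(v):
    for W, b in layers:
        v = relu(dense(v, W, b))
    return v

print(forward([1.0, 2.0]))
```

Stacking more such layers is what makes the model "deep" — and what makes it hard to say which weight contributes what to a final prediction.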

In fact, ensuring that the data used to train the AI is verifiably true and adequately representative is generally part of the regulatory review process for an AI medical product. As such, a company must know, and be able to prove, validate, and articulate, where the data came from, how it was acquired, how it was analyzed, and why (with evidence) it represents fact. For example, the final biopsy report is the qualifying truth for whether a mammogram image does or does not feature malignant cancer. For AI diagnostic products for mammography, the FDA requires each data point to be validated as true and the entire data set to be adequately representative. To meet this requirement, a company must enlist a qualified physician to manually review each image against the final biopsy report and "truth mark" it.
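A truth-marking workflow might look like the following sketch. The record fields and labels are assumptions made for illustration, not an actual FDA schema or any specific company's pipeline; the point is that only physician-reviewed cases whose labels match the biopsy report enter the training set.

```python
from dataclasses import dataclass

@dataclass
class Case:
    image_id: str
    claimed_label: str        # label supplied with the image ("malignant"/"benign")
    biopsy_result: str        # qualifying truth from the final pathology report
    physician_reviewed: bool  # a qualified physician compared image and report

def truth_mark(cases):
    """Keep only cases whose label matches the biopsy report and that a
    physician has manually reviewed; everything else is excluded."""
    validated, rejected = [], []
    for c in cases:
        if c.physician_reviewed and c.claimed_label == c.biopsy_result:
            validated.append(c)
        else:
            rejected.append(c)
    return validated, rejected

cases = [
    Case("img-001", "malignant", "malignant", True),
    Case("img-002", "benign", "malignant", True),      # mislabel: excluded
    Case("img-003", "malignant", "malignant", False),  # not reviewed: excluded
]
ok, bad = truth_mark(cases)
print([c.image_id for c in ok])  # → ['img-001']
```

In practice this review is the expensive part: a qualified physician examines every image against its pathology report, one by one.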

While this qualified truth step is absolutely critical in order to accurately train an AI algorithm—and to gain regulatory approval—it requires significant resources. Even before DL came to be, getting the amount of comprehensive data necessary, especially for new products, took significant planning and time—sometimes multiple years—and financial resources.

Increasing Regulatory Pressure

Now, with the added complexity and power of DL, regulators are applying more scrutiny to AI-based software as a medical device (SaMD).

In January 2021, the FDA released the agency's first "AI/ML-Based Software as a Medical Device Action Plan," making the regulatory process more transparent for future AI/ML-based software devices. The plan describes a multipronged approach to advance the agency's oversight of AI/ML-based medical software. It emphasizes the importance of good machine learning practice in data collection and labeling, and includes plans to initiate a pilot for real-world performance monitoring. In May 2021, the European Union published plans to tighten regulations on high-risk AI medical products. Other regulatory bodies also are weighing in, including China's National Medical Products Administration.

Regulatory bodies generally aim to avoid imposing unnecessary burdens on companies that could discourage or delay innovation. However, their priority remains safety and efficacy. And, because no one can explain how the novel DL/AI black box works, regulators want more assurances that products will perform as initial studies indicated once in clinical use.

Part of this "real-world" scrutiny aims to double-check for potential bias in the AI algorithm. Data collection practices, requirements, and regulations vary, especially across borders. As a result, depending on where the data comes from, an AI product development team may have more or less information about the data itself and about its derivation and history. For example, in the United States, data privacy laws require that identifying patient-related information (e.g., name, age, race, attending physician) be scrubbed from a record before it can be shared. While this protection is in place for good reason, it makes it next to impossible for scientists to know whether an AI algorithm has been trained on representative population samples, especially samples in which patient characteristics could affect the clinical outcomes of a given treatment option.
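The de-identification step, and its downstream cost, can be sketched as follows. The field list here is illustrative, not the full set of identifiers that US privacy rules enumerate:

```python
# Illustrative only: a simplified de-identification pass. Real rules
# (e.g., HIPAA's Safe Harbor method) enumerate many more identifiers.
PHI_FIELDS = {"name", "age", "race", "attending_physician"}

def scrub(record):
    """Drop identifying fields before a record can be shared."""
    return {k: v for k, v in record.items() if k not in PHI_FIELDS}

record = {
    "name": "Jane Doe", "age": 54, "race": "Asian",
    "attending_physician": "Dr. Smith",
    "image_id": "img-001", "biopsy_result": "benign",
}
shared = scrub(record)
print(sorted(shared))  # → ['biopsy_result', 'image_id']
# The demographic fields needed to check representativeness are gone,
# which is exactly what prevents downstream bias analysis.
```

The scrubbing is deliberate and protective; the side effect is that whoever receives `shared` can no longer verify which populations the data represents.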

Beyond regulatory bodies, these potential discrepancies also give some clinicians pause, leaving them reluctant to adopt a new technology until real-world evidence proves its efficacy across patient groups.

With this in mind, medical technology companies need to take steps early in the process to minimize potential sample bias, such as collecting data from various regions and population types or, where possible and relevant, from different global databases, with the understanding that collection and sharing practices differ from nation to nation. Still, doing so will not guarantee the data is free of bias. This risk of bias (or lack of representation) is why some countries—including China and certain member nations of the European Union—require companies to conduct clinical studies with local populations as part of their regulatory approval process, and why monitoring real-world performance post-launch is gaining ground.
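One common way to operationalize the region-balancing step is stratified sampling. The sketch below is a minimal, generic version (the region names and pool sizes are invented); as the article notes, balancing strata reduces but cannot eliminate bias, because whatever bias exists inside each source database carries over.

```python
import random
from collections import Counter

def stratified_sample(cases, key, per_stratum, rng):
    """Draw up to `per_stratum` cases from each stratum (e.g., region).

    Balancing strata reduces -- but cannot eliminate -- sampling bias:
    bias within each source database still carries into the sample.
    """
    by_stratum = {}
    for case in cases:
        by_stratum.setdefault(case[key], []).append(case)
    sample = []
    for _, group in sorted(by_stratum.items()):
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical pool heavily skewed toward one region.
pool = ([{"region": "US", "id": i} for i in range(100)]
        + [{"region": "EU", "id": i} for i in range(40)]
        + [{"region": "APAC", "id": i} for i in range(10)])

sample = stratified_sample(pool, "region", 10, random.Random(0))
print(Counter(case["region"] for case in sample))  # 10 cases per region
```

Note that the balanced sample is much smaller than the raw pool; evening out representation usually means discarding data from overrepresented sources or paying to collect more from underrepresented ones.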

Adopting Real-World View

To be clear, in the United States and other parts of the world, AI medical products must meet strict requirements and go through rigorous review processes to establish whether the device safely and efficaciously serves its intended purpose. However, this is currently established, for the most part, through clinical studies and trials in controlled environments. Once a product is used in the real world, more variables come into play, such as greater diversity in patient characteristics and more variation in clinical know-how and quality, all of which can impact performance. As such, medical technology companies interested in harnessing the power of AI should, as a best practice, monitor real-world performance with or without regulatory pressure, to help ensure safety and efficacy across patient populations.

For medical technology companies themselves, collecting real-world evidence offers several compelling benefits. First, this demonstrative evidence helps differentiate products for competitive advantage and thus drives adoption. It also enables reimbursement strategies to come to fruition, since those often depend on real-world clinical evidence. Importantly, it lays the foundation for the regulatory approval and market adoption of product upgrades and future innovations. The FDA's guidance on the predetermined change control plan will enable manufacturers to bring future product enhancements to market in a more controlled and timely fashion.

However, conducting real-world clinical studies is often cost-prohibitive and time-consuming for medical technology companies of all sizes. To address this critical barrier, the FDA's action plan includes initiating pilot programs for real-world performance monitoring, coordinated with other ongoing FDA programs focused on the use of real-world data.

A similar approach proved effective with past medical products, including a multicenter trial conducted in the early days of digital mammography. In that case, three different companies' products were tested to gauge whether they improved breast cancer detection compared with radiologists reviewing the images independently. The two-year clinical trial and one year of follow-on data analysis proved digital mammography's efficacy. After that, digital mammography became accepted for breast cancer screening, and clinical institutions began adopting it. As a result, in regions like the United States and Europe that widely adopted the technology, breast cancer prevention and detection improved, and death rates continue to decline as more individuals gain access to screenings.

As seen with mammography and many other care sectors, AI medical products promise to revolutionize the healthcare industry as we know it. As more and more medical technology companies take advantage of the game-changing technology to innovate and advance products, they will need to anticipate the need to obtain trustworthy, evidence-based comprehensive data—and be prepared to do their own due diligence to verify the chain of evidence and meet increasingly stringent regulatory requirements.


  1. "Artificial Intelligence (AI) in Healthcare Market Size Worth $61.59 Billion By 2027 | CAGR of 43.6%." Reports and Data, January 19, 2021.
