AI Maturity Model for GxP Application
A Foundation for AI Validation in the Pharmaceutical Industry, Part 1
No specific regulatory guidance yet exists for the validation of AI applications that addresses how to handle the particular characteristics of AI. A first milestone was the description of the importance and implications of data and data integrity for the software development life cycle and for process outcomes.
No life-science-specific classification of AI systems is available. To date, only preliminary, general AI classifications with local scope have been published.
This lack of a validation concept can be seen as the greatest hurdle to moving digital products beyond the pilot phase. Nevertheless, AI validation concepts are being discussed by regulatory bodies, and first attempts at defining regulatory guidance have been made. For example, in 2019 the US Food and Drug Administration published a discussion paper on a proposed regulatory framework for AI/ML-based software as a medical device, which demonstrates that regulatory bodies have a positive attitude toward the application of AI in the regulated industries.
Introducing a Maturity Model
As part of our general effort to develop industry-specific guidance for the validation of applications that consider the characteristics of AI, the ISPE D/A/CH (Germany, Austria, and Switzerland) Affiliate Working Group on AI Validation recently defined an industry-specific AI maturity model (see figure). In general, we see the maturity model as the first step and the basis for developing further risk assessment and quality assurance activities. By AI system maturity, we mean the extent to which an AI system can take control and evolve based on its own mechanisms, subject to the constraints imposed on the system in the form of user or regulatory requirements.
Our maturity model is based on the control design, which is the capability of the system to take over controls that safeguard product quality and patient safety. It is also based on the autonomy of the system, which describes the feasibility of automatically performing updates and thereby facilitating improvements.
We think that the control design and the autonomy of an AI application cover critical dimensions in judging the application’s ability to run in a GxP environment. We thus define maturity here in a two-dimensional matrix spanned by control design and autonomy and propose that the defined AI maturity can be used to identify the extent of validation activities.
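The two-dimensional matrix can be sketched as a small data structure. This is an illustrative sketch only: the article defines the two axes (control design, stages 1–5; autonomy, stages 0–5) but not a numeric scoring rule, so the class name, the score, and the three validation-extent categories below are assumptions for demonstration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AIMaturity:
    """A point in the two-dimensional maturity matrix."""
    control_design: int  # stages 1-5: extent of GxP control taken over
    autonomy: int        # stages 0-5: extent of automated updating

    def __post_init__(self):
        # Enforce the stage ranges defined by the maturity model.
        if not 1 <= self.control_design <= 5:
            raise ValueError("control design stage must be 1-5")
        if not 0 <= self.autonomy <= 5:
            raise ValueError("autonomy stage must be 0-5")

def validation_extent(m: AIMaturity) -> str:
    """Hypothetical rule of thumb: the higher the maturity on either
    axis, the more extensive the validation activities."""
    score = m.control_design + m.autonomy
    if score <= 2:
        return "baseline (conventional software validation)"
    if score <= 6:
        return "extended (data and model lifecycle controls)"
    return "comprehensive (continuous monitoring and change control)"

# A locked ML model (autonomy 1) whose output an operator must approve
# (control design 2) would land in the middle band of this sketch.
print(validation_extent(AIMaturity(control_design=2, autonomy=1)))
```

How maturity scores map to concrete validation activities is exactly what the risk assessment work building on this model would have to define; the mapping above is merely a placeholder.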
Control design comprises five stages. In stage 1, applications run in parallel to GxP processes and have no direct influence on decisions that can impact data integrity, product quality, or patient safety. This includes applications that run in the product-critical environment with actual data. The application may display recommendations to the operators. GxP-relevant information can be collected, and pilots for proof of concept are developed in this stage.
In stage 2, an application runs the process automatically, but its output must be actively approved by the operator. If the application calculates more than one result, the operator should be able to select one of them. In terms of a four-eyes principle (i.e., an independent suggestion for action on one hand and a check on the other), the system takes over one pair of eyes. It creates GxP-critical outputs that have to be accepted by a human operator. An example of a stage 2 application would be a natural language generation application creating a report that has to be approved by an operator.
In stage 3, the system runs the process automatically but can be interrupted and revised by the operator. In this stage, the operator should be able to influence the system output during operation, such as deciding to override an output provided by the AI application. A practical example would be to manually interrupt a process that was started automatically by an AI application.
In stage 4, the system runs automatically and controls itself. Technically, this can be realized by a confidence area, where a system can automatically control whether the input and output parameters are within the historical data range. If the input data are clearly outside a defined range, the system stops operation and requests input from the human operator. If the output data are of low confidence, retraining with new data should be requested.
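The stage 4 confidence area described above can be sketched as a simple gate. The function name, range bounds, and confidence threshold below are illustrative assumptions, not part of the model; the logic follows the text: inputs outside the historical data range halt operation and request operator input, and low-confidence outputs trigger a retraining request.

```python
def confidence_gate(x: float,
                    historical_min: float,
                    historical_max: float,
                    output_confidence: float,
                    confidence_threshold: float = 0.9) -> str:
    """Stage 4 self-control sketch: check input against the historical
    data range, then check the confidence of the produced output."""
    if not historical_min <= x <= historical_max:
        # Input clearly outside the defined range: stop and escalate.
        return "STOP: input outside historical range; request operator input"
    if output_confidence < confidence_threshold:
        # Output of low confidence: keep running but request retraining.
        return "CONTINUE: low-confidence output; request retraining with new data"
    return "CONTINUE: within confidence area"

print(confidence_gate(x=5.0, historical_min=0.0, historical_max=10.0,
                      output_confidence=0.95))
# → CONTINUE: within confidence area
```

In a real system the historical range and confidence metric would themselves be validated artifacts; a single scalar check stands in here for what would typically be a multivariate comparison.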
In stage 5, the system runs automatically and corrects itself, so it not only controls the outputs but also initiates changes in the weighting of variables or by acquiring new data to generate outputs with a defined value of certainty.
To our knowledge, no systems in pharmaceutical production currently operate at stage 4 or 5. Nevertheless, as the industry gains experience, we expect applications at these stages to emerge.
Autonomy is represented in six stages. In stage 0, there are AI applications with complex algorithms that are not based on machine learning (ML). These applications have fixed algorithms and do not rely on training data. In terms of validation, they can be handled similarly to conventional applications.
In stage 1, the ML system is used in a so-called locked state. Updates are performed by manual retraining with new training data sets. As the system does not process any metadata of the produced results from which it could learn, the same data input always leads to the same output. This is currently by far the most common stage. Retraining of the model follows subjective assessment or is performed at regular intervals.
In stage 2, the system is still operating in a locked state, but updates are performed after indication by the system with a manual retraining. In this stage, the system is collecting metadata of the generated outputs or inputs and indicates to the system owner that a retraining is required or should be considered, e.g., in response to a certain shift in the distribution of input data.
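A stage 2 retraining indication can be sketched as a simple drift check on collected input statistics. The function, the mean-shift criterion, and the threshold are assumptions for illustration; the article only requires that the system detect a certain shift in the distribution of input data and indicate it to the system owner.

```python
import statistics

def retraining_indicated(baseline: list[float],
                         recent: list[float],
                         z_threshold: float = 3.0) -> bool:
    """Flag retraining if the mean of recent inputs has shifted by more
    than z_threshold baseline standard deviations (a simple drift proxy)."""
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return statistics.fmean(recent) != mu
    shift = abs(statistics.fmean(recent) - mu) / sigma
    return shift > z_threshold

# Baseline inputs from the training data set (hypothetical values).
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
print(retraining_indicated(baseline, [10.0, 10.1, 9.9]))   # → False
print(retraining_indicated(baseline, [13.0, 13.2, 12.9]))  # → True
```

Production systems would typically use a distribution-level test over many variables rather than a univariate mean shift, but the control flow is the same: the system only indicates; a human performs the retraining.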
In stage 3, the update cycles are partially or fully automated, leading to a semi-autonomous system. This can include the selection and weighting of training data. The only human input is the manual verification of the individual training data points or the approval of the training data sets.
In stages 4 and 5, the system is fully autonomous, learning independently from the input data, e.g., through reinforcement learning.
In stage 4, the system is fully automated and learns independently with a quantifiable optimization goal and clearly measurable metric. The goal can be defined by optimizing one variable or a set of variables. In production, the variables could be the optimization of the yield and selectivity of certain reactions.
In stage 5, the system learns independently without a clear metric, exclusively based on the input data, and can self-assess its task competency and strategy and express both in a human-understandable form. Examples could be a translation application that learns based on the feedback and correction of its user. If the user suddenly starts to correct the inputs in another language, in the long term, the system will provide translations to the new language.
Authors: Nico Erdmann, Manager, Deloitte, Germany;
Rolf Blumenthal, Senior Consultant, Körber Pharma Software;
Ingo Baumann, Partner, Head of Delivery, Thescon;
Markus Kaufmann, Global GMP Auditor, Novartis
This article, which is part 1 of an excerpt from a more in-depth article first published in the March/April 2022 issue of ISPE Pharmaceutical Engineering, was developed as part of a larger initiative regarding AI validation. The maturity model is the first step. In fact, many other topics such as data management or risk assessment have to be considered in the validation of AI. The basic maturity model will have an influence on the risk assessment of the AI application.
International Society for Pharmaceutical Engineering (ISPE)
6110 Executive Blvd., Suite 600
North Bethesda, MD 20852
United States
ISPE D/A/CH e.V.