Low-dimensional neural ODEs and their application in pharmacokinetics

Pharmacology and pharmacometrics play an important role in the research, development, and application of therapeutics [1]. Pharmacometric analyses, including pharmacokinetic (PK) and pharmacodynamic (PD) analyses, are usually based on describing clinical data through mathematical-statistical models with well-defined differential equations derived from the law of mass action and first principles [2,3,4].

Recent research papers propose machine learning (ML) as a tool complementing conventional pharmacometric methods [5,6,7,8,9,10]. Several ML approaches were applied to develop predictive models, e.g., to forecast the risk of phototherapy in newborns with hyperbilirubinemia [10], to differentiate between diabetes insipidus and primary polydipsia [11], and to facilitate covariate screening and selection [12, 13]. A central ML method is the neural network (NN), which is essentially an approximation approach for non-linear functions. Due to their capability to approximate various input–output relationships, NNs were applied in different scenarios such as (i) data imputation of missing covariates [14], (ii) covariate selection [15], and (iii) model reduction of quantitative systems pharmacology models [8]. Other publications discuss the use of NNs to approximate PK functions for concentration–time profiles and the possibility of performing PK simulations [16].

The substantially different character of most ML approaches compared to conventional PK approaches impedes their broad application in pharmacometrics. Therefore, the recently presented approach of neural ordinary differential equations (NODE) [17] gained special attention. In this approach, ordinary differential equations (ODE) are combined with NNs. Although their functionality is similar to that of ODEs, the right-hand side of an NODE is no longer a mechanism developed by the modeler but an NN that learns the mechanism solely based on available data. It has been shown that NODEs can fit PK and PD data, as well as data from other dynamic systems in the health sciences, well [18,19,20,21,22]. However, despite the shared concepts between ODEs and NODEs, several aspects of NODEs differ from classical modeling with ODEs. Compared to previous publications utilizing NODEs in PK, we aim at developing a pharmacometrics-based ML model and present a low-dimensional NODE approach related to PK principles.

This research work includes three major parts. First, we build the theoretical concept of our NODEs. To this end, we develop an NODE structure explicitly based on PK principles. This approach differs significantly from most conventional ML approaches, where the structure is developed empirically [23]. With our development, we ensure that the NNs in the NODEs can describe various PK scenarios. Additionally, we discuss opportunities to combine partially known mechanisms with NNs, a discipline called scientific ML [24, 25]. Second, we develop a methodological setup for applying our NODE structures in pharmacometrics. Here, we face common ML challenges, such as avoiding overfitting and performing simulations for unseen data, and provide practical solutions to these challenges. Third, we apply the previously elaborated concepts and setups to our NODEs in pharmacometrics and present results for various PK scenarios.

The overarching goal of this research work is to provide a general insight into NODEs to further stimulate research and applications in the field of pharmacology and pharmacometrics. The presented low-dimensional NODE concept differs significantly from other NODE implementations, including, e.g., encoder-decoder structures [19, 26]. It does not include inter-individual variability and covariate effects, which will be the subject of further investigation.

Theoretical

The aim was to develop an NODE structure that is tailored to handle various linear and non-linear PK behaviors, including distribution processes and potentially delayed absorption. As this paper aims at introducing a general concept, we currently focus only on fitting the average profile of a population, and no covariate effects were included. In this section, we focus on theoretical concepts of NODEs in pharmacometrics, particularly PK analyses, presented in five parts. First, a brief introduction to NNs and their characteristic as function approximators is provided. Second, the concept of substituting the right-hand side of an ODE with an NN is presented. Third, the reduction of multi-dimensional ODE systems to a one-dimensional system is presented. This means that no assumptions about the mechanism (e.g., the number of peripheral compartments) are required. Fourth, specific NODE structures based on PK principles are developed such that they can be applied to various PK scenarios. Fifth, the concept of combining partially known mechanisms with NNs is briefly addressed. Throughout all presented concepts, single-dose scenarios were considered. Possible adjustments for multi-dose scenarios are provided in the discussion. In addition to the presented NODE related concepts, general ML concepts, such as hyperparameter tuning and cross-validation, should be applied if required in specific real-life projects [27, 28].

Introduction to neural networks

An NN is a parameter-dependent function that characterizes an input–output relationship. The input–output relationship is based on compositions of serial calculation steps, referred to as layers. Each layer consists of several neurons, and each neuron is a simple calculation step of multiplication and addition [8]. We focus on NNs consisting of one hidden layer \(H\in \mathbb{R}^{n_h}\) with \(n_h\) neurons, and an input feature \(X\in \mathbb{R}^{n_x}\) that is mapped to an output feature \(Y\in \mathbb{R}^{n_y}\) with the NN \(NN_{\theta} : \mathbb{R}^{n_x}\to \mathbb{R}^{n_y}\). Hence, the NN structure reads

$$NN_{\theta}\left(X\right)=\phi^{o}\left(W^{o}\cdot \phi^{h}\left(W^{h}\cdot X+b^{h}\right)+b^{o}\right)$$

(1)

where \(W^{h}\in \mathbb{R}^{n_h\times n_x}\) are the weights from the input layer to the hidden layer, \(b^{h}\in \mathbb{R}^{n_h}\) the biases at the hidden layer, \(W^{o}\in \mathbb{R}^{n_y\times n_h}\) the weights from the hidden layer to the output layer, \(b^{o}\in \mathbb{R}^{n_y}\) the biases at the output layer, \(\phi^{h} : \mathbb{R}^{n_h}\to \mathbb{R}^{n_h}\) the activation function from input to hidden layer, and \(\phi^{o} : \mathbb{R}^{n_y}\to \mathbb{R}^{n_y}\) the activation function from hidden to output layer. Both activation functions are applied component-wise on the right-hand side of Eq. (1). During the training of an NN, the parameters are optimized in order to approximate the underlying function characterizing the input–output relationship observed in the training data. The advantage of an NN is that, under certain assumptions, even an NN with one hidden layer is capable of approximating any continuous input–output relationship [29].

For illustration purposes, we present an NN structure with \(n_x=1\) and \(n_y=1\), using the prominent non-linear ReLU activation function from input to hidden layer

$$\phi^{h}\left(z\right)=\max\left(0, z\right) ,$$

and the identity activation function from hidden to output layer, \(\phi^{o}\left(z\right)=z\).

The NN in Eq. (1) can be reformulated and the matrix multiplications can be written as the following summation:

$$NN_{\theta}\left(x\right)=\sum_{i=1}^{n_h}W_{i}^{o}\cdot \max\left(0,W_{i}^{h}\cdot x+b_{i}^{h}\right)+b^{o} ,$$

(2)

where the indices denote the entries of the matrices (e.g., \(W_{i}^{h}\) is the i-th entry of the weight matrix \(W^{h}\)). We observe that an NN is basically a summation of activation functions and that, in this case, the output is a piecewise linear function.
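
To make Eq. (2) concrete, the following minimal sketch (Python with NumPy; all weights and biases are arbitrary illustrative values, not fitted parameters) evaluates a one-hidden-layer NN with five ReLU neurons and an identity output, as depicted in Fig. 1:

```python
import numpy as np

def nn(x, W_h, b_h, W_o, b_o):
    """Evaluate Eq. (2): sum_i W_o[i] * max(0, W_h[i] * x + b_h[i]) + b_o."""
    hidden = np.maximum(0.0, W_h * x + b_h)  # ReLU activation, applied component-wise
    return np.dot(W_o, hidden) + b_o         # identity activation at the output layer

# n_h = 5 hidden neurons; values chosen arbitrarily for illustration
W_h = np.array([-1.0, 0.5, 1.0, -0.5, 2.0])   # weights input -> hidden
b_h = np.array([0.5, 0.0, -1.0, 1.0, -3.0])   # biases at the hidden layer
W_o = np.array([1.0, -2.0, 0.5, 1.5, -1.0])   # weights hidden -> output
b_o = 0.2                                     # bias at the output layer

for x in np.linspace(-2.0, 2.0, 5):
    print(x, nn(x, W_h, b_h, W_o, b_o))       # the output is piecewise linear in x
```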

In Fig. 1A, we present a schematic NN with \(n_h=5\). In Fig. 1B, we show an example of the unit activations in the hidden layer and the resulting NN output.

Fig. 1

In panel A, the structure of an NN with a one-dimensional input, one hidden layer with five neurons, and a one-dimensional output is shown. Arrows denote multiplication with weights, plus signs denote addition of biases, and ReLU and Id indicate the applied activation function from input to hidden layer and from hidden to output layer, respectively. In panel B, the outputs of the neurons in the hidden layer and the final output of the NN in panel A are illustrated

Introduction to neural ODEs

The basic concept of NODEs is based on ODEs expressing the derivative of a variable \(x\) as an explicit function \(f\left(x\right)\) as follows:

$$\frac{d}{dt}x=f\left(x\right) , \quad x\left(0\right)=x^{0} .$$

(3)

In the previous section, NNs were presented as function approximators. In NODEs, NNs are utilized to approximate the right-hand side of an ODE. Hence, the function \(f\left(x\right)\) from Eq. (3) is now substituted with an NN, namely \(NN_{\theta}\left(x\right)\). This results in

$$\frac{d}{dt}x=NN_{\theta}\left(x\right) , \quad x\left(0\right)=x^{0} .$$

(4)

The NODE in Eq. (4) can then be solved with any ODE solver, like the ODE in Eq. (3), and the NN parameters of \(NN_{\theta}\left(x\right)\) are optimized based on training data. Thus, NODEs are a data-driven approach to approximate the dynamics observed in training data, such as PK concentration–time data.
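
As an illustration of this substitution, the sketch below (Python with NumPy/SciPy) integrates the NODE of Eq. (4) with a standard ODE solver; the NN parameters are hand-picked placeholders that merely produce a decaying profile and would, in practice, be optimized against training data:

```python
import numpy as np
from scipy.integrate import solve_ivp

# One-hidden-layer NN used as the right-hand side of the ODE (placeholder parameters)
params = {
    "W_h": np.array([0.5, 1.0, 1.5, 2.0]),        # weights input -> hidden
    "b_h": np.array([0.0, -1.0, -2.0, -4.0]),     # biases at the hidden layer
    "W_o": np.array([-0.2, -0.1, -0.05, -0.02]),  # weights hidden -> output
    "b_o": 0.0,                                   # bias at the output layer
}

def nn_rhs(t, x, p):
    """NN_theta(x): data-driven approximation of the unknown function f(x)."""
    hidden = np.maximum(0.0, p["W_h"] * x[0] + p["b_h"])  # ReLU hidden layer
    return [np.dot(p["W_o"], hidden) + p["b_o"]]

x0 = [10.0]  # initial condition x(0) = x^0
sol = solve_ivp(nn_rhs, (0.0, 24.0), x0, args=(params,), t_eval=np.linspace(0.0, 24.0, 13))
print(sol.y[0])  # NODE solution, e.g., a concentration-time profile
```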

Reduction of multi-dimensional systems to a non-autonomous one-dimensional system

PK models are usually autonomous multi-dimensional ODE systems, meaning the right-hand side of the ODE is time-independent. The reason for this multi-dimensionality is the characterization of the underlying pharmacological, physiological, and biological mechanisms based on first principles and the law of mass action. For example, a two-compartment intravenous (IV) model consists of two equations: one for the central compartment, which is fitted against the measured concentration data, and one for the peripheral compartment. Another example is oral (PO) models with delayed absorption, i.e., including one or multiple transit compartments. An even more complex example is the target-mediated drug disposition (TMDD) IV model [30] with three equations (central compartment, receptor, and drug–receptor complex).

In contrast, the advantage of NODEs is that they are a data-driven approach. Thus, no assumptions about the mechanistic model, e.g., the number of transit or peripheral compartments, should be required. To this end, we reduce multi-dimensional ODE systems to a one-dimensional ODE system. As shown, for example, for linear ODE systems in Appendix A1 and A2, this dimensional reduction of autonomous multi-dimensional systems results in non-autonomous systems, i.e., the function on the right-hand side becomes time-dependent, as indicated in step 1 of Fig. 2.
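
As a brief illustration of this reduction (a minimal example in the spirit of Appendix A1, with hypothetical micro-rate constants \(k_{10}\), \(k_{12}\), and \(k_{21}\)), consider the two-compartment IV bolus model

$$\frac{d}{dt}c_{1}=-\left(k_{10}+k_{12}\right)c_{1}+k_{21}\,c_{2} , \quad \frac{d}{dt}c_{2}=k_{12}\,c_{1}-k_{21}\,c_{2} , \quad c_{1}\left(0\right)=\frac{d}{V},\; c_{2}\left(0\right)=0 .$$

Since the system is linear, \(c_{2}\left(t\right)\) can be written explicitly as a sum of exponentials in \(t\). Substituting this explicit solution into the first equation yields the one-dimensional, non-autonomous ODE

$$\frac{d}{dt}c_{1}=-\left(k_{10}+k_{12}\right)c_{1}+g\left(t\right) , \quad g\left(t\right)=k_{21}\,c_{2}\left(t\right) ,$$

in which the right-hand side separates additively into a concentration-dependent part and a purely time-dependent part; this is exactly the structure that motivates the separation used later in Eq. (6).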

Fig. 2

A schematic overview of the major concepts used to build our NODE for PK. The first step towards our NODE for PK is to reduce autonomous (time-independent right-hand side) multi-dimensional ODE systems to a non-autonomous (time-dependent right-hand side) one-dimensional system, since the general NODE should be applicable without mechanistic assumptions. The second step is to substitute the right-hand side of the ODE with an NN. In the third step, the concentration variable and the time variable are separated into two NNs with one-dimensional input and output. In the fourth step, generally known structures such as drug administration with an absorption process can be included in the NODE. In the fifth step, the NODE can optionally be combined with a known mechanistic part, if prior knowledge is available

Basic NODE structure of a non-autonomous one-dimensional system

As indicated in step 2 of Fig. 2 and based on the reduction of autonomous multi-dimensional systems to a non-autonomous one-dimensional system, our basic NODE structure, e.g., capable of fitting IV PK data without assumptions about the number of peripheral compartments, reads

$$\frac{d}{dt}c_{1}=NN_{\theta}^{c_1,t}\left(c_{1},t\right) , \quad c_{1}\left(0\right)=\frac{d}{V} ,$$

(5)

where we emphasize with \(c_{1}\) the "central compartment", i.e., the state variable that is fitted against the concentration measurements. In addition, \(t\) denotes the explicit time, \(d\) the dose, and \(V\) the volume of distribution. The volume of distribution \(V\) can be considered a special PK parameter since it can appear in the initial condition of the PK model, compare Eq. (5), and basically scales the PK profile. Therefore, the estimated parameters are the NN parameters and \(V\), unless stated otherwise.
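
To illustrate how the NN parameters and \(V\) could be estimated jointly, the following sketch (Python with NumPy/SciPy) minimizes least-squares residuals between the solution of Eq. (5) and observed concentrations; the dose, the observations, and the network size are hypothetical and serve illustration only:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

dose, n_h = 100.0, 4                                       # hypothetical dose and hidden-layer size
t_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0])    # hypothetical sampling times (h)
c_obs = np.array([8.5, 7.9, 6.6, 4.8, 2.6, 1.4, 0.3])      # hypothetical concentrations

def unpack(p):
    W_h = p[:2 * n_h].reshape(n_h, 2)       # weights for the two inputs (c1, t)
    b_h = p[2 * n_h:3 * n_h]                # biases at the hidden layer
    W_o = p[3 * n_h:4 * n_h]                # weights hidden -> output
    b_o, V = p[4 * n_h], p[4 * n_h + 1]     # output bias and volume of distribution
    return W_h, b_h, W_o, b_o, V

def rhs(t, c, p):
    W_h, b_h, W_o, b_o, _ = unpack(p)
    hidden = np.maximum(0.0, W_h @ np.array([c[0], t]) + b_h)   # NN_theta(c1, t) as in Eq. (5)
    return [W_o @ hidden + b_o]

def residuals(p):
    V = unpack(p)[-1]
    sol = solve_ivp(rhs, (0.0, t_obs[-1]), [dose / V], args=(p,), t_eval=t_obs)
    return sol.y[0] - c_obs

rng = np.random.default_rng(0)
p0 = np.concatenate([rng.normal(size=4 * n_h) * 0.1, [0.0, 10.0]])   # NN parameters, b_o, V
lower = np.full(p0.size, -np.inf)
lower[-1] = 1.0                                                      # keep V strictly positive
fit = least_squares(residuals, p0, bounds=(lower, np.inf))
print("estimated V:", unpack(fit.x)[-1])
```

In practice, additional constraints, regularization, and cross-validation would be applied, as noted above.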

Development of our NODE based on PK principles

In this section, several PK principles are utilized to adjust and simplify our NODE structure, tailoring it to typical PK scenarios.

General NODE structure with separated time- and concentration-dependent right-hand side

As presented in Eq. (5), a one-dimensional NODE to fit IV PK data must be non-autonomous. In contrast to the formulation in Eq. (5), however, the concentration-dependent and time-dependent functions are additively separated, following Appendix A1 and A2. Therefore, and with additional motivation in Appendix A4, the inputs concentration and time are separated into two NNs with one-dimensional input and output in our NODE structure. As indicated in step 3 of Fig. 2, this results in our general NODE structure with a concentration-dependent NN, \(NN_{\theta}^{c_1}\left(c_{1}\right)\), and a time-dependent NN, \(NN_{\theta}^{t}\left(t\right)\), according to

$$\frac{d}{dt}c_{1}=NN_{\theta}^{c_1}\left(c_{1}\right)+\frac{d}{V}\cdot NN_{\theta}^{t}\left(t\right) , \quad c_{1}\left(0\right)=\frac{d}{V} .$$

(6)

As observed, e.g., after the completion of distribution processes, and to ensure that the state variable \(c_{1}\) does not increase indefinitely for \(t\to \infty\), the time-dependency in the NODE should vanish. To this end, we restrict the weights from the input to the hidden layer in \(NN_{\theta}^{t}\) to be negative, i.e., \(W_{i}^{h}<0\), such that \(\lim_{t\to \infty}\max\left(0,W_{i}^{h}\cdot t+b_{i}^{h}\right)=0\) and therefore \(\lim_{t\to \infty}NN_{\theta}^{t}\left(t\right)=b^{o}\). This was achieved by applying \(w_{\mathrm{res}}=-e^{w}\), where \(w\) indicates the original weight and \(w_{\mathrm{res}}\) the restricted weight applied in the NN. The term \(\frac{d}{V}\) in Eq. (6) is required to fit different dose levels with the same NODE.
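
A minimal sketch of this weight restriction (Python with NumPy; all numerical values are arbitrary illustrative choices) maps unconstrained weights \(w\) to \(-e^{w}\), so that every ReLU unit of the time-dependent NN eventually switches off and the output converges to the bias \(b^{o}\):

```python
import numpy as np

def nn_time(t, w_raw, b_h, W_o, b_o):
    """Time-dependent NN with restricted input-to-hidden weights W_h = -exp(w_raw) < 0."""
    W_h = -np.exp(w_raw)                      # restriction: weights are always negative
    hidden = np.maximum(0.0, W_h * t + b_h)   # each unit is zero once t > b_h / exp(w_raw)
    return np.dot(W_o, hidden) + b_o

w_raw = np.array([-1.0, 0.0, 0.5])   # unconstrained weights (optimized during training)
b_h = np.array([2.0, 1.0, 4.0])      # hidden-layer biases
W_o = np.array([0.8, -0.5, 0.3])     # hidden-to-output weights
b_o = 0.0                            # output bias: the limit of the NN output for t -> infinity

for t in (0.0, 1.0, 5.0, 50.0):
    print(t, nn_time(t, w_raw, b_h, W_o, b_o))   # approaches b_o as t grows
```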

NODE structure with absorption

In principle, Eq. (6) can also produce non-monotonic behavior, as observed, e.g., for PO administered drugs. However, since the route of administration is known from the clinical setup, we apply the typical absorption structure and add an additional absorption compartment \(c_{0}\) to Eq. (6), as illustrated in step 4 of Fig. 2. Hence, we obtain the two-dimensional NODE

$$\frac{d}{dt}c_{0}=d\cdot NN_{\theta}^{t_0}\left(t\right)-NN_{\theta}^{c_0}\left(c_{0}\right) , \quad c_{0}\left(0\right)=0 ,$$

(7)

$$\frac{d}{dt}c_{1}=NN_{\theta}^{c_0}\left(c_{0}\right)-NN_{\theta}^{c_1}\left(c_{1}\right)-d\cdot NN_{\theta}^{t_1}\left(t\right) , \quad c_{1}\left(0\right)=0 ,$$

(8)

where we set the weights from the input to the hidden layer as \(W_{i}^{h}<0\) in \(NN_{\theta}^{t_0}\) and \(NN_{\theta}^{t_1}\). In the case of an absorption compartment, we omit \(V\) since the scaling can already take place in \(NN_{\theta}^{c_0}\). Motivated by Appendix A2, the time-dependent NN in the absorption compartment, \(NN_{\theta}^{t_0}\left(t\right)\), also allows modeling PO administered drugs with delayed absorption.
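
The following sketch (Python with NumPy/SciPy) shows how Eqs. (7, 8) can be integrated; the four small NNs use random placeholder parameters, so the resulting curve is not a meaningful PK prediction but only an illustration of the structure:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)

def init(n_h=4):
    """Random placeholder parameters for a one-hidden-layer NN with 1-D input/output."""
    return {"W_h": rng.normal(size=n_h) * 0.1, "b_h": rng.normal(size=n_h) * 0.1,
            "W_o": rng.normal(size=n_h) * 0.1, "b_o": 0.0}

def mlp(x, p, time_nn=False):
    W_h = -np.exp(p["W_h"]) if time_nn else p["W_h"]   # restricted (negative) weights for time NNs
    return np.dot(p["W_o"], np.maximum(0.0, W_h * x + p["b_h"])) + p["b_o"]

nn_c0, nn_c1, nn_t0, nn_t1 = init(), init(), init(), init()
dose = 100.0

def rhs(t, y):
    c0, c1 = y
    dc0 = dose * mlp(t, nn_t0, time_nn=True) - mlp(c0, nn_c0)                    # Eq. (7)
    dc1 = mlp(c0, nn_c0) - mlp(c1, nn_c1) - dose * mlp(t, nn_t1, time_nn=True)   # Eq. (8)
    return [dc0, dc1]

sol = solve_ivp(rhs, (0.0, 24.0), [0.0, 0.0], t_eval=np.linspace(0.0, 24.0, 13))
print(sol.y[1])   # c1: the state variable fitted against observed concentrations
```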

NODE structure with infusion

Analogous to the NODE structure with absorption, an IV infusion can be explicitly built into the NODE since the route of administration, the infusion rate \(r_{\mathrm{inf}}\), and the infusion time \(t_{\mathrm{inf}}\) are known from the clinical setup. Thus, Eq. (6) can be modified like a conventional ODE for IV infusion

$$\frac{d}{dt}c_{1}=\frac{r_{\mathrm{inf}}}{V}\cdot 1\left(t\le t_{\mathrm{inf}}\right)+NN_{\theta}^{c_1}\left(c_{1}\right)+\frac{d}{V}\cdot NN_{\theta}^{t}\left(t\right) , \quad c_{1}\left(0\right)=0 .$$

(9)

Note that \(V\) must be estimated since the input to the central compartment is explicitly built in with \(r_{\mathrm{inf}}\) and, in contrast to the absorption in Eq. (8), the scaling cannot be approximated by an NN. The dose \(d\) is calculated as the total amount of drug administered, i.e., \(d=r_{\mathrm{inf}}\cdot t_{\mathrm{inf}}\).
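
A corresponding sketch for Eq. (9) (Python with NumPy/SciPy; infusion settings, volume, and NN parameters are illustrative assumptions) builds in the known zero-order infusion input explicitly via the indicator \(1\left(t\le t_{\mathrm{inf}}\right)\):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
p_c = {"W_h": rng.normal(size=4) * 0.1, "b_h": rng.normal(size=4) * 0.1,
       "W_o": -np.abs(rng.normal(size=4)) * 0.1, "b_o": 0.0}   # concentration-dependent NN
p_t = {"W_h": rng.normal(size=4), "b_h": rng.normal(size=4) * 0.1,
       "W_o": rng.normal(size=4) * 0.1, "b_o": 0.0}            # time-dependent NN (restricted below)

def mlp(x, p, restrict=False):
    W_h = -np.exp(p["W_h"]) if restrict else p["W_h"]          # negative weights for the time NN
    return np.dot(p["W_o"], np.maximum(0.0, W_h * x + p["b_h"])) + p["b_o"]

r_inf, t_inf, V = 50.0, 2.0, 10.0    # infusion rate and time (known), volume (estimated in practice)
dose = r_inf * t_inf                 # total administered amount, d = r_inf * t_inf

def rhs(t, c):
    infusion = r_inf / V if t <= t_inf else 0.0                                    # indicator term
    return [infusion + mlp(c[0], p_c) + dose / V * mlp(t, p_t, restrict=True)]     # Eq. (9)

sol = solve_ivp(rhs, (0.0, 24.0), [0.0], t_eval=np.linspace(0.0, 24.0, 13), max_step=0.5)
print(sol.y[0])
```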

Combining partially known mechanisms with neural networks

NODEs can be combined with partially known mechanisms, as illustrated in step 5 of Fig. 2. Combining known mechanistic parts with NNs that are supposed to learn the unknown parts, a discipline called scientific machine learning [24, 25], is a rapidly evolving field. Here, we only scratch the surface of this discipline and present two potential PK examples for illustration purposes.

NODE structure with mechanistic elimination

Even though an NODE with Eqs. (7, 8) can fit various oral PK models, prior knowledge of the drug of interest can be leveraged. Assuming a one-compartment model, e.g., from a previous IV study, with a known elimination mechanism and an unknown absorption mechanism, Eqs. (7, 8) can be modified to

$$\frac{d}{dt}c_{0}=d\cdot NN_{\theta}^{t_0}\left(t\right)-NN_{\theta}^{c_0}\left(c_{0}\right) , \quad c_{0}\left(0\right)=0 ,$$

(10)

$$\frac{d}{dt}c_{1}=NN_{\theta}^{c_0}\left(c_{0}\right)-k_{e}\cdot c_{1} , \quad c_{1}\left(0\right)=0 .$$

(11)

Note that this would also be applicable if linear elimination with an unknown distribution mechanism is assumed, i.e., replacing the concentration-dependent NN \(NN_{\theta}^{c_1}\left(c_{1}\right)\) in Eq. (6) by \(-k_{e}\cdot c_{1}\).
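
The sketch below (Python with NumPy/SciPy) illustrates this combination for Eqs. (10, 11): the absorption-related NNs use random placeholder parameters, whereas elimination is fixed to the assumed known first-order rate constant \(k_{e}\) (a hypothetical value here):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(3)
p_t0 = {"W_h": rng.normal(size=4), "b_h": rng.normal(size=4) * 0.1,
        "W_o": np.abs(rng.normal(size=4)) * 0.1, "b_o": 0.0}   # time-dependent NN driving absorption
p_c0 = {"W_h": np.abs(rng.normal(size=4)), "b_h": np.zeros(4),
        "W_o": np.abs(rng.normal(size=4)) * 0.1, "b_o": 0.0}   # NN for the transfer out of c0

def mlp(x, p, restrict=False):
    W_h = -np.exp(p["W_h"]) if restrict else p["W_h"]
    return np.dot(p["W_o"], np.maximum(0.0, W_h * x + p["b_h"])) + p["b_o"]

dose, k_e = 100.0, 0.15    # k_e assumed known, e.g., from a previous IV study (hypothetical value)

def rhs(t, y):
    c0, c1 = y
    dc0 = dose * mlp(t, p_t0, restrict=True) - mlp(c0, p_c0)   # Eq. (10): learned absorption input
    dc1 = mlp(c0, p_c0) - k_e * c1                             # Eq. (11): known linear elimination
    return [dc0, dc1]

sol = solve_ivp(rhs, (0.0, 24.0), [0.0, 0.0], t_eval=np.linspace(0.0, 24.0, 13))
print(sol.y[1])   # central-compartment profile under the hybrid (scientific ML) model
```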

NODE structure with mechanistic absorption

Analogous to Eqs. (10, 11), a known linear absorption mechanism with unknown distribution and elimination mechanisms can be assumed by modifying Eqs. (7, 8) to

$$\frac{d}{dt}c_{0}=-k_{a}\cdot c_{0} , \quad c_{0}\left(0\right)=\frac{d}{V} ,$$

(12)

$$\frac{d}{dt}c_{1}=k_{a}\cdot c_{0}-NN_{\theta}^{c_1}\left(c_{1}\right)-\frac{d}{V}\cdot NN_{\theta}^{t_1}\left(t\right) , \quad c_{1}\left(0\right)=0 .$$

(13)
