Oral drug suitability parameters

Assessing the oral drug suitability of compounds as early as possible is an important objective in drug discovery. Two new oral drug suitability parameters are proposed to facilitate the ranking of compounds with respect to dose and in vivo exposure.


Introduction
When assessing the oral drug suitability of compounds, it is common to consider physicochemical parameters such as hydrophobicity and molecular size. 1,2 Over the past two decades, virtual drug design has been strongly influenced by hypotheses derived from the comparison of large sets of physicochemical property data associated with oral drugs, 3-9 in particular hypotheses that indicate limitations on a compound's predicted log P. However, such propositions tend to be ambiguous with regard to causality, as they rely heavily on arguments based on simple data trend observations. In practice, a compound's success as an oral drug in humans is predominantly defined by in vivo exposure, efficacy and toxicity factors; the evidence directly implicating a dependence of these in vivo factors on predicted log P is weak and influenced by the type of compounds being considered. 8,10 The octanol-water partitioning system from which a log P is derived, and on which a predicted log P is based, is intended as a model system for compound transfer between aqueous and lipid phases and is arguably too simplistic for a whole mammalian system. This study addresses this point by proposing alternative oral drug suitability parameters for consideration during virtual drug design.
The oral drug suitability parameters are derived from rate equations for determining in vivo quantity levels associated to a relevant mathematical model of a mammalian system. The focus of this study is a mathematical model for a human system in the form of a three-compartment pharmacokinetic (PK) model, which imposes a time limitation on the absorption of an orally dosed drug from the small intestines, assumes non-instantaneous drug distribution between plasma and tissue, and only allows for elimination of the absorbed drug from one compartment. 11,12 Application of the proposed PK model in a repeat-dosing simulation can approximate steady-state in vivo data for a quantity (e.g., maximum plasma concentration) as a function of the oral dose, resulting in a dose-quantity curve (Fig. 1).
Each dose-quantity curve has two compound-specific features, which are treated as a pair of oral drug suitability parameters: the first is the dose-quantity intercept (DQI) for the regression line fitted through the curve's linear PK region; the second is the highest dose that still leads to linear PK (or near-linear PK) for the particular quantity (highest linear dose [HLD]). This study describes the methodology for calculating a compound's DQI and HLD, and demonstrates how to interpret the results to assess a compound's oral drug suitability. Importantly, the PK model is not intended to be physiologically accurate, rather it is a minimum model system that permits consideration of key factors related to a drug's in vivo exposure. Hence, the DQI and HLD contain information on the absorption, distribution and elimination properties of a compound based on this model system, which can facilitate the ranking of compounds. However, the calculated DQI and HLD values may differ from an in vivo measurement.
This study considers the application of the aforementioned PK model to 15 known oral drugs. The size of the set was restricted to 15 because of the limited number of compounds (within the literature and the European Bioinformatics Institute's ChEMBL database 13 ) with experimentally derived values for pK a , aqueous solubility at pH 7.4 (and room temperature, solubility pH7.4 ), apparent human Caco2 membrane permeability (A to B) at pH 6.5 (P app,Caco2,pH6.5 ), human volume of distribution at steady state (V ss ), human in vivo plasma clearance (Cl), human in vivo efficacy and, optionally, human plasma protein binding (PPB). However, it is envisaged that the DQI and HLD should be used at the virtual design stage and that such compound information would be generated using in silico QSAR models. Such methods have associated prediction errors, which can be large, so the impact of these errors on the DQI and HLD values is discussed.

PK model
The PK model combines a two-compartment model with an additional compartment to model absorption of extravascular doses of a compound from the small intestines. 11,12 This model can be described by a series of linear differential equations that can be solved using the Laplace transformation technique to derive a series of rate equations. 30 Fig. 2 shows a compartmental representation of the model, and details of the mathematical equations can be found in the ESI. † The application of the rate equations, along with the input data required to calculate compound compartment levels at different time points, constituted a simulation. Simulations were run at a series of doses and included multiple distribution kinetics scenarios. An open-source Python library was written to perform and analyse these simulations. 31 To establish steady-state conditions, simulations considered repeat dosing. Using the principle of superposition, compound compartment levels were calculated for each repeated dose from the time of that dose to the end of the simulation, and the levels from all doses were then summed at each time point.
Compartment A represented a hypothetical uniform cylindrical intestinal segment, filled with a constant intestinal fluid volume (V intestinal , L) of 0.08 L, 32 into which the compound (in solid form) was delivered as a bolus; for this study, the intestinal fluid was represented by aqueous pH 6.5 buffer. The uniform mixing and dissolution process within compartment A was treated as instantaneous and only solubilised compound could undergo the one-way transfer into compartment B; this transfer was treated as either a zero-order or a first-order process, characterised by the absorption rate constant k 1 (min −1 ), similar to the approach used by Dressman et al. 12 The model restricted this process to an absorption window, during which compound could only be removed from compartment A by said absorption; any remaining compound at the end of the absorption window was discarded, and the model reverted to a classic two-compartment PK model. 33 For clarity, the absorption window represented the transit of this uniform cylindrical slug of intestinal fluid (containing the dosed drug) through the small intestines, during which its shape and volume remained unchanged.
Compartment B (the central compartment) represented the body spaces into which a compound could distribute extremely rapidly (i.e., plasma and well-perfused tissues, including the major eliminating organs). The volume of compartment B was compound specific and represented the compound's initial dilution volume (V central , L). Compartment C (the peripheral compartment) represented the body spaces into which a compound distributed more slowly. Compound transfer from compartment B to compartment C was treated as a first-order process, characterised by the rate constant k 2 (min −1 ), and the reverse process was characterised by the rate constant k 3 (min −1 ). It was assumed that, once absorbed, compound could only be removed permanently from the body from compartment B via a first-order process, characterised by the rate constant k 4 (min −1 ).
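The compartmental scheme described above can be illustrated with a minimal numerical sketch for a single dose. This is not the analytical (Laplace-derived) rate-equation implementation used in this study; it is a forward-Euler approximation, and the function name, argument names and step size are assumptions made for illustration. Time zero is taken as the start of the absorption window, and first-pass loss is not included.

```python
def simulate_single_dose(dose_mg, sat_mg, k1, k2, k3, k4,
                         window_min=240.0, t_end_min=720.0, dt=0.01):
    """Forward-Euler sketch of compartments A, B and C for one dose.

    Returns amounts in mg as (a, b, c, eliminated, discarded) at t_end_min.
    sat_mg is the amount that saturates the intestinal fluid volume.
    """
    a, b, c = dose_mg, 0.0, 0.0
    eliminated = discarded = 0.0
    window_steps = int(round(window_min / dt))
    for step in range(int(round(t_end_min / dt))):
        if step == window_steps:
            # End of the absorption window: remaining compound is discarded.
            discarded, a = a, 0.0
        # Only dissolved compound (at most sat_mg) is available for transfer,
        # giving zero-order transfer when saturated and first-order otherwise.
        absorbed = k1 * min(a, sat_mg) * dt
        b_to_c = k2 * b * dt
        c_to_b = k3 * c * dt
        out = k4 * b * dt
        a -= absorbed
        b += absorbed + c_to_b - b_to_c - out
        c += b_to_c - c_to_b
        eliminated += out
    return a, b, c, eliminated, discarded
```

Because every transfer is subtracted from one compartment and added to another, the sketch conserves mass, which provides a simple check on the bookkeeping.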

Simulation time frame
Each simulation spanned 168 h and comprised 14 repeat doses of similar size for each compound. Each simulation time frame consisted of a start time (t 0 , min) and an end time (t end , min). With respect to the ith dose, t i (min) was the time that this dose was given, and the first dose (t 1 ) was given at t 0 . Each subsequent dose (t 2 , t 3 , etc.) was given 12 h after the previous dose, reflecting the dosing interval. Mathematically, each dose was considered in isolation, where the amounts of compound in the PK model's three compartments were zero at t i . A bolus of the ith dose appeared in compartment A at time t i,A , where the time difference between t i and t i,A represented an absorption delay of 1 h to reflect the gastric lag phase. 34 The absorption window started at t i,A and ended at time t i,B , and equated to 4 h. 35,36 Compound in compartment B was transferred to compartment C or removed from further consideration during the time between t i,A and t end . This time period also reflected the time during which compound in compartment C could be transferred to compartment B.
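The time frame above can be written out as a short sketch. The constant and function names are assumptions for illustration; `superpose` shows the principle of superposition for any single-dose level function expressed in terms of the time since dosing.

```python
DOSE_INTERVAL_MIN = 12 * 60      # dosing interval
GASTRIC_LAG_MIN = 60             # delay between t_i and t_i,A
ABSORPTION_WINDOW_MIN = 4 * 60   # t_i,A to t_i,B
N_DOSES = 14
T_END_MIN = 168 * 60


def dose_schedule(n_doses=N_DOSES, t0=0.0):
    """Return a list of (t_i, t_i_A, t_i_B) tuples in minutes."""
    schedule = []
    for i in range(n_doses):
        t_i = t0 + i * DOSE_INTERVAL_MIN
        t_i_a = t_i + GASTRIC_LAG_MIN
        t_i_b = t_i_a + ABSORPTION_WINDOW_MIN
        schedule.append((t_i, t_i_a, t_i_b))
    return schedule


def superpose(single_dose_level, t, schedule):
    """Sum the single-dose level over every dose already given at time t."""
    return sum(single_dose_level(t - t_i) for t_i, _, _ in schedule if t >= t_i)
```

With these settings the last dose is given at 156 h, so the final 12 h of the simulation correspond to one complete dosing interval.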

Saturation state of compartment A
For scenarios where V intestinal was saturated with compound at t i,A , compound was transferred from compartment A to compartment B at a constant rate, given by k 1 multiplied by the saturated amount of compound in the intestinal fluid (mg). The latter was derived from the compound's aqueous solubility at pH 6.5 (solubility pH6.5 , M) multiplied by its molecular weight and V intestinal . This rate of transfer was considered up to time t i,B or t i,A,unsaturated , when V intestinal became unsaturated. At t i,A,unsaturated the amount of compound remaining in compartment A reflected its solubility pH6.5 , from which point the rate of transfer followed first-order kinetics until time t i,B .
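The saturation logic can be sketched as a closed-form calculation of the amount transferred out of compartment A during one absorption window. The function names are hypothetical, and the factor of 1000 converting grams to milligrams is an assumption about units rather than a detail stated in the text.

```python
import math


def saturated_amount_mg(solubility_molar, mol_weight, v_intestinal_l=0.08):
    """Amount (mg) that saturates the intestinal fluid: mol/L * g/mol * L * 1000."""
    return solubility_molar * mol_weight * v_intestinal_l * 1000.0


def amount_absorbed_mg(dose_mg, sat_mg, k1, window_min=240.0):
    """Amount leaving compartment A between t_i,A and t_i,B."""
    if dose_mg <= sat_mg:
        # Never saturated: first-order transfer throughout the window.
        return dose_mg * (1.0 - math.exp(-k1 * window_min))
    # Zero-order transfer at rate k1 * sat_mg until only sat_mg remains.
    t_unsat = (dose_mg - sat_mg) / (k1 * sat_mg)
    if t_unsat >= window_min:
        # Saturated for the whole window: the absorbed amount plateaus.
        return k1 * sat_mg * window_min
    # Zero-order phase followed by a first-order phase to the end of the window.
    first_order = sat_mg * (1.0 - math.exp(-k1 * (window_min - t_unsat)))
    return (dose_mg - sat_mg) + first_order
```

The plateau value, k 1 multiplied by the saturated amount and the window length, is independent of dose, which is the origin of the horizontal region of the dose-quantity curve.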
The solubility pH6.5 was calculated by multiplying a compound's solubility pH7.4 by the ratio of its fraction neutral at pH 7.4 to its fraction neutral at pH 6.5; the fraction neutral values were calculated using the method described by Wenlock. 37

Simulation time points
Each simulation was defined by approximately 1000 time points. This included time points at 15 min intervals between t 0 and t end , along with 336 randomly selected time points from the same time frame. The following specific time points were also considered: t 0 , t i , t i,A , t i,B , t end and, where applicable, t i,A,unsaturated .

Estimating k 1
Estimations were based on a compound's predicted human jejunal effective permeability at pH 6.5 (P eff,human,pH6.5 , cm s −1 ) from a uniform cylindrical intestinal segment, filled with intestinal fluid equal to V intestinal , using eqn (1), where SA is the surface area of the cylindrical intestinal segment. 38 For a uniform cylinder, V intestinal is equal to πr²l and SA equals 2πrl, where r is the radius equal to 1.25 cm and l is the length. 35,39 To account for intestinal folds, the value of SA was further multiplied by an absorption amplification factor of 2. 39,40 P eff,human,pH6.5 was estimated based on a compound's P app,Caco2,pH6.5 using a similar approach to that previously described. 37,41 Critically, this approach focuses on the establishment of a (log 10 -log 10 ) linear regression equation for a set of compounds between their neutral species human membrane permeability at pH 6.5 (P m,human,neutral,pH6.5 , cm s −1 ) and their neutral species human Caco2 membrane permeability (A to B) at pH 6.5 (P m,Caco2,neutral,pH6.5 , cm s −1 ). For this study, a larger set of 32 compounds was used to establish the following regression equation:
$$\log_{10}(P_{\text{m,human,neutral,pH6.5}}) = 0.916 \cdot \log_{10}(P_{\text{m,Caco2,neutral,pH6.5}}) + 1.579 \qquad (2)$$
where r² = 0.88.
The data for this regression equation were predominantly sourced from Avdeef and Tam, 19 and supplemented with data from Sjögren et al. 41 The charge type at pH 6.5 of the compounds considered was either monoacidic, monobasic, neutral or zwitterionic. Further details can be found within the ESI. †
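As an illustration of the geometry described for the k 1 estimate, the sketch below assumes the commonly used relationship k 1 = P eff × SA / V for a cylindrical segment, with a factor of 60 to convert s −1 to min −1 . Both the relationship and the unit conversion are assumptions here; the exact form of eqn (1) should be taken from the cited source. The regression helper simply restates eqn (2); the further steps that convert neutral-species permeability to P eff,human,pH6.5 are not shown.

```python
import math


def k1_from_peff(peff_cm_s, v_intestinal_l=0.08, radius_cm=1.25, amplification=2.0):
    """Assumed form: k1 (1/min) = 60 * P_eff * SA / V for a uniform cylinder."""
    volume_cm3 = v_intestinal_l * 1000.0
    length_cm = volume_cm3 / (math.pi * radius_cm ** 2)   # from V = pi r^2 l
    surface_cm2 = 2.0 * math.pi * radius_cm * length_cm   # SA = 2 pi r l
    surface_cm2 *= amplification                          # intestinal folds
    return 60.0 * peff_cm_s * surface_cm2 / volume_cm3


def log10_pm_human_neutral(log10_pm_caco2_neutral):
    """Regression of eqn (2)."""
    return 0.916 * log10_pm_caco2_neutral + 1.579
```

Under these assumptions SA / V reduces to 2 × 2 / r = 3.2 cm −1 , so k 1 depends only on the radius and the amplification factor, not on the fluid volume itself.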

Fraction escaping first-pass metabolism (F h )
The model assumes that drug absorption is exclusively through the small intestines, with no gut wall metabolism, and that F h can be estimated using eqn (3), 42 where Q h (mL min −1 ) is the rate of liver blood flow and Cl b is the human in vivo blood clearance (mL min −1 kg −1 ) for a compound. This study used a value of 1450 mL min −1 for Q h in humans and a body weight of 70 kg. The model assumes that the ratio of blood concentration to plasma concentration equals 1 for all of the compounds considered. Hence, Cl is assumed to equal Cl b .
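A minimal sketch of the first-pass estimate follows, assuming the standard relationship F h = 1 − Cl b / Q h with the weight-normalised clearance scaled by body weight. This form is an assumption made for illustration and is not reproduced from the cited source; the function name is hypothetical.

```python
def fraction_escaping_first_pass(cl_b_ml_min_kg, q_h_ml_min=1450.0, body_weight_kg=70.0):
    """Assumed form: F_h = 1 - (Cl_b * body weight) / Q_h."""
    cl_b_ml_min = cl_b_ml_min_kg * body_weight_kg   # mL/min for the whole body
    return 1.0 - cl_b_ml_min / q_h_ml_min
```

For example, a clearance of 5 mL min −1 kg −1 corresponds to 350 mL min −1 for a 70 kg human and, under this assumption, an F h of approximately 0.76.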

Distribution kinetics scenarios and estimates for k 2 , k 3 and k 4
Three volume of distribution terms are relevant for each compound considered: V central , V ss and the terminal volume of distribution (V terminal , L). A compound's V ss value is one of the model's required input data values (see Table 1). With respect to V central and V terminal , each simulation considers five distribution scenarios: (i) V central equals 3.0 L, and the ratio of V terminal to V ss equals 1.1; (ii) V central equals 50% of the V ss , and the ratio of V terminal to V ss equals 1.1; (iii) V central equals 3.0 L, and the ratio of V terminal to V ss equals 2.0; (iv) V central equals 50% of the V ss , and the ratio of V terminal to V ss equals 2.0; and (v) V central equals the mid-point between the two previous values of V central , and V terminal equals the mid-point between the two previous values of V terminal . The first four scenarios represent possible extreme values and the fifth represents a mid-point value for a compound. For each scenario, values for k 2 , k 3 and k 4 were derived using the values for V central , V ss , V terminal and Cl, in conjunction with rearrangement of standard mathematical equations associated with a two-compartment model. 33 Details of these equations can be found in the ESI. † Five simulations were performed for each compound and each dose, based on the five scenarios.
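The derivation of the rate constants can be sketched from the textbook two-compartment relationships k 4 = Cl / V central , V ss = V central (1 + k 2 / k 3 ) and V terminal = Cl / β, where β is the terminal rate constant. The rearrangement below is one way of solving these relationships and is not necessarily the form given in the ESI; function names and the choice of L and L min −1 as units are assumptions.

```python
import math


def micro_constants(v_central, v_ss, v_terminal, cl):
    """Return (k2, k3, k4) in 1/min from volumes (L) and clearance (L/min).

    Requires v_central < v_ss < v_terminal.
    """
    k4 = cl / v_central
    k3 = cl * (v_terminal - v_central) / (v_terminal * (v_terminal - v_ss))
    k2 = k3 * (v_ss / v_central - 1.0)
    return k2, k3, k4


def terminal_rate(k2, k3, k4):
    """Terminal rate constant (beta) implied by the three rate constants."""
    total = k2 + k3 + k4
    return (total - math.sqrt(total * total - 4.0 * k3 * k4)) / 2.0


def distribution_scenarios(v_ss):
    """Five (V_central, V_terminal) pairs, in the order (i) to (v)."""
    low_c, high_c = 3.0, 0.5 * v_ss
    low_t, high_t = 1.1 * v_ss, 2.0 * v_ss
    return [
        (low_c, low_t), (high_c, low_t), (low_c, high_t), (high_c, high_t),
        ((low_c + high_c) / 2.0, (low_t + high_t) / 2.0),
    ]
```

A useful consistency check is that the β recovered from the three rate constants reproduces Cl / V terminal , and that V central (1 + k 2 / k 3 ) reproduces V ss .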

Quantities calculated
A simulation resulted in estimates of quantities of drug in compartments B and C at different time points. For each dose, the compound levels at steady state in compartment B were calculated, including the maximum concentration (C ss,central,max , mg L −1 ) and area under the curve (AUC ss,central , mg min L −1 ). Steady-state conditions were assumed to have been reached by the time of the last repeat dose in a simulation. The AUC ss,central was calculated using the composite Simpson's rule, integrating over the time course of the last repeat dose. Free levels were calculated by multiplying the total C ss,central,max or AUC ss,central by the fraction unbound (i.e., (100 − % bound)/100).
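The AUC and free-level calculations can be sketched as follows. This version of the composite Simpson's rule assumes equally spaced time points with an even number of intervals, which is a simplification: the simulations described above also include irregularly spaced time points. Function names are hypothetical.

```python
def composite_simpson(times, values):
    """Composite Simpson's rule on an even number of equal-width intervals."""
    n = len(times) - 1
    if n < 2 or n % 2:
        raise ValueError("need an even number of intervals (at least 2)")
    h = (times[-1] - times[0]) / n
    total = values[0] + values[-1]
    total += 4.0 * sum(values[1:-1:2])   # odd-indexed interior points
    total += 2.0 * sum(values[2:-1:2])   # even-indexed interior points
    return total * h / 3.0


def free_level(total_level, percent_bound):
    """Free level = total level * (100 - % bound) / 100."""
    return total_level * (100.0 - percent_bound) / 100.0
```

Simpson's rule is exact for polynomials up to third order, so integrating t² from 0 to 6 on unit spacing returns 72.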
The quantities calculated for each simulation were collated and used to create a (log 10 -log 10 ) dose-quantity curve for each distribution kinetics scenario.

Estimating DQI and HLD
For a set of dose-quantity data, the DQI equals the intercept for a linear regression equation through the region of the log 10 (dose) versus the log 10 (quantity) curve, where the slope was approximately equal to 1 (within ±0.0001). The HLD was the log 10 (dose) calculated using said linear regression equation and the maximum value of the log 10 (quantity) data (i.e., log 10 (max_quantity)). Typically, this maximum value reflected the level of the horizontal plateau (Fig. 1). In the few cases where no plateau was observed, the maximum value was that of the highest dose considered.
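A minimal sketch of this estimation is given below. It identifies the linear region from the local slope between consecutive points and fixes the regression slope at exactly 1, so that the least-squares intercept is the mean of log 10 (quantity) − log 10 (dose); both choices are assumptions for illustration, and the function name is hypothetical. Doses are assumed to be sorted in ascending order.

```python
import math


def estimate_dqi_hld(doses, quantities, slope_tol=1e-4):
    """Return (DQI, HLD) from ascending doses and their simulated quantities."""
    log_d = [math.log10(d) for d in doses]
    log_q = [math.log10(q) for q in quantities]
    linear = set()
    for i in range(len(log_d) - 1):
        slope = (log_q[i + 1] - log_q[i]) / (log_d[i + 1] - log_d[i])
        if abs(slope - 1.0) <= slope_tol:
            linear.update((i, i + 1))
    if not linear:
        raise ValueError("no linear region found")
    # With the slope fixed at 1, the fitted intercept is the mean offset.
    dqi = sum(log_q[i] - log_d[i] for i in linear) / len(linear)
    # HLD: dose at which the extrapolated line reaches the maximum quantity.
    hld = max(log_q) - dqi
    return dqi, hld
```

For a synthetic curve in which the quantity equals 0.01 × dose up to a plateau of 5, the sketch returns a DQI of −2 and an HLD of log 10 (5) + 2.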

Assessing the impact of errors on DQI and HLD
Variations in DQI and HLD values associated with total level quantities, based on errors in V ss and Cl values for each compound, were also considered. An additional 50 scenarios were calculated for each compound, where the original values for these two parameters were replaced by values randomly selected from a Gaussian distribution. Each distribution was defined on a log 10 scale, using the log 10 (original value) as the mean and a standard deviation of 0.3.
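The error scenarios can be sketched as below. The sketch assumes that V ss and Cl are perturbed independently and that the log 10 of the original value is the mean of each distribution; neither detail is stated explicitly, and the function name is hypothetical.

```python
import math
import random


def perturbed_inputs(v_ss, cl, n=50, sd_log10=0.3, seed=None):
    """Return the original (V_ss, Cl) pair followed by n perturbed pairs."""
    rng = random.Random(seed)
    scenarios = [(v_ss, cl)]
    for _ in range(n):
        new_v_ss = 10.0 ** rng.gauss(math.log10(v_ss), sd_log10)
        new_cl = 10.0 ** rng.gauss(math.log10(cl), sd_log10)
        scenarios.append((new_v_ss, new_cl))
    return scenarios
```

Sampling on the log 10 scale keeps every perturbed value positive; a standard deviation of 0.3 corresponds to roughly a two-fold error in the untransformed value.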

Presentation of DQI and HLD data
Each set of DQI and HLD data was treated as a Cartesian coordinate pair; due to the five distribution kinetics scenarios, each compound had five pairs. Where errors were considered, each compound had 255 coordinate pairs (i.e., five distribution kinetics scenarios for each of the 51 input data scenarios). Non-self-intersecting closed polygons, whose vertices were described using these Cartesian coordinates in the DQI-HLD plane, were used to represent each compound. The centroid and area of each polygon were calculated using standard methods. Again using standard methods, the second moments of area of each polygon were calculated to reflect the distribution of vertices in the x-axis (i.e., HLD dimension) and y-axis (i.e., DQI dimension).
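The "standard methods" for the area and centroid can be illustrated with the shoelace formula. The sketch assumes the vertices are already ordered around the boundary of a non-self-intersecting polygon; how the coordinate pairs are ordered into such a polygon is not covered here, and the second moments of area are omitted. The function name is hypothetical.

```python
def polygon_area_centroid(vertices):
    """Return (area, (cx, cy)) for ordered (x, y) vertices, e.g. (HLD, DQI)."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0.0:
        raise ValueError("degenerate polygon with zero area")
    # The signed area keeps the centroid correct for either vertex direction.
    return abs(twice_area) / 2.0, (cx / (3.0 * twice_area), cy / (3.0 * twice_area))
```

For a square with corners at (0, 0), (2, 0), (2, 2) and (0, 2), this gives an area of 4 and a centroid at (1, 1).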

DQI and HLD parameters
The proposed oral drug suitability parameters are intended to provide insight into the in vivo quantities of a compound based on a relevant PK model of a mammalian system. For the purpose of this study, steady-state conditions are of interest, and 14 repeat doses within a simulation are considered sufficient to establish them. Importantly, the DQI and HLD parameters are composite terms that simplify the understanding of how a compound with a given pK a , solubility pH7.4 , P app,Caco2,pH6.5 , V ss , Cl and (in the case of free levels) PPB profile behaves within the PK model, facilitating dose predictions. Fig. 1 highlights a typical dose-quantity curve resulting from the application of the PK model. It is characterised by a linear region with a slope of 1 for doses below a certain limit. As the dose increases above this limit, compartment A becomes increasingly saturated for the duration of the absorption window. This results in the dose-quantity curve bending as the dose increases, eventually plateauing. For the linear region, the corresponding regression equation takes the form:
$$\log_{10}(\text{quantity}) = 1 \cdot \log_{10}(\text{dose}) + \text{DQI} \qquad (4)$$
Rearrangement of eqn (4) leads to eqn (5) or (6), which give the quantity for a known dose and the dose for a known quantity, respectively:
$$\text{quantity} = \text{dose} \times 10^{\text{DQI}} \qquad (5)$$
$$\text{dose} = \text{quantity} \times 10^{-\text{DQI}} \qquad (6)$$
However, these equations only apply up to a log 10 (dose) equal to the HLD; this equates to the extrapolated log 10 (dose), using eqn (4), for the log 10 (max_quantity) observed for a particular distribution kinetics scenario (Fig. 1). The HLD value arrived at using this method is not strictly the HLD, as the dose-quantity curve has already begun to deviate from linearity at this log 10 (dose) (Fig. 1); rather, it is an approximation using a standardised approach.
Importantly, the DQI and HLD provide a simple way to understand a compound's dose-quantity curve. A plot of DQI against HLD provides an alternative scale against which to evaluate different compounds.

PK model
The PK model used in this study is considered the minimum necessary to account for time-limited oral absorption, non-instantaneous drug distribution between plasma and tissue, and elimination from the central compartment (B). The PK model is intended to be a generic model for use at the virtual drug design stage, and the model settings described in the Experimental section are intended to have broad applicability. Importantly, the PK model and the associated constraints may not be optimal for all compounds. The approach used for modelling oral absorption is based on that used by Dressman et al. 12 It is simplistic in nature but deemed sufficient for use at the virtual drug design stage. More complex models are available and Sjögren et al. can provide further insight. 41 For clarity, the absorption window is necessary to account for non-linear PK resulting from compound absorption limitations, and the calculation of a compound's HLD is dependent on this feature.
Other PK models based on this linear differential equation approach are possible; rate equations for some alternative PK models can be found in the ESI. † The simplest of these is a one-compartment intravenous (bolus dosing) model that only considers a first-order rate constant for elimination. 43 This model can be extended to include an oral absorption step, which is the approach used by Wenlock and Page. 37,42 Such a model is applicable for compounds where oral absorption cannot be assumed to be instantaneous; it can also be refined to account for time-limited oral absorption, and details of this can be found in the ESI. † A limitation of one-compartment models is the assumption of instantaneous drug distribution between plasma and tissue; this is overcome in the present PK model by use of a second compartment that permits distribution kinetics to be modelled. For reference, details of a two-compartment intravenous model can be found in the ESI. †

Distribution kinetics scenarios
Treating distribution of absorbed drug as not being instantaneous can lead to significant variations in calculated in vivo quantities. 33 Specifically, the model assumes non-instantaneous distribution between compartments B and C, although distribution within each compartment is instantaneous.
It is intended that DQI and HLD be calculated at the virtual drug design stage using in silico technologies 37,44 to provide the input data, but, for simplicity, this study only considers experimentally derived input data. With respect to a compound's distribution kinetics, it is unlikely that there will be any insight into these at the virtual design stage. To account for this, the model considers five hypothetical distribution kinetics scenarios for each compound, intended to cover a range of possibilities that encompass the true situation. These scenarios depend on a range of hypothetical values for a compound's V central and V terminal . V central ranges from 3.0 L, which is the approximate value for the plasma volume in humans, to a value that is 50% of the compound's V ss ; a mid-point value is also considered. V terminal ranges from 1.1 times the V ss to 2.0 times the V ss ; a mid-point value is also considered. For comparison, Rowland and Tozer 33 provide human details for aspirin, salicylic acid and gentamicin C: their V central values can be calculated as 5.3, 6.8 and 14.0 L, respectively, and their V terminal values as 1.1-, 1.3- and 5.1-fold greater than V ss , respectively. The V terminal of gentamicin C is considered an extreme, and a value 2.0-fold greater than V ss is deemed appropriate as the upper limit. To illustrate the effect of these different distribution kinetics scenarios, Fig. 3 shows the dose versus C ss,central,max curves generated by the PK model for two of these scenarios for diazepam (further details can be found in the ESI †).
It can be seen from Fig. 3 that differing distribution kinetics scenarios can lead to significant variation in values for DQI (i.e., 2.1-fold) and HLD (i.e., 1.3-fold).

Visualising DQI and HLD data
To properly reflect the lack of knowledge regarding a compound's distribution kinetics at the virtual design stage it is important to consider the DQI and HLD from each of the five distribution kinetics scenarios. A way of visualising this information is to treat each pair of DQI and HLD values as a Cartesian coordinate within the DQI-HLD plane, to represent a vertex of a non-self-intersecting closed polygon. Fig. 4 shows such a plot for the C ss,central,max total levels for the 15 compounds considered, and a similar plot for the C ss,central,max free levels (further details can be found in the ESI †).
The polygons shown in Fig. 4 highlight the DQI-HLD space occupied by a compound. The extent of the area overlap of the polygons for two different compounds reflects their similarity. Importantly, at the virtual design stage it is not possible to define precise values for a compound's DQI and HLD. Instead, it is possible to determine an area of DQI-HLD space, defined by a non-self-intersecting closed polygon, where the true value lies. Provided that the two polygons do not overlap, the compounds can be argued to have distinct DQI and HLD values.
This method can be expanded to incorporate errors in the estimation of the input data. A Gaussian distribution can be assumed, based on the original value and an estimation of the associated standard deviation. Additional simulations using randomly selected values from such a Gaussian distribution can then be considered. To illustrate, DQI and HLD were calculated for the 15 compounds, using 50 additional simulations to account for an error of 0.3 on a logarithmic base 10 scale for the values of V ss and Cl. To simplify the plot in Fig. 5, DQI-HLD polygons are shown for only six compounds: chlorpromazine (monobase), diazepam (neutral), ketoprofen (monoacid), nifedipine (neutral), trimethoprim (monobase) and warfarin (monoacid) (further details can be found in the ESI †). It is clear from Fig. 5 that the polygons are very irregular, but again, provided that two polygons do not overlap, the compounds can be argued to have distinct DQI and HLD values. This approach also applies when the input data are sourced from in silico QSAR models, where any prediction error is expressed as a standard deviation. 45 The areas covered by the different polygons vary, with that for chlorpromazine being the largest and that for warfarin the smallest. The calculated areas are summarised in Table 2, along with the polygons' second moment of area and centroid values in the DQI and HLD dimensions. The second moment of area indicates the sensitivity of the compound to errors in the input data with respect to the DQI and HLD dimensions; the larger the value, the greater the sensitivity. With respect to warfarin, the DQI value is more sensitive to the combined errors in the values of V ss and Cl than that for HLD; the reverse is true for prazosin. It is useful to think of the HLD dimension reflecting the ability of a compound to be absorbed into the body, while the DQI dimension reflects the extent of elimination of a compound from the body.
To simplify the information represented by the polygons in Fig. 5, it is proposed that the centroid coordinate could be used to standardise comparisons. Fig. 6 shows a DQI-HLD plot for the total level C ss,central,max centroid values for the 15 compounds considered.

Interpreting the DQI and HLD oral drug suitability parameters
Fig. 6 shows how different compounds have vastly different tendencies in terms of absorption and elimination. Absorption is indicated by a compound's centroid value on the HLD dimension, and elimination by a compound's centroid value on the DQI dimension. Arguably, compounds with a higher DQI (i.e., lower elimination) are preferable, but a higher HLD can compensate for a lower DQI. Consider the two monobases prazosin and chlorpromazine: prazosin has a higher centroid DQI (total C ss,central,max ) of −2.30, compared with −2.93 for chlorpromazine, but a lower centroid HLD (total C ss,central,max ) of 0.39 compared with 2.61 for chlorpromazine. If the HLD value is used as the log 10 (dose) in eqn (4), as shown in eqn (7) (where the slope of 1 is omitted):
$$\log_{10}(\text{max\_quantity}) = \text{HLD} + \text{DQI} \qquad (7)$$
then the log 10 (total C ss,central,max ) for prazosin is −1.91 compared with −0.32 for chlorpromazine. It follows that better absorption can compensate for higher elimination. These observations are well established, but evaluation of the properties of oral compounds (i.e., pK a , solubility pH7.4 , P app,Caco2,pH6.5 , V ss , Cl and, optionally, PPB) can be vastly simplified to a quantitative comparison within the two dimensions of DQI and HLD. The magnitude of a compound's DQI and HLD should be considered in conjunction with the desired log 10 (quantity).

Proposed use of DQI and HLD within virtual drug design
It is envisaged that input data for the present PK model would be sourced from in silico QSAR models. 37,44 At the virtual design stage many potential compounds can be considered and it is proposed that their DQI-HLD polygon areas be calculated as described (including consideration of errors in the estimations of the input data). 45 The results of these calculations can be visualised as described to give insight into the polygon DQI-HLD space occupied by different compounds. These compounds can also be ranked by application of eqn (5) or (7). For a target log 10 (max_quantity) requirement, eqn (7) can be used to estimate the log 10 (max_quantity) for each compound; if this exceeds what is needed, a lower log 10 (dose) will suffice. However, if the calculated log 10 (max_quantity) is less than what is needed, then such compounds are not viable as absorption limitations would prevent the use of a higher dose. Similarly, if compounds need to be ranked on the magnitude of a quantity at a particular dose, e.g., 50 mg, eqn (5) could be applied using a value of 50 mg for compounds with an HLD ≥1.70 (i.e., log 10 (50)) and an anti-log 10 (HLD) value for those compounds with an HLD <1.70.
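The ranking rule for a fixed dose can be sketched as follows, using the centroid values quoted earlier for prazosin and chlorpromazine. The function and variable names are hypothetical; the calculation is eqn (4) with the log 10 (dose) capped at the HLD.

```python
import math


def log10_quantity_at_dose(dose_mg, dqi, hld):
    """log10(quantity) at a given dose, with log10(dose) capped at the HLD."""
    effective_log10_dose = min(math.log10(dose_mg), hld)
    return effective_log10_dose + dqi


# (DQI, HLD) centroids for total C ss,central,max quoted in the text.
centroids = {
    "prazosin": (-2.30, 0.39),
    "chlorpromazine": (-2.93, 2.61),
}

# Rank compounds by the predicted quantity at a 50 mg dose, highest first.
ranking = sorted(
    centroids,
    key=lambda name: log10_quantity_at_dose(50.0, *centroids[name]),
    reverse=True,
)
```

At 50 mg, prazosin is limited by its HLD and stays at its maximum of −1.91, whereas chlorpromazine is still within its linear region and therefore ranks higher despite its lower DQI.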

DQI and in vivo efficacy
The DQI is a single factor that relates dose to a compound's in vivo (steady-state) quantity following repeat oral dosing in a model system representation of the body. It can be considered a measure of a compound's in vivo exposure.
Human in vivo efficacy (mg L −1 ) data for the 15 compounds considered have been sourced from Schulz et al. (further details can be found in the ESI †). 46 Comparison to predicted log P (ref. 13) indicates a very weak, non-significant linear relationship, but comparison to Cl shows a significant linear relationship (Fig. 7). This is not unexpected, as a compound's Cl will heavily influence in vivo levels; with respect to an intravenous one-compartment model, the steady-state concentration upon repeat dosing is, in theory, inversely proportional to Cl. 43 Fig. 8 shows two plots between the 15 compounds' human in vivo efficacy and their centroid DQI for (a) total level C ss,central,max and (b) total level AUC ss,central quantities (further details can be found in the ESI †). Both show significant linear relationships, such that compounds displaying human in vivo efficacy at higher levels tend to have a higher centroid DQI and vice versa. The statistics for both plots are slightly better than for the plot in Fig. 7. It can be argued that these centroid DQI values contain slightly more information than Cl for this small data set of 15 compounds. This can be understood from the perspective that oral absorption and drug distribution will have influence over in vivo levels in addition to that of Cl. It can be inferred from these plots that compounds with lower potency require a higher centroid DQI (total level C ss,central,max or AUC ss,central ). This can be understood from the perspective of a fixed dose, such that a less potent compound requires more of the oral dose to be in the central compartment at steady state to drive the therapeutic effect, whereas a more potent compound requires less of the oral dose to be in the central compartment to drive a similar effect.
Clearly, an appreciation of potency is a key factor in understanding the plots in Fig. 8. Still, the centroid DQI values for the total level C ss,central,max or AUC ss,central quantities are sufficient to explain approximately half of the information in the human in vivo efficacy values for the 15 compounds.  Relatedly, DQI and HLD values can be proposed as novel compound descriptors for use in conjunction with QSAR modelling methods (and other descriptors) to model other in vivo quantity endpoints. At the virtual design stage, such an approach would involve use of in silico predictions for input data for the model to derive DQI and HLD values, which could then be used as descriptors in further in silico models.

Additional considerations
This study uses solubility pH7.4 data generated at room temperature, when 37°C would be more relevant. It is reasonable to assume that solubility pH7.4 will increase with temperature in all cases, resulting in higher HLD values. Furthermore, variations in the magnitude of model parameters (e.g., V intestinal , V central , V terminal to V ss ratio, absorption window, dosing intervals, etc.) will lead to variations in DQI and HLD values. However, the approach discussed here focuses on the evaluation of DQI and HLD for a set of compounds within a standardised PK model (with specific settings) and it is the relative, rather than absolute, values that matter. If absolute values are important, then correction factors are required. For example, if the in vivo quantity and corresponding dose for a representative set of compounds are known, DQI values could be determined using the method described in this work and eqn (6) applied to predict the doses based on the known in vivo quantity. Correction factors for the predicted dose can be derived from a linear regression equation between the known and predicted doses. A similar approach can be used to determine correction factors for an in vivo quantity.
Although this study focuses on determining DQI and HLD for the total and free level C ss,central,max and AUC ss,central, other in vivo quantities could also be assessed using this model. These include total and free levels for the average concentration (C ss,central,average) and the minimum concentration (C ss,central,min) at steady state in compartment B. In addition, the corresponding levels in compartment C (peripheral compartment) can be considered, including steady-state total and free levels for the maximum concentration (C ss,peripheral,max), the average concentration (C ss,peripheral,average), the minimum concentration (C ss,peripheral,min) and the area under the curve AUC ss,peripheral.
Importantly, a twice-daily dosing scenario is considered; changing the dosing interval will lead to different DQI and HLD values for the same compound. Changing the simulation length can also affect these values, in particular if it is shortened. From a virtual drug design perspective, consideration of 14 repeat doses using a once- or twice-daily dosing scenario is recommended.
With respect to the use of 20 different dose simulations to define a dose-quantity curve, this can be reduced to two in theory. The DQI can be determined from a simulation using a very low dose (e.g., 0.000001 mg), where linear PK can be assumed. The HLD can be determined from a simulation using a very high dose (e.g., 10 000.0 mg), where compartment A can be assumed to be saturated throughout the absorption window and the corresponding log10(quantity) is at its maximum (i.e., log10(max_quantity)). Relatedly, the standardised method used to determine the HLD only provides an approximation; other methods can be used, including fitting the dose-quantity curve to a power function of the form:

$$\log_{10}(\mathrm{quantity}) = \log_{10}(\mathrm{max\_quantity}) - \log_{10}\left(1 + 10^{\mathrm{HLD} - \log_{10}(\mathrm{dose})}\right) \tag{8}$$

The application of eqn (8) using a least-squares fitting method benefits from the use of as many data points as possible to define the transition region of the dose-quantity curve from linear to non-linear PK.
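The two-simulation shortcut and eqn (8) can be related in a short Python sketch. It assumes that, under linear PK, DQI = log10(quantity) − log10(dose); taking the low-dose limit of eqn (8) under that assumption gives HLD = log10(max_quantity) − DQI. The function names and the numerical values are hypothetical, and the "simulation" outputs are generated from eqn (8) itself rather than from the PK model.

```python
import math

def log10_quantity(dose_mg, log10_max_quantity, hld):
    """Eqn (8): log10(quantity) at a given oral dose (mg)."""
    return log10_max_quantity - math.log10(
        1.0 + 10.0 ** (hld - math.log10(dose_mg))
    )

def two_point_estimate(low_dose_mg, low_quantity, high_quantity):
    """Estimate (DQI, HLD) from one very low and one very high dose.

    ASSUMES linear PK at the low dose, so that
    DQI = log10(quantity) - log10(dose), and that the high-dose
    quantity is effectively max_quantity.
    """
    dqi = math.log10(low_quantity) - math.log10(low_dose_mg)
    hld = math.log10(high_quantity) - dqi
    return dqi, hld

# HYPOTHETICAL curve parameters standing in for PK model output
TRUE_LOG10_MAX, TRUE_HLD = 1.0, 2.0
low_q = 10.0 ** log10_quantity(1e-6, TRUE_LOG10_MAX, TRUE_HLD)
high_q = 10.0 ** log10_quantity(1e4, TRUE_LOG10_MAX, TRUE_HLD)

dqi, hld = two_point_estimate(1e-6, low_q, high_q)
```

Because eqn (8) only approaches max_quantity asymptotically, the recovered HLD is slightly below the value used to generate the data; a fit of eqn (8) over many doses would be the alternative when the transition region matters.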
The DQI and HLD calculations can be made in different mammalian species by adjusting the PK model settings accordingly and by using species-specific in silico predictions for V ss , Cl and PPB (and assuming the use of P app,Caco2,pH6.5 for other species).
Finally, an extension of this study, which is beyond the scope of the present work due to its size and complexity, would be to use in silico QSAR models for pK a , solubility pH7.4 , P app,Caco2,pH6.5 , V ss , Cl and (optionally) PPB to predict the input data for a larger set of compounds, calculate their DQI values, and assess how well they relate to in vivo efficacy and toxicity data.

Conclusions
Assessing the oral drug suitability of compounds at the virtual design stage is an important objective. This study describes a methodology that provides an alternative to the heuristic approaches that emphasise controlling a compound's physicochemical properties. The methodology attempts to simplify the evaluation of compounds based on their estimated in vivo quantity levels within a mammalian body. This simplification comes from the application of the compound-specific DQI and HLD values for a particular in vivo quantity, calculated by assessing a series of PK model simulations. In essence, the PK model takes the form of a series of rate equations that can estimate the varying exposure of a compound in different parts of the body over time and repeated oral dosing; for this study, the focus was on steady-state exposure levels. An open-source Python library 31 provides a mechanism to perform such complex calculations, also taking into consideration distribution kinetics variations and random (Gaussian) error in the values of the input data. Such functionality facilitates the use of this method at the virtual design stage, where in silico QSAR models can be used to provide the input data and the associated prediction errors factored into the calculation of a compound's DQI and HLD. The DQI parameter is a power term that relates an oral dose of a compound to its estimated in vivo quantities; for a given dose, a larger DQI value corresponds to a larger in vivo quantity. This is only relevant during linear PK conditions, and the corresponding HLD value approximates the upper dose limit of applicability.
Application of this methodology to 15 known oral drugs demonstrates how different compounds that have vastly different tendencies in terms of absorption into, and elimination from, the body can be compared on the same scale. In spite of the computational complexities associated with gaining such insight, DQI and HLD provide a direct relationship between a compound's dose and in vivo exposure. Evaluation of a compound's oral drug suitability is simply dependent on the ability to match a compound's dose to the required in vivo exposure.

Conflicts of interest
In carrying out this work, the author has used computational tools supplied by InSilicoLynx Ltd, of which he is the sole company share owner. InSilicoLynx Ltd is a member of the RSC Enterprise Plus program.