Daniel
Yanes
a,
Vasiliki
Paraskevopoulou
b,
Heather
Mead
b,
James
Mann
b,
Magnus
Röding
cd,
Maryam
Parhizkar
e,
Cameron
Alexander
a,
Jamie
Twycross
*f and
Mischa
Zelzer
*a
aSchool of Pharmacy, University of Nottingham, University Park Campus, Nottingham, NG7 2RD, UK. E-mail: mischa.zelzer@nottingham.ac.uk
bGlobal Product Development, Pharmaceutical Technology & Development, Operations, AstraZeneca, Macclesfield, SK10 2NA, UK
cSustainable Innovation & Transformational Excellence, Pharmaceutical Technology & Development, Operations, AstraZeneca, Gothenburg, 43183 Mölndal, Sweden
dDepartment of Mathematical Sciences, Chalmers University of Technology, University of Gothenburg, 41296 Göteborg, Sweden
eSchool of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
fSchool of Computer Science, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK. E-mail: jamie.twycross@nottingham.ac.uk
First published on 15th September 2025
Liposomes are amongst the most promising and versatile nanomedicine products employed in recent years. In vitro release (IVR) tests are critical during development of new liposome-based products. The drug release characteristics of a formulation are affected by multiple factors related to the formulation itself and the IVR method used. While the effect of some of these parameters has been explored, their relative importance and contribution to the final drug release profile are not sufficiently understood to enable rational design choices. This prolongs the development and approval of new medicines. In this study, a machine learning workflow is developed which can be used to better understand patterns in liposome formulation properties, IVR methods, and the resulting drug release characteristics. A comprehensive database of liposome release profiles, including formulation properties, IVR method parameters, and drug release profiles is compiled from academic publications. A classification model is developed to predict the release profile type (kinetic class), with a significant increase in the balanced accuracy test score compared to a random baseline. The resulting machine learning approach enhances understanding of the complex liposome drug release dynamics and provides a predictive tool to accelerate the design of liposome IVR tests.
Drug release from liposomes is dictated by multiple factors. The drug release profile depends on (i) critical material attributes (CMAs) such as drug and excipient properties, (ii) critical quality attributes (CQAs) of the formulation such as particle size, zeta potential, and drug loading,4 and (iii) IVR test method parameters such as release medium temperature, medium pH, stirring speed, and release apparatus.5,6 The characteristics of IVR test methods currently used to assess drug release from nanomedicines have been extensively reviewed before,7–9 and the best fitting kinetic models were assessed for given drug release profiles of non-conventional dosage forms.10,11 Nonetheless, the interplay between the parameters determining the profile is not well understood. To date, no attempt has been made to quantitatively link formulation characteristics, IVR test method parameters and the drug release profile of liposome formulations.
Machine learning (ML) has been used in different fields such as sustainable chemical reaction design,12 nanomaterials characterisation,13 and materials science14 to accelerate processes and gain deeper insights into datasets and inform experiments. In pharmaceutical sciences, ML is now commonly used to establish connections between formulation and process parameters of drug delivery modalities to predict CQAs15,16 and biological performance.17 In addition, ML methods have been used to predict drug release from various formulations, including 3D printed tablets,18 long-acting polymeric injectables,19 and oral formulations with polysaccharide coatings.20 For lipid-based formulations, ML was applied to predict characteristics such as particle size,21–23 liposome formation,22 polydispersity index,21 drug loading,23 and encapsulation efficiency.23 Prediction of drug release profiles from liposome formulations based on formulation and IVR characteristics using ML approaches has not yet been reported.
Here, we have curated a database from the existing literature on drug release from liposome formulations. Further, we have developed an ML workflow that explores and links drug release profiles from these systems with the formulation and IVR method characteristics. We identify which of the included factors are important for predicting drug release from liposomes. We furthermore establish a benchmark classification score for predicting types of drug release behaviour. The ML workflow presented here provides a deeper understanding of the complex relationship between nanomedicine CQAs, drug type, IVR method parameters, and their connections to kinetic drug release parameters.
Drug release profiles were clustered using principal component analysis (PCA) and k-means clustering (KMC). Weibull model parameters with f2 scores below 50 and those displaying two release mechanisms (biphasic), which were not described sufficiently well by the Weibull model were excluded. To enable clustering of drug release profiles based on similar shapes, a simulated drug release dataset was generated using fitted Weibull parameters. As the original database contained release profiles with varying time scales, including seconds, minutes, hours, and days, the lack of standardisation of the time range of the investigation, would mean that a profile initially measured in seconds but simulated over 24 hours would automatically be classified as fast. Since hours was the most common unit, only profiles originally measured in hours were selected. The fitted Weibull parameters of the resultant profiles were used to simulate drug release data over 0–24 hours, resulting in a dataset with 500 time points as columns and rows representing individual simulated profiles. The dataset was standardised and reduced to a low-dimensional representation via PCA. The first two PC scores were clustered using KMC with the number of clusters (k) ranging from 2 to 8. The clustering quality was evaluated by assessing cohesion and separation for each k (SI, Section 2.8).
To evaluate the performance of each model, the target output (kinetic class) obtained using each model was compared to their true labels determined by PCA and KMC. The classification task was multi-class and imbalanced, therefore careful selection and interpretation of metrics compared to simpler binary tasks was required. Briefly, micro-averaged F1 score, balanced accuracy, and Matthews correlation coefficient (MCC) (SI, eqn S17, S19 and 24) were used to ensure evaluation of global and class-specific performance. To determine whether the performance between classifiers were statistically significant, pairwise comparisons were conducted using the Wilcoxon Signed-Rank test across cross-validation folds. Additional details of the rationale, equations, and implementation of the classification metrics and statistical tests are provided (SI, Sections 2.10, 2.11.1).
Shapley Additive exPlanations (SHAP) analysis27 was conducted on the XGB classifier trained on all available data. XGB classifier was chosen due to the ability to handle complex relationships and good predictive accuracy on small tabular datasets.28 SHAP values were calculated for every feature and each instance to quantify input contributions. Beeswarm plots were produced using the SHAP library (v0.44.1).
The optimal subset of feature inputs to predict the kinetic class was determined for the XGB classifier using backward elimination with cross-validation. Briefly, the classifier is trained on all 9 feature inputs, and the importance of each feature using the gain feature importance attribute of the XGB classifier model is determined. The least important features are pruned from the current set of features, the procedure is recursively repeated on the pruned set until 1 feature remains. The optimum subset of features was determined as the one yielding the maximum 5-fold cross-validated test score with minimum variance.
The optimum subset of features was used to assess the performance of each classifier on unseen (test) data. The cleaned dataset (n = 77) was split into training and test sets using stratified 5-fold cross validation, to ensure a similar class distribution in each fold. Due to 5 not being a divisor of 77, the training and test sets consisted of 61–62 and 15–16 samples respectively. On each fold, the balanced accuracy, MCC, and F1 score were calculated and the mean score and standard deviation across the 5 folds were calculated.
Permutation testing was conducted to evaluate whether the average cross-validated balanced accuracy score of the best performing classifier significantly exceeded that of a three-class random classifier. Target labels were permuted 1000 times while keeping all input features unchanged. For each permutation, the balanced accuracy was calculated. An empirical p-value was then calculated to estimate the probability of obtaining the observed average balanced accuracy score by chance. This calculation used the number of permutations with scores greater than or equal to the average cross-validated test score obtained by the best performing classifier (C), given the number of permutations (S) (eqn (1)). The choice of number of permutations was validated by assessing the empirical p-value as a function of the number of permutations (Fig. S9). Details of implementations and rationale are provided in the SI, Section 2.11.
![]() | (1) |
The first step was to assemble a dataset suitable for the classification task of predicting release kinetics from liposomes. To achieve this, a structured query language (SQL) database was compiled using data extracted from 34 academic publications. The database included 141 distinct formulations, 22 active compounds, and 31 excipients (lipids), culminating in a total of 271 IVR tests. Each IVR test featured unique formulations with varying numbers and types of excipients, along with different characterisation data, compounds, and experimental IVR testing conditions. The SQL database provided a well-defined structure, making it advantageous for organising such complex and variable data. As high throughput techniques become increasingly prevalent,29–31 our database, compared to storing tabular data in Excel, ensures the efficient, scalable, and secure management of large datasets generated across multiple formulations, testing conditions, and drug compounds.
To establish connections between the four distinct groups of variables (Fig. 1) stored in the database and drug release profiles, nine features were selected (SI, Table S3). By organising the variables into the groups outlined (Fig. 1) and using the database schema, this approach can be adapted to other drug delivery modalities. For instance, the lipid properties group could be replaced with polymer properties to use the workflow for connecting drug release for nanoparticle, dendrimer or hydrogel systems.
Liposomes are formulated with a variable number of lipids, at different lipid ratios of different lipid types. Hence, for convenience, the lipid-based features were reduced to a single scalar value per formulation. Here, the molar-weighted excipient molecular weight (Mw) and weighted lipid phase transition temperature (Tp) were selected, as Tp links lipid structure and size to drug release behaviour.32 Since no standardised methods exist for IVR testing of liposome products,7 the impact of user-selected apparatus, and release medium conditions was also explored. This comprehensive feature selection aimed to identify the most important features for predicting liposome drug release.
To establish relationships between the assembled feature input vectors and the drug release profiles, we described drug release profiles by parameterising the entire release profile over time. This is an alternative approach to reducing release profiles to discrete points such as T20, T50, and T80 (times to reach 20%, 50%, and 80% drug release), commonly used in ML-based drug release prediction tasks.18 To our knowledge, this represents a novel approach to drug release prediction which involves automated batch parameter estimation of drug release profiles, unsupervised learning to group similar profiles, and feature importance analysis to identify drivers of types of drug release behaviour. By fitting kinetic models to release profiles, this method captures the complete release kinetics, providing deeper insights into the drug release process and more flexibility for data analysis.
Software tools for drug release curve fitting are often limited to using one model at a time, for example by using the Excel plugin DDSOLVER.33 This approach is not suitable for assessing multiple models across numerous drug release profiles. To overcome this, a custom in-house Python tool was developed for batch-wise parameter estimation of drug release profile models. The 209 drug release profiles were subjected to screening using the tool, where six commonly employed kinetic models (SI, Section 2.6) that are used to describe drug release profiles were selected. These are denoted zero order, first order, Higuchi, Korsmeyer–Peppas, Weibull, and Reciprocal.
Based on values of mean relative root mean squared error (RRMSE) and Akaike information criterion (AIC) for each drug release profile, the Reciprocal model demonstrated the lowest average RRMSE (0.08) and AIC (30.20) (SI, Fig. S1), indicating a superior fit compared to other models. To further evaluate its performance, a Kolmogorov–Smirnov test was conducted to assess whether the absolute error distribution of the Reciprocal model was smaller than that of the Weibull model. The test yielded a p-value of 0.77, indicating no significant difference between the two models' error distributions (SI, Fig. S2 and Table S4). These findings align with previous studies comparing kinetic models for liposome systems, where both the Reciprocal and Weibull models provided acceptable fits. However, the Reciprocal model showed better fitting for thermosensitive liposomes, particularly those smaller than 100 nm.11
To define an accuracy threshold for parameters yielding sufficiently similar simulated and experimental drug release profiles, the similarity factor (f2) was used with a cutoff value of 50, based on FDA guidelines, where higher values indicate greater similarity.34 While the Reciprocal model achieved the highest average f2 score, the Weibull model covered a greater proportion of profiles with f2 greater than 50 (Fig. 2). Based on our need for greater data availability and the advantage of a more generalisable applicability of the chosen model to diverse datasets, the Weibull model was selected.
![]() | ||
Fig. 2 f 2 score of each drug release profile (datapoint) fitted to six kinetic models to describe drug release. Models ordered based on increasing average f2 score (bold). |
Plotting each drug release profile together with the fitted Weibull model (SI, Fig. S3 and S4) visually demonstrated that the Weibull model fits the data well in most cases. Examining profiles near the cutoff range (f2 scores 45–55) shows that the Weibull model failed to describe biphasic release patterns (SI, Fig. S3). As a result, profiles with f2 values of less than 50 and biphasic release were excluded, leaving 197 profiles where the Weibull model provided acceptable similarity to the original release profiles. With current software tools, this analysis would have required 1254 individual Excel files. Our Python tool enabled the analysis to be conducted in batch, making it suitable for large volumes of drug release data and thus more applicable to emerging trends in the field.
Using the fitted Weibull parameters (α, β) of these profiles, a simulated drug release dataset was generated, capturing a variety of release behaviour (Fig. 3a). Principal component analysis (PCA) was used to transform the high-dimensional release profiles into a lower-dimensional space while retaining 95% of the variance in the first two PCs (SI, Fig. S5). The resulting PCA score plot represented each drug release profile as a point (Fig. 3b). Profiles represented by points closer together were considered to have similar underlying data structures, i.e., similar drug release behaviours. To group the PC scores of the first two dimensions into a set of distinct clusters, where data points in each cluster would share common data structures, k-means clustering was used (Fig. 3b). A range of k values were explored to minimise the within sum of squares (WSS) variation and maximise the silhouette coefficient (si), representing cluster compactness and separation, respectively (SI, Fig. S6). The number of clusters was set to 3, reflecting a reasonably minimal WSS (SI, Fig. S7) and maximum average si across each cluster (SI, Fig. S8). Based on the k-means cluster assignment, each drug release plot was assigned into a respective cluster, which was used to simulate the drug release behaviour within a given cluster (Fig. 3c). This revealed that each cluster assignment contained distinct types of drug release behaviour. Specifically, the slow release was less than 50% release, medium was characterised by gradual release and fast was characterised as burst release. Overall, the PCA-KMC process facilitated the assignment of target outputs to develop a multi-class classification model via machine-informed classification of drug release profiles, eliminating the need for manual annotation of the 169 profiles. While this approach has proven successful for the present dataset, it must be noted that the generalisability of the approach has yet to be established and translation of the approach to different kinetic models would have to be verified.
Here, for the 9-feature XGB model, Shapley Additive exPlanations (SHAP) analysis was used to comprehend the model's decisions and provide insight into the contributions of each feature to the prediction.35 SHAP values represent contributions of each feature to the model output,27i.e., the kinetic class (slow, medium, and fast) in our case. The SHAP analysis will thus provide an indication of how much a feature contributes to the prediction output, i.e., one of the drug release classes: slow, medium or fast. The magnitude of the SHAP value gives indication of the importance of a feature on making a prediction, were positive SHAP values show features pushing model output towards a prediction.
To identify whether higher or lower values of specific feature(s) increase or decrease the prediction of release kinetics, a bee-swarm plot was used to visualise SHAP values for slow and fast release predictions (Fig. 5a and b). Slow and fast release predictions were selected as extremities, but the same analysis could have been conducted for medium release. For example, lower media temperature and Tp made positive contributions to the slow-release class. Bearing in mind the lipid bilayer structure of liposomes, it follows that lowering the media temperature reduces the lipid bilayer membrane permeability, thus causing a reduction in drug efflux. The lower Tp driving slow release is linked to the formulation composition and can therefore be adjusted by the choice of lipid type and by varying the ratio of different phospholipids.36
Other noticeable positive contributions to slow-release classification were made by higher media pH values and higher excipients Mw. The compound in the slow-release class was predominantly doxorubicin (14/25) which has a pKa of 8.2. In contrast to the fast release scenario, where low pH values are expected to lead to increased ionisation and hence increased solubility of doxorubicin, slow-release kinetics would be observed at higher media pH values where doxorubicin would remain non-ionised. Unlike media pH, which is well known to be of importance for drug release, the importance of Mw of excipients used in the formulations has not been reported. This could be a potential avenue for future exploration, for example selecting higher Mw excipients to make the formulation such as 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-poly(ethylene glycol) (DSPE-PEG-10k).
Across slow and fast release, the lamellarity of the vesicle (structure type) and user choice of IVR apparatus had the least influence on the prediction (Fig. 5a and b) of kinetic class. This is expected, because the method developed is properly validated to separate free drug from the nanocarrier and hence, the user choice of apparatus used for the IVR test should not influence release kinetics. This is shown by a comparison of ultracentrifugation and continuous flow sampling methods for polymeric nanoparticles,6 this suggests the same finding extends to liposome systems.
It is worth noting that categorical features such as release method, structure type, and API type were label encoded, removing the original meaning. It must be noted that the drawback of this method is that the selection of integers that are assigned to each categorical variable could influence the SHAP analysis and other subsequent analysis. This could be mitigated by one-hot encoding, which would preserve the original feature input by creating separate binary columns for each category type, e.g., doxorubicin: yes/no; however, the drawback of this approach is an increase in the dimensionality of the dataset. As we aimed to minimise the complexity of the dataset, we decided against implementing one-hot encoding in this work.
Overall, the SHAP analysis was used to identify and rank features impacting types of drug release from liposomes. It is clear, and hardly surprising to experimentalists, that IVR conditions (temperature, pH) play a key role in determining drug release kinetics. As IVR tests are typically done at physiological conditions (37 °C), the value of this analysis lies in the insight it provides on the effect of formulation properties. The choice of lipid, and the Mw of excipients are key drivers to obtain slow-release lipid based formulations for the drugs included in this study (SI, Table S5). It should be noted that the SHAP analysis provides insights for the compiled and cleaned dataset fitted to a XGB model, it remains to be seen if these insights can be generalised to all liposomes and different models. Additionally, a feature identified as being important for a prediction does not always equal a causal relationship.
To examine how well the XGB classifier model with default hyperparameters would perform on unseen data, stratified five-fold cross validation was used to assess the ability of the model to generalise and gauge the range of prediction error. To achieve this, the data was split 80:
20 into training and testing sets respectively. Each of the 5-folds were stratified such that the class distribution in each set was as similar as possible.
Prior to this, it was necessary to determine which combination and number of input features led to the maximum cross validated balanced accuracy test score with lowest variance. To do so, backward feature elimination was used. This showed that a selection of 7 features was optimal for maximising balanced accuracy whilst minimising variance (Fig. 6a). Addition of features relating to the testing apparatus used and structure type decreased the mean balanced accuracy score, which aligns with the SHAP analysis (Fig. 5). Variations in trends of number of features versus accuracy such as those observed here have been reported previously for backward feature elimination.38–40
The release_method and structure_type features were identified as least influential, aligning with the SHAP analysis. These two parameters were subsequently excluded, resulting in 7 features that were used as inputs to screen the performance of the 7 classifiers. The screening showed that the XGB classifier model had the highest balanced accuracy score with the lowest variance (0.70 ± 0.03) (Fig. 6c, ESI Tables S7 and S8). To assess whether the XGB model significantly outperformed the other classifiers, across all metrics, a pairwise Wilcoxon Signed-Rank test was applied to the individual fold results per classifier (SI Tables S9 and S10). The analysis revealed no statistically significant differences in evaluation metrics between models, suggesting no single classifier consistently outperformed the others. Although there is no statistical difference in model performance, we suggest to use XGB for this dataset because of the XGB classifier shows the lowest variance, suggesting that it might result in a more reliable and stable performance.
The cross validated training and testing scores for the XGB model demonstrated a large difference, indicative of overfitting, which could be prevented by regularisation via tuning model hyperparameters. Nonetheless, the XGB model had the highest cross validated balanced accuracy score of 0.70 ± 0.03 (n = 5) with the lowest variance, reflecting 10–11 correct predictions out of 15 unseen examples. The classifier score obtained is reasonable considering that, as the number of classes to predict increases, it becomes harder to achieve higher accuracy scores.41
To assess whether the accuracy score on unseen data improved beyond random predictions of slow, medium or fast release kinetics, the classification labels were permuted 1000 times to remove dependencies between features and targets. The resulting permutation scores, representing the null distribution, showed an average balanced accuracy of approximately 0.33, as expected for a three-class classification task. The XGB model mean cross validated scores on the original data, indicated by the black dotted line in Fig. 6b, demonstrated significant improvement over the permuted data (p-value = 0.001). This suggests that the observed balanced accuracy is unlikely a result of chance, confirming dependencies between features and target outputs.42
These results establish the model's ability to outperform random guessing and sets a benchmark for classifying liposome release kinetics, which can make liposome IVR test design more rational. With the recently proposed DELIVER framework43 which aims to reduce nanomedicine development timelines and clinical trial failures, this computational workflow could integrate IVR and formulation data generated during lead discovery and optimisation. The DELIVER framework has identified experimental testing conditions that lead to uncontrolled drug release as a key risk blocking nanomedicine development timelines. The present approach may have the potential to de-risk preclinical evaluation by providing greater understanding of the drug release data generated in a post-hoc analysis, to help identify formulation(s) and/or experimental testing conditions that may lead to uncontrolled drug release. In the future, it would be of benefit to have real-time risk quantification of conditions leading to uncontrolled release.
By using SHAP analysis on a dataset of liposome drug release, the user choice of IVR testing conditions has greater influence on predicting release kinetics than drug type, lipid composition, and formulation characteristics. Furthermore, by using widely available input parameters, an XGB classifier model is identified as a stable model to predict the class of drug release profile (slow, medium or fast). Compared to a random baseline classifier, there is a significant improvement in balanced accuracy score from 0.33 to 0.70 ± 0.03, indicating that the model has learnt a real relationship between feature inputs and target outputs. This establishes a baseline performance for the classification of release kinetics from nanomedicines.
An ML tool such as the one presented here could be applied for digital screening of experimental conditions and/or formulation properties to reduce material expenditure. With additional data and a larger dataset size, it is anticipated that the classifier accuracy would increase. Experimentalists could expand the dataset and this also highlights the benefit of adopting FAIR principles of scientific data management for published articles to the community.44 Using existing, accessible data, this work maps the workflow for handling IVR and formulation data and provides an approach to de-risk experiments in often time critical pharmaceutical development activities. Overall, this research takes a step towards the accelerated development of non-oral dosage form IVR tests. This workflow could be adapted to other drug delivery modalities and demonstrates the potential benefit of adopting ML in pharmaceutical development to reduce experimental burden and streamline the transition of nanomedicine products into clinic.
Supplementary information containing methodologies for data curation, database construction and model evaluation as well as additional results on model outputs and performance is available. See DOI: https://doi.org/10.1039/d5dd00112a.
This journal is © The Royal Society of Chemistry 2025 |