Open Access Article
Yiming Ma a, Shang Gao a, Neel Mehta a, Qinqing Fu b, Wei Li a and Brahim Benyahia *a
a Department of Chemical Engineering, Loughborough University, Leicestershire, LE11 3TU, UK. E-mail: b.benyahia@lboro.ac.uk
b School of Design and Creative Arts, Loughborough University, Leicestershire, LE11 3TU, UK
First published on 19th September 2025
Solvent selection in pharmaceutical crystallization plays a pivotal role in determining overall manufacturing efficiency while also significantly impacting environmental performance and regulatory compliance. A data-driven solution for sustainable solvent selection, applicable to both single and binary solvent systems, was developed and integrated into SolECOs (Solution ECOsystems), a modular and user-friendly platform for Sustainable-by-Design solvent selection in pharmaceutical manufacturing. A comprehensive solubility database containing 1186 active pharmaceutical ingredients (APIs) and 30 solvents was constructed and used in conjunction with thermodynamically informed machine learning models, including the Polynomial Regression Model-based Multi-Task Learning Network (PRMMT), the Point-Adjusted Prediction Network (PAPN), and the Modified Jouyban–Acree-based Neural Network (MJANN), to predict solubility profiles along with associated uncertainties. Sustainability assessment was performed using both midpoint and endpoint life cycle impact indicators (ReCiPe 2016) and industrial benchmarks such as the GSK sustainable solvent framework, enabling a multidimensional ranking of solvent candidates. Experimentally validated case studies involving APIs such as paracetamol, meloxicam, piroxicam, and cytarabine confirmed the approach's robustness, adaptability to various crystallization conditions, and effectiveness in supporting single and binary solvent screening and design.
Green foundation
1. This work advances green chemistry by introducing SolECOs, a sustainable-by-design digital platform for solvent and solvent mixture selection, integrating predictive modelling and comprehensive sustainability assessment to support greener pharmaceutical manufacturing.
2. SolECOs predicts optimal single or binary solvents for 1186 APIs using a database of 30
3. Greener performance could be achieved by expanding the database to include more bio-based solvents, adding renewable feedstock pathways in LCA, and integrating real-time process data for adaptive, in-process solvent design.
On average, it takes approximately 12.5 years and up to £1.15 billion to bring a new drug to market.15 While many factors contribute to this painstaking and costly process, inefficiencies in crystallization solvent selection remain a persistent bottleneck, particularly in unit operations such as API synthesis, crystallization, liquid–liquid extraction, wash-filtration, drying, and granulation.16 Despite decades of accumulated experience, solvent selection in crystallization continues to rely heavily on empirical rules and trial-and-error strategies.17–19 These approaches are time-consuming, resource-intensive, and heavily reliant on expert judgment, which collectively limits efficiency and scalability in process development.20,21
Driven by these challenges, solvent selection is gradually transitioning from traditional empiricism to data-driven intelligent screening and machine learning (ML)-assisted design approaches.22–26 Technically, the objective of solvent selection aligns closely with solubility prediction, an area that has seen substantial progress in recent years.27–33 However, based on accurate characterization of solubility behavior, a critical differentiating step lies in effectively linking API dissolution behavior in a given solvent with its environmental footprints under variable real-world production conditions.
Solvent selection approaches are developed to meet various single or multi-objective targets, such as maximizing product yield, controlling crystal polymorphism, and enhancing solvent sustainability.34,35 From an industrial standpoint, a key and often unavoidable goal is to reduce environmental impact while still achieving the desired product yield. Computer-Aided Molecular Design (CAMD) serves as a systematic approach to identify crystallization solvents.36–41 Karunanithi et al.42 developed a framework combining CAMD, database screening, and experiments, with attention to crystal morphology. Wang and Lakerveld43 presented a systematic approach for the simultaneous optimization of process conditions and solvent selection for continuous crystallization including solvent recycling. Chai et al.44 introduced the Grand Product Design (GPD) model, incorporating technical, economic, and regulatory factors. Liu et al.45 proposed an ML-integrated CAMD approach focused on solvent recovery. Watson et al.46 designed a CAMD-based method for optimal solvent blend selection in pharmaceutical crystallization, capable of simultaneously determining ideal process temperature, solvent and anti-solvent species, and their compositions.
To improve practical applicability, efforts have focused on user-friendly tools that integrate process needs, solvent properties, and environmental constraints.47–51 Larsen et al.52 developed a green solvent selection tool for printed electronics, organizing a wide range of solvents based on Hansen solubility parameters and sustainability indicators. Similarly, an interactive tool has been developed to support solvent selection by incorporating chemical functionality, physical properties, regulatory considerations, and Safety, Health, and Environmental (SHE) impacts.53
Despite the emergence of various solvent design and selection frameworks in recent years, significant limitations remain. Firstly, the implementation complexity of many methods and models hinders their broader adoption. While computational methods demonstrate strong performance in specific case studies, they typically rely on intricate parameter settings and assumptions, making it difficult to generalize or directly apply the results in real-world scenarios. Without substantial expertise, users may struggle to navigate these tools effectively, thereby diminishing the cost and efficiency advantages of non-experimental approaches. Secondly, the “optimal” solvents identified by computational approaches sometimes lack practical feasibility. These designed solvents may face challenges in industrial adoption due to high synthesis costs, limited commercial availability, supply chain constraints, or issues related to transportation and storage. Thirdly, existing methods often lack flexibility and consistency, particularly in sustainability assessment. Current industrial practices rely on diverse and sometimes inconsistent sustainability indicators, each emphasizing different aspects – such as carbon footprint, toxicity, biodegradability, or energy consumption during production. The absence of a unified evaluation framework makes it difficult to comprehensively assess and compare the environmental impacts of solvents or solvent mixtures.
To address these limitations, this study sets out three key objectives. First, to improve usability, we develop a computationally efficient and user-oriented platform that enables solvent selection without requiring advanced modeling expertise or high-performance computing. Second, to enhance practical relevance, the framework focuses on commonly used solvents and their binary combinations, avoiding hypothetical or industrially inaccessible candidates. Third, to accommodate diverse sustainability criteria, the methodology incorporates multiple assessment schemes, allowing engineers and experimentalists to select evaluation criteria aligned with specific environmental, health, or regulatory frameworks.
Hybrid modeling approaches integrating ML and theoretical methods were developed. A Polynomial Regression Model-based Multi-Task Learning Network (PRMMT) was designed with multiple shared layers to accommodate different design requirements. The Point-Adjusted Prediction Network (PAPN) was developed for solubility prediction at specific temperatures, while the Modified Jouyban–Acree Model-based Neural Network (MJANN) was tailored to handle the complexities inherent to the design of binary solvent systems.
To enhance reliability, discrepancies between predicted and actual solubility values in the validation set were quantified and mapped to optimal probability distributions of prediction residuals. By preserving probability variations across different distribution values, a robust solvent selection framework was established, ensuring reliable solvent recommendations. The entire workflow has been integrated into the user-friendly SolECOs platform, providing an efficient tool for solvent screening and selection (Fig. 1).
To facilitate model development and validation, the entire dataset was divided into three independent subsets, each serving a specific purpose. Approximately 70% of the data was allocated to the training set for model development, while 30% was used as a validation set to fine-tune model performance. Additionally, a separate test set, consisting of data from 20 independent APIs, was reserved for final model evaluation.
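The partitioning described above can be illustrated with a minimal sketch. It assumes each record carries an API label, that the test APIs are held out in full, and that the remaining records are split at random by record; the record schema, the function name `split_by_api`, and the random seed are hypothetical and not taken from the paper.

```python
import random

def split_by_api(records, test_apis, train_fraction=0.7, seed=0):
    """Split records into train / validation / test subsets.

    records: list of dicts with at least an "api" key (hypothetical schema).
    test_apis: set of API names held out entirely for final evaluation.
    """
    # Every record of a held-out API goes to the test set.
    test = [r for r in records if r["api"] in test_apis]
    rest = [r for r in records if r["api"] not in test_apis]
    # Shuffle the remainder reproducibly, then cut it 70/30.
    rng = random.Random(seed)
    rng.shuffle(rest)
    cut = round(train_fraction * len(rest))
    return rest[:cut], rest[cut:], test

# Toy data: 10 hypothetical APIs with 10 records each.
records = [{"api": f"api_{i}", "log_s": 0.0} for i in range(10) for _ in range(10)]
train, val, test = split_by_api(records, test_apis={"api_0", "api_1"})
```

Holding out whole APIs, rather than individual records, keeps the final evaluation free of molecules the model has already seen.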
The environmental impact of solvents was quantitatively evaluated using SimaPro 9.5 and the ReCiPe 2016 v1.1 method, based on the Ecoinvent 3 database, in accordance with ISO 14040-14043 standards.56 Both midpoint and endpoint indicators were considered to provide a comprehensive evaluation of environmental impact (Fig. S1). The midpoint approach enabled a detailed examination of each solvent's impact across different environmental categories, while the endpoint approach focused on the overall long-term environmental consequences. In addition to the sustainability indicators provided by the methodology, a weighted summation of impact factors (eqn (1)) was also considered, where a higher Sustainability Throughput Index (STI) value indicated a greater negative environmental impact.
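As an illustration of the weighted summation, the sketch below computes an STI-like score from a dictionary of indicator values. The exact form of eqn (1) is not reproduced in this text, so the function name, the indicator names, the assumption that the indicators are already normalised to a common scale, and the numerical values are all hypothetical.

```python
def sustainability_throughput_index(indicators, weights=None):
    """Weighted sum of impact indicators; a higher value means a worse footprint."""
    if weights is None:
        # Default to equal weighting of every indicator.
        weights = {name: 1.0 for name in indicators}
    return sum(weights[name] * value for name, value in indicators.items())

# Hypothetical, already-normalised indicator values (not data from the paper).
solvent_a = {"global_warming": 0.2, "freshwater_ecotoxicity": 0.1}
solvent_b = {"global_warming": 0.6, "freshwater_ecotoxicity": 0.4}

sti_a = sustainability_throughput_index(solvent_a)
sti_b = sustainability_throughput_index(solvent_b)
```

Passing a custom `weights` dictionary lets a user emphasise a particular impact category without changing the ranking logic.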
To further strengthen the sustainability assessment, the platform also incorporated the solvent evaluation framework proposed by the regularly updated GSK Solvent Sustainability Guide.57,58 This method categorizes solvents into ten distinct subcategories, which are subsequently aggregated into four major sustainability category scores and ultimately synthesized into a composite sustainability score (G), as described in eqn (2)–(5). All scores range from 1 to 10, where a low score indicates poor sustainability, while a high score reflects favorable environmental performance. Overall, the platform offers 23 different sustainability indicators for users to choose from.
![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
![]() | (5) |
![]() | (6) |
![]() | (7) |
The Jouyban–Acree (JA) model is widely utilized to correlate the solubility of solutes with both temperature and the initial composition of binary solvent mixtures. This model effectively captures the dependence of solution behavior on solvent composition and temperature in multi-solvent systems. One of the key advantages of the JA model is its simplified three-parameter structure, which significantly enhances computational efficiency and makes it well-suited for integration with ML frameworks.60–62 The general form of the JA model is presented in eqn (8).
![]() | (8) |
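For orientation, the sketch below evaluates the Jouyban–Acree model in the form commonly given in the literature, ln x_m,T = w1 ln x_1,T + w2 ln x_2,T + (w1 w2 / T) Σ J_i (w1 − w2)^i. Because eqn (8) is not reproduced in this text, the notation is an assumption, and the modified version used in MJANN may differ; the function name and the numerical values are illustrative only.

```python
import math

def jouyban_acree(w1, ln_x1, ln_x2, T, J):
    """Return ln(x_m), the log solubility in a binary solvent mixture.

    w1: solute-free fraction of solvent 1 (solvent 2 is 1 - w1)
    ln_x1, ln_x2: log solubility in the pure solvents at temperature T
    T: absolute temperature in K
    J: sequence of model constants J_0, J_1, ... (three in the usual form)
    """
    w2 = 1.0 - w1
    # Interaction term: sum over J_i * (w1 - w2)^i.
    interaction = sum(Ji * (w1 - w2) ** i for i, Ji in enumerate(J))
    return w1 * ln_x1 + w2 * ln_x2 + w1 * w2 * interaction / T

# Illustrative values only (not data from the paper).
ln_xm = jouyban_acree(w1=0.5, ln_x1=math.log(0.02), ln_x2=math.log(0.001),
                      T=300.0, J=(300.0, 0.0, 0.0))
```

At w1 = 0 or w1 = 1 the interaction term vanishes, so the curve is pinned to the pure-solvent solubilities, which is why the endpoint values matter so much for binary predictions.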
A Point-Adjusted Prediction Network (PAPN) and the Modified Jouyban–Acree-based Neural Network (MJANN) were also developed to predict the solubility of APIs in solvents at a single temperature point, as well as their solubility in binary mixed solvents. These models follow the same framework as previous studies.60,63 The inputs and outputs of the models are presented in Tables 1 and S2, and the procedure for residual distribution fitting and probability estimation is detailed in the SI.
| Model name | Polynomial regression model-based multi-task learning network (PRMMT) | Point-adjusted prediction network (PAPN) | Modified Jouyban–Acree-based neural network (MJANN) |
|---|---|---|---|
| Input | Representative API molecular descriptors | Representative API and solvent molecular descriptors | Representative API and solvent molecular descriptors, interaction between solvents, pure solvent solubility values |
| Output | PR model parameters | Solubility of API at temperature T | JA model parameters |
The predictive performance of the model was evaluated using multiple statistical metrics, including Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Root Mean Squared Log Error (RMSLE), and the coefficient of determination (R²). Given the variability in scale across thermodynamic and empirical parameters and solubility values, there is a risk that solvents with lower solubility might be underestimated by the model, potentially leading to biased exclusion in decision-making. To address this, RMSLE was adopted as a key performance metric, as it penalizes underprediction more strongly than conventional metrics. Further details on the experimental procedures, computational methodologies, and evaluation metrics can be found in the SI.
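The asymmetry of RMSLE can be seen in a small sketch. It assumes the common log(1 + x) convention for RMSLE; the paper's exact definition is given in its SI and may differ, and the numbers below are illustrative only.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmsle(y_true, y_pred):
    """Root mean squared log error, using log(1 + x) (assumed convention)."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

# Same absolute error of 5 units, once below and once above the true value.
under = rmsle([10.0], [5.0])
over = rmsle([10.0], [15.0])
```

MAE and RMSE score the two predictions identically, whereas RMSLE assigns the larger error to the underprediction.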
The density and size of the circles in Fig. 2(a) represent the frequency of solubility data points across different solvent-temperature combinations. A noticeable clustering of data is observed in the solubility range of 10⁻⁶ to 10¹ mole percent and within the temperature range of 0–50 °C, indicating that most data fall within these conditions. Although some data points exist at temperatures above 100 °C, predominantly in the 1–10 mole percent solubility range, statistical analysis reveals that these cases account for less than 2% of the total dataset.
Compared to aqueous solubility data, solubility data measured in organic solvents are relatively limited (Fig. 2b). The 30 solvent-specific tasks defined in the PRMMT model align with the most frequently occurring solvents in Fig. 2b (see Table S1 for details). Analyzing the logarithmic solubility values (log S) of the collected data indicates that water and ethanol are the most extensively represented solvents, collectively accounting for over 30% of the total dataset. In contrast, solvents such as propyl acetate and 1,2-xylene appear far less frequently, contributing less than 5% of the dataset.
The solubility data in water exhibit a relatively narrow distribution, primarily falling within the log S range of −4 to 4 (in mole percent). Conversely, solvents like ethanol and propan-2-one show a broader solubility distribution, suggesting that solubility variations are more pronounced across different solutes. Importantly, even for solvents with lower data availability, the dataset does not exhibit an overly concentrated distribution, maintaining a relatively diverse range of solubility values. This diverse solubility distribution underscores the effectiveness and representativeness of the dataset.
Ideally, for model training, a uniform distribution of log S values across the entire dataset would be preferred. However, due to practical limitations in data availability, no log S-based pruning was applied to the organic solvent dataset. This ensures that the dataset retains its inherent diversity, which is crucial for robust model performance and generalizability.
For single-solvent solubility prediction, since each solvent prediction task was assigned to independent parallel tasks, explicit solvent descriptors were not required. Instead, the selected descriptors needed to comprehensively capture API molecular characteristics. To determine the most relevant descriptors, random forest modeling combined with Monte Carlo sensitivity analysis and an independent random forest approach were applied to assess descriptor importance. Tables S3 and S4 list the top 25 molecular descriptors, along with their definitions. Their importance rankings, after Unit Vector Normalization, are visualized in Fig. 3. While some variations in ranking exist, most high-ranking descriptors exhibit consistent trends.
Fig. 3 Statistical analysis of descriptor importance values determined by the combined random forest and Monte Carlo model vs. the random forest model.
Key descriptors include GCUT_SLOGP, which incorporates both structural features (via graph cut) and hydrophobicity (via log P), descriptors related to the heat of formation of the compound, distance and adjacency matrices of heavy atoms, descriptors describing mass distribution relative to the molecular center of mass, and those characterizing molecular flexibility. To minimize redundancy, a representative heat of formation descriptor was selected, and a Pearson correlation analysis (Fig. S2) was performed to ensure descriptor independence.
For binary solvent systems, additional considerations were made for solvent importance and solvent–solvent interactions. Key solvent descriptors included molecular weight, AM1_dipole, and ASA (accessible surface area). The AM1_dipole represents the dipole moment calculated using the AM1 Hamiltonian, while ASA quantifies the solvent-accessible surface area.
Fig. 4 The environmental impact of solvents analyzed using: (a) the ReCiPe midpoint method, (b) the ReCiPe endpoint method, and (c) the GSK solvent sustainability guide.
Under midpoint indicators, solvents such as pyridine, propan-1-ol, and oxolane exhibit significant environmental burdens across multiple impact categories, including global warming potential, marine and freshwater ecotoxicity, and ozone layer depletion. Furthermore, propan-1-ol and oxolane demonstrate notable effects in human health-related categories, particularly in carcinogenic and non-carcinogenic toxicity. These solvents not only pose potential risks to workers and end-users throughout their life cycle but also contribute to long-term environmental degradation due to waste emissions that impact ecosystems. In contrast, water and solvents such as toluene and 1,2-xylene, which exhibit relatively lower impact values across most categories, may be more environmentally sustainable options for API purification and production (Fig. 4a).
The endpoint indicators integrate the midpoint assessment results, providing a more comprehensive evaluation of the overall environmental impact (Fig. 4b). The endpoint analysis reveals that propan-1-ol, acetonitrile, pyridine, and N-methyl-2-pyrrolidone exhibit the most pronounced environmental impacts across multiple categories, particularly concerning human health and ecosystem damage. In contrast, solvents such as water, heptane, hexane, and ethanol demonstrate relatively lower overall environmental impact values, especially in resource depletion categories, indicating potential advantages in environmental sustainability.
A comparison between the ReCiPe method and the GSK method (Table S1 and Fig. 4c) reveals both similarities and discrepancies in solvent rankings. These differences primarily arise from variations in evaluation frameworks, numerical processing methodologies, and data sources. Although an attempt was made to establish a correspondence between the GSK classification and the Midpoint indicators in Fig. S1, complete alignment remains challenging due to fundamental differences in category definitions. Moreover, numerical processing methodologies differ between the two approaches. In Fig. 4a and b, the ReCiPe method assigns equal weighting to all Midpoint and Endpoint indicators, followed by a direct summation of impact scores, whereas the GSK method applies a square-root transformation (eqn (6)) to normalize variations across subcategories. Differences in data sources also contribute to the observed ranking discrepancies. The GSK Solvent Sustainability Guide is based on industry-specific data accumulated within GSK, using a simplified scoring system tailored to manufacturing operations, whereas the ReCiPe method provides a broader environmental perspective but remains susceptible to regional policy influences and assumptions embedded in its methodological framework.
It is important to recognize that no single green assessment method can fully address the inherent challenges of quantifying qualitative sustainability attributes, and the prioritization of solvent selection criteria may vary depending on the specific application context. This study aims to establish a multifaceted evaluation platform as a complementary approach to well-established sustainability guidelines that are widely recognized and trusted by users.
| Average evaluation metrics | PRMMT model | PAPN model | MJANN model |
|---|---|---|---|
| MAE | 0.584 | 0.472 | 0.994 |
| RMSE | 0.963 | 0.821 | 1.391 |
| RMSLE | 0.268 | 0.380 | 0.351 |
To further evaluate prediction reliability, the differences between predicted and real values were analyzed to determine the error distribution shown in Fig. 5. The probability value (p-value) was used as an indicator of the confidence in the accuracy of the model's description. The error distribution fit for all tasks within the PRMMT model resulted in p-values predominantly concentrated between 0.8 and 1, with an average exceeding 0.6, indicating a high degree of accuracy in describing prediction errors. The t-distribution was observed most frequently, suggesting that the statistical treatment of errors places greater emphasis on the tail regions, allowing for a more flexible and conservative estimation by accommodating variations in the degrees of freedom across different tasks. By mapping the error distribution to specific tasks, it is possible to determine the probability distribution of the predicted values within ±x intervals. Theoretically, restricting the range of output parameters could reduce the prediction uncertainty. However, the objective of this study is to establish a predictive framework that provides a broader range of possibilities rather than aiming for extreme precision. This aligns with the principle in pharmaceutical solvent selection, where R&D departments aim to avoid overlooking potential solvents or solvent combinations. Consequently, the PRMMT model outputs three parameters, accompanied by error distributions incorporating t-distribution, Cauchy distribution, Beta distribution, log–normal distribution, and logistic distribution. Similar error distributions are established for the PAPN and MJANN models. However, to optimize computational efficiency, only the uncertainty range from the PRMMT model is considered in subsequent applications.
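The idea of turning validation residuals into an interval probability can be sketched as follows. The study fits parametric distributions (t, Cauchy, Beta, log–normal, logistic) to the residuals; the sketch below instead uses the empirical fraction of residuals as a simple stand-in, so the function name and the residual values are hypothetical.

```python
def interval_probability(residuals, x):
    """Empirical estimate of P(|prediction error| <= x) from validation residuals."""
    inside = sum(1 for r in residuals if abs(r) <= x)
    return inside / len(residuals)

# Hypothetical residuals (predicted minus measured log S) for one solvent task.
residuals = [-0.5, -0.1, 0.0, 0.2, 0.9]
p_within = interval_probability(residuals, x=0.25)
```

A fitted heavy-tailed distribution such as the t-distribution plays the same role but extrapolates more cautiously beyond the residuals actually observed.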
To enhance robustness, a predictive framework was developed by integrating the three models. Given the PAPN model's superior accuracy in single-point temperature predictions, the framework prioritizes its predictions. The PRMMT model serves as the foundation, providing initial predictions along with corresponding uncertainty estimates. The PAPN model is then used to refine the predictions at specific temperature points, which act as correction anchors. A tolerance value (Tv) is introduced to ensure that the solubility curve predicted by the PRMMT model falls within the confidence interval of the PAPN model's single-temperature predictions. Tv represents a user-defined error margin, which can be adjusted based on the prediction confidence of the PAPN models. For example, if PAPN predictions are considered highly reliable, a lower Tv value can be set to enforce stricter constraints. Alternatively, an approximate predictive error of 10% (Tv = 0.1) can be used as a default tolerance for correction in the platform. The influence of different Tv values on predictive performance is further explored in case studies. The computational precision (step size) is defined by the number of Monte Carlo simulation samples, with 10⁶ samples chosen to balance accuracy and computational efficiency.
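One way to picture the tolerance check is the Monte Carlo sketch below: candidate parameter sets are drawn around the base prediction, and only curves that pass within a relative margin Tv of the anchor point are counted. The relative form of the tolerance, the quadratic curve in scaled temperature, the Gaussian residuals, and all names and numbers are assumptions made for illustration, not the platform's actual implementation.

```python
import random

def within_tolerance(predicted, anchor, tv=0.1):
    """True if the curve value deviates from the anchor by at most tv (relative)."""
    return abs(predicted - anchor) <= tv * abs(anchor)

def acceptance_fraction(curve, base_params, sample_residual, anchor_T, anchor_value,
                        tv=0.1, n_samples=10_000, seed=0):
    """Fraction of sampled curves that pass through the anchor within tolerance."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_samples):
        # Perturb each model parameter with a draw from its error distribution.
        params = [p + sample_residual(rng) for p in base_params]
        if within_tolerance(curve(params, anchor_T), anchor_value, tv):
            accepted += 1
    return accepted / n_samples

def curve(params, T):
    # Hypothetical three-parameter curve in scaled temperature (T / 300 K).
    a, b, c = params
    s = T / 300.0
    return a + b * s + c * s * s

def gaussian_residual(rng):
    # Hypothetical Gaussian parameter error; the study uses fitted distributions.
    return rng.gauss(0.0, 0.05)

loose = acceptance_fraction(curve, (1.0, 0.5, 0.2), gaussian_residual,
                            anchor_T=300.0, anchor_value=1.7, tv=0.1)
strict = acceptance_fraction(curve, (1.0, 0.5, 0.2), gaussian_residual,
                             anchor_T=300.0, anchor_value=1.7, tv=0.01)
```

Lowering `tv` rejects more sampled curves, which mirrors the stricter constraint applied when the anchor predictions are trusted more.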
For binary solvent mixtures, the PAPN-corrected single-point predictions serve as curve endpoints in the MJANN model. In real-world applications, solubility values may vary depending on measurement methodologies. This study accounts for this variability by offering users the flexibility to manually define correction points, Tv, and error distributions. In this scenario, correction points can be derived from experimental data, and the predictive error distribution is replaced by actual experimental error.
Based on the selected 30 pure solvents, a theoretical total of 435 binary solvent combinations is possible. However, due to partial or complete immiscibility of certain solvent pairs at specific temperatures, some binary mixtures were excluded from this study, and the final selection of binary solvent systems is provided in Fig. S4. The computational step size for binary solvent mixtures is another critical parameter. Given that APIs may exhibit limited solubility in mixed solvents, solvent fraction increments of 0.05 were chosen to accurately capture potential extreme solubility points. More precise calculations, such as a step size of 0.01, are feasible but would require significantly greater computational resources. In most cases, the endpoint solubility values for binary mixtures are obtained from model predictions. However, since some users may prefer to input their own solubility data, the framework also allows for the manual definition of binary solvent system endpoints, providing greater flexibility in practical applications.
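The size of the binary search space follows directly from the combinatorics: 30 solvents give 30 × 29 / 2 = 435 unordered pairs. The sketch below enumerates the pairs and builds a composition grid with the 0.05 fraction increment mentioned in the text; the function names and placeholder solvent labels are hypothetical, and the miscibility filter applied in the study is not reproduced.

```python
from itertools import combinations

def binary_systems(solvents):
    """All unordered solvent pairs, before any miscibility filtering."""
    return list(combinations(solvents, 2))

def composition_grid(step=0.05):
    """Fractions of solvent 1 strictly between 0 and 1, spaced by `step`."""
    n = round(1 / step)
    # Rounding removes floating-point noise such as 0.15000000000000002.
    return [round(i * step, 10) for i in range(1, n)]

solvents = [f"solvent_{i}" for i in range(30)]  # placeholder labels
pairs = binary_systems(solvents)
grid = composition_grid(0.05)
n_evaluations = len(pairs) * len(grid)
```

Moving to a step of 0.01 raises the grid from 19 to 99 interior points per pair, which illustrates why finer resolution is markedly more expensive.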
| API name | Molecular formula | Temperature range | Sustainability considerations |
|---|---|---|---|
| Paracetamol (APAP) | C8H9NO2 | a1: 40 °C–15 °C | a1: midpoint, weighted sum STI |
| Meloxicam (MLX) | C14H13N3O4S2 | b1: 50 °C–10 °C | b1: midpoint, human carcinogenic toxicity |
| | | b2: 30 °C–5 °C | b2: midpoint, human carcinogenic toxicity |
| Piroxicam (PXC) | C15H13N3O4S | c1: 30 °C–10 °C | c1: endpoint, resources |
| | | c2: 30 °C–10 °C | c2: GSK methodology |
| Cytarabine (AraC) | C9H13N3O5 | d1: 50 °C–5 °C | d1: endpoint, human health |
| | | d2: 40 °C–15 °C | d2: endpoint, human health |
| | | d3: 40 °C–15 °C | d3: GSK methodology |
Subsequent analyses were conducted to evaluate green solvent selection for API crystallization. In case a1, the ReCiPe midpoint indicators were aggregated by equal-weight summation to compute the STI. The objective was to identify solvent systems with the lowest possible environmental impact. To systematically assess the sustainability of both single and mixed solvents, the sustainability rankings were categorized into ten distinct grades, with higher grades indicating superior environmental performance.
The APAP screening results, presented in Fig. 7, illustrate the classifications through an interactive computational interface. The left panel presents a 2D visualization, where probability and STI values are plotted against the count across different grades for each combination. Concurrently, the middle and right panels exhibit potential single-solvent and binary-solvent selections. In Grades 1 to 3, oxolane, acetonitrile, propan-1-ol, 1,4-dioxane, and N,N-dimethylformamide were identified as predominant solvents. These solvents are well-documented for their superior solubility performance and extensive industrial applicability; however, they are frequently associated with suboptimal green chemistry attributes. As sustainability rankings increased, solvents such as pentan-1-ol, butyl acetate, acetic acid, and methylsulfinylmethane were more frequently observed. At the highest sustainability levels (Grades 8 to 10), solvents including dichloromethane, methanol, benzene, water, heptane, toluene, hexane, and ethanol became dominant. Notably, water and ethanol emerged as particularly competitive due to their low environmental impact and high biodegradability. A holistic approach to solvent selection necessitates a multifaceted evaluation beyond solubility and environmental attributes alone. Rather than evaluating solubility or sustainability in isolation, an optimal solvent or solvent mixture should be selected based on minimizing environmental burden while maintaining adequate solubility within a target temperature range. For instance, while N,N-dimethylformamide exhibited the highest solubility potential, its substantial environmental impact relegated it to lower sustainability grades.
For binary solvent systems, in Grades 1 to 3, solvents such as oxolane, propan-1-ol, and pentan-1-ol remained dominant components. However, at intermediate sustainability levels, the nonlinear thermodynamic behavior of binary solvent mixtures resulted in the emergence of pentan-1-ol, hexane, benzene, and acetonitrile across multiple grades, each exhibiting relatively high probability values. At higher sustainability grades, binary solvent mixtures predominantly incorporated solvents previously identified in top-ranked single-solvent selections, such as water, ethanol, and dichloromethane. The probability distributions across different sustainability grades also exhibited some fluctuations, as these values were influenced by the accuracy of the ML model predictions. In this case study, lower sustainability-grade mixtures generally displayed higher probability values, indicating potential uncertainties in model predictions at different sustainability levels.
Compared to existing literature, the solvent systems identified by our framework follow consistent trends. For instance, solvents such as ethanol, methanol, and propan-2-one, defined in Grades 7 to 10, have been widely reported as effective crystallization media, particularly for obtaining the stable and metastable polymorphs.64–66 Ethanol, in particular, is widely used in APAP crystallization systems for its strong solvating power and industrial applicability, and it also ranks as the top-performing single solvent in our framework.64,67 Green solvents such as water and isopropanol, which were highlighted at intermediate to high sustainability grades, are also commonly employed in the literature for polymorphic control and crystallization kinetics optimization.68 Furthermore, binary solvent systems, including water-alcohol combinations identified in our results (Table S8), have shown favorable performance in modulating solubility and directing polymorphic outcomes, in agreement with previous experimental studies.69,70
| Case | Grade (single solvent) | Single solvent(s) | Rank (binary) | Top-ranked binary solvent combination with predicted probability (in ‱) |
|---|---|---|---|---|
| APAP a1 | Grade 1 | Oxolane, acetonitrile, propan-1-ol | 1 | Water (Solvent 1: 0.7) pentan-1-ol (Solvent 2: 0.3), probability: 0.00565 |
| | Grade 2 | 1-Methylpyrrolidin-2-one, chloroform | 2 | Water (Solvent 1: 0.75) pentan-1-ol (Solvent 2: 0.25), probability: 0.00471 |
| | Grade 3 | 1,4-Dioxane, N,N-dimethylformamide | 3 | Pentan-1-ol (Solvent 1: 0.2) toluene (Solvent 2: 0.8), probability: 0.00377 |
| | Grade 4 | Cyclohexanone, butan-1-ol | 4 | Water (Solvent 1: 0.8) pentan-1-ol (Solvent 2: 0.2), probability: 0.00377 |
| | Grade 5 | Pentan-1-ol, butyl acetate, butan-2-one, acetic acid, ethyl acetate | 5 | Pentan-1-ol (Solvent 1: 0.15) toluene (Solvent 2: 0.85), probability: 0.00283 |
| | Grade 6 | 2-Methylpropan-1-ol, propan-2-ol | 6 | Water (Solvent 1: 0.85) pentan-1-ol (Solvent 2: 0.15), probability: 0.00283 |
| | Grade 7 | Methylsulfinylmethane | 7 | Pentan-1-ol (Solvent 1: 0.1) toluene (Solvent 2: 0.9), probability: 0.00189 |
| | Grade 8 | Dichloromethane, propan-2-one | 8 | Water (Solvent 1: 0.9) pentan-1-ol (Solvent 2: 0.1), probability: 0.00189 |
| | Grade 9 | Methanol, benzene | 9 | Pentan-1-ol (Solvent 1: 0.1) ethanol (Solvent 2: 0.9), probability: 0.00188 |
| | Grade 10 | Water, heptane, toluene, hexane, ethanol | 10 | Oxolane (Solvent 1: 0.1) water (Solvent 2: 0.9), probability: 0.00110 |
| MLX b1 | Grade 1 | Pyridine, oxolane, propan-1-ol | 1 | Water (Solvent 1: 0.55) dichloromethane (Solvent 2: 0.45), probability: 0.00064 |
| | Grade 2 | 1-Methylpyrrolidin-2-one, chloroform, propyl acetate, acetonitrile | 2 | Water (Solvent 1: 0.6) dichloromethane (Solvent 2: 0.4), probability: 0.00057 |
| | Grade 3 | 1,4-Dioxane, N,N-dimethylformamide | 3 | Dichloromethane (Solvent 1: 0.35) toluene (Solvent 2: 0.65), probability: 0.00050 |
| | Grade 4 | Butan-1-ol | 4 | Water (Solvent 1: 0.65) dichloromethane (Solvent 2: 0.35), probability: 0.00050 |
| | Grade 5 | Pentan-1-ol, butyl acetate, butan-2-one, acetic acid, ethyl acetate | 5 | Dichloromethane (Solvent 1: 0.3) ethanol (Solvent 2: 0.7), probability: 0.00048 |
| | Grade 6 | Octan-1-ol | 6 | Dichloromethane (Solvent 1: 0.3) 1,2-xylene (Solvent 2: 0.7), probability: 0.00044 |
| | Grade 7 | Dichloromethane, 2-methylpropan-1-ol, cyclohexanone | 7 | Dichloromethane (Solvent 1: 0.3) toluene (Solvent 2: 0.7), probability: 0.00043 |
| | Grade 8 | Propan-2-ol, methylsulfinylmethane | 8 | Water (Solvent 1: 0.7) dichloromethane (Solvent 2: 0.3), probability: 0.00043 |
| | Grade 9 | Propan-2-one, methanol | 9 | Dichloromethane (Solvent 1: 0.25) ethanol (Solvent 2: 0.75), probability: 0.00041 |
| | Grade 10 | Water, heptane, 1,2-xylene, benzene, toluene, hexane, ethanol | 10 | Dichloromethane (Solvent 1: 0.25) 1,2-xylene (Solvent 2: 0.75), probability: 0.00037 |
| MLX b2 | Grade 1 | Pyridine, oxolane, propan-1-ol | 1 | Water (Solvent 1: 0.55) dichloromethane (Solvent 2: 0.45), probability: 0.00065 |
| | Grade 2 | 1-Methylpyrrolidin-2-one, chloroform, acetonitrile | 2 | Dichloromethane (Solvent 1: 0.4) toluene (Solvent 2: 0.6), probability: 0.00058 |
| | Grade 3 | 1,4-Dioxane, N,N-dimethylformamide | 3 | Water (Solvent 1: 0.6) dichloromethane (Solvent 2: 0.4), probability: 0.00058 |
| | Grade 4 | Butan-1-ol | 4 | Dichloromethane (Solvent 1: 0.35) 1,2-xylene (Solvent 2: 0.65), probability: 0.00052 |
| | Grade 5 | Pentan-1-ol, butyl acetate, butan-2-one, acetic acid, ethyl acetate | 5 | Dichloromethane (Solvent 1: 0.35) toluene (Solvent 2: 0.65), probability: 0.00051 |
| | Grade 6 | Octan-1-ol | 6 | Water (Solvent 1: 0.65) dichloromethane (Solvent 2: 0.35), probability: 0.00051 |
| | Grade 7 | Dichloromethane, 2-methylpropan-1-ol, cyclohexanone | 7 | Dichloromethane (Solvent 1: 0.3) ethanol (Solvent 2: 0.7), probability: 0.00048 |
| | Grade 8 | Propan-2-ol, methylsulfinylmethane | 8 | Dichloromethane (Solvent 1: 0.3) 1,2-xylene (Solvent 2: 0.7), probability: 0.00045 |
| | Grade 9 | Propan-2-one, methanol | 9 | Dichloromethane (Solvent 1: 0.3) toluene (Solvent 2: 0.7), probability: 0.00043 |
| | Grade 10 | Water, 1,2-xylene, benzene, toluene, hexane, ethanol | 10 | Water (Solvent 1: 0.7) dichloromethane (Solvent 2: 0.3), probability: 0.00043 |
| PLX c1 | Grade 1 | Pyridine, acetonitrile, propan-1-ol | 1 | Dichloromethane (Solvent 1: 0.95) methylsulfinylmethane (Solvent 2: 0.05), probability: 0.00663 |
| | Grade 2 | 2-Methylpropan-1-ol, oxolane, propan-2-ol, butan-1-ol | 2 | Dichloromethane (Solvent 1: 0.95) ethanol (Solvent 2: 0.05), probability: 0.00661 |
| | Grade 3 | 1,4-Dioxane, pentan-1-ol, methanol | 3 | Chloroform (Solvent 1: 0.05) dichloromethane (Solvent 2: 0.95), probability: 0.00660 |
| | Grade 4 | N,N-Dimethylformamide, cyclohexanone | 4 | Water (Solvent 1: 0.05) dichloromethane (Solvent 2: 0.95), probability: 0.00660 |
| | Grade 5 | Benzene, 1-methylpyrrolidin-2-one, butan-2-one | 5 | Octan-1-ol (Solvent 1: 0.05) dichloromethane (Solvent 2: 0.95), probability: 0.00660 |
| | Grade 6 | Acetic acid, ethyl acetate | 6 | Dichloromethane (Solvent 1: 0.9) ethanol (Solvent 2: 0.1), probability: 0.00626 |
| | Grade 7 | Butyl acetate, toluene, 1,2-xylene | 7 | Chloroform (Solvent 1: 0.1) dichloromethane (Solvent 2: 0.9), probability: 0.00626 |
| | Grade 8 | Hexane, heptane, propan-2-one | 8 | Water (Solvent 1: 0.1) dichloromethane (Solvent 2: 0.9), probability: 0.00625 |
| | Grade 9 | Methylsulfinylmethane | 9 | Octan-1-ol (Solvent 1: 0.1) dichloromethane (Solvent 2: 0.9), probability: 0.00625 |
| | Grade 10 | Water, chloroform, dichloromethane, octan-1-ol, ethanol | 10 | Dichloromethane (Solvent 1: 0.85) ethanol (Solvent 2: 0.15), probability: 0.00592 |
| PLX c2 | Grade 1 | Chloroform, 1,4-dioxane, benzene, hexane, oxolane | 1 | Pentan-1-ol (Solvent 1: 0.6) dichloromethane (Solvent 2: 0.4), probability: 0.00278 |
| | Grade 2 | Dichloromethane, pyridine, N,N-dimethylformamide | 2 | Pentan-1-ol (Solvent 1: 0.65) dichloromethane (Solvent 2: 0.35), probability: 0.00243 |
| | Grade 3 | 1-Methylpyrrolidin-2-one | 3 | Pentan-1-ol (Solvent 1: 0.7) dichloromethane (Solvent 2: 0.3), probability: 0.00209 |
| | Grade 4 | Acetic acid, heptane | 4 | Pentan-1-ol (Solvent 1: 0.75) dichloromethane (Solvent 2: 0.25), probability: 0.00174 |
| | Grade 5 | Propan-2-one, acetonitrile, methanol | 5 | Water (Solvent 1: 0.8) dichloromethane (Solvent 2: 0.2), probability: 0.00139 |
| | Grade 6 | 1,2-Xylene, butyl acetate, butan-2-one, toluene, cyclohexanone | 6 | Pentan-1-ol (Solvent 1: 0.8) dichloromethane (Solvent 2: 0.2), probability: 0.00139 |
| | Grade 7 | N/A | 7 | Octan-1-ol (Solvent 1: 0.8) dichloromethane (Solvent 2: 0.2), probability: 0.00139 |
| | Grade 8 | Propan-2-ol, methylsulfinylmethane | 8 | Water (Solvent 1: 0.85) dichloromethane (Solvent 2: 0.15), probability: 0.00104 |
| | Grade 9 | Ethyl acetate, butan-1-ol, ethanol, propan-1-ol | 9 | Pentan-1-ol (Solvent 1: 0.85) dichloromethane (Solvent 2: 0.15), probability: 0.00104 |
| | Grade 10 | 2-Methylpropan-1-ol, water, pentan-1-ol, octan-1-ol | 10 | Octan-1-ol (Solvent 1: 0.85) dichloromethane (Solvent 2: 0.15), probability: 0.00104 |
| AraC d1 | Grade 1 | Acetonitrile, propan-1-ol | 1 | Water (Solvent 1: 0.60) 1,2-xylene (Solvent 2: 0.40), probability: 0.00637 |
| | Grade 2 | 2-Methylpropan-1-ol, oxolane, propan-2-ol, butan-1-ol | 2 | 1,2-Xylene (Solvent 1: 0.350) ethanol (Solvent 2: 0.650), probability: 0.00581 |
| | Grade 3 | 1,4-Dioxane, pentan-1-ol, methanol | 3 | Water (Solvent 1: 0.650) 1,2-xylene (Solvent 2: 0.350), probability: 0.00558 |
| | Grade 4 | N,N-Dimethylformamide | 4 | 1,2-Xylene (Solvent 1: 0.300) ethanol (Solvent 2: 0.700), probability: 0.00504 |
| | Grade 5 | Benzene, 1-methylpyrrolidin-2-one, butan-2-one, cyclohexanone | 5 | Water (Solvent 1: 0.700) 1,2-xylene (Solvent 2: 0.300), probability: 0.00479 |
| | Grade 6 | Acetic acid, ethyl acetate | 6 | 1,2-Xylene (Solvent 1: 0.250) ethanol (Solvent 2: 0.750), probability: 0.00427 |
| | Grade 7 | Butyl acetate, propan-2-one, toluene, 1,2-xylene | 7 | Water (Solvent 1: 0.750) 1,2-xylene (Solvent 2: 0.250), probability: 0.00401 |
| | Grade 8 | Hexane, methylsulfinylmethane | 8 | 1,2-Xylene (Solvent 1: 0.200) ethanol (Solvent 2: 0.800), probability: 0.00350 |
| | Grade 9 | Dichloromethane | 9 | Water (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00322 |
| | Grade 10 | Water, chloroform, octan-1-ol, ethanol | 10 | Octan-1-ol (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00321 |
| AraC d2 | Grade 1 | Acetonitrile, propan-1-ol | 1 | Water (Solvent 1: 0.60) 1,2-xylene (Solvent 2: 0.40), probability: 0.00637 |
| | Grade 2 | 2-Methylpropan-1-ol, oxolane, propan-2-ol, butan-1-ol | 2 | 1,2-Xylene (Solvent 1: 0.350) ethanol (Solvent 2: 0.650), probability: 0.00581 |
| | Grade 3 | 1,4-Dioxane, pentan-1-ol, methanol | 3 | Water (Solvent 1: 0.650) 1,2-xylene (Solvent 2: 0.350), probability: 0.00558 |
| | Grade 4 | N,N-Dimethylformamide | 4 | 1,2-Xylene (Solvent 1: 0.300) ethanol (Solvent 2: 0.700), probability: 0.00504 |
| | Grade 5 | Benzene, 1-methylpyrrolidin-2-one, butan-2-one, cyclohexanone | 5 | Water (Solvent 1: 0.700) 1,2-xylene (Solvent 2: 0.300), probability: 0.00479 |
| | Grade 6 | Acetic acid, ethyl acetate | 6 | 1,2-Xylene (Solvent 1: 0.250) ethanol (Solvent 2: 0.750), probability: 0.00427 |
| | Grade 7 | Butyl acetate, propan-2-one, toluene, 1,2-xylene | 7 | Water (Solvent 1: 0.750) 1,2-xylene (Solvent 2: 0.250), probability: 0.00401 |
| | Grade 8 | Hexane, methylsulfinylmethane | 8 | 1,2-Xylene (Solvent 1: 0.200) ethanol (Solvent 2: 0.800), probability: 0.00350 |
| | Grade 9 | Dichloromethane | 9 | Water (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00322 |
| | Grade 10 | Water, chloroform, octan-1-ol, ethanol | 10 | Octan-1-ol (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00321 |
| AraC d3 | Grade 1 | Benzene, 1,4-dioxane, chloroform | 1 | Pentan-1-ol (Solvent 1: 0.500) 1,2-xylene (Solvent 2: 0.500), probability: 0.00803 |
| | Grade 2 | Dichloromethane, hexane, N,N-dimethylformamide | 2 | Pentan-1-ol (Solvent 1: 0.550) 1,2-xylene (Solvent 2: 0.450), probability: 0.00725 |
| | Grade 3 | 1-Methylpyrrolidin-2-one | 3 | Pentan-1-ol (Solvent 1: 0.600) 1,2-xylene (Solvent 2: 0.400), probability: 0.00647 |
| | Grade 4 | Acetic acid | 4 | Pentan-1-ol (Solvent 1: 0.650) 1,2-xylene (Solvent 2: 0.350), probability: 0.00569 |
| | Grade 5 | Propan-2-one, acetonitrile, methanol | 5 | Pentan-1-ol (Solvent 1: 0.700) 1,2-xylene (Solvent 2: 0.300), probability: 0.00491 |
| | Grade 6 | 1,2-Xylene, butyl acetate, butan-2-one | 6 | Pentan-1-ol (Solvent 1: 0.750) 1,2-xylene (Solvent 2: 0.250), probability: 0.00414 |
| | Grade 7 | N/A | 7 | Octan-1-ol (Solvent 1: 0.750) 1,2-xylene (Solvent 2: 0.250), probability: 0.00399 |
| | Grade 8 | Propan-2-ol, methylsulfinylmethane | 8 | Pentan-1-ol (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00336 |
| | Grade 9 | 2-Methylpropan-1-ol, butan-1-ol, propan-1-ol | 9 | Water (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00322 |
| | Grade 10 | Water, pentan-1-ol, octan-1-ol | 10 | Octan-1-ol (Solvent 1: 0.800) 1,2-xylene (Solvent 2: 0.200), probability: 0.00321 |
To ensure the reliability of the predictive outcomes, a critical threshold was established in this study: a solvent was classified as “uncertain” if its grade ranking fell outside the top P% of solvents ranked by probability (P = 10% in this case). This approach accounts for classification uncertainty: although a solvent can be numerically assigned to a specific grade, that assignment may still vary because of the probabilistic nature of the ranking.
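The thresholding rule described above can be sketched in a few lines of Python. This is one possible reading of the rule, not the SolECOs implementation: the function name `flag_uncertain`, the input dictionary, and the choice to round the cut-off up to at least one candidate are all assumptions made for illustration.

```python
from math import ceil


def flag_uncertain(probabilities, top_percent=10.0):
    """Flag candidates whose probability rank falls outside the top P%.

    probabilities: dict mapping a candidate name to its predicted probability.
    Returns a dict mapping each name to True ("uncertain") or False.
    Ties are broken by insertion order because Python's sort is stable.
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    # Assumption: keep at least one candidate and round the cut-off upward.
    n_keep = max(1, ceil(len(ranked) * top_percent / 100.0))
    confident = set(ranked[:n_keep])
    return {name: name not in confident for name in ranked}


# Hypothetical candidates s0..s19 with strictly decreasing probabilities.
example = {f"s{i}": 0.01 - 0.0001 * i for i in range(20)}
flags = flag_uncertain(example, top_percent=10.0)
print([name for name, uncertain in flags.items() if not uncertain])
```

With 20 hypothetical candidates and P = 10%, only the two highest-probability entries remain unflagged; a larger P relaxes the criterion.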
The comparative analysis between MLX b1 (50–10 °C) and MLX b2 (30–5 °C), both evaluated using the same sustainability indicator, reveals a high degree of consistency in the classification of single solvents across Grades 1 to 10. Only minor differences were observed, such as the inclusion of propyl acetate in Grade 2 under MLX b1. For binary solvent selection, both cases identified highly similar solvent pairs within corresponding grade levels, with slight variations in the optimal composition ratios. Notably, the predicted probabilities for the top binary combinations were marginally higher in MLX b2 (for example, 0.00065 versus 0.00064 for the top-ranked pair), suggesting slightly improved predictive confidence under the narrower, lower-temperature cooling range. These results confirm that, within the proposed framework, when the sustainability evaluation method is held constant, the influence of temperature on solvent classification is limited. However, temperature can still affect the fine-tuning of binary solvent compositions and the associated selection probabilities.
The final solvent screening results for the MLX cases underscore the practical relevance and industrial compatibility of the proposed framework (Tables S9 and S10). Ethanol and methanol, ranked in Grades 10 and 9 respectively, have been experimentally validated for meloxicam dissolution and crystallization.71,72 Notably, ethanol-water mixtures (Grade 10) are highlighted in patent EP-1462451A1 as preferred media for Form I crystallization, offering controlled polarity and enhanced purity.73 Such alcohol-water co-solvent systems are widely used in industrial crystallization processes, where temperature control enables high yield, polymorphic stability, and improved solid properties.74 These examples confirm that high-grade solvents identified by the framework are not only environmentally favorable but also well-aligned with established industrial practices.
Significant differences also emerge in the classification of mid-tier solvents between PLX c1 (ReCiPe Endpoint indicator) and PLX c2 (GSK method). In PLX c1, Grade 5 includes benzene, 1-methylpyrrolidin-2-one, and butan-2-one, whereas in PLX c2, Grade 5 consists of propan-2-one, acetonitrile, and methanol. Notably, benzene is assigned a considerably higher grade under the Endpoint indicator (Grade 5) than under the GSK method (Grade 1), suggesting differences in risk perception between the two methodologies. Both methods, however, classify DMSO (methylsulfinylmethane) at a relatively high grade (Grade 9 in PLX c1 and Grade 8 in PLX c2), reflecting its recognition as an environmentally preferable solvent due to its low toxicity and high biodegradability.
Water, widely regarded as a green solvent, is consistently assigned Grade 10 in both methods, reinforcing its high priority for sustainability. However, notable discrepancies exist in the classification of chlorinated solvents. In PLX c1, chloroform and dichloromethane are also categorized as Grade 10, whereas in PLX c2, dichloromethane is assigned a significantly lower ranking at Grade 2, likely due to the stricter regulatory constraints imposed on chlorinated solvents within the GSK framework.
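The grade shifts discussed above can be tabulated directly from the single-solvent columns of the table for PLX c1 and PLX c2. The Python sketch below is illustrative only: the helper name `grade_shifts` is an assumption of this example, and only the five solvents mentioned in the text are included, with grades copied from the table.

```python
# Grades copied from the table (PLX c1: ReCiPe Endpoint; PLX c2: GSK method).
plx_c1 = {
    "water": 10,
    "chloroform": 10,
    "dichloromethane": 10,
    "methylsulfinylmethane": 9,
    "benzene": 5,
}
plx_c2 = {
    "water": 10,
    "chloroform": 1,
    "dichloromethane": 2,
    "methylsulfinylmethane": 8,
    "benzene": 1,
}


def grade_shifts(grades_a, grades_b):
    """Return (solvent, grade_a, grade_b, shift) tuples, largest drop first."""
    shared = grades_a.keys() & grades_b.keys()
    rows = [(s, grades_a[s], grades_b[s], grades_b[s] - grades_a[s]) for s in shared]
    return sorted(rows, key=lambda row: row[3])


for solvent, g1, g2, shift in grade_shifts(plx_c1, plx_c2):
    print(f"{solvent:<24} c1={g1:>2}  c2={g2:>2}  shift={shift:+d}")
```

Sorting by the shift places the chlorinated solvents first (chloroform drops by nine grades, dichloromethane by eight), while water shows no change between the two methods.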
The binary solvent selection results further emphasize the methodological divergence between the two approaches. In PLX c1, the top-ranked binary solvent combinations are predominantly characterized by a high proportion of dichloromethane mixed with small amounts of other solvents. In contrast, PLX c2 follows a different ranking trend, where pentan-1-ol and dichloromethane mixtures dominate, and solvent ratios vary more significantly.
From a probability distribution perspective, the binary solvent combination probabilities calculated in PLX c1 are notably higher than those in PLX c2 (PLX c1 maximum: 0.00663 vs. PLX c2 maximum: 0.00278). This suggests that the Endpoint indicator is more likely to identify high-probability solvent combinations, whereas the GSK method, due to its broader consideration of multiple influencing factors and smaller numerical differentials across criteria, results in lower overall probability variations among binary solvent combinations. Fig. 8 illustrates the distribution of binary solvent combinations within Grade 10 for PLX c1 (a) and PLX c2 (b). In PLX c1, a limited number of combinations, particularly those involving dichloromethane, show markedly higher probabilities. This indicates a strong preference for dichloromethane-based mixtures under the ReCiPe Endpoint indicator. In contrast, PLX c2 displays a more balanced probability distribution across several solvent systems. Although dichloromethane remains among the top candidates, the wider spread suggests that the GSK method allows greater flexibility and supports more diverse solvent selection strategies. When focusing on traditional green solvents such as water or ethanol as one component in binary mixtures, the Endpoint indicator (PLX c1) yields not only more concentrated high-probability combinations but also a greater number of qualifying binary systems within Grade 10. In contrast, the GSK-based method (PLX c2) identifies fewer combinations but distributes probability more evenly. See Tables S11 and S12 for detailed listings.
Fig. 9 compares the Grade 10 binary solvent systems with water as a fixed component under AraC d1 and AraC d3. While both methods identify common co-solvents such as 1,2-xylene and propan-2-one (acetone), AraC d1 includes less sustainable options like dichloromethane, whereas AraC d3 favors greener solvents such as 2-methylpropan-1-ol. This reflects the broader tolerance of Endpoint-based screening versus the stricter sustainability constraints of the GSK metric. Compositionally, AraC d1 allows wider water ratio ranges, indicating greater flexibility, while AraC d3 yields narrow, sharply defined optima, suggesting higher selectivity. Both methods consistently rank water + 1,2-xylene highest, though optimal ratios differ. Systems like water + ethanol show lower probabilities and narrower ranges, reflecting limited suitability. These observations are consistent with the trend discussed in Section 3.3.3, where the GSK indicator led to a more selective and compositionally constrained solvent space compared to the more inclusive Endpoint approach. A detailed statistical summary of Grade 10 binary solvent systems containing water or ethanol as one component is provided in the SI (Tables S13 and S14).
To further validate the predictive accuracy, experimental verification was conducted, with results shown in Fig. 10. The model demonstrated a high level of predictive performance. In the binary solvent combination design, the Grade 10 combination of dichloromethane (Solvent 1: 0.05) and 1,2-xylene (Solvent 2: 0.95) (D–X combination) exhibited the highest probability, but this does not imply that it is the most sustainable choice. In subfigures (e–h), the D–X combination exhibited higher environmental impacts in categories such as ozone formation, global warming, and fossil resource scarcity than water–ethanol combinations at any ratio. Thus, although Endpoint 1 Human Health was selected as the evaluation criterion, binary solvent combinations within the same grade do not necessarily exhibit identical environmental impacts. The solvent selection process should be tailored to the user's specific sustainability priorities, ensuring a balance between high predictive robustness (probability) and optimal sustainability impact within the selected evaluation framework.
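One simple way to reason about the balance between probability and environmental impact within a grade is to keep only the non-dominated candidates. The sketch below is not the method used in SolECOs; it is a generic Pareto filter, and the candidate names, probabilities, and impact scores are hypothetical placeholders rather than values from this study.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    probability: float  # predicted probability, higher is better
    impact: float       # hypothetical aggregate impact score, lower is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate beats on both criteria."""
    front = []
    for c in candidates:
        dominated = any(
            o.probability >= c.probability
            and o.impact <= c.impact
            and (o.probability > c.probability or o.impact < c.impact)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical placeholder values, for illustration only.
pool = [
    Candidate("mixture A", 0.006, 5.0),
    Candidate("mixture B", 0.004, 2.0),
    Candidate("mixture C", 0.003, 3.0),  # dominated by mixture B
    Candidate("mixture D", 0.002, 1.0),
]
print([c.name for c in pareto_front(pool)])
```

In this toy example, mixture C is discarded because mixture B has both a higher probability and a lower impact; the remaining candidates represent different trade-offs that a user could weigh against their own sustainability priorities.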
Despite the robustness demonstrated by the developed methodology, the most reliable solvent selection and optimization strategy still necessitates experimental calibration to refine predictive accuracy. Pre-calibrated experimental data help control error margins, mitigating the risk of cumulative inaccuracies arising from model approximations and prediction errors. Furthermore, while machine learning-based solvent screening has demonstrated strong predictive capabilities, its accuracy remains inherently constrained by the quality and diversity of training data. Expanding high-quality experimental datasets will be critical for further enhancing the predictive reliability of the model.
Additionally, the choice of sustainability assessment methodologies significantly influences the final solvent rankings and recommendations. Different evaluation frameworks and solvent-specific sustainability priorities, such as toxicity concerns, resource consumption, or process safety considerations, may yield varying rankings for the same solvents. Future research will focus on integrating commonly used sustainability assessment frameworks into the platform and exploring multi-objective optimization approaches. By incorporating a broader set of sustainability indicators, the solvent evaluation framework can be expanded to ensure that solvent selection accounts for both industrial applicability and environmental impact.
Building upon the platform developed in this study, future research will further focus on the digitalization of solid–liquid separation processes, integrating prediction, design, and optimization into the comprehensive, intelligent solvent selection and process optimization SolECOs platform. This development will not only enhance solvent selection efficiency but also improve the overall effectiveness of crystallization and separation processes. By incorporating real-time process monitoring and adaptive optimization, the platform will evolve into a data-driven intelligent tool, offering more precise, efficient, and sustainable solutions for pharmaceutical and chemical process design.
000 data points covering 1186 APIs and 30 solvent systems with thermodynamically informed machine learning models to support solvent-related decision-making in crystallization processes.
The modeling framework includes a polynomial regression-based multi-task learning network (PRMMT) for temperature-dependent solubility profiling, a point-adjusted prediction network (PAPN) for single-temperature correction, and a modified Jouyban–Acree neural network (MJANN) for binary solvent prediction. These models enable interpretable and uncertainty-aware predictions across a wide range of crystallization conditions. To further support environmentally informed decision-making, SolECOs incorporates comprehensive sustainability evaluations based on both the ReCiPe 2016 life cycle impact framework and the GSK Solvent Sustainability Guide, allowing users to balance solubility performance with environmental priorities.
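For orientation, the classical Jouyban–Acree equation on which MJANN builds expresses the log-solubility in a binary mixture as a fraction-weighted sum of the pure-solvent log-solubilities plus a temperature-scaled interaction term. The sketch below implements only this standard form with three interaction constants; the modifications and the neural network used in MJANN are not reproduced here, and the function name and numerical inputs are assumptions for illustration.

```python
import math


def jouyban_acree_ln_x(f1, ln_x1, ln_x2, temperature_k, j_coeffs):
    """Classical Jouyban-Acree estimate of ln(solubility) in a binary mixture.

    f1            : solute-free fraction of solvent 1 (solvent 2 is 1 - f1)
    ln_x1, ln_x2  : ln solubility in pure solvent 1 and pure solvent 2 at T
    temperature_k : absolute temperature in kelvin
    j_coeffs      : interaction constants (J0, J1, J2), fitted from data
    """
    f2 = 1.0 - f1
    interaction = sum(j * (f1 - f2) ** i for i, j in enumerate(j_coeffs))
    return f1 * ln_x1 + f2 * ln_x2 + f1 * f2 * interaction / temperature_k


# Hypothetical inputs: pure-solvent solubilities and J constants are placeholders.
ln_x_mix = jouyban_acree_ln_x(
    f1=0.7,
    ln_x1=math.log(0.002),
    ln_x2=math.log(0.05),
    temperature_k=298.15,
    j_coeffs=(500.0, -100.0, 50.0),
)
print(math.exp(ln_x_mix))
```

At f1 = 0 or f1 = 1 the interaction term vanishes and the expression reduces to the corresponding pure-solvent solubility, which is a quick sanity check for any fitted set of constants.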
The entire workflow is implemented in an interactive graphical interface, facilitating user-friendly data input, model execution, and visualization of solubility curves, confidence intervals, and sustainability indicators. Case studies involving representative APIs, including paracetamol, meloxicam, piroxicam, and cytarabine, validate the robustness and applicability of this module across varying crystallization scenarios. As a foundational part of the broader SolECOs platform, this module demonstrates how data-driven modeling and sustainability metrics can be integrated to guide solvent selection in early-stage pharmaceutical process development.
| Abbreviation | Definition |
|---|---|
| APAP | Paracetamol |
| AraC | Cytarabine |
| CAMD | Computer-aided molecular design |
| JA Model | Jouyban–Acree model |
| MJANN | Modified Jouyban–Acree-based neural network |
| MAE | Mean absolute error |
| ML | Machine learning |
| MLX | Meloxicam |
| MSE | Mean squared error |
| PR Model | Polynomial regression model |
| PRMMT | Polynomial regression model-based multi-task learning network |
| PXC | Piroxicam |
| R² | Coefficient of determination |
| RMSE | Root mean square error |
| RMSLE | Root mean squared logarithmic error |
| STI | Sustainability throughput index |
| p-Value | Probability value (confidence level of predictive error distribution fitting) |
| Tv | Tolerance value |
The Supplementary Information (SI) includes detailed descriptions of the model development methodology, case study configurations, simulation workflows, and additional supporting tables and figures. See DOI: https://doi.org/10.1039/d5gc04176g.
Additional materials and datasets can be made available by the corresponding author upon reasonable request.
| This journal is © The Royal Society of Chemistry 2025 |