A machine learning-assisted design for adjusting the solubility of ibuprofen-related binary compounds: a data driven approach†
Abstract
Purpose: poor solubility is a critical bottleneck in pharmaceutical development, since solubility influences both efficacy and bioavailability. To address this challenge, we use a machine learning (ML) approach to predict and optimize the solubility of compounds related to ibuprofen. Method: our dataset, comprising over 1126 data points acquired from the literature, was analyzed using molecular descriptors derived from molecular electrostatic potentials (MEPs), Lipinski's rule of five, and hydrogen bonding parameters. The three best-performing ML models, EdgeCov, linear, and random forest regression, achieved strong predictive power, with R² values ranging from 0.86 to 0.92 and root mean square errors (RMSEs) between 0.002 and 0.34. Results: solubility mapping, which included compounds exceeding 80 g L−1, revealed a significant correlation between hydroxyl groups and enhanced solubility. Our study illustrates the potential of ML-driven design to streamline pharmaceutical development by predicting aqueous solubility prior to manufacturing, thereby conserving valuable resources. By identifying the relevant molecular attributes, our approach enables the rational design of solubility-optimized pharmaceuticals, promoting bioavailability and therapeutic efficacy. Conclusion: this framework accelerates the discovery of effective, solubility-optimized medications, with broad implications for pharmaceutical research and development.
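For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how R² and RMSE are conventionally computed from observed and predicted solubilities. The function names and the sample values are assumptions made for illustration; they are not the study's code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility values, for illustration only (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"R2 = {r2(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

An R² close to 1 indicates that the model explains most of the variance in the observed values, while RMSE is expressed in the same units as the predicted quantity, so the two are best read together.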