Interpretable machine learning models for predicting the antitumor effects of metal and metal oxide nanomaterials†
Abstract
Understanding the toxic behavior of metal and metal oxide nanoparticles (M/MOx NPs) is essential for effective tumor diagnosis and treatment, yet generalizing findings remains challenging because of limited data, sampling variability, unreported complexities, low model accuracy, and a lack of interpretability. To address these issues and reduce the need for extensive experimentation, we combined quantum chemistry calculations with published toxicity data to develop a machine learning model that achieves over 90% accuracy in cross-validation. Built on 39 descriptors extracted from 152 articles, our dataset comprises 2765 instances covering various nanoparticle types, detection methods, and cell types. We enhanced data representation with the Jaccard similarity coefficient and used feature importance and Shapley additive explanations (SHAP) to identify key factors influencing cytotoxicity, including concentration, exposure time, zeta potential, diameter, COSMO area (CA), coating, testing method, cell type, metal electronegativity, HOMO energy, and molecular weight. We also analyzed the interactions among these features and their influence on predictions, synthesized novel metal oxide nanoparticles, and assessed their physicochemical properties and antitumor toxicity. Cytotoxicity experiments with the newly synthesized nanoparticles further validated the model's accuracy and generalizability, revealing hidden relationships and enabling predictions for previously unseen samples. This approach supports preliminary computer-aided screening and significantly reduces the need for labor-intensive experimentation.
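For readers unfamiliar with the Jaccard similarity coefficient mentioned in the abstract, the following is a minimal sketch of the measure itself, J(A, B) = |A ∩ B| / |A ∪ B|. The abstract does not say how the coefficient is applied to the dataset, so the idea of treating each record's categorical attributes as a set, the function name `jaccard_similarity`, and the two sample records are all illustrative assumptions rather than the authors' procedure.

```python
def jaccard_similarity(a, b):
    """Return the Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical categorical attributes for two nanoparticle records
# (illustrative values only, not taken from the paper's dataset).
sample_1 = {"ZnO", "MTT assay", "HeLa", "PEG coating"}
sample_2 = {"ZnO", "MTT assay", "A549", "uncoated"}

# Two shared labels out of six distinct labels in total.
similarity = jaccard_similarity(sample_1, sample_2)
print(f"Jaccard similarity: {similarity:.3f}")
```

A value of 1 means the two records share every label, and 0 means they share none, which is what makes the coefficient a convenient way to compare samples described by categorical descriptors such as coating, test method, or cell type.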