Improved environmental chemistry property prediction of molecules with graph machine learning†
Abstract
Rapid prediction of environmental chemistry properties is critical for the green and sustainable development of the chemical industry and drug discovery. Machine learning methods can be applied to learn the relations between chemical structures and their environmental impact. Graph machine learning, by learning representations directly from molecular graphs, may have better predictive power than conventional feature-based models. In this work, we leveraged graph neural networks to predict the environmental chemistry properties of molecules. To systematically evaluate model performance, we selected a representative list of datasets, ranging from solubility to reactivity, and compared the graph model directly with commonly used methods on each. We found that the graph model achieved near state-of-the-art accuracy for all tasks and, for several, improved the accuracy by a large margin over conventional models that rely on human-designed chemical features. This demonstrates that graph machine learning can be a powerful tool for representation learning in environmental chemistry. Further, we compared the data efficiency of conventional feature-based models and graph neural networks, providing guidance for model selection depending on dataset size and feature requirements.
- This article is part of the themed collections: Machine Learning and Artificial Intelligence: A cross-journal collection and 2023 Green Chemistry Hot Articles