Mechanism-driven interpretable modeling of hydrogen solubility in organic compounds via a hierarchical transformer
Abstract
Accelerating global warming driven by fossil fuel combustion has spurred a strategic transition to hydrogen as a zero-carbon energy carrier. However, accurate prediction of hydrogen dissolution behavior across the hydrogen energy industry chain remains a key bottleneck for developing hydrogen storage materials and optimizing chemical processes. To address the high cost of traditional experimental methods and the limited generality of thermodynamic models, this study developed a machine learning framework that integrates feature engineering and interpretability analysis to predict the solubility of H₂ in organic compounds. A high-dimensional dataset of organic compounds was constructed by combining critical property parameters, molecular descriptors, and functional group fingerprint features, and the Boruta algorithm was used to select 14 key features and eliminate multicollinearity. Four machine learning models were systematically evaluated: a convolutional neural network (CNN), a cascade forward neural network (CFNN), an adaptive neuro-fuzzy inference system (ANFIS), and a novel hierarchical regression transformer (HiRegFormer). HiRegFormer achieved the best performance, with an R² of 0.9855 and an RMSE of 0.0069. Its hybrid encoder architecture simultaneously captured local molecular interactions and global thermodynamic patterns. SHAP interpretability analysis quantified the dominant influence of pressure and temperature on solubility. This study provides an intelligent tool, driven by both data and mechanism, for predicting hydrogen dissolution equilibrium, with significant engineering value for the rational design of hydrogen storage materials and the optimization of hydrogenation reactors.
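For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how R² and RMSE are conventionally computed from measured and predicted values. The function names and the sample numbers are illustrative assumptions only; they are not taken from the study's dataset or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility values (made up for illustration, not from the paper).
y_true = [0.010, 0.020, 0.030, 0.040]
y_pred = [0.012, 0.018, 0.031, 0.039]

print(f"R^2  = {r_squared(y_true, y_pred):.4f}")
print(f"RMSE = {rmse(y_true, y_pred):.6f}")
```

An R² close to 1 means the model explains nearly all of the variance in the measured values, while RMSE is expressed in the same units as the target, so it is only meaningful relative to the scale of the solubility data.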
