Thermochemistry of Gas-phase and Surface Species via LASSO-assisted Subgraph Selection


Graph theory-based regression techniques, such as group additivity, have been widely used for fast estimation of the thermochemistry of large molecules. The essence of these techniques lies in the graphs into which molecules are decomposed. These graphs are selected based on heuristics; as a result, they may not give optimal accuracy, and they are hard to choose for non-nearest-neighbor electronic effects such as ring strain, steric hindrance, and resonance structures. Here, we explore LASSO, a feature selection algorithm, to select the optimal set of graph descriptors for predicting the standard enthalpy of formation, ∆fH°. We gather hydrocarbon gas-phase data from the NIST WebBook and Burcat's database. We find that models using LASSO-selected graph descriptors from the exhaustively enumerated graph descriptor space predict ∆fH° more accurately than traditional group additivity. We compare our framework with state-of-the-art machine-learning models on the QM9 data set; its mean absolute error of 1.39 kcal/mol is comparable to that of published machine-learning models. To cope with the computational cost of complete enumeration, we present (1) a semi-supervised LASSO learning method and (2) an adsorbate subgraph mining algorithm. The former prunes the graph descriptor space on the fly during the LASSO regression and is applied to a gas-phase hydrocarbon data set. The latter enumerates a truncated graph descriptor space from adsorbate graphs of surface science data. For lignin monomer adsorbates on Pt(111), considered here as an illustrative example, descriptors selected from the adsorbate subgraph space result in a mean absolute error and a root mean square error of 2.08 and 3.03 kcal/mol, respectively. We also discuss a simple method for identifying outliers in descriptor space that cause large model errors, so that accuracy can be improved by adding suitable data.
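To make the idea of LASSO-based descriptor selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each molecule is represented by a row of subgraph counts and fits a linear model with an L1 penalty by plain coordinate descent, so that descriptors with no predictive value receive a weight of exactly zero. The descriptor labels, the count matrix, the target values, and the penalty `alpha` are all hypothetical and synthetic; none of them come from the paper or from any database.

```python
# Minimal LASSO sketch (pure Python, coordinate descent).
# Objective: (1 / 2n) * ||y - X w||^2 + alpha * ||w||_1, no intercept.
# All data below are synthetic and only illustrate the mechanics.

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=500):
    """Return LASSO weights for count matrix X (list of rows) and targets y."""
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    residual = list(y)  # equals y - X w while all weights are zero
    for _ in range(n_sweeps):
        for j in range(n_features):
            column = [row[j] for row in X]
            norm_sq = sum(c * c for c in column) / n_samples
            if norm_sq == 0.0:
                continue  # descriptor never occurs; leave its weight at zero
            # Correlation of descriptor j with the residual that excludes j.
            rho = sum(c * (r + c * weights[j])
                      for c, r in zip(column, residual)) / n_samples
            new_weight = soft_threshold(rho, alpha) / norm_sq
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(column, residual)]
                weights[j] = new_weight
    return weights


# Hypothetical descriptor labels (columns) for six synthetic "molecules" (rows).
descriptors = ["C-(C)(H)3", "C-(C)2(H)2", "extra_subgraph_a", "extra_subgraph_b"]
X = [
    [2, 0, 0, 1],
    [2, 1, 0, 0],
    [2, 2, 1, 0],
    [2, 3, 0, 0],
    [3, 1, 0, 1],
    [4, 0, 1, 0],
]
# Synthetic targets built from made-up contributions of -10 and -5 per group;
# the last two descriptors contribute nothing by construction.
true_weights = [-10.0, -5.0, 0.0, 0.0]
y = [sum(c * w for c, w in zip(row, true_weights)) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [name for name, w in zip(descriptors, weights) if w != 0.0]
print(selected)
print([round(w, 3) for w in weights])
```

In this toy setting the penalty removes the two uninformative descriptors and slightly shrinks one of the remaining weights, which is the usual trade-off of L1 regularization. A larger `alpha` selects fewer descriptors; in practice it would be chosen by cross-validation.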

Publication details

The article was received on 18 Dec 2017, accepted on 13 Feb 2018, and first published on 13 Feb 2018.

Article type: Paper
DOI: 10.1039/C7RE00210F
Citation: React. Chem. Eng., 2018, Accepted Manuscript

    G. H. Gu, P. Plechac and D. Vlachos, React. Chem. Eng., 2018, Accepted Manuscript, DOI: 10.1039/C7RE00210F