Interpretable machine-learning prediction of DFT energies per atom and identification of magic numbers in coinage-metal nanoclusters (N ≤ 55) from the open quantum cluster database
Abstract
Atomically precise coinage-metal nanoclusters (Cu, Ag, and Au) exhibit size-dependent stability that is critical for catalysis, plasmonics, and photocatalysis, yet first-principles screening becomes prohibitive beyond ∼55 atoms. We use the open quantum cluster database (QCD; 4381 Cu/Ag/Au clusters, N ≤ 55, PBE/PAW) to build an interpretable machine-learning framework for the DFT energy per atom (E_DFT/N). Nine geometric descriptors (radius of gyration, asphericity, compactness, and bounding-box dimensions) are combined with cluster size N, metal identity, and three QCD-derived electronic features. On a stratified 70/15/15 split, LightGBM attains a test MAE of 0.0144 eV per atom and R² = 0.996, with stable 5-fold cross-validation (0.0142 ± 0.0004 eV per atom); ExtraTrees yields near-equivalent accuracy. A geometry-only variant trained without electronic inputs retains an MAE of 0.0148 eV per atom (only 3% above the full model), demonstrating that energies can be ranked from coordinates alone. Per-metal second-difference (Δ₂E) analysis with adaptive thresholds identifies universal peaks at N = 8 and 34 across all three metals; N = 32 and 38 are resolved only for Au, which is consistent with relativistic stabilization, and Au also shows an anomalous Δ₂E at N = 20 that is absent in Cu and Ag. SHAP analysis reveals that metal identity and cluster size dominate predictions, whereas geometric descriptors govern the ∼10 meV per atom differences that determine magic-number locations. Size-grouped cross-validation shows that interpolation within the QCD is highly accurate, but extrapolation across size domains is substantially harder (MAE ≈ 0.10 eV per atom), bounding the model's scope. The complete open-source pipeline is released to support FAIR data practices.
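The second-difference analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition, Δ₂E(N) = E(N+1) + E(N−1) − 2E(N), applied to the total energy of the lowest-energy isomer at each size, so that a positive peak marks a size that is more stable than its neighbours. The abstract does not specify the adaptive-threshold rule; the mean-plus-k-standard-deviations cutoff below, the function names `second_difference` and `magic_numbers`, and the toy energies are all hypothetical stand-ins, not the authors' pipeline or QCD data.

```python
from typing import Dict, List


def second_difference(energies: Dict[int, float]) -> Dict[int, float]:
    """Return D2E(N) = E(N+1) + E(N-1) - 2*E(N) for sizes with both neighbours.

    `energies` maps cluster size N to the total energy (eV) of the
    lowest-energy isomer at that size (assumed input layout).
    """
    d2: Dict[int, float] = {}
    for n in sorted(energies):
        if (n - 1) in energies and (n + 1) in energies:
            d2[n] = energies[n + 1] + energies[n - 1] - 2.0 * energies[n]
    return d2


def magic_numbers(d2: Dict[int, float], k: float = 1.0) -> List[int]:
    """Flag sizes whose D2E exceeds mean + k * std.

    ASSUMPTION: this cutoff is only a placeholder for the paper's
    unspecified "adaptive threshold".
    """
    values = list(d2.values())
    if not values:
        return []
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [n for n, v in sorted(d2.items()) if v > mean + k * std]


# Toy, made-up energies (NOT taken from the QCD): a linear trend with
# one extra-stable size at N = 4.
toy = {n: -2.0 * n for n in range(1, 8)}
toy[4] -= 0.5

if __name__ == "__main__":
    d2 = second_difference(toy)
    print(d2)
    print(magic_numbers(d2, k=1.0))
```

In a per-metal analysis, these functions would be called once per element on that element's own energy-versus-size curve, so that the threshold adapts to each metal's spread of Δ₂E values.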
