Functional Clustering as a Correction Framework for Regression Models Under Small-Data Constraints: Predicting Optical Limiting in Phthalocyanines
Abstract
Functional clustering corrects poorly performing regression models under small-data constraints, reducing raw prediction errors of 32–140% to a mean absolute percentage error (MAPE) of 10–25%. The method jointly optimizes cluster assignments and local correction functions, with the number of clusters determined by the Bayesian Information Criterion (BIC). We apply it to five CORRELATO regression models that predict optical limiting in 25 phthalocyanines, including models with negative R², and obtain global prediction errors of 10–25% in every case. Within well-populated clusters (≥5 compounds), leave-one-out cross-validation yields median errors of 14–36%, which indicates that the corrections generalize to unseen compounds. Clusters with 3–4 compounds are unstable, and this threshold defines a quantitative applicability domain. Decision trees reproduce the cluster assignments with 92–96% agreement and provide interpretable IF-THEN rules for assigning new compounds without rerunning the clustering. Feature-importance analysis shows that cluster assignment depends on descriptors (β0, Δα0, Eg) that differ systematically from those entering the raw regressions, indicating that the method uncovers latent physicochemical structure rather than regression artifacts. The method is general and can be applied to any regression problem in materials chemistry where sample sizes are limited. The open-source Python code provided in the Supporting Information allows researchers to correct their own poorly performing models without reprogramming, enabling rapid prescreening of compound libraries for optical limiting and other functional properties.
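The joint optimization summarized in the abstract can be illustrated with a minimal sketch. It is not the Supporting Information code: it assumes that each local correction function is a straight line mapping the raw prediction to the observed value, that clusters are hard assignments, and that BIC takes the Gaussian form n·ln(RSS/n) + p·ln(n) with p = 2K. The function names (`fit_clusterwise`, `bic`, `select_k`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_clusterwise(y_raw, y_true, k, n_iter=50, seed=0):
    """Alternate between fitting one linear correction per cluster
    and reassigning each point to its best-fitting correction."""
    rng = np.random.default_rng(seed)
    n = len(y_raw)
    labels = rng.permutation(n) % k            # balanced random start
    X = np.column_stack([y_raw, np.ones(n)])   # slope and intercept columns
    coefs = np.zeros((k, 2))
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            if mask.sum() >= 2:                # need two points to fit a line
                coefs[j] = np.linalg.lstsq(X[mask], y_true[mask], rcond=None)[0]
        resid = (y_true[:, None] - X @ coefs.T) ** 2   # shape (n, k)
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    rss = resid[np.arange(n), labels].sum()
    return labels, coefs, rss

def bic(rss, n, k):
    """Gaussian BIC; counts 2 parameters per cluster (an assumption)."""
    return n * np.log(max(rss, 1e-12) / n) + 2 * k * np.log(n)

def select_k(y_raw, y_true, k_values=(1, 2, 3), n_restarts=10):
    """Return (score, k, labels, coefs) with the lowest BIC over restarts."""
    best = None
    for k in k_values:
        for seed in range(n_restarts):
            labels, coefs, rss = fit_clusterwise(y_raw, y_true, k, seed=seed)
            score = bic(rss, len(y_raw), k)
            if best is None or score < best[0]:
                best = (score, k, labels, coefs)
    return best

# Synthetic illustration only (not the phthalocyanine data): two hidden regimes.
y_raw = np.linspace(1.0, 10.0, 24)
regime = np.arange(24) % 2
y_true = np.where(regime == 0, 2.0 * y_raw + 1.0, 15.0 - y_raw)
y_true = y_true + 0.1 * np.sin(np.arange(24))

score, k, labels, coefs = select_k(y_raw, y_true)
y_corr = coefs[labels, 0] * y_raw + coefs[labels, 1]
mape = 100 * np.mean(np.abs((y_true - y_corr) / y_true))
print(f"K = {k}, corrected MAPE = {mape:.2f}%")
```

Because the assignment step uses the observed value, this sketch cannot place a new compound in a cluster on its own. That is the role the abstract gives to the decision trees, which learn the assignment from descriptors.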