Functional Clustering as a Correction Framework for Regression Models Under Small-Data Constraints: Predicting Optical Limiting in Phthalocyanines

Abstract

Functional clustering corrects poorly performing regression models under small-data constraints, transforming raw predictions with 32–140% error into estimates with 10–25% mean absolute percentage error (MAPE). The method jointly optimizes cluster assignments and local correction functions, with the number of clusters determined by the Bayesian Information Criterion (BIC). Applied to five CORRELATO regression models predicting optical limiting in 25 phthalocyanines, including models with negative R², functional clustering reduces the global prediction error to 10–25%. Within well‑populated clusters (≥5 compounds), leave‑one‑out cross‑validation yields median errors of 14–36%, demonstrating generalizability. Clusters with 3–4 compounds are unstable, which defines a quantitative applicability domain. Decision trees (92–96% agreement with the cluster assignments) provide interpretable IF‑THEN rules for assigning new compounds without rerunning the clustering. Feature importance reveals that cluster assignment depends on descriptors (β0, Δα0, Eg) that differ systematically from those entering the raw regressions, indicating that the method uncovers latent physicochemical structure rather than regression artifacts. The method is general and can be applied to any regression problem in materials chemistry where sample sizes are limited. The open‑source Python code provided in the Supporting Information allows researchers to correct their own poorly performing models without reprogramming, enabling rapid prescreening of compound libraries for optical limiting and other functional properties.
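The joint optimization of cluster assignments and local correction functions, with the cluster count chosen by BIC, can be illustrated with a minimal sketch. This is not the code from the Supporting Information. It assumes a linear correction y ≈ a·y_raw + b per cluster, an alternating fit/reassign loop, a Gaussian BIC that counts two parameters per cluster, and purely synthetic data. All function and variable names (`fit_clusterwise`, `bic`, `mape`, `y_raw`, `y_true`) are hypothetical.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def fit_clusterwise(y_raw, y_true, k, n_iter=50, seed=0):
    """Alternate between fitting one linear correction per cluster and
    reassigning each sample to the cluster whose correction fits it best."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(y_raw))
    # Start every cluster at the identity correction (a=1, b=0),
    # i.e. the uncorrected raw prediction.
    coefs = np.tile([1.0, 0.0], (k, 1))
    for _ in range(n_iter):
        for c in range(k):
            idx = labels == c
            if idx.sum() >= 2:  # a line needs at least two points
                coefs[c] = np.polyfit(y_raw[idx], y_true[idx], 1)
        # Prediction of every sample under every cluster's correction: (k, n)
        pred = coefs[:, [0]] * y_raw[None, :] + coefs[:, [1]]
        new_labels = np.argmin((pred - y_true[None, :]) ** 2, axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    corrected = coefs[labels, 0] * y_raw + coefs[labels, 1]
    return labels, coefs, corrected

def bic(y_true, corrected, k):
    """Gaussian BIC with 2 parameters per cluster (an assumed parameter count)."""
    n = len(y_true)
    rss = max(float(np.sum((y_true - corrected) ** 2)), 1e-12)
    return n * np.log(rss / n) + 2 * k * np.log(n)

# Synthetic stand-in data: 25 samples from two latent linear regimes.
rng = np.random.default_rng(42)
y_raw = rng.uniform(1.0, 10.0, size=25)
group = np.arange(25) % 2
y_true = np.where(group == 0, 2.0 * y_raw + 1.0, 0.5 * y_raw + 4.0)
y_true = y_true + rng.normal(0.0, 0.1, size=25)

# Fit for K = 1..4 and keep the K with the lowest BIC.
fits = {k: fit_clusterwise(y_raw, y_true, k) for k in range(1, 5)}
best_k = min(fits, key=lambda k: bic(y_true, fits[k][2], k))
labels, coefs, corrected = fits[best_k]

print("clusters chosen by BIC:", best_k)
print("raw MAPE (%):", round(mape(y_true, y_raw), 1))
print("corrected MAPE (%):", round(mape(y_true, corrected), 1))
```

Because the assignments in this sketch use the reference values themselves, the corrected error is an in-sample figure. Estimating error for unseen compounds requires a separate step, which is the role of the leave‑one‑out cross‑validation and the decision‑tree assignment rules described in the abstract.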

Article information

Article type: Paper
Submitted: 20 May 2026
Accepted: 26 Jun 2026
First published: 29 Jun 2026

Phys. Chem. Chem. Phys., 2026, Accepted Manuscript

A. Tolbin, Phys. Chem. Chem. Phys., 2026, Accepted Manuscript, DOI: 10.1039/D6CP01866A
