The infinite-dimensional nature of spectroscopy and why models succeed, fail, and mislead

Abstract

Machine learning (ML) models have achieved strikingly high accuracies in spectroscopic classification tasks, often without clear evidence that the models rely on chemically meaningful features. Existing studies have linked these results to data preprocessing choices, noise sensitivity, and model complexity, but no unifying explanation has been available so far. In this work, we show that these phenomena arise naturally from the intrinsic high dimensionality of spectral data. Using a theoretical analysis grounded in the Feldman–Hájek theorem and the concentration of measure, we show that even infinitesimal distributional differences, caused by noise, normalisation, or instrumental artefacts, can render classes perfectly separable in high-dimensional spaces. Through a series of experiments on synthetic and real fluorescence spectra, we illustrate how models can achieve near-perfect accuracy even when chemical distinctions are absent, and why feature-importance maps may highlight spectrally irrelevant regions. We provide a rigorous theoretical framework, confirm the effect experimentally, and conclude with practical recommendations for building and interpreting ML models in spectroscopy.
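The separability argument can be made concrete with a toy calculation. The sketch below is not the authors' experiment: it assumes two classes of synthetic "spectra" made of independent unit-variance Gaussian noise in each of d channels, differing only by a small constant offset `delta` per channel (a stand-in for a normalisation or instrumental artefact). Under these assumptions the best achievable accuracy is Φ(δ√d/2), which tends to 1 as d grows even though the per-channel difference is far below the noise level. The function names `bayes_accuracy` and `empirical_accuracy` are hypothetical and chosen only for this illustration.

```python
import math
import numpy as np


def bayes_accuracy(delta, d):
    """Best achievable accuracy for N(0, I_d) versus N(delta * 1, I_d).

    Assumes equal class priors. The class means are delta * sqrt(d) apart,
    so the optimal accuracy is Phi(delta * sqrt(d) / 2).
    """
    z = delta * math.sqrt(d) / 2.0
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def empirical_accuracy(delta, d, n=500, seed=0):
    """Monte Carlo estimate of the same accuracy from n samples per class."""
    rng = np.random.default_rng(seed)
    class_a = rng.standard_normal((n, d))          # pure noise
    class_b = rng.standard_normal((n, d)) + delta  # noise plus tiny offset
    # Optimal rule for this toy model: average over channels and
    # threshold at the midpoint between the two class means.
    threshold = delta / 2.0
    correct = np.sum(class_a.mean(axis=1) < threshold)
    correct += np.sum(class_b.mean(axis=1) >= threshold)
    return correct / (2 * n)


if __name__ == "__main__":
    delta = 0.05  # offset equal to 5% of the per-channel noise level
    for d in (10, 100, 1000, 4000):
        print(d, bayes_accuracy(delta, d), empirical_accuracy(delta, d))
```

In this toy setting no single channel carries usable information, yet the classifier improves steadily with the number of channels, because the distance between class means grows as δ√d while the noise along that direction stays fixed. Differences in noise variance rather than mean would need a different decision rule, which this sketch does not cover.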


Article information

Article type: Paper
Submitted: 29 Mar 2026
Accepted: 06 Apr 2026
First published: 13 Apr 2026
This article is Open Access
Creative Commons BY license

Analyst, 2026, Advance Article

U. Michelucci and F. Venturini, Analyst, 2026, Advance Article, DOI: 10.1039/D6AN00346J

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
