Hierarchical clustering and optimal interval combination (HCIC): a knowledge-guided strategy for consistent and interpretable spectral variable interval selection

Abstract

Variable selection is crucial for the accuracy of spectral analysis and is typically formulated as an optimization problem solved with regression techniques. However, such data-driven methods may overlook physical laws or mechanisms, leading to the exclusion of physically relevant variables. To address this, we propose a hierarchical clustering and optimal interval combination (HCIC) strategy guided by domain knowledge, in which physical principles and mechanisms inform the algorithm design so that more physically relevant feature structures are captured. First, spectral variable hierarchical clustering (SVHC) is used to determine the correlations between adjacent variables and to generate non-uniform intervals. Each interval corresponds to a distinct pattern reflecting underlying molecular interactions, such as peak shifts, functional group contributions, or even non-reaction background signals. Second, a Bayesian linear regression-based optimal interval combination (BLR-OIC) strategy is applied to identify the most effective combinations of intervals, capturing and exploiting synergistic effects among functional bands or functional groups. We conduct extensive experiments on publicly available and proprietary databases to validate the efficacy of the proposed algorithm. The results demonstrate not only improved predictive performance compared with benchmark methods but also greater interpretability and more consistent variable selection.
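
The two steps described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names are hypothetical, and the choices made here are all assumptions for illustration. These include the linkage (correlation between the mean profiles of neighbouring intervals), the merge threshold, the fixed prior precision alpha and noise precision beta, the use of the Bayesian log marginal likelihood (evidence) as the score for an interval combination, and exhaustive search over subsets. Restricting merges to neighbouring intervals is what keeps every cluster a contiguous, possibly non-uniform, wavelength interval.

```python
import itertools
import math


def pearson(a, b):
    # Pearson correlation of two equal-length sequences (0.0 if either is constant).
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def svhc_intervals(X, threshold):
    # Step 1 (hypothetical sketch): adjacency-constrained agglomerative clustering.
    # X: rows = samples, columns = spectral variables. Only neighbouring
    # intervals may merge, so every cluster is a contiguous interval.
    # Assumed linkage: correlation between the intervals' mean profiles.
    cols = [list(c) for c in zip(*X)]
    intervals = [[j] for j in range(len(cols))]

    def profile(iv):
        # Sample-wise mean of the variables in one interval.
        return [sum(cols[j][i] for j in iv) / len(iv) for i in range(len(X))]

    while len(intervals) > 1:
        scores = [pearson(profile(intervals[k]), profile(intervals[k + 1]))
                  for k in range(len(intervals) - 1)]
        k = max(range(len(scores)), key=scores.__getitem__)
        if scores[k] < threshold:
            break  # no adjacent pair is correlated enough to merge
        intervals[k:k + 2] = [intervals[k] + intervals[k + 1]]
    return intervals


def cholesky(A):
    # Lower-triangular L with A = L L^T (A symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def chol_solve(L, b):
    # Solve (L L^T) x = b by forward substitution, then back substitution.
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def blr_log_evidence(X, y, alpha=1.0, beta=100.0):
    # Log marginal likelihood of y = Xw + e, w ~ N(0, I/alpha), e ~ N(0, I/beta).
    N, M = len(X), len(X[0])
    A = [[(alpha if i == j else 0.0) + beta * sum(r[i] * r[j] for r in X)
          for j in range(M)] for i in range(M)]
    L = cholesky(A)
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(M)]
    m = [beta * v for v in chol_solve(L, Xty)]  # posterior mean of w
    rss = sum((t - sum(w * v for w, v in zip(m, r))) ** 2 for r, t in zip(X, y))
    E = 0.5 * beta * rss + 0.5 * alpha * sum(w * w for w in m)
    logdet_A = 2.0 * sum(math.log(L[i][i]) for i in range(M))
    return (0.5 * M * math.log(alpha) + 0.5 * N * math.log(beta) - E
            - 0.5 * logdet_A - 0.5 * N * math.log(2 * math.pi))


def best_interval_combination(X, y, intervals):
    # Step 2 (hypothetical sketch): score every non-empty subset of intervals
    # by its Bayesian evidence and return the best tuple of interval indices.
    best = (-math.inf, ())
    for k in range(1, len(intervals) + 1):
        for combo in itertools.combinations(range(len(intervals)), k):
            idx = [j for c in combo for j in intervals[c]]
            score = blr_log_evidence([[row[j] for j in idx] for row in X], y)
            best = max(best, (score, combo))
    return best[1]
```

For instance, on a toy matrix whose columns 0 to 3 are offsets of one latent variable and whose columns 4 to 7 are offsets of an uncorrelated one, svhc_intervals(X, 0.9) returns [[0, 1, 2, 3], [4, 5, 6, 7]], and if the response follows the first latent variable the selected combination contains interval 0. Exhaustive search evaluates 2^K - 1 subsets for K intervals, so it is only practical when there are few intervals. The evidence is used in this sketch because its Occam factor tends to penalize intervals that add parameters without improving the fit.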

Article information

Article type: Paper
Submitted: 15 Dec 2024
Accepted: 16 Apr 2025
First published: 29 Apr 2025

Anal. Methods, 2025, Advance Article

P. Wu, T. Chen, M. Wang, L. Xing, X. Zou and H. Li, Anal. Methods, 2025, Advance Article, DOI: 10.1039/D4AY02250E