Dimensionality reduction of COSMO-RS molecular descriptor using functional principal component analysis (FPCA) for organic solvent mapping
Abstract
Within the context of a transition towards greener and safer solvents, we describe a framework that facilitates solvent screening. Traditional approaches rely on experimental solubility data or on computational methods such as COSMO-RS. In parallel, similarity maps can help in exploring alternative molecules similar to working solvents. For developing solvent maps, Principal Component Analysis (PCA) has limited applicability when dealing with complex molecular descriptors such as the σ-potential derived from COSMO-RS theory. In this study, we propose Functional Principal Component Analysis (FPCA) as a more suitable dimensionality reduction technique for solvent mapping, because it leverages the functional nature of σ-potentials. A database of 1588 solvents was analyzed, extending previously reported datasets with industrially relevant and sustainable candidates. FPCA enables a two-dimensional representation of the solvent space with minimal information loss (0.5%) and directly associates the principal components with electron donor and acceptor characteristics. In this space, solvent clustering emerges naturally, facilitating the identification of structurally and functionally similar solvents. Three case studies are presented to illustrate the practical implications of the approach. Overall, this methodology provides a suitable framework for solvent substitution, whether as a preliminary screening step or as part of computer-aided solvent design tools, contributing to more sustainable chemical practices.
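To make the idea of reducing curves to two scores concrete, the following is a minimal sketch of a discretized FPCA, not the implementation used in this study. It assumes every curve is sampled on a common, uniform σ grid and uses trapezoidal quadrature weights so that the components are orthonormal in the L2 sense. The function name `fpca`, the grid range, and the synthetic curves are hypothetical stand-ins; no real σ-potentials are used.

```python
import numpy as np


def fpca(curves, grid, n_components=2):
    """Discretized FPCA for curves sampled on a common uniform grid.

    curves: array of shape (n_samples, n_points)
    grid:   array of shape (n_points,)
    Returns (scores, eigenfunctions, explained_variance_ratio).
    """
    step = grid[1] - grid[0]
    # Trapezoidal quadrature weights approximate the integral over the grid.
    weights = np.full(grid.shape, step)
    weights[0] = weights[-1] = step / 2.0

    # Center the curves around the mean function.
    centered = curves - curves.mean(axis=0)

    # Scaling by sqrt(weights) turns the L2 inner product into a plain dot
    # product, so an ordinary SVD yields the functional components.
    sqrt_w = np.sqrt(weights)
    _, sing_vals, vt = np.linalg.svd(centered * sqrt_w, full_matrices=False)

    eigenfunctions = vt[:n_components] / sqrt_w
    # Scores are the weighted projections of each centered curve.
    scores = (centered * weights) @ eigenfunctions.T
    ratio = sing_vals**2 / np.sum(sing_vals**2)
    return scores, eigenfunctions, ratio[:n_components]


# Synthetic stand-in data (an assumption, not real sigma-potentials):
# each curve mixes a linear and a quadratic shape plus small noise.
rng = np.random.default_rng(0)
sigma = np.linspace(-0.03, 0.03, 61)  # hypothetical sigma grid
x = sigma / 0.03
a = rng.normal(size=(50, 1))
b = rng.normal(size=(50, 1))
curves = a * x + b * x**2 + 0.01 * rng.normal(size=(50, 61))

scores, phi, ratio = fpca(curves, sigma, n_components=2)
print("score matrix shape:", scores.shape)
print("variance captured by two components:", ratio.sum())
```

Each row of `scores` gives the coordinates of one curve in the two-dimensional map, and `1 - ratio.sum()` plays the role of the information loss quoted above. A basis-expansion FPCA (for example with B-splines) would be an alternative design when curves are sampled on different grids.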
