Open Access Article
Lapo Renai †,*,a, Jens Heemskerk †,a, Frederic Béen b,c and Saer Samanipour a,d,e
a Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, 1090 GD, Amsterdam, The Netherlands. E-mail: l.renai@uva.nl
b Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, 1081HV, Amsterdam, The Netherlands
c KWR Water Research Institute, 3433BB, Nieuwegein, The Netherlands
d UvA Data Science Center, University of Amsterdam, Amsterdam, The Netherlands
e Queensland Alliance for Environmental Health Sciences (QAEHS), 20 Cornwall Street, Woolloongabba, QLD 4102, Australia
First published on 8th June 2026
Liquid chromatography high-resolution mass spectrometry (LC-HRMS) enables broad chemical detection through comprehensive, accurate mass-to-charge ratio measurement of the components of complex samples; yet, analytical design constrains chemical space measurability. A meta-analysis of 236 methods and over 75 000 compound measurements reveals strong convergence toward reversed-phase separations, limiting coverage of the chemical diversity of samples. This “measurability trap” narrows the observable chemical space and can lead to the underrepresentation of many environmentally and biologically relevant compounds.
While HRMS acquisition can register thousands of signals within a single analysis, only a subset of these features reflects compounds that are effectively retained and detected under the chosen chromatographic conditions, and therefore carries meaningful chemical information. This selectivity–measurability bias can create a false perception of analytical comprehensiveness (i.e., a measurability “trap”), particularly among non-specialist end-users, despite underlying physicochemical constraints, with important implications for exposure assessment. To clearly communicate the actual trade-offs of LC in capturing chemical space holistically, we performed a meta-analysis of RepoRT, a large repository of methods and retention time information on small molecules, including exposure-relevant chemicals and metabolites (Fig. 1).7 RepoRT currently covers 438 method entries and over 17 083 unique compounds measured across 49 different LC stationary phases under variable mobile phase conditions (e.g., eluent combination, modifiers, and flow rates), compiled from publicly available datasets and peer-reviewed studies. Importantly, many of the reported chemicals were measured in a targeted manner using authentic reference standards, providing reliable chromatographic information across diverse LC setups. Thus, poorly retained compounds (e.g., sugars and organic acids) on incompatible stationary phases are reported alongside successful retention using alternative separations. Such a dataset therefore defines, in a representative way, the practical boundaries of accessible chemical space: targeted methods can probe the extremes of measurability, whereas non-targeted coverage depends on effective retention under the method's selectivity.
To systematically investigate chemical space coverage in LC, both the variability of analytical methods and the physicochemical diversity of the measured compounds must be considered simultaneously. Accordingly, the compiled collection from RepoRT was organized into two complementary datasets: one describing homogeneous and comparable LC instrumental configurations (i.e., refined method metadata, n = 236) and one capturing the chemical descriptors of the analytes retained under those methods (retention time entries, n = 78 226; Section S1 of the SI). Together, these datasets reflect how method design and chemical properties jointly define the observable chemical space, although they do not represent all theoretically achievable selectivity modes and compounds.
The method metadata report eight distinct column chemistry types across the setups, classified by United States Pharmacopeia (USP) code (Fig. 2a; see also Table S1). The USP system categorizes LC columns according to stationary-phase chemistry.8 L1 (C18) dominates the dataset, accounting for 78% of all setups (n = 186), followed by L122 and L11 (≃8% each; n = 19 and 18). All remaining column types individually represent only ≃1–2%. This distribution confirms the strong predominance of reversed-phase LC (RPLC) selectivity in RepoRT. With C18, phenyl, C8, and pentafluorophenyl phases collectively representing 89% of setups, retention is largely governed by hydrophobic interactions, biasing the measurable chemical space toward moderately polar and non-polar compounds (computed octanol–water partition coefficient, XlogP, between −1 and 6).4,9,10 Only 11% of the methods employ hydrophilic interaction chromatography (HILIC) stationary phases (bare silica, zwitterionic, and alkylamide), suggesting that retention data for polar and very polar analytes are underrepresented. Operational parameters also converge strongly, typically using 100–150 mm columns, standard UHPLC flow rates (0.2–0.4 mL min−1),11 and aqueous/organic gradients with 0.1% formic acid (Fig. S1). Such homogeneity reinforces a systematic bias toward RPLC-compatible compounds. It is also reflected in the targeted scope of the reported methods: 90.3% of setups report fewer than 500 analytes, despite the much higher theoretical single-run capacity of these methods (Fig. 2b).
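The column-type shares quoted above follow directly from the method counts. The minimal Python sketch below recomputes them, assuming only the counts stated in the text (186 L1, 19 L122 and 18 L11 setups out of 236) and pooling the remaining 13 setups under a placeholder label, "other", that is not a USP code.

```python
from collections import Counter


def column_shares(usp_codes):
    """Return the percentage of LC setups per USP column code."""
    counts = Counter(usp_codes)
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


# Counts stated in the text for the 236 refined methods; "other" is a
# placeholder pooling the remaining 13 setups (236 - 186 - 19 - 18).
codes = ["L1"] * 186 + ["L122"] * 19 + ["L11"] * 18 + ["other"] * 13
shares = column_shares(codes)

for code, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{code}: {pct:.1f}%")
```

This prints 78.8% for L1, 8.1% for L122 and 7.6% for L11, consistent with the approximate values reported above.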
Fig. 1 Meta-analysis workflow evaluating the chemical space coverage of the curated LC methods available in the RepoRT repository.
This suggests that current methodological uniformity leaves much of the theoretical peak capacity and chemical space unexploited. However, a closer look at the distribution of the >78 k retention time entries reveals an unexpected trend. RepoRT mainly reports small molecules (mean exact mass 312 Da; Fig. 2c), yet the XlogP values of the majority of compounds measured across setups are remarkably evenly distributed between polar and nonpolar structures (−10 < XlogP < 10; Fig. 2d). How, then, can such a substantial fraction of polar and hydrophilic compounds appear to be captured under predominantly reversed-phase conditions? Examination of a representative subset of frequently analyzed compounds in RepoRT (Fig. S2a) shows that several highly polar molecules, such as hexose (XlogP = −2.6), mannitol (XlogP = −3.1), and quinic acid (XlogP = −2.4), are reported as retained under RPLC setups. Such observations may reflect targeted applications, as well as missing or incomplete exclusion of early-eluting features in non-targeted protocols. Indeed, applying a dead volume threshold to filter out poorly retained species (Section S2 of the supplementary information (SI), Fig. S2b) substantially reduces the apparent RPLC coverage of polar compounds. The presence of these analytes in RPLC datasets thus indicates that non-targeted chemical coverage in the reversed-phase domain can easily be overestimated, whereas HILIC is better suited to retaining such highly polar compounds. While such retention data are valuable for defining method boundaries, from a non-targeted perspective, unknown compounds eluting in the dead volume or undergoing breakthrough are unlikely to be reliably detected, as poor retention results in low-quality, noisy MS signals and limited discovery potential.12
To explore relationships between methods and chemical coverage, the two RepoRT datasets were analyzed by principal component analysis (PCA; Section S3 of the SI). RPLC- and HILIC-based setups formed two distinct clusters in the PC scores plot, with a moderate explained variance (31.1%, Fig. 3a). The first three components are largely driven by stationary-phase type, with clear separation between C18 and zwitterionic columns along PC1 and of phenyl-based RPLC phases along PC3 (Fig. S3 and S4a). Particle size and flow rate contribute further, with larger particles associated with HILIC and higher flow rates with RPLC under UHPLC conditions. PC2 is mainly influenced by eluent composition, delineating RPLC methods that use acidic aqueous phases with strong organic modifiers (e.g., acetonitrile). K-means clustering (Fig. S4b) and a centroid similarity heatmap (Fig. S5) were used to interpret the variables driving this limited separation.
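A dead-volume filter of the kind described above can be sketched as follows. The actual criterion is given in Section S2 of the SI; this is only a hypothetical illustration that estimates the dead time t0 from column geometry and flow rate with an assumed total porosity of 0.65, then discards entries eluting before 1.5 × t0 (an assumed cut-off). The helper names, column dimensions and retention times are all made up for illustration.

```python
import math


def dead_time_min(length_mm, inner_diameter_mm, flow_ml_min, porosity=0.65):
    """Estimate the column dead time t0 (min) from geometry and flow rate.

    porosity is an ASSUMED total porosity (a rule of thumb for fully porous
    particles); a t0 measured with an unretained marker is preferable.
    """
    radius_cm = inner_diameter_mm / 20.0  # diameter in mm -> radius in cm
    length_cm = length_mm / 10.0
    v0_ml = porosity * math.pi * radius_cm ** 2 * length_cm  # 1 cm^3 == 1 mL
    return v0_ml / flow_ml_min


def retained_entries(entries, t0_min, factor=1.5):
    """Drop entries eluting before factor * t0 (factor is a HYPOTHETICAL cut-off)."""
    return [e for e in entries if e["rt_min"] > factor * t0_min]


# Hypothetical 100 x 2.1 mm column at 0.3 mL/min with made-up retention times.
t0 = dead_time_min(length_mm=100, inner_diameter_mm=2.1, flow_ml_min=0.3)
entries = [
    {"name": "compound_A", "rt_min": 0.8},
    {"name": "compound_B", "rt_min": 4.2},
    {"name": "compound_C", "rt_min": 11.7},
]
kept = retained_entries(entries, t0)
print(f"t0 = {t0:.2f} min; kept: {[e['name'] for e in kept]}")
```

Here t0 comes out at about 0.75 min, so compound_A (0.8 min) falls below the 1.5 × t0 cut-off and is removed as a likely dead-volume eluter.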
Cluster 1 (n = 19) mainly comprises L122 columns and HILIC-specific eluents. Cluster 2 (n = 209) corresponds to the dominant RPLC group, including C18, C8, and phenyl phases, and shows broader internal variability (low centroid similarity). Clusters 3–5 are distinguished by alternative stationary phases, unconventional organic modifiers (e.g., isopropanol, acetone), and different buffer systems (ammonium formate or phosphate). As expected, the compounds reported across setups show that RPLC dominates the vast majority of the chemical space represented in RepoRT (Fig. 3b, RPLC dead-volume entries removed). HILIC compounds occupy a partially distinct but strongly overlapping region, indicating that both modes predominantly capture similar physicochemical domains. PC1 captures most of the chemical variability (65.5%): it correlates positively with exact mass and with the number of sites capable of polar interactions (acid–base descriptors and topological polar surface area, TPSA), and inversely with XlogP (Fig. S6). This is consistent with the overestimated polar coverage under RPLC conditions described above. PC2 (26.85%) further refines this distribution by capturing the combined variation of exact mass and XlogP, but it does not substantially resolve the overlap between RPLC and HILIC chemical space. This convergence likely reflects methodological constraints, such as the limited flexibility (i.e., less tunable retention behavior) of broad-gradient HILIC methods, resulting in repeated sampling of the same physicochemical regions rather than a truly orthogonal expansion of measurability. Plots of normalized retention against XlogP and TPSA (Fig. S7) show these constraints more clearly: HILIC retains a large fraction of semi-polar and moderately apolar compounds, resulting in substantial overlap with RPLC within the central polarity domain. This illustrates how HILIC is often implemented as a complementary “inverse” of RPLC (i.e., by switching the mobile phase composition) without fully exploiting its distinct separation mechanisms.
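The descriptor PCA discussed above can be illustrated with a short, dependency-free sketch. It is not the workflow of Section S3 of the SI (whose software and preprocessing are not detailed here): descriptors are z-scored, their correlation matrix is formed, and PC1 is obtained by power iteration together with its explained-variance ratio. The three descriptor columns are made-up toy values, chosen so that exact mass and TPSA rise together while XlogP falls, mimicking the loading pattern reported for PC1.

```python
import math


def standardize(columns):
    """Z-score each descriptor column (mean 0, unit sample variance)."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))  # assumes sd > 0
        scaled.append([(x - mean) / sd for x in col])
    return scaled


def first_component(columns, iterations=500):
    """Return PC1 loadings and its explained-variance ratio via power iteration."""
    z = standardize(columns)
    p, n = len(z), len(z[0])
    # Correlation matrix of the standardized descriptors (p x p).
    corr = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]
    # Asymmetric start vector, to avoid starting orthogonal to PC1 in simple cases.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[0] < 0:  # sign convention: the first descriptor loads positively
        v = [-x for x in v]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    trace = sum(corr[i][i] for i in range(p))  # equals p for a correlation matrix
    return v, eigenvalue / trace


# Toy descriptors for five hypothetical compounds (made-up values).
names = ["exact_mass", "tpsa", "xlogp"]
descriptors = [
    [150.0, 200.0, 300.0, 400.0, 500.0],  # exact mass (Da)
    [20.0, 40.0, 80.0, 110.0, 150.0],     # TPSA (A^2)
    [4.0, 3.0, 1.0, -1.0, -3.0],          # XlogP
]
loadings, ratio = first_component(descriptors)
print({name: round(w, 3) for name, w in zip(names, loadings)}, f"PC1 = {ratio:.1%}")
```

With these toy values, exact mass and TPSA load positively on PC1 and XlogP negatively. Real descriptor sets are far less collinear, so PC1 explains much less variance than in this example.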
Rather than extending measurability, both selectivity modes concentrate on the intermediate descriptor space. Exact-mass distributions reinforce this pattern: RPLC spans a broad mass range, including compounds >1000 Da, whereas HILIC is largely confined below 1000 Da, regardless of polarity (Fig. S8). Overall, no substantial expansion of chemical coverage is observed between RPLC and HILIC. In principle, HILIC would be expected to shift measurability toward highly polar chemicals, yet no such pronounced shift is evident. Imbalanced method reporting, the limited representation of optimized HILIC data, and biases toward available analytical standards likely all contribute to this trend. To contextualize the chemical space covered by curated LC methods, the RepoRT compounds were projected together with the chemicals of the U.S. EPA CompTox Chemistry Dashboard (≃800 k chemicals, an approximation of the exposome chemical space) into the same physicochemical descriptor space (Fig. 3c and Fig. S9). The RepoRT compounds occupy only a confined subregion of the broader CompTox chemical space (Fig. 3c). While substantial overlap exists in the central PC domain, large areas of CompTox characterized by high polarity (high TPSA and H-bond capacity), extreme hydrophobicity (high XlogP), and very large molecular weights remain entirely unrepresented. Assuming that RepoRT is a representative sample of LC-HRMS measurability, the detectable chemical space does not cover the full chemical diversity of samples but is a projection constrained by poorly exploited selectivity. Owing to methodological convergence, analyses repeatedly capture well-characterized regions of chemical space, while others remain largely inaccessible.1 Although combining RPLC with HILIC is often proposed as a strategy to enhance orthogonality, our meta-analysis suggests that, within the currently reported methods, this expansion remains modest: the available HILIC data do not substantially shift coverage toward the highly polar domain.13 Incorporating additional chromatographic modes (e.g., supercritical fluid chromatography (SFC) or ion chromatography (IC)) can extend chemical space coverage, but the resulting gains remain incremental relative to the vast theoretical chemical universe. No combination of current approaches achieves comprehensive measurability, owing to RP-centered compound variability, inter-platform correlation biases, and the implementation incompatibilities of such orthogonal multidimensional workflows.14 This constraint is not purely physicochemical but also methodological: analytical practice is biased toward compounds that can be confirmed with reference standards. As a result, the reported chemical space largely reflects known compounds, while unknown features, which define the frontier of measurability, remain underrepresented.
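One simple way to quantify how much of a reference chemical space (such as CompTox) is reached by measured compounds is to bin both sets of PC scores on a common grid and count the cells they share. The sketch below is a hypothetical illustration of that idea, not the procedure behind Fig. 3c. The cell size and the score coordinates are made up, and in practice both datasets would first be projected with the same fitted PCA model.

```python
import math


def occupied_cells(points, cell_size):
    """Assign 2-D PC score coordinates to square grid cells; return the set of cells."""
    return {(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points}


def cell_coverage(measured, reference, cell_size=1.0):
    """Fraction of reference-occupied cells that also contain measured compounds."""
    ref_cells = occupied_cells(reference, cell_size)
    shared = ref_cells & occupied_cells(measured, cell_size)
    return len(shared) / len(ref_cells)


# Made-up PC1/PC2 scores: a broad "reference" cloud and a compact "measured" cluster.
reference = [(x + 0.5, y + 0.5) for x in range(-3, 3) for y in range(-3, 3)]
measured = [(0.2, 0.3), (-0.4, 0.1), (0.9, -0.8), (0.1, 0.2)]
print(f"grid coverage: {cell_coverage(measured, reference):.1%}")
```

The measured cluster occupies 3 of the 36 reference cells (about 8%). The result depends on the chosen cell size, which would need to be reported alongside any such coverage figure.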
Rather than pursuing comprehensive coverage that cannot be reliably achieved, method-specific measurability domains should be explicitly defined and quantified, and chemical space coverage should be considered alongside sensitivity and mass accuracy as a key performance metric (e.g., by predicting fractional coverage and mapping the boundaries of measurable structural/physicochemical properties).5,13 Future best practices should also prioritize systematic reporting of unknown features and the development of continuously updated repositories that capture evolving tentative structures beyond currently recognized compounds.15 Only by documenting both what is observed and what remains unseen can non-targeted analysis and exposomics move beyond the measurability trap toward a genuinely exploratory strategy. Such coverage-directed expansion of chromatographic diversity and of repositories may redefine these measurable domains and change the trends currently observed in LC-HRMS chemical-space accessibility.
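As a concrete, hypothetical reading of the suggestion to report fractional coverage, a method's measurability domain could be encoded as descriptor bounds and checked against a reference compound list. The sketch below assumes a simple box in XlogP and exact mass. The XlogP bounds reuse the −1 to 6 range quoted above for RPLC, while the mass cap, the class name and the compound values are illustrative placeholders. Real domains would more likely be derived from retention models over many descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurabilityDomain:
    """Hypothetical method-specific domain defined by simple descriptor bounds."""
    xlogp_min: float
    xlogp_max: float
    mass_max: float

    def contains(self, xlogp, mass):
        return self.xlogp_min <= xlogp <= self.xlogp_max and mass <= self.mass_max


def fractional_coverage(domain, compounds):
    """Fraction of (xlogp, exact_mass) pairs that fall inside the domain."""
    inside = sum(domain.contains(xlogp, mass) for xlogp, mass in compounds)
    return inside / len(compounds)


# XlogP bounds follow the -1 to 6 range quoted for RPLC; the mass cap and the
# reference compounds below are made-up placeholders.
rplc = MeasurabilityDomain(xlogp_min=-1.0, xlogp_max=6.0, mass_max=1500.0)
reference = [(-3.0, 180.0), (-0.5, 250.0), (2.4, 310.0), (5.5, 420.0), (8.0, 700.0)]
print(f"fractional coverage: {fractional_coverage(rplc, reference):.2f}")
```

Three of the five placeholder compounds fall inside the box, giving a fractional coverage of 0.60. The very polar and very hydrophobic entries are flagged as outside the method's domain.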
Supplementary information (SI): supporting tables, figures, and additional details of the meta-analysis are available. See DOI: https://doi.org/10.1039/d6cc02811j.
Footnote
† These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2026