Zixuan Zhang a, Xiaogang Lu *a, Meng Jin ab, Runli Gao a and Hongmei Wang *a
a State Key Laboratory of Chemistry for NBC Hazards Protection, Beijing 102205, China. E-mail: luxg2018@sina.com; hongmei_ricd@yeah.net
b School of Chemical and Pharmaceutical Engineering, Hebei University of Science & Technology, Shijiazhuang, 050000, China
First published on 3rd June 2025
The chemical identification of precursor synthesis pathways is crucial for enforcing the Chemical Weapons Convention (CWC) because it facilitates the forensic tracking of organophosphorus nerve agents. This study introduces the first systematic impurity-profiling platform for methylphosphonothioic dichloride, a critical precursor of V-series CWC-controlled substances. Using comprehensive two-dimensional gas chromatography/time-of-flight mass spectrometry in conjunction with chemometric workflows, our analysis identified 58 unique compounds. We devised a hierarchical analytical approach: (1) unsupervised pattern recognition (HCA/PCA) revealed the inherent clustering of two primary synthetic pathways, (2) oPLS-DA modeling achieved 100% classification accuracy (R2 = 0.990) with 15 VIP-discriminating features, and (3) rigorous validation through permutation tests (n = 2000) and external samples (n = 12) demonstrated 100% prediction accuracy. Notably, traceability was established at impurity levels as low as 0.5%, exceeding the OPCW verification standards. The established impurity database, in combination with the dual-mode chemometric approach, provides a robust framework for identifying chemical warfare-related precursors.
Although chemical warfare agents have been prohibited, ongoing threats persist.10,11 Characteristic impurities are crucial for tracing toxins, particularly in identifying the synthetic pathways of chemical warfare agents,12,13 which is highly significant in chemical safety. The investigation of precursors is essential for attributing synthetic pathways to compounds, as key precursor compounds can offer conclusive evidence for tracking toxic agents.14 In 2010 Fraga et al.15 integrated the bioinformatics tool XCMS with chemometrics methods to analyze liquid chromatography-mass spectrometry (LC-MS) data from methylphosphonyl dichloride (DC) samples for source traceability. Subsequently, in 2018, Fraga et al.16 extended their analysis of DC synthesis and identified that commercially synthesized DC, methylphosphonic difluoride (DF), or methylphosphonic acid (MPA) was derived from DC. In 2021, Lu et al.14 conducted a comprehensive study of the potential classification of the crucial tabun precursor, N,N-dimethylphosphoramidic dichloride (DMPADC), using GC-MS for 27 samples from three synthetic pathways.
Methylphosphonothioic dichloride (MPTDC, CAS# 676-98-2) is commonly used as a primary raw material for phosphorylation reagents in organic synthesis, with applications in fungicides, flame retardants, and surfactants.17,18 It is also a vital precursor for organophosphorus nerve agents, particularly the V-series.19,20 Owing to its status as a recognized precursor for chemical weapons and its limited non-weapon applications, it is categorized as a Schedule 2 compound by the Organisation for the Prohibition of Chemical Weapons (OPCW).21 Despite its significant role, information on the chemical properties of MPTDC is scarce. Therefore, conducting chemical attribution research on MPTDC is essential for forensic tracing of toxic agents and for supporting efforts to prevent the illicit use of chemical warfare agents.
In this study, two synthetic routes to MPTDC were explored. Comprehensive two-dimensional gas chromatography/time-of-flight mass spectrometry (GC × GC-TOFMS) was used for monitoring and analysis, and chemometrics was applied through both unsupervised and supervised models. Impurities suitable for identification were classified using the obtained chemical information, achieving the chemical identification of MPTDC (Fig. 1).
Three parallel samples were synthesized using the designated methods, achieving purities ranging from 80% to 92%. Route a entailed synthesizing DC from DMMP and chlorosulfoxide, followed by the production of MPTDC from DC and P2S5.22 This route was solvent-free. Route b began with the production of methyl dichlorophosphine from phosphorus trichloride and iodomethane under anhydrous and oxygen-free conditions. Subsequently, methyl phosphorothioformyl dichloride was produced from methyl dichlorophosphine and sublimated sulfur, catalyzed by AlCl3 in a nitrogen atmosphere using toluene as the solvent.22
The synthesized samples were analyzed in the electron ionization (EI) mode using an Agilent 7693 A-8890 GC/5977B mass spectrometer. The GC was equipped with a DB-5MS chromatographic column (30 m × 0.25 mm × 0.25 μm, Agilent Technologies). A 1 μL sample was injected in splitless mode at 250 °C. Helium served as the carrier gas at a flow rate of 1 mL min−1. The GC column oven started at 50 °C and was then heated at a rate of 15 °C min−1 to 280 °C, where it was held for 2 min. The total analysis time for the sample was 15.33 min, with a solvent delay of 2.50 min. The mass spectrometer scanned the range of 50–500 m/z at a speed of 3.21 scans per s. The transfer line, ion source, and quadrupole temperatures were maintained at 280 °C, 230 °C, and 150 °C, respectively.
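As a quick arithmetic check on the oven program described above, the following minimal Python sketch computes the duration of a single-ramp program. The function name is hypothetical. Under the stated settings, the ramp from 50 °C to 280 °C alone accounts for 15.33 min, and the 2 min final hold would come on top of that.

```python
def oven_program_minutes(start_c, end_c, ramp_c_per_min, final_hold_min=0.0):
    """Return (ramp_time, ramp_time + hold) in minutes for a single-ramp GC oven program."""
    ramp_time = (end_c - start_c) / ramp_c_per_min
    return ramp_time, ramp_time + final_hold_min


# Settings quoted in the text: 50 °C to 280 °C at 15 °C/min, then a 2 min hold.
ramp, ramp_plus_hold = oven_program_minutes(50, 280, 15, final_hold_min=2)
print(f"ramp: {ramp:.2f} min, ramp + hold: {ramp_plus_hold:.2f} min")
# ramp: 15.33 min, ramp + hold: 17.33 min
```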
Compounds were identified by comparing the mass spectra with the NIST 17 (v2.3) and OCAD (v.20_2018) libraries, supplemented by relative retention indices. When reliable matches were lacking, molecular structures were inferred through mass spectrometry fragment analysis. Peak area normalization was employed to determine the relative compound proportions. Potential CAS numbers were assessed to eliminate interferences such as solvent peaks and column bleeds. For qualitative assessment, peak centers were utilized for low-concentration compounds, while peak edges were employed for high-concentration compounds. Validated CAS numbers specific to synthetic pathways and not present in control samples were organized into a peak table, with samples as rows and impurities as columns.
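The peak area normalization and the samples-by-impurities peak table described above can be sketched as follows. This is an illustration only: the function names, sample identifiers, impurity labels, and peak areas are hypothetical, and absent impurities are assumed to be filled with zero.

```python
def normalize_peak_areas(raw_areas):
    """Convert the raw peak areas of one sample into percentages of its total area."""
    total = sum(raw_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in raw_areas.items()}


def build_peak_table(samples):
    """Build a table with samples as rows and impurities as columns (0 if absent)."""
    impurities = sorted({name for areas in samples.values() for name in areas})
    rows = {}
    for sample_id, areas in samples.items():
        normalized = normalize_peak_areas(areas)
        rows[sample_id] = [normalized.get(name, 0.0) for name in impurities]
    return impurities, rows


# Hypothetical raw areas (arbitrary units), not data from the study.
samples = {
    "a-1": {"impurity_X": 300.0, "impurity_Y": 100.0},
    "b-1": {"impurity_Y": 50.0, "impurity_Z": 150.0},
}
columns, table = build_peak_table(samples)
print(columns)       # ['impurity_X', 'impurity_Y', 'impurity_Z']
print(table["a-1"])  # [75.0, 25.0, 0.0]
```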
The system automatically screened compounds detected in the peak list, excluding solvents and starting materials, based on their presence in the blank samples. Compounds appearing more than twice in the samples and with a match score exceeding 800 from the NIST library matches were prioritized.
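The screening rule above can be expressed as a small filter. The sketch below is an assumption-laden illustration, not the software used in the study: it reads "appearing more than twice" as detection in at least three samples, applies the score threshold to the best library match of each compound, and uses hypothetical compound labels and scores.

```python
def screen_candidates(detections, blank_compounds, min_samples=3, min_score=800):
    """Keep compounds absent from blanks, seen in >= min_samples samples,
    and with a best library match score above min_score.

    detections: iterable of (sample_id, compound, match_score) tuples.
    """
    seen_in = {}
    best_score = {}
    for sample_id, compound, score in detections:
        if compound in blank_compounds:
            continue  # solvents, starting materials and other blank peaks
        seen_in.setdefault(compound, set()).add(sample_id)
        best_score[compound] = max(score, best_score.get(compound, 0))
    return sorted(
        c for c in seen_in
        if len(seen_in[c]) >= min_samples and best_score[c] > min_score
    )


# Hypothetical detections, not data from the study.
detections = [
    ("a-1", "cmpd_A", 910), ("a-2", "cmpd_A", 870), ("a-3", "cmpd_A", 820),
    ("a-1", "cmpd_B", 950), ("a-2", "cmpd_B", 930),                           # only two samples
    ("b-1", "cmpd_C", 700), ("b-2", "cmpd_C", 760), ("b-3", "cmpd_C", 790),  # low score
    ("b-1", "toluene", 990), ("b-2", "toluene", 990), ("b-3", "toluene", 990),
]
candidates = screen_candidates(detections, blank_compounds={"toluene"})
print(candidates)  # ['cmpd_A']
```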
The GC × GC-TOFMS results provided a more detailed picture of these compounds. Fig. 5 illustrates notable distinctions in the detected compounds between the two pathways. In the GC × GC-TOFMS plots, the X and Y coordinates denote the retention times on the two columns, Rt1 and Rt2 (retention time, min), respectively, and a third dimension (color shade or height) depicts signal intensity.
Table 1 presents 58 CASs screened to identify potential candidates with high matches for MPTDC attribution. Certain impurities were found to be crucial in distinguishing between the two synthesis pathways. Specifically, methyl phosphate compounds were uniquely associated with Route a, likely due to the initial material, DMMP. The presence of these methyl phosphate compounds indicates that the sample originated from Route a. Moreover, impurities common to both routes include specific sulfur compounds (e.g., compounds 12 and 26) and organophosphorus compounds (e.g., compounds 15 and 53). These compounds are generated during synthesis when sulfur elements are introduced, particularly through the use of elemental sulfur and P2S5. While these shared impurities may not directly indicate the route, they confirm the sample's association with this chemical system and offer insights into the general synthetic approach.
Table 1 footnotes: a MW: molecular weight; Rt1: retention time 1 (min). b CAS number: “—” for compounds with no CAS number.

Entry | Name | MW a | Rt1 a (min) | CAS number b |
---|---|---|---|---|
1 | 2-Methyl-3-buten-2-ol | 86.07 | 38.88 | 115-18-4 |
2 | 3-Methyl-2-buten-1-ol | 86.07 | 42.12 | 556-82-1 |
3 | 2-Butene-1,4-diol | 88.05 | 35.07 | 110-64-5 |
4 | 2-Propanone, 1-cyclopropyl- | 98.07 | 37.57 | 4160-75-2 |
5 | 1,3,5-Trioxepane | 104.04 | 37.73 | 5981-06-6 |
6 | Aminomethanesulfonic acid | 110.10 | 28.92 | 13881-91-9 |
7 | 1,2-Pentadiene, 4-methoxy-4-methyl- | 112.08 | 36.07 | 49833-91-2 |
8 | Methylmalonic acid | 118.02 | 40.57 | 516-05-2 |
9 | Thiopropionic acid, S-ethyl ester | 118.04 | 35.23 | 2432-42-0 |
10 | Dimethyl methylphosphonate | 124.03 | 5.32 | 756-79-6 |
11 | N-Nitroso-N-methyl-3-aminopropionic acid | 132.05 | 39.37 | 10478-42-9 |
12 | S-Methyl-L-cysteine | 135.03 | 47.29 | 1187-84-4 |
13 | Phosphoric acid, trimethyl ester | 140.02 | 6.55 | 512-56-1 |
14 | Methyl dichlorophosphate | 147.92 | 5.47 | 677-24-7 |
15 | O,O,O-Trimethyl thiophosphate | 156.00 | 7.80 | 152-18-1 |
16 | Phosphorothioic acid, O,O,S-trimethyl ester | 156.00 | 11.22 | 152-20-5 |
17 | Phosphorochloridothioic acid, O,O-dimethyl ester | 159.95 | 7.38 | 2524-03-0 |
18 | 3,3-Dimethylglutaric acid | 160.07 | 40.23 | 4839-46-7 |
19 | DL-Ethionine | 163.06 | 40.57 | 67-21-0 |
20 | Phosphorodichloridothioic acid, O-methyl ester | 163.90 | 6.55 | 2523-94-6 |
21 | 1,4′-Bipiperidine | 168.16 | 6.48 | 4897-50-1 |
22 | Phosphorodithioic acid, O,O,S-trimethyl ester | 171.98 | 13.30 | 2953-29-9 |
23 | Phosphorodithioic acid, O,S,S-trimethyl ester | 171.98 | 16.47 | 22608-53-3 |
24 | 2-Ethylhexanal ethylene glycol acetal | 172.14 | 46.51 | — |
25 | Diethyl succinate | 174.09 | 33.48 | 123-25-1 |
26 | Methylsulfonyl-methanesulfonyl chloride | 191.93 | 11.13 | 22317-89-1 |
27 | Tri(propylene glycol) methyl ether | 206.15 | 39.63 | 25498-49-1 |
28 | 2,4-Di-tert-butylphenol | 206.17 | 22.07 | 96-76-4 |
29 | 3,5-Dimethoxycinnamic acid | 208.07 | 45.37 | 16909-11-8 |
30 | 4-Morpholinopropanesulfonic acid | 209.07 | 26.29 | 1132-58-2 |
31 | 9-Ethyl-9H-carbazol-3-ylamine | 210.11 | 32.82 | 132-32-1 |
32 | Benzoxazol-5-amine, 2-(2-pyridyl)- | 211.07 | 31.01 | 58431-37-6 |
33 | 2,5-Diethoxy-3-methyl-3-vinylhexane | 214.19 | 33.32 | — |
34 | Diphenylmethylphosphine oxide | 216.07 | 31.07 | 2129-89-7 |
35 | 1-(3,4-Dichlorophenyl)-3-methylurea | 219.07 | 8.13 | 3567-62-2 |
36 | 2-Hydroxy-10,11-dihydro-5H-dibenzo[a,d]cyclohepten-5-one | 224.08 | 34.32 | 17910-73-5 |
37 | 2-Methoxyacridin-9-ol | 225.08 | 20.73 | 857574-55-1 |
38 | 4-Methoxyacridin-9-ol | 225.08 | 23.82 | 73663-88-4 |
39 | 2,4-Dimethyl-6-(1-phenylethyl)phenol | 226.31 | 42.48 | 92673-75-1 |
40 | 3,4′-Isopropylidenediphenol | 228.11 | 19.87 | 46765-25-7 |
41 | Clomazon | 239.07 | 36.23 | 81777-89-1 |
42 | Trichlorfon | 255.92 | 8.29 | 52-68-6 |
43 | 6-Ethoxy-4-methyl-3-phenylcoumarin | 280.11 | 24.82 | 263365-04-4 |
44 | Mefexamide | 280.18 | 27.38 | 1227-58-8 |
45 | 2,6′-Dimethoxy-2′-hydroxychalcone | 284.10 | 26.03 | 1435451-87-8 |
46 | 3,4′-Dimethoxy-2-hydroxychalcone | 284.10 | 15.88 | 18778-37-5 |
47 | 2,5-Dimethoxy-2′-hydroxychalcone | 284.10 | 15.07 | 5452-99-3 |
48 | 6,2′-Dimethoxy-3-hydroxyflavone | 298.08 | 28.17 | 1203801-43-7 |
49 | 6-Isopropoxy-9-oxoxanthene-2-carboxylic acid | 298.08 | 23.73 | 33458-93-4 |
50 | 1,4-Piperazinediethanesulfonic acid | 302.37 | 19.38 | 5625-37-6 |
51 | 5-Benzoyl-4-hydroxy-2-methoxybenzenesulfonic acid | 308.03 | 24.57 | 4065-45-6 |
52 | 2-Propenethioamide, 3-[3,5-bis(1,1-dimethylethyl)-4-hydroxyphenyl]-2-cyano-, (2E)- | 316.16 | 7.13 | 148741-30-4 |
53 | Bithionol | 353.88 | 18.73 | 97-18-7 |
54 | 3,4,2′,4′,6′-Pentamethoxychalcone | 358.14 | 32.29 | 76650-20-9 |
55 | 2′,3,4,5,6′-Pentamethoxychalcone | 358.39 | 21.87 | 944447-14-7 |
56 | Thiencarbazone-methyl | 390.03 | 23.88 | 317815-83-1 |
57 | 4′,5,7-Trihydroxy 3,3′,6,8-tetramethoxyflavone | 390.09 | 18.62 | 58130-91-9 |
58 | 14-O-Acetyldaunomycinone | 456.10 | 7.87 | 29984-41-6 |
As a practical guideline, we suggest that analysts first screen for route-specific impurities. For instance, identifying methyl phosphate compounds specific to Route a can promptly indicate the route. If these compounds are absent, subsequent analysis may concentrate on common impurities to validate the sample's connection with either route. This sequential method streamlines the analysis and offers dependable data for differentiating between the two routes.
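The sequential screening suggested above can be written as a short decision function. In this sketch, entry numbers refer to Table 1. The shared set (12, 15, 26 and 53) follows the examples given in the text, whereas the Route a set is an assumption made purely for illustration (entries whose names are methyl phosphates); the text does not list the exact marker set.

```python
# Entry numbers refer to Table 1.
ROUTE_A_MARKERS = {13, 14}         # ASSUMPTION: illustrative methyl phosphate entries
SHARED_MARKERS = {12, 15, 26, 53}  # examples of shared impurities named in the text


def screen_sample(detected_entries):
    """Apply the two-step screen: route-specific markers first, shared ones second."""
    detected = set(detected_entries)
    if detected & ROUTE_A_MARKERS:
        return "consistent with Route a"
    if detected & SHARED_MARKERS:
        return "MPTDC-related; route not resolved by this check"
    return "no attribution markers detected"


print(screen_sample([13, 15, 26]))  # consistent with Route a
print(screen_sample([12, 53]))      # MPTDC-related; route not resolved by this check
```

A sample that falls into the second branch would still need the chemometric models described later to assign a route.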
Among the compounds detected, we identified derivatives of aromatic compounds introduced with the solvent. These include compounds 45, 46, 48, and 54, which are derivatives of the chalcone found in toluene.27–29 Additionally, compounds 43 and 49 exhibit a benzoxy heterocyclic structure. Route a used no solvent, whereas Route b used toluene. Consequently, impurity derivatives found in toluene, such as oxides and other alkoxy-substituted compounds, can serve as characteristic compounds for subsequent data and model development. This highlights the significance of investigating factors such as solvents.
During our analysis of impurities in the MPTDC synthesis process, we identified compounds such as sildenafil, bromosporine, and tecloftalam whose presence could not be explained by the synthetic routes used. Although method and solvent blanks were included in our analytical protocol, the appearance of these compounds remains puzzling and requires further investigation. We suspect that these unexpected impurities may stem from various sources, such as unexpected by-products of synthesis reactions and environmental contamination. Environmental factors, including storage and transfer methods, need to be considered during the experimental process.5,30,31 Impurities can be introduced at each experimental step. Whether these impurities can act as CASs is a subject of debate and requires further examination for more precise attribution. Long-chain olefins and alkanenitrile compounds such as octadecanenitrile were found in most samples but were not identified as CASs in this study. Their presence is likely due to their use as plasticizers and surfactants,32 possibly introduced during sample handling. Interestingly, these compounds were absent in samples transferred using glass pipettes, indicating that their introduction was linked to specific handling procedures rather than the inherent chemical composition of the samples.33,34
Hierarchical cluster analysis (HCA) is a clustering algorithm that constructs nested trees by evaluating similarity between data points, commonly using Euclidean distance. After applying different pretreatment methods to all the samples, normalization yielded the most satisfactory classification. Within the hierarchical framework of the analysis, each sample type initially formed its own cluster, which then progressively merged with other clusters. Cluster analysis results were visualized as dendrograms at varying levels of granularity. These dendrograms reveal that samples from a particular class consistently have nearest neighbors from the same class.
The HCA dendrogram (Fig. 6) from the CAS data of each MPTDC sample demonstrates that the samples can be precisely grouped using the CAS associated with their synthetic pathways. The vertical axis indicates sample similarity or distance, with lower values denoting higher similarity or closer proximity, while the horizontal axis represents the MPTDC samples and illustrates how they merge into clusters based on similarities. Notably, all samples were correctly classified, achieving 100% accuracy. Route a samples (a-1, a-2, and a-3) clustered together, and Route b samples (b-1, b-2, and b-3) formed a separate cluster, clearly distinguishing between the two routes. This highlights the effective discriminatory capability of CASs between the two synthetic pathways.
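The agglomerative procedure behind such a dendrogram can be sketched in plain Python. The example below assumes average linkage with Euclidean distance (the text names only the distance measure) and uses hypothetical normalized peak areas for three impurities, not the study's data. Each sample starts as its own cluster, and the two closest clusters are merged until the requested number remains.

```python
import math


def agglomerate(points, n_clusters=2):
    """Average-linkage agglomerative clustering on a dict {sample_id: vector}."""
    clusters = [{name} for name in points]

    def linkage(c1, c2):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical peak-table rows (three impurities per sample).
peak_table = {
    "a-1": [30.0, 5.0, 0.0], "a-2": [28.0, 6.0, 0.0], "a-3": [32.0, 4.0, 1.0],
    "b-1": [0.0, 6.0, 20.0], "b-2": [1.0, 5.0, 22.0], "b-3": [0.0, 7.0, 19.0],
}
clusters = sorted(sorted(c) for c in agglomerate(peak_table))
print(clusters)  # [['a-1', 'a-2', 'a-3'], ['b-1', 'b-2', 'b-3']]
```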
Subsequently, we applied principal component analysis (PCA) to the dataset in an unsupervised manner. PCA is a robust multivariate analysis method that reduces numerous correlated variables into a few principal components (PCs), capturing most of the dataset's variation. In the two-dimensional PCA plot for the MPTDC samples (Fig. 7), the X-axis (PC1) represents the first principal component, explaining 40.7% of the total variance, while the Y-axis (PC2) represents the second principal component, explaining 25.1% of the total variance. Each data point in the plot corresponds to an individual MPTDC sample, with Route a samples shown in pink and Route b samples in green. The ellipses show the 95% confidence regions for each route. As shown in Fig. 7, the PCA model effectively distinguished the samples from the two routes, with a clear separation between the pink and green clusters. This indicates that the PCA model successfully captured the differences in CASs between the two synthetic routes.
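To make the "proportion of variance explained" concrete, the sketch below computes it for the first two components using power iteration on the covariance matrix, in plain Python. The data are hypothetical and the function name is an assumption. Because components are extracted in order of decreasing eigenvalue, the ratio for PC1 is never smaller than that for PC2.

```python
import math


def explained_variance_ratios(rows, n_components=2, n_iter=500):
    """Fraction of total variance captured by the leading principal components."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(n_components):
        v = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        ratios.append(eigval / total_var)
        # Deflate so the next pass finds the next component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios


# Hypothetical peak-table rows (samples x impurities), not data from the study.
rows = [
    [30.0, 5.0, 0.0], [28.0, 6.0, 0.0], [32.0, 4.0, 1.0],
    [0.0, 6.0, 20.0], [1.0, 5.0, 22.0], [0.0, 7.0, 19.0],
]
ratios = explained_variance_ratios(rows)
print([round(r, 3) for r in ratios])
```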
The variable importance plot of 58 CASs of MPTDC samples is shown in Fig. 8, ranking features based on their contributions to classification accuracy (mean decrease accuracy, MDA). The selected characteristic compounds for the two synthetic pathways added substantial value to the model, clearly differentiating Route a from Route b. Notably, 13 compounds had MDA values below 0.01, signifying their limited impact on the model's classification accuracy.
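Mean decrease accuracy can be illustrated with permutation importance: shuffle one feature at a time and record how much the classification accuracy drops. The sketch below is not the model behind Fig. 8; it swaps in a simple nearest-centroid classifier, and the data, function names, and the number of repeats are all assumptions chosen to keep the example short.

```python
import random


def fit_centroids(X, y):
    """Nearest-centroid 'model': one mean vector per class."""
    model = {}
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(members) for col in zip(*members)]
    return model


def predict(model, x):
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def accuracy(model, X, y):
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)


def mean_decrease_accuracy(X, y, n_repeats=50, seed=0):
    """Average accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    model = fit_centroids(X, y)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 separates the routes, feature 1 is noise.
X = [[10.0, 5.0], [11.0, 5.2], [9.0, 4.9], [0.0, 5.1], [1.0, 5.0], [0.0, 4.8]]
y = ["a", "a", "a", "b", "b", "b"]
mda = mean_decrease_accuracy(X, y)
print(mda)  # feature 0 shows a positive drop; feature 1 shows none
```

Features whose shuffling leaves the accuracy unchanged, like feature 1 here, correspond to the low-MDA compounds mentioned above.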
Based on the constructed data matrix, we also built a supervised classification model using orthogonal partial least squares discriminant analysis (oPLS-DA). This method integrates PLS-DA with orthogonal signal filtering, which separates variation unrelated to the predefined categories from the original matrix. As a result, oPLS-DA optimizes the distinction between sample groups by reducing within-group differences and emphasizing between-group variance, making it highly effective in differentiating two sample groups. The oPLS-DA score plot (Fig. 9) shows a clear separation of the two sample classes, validating the robustness of our feature compound selection and supporting the feasibility of route attribution.
Supervised classification models are prone to overfitting; that is, the model performs well on training data but poorly on unknown samples. Therefore, it is crucial to thoroughly validate the reliability of supervised classification models. Permutation testing is a widely used method for model evaluation: the group labels of the samples are randomly shuffled, and the model is refitted and used for prediction on each shuffled dataset. R2X and R2Y represent the explanatory power of the model in the X and Y matrices, respectively, while Q2 indicates the predictive ability of the model. The closer the R2 and Q2 values are to 1, the better the model performs. The validation of the oPLS-DA model, depicted in Fig. 10, gave stable values of Q2 = 0.785 and R2 = 0.990 after multiple training sessions. These results indicate that the model is both reliable and robust, with strong predictive capabilities.
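The label-shuffling idea can be sketched without the oPLS-DA machinery. The example below is a simplified stand-in: instead of refitting a model and recomputing R2 and Q2, it uses the distance between class centroids as the test statistic and asks how often shuffled labels reach the observed value. The data are hypothetical; the default of 2000 permutations mirrors the number quoted in the abstract.

```python
import math
import random


def centroid_distance(X, labels):
    """Euclidean distance between the mean vectors of the two label groups."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    g1, g2 = groups.values()
    c1 = [sum(col) / len(g1) for col in zip(*g1)]
    c2 = [sum(col) / len(g2) for col in zip(*g2)]
    return math.dist(c1, c2)


def permutation_p_value(X, labels, n_permutations=2000, seed=0):
    """Share of label shufflings whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = centroid_distance(X, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if centroid_distance(X, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical two-feature data with four samples per class.
X = [[10.0, 5.0], [11.0, 6.0], [9.0, 5.0], [10.0, 4.0],
     [0.0, 5.0], [1.0, 4.0], [0.0, 6.0], [1.0, 5.0]]
labels = ["a"] * 4 + ["b"] * 4
p_value = permutation_p_value(X, labels)
print(f"p = {p_value:.3f}")
```

A small p-value indicates that the observed separation is unlikely to arise from random label assignments, which is the same logic the permutation plot applies to R2 and Q2.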
In this study, an external test set was used to validate the performance of the oPLS-DA model. The test set comprised six unknown samples of MPTDC. As shown in Fig. 11, these samples were classified with 100% accuracy. The predicted values for both the training set and the test set confirmed that the validation samples were correctly assigned to their respective categories, demonstrating the high accuracy and robustness of the oPLS-DA model. The practical detection capability was confirmed using external samples with impurities as low as 0.5% abundance.35 All markers remained distinguishable with >95% peak intensity reproducibility (RSD < 15%, n = 6). The empirical detection limit was established at 0.5% abundance, based on consistent identification in external validation samples.
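The relative standard deviation (RSD) criterion quoted above is the sample standard deviation divided by the mean, expressed as a percentage. A minimal sketch with six hypothetical replicate peak intensities (not data from the study):

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical peak intensities for one marker across n = 6 replicates.
replicates = [100.0, 104.0, 98.0, 101.0, 97.0, 103.0]
rsd = rsd_percent(replicates)
print(f"RSD = {rsd:.2f}%")  # RSD = 2.72%
print("meets RSD < 15% criterion:", rsd < 15.0)
```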
Future research will focus on investigating the stability of CASs in the synthesized MPTDC samples, which is crucial for understanding the long-term applicability of chemical attribution methods. Extending these findings, we aim to enhance our understanding of the traceability of other related compounds. This will further enhance the scope of chemical attribution analyses, providing new insights and tools for identifying and tracking toxic substances and their precursors.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5ay00870k
This journal is © The Royal Society of Chemistry 2025