Open Access Article
Manh Huy Nguyen ab, Thanh Dam Nguyen ab, Hong Anh Duong ab and Hung Viet Pham *b
a Research Centre for Environmental Technology and Sustainable Development (CETASD), Faculty of Chemistry, VNU University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi 100000, Vietnam
b Key Laboratory of Analytical Technology for Environmental Quality and Food Safety Control (KLATEFOS), VNU University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi 100000, Vietnam. E-mail: vietph@vnu.edu.vn
First published on 18th May 2026
In this study, an integrated hurdle modelling workflow combining ATR-FTIR and machine learning was developed for the simultaneous quantification of levodopa (LD), carbidopa (CD), and benserazide (BZ) in anti-Parkinson's medication. A calibration set of 103 synthetic pellets was prepared with wide concentration ranges for the three active pharmaceutical ingredients (APIs), namely 0–80% w/w for LD, 0–40% w/w for BZ, and 0–8% w/w for CD, and analysed by ATR-FTIR and by an HPLC-DAD reference method. Nine pellets prepared from three commercial medications were used for external evaluation and calibration transfer. A two-stage hurdle model was employed for each analyte, comprising a classifier for presence/absence and a regressor fitted only on positive samples to predict concentration. The modelling workflow comprised four modules, in which the Savitzky–Golay first derivative combined with the standard normal variate (SNV) on mean spectra was the most suitable preprocessing method. Logistic regression for LD and BZ and support vector classification (SVC) for CD were selected as optimal classifiers, while random forest (RF) regression in the half spectral region (675–1800 cm−1) was the best overall regressor for all APIs. When applied to commercial samples, the model classifiers correctly identified the presence/absence of all APIs, and calibration transfer using only six additional commercial pellets significantly reduced bias for all three APIs. The method also achieved a high AGREE greenness score (0.75). These results show that ATR-FTIR, combined with a carefully designed hurdle model workflow, can provide a rapid and green screening tool for multi-API anti-Parkinson's medications.
Because these medicines are administered chronically, often to vulnerable patients, strict quality control is required. This requirement is particularly important in low- and middle-income countries such as Vietnam, where substandard and falsified medicines have been reported with increasing frequency and can lead to therapeutic failure or toxicity. A variety of analytical techniques have been proposed for the determination of LD, CD and BZ in pharmaceutical products. High-performance liquid chromatography (HPLC) is recommended in major pharmacopoeias as the reference method owing to its high sensitivity and selectivity.11–17 HPLC-UV and LC-MS/MS methods can provide excellent trueness and precision, but they usually involve several sample preparation steps and consume appreciable volumes of organic solvents. Although efforts have been made to develop greener HPLC approaches using aqueous mobile phases,18 analysis times remain relatively long, making such methods less suitable for rapid screening of large numbers of samples.
Recently, vibrational spectroscopy combined with chemometrics has attracted considerable attention as a route towards rapid, low-solvent and potentially greener analysis.19 Among these techniques, attenuated total reflection Fourier transform infrared spectroscopy (ATR-FTIR) is particularly attractive for solid oral dosage forms due to non-destructive, fast measurements and minimal sample preparation.20–24 As a result, ATR-FTIR is compatible with process analytical technology concepts and can, in principle, be integrated into manufacturing or quality-control workflows.25 Several studies have shown that FTIR or ATR-FTIR, when coupled with multivariate calibration methods such as principal component analysis (PCA), partial least squares (PLS) or principal component regression (PCR), can be used for quantitative analysis of active ingredients and impurities in tablets, powders and biological samples.26–31 In many cases, the analytical performance with ATR-FTIR has been sufficient for screening purposes and, in some situations, for partial replacement of conventional wet-chemical methods.
In the field of pharmaceutical quality control, ATR-FTIR and related techniques have already been applied to a variety of tasks. Rapid quantification of paracetamol in tablets and detection of substandard or falsified products have been reported using reflectance IR in combination with multivariate models.32 Other studies have relied on FTIR/ATR-FTIR, combined with more advanced chemometric tools, to differentiate between expired and compliant tablets or to characterise counterfeit products containing undeclared active substances.33 These examples confirm that ATR-FTIR, when combined with chemometrics, can provide a fast and non-destructive means of screening pharmaceutical products within a green-analysis framework.
For LD-based pharmacotherapy, a range of analytical procedures alternative to HPLC has also been investigated, including UV-vis spectrophotometry,34,35 fluorescence spectroscopy,36 capillary electrophoresis,18,37–39 and electroanalytical methods.40–42 UV-vis methods combined with PLS or PCR, as well as voltammetric procedures assisted by multivariate calibration, have been reported for binary and ternary mixtures containing levodopa and related drugs.34,35 With respect to ATR-FTIR, simultaneous determination of LD and CD in LD–CD formulations has been described in solution using multivariate models built on selected wavenumber regions.43 These works demonstrate that ATR-FTIR has potential for analysing LD/CD systems. However, they mainly concerned binary mixtures, relatively simple matrices or solution-phase measurements, and did not address the transfer of calibration models from laboratory mixtures to commercial tablets with complex excipient compositions.
In parallel, a second line of development has focused on classification applications rather than quantitative analysis. ATR-FTIR and FTIR have been employed, often in combination with pattern recognition methods and modern classifiers, to distinguish compliant from non-compliant products, detect expired lots, or flag counterfeit medicines. Most of these studies focus on qualitative discrimination or on quantifying a single active ingredient. The simultaneous determination of several active substances in Parkinson's tablets, in the presence of complex excipient backgrounds and with clear differences between calibration and commercial samples, has not been studied in depth.
A further challenge arises when models constructed under controlled laboratory conditions are transferred to routine analysis. Differences in excipient type and ratio, particle size, compaction force, moisture content and contact conditions can all influence ATR-FTIR spectra of tablets. Calibration models built on synthetic mixtures with a single excipient profile may therefore lose predictive ability when they are applied to tablets from different manufacturers. This situation can be described as calibration transfer and domain mismatch. Recent work in pharmaceutical spectroscopy has emphasised the importance of model updating and calibration transfer strategies in order to mitigate these effects while maintaining the validity of predictions.
In view of these considerations, the present study was conducted to develop an integrated ATR-FTIR/machine learning workflow for simultaneous identification and quantification of LD, CD and BZ in Parkinson's tablets. A two-stage hurdle structure was used so that the presence/absence of each analyte and its concentration could be treated separately. The workflow included replicate-level quality control, systematic comparison of preprocessing pipelines, nested cross-validation for model selection and assessment of calibration transfer from synthetic to commercial tablets. By addressing both the analytical and modelling aspects, the study aims to provide a framework that can be adapted to other multi-API tablet systems in pharmaceutical quality control.
Common tablet excipients, including microcrystalline cellulose, mannitol, magnesium stearate, povidone, talc and maize starch, were supplied by the Vietnam National Institute of Drug Quality Control (Hanoi, Vietnam). These substances were used to prepare model mixtures that mimic the excipient matrix of commercial products. Hydrochloric acid (37%), sodium dihydrogen phosphate (NaH2PO4), phosphoric acid (85%) and ethanol (analytical grade) were purchased from Merck (Germany). Deionised water was produced by a Milli-Q system (Millipore, USA) and was used for the preparation of mobile phases and standard solutions in HPLC analyses.
Four types of synthetic pellets were produced: excipients-only (1 sample), single-API formulations (17 samples, comprising 3 LD-only, 7 BZ-only and 7 CD-only samples), binary combinations (59 samples, comprising 25 LD/BZ, 25 LD/CD and 9 BZ/CD samples), and ternary mixtures (26 samples). In this way, a wide concentration range was obtained within a controlled excipient profile, i.e., LD: 0–80% w/w; BZ: 0–40% w/w; and CD: 0–8% w/w. These concentration ranges were chosen to ensure coverage of at least 50–150% of the API content in commercial tablet formulations. The nominal and HPLC-DAD measured API contents in these synthetic pellets are presented in Table S1 in the SI. For the synthetic pellets, a zero concentration of a given API corresponds to a formulation in which that API was not added to the blend; in these samples, the corresponding HPLC-DAD peak was below the lower limit of the calibration range and was therefore treated as true absence.
Each pellet was analysed at four different positions on the surface, and two replicate spectra were recorded at each position. The same protocol was applied to synthetic and commercial pellets to ensure comparability between domains. In total, 824 spectra of synthetic pellets and 72 spectra of commercial pellets were acquired.
The method showed linearity in the range 10–500 mg L−1. Intra-day repeatability and inter-day precision, expressed as relative standard deviations of peak areas, were below 1% and 5%, respectively. Accuracy was evaluated by standard addition experiments in excipient matrices, with recoveries ranging from 92% to 105%.
For pellet analysis, an accurately weighed portion of each compressed sample was transferred into a volumetric flask and dissolved in 5 mL of 0.1 M H3PO4. The resulting solution was diluted to bring the concentrations within the calibration range and filtered through a 0.22 µm PTFE membrane before injection into the HPLC system. The concentrations measured by HPLC-DAD were used as reference values for all modelling steps.
To reflect this structure, a two-stage hurdle model was adopted. For each analyte, a classifier was first applied to distinguish presence (concentration y > 0) from absence (y = 0). A regression model was then fitted using only the samples with y > 0. During prediction, a zero value was assigned when the classifier indicated absence; otherwise, the output of the regressor was reported as the predicted concentration. In this way, classification and regression tasks were separated, and the effect of structural zeros was handled explicitly.
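The gate-then-regress logic described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: `HurdleModel`, `fit_midpoint_gate` and `fit_line` are hypothetical names, the toy estimators stand in for the actual classifiers and regressors, and the data are invented.

```python
from statistics import mean


class HurdleModel:
    """Two-stage hurdle model: presence/absence gate, then a regressor."""

    def __init__(self, fit_classifier, fit_regressor, threshold=0.5):
        # fit_* are hypothetical factories: they take training data and
        # return a callable that scores or predicts a single spectrum.
        self.fit_classifier = fit_classifier
        self.fit_regressor = fit_regressor
        self.threshold = threshold

    def fit(self, X, y):
        labels = [1 if v > 0 else 0 for v in y]
        self.score_ = self.fit_classifier(X, labels)
        # The regressor only ever sees samples in which the analyte is present.
        X_pos = [x for x, v in zip(X, y) if v > 0]
        y_pos = [v for v in y if v > 0]
        self.regress_ = self.fit_regressor(X_pos, y_pos)
        return self

    def predict(self, X):
        # Report zero when the gate says "absent", else the regressor output.
        return [self.regress_(x) if self.score_(x) >= self.threshold else 0.0
                for x in X]


def fit_midpoint_gate(X, labels):
    """Toy classifier: cut the first feature midway between the class means."""
    pos = [x[0] for x, lab in zip(X, labels) if lab == 1]
    neg = [x[0] for x, lab in zip(X, labels) if lab == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1.0 if x[0] > cut else 0.0


def fit_line(X, y):
    """Toy regressor: least-squares line on the first feature."""
    xs = [x[0] for x in X]
    mx, my = mean(xs), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, y))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x: my + slope * (x[0] - mx)


# Invented one-feature "spectra" with two structural zeros.
X_train = [[0.0], [0.2], [2.0], [3.0], [4.0]]
y_train = [0.0, 0.0, 20.0, 30.0, 40.0]
model = HurdleModel(fit_midpoint_gate, fit_line).fit(X_train, y_train)
predictions = model.predict([[0.1], [2.5]])
```

Keeping the two stages behind separate callables means the classifier and regressor can be swapped independently for each analyte.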
The overall modelling workflow was organised into four modules: (i) quality control of spectral replicates and aggregation of representative spectra for each sample; (ii) systematic screening of preprocessing pipelines; (iii) selection and optimisation of classifiers and regressors within the hurdle framework; and (iv) application of the final model to commercial tablets with and without calibration transfer (Fig. 1). Repeated nested cross-validation with grouping by sample ID (3 repeats × 5 outer folds) was employed in modules 2 and 3 to separate model selection from performance estimation and to prevent information leakage between replicates within the same tablet.
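The grouping constraint can be illustrated with a small splitter in which every spectrum of a given sample ID falls on the same side of each split. This is a sketch of the outer loop only, under stated assumptions: `repeated_group_kfold` is a hypothetical helper, the seed is arbitrary, and the inner tuning loop is omitted.

```python
import random


def repeated_group_kfold(groups, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs; a group never spans both sides."""
    rng = random.Random(seed)
    unique = sorted(set(groups))
    for _ in range(n_repeats):
        shuffled = unique[:]
        rng.shuffle(shuffled)
        # Deal the shuffled group IDs into n_splits folds of near-equal size.
        folds = [shuffled[i::n_splits] for i in range(n_splits)]
        for fold in folds:
            held_out = set(fold)
            test_idx = [i for i, g in enumerate(groups) if g in held_out]
            train_idx = [i for i, g in enumerate(groups) if g not in held_out]
            yield train_idx, test_idx


# 103 sample IDs with 8 replicate spectra each, as in the acquisition protocol.
groups = [sample_id for sample_id in range(103) for _ in range(8)]
splits = list(repeated_group_kfold(groups))
```

With 3 repeats of 5 folds, the generator produces 15 outer test folds.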
Fig. 1 Flowchart of the integrated ATR-FTIR and machine learning workflow for simultaneous quantification of LD, CD, and BZ in Parkinson's medication.
After outlier rejection, mean and median spectra were calculated for each pellet ID. These representative spectra, together with the corresponding reference concentrations, formed two cleaned datasets (mean-based and median-based) that were used in the subsequent evaluation of preprocessing pipelines. Both datasets are provided in the SI data file.
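The aggregation step amounts to a point-by-point reduction over the retained replicates of each pellet. A minimal sketch, assuming spectra are stored as equal-length lists; `aggregate_spectra` and the toy values are hypothetical:

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_spectra(spectra, pellet_ids, reducer):
    """Collapse replicate spectra into one representative spectrum per pellet."""
    by_pellet = defaultdict(list)
    for spectrum, pellet_id in zip(spectra, pellet_ids):
        by_pellet[pellet_id].append(spectrum)
    # zip(*replicates) walks the replicates wavenumber by wavenumber.
    return {pid: [reducer(column) for column in zip(*replicates)]
            for pid, replicates in by_pellet.items()}


# Invented two-point spectra: three replicates of P1, two of P2.
spectra = [[1.0, 2.0], [3.0, 4.0], [8.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
pellet_ids = ["P1", "P1", "P1", "P2", "P2"]
mean_set = aggregate_spectra(spectra, pellet_ids, mean)
median_set = aggregate_spectra(spectra, pellet_ids, median)
```

In this toy case the third P1 replicate pulls the mean of the first point to 4.0 while the median stays at 3.0, which is the kind of difference the two datasets are meant to expose.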
Performance was assessed by repeated nested cross-validation with grouping by pellet ID (3 repeats × 5 outer folds). In the inner loop, the regularisation parameter C for logistic regression and the number of latent variables for PLS were tuned (Table S2). In the outer loop, classification and regression metrics were calculated. For the classifier, balanced accuracy (BAcc), F1 score and area under the ROC curve (AUC) were recorded. For the regression, three error measures were used, including RMSE0 for truly zero samples, RMSEpos for all truly positive samples, and RMSEpos,TP for positive samples correctly classified as positive.
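The three regression errors and the balanced accuracy can be computed from the final hurdle predictions as sketched below. One assumption is made loudly here: predicted presence is inferred from a non-zero prediction, which holds only if the regressor never returns exactly zero. `hurdle_metrics` and the example values are hypothetical.

```python
from math import sqrt


def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return sqrt(sum((p - t) ** 2 for t, p in pairs) / len(pairs))


def hurdle_metrics(y_true, y_pred):
    """RMSE0, RMSEpos, RMSEpos,TP and balanced accuracy for one analyte.

    Assumes both classes are present and at least one true positive exists.
    """
    zero = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t > 0]
    pos_tp = [(t, p) for t, p in pos if p > 0]  # present and predicted present
    sensitivity = len(pos_tp) / len(pos)
    specificity = sum(1 for _, p in zero if p == 0) / len(zero)
    return {
        "RMSE0": rmse(zero),
        "RMSEpos": rmse(pos),
        "RMSEpos_TP": rmse(pos_tp),
        "BAcc": (sensitivity + specificity) / 2,
    }


# Invented values: one false positive (2 for a true 0), one false negative.
metrics = hurdle_metrics([0, 0, 10, 20, 30], [0, 2, 12, 0, 27])
```

The false negative at 20% w/w dominates RMSEpos but is excluded from RMSEpos,TP, which is why the two measures are reported separately.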
To compare pipelines, a rank was assigned to each configuration within each outer fold based on a composite criterion that combined the three classification metrics and the three regression errors, as described in SI Text S1. Mean ranks and their standard deviations were then calculated across the 15 outer test folds for each pipeline. Pipelines with low mean rank and low variability were regarded as more favourable. The pipeline that offered the best compromise between classification and regression across the three analytes was selected for use in module 3.
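The rank aggregation can be sketched as below. The composite criterion itself is defined in the SI and is not reproduced here; the sketch simply assumes that each pipeline already has one composite score per fold, with lower values being better. The scores are invented and ties are not handled.

```python
from statistics import mean, stdev


def rank_within_fold(scores):
    """Rank configurations in one fold; rank 1 is the lowest composite score."""
    ordered = sorted(scores, key=scores.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def summarise_ranks(fold_scores):
    """Mean rank and sample standard deviation per configuration."""
    ranks = [rank_within_fold(scores) for scores in fold_scores]
    return {name: (mean(r[name] for r in ranks), stdev(r[name] for r in ranks))
            for name in fold_scores[0]}


# Invented composite scores for three pipelines over three folds.
fold_scores = [
    {"raw + MSC": 1.3, "SG1 + SNV": 1.1, "SG2 + SNV": 1.9},
    {"raw + MSC": 1.2, "SG1 + SNV": 1.4, "SG2 + SNV": 2.0},
    {"raw + MSC": 1.5, "SG1 + SNV": 1.0, "SG2 + SNV": 1.8},
]
summary = summarise_ranks(fold_scores)
best_pipeline = min(summary, key=lambda name: summary[name][0])
```

Ranking within folds before averaging makes the comparison insensitive to how difficult an individual fold happens to be.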
Classifier performance was evaluated in the outer folds using BAcc, F1 score and AUC. For each analyte–region combination, a combined rank was calculated as a weighted sum of the ranks for BAcc and F1 score, with a higher weight assigned to balanced accuracy (Text S2). This calculation reflected the role of the classifier as the entry gate to the hurdle model, where balanced treatment of the present and absent classes is important. The classifiers with the lowest overall rank for a given analyte and spectral region were retained for further consideration.
The regressors were evaluated in the outer folds using RMSE0, RMSEpos and RMSEpos,TP. For each configuration, composite ranks were calculated from these three error measures, and the averages and standard deviations of the ranks were determined across outer folds (Text S3). Models were compared separately for each analyte and also at a global level. Special attention was paid to the difference between single-output and multi-output modes and to the stability of the rankings.
In this step, the combined model was optimised end-to-end. During the inner loop of the nested cross-validation, the hyperparameters of both the classifier and regressor components, as well as the classification thresholds, were adjusted simultaneously according to a multi-criteria objective (Table S4). This objective included constraints on RMSEpos in addition to the classification metrics. For each parameter, a consensus value was obtained by taking the median (for continuous parameters) or the most frequent value (for categorical parameters) across the outer folds. These consensus settings were then used to refit the hurdle model on the full synthetic dataset before application to commercial samples.
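The consensus rule can be sketched as follows, assuming the per-fold optima are available as dictionaries. Treating every numeric parameter as continuous is an assumption of this sketch, and the parameter values are invented.

```python
from numbers import Real
from statistics import median, mode


def consensus(per_fold_params):
    """Median for numeric parameters, most frequent value for the rest."""
    result = {}
    for key in per_fold_params[0]:
        values = [params[key] for params in per_fold_params]
        numeric = all(isinstance(v, Real) and not isinstance(v, bool)
                      for v in values)
        result[key] = median(values) if numeric else mode(values)
    return result


# Invented optima from four outer folds.
per_fold = [
    {"n_estimators": 400, "max_features": 0.5, "kernel": "rbf"},
    {"n_estimators": 450, "max_features": 0.3, "kernel": "linear"},
    {"n_estimators": 410, "max_features": 0.5, "kernel": "rbf"},
    {"n_estimators": 500, "max_features": 0.8, "kernel": "rbf"},
]
settings = consensus(per_fold)
```

A median over an even number of folds can yield a non-integer value for an integer-valued parameter, so such values may need rounding before refitting.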
For commercial samples, two scenarios were examined. In the first scenario, the model was refitted on the full synthetic dataset and applied directly to the ATR-FTIR spectra of three selected commercial pellets (S1–S3) from three brands without any further adjustment. Before refitting, the classification thresholds were updated using out-of-fold predictions on the synthetic domain by maximising Youden's J statistic.
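Youden's J statistic is defined as sensitivity + specificity − 1, and the threshold update selects the cut-off that maximises it. A minimal sketch over out-of-fold scores, where `youden_threshold` is a hypothetical helper and the labels and scores are invented:

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, -1.0
    # Every observed score is a candidate cut-off; "present" means score >= t.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Invented out-of-fold probabilities: four absent and five present samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
oof_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9, 0.95]
threshold, j_stat = youden_threshold(labels, oof_scores)
```

Here the selected threshold is 0.7, which trades one missed positive for perfect specificity.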
In the second scenario, a calibration transfer strategy based on repeated product-balanced splits was used. The ATR-FTIR spectra from the six remaining commercial pellets (two additional batches per product brand) served as a small transfer set. A total of 100 random splits were generated. In each split, one pellet from each product brand was assigned to the calibration transfer training subset (three samples in total), and the remaining three pellets were used as the calibration transfer test subset. The synthetic dataset (103 samples) and the three additional training samples were combined and used to update the hurdle model. The updated model was then applied to the three additional test samples and to the external evaluation samples.
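A product-balanced split draws one transfer pellet per brand for training and leaves the other for testing. The sketch below assumes two pellets per brand; the brand and batch names and the seed are hypothetical.

```python
import random


def product_balanced_splits(pellets_by_brand, n_splits=100, seed=0):
    """Draw one pellet per brand for training; the rest form the test subset."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        train = [rng.choice(pellets) for pellets in pellets_by_brand.values()]
        test = [p for pellets in pellets_by_brand.values()
                for p in pellets if p not in train]
        splits.append((train, test))
    return splits


# Hypothetical identifiers for the six transfer pellets.
transfer_set = {
    "brand_A": ["A_batch2", "A_batch3"],
    "brand_B": ["B_batch2", "B_batch3"],
    "brand_C": ["C_batch2", "C_batch3"],
}
splits = product_balanced_splits(transfer_set)
```

With two pellets per brand only eight distinct assignments exist in this sketch, so repeated draws revisit the same splits; the spread across draws then reflects how sensitive the update is to which pellet represents each brand.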
For each analyte and each commercial batch, prediction errors with respect to the HPLC-DAD reference values were calculated. For the direct application scenario, single point estimates were obtained for S1–S3. For the calibration transfer scenario, the distributions of predicted values across the 100 splits were summarised by the median and the interquartile range. Recall, precision and the confusion matrix of the classifier component were also examined to verify that errors in presence/absence assignment did not dominate overall prediction performance.
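The error and summary calculations can be sketched as follows. The relative bias follows the definition given in the footnote to the results table, and the first example uses the LD values reported for sample S1; the quartile convention and the list of split-wise predictions are assumptions of this sketch.

```python
from statistics import median, quantiles


def percent_bias(predicted, reference):
    """Relative bias (%) of a model prediction against the reference value."""
    return (predicted - reference) / reference * 100.0


def median_iqr(values):
    """Median and (Q1, Q3) using inclusive quartiles (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# LD in sample S1: direct model 58.44% w/w against HPLC-DAD 51.97% w/w.
bias_ld_s1 = percent_bias(58.44, 51.97)

# Invented predictions from five calibration transfer splits (% w/w).
split_predictions = [53.0, 54.0, 54.5, 55.0, 56.0]
centre, (q1, q3) = median_iqr(split_predictions)
```

The first call reproduces the reported bias of about 12.45% for LD in S1.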
C=O stretching), 1515 cm−1 (N–H bending), 1120–1260 cm−1 and 720–870 cm−1 (C–O stretching and out-of-plane C–H bending). A broad band near 3300 cm−1 was attributed to O–H stretching of the phenolic group. On the other hand, the CD spectrum showed a similar overall shape, but with shifts to slightly lower wavenumbers. Prominent peaks were seen at approximately 1705 cm−1 (C=O), 1620 cm−1 (N–H) and 1220 cm−1 (C–O). Moreover, several intense bands were present in the BZ spectrum, including 1660 cm−1 (amide C=O stretching), 1600 cm−1 (aromatic C=C stretching) and 1450 cm−1 (C–N–H bending). A broad absorption between 3200 and 3400 cm−1 arose from overlapping O–H and N–H stretching vibrations, which indicated strong hydrogen bonding. In the excipient mixture, characteristic bands were located at ∼3400 cm−1 (O–H stretching of cellulose and starch), 2850–2920 cm−1 (C–H stretching) and strong peaks near 1450 cm−1 and 1050 cm−1 (C–O–C and C–O–H vibrations). These spectral features confirmed that the fingerprint and half regions contain the main signals that distinguish the three APIs from the excipient background.
Overall, there was substantial spectral overlap, particularly between LD and CD in the 1100–1600 cm−1 region, arising from their similar hydroxyl, amine, and carboxyl groups. Moreover, excipients, mainly cellulose-based materials, exhibit strong absorptions within 900–1200 cm−1, overlapping with key vibrational regions of the APIs and introducing additional background interference. Such complexity limits the applicability of conventional univariate approaches based on single-peak intensity measurements. Therefore, advanced chemometric and machine learning strategies are required to resolve overlapping information and capture subtle multivariate relationships embedded in the full spectral dataset.
As a result, 54 replicates (6.55%) were rejected. At least one replicate was removed from 36 of the 103 pellet IDs, whereas 67 IDs retained all replicates, and most affected IDs lost only one or two replicates. Of the 54 rejected replicates, 22 (40.74%) exceeded only the T2 limit, 20 (37.04%) exceeded only the Q limit, and 12 (22.22%) exceeded both limits. This pattern indicates that the two outlier criteria capture complementary types of spectral abnormalities. After outlier removal, mean and median spectra were computed for each pellet ID. These representative spectra were used as input for module 2.
On average across pipelines, mean spectra outperformed median spectra, with mean global ranks of 6.71 and 10.29, respectively (Table S5). This observation suggested that mean spectra retain more chemical information after effective outlier removal. Both SNV and multiplicative scatter correction (MSC) were evaluated in combination with the different SG filter options, and the final choice of SG first derivative followed by SNV on mean spectra reflects a compromise between baseline correction, band-shape preservation and robustness of quantitative predictions under the present ATR-FTIR conditions. In addition, pipelines based on SG second derivatives showed superior classification metrics in several cases but resulted in substantially higher regression errors, particularly for LD. This trade-off was consistent with the known tendency of higher-order derivatives to sharpen band-shape differences while amplifying noise.
At the analyte level, SG first-derivative + SNV was optimal for LD and BZ, yielding the lowest RMSEpos,TP (9.377 and 3.384, respectively) and good classification metrics (BAcc > 0.92, F1 > 0.95, and AUC > 0.98 for both analytes). For CD, raw + MSC performed marginally better than SG first-derivative + SNV (RMSEpos,TP = 1.269 vs. 1.337, respectively), but the difference was small compared to the advantages of the latter pipeline for LD and BZ. Finally, the pipeline SG first derivative + SNV applied to mean spectra yielded the lowest global rank (3.80) and acceptable stability (SD of 1.253); therefore, it was adopted as the standard preprocessing for module 3.
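The selected preprocessing order, SG first derivative followed by SNV, can be sketched in a few lines. The window length (5 points) and polynomial order (quadratic) used here are assumptions chosen because their convolution coefficients are simple; the study's actual SG settings are not stated in this section, and the edge points are simply dropped.

```python
from statistics import mean, stdev

# Savitzky-Golay first-derivative coefficients for a 5-point quadratic
# window; the sum of products is divided by 10 and by the point spacing.
SG_D1_5PT = (-2.0, -1.0, 0.0, 1.0, 2.0)


def sg_first_derivative(spectrum, step=1.0):
    """Smoothed first derivative; two points are lost at each edge."""
    half = len(SG_D1_5PT) // 2
    return [
        sum(c * v for c, v in zip(SG_D1_5PT, spectrum[i - half:i + half + 1]))
        / (10.0 * step)
        for i in range(half, len(spectrum) - half)
    ]


def snv(spectrum):
    """Standard normal variate: centre and scale each spectrum individually."""
    centre, spread = mean(spectrum), stdev(spectrum)
    return [(v - centre) / spread for v in spectrum]


# Invented quadratic "spectrum": the exact derivative at index i is 2 * i.
raw = [float(i ** 2) for i in range(7)]
derivative = sg_first_derivative(raw)
preprocessed = snv(derivative)
```

Because SNV is applied per spectrum, no statistics from other samples are needed, so the step can be run before cross-validation without leaking information between folds.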
| Analyte | Spectral region | Best classifier | Sensitivity | Specificity | BAcc | F1 | AUC |
|---|---|---|---|---|---|---|---|
| LD | Fingerprint | RF | 0.963 | 0.920 | 0.942 | 0.968 | 0.993 |
| LD | Half | Logistic | 0.951 | 0.960 | 0.955 | 0.967 | 0.993 |
| LD | Full | RF | 0.987 | 0.920 | 0.953 | 0.981 | 1.000 |
| BZ | Fingerprint | Logistic | 0.986 | 0.971 | 0.979 | 0.986 | 0.998 |
| BZ | Half | Logistic | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| BZ | Full | RF | 1.000 | 0.975 | 0.988 | 0.992 | 1.000 |
| CD | Fingerprint | SVC | 0.972 | 0.844 | 0.908 | 0.940 | 0.961 |
| CD | Half | RF | 0.933 | 0.876 | 0.905 | 0.942 | 0.958 |
| CD | Full | RF | 0.919 | 0.876 | 0.898 | 0.934 | 0.956 |
The results in Table 1 indicated that no single classifier was optimal across all analytes and regions. For LD, RF was the most reasonable classification model in the fingerprint and full regions, with BAcc ≈ 0.94–0.95 and AUC close to 1.0, whereas in the half region, logistic regression achieved the best rank, with slightly higher BAcc (0.955) even though RF had marginally better F1 and AUC. This outcome suggested that, under the chosen preprocessing, the decision boundary for LD presence/absence in the half region was approximately linear, while the two other regions appeared to benefit more from the ability of RF to capture additional nonlinear interactions. For BZ, logistic regression was clearly the foremost classifier, achieving perfect performance (AUC = F1 = BAcc = 1.000) in the half region and also ranking best in the fingerprint region, whereas RF performed only marginally better in the full region (BAcc = 0.9875 vs. 0.9808). This result suggested that BZ was the easiest analyte to classify, with the presence/absence classes well separated after preprocessing, so that a simple linear classifier sufficed.
In contrast, CD was the most challenging analyte for classification. In the fingerprint region, SVC was the optimal classifier (BAcc = 0.9084, F1 = 0.9401, AUC = 0.9611), while RF was the best-performing model in the half and full regions, with BAcc of 0.905 and 0.898, respectively. This finding suggested that SVC could capture a locally non-linear decision structure in the CD fingerprint region, but that interactions across multiple spectral bands were better captured by RF when a larger number of variables was included. The lower AUC and BAcc values observed for CD were attributed to its low concentration range and to the greater overlap of its spectral bands with those of LD, the most abundant API. The nine best region–classifier combinations (one per analyte and spectral region) were carried forward to the selection of regressors.
At the analyte level, however, different modelling behaviours were observed. For LD, the best performance was obtained with RF single-output in the half region (RMSEpos,TP = 8.635), although PLS single-output in the fingerprint region produced a comparable result (RMSEpos,TP = 8.655), suggesting that both linear and non-linear components contributed to the spectral–concentration relationship. For BZ, RF was clearly dominant, with the three best configurations all corresponding to RF single-output models in the half, full, and fingerprint regions (RMSEpos,TP = 3.013, 3.103, and 3.144, respectively), thereby confirming the strong advantage of non-linear modelling for this API. In contrast, CD was best modelled by latent-variable linear methods, with the lowest RMSEpos,TP obtained for PLS single-output in the full region (1.353), whereas the globally best RF configuration gave a slightly higher value (1.479). Taken together, the results of step 3.2 identified RF single-output in the half region as the most reasonable choice, while also demonstrating that the full hurdle model should be fine-tuned in step 3.3.
The fine-tuning results in step 3.3 showed that the outer-fold performance of the final model changed only marginally relative to that of the best configuration identified in step 3.2. Specifically, the RMSEpos,TP for LD remained essentially unchanged (8.624 vs. 8.635), while only minor variations were observed for BZ and CD. This finding indicates that the configuration selected in step 3.2 was already close to optimal, and that step 3.3 primarily served as a final stage of parameter stabilisation and threshold normalisation rather than as a source of substantial improvement in predictive accuracy. A further notable observation was the analyte-dependent difference in threshold and hyperparameter stability. The greatest threshold fluctuation across folds was observed for LD, whereas CD showed markedly greater stability. At the same time, the optimal RF regression hyperparameters differed among analytes. For example, larger values of min_samples_leaf were generally selected for LD, indicating a greater need for regularisation, whereas for CD, larger values of max_features were favoured, suggesting that its quantitative information was distributed across a broader range of spectral variables (Table 2). These findings further supported the use of a single-output regression structure, as they indicated that, despite being processed within a common analytical framework, the three analytes retained different predictive characteristics.
| Analyte | Preprocessing pipeline | Spectral type | Spectral region | Classifier model | Classifier hyperparameters | Regressor model | Regressor hyperparameters |
|---|---|---|---|---|---|---|---|
| BZ | SG first derivative + SNV | Mean | Half | Logistic | C = 6.639 | RF | max_depth = 21.5; max_features = 0.5; min_samples_leaf = 2.0; n_estimators = 426.0 |
| CD | SG first derivative + SNV | Mean | Half | RF | max_depth = 21.5; max_features = 0.2; min_samples_leaf = 4.0; n_estimators = 1135.0 | RF | max_depth = 35.0; max_features = 0.8; min_samples_leaf = 2.0; n_estimators = 531 |
| LD | SG first derivative + SNV | Mean | Half | Logistic | C = 13.163 | RF | max_depth = 25.0; max_features = 0.3; min_samples_leaf = 6.0; n_estimators = 406 |
Fig. 3 Parity plots of (a) LD, (b) CD, and (c) BZ on the full calibration set (103 samples) at the optimal conditions of the two-stage hurdle model.
For quantification, the predicted LD content ranged from 37.37 to 58.44% w/w, corresponding to errors of −1.61% to 14.74% relative to the HPLC-DAD results (Table 3). For BZ, two products were correctly identified as BZ-free, whereas the remaining sample was predicted to contain 10.56% w/w, with an error of 10.58%. In the two LD/CD products, the predicted CD contents were 4.14 and 4.16% w/w, corresponding to errors of −17.69% and −17.79%, respectively. Overall, acceptable agreement was obtained for the LD/BZ formulation, whereas larger deviations were observed for the LD/CD formulations, with LD tending to be overestimated and CD underestimated. This pattern was likely related to closer matrix similarity between the synthetic calibration set and the LD/BZ commercial product than between the synthetic samples and the LD/CD products. Accordingly, although the model showed promising performance for screening, the observed errors remained too large for direct replacement of HPLC-DAD, particularly in formulations containing LD and CD.
| Analyte | Quantity | S1 | S2 | S3 |
|---|---|---|---|---|
| LD | Labelled content a (% w/w) | 50.89 | 36.38 | 48.48 |
| LD | HPLC-DAD (% w/w) | 51.97 | 37.98 | 48.71 |
| LD | Direct model (% w/w) | 58.44 | 37.37 | 55.89 |
| LD | Bias b (%) | 12.45 | −1.61 | 14.74 |
| CD | Labelled content a (% w/w) | 4.85 | | 5.09 |
| CD | HPLC-DAD (% w/w) | 5.03 | 0 | 5.06 |
| CD | Direct model (% w/w) | 4.14 | 0 | 4.16 |
| CD | Bias b (%) | −17.69 | — | −17.79 |
| BZ | Labelled content a (% w/w) | | 9.10 | |
| BZ | HPLC-DAD (% w/w) | 0 | 9.55 | 0 |
| BZ | Direct model (% w/w) | 0 | 10.56 | 0 |
| BZ | Bias b (%) | — | 10.58 | — |

a Labelled content = mass of API claimed on label/mass of a tablet × 100 (%). b Bias = (concentration predicted by model − concentration measured by HPLC-DAD)/concentration measured by HPLC-DAD × 100 (%).

| Analyte | Quantity | S1 | S2 | S3 |
|---|---|---|---|---|
| LD | Calibration transfer (% w/w) | 54.10 | 34.41 | 55.40 |
| LD | IQR (% w/w) | 53.21–54.33 | 33.2–34.47 | 51.93–57.15 |
| LD | Bias (%) | 4.09 | −9.39 | 13.74 |
| CD | Calibration transfer (% w/w) | 4.48 | 0 | 4.53 |
| CD | IQR (% w/w) | 4.38–4.52 | 0–0 | 4.34–4.79 |
| CD | Bias (%) | −10.88 | 0 | −10.56 |
| BZ | Calibration transfer (% w/w) | 0 | 9.97 | 0 |
| BZ | IQR (% w/w) | 0–0 | 9.97–10.59 | 0–0 |
| BZ | Bias (%) | 0 | 4.41 | 0 |
Fig. 4 Correlation plot between the analytical results of the ATR-FTIR/hurdle model and the HPLC-DAD method for a set of three tablet samples.
Although most errors were substantially reduced, some large deviations remained, especially for LD in sample S3, indicating that formulation-specific domain mismatch was not fully captured with only three transfer samples per split. Nevertheless, the considerable reduction in CD bias after transfer indicates that this analyte benefited particularly from recalibration with real samples. Taken together, these results show that calibration transfer improved cross-domain robustness and brought the ATR-FTIR/machine learning approach closer to a level suitable for use as a screening or secondary method to support HPLC-DAD in the quality control of Parkinson's disease tablets containing LD/BZ or LD/CD.
Under these conditions, both methods yielded AGREE scores above 0.5, but the ATR-FTIR approach showed clearly superior greenness relative to HPLC-DAD (0.75 vs. 0.57, Fig. 5). For ATR-FTIR, criterion 3 received a score of 0 because measurements were performed in an off-line configuration rather than in situ, although this limitation could be mitigated with portable FTIR systems. Criterion 8 also received a score of 0 because inclusion of the 109 calibration samples increased the average analysis time for the three evaluated samples to 150 min; however, this penalty is expected to decrease as sample throughput increases, given that the actual measurement time per sample was only 4 min. The HPLC-DAD score differed only slightly from that reported previously18 (0.57 vs. 0.58), mainly because the present assessment was based on analysis of a 30 mg pellet rather than a 500 mg tablet and explicitly incorporated the calibration curve. Taken together, these results indicate that, despite the relatively large number of samples required for model construction, the ATR-FTIR/machine learning strategy developed in the present study remains a rapid and environmentally favourable approach for pharmaceutical screening and quality-control applications.
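The dependence of the criterion 8 penalty on throughput can be illustrated with simple arithmetic. The amortisation formula below is an assumption rather than the stated AGREE input: it spreads the measurement time of the calibration samples evenly over the evaluated samples, which gives a value close to the reported 150 min.

```python
def amortised_minutes_per_sample(n_calibration, n_evaluated, minutes_each):
    """Average time per evaluated sample when calibration runs are included.

    Assumed formula: every calibration and evaluated sample takes the same
    measurement time, and the total is divided by the evaluated samples.
    """
    return (n_calibration + n_evaluated) * minutes_each / n_evaluated


# 109 calibration samples and 4 min per measurement, as stated in the text.
three_samples = amortised_minutes_per_sample(109, 3, 4)
# Hypothetical larger batch of 100 routine samples on the same calibration.
hundred_samples = amortised_minutes_per_sample(109, 100, 4)
```

Under this assumption the average falls from about 149 min for three samples to under 9 min for a batch of 100, which is the throughput effect described above.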
Fig. 5 Greenness pictogram of (a) ATR-FTIR/machine learning and (b) HPLC-DAD methods for simultaneous determination of LD, CD, and BZ, scored with the AGREE calculator tool.
Future development should therefore focus on expanding the calibration domain by adding synthetic tablets with various excipient profiles, as well as additional commercial brands and production batches. With such extensions, the present workflow could evolve into a broadly applicable framework for quantitative multi-API quality control of solid pharmaceutical products.
Direct application of the final model to commercial pellets yielded excellent classification of API presence/absence, whereas quantitative performance varied by analyte and formulation. When a repeated-product balanced calibration transfer strategy was implemented with only six additional commercial pellets, median biases were reduced for all three APIs, with a particularly pronounced improvement for CD. The results demonstrated that ATR-FTIR, combined with a carefully validated hurdle model framework, can provide a rapid and green screening tool for multi-API Parkinson's medications and suggested a general strategy that may be extended to other complex solid dosage forms once broader calibration domains and more extensive calibration transfer schemes are implemented.
This journal is © The Royal Society of Chemistry 2026