Kai Zhao †a, Xiting Peng †a, Ran Tao †a, Yuchen Su c, Shiyue Huang c, Hang Fu d, Xiaonan Wang *ab and Shanying Hu *a
aDepartment of Chemical Engineering, Tsinghua University, Beijing 100084, P. R. China. E-mail: wangxiaonan@tsinghua.edu.cn; hxr-dce@tsinghua.edu.cn
bInstitute for Carbon Neutrality, Tsinghua University, Beijing 100084, P. R. China
cDepartment of Computer Science, University of Electronic Science and Technology of China, Chengdu, Sichuan 610056, P. R. China
dSchool of Resources and Environment, Nanchang University, Nanchang, 330031, P. R. China
First published on 19th December 2025
Understanding the core scientific information of plastic-related chemicals is critical for addressing chemical risks in plastic pollution. However, existing databases exhibit substantial information gaps in key dimensions such as chemical composition, toxicity, and functional properties. Bridging these gaps is essential for enabling robust scientific assessment and evidence-based policy making. This study establishes an integrated artificial intelligence (AI)-based technical framework to address these deficiencies. First, a large language model (LLM)-based workflow was developed to parse chemical composition data (resulting in a high-granularity dataset of 20 618 chemical entries), enhancing both granularity and coverage. Second, lightweight machine learning (ML) models were developed to efficiently impute missing toxicity labels for seven toxicity indicators. Third, fuzzy search (common sequence method) and exact search (rule-based additive identifier) methods were implemented to enable bottom-up identification of functional labels for plastic additives. Finally, the relationship between functional attributes and toxicity was systematically analyzed, offering new analytical perspectives and methodological support for identifying “chemicals of concern” among plastic additives. To establish an effective science–policy interface, sustained efforts from a broad range of stakeholders are required to enhance the development of high-quality data across the full life cycle of plastic-related chemicals.
Green foundation
1. To provide the foundational scientific information needed to reduce chemical hazards in plastics, we constructed a comprehensive database of plastic-related chemicals powered by an artificial intelligence-based technical framework. The curated dataset enables informed substitution, supports the design of safer additives and strengthens upstream decision-making for greener plastic life cycles.
2. The resulting plastic-related chemical database contains 20
3. The incorporation of high-quality life-cycle data for a broader range of plastic-related chemicals will underpin the green transition of the plastics industry. Meanwhile, the artificial intelligence-based technical framework shows potential for extension to broader hazardous chemical assessments, laying the groundwork for a science-policy interface in green chemistry.
In contrast to other chemicals, plastic additives are deliberately incorporated to impart specific functionalities and reduce production costs.20,21 Additives constitute a significant mass fraction in plastic products (for example, accounting for 20–30% of the weight in polyvinyl chloride (PVC))22 and encompass more than 5000 distinct types22 (Fig. 1). By 2050, the global annual production of plastic additives is projected to reach approximately 99 million tonnes (Mt).2,22 To further enhance the functionality, environmental performance, and economic efficiency of plastic products, a range of novel plastic additives has been developed.23,24 Plastic additives can be broadly classified into 20–30 major categories based on their primary functional attributes.5,20,23–25 Each of these categories can be further subdivided into multiple subcategories according to key chemical structures5,20,23–25 (Fig. 1 and Table S1). During the manufacturing stage, the incorporation of additives depends on specific formulation strategies, which are tailored to the type of primary polymer and the desired end-use properties.25 While additives within a major functional category are generally irreplaceable, selection of subcategories can be optimized according to raw material supply, cost, and regulations, without sacrificing performance.20,23–25 Studies have demonstrated that selecting and optimizing additives from subcategories within the same functional group can both improve economic returns and significantly reduce environmental impacts.5,22 Therefore, classifying additives by function represents the first step toward achieving systematic management of plastic additives.
Fig. 1 Summary of plastic-related chemicals. More details on the categories of additives are available in Table S1 and the SI.
Although several databases related to plastic-associated chemicals (e.g., PlasticMAP1 and PlastChem26) have been developed to identify chemicals of concern,1,26–33 significant limitations remain in the data available for plastic additives. For instance, in the PlastChem database,26 which currently includes the most extensive set of entries, 3370 chemicals (21%) lack any information on chemical composition, and the compositions of mixtures are not structurally standardized. Moreover, 9736 entries (60%) provide no functional category information, with additive classification restricted to broad functional categories and lacking fine-grained descriptors. In addition, 7788 chemicals (48%) contain no toxicity data, and another 4553 entries (28%) report only a single toxicity-related indicator. These omissions hinder effective management of plastic additives. Without toxicity data, essential for assessing the environmental and human health risks of additives,34,35 relationships between toxicity and functional properties cannot be identified, limiting systematic understanding of key additive categories. Despite the indispensable role of plastic additives in enabling plastic performance, the current lack of scientific information on additives remains a major barrier to addressing chemical-related challenges in plastics.36
Herein, we established an integrated AI framework to overcome these informational obstacles of plastic-related chemicals. An LLM-based workflow was developed to parse chemical composition and build a higher-granularity database. ML models for predicting seven toxicity endpoints were developed to support the identification of toxicity of plastic-related chemicals. Both fuzzy search (common sequence method) and exact search (rule-based additive identifier) were implemented for bottom-up identification of the functional categories. All three components of the framework are designed with a single purpose, which is to enrich and complete scientific information for plastic-related chemicals so that the database can better support the identification of chemicals of concern and inform policy decision making. Overall, this study provides foundational scientific insights for governing and mitigating pollution issues from plastic-related chemicals.
Furthermore, detailed data collection encompassing three levels of functional labels was conducted. The first-level labels include (1) additives, (2) monomers, and (3) unknowns or NIAS. For additives, additional refinement was performed to assign second- and third-level functional labels. In this study, a broad definition of plastic additives was adopted: any chemicals that impart functional properties to plastics or facilitate plastic processing were considered additives.20,23,24,37 Based on primary functional attributes, additives were classified into 24 second-level categories by their major functional role (Table S1 in annex 1), which were then subdivided into 83 third-level categories according to their chemical structures (Table S2 in annex 1). The functional labels are derived from classification criteria that are widely accepted within the plastic additives industry. First-level functional labels were sourced from existing plastic chemical databases and have been fully classified. From a broad range of literature and data sources,20,23,24,37 second- and third-level functional labels for 763 chemicals were curated (indicated by a value of 1 in the “existing or predicted” column of annex 2), which were used for functional attribute prediction.
Following the first round of enrichment, 4846 chemicals remained unmatched with any records and lacked definitive chemical composition. These unmatched entries typically represented polymers, mixtures, or reaction products. To further resolve and identify these records, we implemented an LLM-based data processing workflow (Fig. 2a). The first component is prompt design. The prompt framework decomposes the task of parsing chemical compositions into three structured subtasks: identifying the category of the chemical substance (mixture, reaction, polymer), analyzing the main components of mixtures, and determining the common name of the chemical substance after disaggregation (details in section 2.2 of annex 1). The output of each stage is formatted in JSON to enable clear and reproducible downstream processing. This design minimizes linguistic ambiguity, restricts the output format, and adopts task-specific question structures to ensure consistent performance of LLMs when processing diverse chemical descriptions. Then, the name of each unmatched entry among the 4846 chemicals was input into an LLM (GPT-4 Turbo) to run the prompts. For mixtures, each identified component was recorded as an individual sub-entry in the database. The newly parsed components were then queried against PubChem to retrieve their respective chemical information. Each sub-entry contains standardized molecular structure information and is assigned an ID as an independent pure substance under its parent record following the rule “parent record ID_subentry sequence” (for example, Data ID: 75_1). These sub-entries are used for subsequent structure retrieval and property prediction. At the same time, the ID and chemical name of the parent record (for example, Data ID: 75) are retained to preserve contextual information, but they are not used for further analysis. This ensures that the same chemical substance is not analyzed more than once.
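The sub-entry naming rule can be illustrated with a minimal sketch. It assumes the LLM reply is a JSON object with the hypothetical keys `category` and `components`; the actual prompt output schema is described in annex 1 and may differ. The function name `expand_mixture` is likewise illustrative.

```python
import json


def expand_mixture(parent_id, llm_json):
    """Split one parsed mixture into sub-entries named '<parent ID>_<sequence>'.

    The keys "category" and "components" are assumed for illustration only.
    """
    reply = json.loads(llm_json)
    category = reply.get("category", "unknown")
    sub_entries = []
    # Sequence numbers start at 1, matching IDs such as 75_1.
    for seq, name in enumerate(reply.get("components", []), start=1):
        sub_entries.append({
            "data_id": f"{parent_id}_{seq}",
            "parent_id": str(parent_id),  # kept for context, not analyzed again
            "name": name,
            "category": category,
        })
    return sub_entries


# Hypothetical LLM reply for the parent record with Data ID 75.
reply = json.dumps({
    "category": "mixture",
    "components": ["Phosphoric Acid", "4,4'-dihydroxybiphenyl", "Phenol"],
})
entries = expand_mixture(75, reply)
for entry in entries:
    print(entry["data_id"], entry["name"])
```

Each sub-entry name could then be queried against PubChem, while the parent record is retained only as context.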
Fig. 2 (a) Database construction methods and LLM-based workflow for parsing chemical composition; (b) overview of the plastic-related chemicals database.
Meanwhile, we evaluated and validated the performance of the LLM-based workflow. A set of 40 records was randomly selected for manual validation of the results obtained from the LLM-based workflow. These entries covered four naming categories of mixtures (clear, ambiguous, complex, others) and three levels of compositional complexity (two components, three components, and four or more components), ensuring representative coverage. The performance of the workflow was assessed using precision, recall, F1 score, and accuracy. Detailed calculation methods are presented in Table S4 and eqn (1)–(4) in section 2.3 of annex 1. This evaluation focused on whether the LLM successfully identified the relevant substances described in the mixture name (such as monomers, reactants, or major components).
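As a rough illustration of how such component-level metrics could be computed for one record, the sketch below compares the predicted and manually verified component sets. The exact definitions used in the study are given in Table S4 and eqn (1)–(4) of annex 1; here accuracy is assumed to be TP/(TP + FP + FN), since true negatives are not naturally defined for component extraction, and names are matched after simple lower-casing.

```python
def component_metrics(predicted, actual):
    """Set-based precision, recall, F1 and accuracy for one mixture record."""
    pred = {p.strip().lower() for p in predicted}
    true = {a.strip().lower() for a in actual}
    tp = len(pred & true)   # components correctly identified
    fp = len(pred - true)   # components predicted but not present
    fn = len(true - pred)   # components present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumption: accuracy without true negatives.
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


metrics = component_metrics(
    predicted=["Phenol", "Phosphoric acid"],
    actual=["Phenol", "Phosphoric Acid", "4,4'-dihydroxybiphenyl"],
)
print(metrics)
```

In this toy record the model misses one of three components, so precision is 1 while recall and accuracy are both 2/3.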
Manual retrieval of the true chemical components of the mixtures was also performed for further verification. A three-step retrieval strategy was applied to the same sampled records using multiple authoritative chemical databases, including SciFinder, European Chemicals Agency (ECHA) Registration Dossiers, ChemicalBook, and the Chemical Encyclopedia. These data sources were selected because they cover regulated substances, industrial chemicals, and substances with documented structural information. First, when a CAS number was available (for example, Data ID: 40 and Data ID: 75), it was used as the primary search keyword. This allowed direct verification of whether the substance corresponded to a specific compound recorded in regulatory databases, an identifiable mixture of esters, or other known chemical components. Second, for entries without CAS numbers (for example, Data ID: 11 237 and Data ID: 11 552), systematic names and industrial descriptions were used to search databases such as SciFinder and ChemicalBook. This approach enabled partial reconstruction of the structural information for certain substances such as resin esters, fatty acid esters, and inorganic salts, while some materials could not be identified due to generic commercial naming conventions. Finally, the retrieved information was compared with the component lists predicted by the large language model, with details provided in section 2.4 of annex 1.
The classification models were implemented as a three-layer multilayer perceptron (MLP), where molecular feature vectors served as inputs, followed by two fully connected hidden layers with Rectified Linear Unit (ReLU) activations and dropout regularization. The output layer produced either a single logit for the single-task configuration or seven logits simultaneously for the multi-task configuration. Binary cross-entropy loss and the Adam optimizer were used for training, with mini-batch updates and early stopping based on validation loss. Hyperparameters were tuned through random search, and five-fold stratified cross-validation was applied for model evaluation. The ML model can be obtained from the open-source code.
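The stratified five-fold splitting mentioned above can be sketched in plain Python as follows. This is an illustrative stand-in rather than the released code: the function name, the seed, and the round-robin dealing strategy are assumptions, and the open-source implementation may rely on a library routine instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k index lists whose class proportions mirror the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy binary endpoint: 10 toxic (1) and 40 non-toxic (0) chemicals.
labels = [1] * 10 + [0] * 40
folds = stratified_kfold(labels, k=5)
for i, fold in enumerate(folds):
    positives = sum(labels[j] for j in fold)
    print(f"fold {i}: size={len(fold)}, toxic={positives}")
```

Each fold serves once as the validation set while the other four are used for training, so every chemical contributes to exactly one validation score.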
To optimize model performance, we systematically explored multiple molecular representations: extended-connectivity fingerprints (ECFP),38 RDKit topological fingerprints (RDKFP), and molecular ACCess System keys (MACCS).39 When combining multiple fingerprint types, we applied principal component analysis (PCA) to reduce noise and prevent overfitting (Fig. S4). In addition to traditional fingerprints, we also evaluated embeddings from two self-supervised graph models: GROVER,40 a graph transformer trained on over 10 million molecular graphs to capture local and global structure, and MolCLR,41 which uses contrastive learning on augmented molecular graphs to distinguish similar from dissimilar molecules. The detailed parameterization of each molecular representation and the exact feature dimensions used for model input are summarized in section 3.2 of annex 1. We either extracted hidden-layer embeddings from these models as fixed molecular descriptors for the MLP classifier, or fine-tuned them directly on our toxicity prediction tasks.
To provide a more rigorous evaluation of the final model, we performed a new random split of the dataset, holding out 10% of the data as an entirely independent test set. The remaining 90% of the data were used for training and validation following the same procedure as before.
To investigate the relationship between predicted toxicity endpoints and molecular structures, we performed a substructure-level enrichment analysis on the model's predicted labels. This analysis aimed to identify substructures that are statistically overrepresented in toxic compounds, providing mechanistic insight into the model's decision-making process. The analysis was conducted across the entire database of molecules, using the predicted toxicity labels generated by our model. Specifically, we selected the ECFP segment from the input feature vector for analysis. For each ECFP bit, a 2 × 2 contingency table (Table 1) was constructed to assess its association with toxicity.
| | Substructure present | Substructure absent |
|---|---|---|
| Toxic | N1 | N2 |
| Non-toxic | N3 | N4 |
The table entries represent the number of molecules in each category. The odds ratio (OR) and probability value (p value) were computed for each bit, and the p-values were adjusted using the Benjamini–Hochberg (BH) False Discovery Rate (FDR) method.
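A minimal sketch of the two calculations follows, using the N1–N4 layout of Table 1. The 0.5 pseudo-count for tables containing a zero cell is an assumption (the Haldane–Anscombe correction), not something stated in the text, and the statistical test that produces the raw p values is not reproduced here.

```python
def odds_ratio(n1, n2, n3, n4, pseudo=0.5):
    """OR = (N1 * N4) / (N2 * N3) for the 2 x 2 table of one fingerprint bit.

    n1: toxic & present, n2: toxic & absent,
    n3: non-toxic & present, n4: non-toxic & absent.
    """
    if 0 in (n1, n2, n3, n4):
        # Assumption: add a pseudo-count to avoid division by zero.
        n1, n2, n3, n4 = (x + pseudo for x in (n1, n2, n3, n4))
    return (n1 * n4) / (n2 * n3)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p values (q values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example_or = odds_ratio(30, 10, 20, 40)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(example_or, q_values)
```

A bit could then be flagged when its odds ratio and q value pass the chosen thresholds.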
Based on the predicted values, seven toxicity indicators for each chemical were aggregated to assess its overall toxicity risk. Chemicals with a final aggregated score of 0 were classified as low-risk, those with a score of 1 as medium-risk, and those with a score greater than 1 as high-risk. For each third-level functional category, the average toxicity values of all its member chemicals were calculated to represent that category's toxicity profile. For every toxicity indicator, we constructed an 83 × 2 contingency table of third-level functional labels versus presence or absence of toxicity and applied Pearson's chi-squared test of independence. As the main output of this test, we reported the p value, which measures the probability of observing a chi-squared statistic at least as extreme as the one obtained, under the null hypothesis that functional class and the toxicity endpoint are independent. We additionally calculated Cramer's V as an effect size to characterize the strength of the association between functional classes and each toxicity endpoint.
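The risk tiers and the association measure can be sketched as below. The sketch assumes the aggregated score is the sum of seven binary predicted labels, which is consistent with the 0 / 1 / greater-than-1 thresholds but is not spelled out in the text. The p value is omitted because it requires the chi-squared distribution function, and the toy table has 2 rows rather than 83.

```python
import math


def risk_tier(indicators):
    """Map binary toxicity predictions to a low/medium/high risk tier."""
    score = sum(indicators)  # assumption: aggregation is a simple sum
    if score == 0:
        return "low"
    if score == 1:
        return "medium"
    return "high"


def chi2_and_cramers_v(table):
    """Pearson chi-squared statistic and Cramér's V for an r x c count table.

    Assumes every row and column total is non-zero.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return chi2, math.sqrt(chi2 / (n * k))


tier = risk_tier([0, 1, 0, 0, 1, 0, 0])
# Rows: functional categories; columns: toxic, non-toxic counts.
chi2, v = chi2_and_cramers_v([[30, 10], [10, 30]])
print(tier, chi2, v)
```

For an 83 × 2 table the same function applies unchanged, since the smaller dimension still gives k = 1.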
![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
Subsequently, unlabeled chemicals (j′) were labeled by comparing their structural similarity to the conserved sequences. For each third-level functional category, similarity was computed only at the positions selected by eqn (5). The degree of similarity between the conserved sequence and the target molecule's structure was quantified using a local Jaccard similarity approach (eqn (6)). For each unlabeled molecule, its similarity to all third-level functional categories was calculated, and the top three categories with the highest similarity scores were identified. In the final database, if the target molecule already had an assigned functional label, this original label was retained; otherwise, its label was supplemented with the highest-scoring predicted category.
![]() | (5) |
![]() | (6) |
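The general idea of a position-restricted Jaccard similarity can be sketched as follows. This is not a reconstruction of eqn (5) and (6): the position mask is passed in as an assumed input, the bit vectors are toy data, and the category names are hypothetical.

```python
def local_jaccard(conserved, target, mask):
    """Jaccard similarity of two bit vectors, restricted to masked positions."""
    intersection = union = 0
    for c, t, m in zip(conserved, target, mask):
        if not m:
            continue  # position not selected for this category
        intersection += 1 if (c and t) else 0
        union += 1 if (c or t) else 0
    return intersection / union if union else 0.0


def top_categories(target, categories, k=3):
    """Rank categories by local Jaccard similarity and keep the top k."""
    scores = {
        name: local_jaccard(seq, target, mask)
        for name, (seq, mask) in categories.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical categories: (conserved sequence, position mask).
categories = {
    "cat_A": ([1, 1, 0, 1], [1, 1, 0, 1]),
    "cat_B": ([0, 1, 1, 0], [0, 1, 1, 0]),
    "cat_C": ([1, 0, 0, 0], [1, 0, 0, 0]),
    "cat_D": ([0, 0, 0, 1], [0, 1, 0, 1]),
}
target = [1, 0, 1, 1]
ranking = top_categories(target, categories)
print(ranking)
```

Restricting the comparison to selected positions keeps bits that are irrelevant to a category from diluting its score.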
To systematically evaluate the performance of the fuzzy search method, we randomly selected 40 samples from the fuzzy search results for manual review. These entries covered a range of chemical categories and degrees of structural complexity. For each chemical, we retrieved documented functional uses, structural characteristics, and industrial applications from PubChem, the Hazardous Substances Data Bank (HSDB), the ECHA dossiers, ChemicalBook, technical data sheets, and relevant literature. The outputs generated by the fuzzy search model were then compared against the collected evidence, and each entry was assigned a manually curated score based on expert assessment. The scoring scheme consisted of three levels. A score of 1 indicated that the secondary and tertiary functional labels produced by the fuzzy search were consistent with the primary documented uses of the substance. A score of 0.5 indicated that the predicted functions were related to the recorded uses at the level of broad application categories. A score of 0 indicated that the functional labels generated by the fuzzy search were clearly inconsistent with the documented uses.
For each unlabeled chemical, molecular validity was first assessed based on its SMILES representation to ensure correct parsing into a valid single-component molecular structure. Next, each functional category was associated with a specific set of predefined SMARTS patterns, which were matched to each candidate molecule. The matching logic included direct substructure presence checks (e.g., phthalates), coexistence of multiple patterns (e.g., epoxy derivatives contain both an epoxide group and a long-chain ester group), exclusion of specific patterns (e.g., sebacic acid esters exclude cyclic esters), and stereochemical constraints (e.g., maleic acid esters conform to specific cis/trans configurations). If a molecule met the rule set for a given category, it was assigned the corresponding third-level functional label. Each category was processed independently, allowing molecules to match multiple categories when applicable.
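The way presence, coexistence, and exclusion rules combine can be illustrated with a small sketch. To stay self-contained it does not perform real SMARTS matching: the matcher is an injected callable, and the toy matcher below simply checks named feature tags. The rule names and tags are hypothetical and are not the patterns listed in annex 1.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CategoryRule:
    label: str
    require_all: List[str]                                  # patterns that must all match
    exclude_any: List[str] = field(default_factory=list)    # patterns that must not match


def assign_labels(molecule, rules: List[CategoryRule],
                  has_match: Callable[[object, str], bool]) -> List[str]:
    """Evaluate each category independently; a molecule may get several labels."""
    labels = []
    for rule in rules:
        required_ok = all(has_match(molecule, p) for p in rule.require_all)
        excluded_hit = any(has_match(molecule, p) for p in rule.exclude_any)
        if required_ok and not excluded_hit:
            labels.append(rule.label)
    return labels


# Toy stand-in for substructure matching: molecules are sets of feature tags.
def toy_match(molecule, pattern):
    return pattern in molecule


rules = [
    CategoryRule("epoxy derivative", ["epoxide", "long_chain_ester"]),
    CategoryRule("sebacate", ["sebacate_ester"], exclude_any=["cyclic_ester"]),
]
print(assign_labels({"epoxide", "long_chain_ester"}, rules, toy_match))
print(assign_labels({"sebacate_ester", "cyclic_ester"}, rules, toy_match))
```

With a cheminformatics toolkit, `has_match` could wrap a real substructure query, for example RDKit's `mol.HasSubstructMatch(Chem.MolFromSmarts(pattern))`, and stereochemical constraints would need additional handling.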
To evaluate the performance of the rule-based exact search method, we randomly selected two chemicals from each tertiary functional label supported by the exact search for manual inspection. For each chemical, the assessment was conducted in three steps. First, we examined whether the molecular structure identified by the exact search conformed to the structural rules defined by the SMARTS patterns (structural score). Second, using the same procedure as in the manual validation of the fuzzy search, we evaluated whether the exact search result was indeed used as a plasticizer (plasticizer function score). Finally, based on both the structural score and the plasticizer function score, an overall score was assigned: a score of 1 indicated that both the structure and function were correct, a score of 0.5 indicated correct structure but mismatched function, and a score of 0 indicated inconsistency in both structure and function.
138 plastic-related chemicals, sourced from technical reports (17 071 entries), books (763 entries), enterprise product disclosures (220 entries), and scientific literature (84 entries). The LLM-based workflow successfully parsed composition information for 4846 records that previously lacked clear compositional data, ultimately adding 7326 new entries. The increase in the number of entries is because each original record describing a mixture was disaggregated into multiple entries that represent its primary components to the extent possible. For example, the entry “Phosphoric acid, mixed esters with [1,1′-biphenyl]-4,4′-diol and phenol” (Data ID: 75) was processed into three separate records with PubChem names: Phosphoric Acid (Data ID: 75_1), 4,4′-dihydroxybiphenyl (Data ID: 75_2), and Phenol (Data ID: 75_3). The original entry described a complex mixture of esterification products (such as monoesters, diesters, and triesters) along with unreacted starting materials, making it difficult to represent with a precise chemical structure. In the absence of specific information on the degree of esterification, the LLM-based approach simplified the entry by representing it in terms of its unreacted components, which approximately reflect the chemical composition of the mixture.
The precision, recall, F1 score, and accuracy of the LLM-based workflow for parsing chemical composition reached 0.92, 0.78, 0.85, and 0.74, respectively (Fig. 4a and Table S5 in annex 1). These results indicate that the workflow performs well in identifying the correct components of mixtures, achieving high precision with relatively few false-positive identifications. The performance stratification across naming patterns and compositional complexity (Fig. 4b and Fig. S1–S3 in annex 1) further demonstrates the robustness of the approach: mixtures with clear or well-structured naming patterns consistently achieved perfect or near-perfect precision, whereas entries with ambiguous or otherwise irregular naming patterns showed greater variability. It is noteworthy that even for mixtures containing three components or four or more components, the precision and F1 scores remained high when their naming definitions were clear. For mixtures belonging to complex naming categories, the LLM-based workflow demonstrated strong extraction performance (precision = 1; F1 score = 1). Error cases were analyzed for the two types with noticeably lower precision: the ambiguous naming category with two components (ambiguous-2) and the others naming category with three components (others-3). For ambiguous-2, the errors occurred because the model did not output all acids and polyols as separate components; instead, it returned only a mixed ester or fatty acid ester. For others-3, names such as “all salts of ⋯” indicate multiple metal salts, while the model returned only the organic compound and did not list all metal cations. These results suggest that the model performs robustly for well-defined nomenclature but faces challenges in multi-layered or ambiguous naming patterns.
Fig. 4 (a) Performance of LLM-based workflow for parsing chemical composition. (b) Heatmap of LLM-based extraction performance (precision) across naming categories and component complexities.
Manual verification and comparison of the true chemical components of mixtures showed that 62.5% of the mixtures lacked CAS numbers or sufficient naming detail, which made it impossible to retrieve definitive structural information from databases (Table S6). At the same time, the limitation of manual verification became evident, as reconstructing a single mixture entry required more than one hour of expert effort. In contrast, the proposed LLM-based workflow played an important role by rapidly identifying structural characteristics or reactants associated with the textual names. Although it cannot perform full structural elucidation, the LLMs provided an efficient preliminary interpretation. Among entries with retrievable information, 13% were correct predictions, 27% captured correct side-chain structures but incorrect overall structures, 20% showed simplified representations or missing components, and 40% were identified as reaction components. This substantially reduced manual workload and supported downstream classification, filtering, and property prediction. These results demonstrate the practical value of the LLM-assisted database construction framework, particularly for large-scale datasets where traditional manual verification is often incomplete or infeasible.
Ultimately, a database comprising a total of 20 618 plastic-related chemical entries was established. For each entry, the chemical name, data source, molecular structure (as available in PubChem), and 17 basic property fields were recorded. The database contains 14 813 entries with clearly defined molecular formulas, which form the core dataset for prediction of toxicity and functional label. The remaining 5805 entries without molecular formulas were attributed to the following categories: unknown reaction products (346 entries), complex polymers (1147 entries), complex mixtures of reaction products and polymers (942 entries), mixtures that cannot be clearly represented by defined molecular compositions (1198 entries), complex mixtures with unknown causes (991 entries), and other records where molecular composition could not be determined due to various complexities (1181 entries). Among the chemicals with defined molecular formulas, 13 527 entries represent pure substances, while 1286 entries correspond to components derived from 613 mixtures. Of these chemicals, 41.2% were classified as additives, and 1.2% as monomers. Notably, 57.6% of the chemicals still lack clearly defined function information.
Fig. 5a presents the average performance of six representative molecular representation methods under five-fold cross-validation (complete results are provided in Table S8). Overall, the model combining three types of structural fingerprints with dimensionality reduction (ECFP + RDKFP + MACCS + PCA) achieved the best performance, with an average validation AUC (area under curve) exceeding 0.8, followed by the MACCS-only model. These results suggest that integrating multiple types of structural fingerprints can effectively enhance the model's ability to capture diverse structural features. Additionally, PCA-based dimensionality reduction further eliminates redundant information, thereby improving predictive accuracy. In contrast, pretrained models such as GROVER and MolCLR demonstrated overall inferior performance in this task compared to traditional fingerprint-based representations. This may be attributed to the fact that pretrained models are primarily designed to capture global structural patterns and graph-level semantics during training. When applied to the relatively limited labeled toxicity data in this study, these methods lacked sufficient fine-tuning capacity, which constrained the effectiveness of feature transfer. Additionally, certain toxicity indicators depend heavily on specific functional groups within molecules, and fingerprint-based representations offer a more direct and explicit encoding of such information.
In this study, the purpose of the toxicity prediction component is not to develop a new state-of-the-art quantitative structure–activity relationship (QSAR) model, but to efficiently impute missing toxicity labels for plastic-related chemicals in our database. In line with this goal, a MLP classifier built on top of molecular fingerprints offers several practical advantages. The model is lightweight, fast to train, and straightforward to deploy across multiple endpoints, and its outputs are simple probabilities for the toxic class that can be easily interpreted by users without a machine learning background and directly used for chemical screening, risk flagging, or priority ranking. We also experimented with more complex pretrained graph encoders, such as GROVER and MolCLR, but these models did not outperform the fingerprint-based MLP under our cross-validation protocol, which further supports the choice of a compact architecture for this task. Moreover, the training data are derived from the PlasticMAP database, a domain-specific resource curated for plastic-related substances whose label definitions and chemical space differ from those of generic toxicity benchmarks. Our focus is therefore on achieving reliable performance and computational efficiency within this specialized domain, rather than maximizing accuracy on broad benchmark datasets.
Fig. 5b–h displays the Area Under the Receiver-Operating Characteristic Curve (AUC-ROC) curves for the best-performing models on each toxicity indicator. All models achieved AUC values close to 1 on the training set, while validation set AUCs ranged from 0.78 to 0.86. Following the final train/validation/test split, the model maintained comparable performance on this independent test partition, with AUC values broadly consistent with those observed during cross-validation (Table S9). Overall, the ML models demonstrated strong predictive performance across multiple toxicity tasks, substantially outperforming the random classification baseline and effectively distinguishing between toxic and non-toxic classes. Notably, for toxicity indicators C and M, the validation AUCs reached 0.85 and 0.83, respectively, indicating high predictive accuracy and reliability. Furthermore, the substructure-level enrichment analysis revealed several structural motifs that were strongly associated with predicted toxicity. To visualize these associations, we constructed substructure-level volcano plots (Fig. S7). Substructures with an OR value greater than 2 and a q value less than 0.05 were considered toxicological markers. These substructures exhibit strong associations with toxicity, as evidenced by their high OR values and statistically significant p values after FDR correction.
The rule-based additive identifier is suitable for additive categories with clear structural characteristics, enabling high-confidence molecular recognition. Among the 21 third-level functional labels related to plasticizers, 4 categories share their defining patterns with other additives (e.g., stearate plasticizers, whose characteristic structures are also common in heat stabilizers), and 4 categories lack distinctive structural features (e.g., chlorinated plasticizers, where the chlorine atom is prevalent across a wide range of chemicals). The remaining 13 categories are classified as exact-search-supported categories (Fig. 7a). Their typical SMARTS patterns are detailed in section 5.1 of annex 1. Starting from 83 labeled chemicals, the exact search identified 395 new matches across those 13 categories (Table S20). For instance, based on the SMARTS pattern of ortho-phthalates, 111 unlabeled chemicals were identified via exact search (Fig. 7b). The exact search results for each category (showing the top 50 matched molecules per category) are provided in Fig. S9–S21 of annex 1. Based on the manual sampling evaluation, 96% of the samples in the exact search results were correctly identified under the structural score. Among the incorrect cases, one anthraquinone-type dye was misclassified as a terephthalate plasticizer due to its structural similarity to ester-based compounds. Considering both the structural score and the plasticizer function score, the proportion of correct predictions produced by the exact search method was 71%, with detailed results provided in Table S21 of annex 1. The decrease relative to the structural score alone is mainly attributable to the lack of explicit evidence in the literature. For example, several structurally correct esters (such as divinyl adipate, mixed adipate esters, and certain long-chain esters) do not have clearly documented uses as plasticizers, although their structural features are consistent with known plasticization mechanisms. Long-chain esters, for instance, can intercalate between polymer chains, weaken intermolecular interactions, and provide greater mobility through their flexible aliphatic segments. These effects lower the glass transition temperature of the polymer and increase its flexibility. Although the exact search method achieved higher correctness than the fuzzy search, its applicability is narrower and its execution is slower. It is also important to note that the functional labels obtained through both fuzzy and exact search require additional verification before being used in practical applications.
For example, Fig. 8 highlights acid binding agents (A22-C80), which exhibit notable toxicity signals in the assessment. If regulatory decisions were based solely on toxicity without accounting for functional attributes, the entire class of associated chemicals could be indiscriminately categorized as “chemicals of concern”. However, acid binding agents play a critical role in plastic processing by neutralizing residual or process-generated acidic impurities, thereby significantly enhancing polymer thermal stability, material compatibility, and long-term performance. Imposing blanket restrictions without regard for their function could introduce greater systemic risks, including polymer degradation, catalyst deactivation, and equipment corrosion. Therefore, for acid binding agents and other additive categories with defined roles, an eco-design approach is more appropriate, prioritizing the development of low-toxicity, low-impact alternative chemicals, and gradually phasing out traditional additives.
In contrast, certain additive categories exhibit significant variation in toxicity among their subtypes. For instance, phthalate plasticizers (A01-C00) and chlorinated plasticizers (A01-C18) show markedly higher toxicity levels compared to other plasticizer subcategories (Fig. 8). In the case of plasticizers, plastic flexibility requirements can still be met by scaling up production and application of lower-toxicity alternatives. Furthermore, for additive categories with substantial internal toxicity variation, a combined policy approach, including command-and-control, fiscal, and market-based instruments, is recommended to promote a systematic transition toward greener alternatives. Command-and-control policies can set mandatory targets for safer additives through legislation. Fiscal policies may impose environmental taxes on “chemicals of concern” and offer subsidies for green alternatives. Market-based instruments could include incentive mechanisms such as allowing manufacturers adopting green additives to offset a portion of their carbon emissions, thereby linking green additive adoption with emissions trading systems and strengthening the market momentum for industry-wide green transitions.
The functional–toxicological relationship presented in Fig. 8 and Table S22 offers intuitive evidence for identifying the functional classes of plastic additives that should be prioritized for substitution or strategic retention. No direct correlation is observed between second-level functional labels (A01–A24) and toxicity. The box plots reveal a high density of outliers across various toxicity endpoints, indicating that specific functional structures may impact toxicity. For the third-level functional labels (C00–C82), Pearson's chi-squared tests were used to examine the null hypothesis that functional class and each toxicity endpoint are independent. For CMR, C, R, STOT_RE and AqTox, the test strongly rejected the null hypothesis of independence (p value < 0.001), with Cramér's V values between 0.46 and 0.64, indicating moderate to strong associations between functional classes and these toxicity labels (Table S23). In contrast, for M and RespSens the null hypothesis of independence was not rejected at the current sample size (p value > 0.3, Cramér's V around 0.33), suggesting weaker or less clearly detectable functional patterns for these endpoints. The heatmap (Fig. S22) of positive prediction fractions further shows that several third-level functional labels carry a disproportionately high burden of multiple toxicity indicators, supporting the conclusion that function is a necessary and informative perspective for identifying chemicals of concern. Although functional attributes do not directly determine toxicity, there may exist quantifiable pattern relationships between functional demands and toxicological characteristics. These findings provide valuable insights for the future development of function-driven green chemical design strategies and the establishment of classification-based regulatory frameworks grounded in functional attributes.
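For readers unfamiliar with the two statistics reported above, the following minimal sketch computes the Pearson chi-squared statistic and Cramér's V from a contingency table of functional class (rows) against a binary toxicity label (columns). The counts are invented for illustration and are not taken from Table S23. The p value is not computed here; it would come from the chi-squared distribution with the returned degrees of freedom, for example via `scipy.stats.chi2_contingency`.

```python
from math import sqrt

def chi2_and_cramers_v(table):
    """Pearson chi-squared statistic, degrees of freedom, and Cramér's V.

    table: list of rows, each a list of observed counts.
    Assumes every row and column has a non-zero total.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    k = min(len(row_tot), len(col_tot)) - 1
    return chi2, dof, sqrt(chi2 / (n * k))

# Hypothetical counts: three functional classes x (toxic, non-toxic).
example = [[20, 5], [10, 15], [5, 20]]
chi2, dof, v = chi2_and_cramers_v(example)
print(round(chi2, 2), dof, round(v, 2))  # 18.75 2 0.5
```

Cramér's V rescales the chi-squared statistic to the range 0 to 1, which is why it can be compared across endpoints with different sample sizes, whereas the p value depends directly on sample size.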
For the AI framework proposed in this study, employing more capable frontier LLMs and ML models trained on larger high-quality datasets can enhance chemical component identification. Although the current fuzzy framework expands coverage, it assumes equal contribution of all fingerprint bits. In practice, structural features differ in discriminative power, and uniform weighting can dilute signals from common motifs. Quantifying feature importance through SHAP values and employing them as positional weights represents a promising avenue.42 Emphasizing high-contribution positions would focus the method on informative substructures, improving both predictive accuracy and interpretability. The resulting attributions would clarify which motifs determine functional labels and guide rational substitution by distinguishing indispensable fragments from redundant ones. Integrating contribution-based analysis is therefore a promising direction for methodological refinement.
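As an illustration of the proposed refinement, the sketch below contrasts uniform and contribution-weighted similarity between two binary fingerprints. It assumes a Tanimoto-style score, which may differ from the common sequence method used in the fuzzy search, and the per-bit weights stand in for mean absolute SHAP values that would have to be computed from a trained model. The fingerprints, weights, and function name are hypothetical.

```python
def weighted_tanimoto(fp_a, fp_b, weights):
    """Tanimoto-style similarity where each fingerprint bit carries a weight.

    fp_a, fp_b: equal-length sequences of 0/1 bits.
    weights: non-negative per-bit weights (e.g. mean |SHAP| per bit, assumed).
    With all weights equal, this reduces to the ordinary Tanimoto coefficient.
    """
    if not (len(fp_a) == len(fp_b) == len(weights)):
        raise ValueError("fingerprints and weights must have the same length")
    shared = sum(w for a, b, w in zip(fp_a, fp_b, weights) if a and b)
    either = sum(w for a, b, w in zip(fp_a, fp_b, weights) if a or b)
    return shared / either if either else 0.0

# Hypothetical 6-bit fingerprints for a query and a labeled reference chemical.
fp_query = [1, 1, 0, 1, 0, 1]
fp_ref = [1, 0, 0, 1, 1, 1]

uniform = weighted_tanimoto(fp_query, fp_ref, [1.0] * 6)
# Placeholder weights: bits 0 and 3 are assumed to be the most informative.
shap_like = [0.40, 0.05, 0.05, 0.30, 0.05, 0.15]
weighted = weighted_tanimoto(fp_query, fp_ref, shap_like)
print(round(uniform, 3), round(weighted, 3))
```

In this toy case the two molecules differ only on low-weight bits, so the weighted score is higher than the uniform one; mismatches on high-weight bits would lower it instead.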
Despite its practical utility for rapid toxicity label imputation within plastic-related chemical datasets, the ML component proposed in this study is not intended to function as a general-purpose hazard assessment tool. Its applicability to broader toxicological domains remains limited, as the model is tailored specifically to the annotation rules and data curation structure of PlasticMAP. The ML model is also limited by the lack of an independent external validation dataset. Future research could focus on developing toxicity prediction models with broader generalizability across diverse chemical domains. One direction is to integrate cross-domain and multi-source toxicological evidence by combining molecular structural features with in vitro assay data and text descriptions related to chemical reactivity. At the same time, it is necessary to unify the standards for relevant toxicity data in the field of plastic chemicals and build a benchmark dataset. Methodologically, explainable AI approaches may enhance interpretability and transparency, and active learning strategies can support continuous improvement as new data become available. In addition, diffusion-based molecular representations, multi-fidelity learning schemes, and multimodal foundation models capable of jointly encoding chemical structures and toxicological text information represent promising avenues for improving model transferability and applicability across chemical hazard assessment contexts. Future research should also build upon more mechanistically grounded toxicological frameworks. In particular, integrating molecular initiating events (MIEs), key events (KEs), and adverse outcome (AO) pathways offers a promising direction for enhancing both scientific interpretability and regulatory relevance.
Within the scope of this study, supplementing toxicity labels using lightweight ML models and conducting explainable analyses to identify structural features contributing to high toxicity already meets the intended research objectives. Further efforts to model the relationship between molecular structure and the probability of reaching a specific AO are a necessary direction for future research, but they fall outside the scope of the present work. We provide the reported MIE, KE, and AO information corresponding to the seven toxicity endpoints used in this study in section 3.7 of annex 1 (Tables S10–S17). Systematically inferring the potential MIEs of plastic-related chemicals from their molecular structures, followed by mapping different MIEs onto the corresponding KE-to-AO networks, will strengthen the scientific evidence chain for hazard identification and help refine future risk assessment frameworks.
For plastic-related chemicals, the next step in advancing foundational scientific knowledge is to incorporate full life-cycle information to better define “chemicals of concern” criteria.43 Toxicity reflects only the intrinsic hazard of a chemical, without capturing its environmental impact across the entire life cycle.5 Before a chemical enters the use phase, environmental burdens can arise from upstream processes such as crude oil extraction, intermediate synthesis, and final product formulation. Existing studies have shown that plastic additives contribute substantially to the cradle-to-gate life cycle impacts of PVC plastics.22 Therefore, expanding the availability of life-cycle environmental impacts for plastic-related chemicals is essential. Recently reported ML approaches that predict life cycle assessment outcomes from molecular structure provide a promising technical pathway.44–47 Additionally, understanding the material metabolism of chemicals across their life cycle is equally important. Current “chemicals of concern” criteria often focus on toxicity per unit mass without considering exposure levels or usage volumes, which may result in biased conclusions. Additives retained in plastics through recycling may persist in the socio-economic system as legacy substances, potentially causing long-term harm.8–10 As such, integrating regional or global-scale chemical flow and stock data is crucial for determining whether a chemical class should be considered “of concern”. To support more systematic and dynamic risk assessments, efforts should be made to couple plastic-related chemical data with detailed chemical material flow databases.
Furthermore, the specific applications of additives within the socio-economic system, as captured in plastic-related chemical databases, remain to be clarified. Only around 1700 chemicals have been linked to primary polymers in existing databases.1 Primary polymers serve as critical carriers for plastic additives, and their compatibility is quantitatively defined through plastic formulations. Plastic formulations link application scenarios to additive selection, combining primary polymers and additives to meet functional requirements. These formulations are generally derived from experimental evaluations and applied in real-world manufacturing practices. A given plastic product often has many viable formulation options (see Table S24 of annex 1 for illustrative cases), due to variability in additive performance. For example, plasticizer performance can be quantified using a “plasticization efficiency coefficient”, whereas flame retardants are commonly evaluated based on the limiting oxygen index of the polymer-additive composite. Linking additives to a comprehensive formulation and function database would greatly enhance scientific understanding of their real-world applications. Leveraging LLMs to automatically extract formulation data and performance indicators from literature, patents, and textbooks represents a technically feasible solution.48,49 This approach not only enables high-throughput data processing but also ensures the standardized extraction of information across diverse textual sources. Ultimately, integrating these data-driven approaches will enable more comprehensive and efficient management of plastic additives, driving future advancements in sustainable plastic material design and regulatory frameworks.
The code for the ML models used in this study is available on GitHub (https://github.com/MadderFlowers/plastic_toxicity).
Footnote
† These authors contributed equally to this paper.
This journal is © The Royal Society of Chemistry 2026