Toward comprehensive scientific information on plastic-related chemicals powered by artificial intelligence

Kai Zhao a, Xiting Peng a, Ran Tao a, Yuchen Su c, Shiyue Huang c, Hang Fu d, Xiaonan Wang *ab and Shanying Hu *a
aDepartment of Chemical Engineering, Tsinghua University, Beijing 100084, P. R. China. E-mail: wangxiaonan@tsinghua.edu.cn; hxr-dce@tsinghua.edu.cn
bInstitute for Carbon Neutrality, Tsinghua University, Beijing 100084, P. R. China
cDepartment of Computer Science, University of Electronic Science and Technology of China, Chengdu, Sichuan 610056, P. R. China
dSchool of Resources and Environment, Nanchang University, Nanchang, 330031, P. R. China

Received 13th September 2025, Accepted 16th December 2025

First published on 19th December 2025


Abstract

Understanding the core scientific information of plastic-related chemicals is critical for addressing chemical risks in plastic pollution. However, existing databases exhibit substantial information gaps in key dimensions such as chemical composition, toxicity, and functional properties. Bridging these gaps is essential for enabling robust scientific assessment and evidence-based policy making. This study establishes an integrated artificial intelligence (AI)-based technical framework to address these deficiencies. First, a large language model (LLM)-based workflow was developed to parse chemical composition data (resulting in a high-granularity dataset of 20 618 chemical entries), enhancing both granularity and coverage. Second, lightweight machine learning (ML) models were developed to efficiently impute missing toxicity labels for seven toxicity indicators. Third, fuzzy search (common sequence method) and exact search (rule-based additive identifier) methods were implemented to enable bottom-up identification of functional labels for plastic additives. Finally, the relationship between functional attributes and toxicity was systematically analyzed, offering new analytical perspectives and methodological support for identifying “chemicals of concern” among plastic additives. To establish an effective science–policy interface, sustained efforts from a broad range of stakeholders are required to enhance the development of high-quality data across the full life cycle of plastic-related chemicals.



Green foundation

1. To provide the foundational scientific information needed to reduce chemical hazards in plastics, we constructed a comprehensive database of plastic-related chemicals powered by an artificial intelligence-based technical framework. The curated dataset enables informed substitution, supports the design of safer additives and strengthens upstream decision-making for greener plastic life cycles.

2. The resulting plastic-related chemical database contains 20 618 entries covering 18 138 chemicals; 14 813 of these entries have defined molecular formulas as well as predicted toxicity and functional labels. Analysis of function–toxicity relationships indicates that function is a necessary perspective for identifying chemicals of concern.

3. The incorporation of high-quality life-cycle data for a broader range of plastic-related chemicals will underpin the green transition of the plastics industry. Meanwhile, the artificial intelligence-based technical framework shows potential for extension to broader hazardous chemical assessments, laying the groundwork for a science–policy interface in green chemistry.


1. Introduction

Plastics are composed of primary polymers and a variety of chemicals, including plastic additives, residual monomers, and non-intentionally added substances (NIAS).1 Owing to their multifunctionality and cost-effectiveness, plastics have become indispensable materials in society.2 However, the escalating issue of plastic pollution has prompted the Intergovernmental Negotiating Committee (INC) to initiate negotiations for a globally binding agreement aimed at addressing plastic pollution from a life-cycle perspective.3,4 Chemicals associated with plastics are present throughout their entire life cycle,5–7 and some substances, such as bisphenol A (BPA)8 and di(2-ethylhexyl) phthalate (DEHP),9,10 have been proven to pose risks to both environmental and human health,11,12 while also disrupting recycling processes.13,14 As such, the risks posed by plastic-associated chemicals have emerged as one of the core challenges in tackling plastic pollution.15–19

In contrast to other chemicals, plastic additives are deliberately incorporated to impart specific functionalities and reduce production costs.20,21 Additives constitute a significant mass fraction of plastic products (for example, 20–30% of the weight of polyvinyl chloride (PVC))22 and encompass more than 5000 distinct types22 (Fig. 1). By 2050, the global annual production of plastic additives is projected to reach approximately 99 million tonnes (Mt).2,22 To further enhance the functionality, environmental performance, and economic efficiency of plastic products, a range of novel plastic additives has been developed.23,24 Plastic additives can be broadly classified into 20–30 major categories based on their primary functional attributes.5,20,23–25 Each of these categories can be further subdivided into multiple subcategories according to key chemical structures5,20,23–25 (Fig. 1 and Table S1). During manufacturing, the incorporation of additives follows specific formulation strategies tailored to the type of primary polymer and the desired end-use properties.25 While the major functional categories required by a formulation generally cannot be substituted, the choice of subcategory within each category can be optimized according to raw material supply, cost, and regulations without sacrificing performance.20,23–25 Studies have demonstrated that selecting and optimizing additives among subcategories within the same functional group can both improve economic returns and significantly reduce environmental impacts.5,22 Therefore, classifying additives by function is the first step toward systematic management of plastic additives.


Fig. 1 Summary of plastic-related chemicals. More details on the category of additives are available in Table S1 and the SI.

Although several databases related to plastic-associated chemicals (e.g., PlasticMAP1 and PlastChem26) have been developed to identify chemicals of concern,1,26–33 significant limitations remain in the data available for plastic additives. For instance, in the PlastChem database,26 which currently includes the most extensive set of entries, 3370 chemicals (21%) lack any information on chemical composition, and the compositions of mixtures are not structurally standardized. Moreover, 9736 entries (60%) provide no functional category information, with additive classification restricted to broad functional categories and lacking fine-grained descriptors. In addition, 7788 chemicals (48%) contain no toxicity data, and another 4553 entries (28%) report only a single toxicity-related indicator. These omissions hinder effective management of plastic additives. Without toxicity data, essential for assessing the environmental and human health risks of additives,34,35 relationships between toxicity and functional properties cannot be identified, limiting systematic understanding of key additive categories. Despite the indispensable role of plastic additives in enabling plastic performance, the current lack of scientific information on additives remains a major barrier to addressing chemical-related challenges in plastics.36

Herein, we established an integrated AI framework to close these information gaps for plastic-related chemicals. An LLM-based workflow was developed to parse chemical compositions and build a higher-granularity database. ML models for seven toxicity endpoints were developed to support the identification of toxic plastic-related chemicals. Both fuzzy search (common sequence method) and exact search (rule-based additive identifier) were implemented for bottom-up identification of functional categories. All three components of the framework serve a single purpose: to enrich and complete the scientific information on plastic-related chemicals so that the database can better support the identification of chemicals of concern and inform policy decision-making. Overall, this study provides foundational scientific insights for governing and mitigating pollution from plastic-related chemicals.

2. Methodology

2.1 Data collection

A broad range of data was collected and curated from four primary sources: reports on plastic-related chemical databases,1,26,32,33 scientific literature, books,20,23,24,37 and industry. Initially, chemical and toxicity information from existing plastic chemical databases (e.g., PlasticMAP) was integrated. In addition, two further dimensions of data were incorporated to enable forward-looking assessments of plastic additives that may see large-scale future use: (1) additives in current industrial use, based on product information from Chinese enterprises; and (2) novel additive chemicals reported in the scientific literature, with molecular structures extracted from recent publications on plasticizers and flame retardants (the corresponding data sources are documented in annex 2).

Furthermore, detailed data collection encompassing three levels of functional labels was conducted. The first-level labels include (1) additives, (2) monomers, and (3) unknowns or NIAS. For additives, additional refinement was performed to assign second- and third-level functional labels. In this study, a broad definition of plastic additives was adopted: any chemicals that impart functional properties to plastics or facilitate plastic processing were considered additives.20,23,24,37 Based on their major functional role, additives were classified into 24 second-level categories (Table S1 in annex 1), which were then subdivided into 83 third-level categories according to their chemical structures (Table S2 in annex 1). The functional labels are derived from classification criteria that are widely accepted within the plastic additives industry. First-level functional labels were sourced from existing plastic chemical databases and have been fully classified. From a broad range of literature and data sources,20,23,24,37 second- and third-level functional labels for 763 chemicals were curated (indicated by a value of 1 in the “existing or predicted” column of annex 2), which were used for functional attribute prediction.

2.2 LLM-based workflow for parsing chemical composition

After completing the data collection, we aimed to enrich the database with detailed chemical composition information. We utilized the PUG REST API provided by PubChem (https://pubchem.ncbi.nlm.nih.gov/) to retrieve molecular data corresponding to each chemical entry. Using substance names as input queries, we obtained molecular structures (including molecular formula and Simplified Molecular Input Line Entry System (SMILES)), molecular identifiers (such as International Chemical Identifier (InChI) and Chemical Abstracts Service (CAS) numbers), and a set of basic properties (see Table S3 in annex 1 for the complete list of retrieved properties and their definitions).
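The PubChem lookup described above can be sketched with the public PUG REST URL scheme. This is a minimal illustration, not the authors' pipeline: the property list, the helper names (`property_url`, `parse_properties`, `fetch`) and the fallback between the `IsomericSMILES` and `SMILES` keys are assumptions, and current property names should be checked against the PubChem PUG REST documentation. CAS numbers are not returned as compound properties and would need a separate synonym lookup.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Assumed property list; check names against the current PUG REST documentation.
PROPERTIES = ("MolecularFormula", "IsomericSMILES", "InChI", "InChIKey")


def property_url(name: str, properties=PROPERTIES) -> str:
    """Build a PUG REST URL that looks up compound properties by substance name."""
    # Names may contain spaces, commas, brackets or primes, so percent-encode them fully.
    encoded = quote(name, safe="")
    return f"{PUG_REST}/compound/name/{encoded}/property/{','.join(properties)}/JSON"


def parse_properties(payload: str) -> list:
    """Extract one record per matched CID from a PUG REST property response."""
    table = json.loads(payload).get("PropertyTable", {})
    records = []
    for rec in table.get("Properties", []):
        records.append({
            "cid": rec.get("CID"),
            "formula": rec.get("MolecularFormula"),
            # Newer responses may report isomeric SMILES under the key "SMILES".
            "smiles": rec.get("IsomericSMILES") or rec.get("SMILES"),
            "inchi": rec.get("InChI"),
            "inchikey": rec.get("InChIKey"),
        })
    return records


def fetch(name: str, timeout: float = 30.0) -> list:
    """Query PubChem over the network; an unmatched name raises urllib.error.HTTPError (404)."""
    with urlopen(property_url(name), timeout=timeout) as resp:
        return parse_properties(resp.read().decode("utf-8"))
```

Only `fetch` touches the network, so `parse_properties` can be checked offline on a saved response. PubChem's usage policy limits request rates, so a batch of thousands of names should be throttled.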

Following the first round of enrichment, 4846 chemicals remained unmatched to any PubChem record and lacked a definitive chemical composition. These unmatched entries typically represented polymers, mixtures, or reaction products. To resolve and identify these records, we implemented an LLM-based data processing workflow (Fig. 2a). Its first component is prompt engineering: the prompt decomposes the task of parsing chemical compositions into three structured subtasks, namely identifying the category of the chemical substance (mixture, reaction, or polymer), analyzing the main components of mixtures, and determining the common name of each chemical substance after disaggregation (details in section 2.2 of annex 1). The output of each stage is formatted as JSON to enable clear and reproducible downstream processing. This design minimizes linguistic ambiguity, constrains the output format, and uses task-specific question structures to keep LLM performance consistent across diverse chemical descriptions. The name of each of the 4846 unmatched entries was then passed to an LLM (GPT-4 Turbo) to execute the prompt procedure. For mixtures, each identified component was recorded as an individual sub-entry in the database, and the newly parsed components were queried against PubChem to retrieve their chemical information. Each sub-entry contains standardized molecular structure information and is assigned an ID as an independent pure substance under its parent record, following the rule “parent record ID_subentry sequence” (for example, Data ID: 75_1). These sub-entries are used for subsequent structure retrieval and property prediction. The ID and chemical name of the parent record (for example, Data ID: 75) are retained to preserve contextual information but are not used in further analysis, which ensures that the same chemical substance is not analyzed more than once.
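To make the sub-entry rule concrete, the sketch below validates a JSON answer from the composition-parsing prompt and expands a mixture into sub-entries with IDs of the form “parent record ID_subentry sequence”. The JSON keys (`category`, `components`) and the function names are hypothetical; the actual prompt and output schema are described in section 2.2 of annex 1.

```python
import json

ALLOWED_CATEGORIES = {"mixture", "reaction", "polymer"}


def parse_llm_output(raw: str) -> dict:
    """Validate a (hypothetical) JSON answer from the composition-parsing prompt."""
    data = json.loads(raw)
    category = str(data.get("category", "")).strip().lower()
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    components = data.get("components", [])
    if not isinstance(components, list) or not all(isinstance(c, str) for c in components):
        raise ValueError("components must be a list of names")
    # Drop blanks and case-insensitive duplicates while keeping the model's order.
    seen, cleaned = set(), []
    for name in (c.strip() for c in components):
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    return {"category": category, "components": cleaned}


def make_subentries(parent_id: str, components: list) -> list:
    """Assign sub-entry IDs such as 75_1, 75_2, ... while keeping a link to the parent."""
    return [
        {"data_id": f"{parent_id}_{i}", "parent_id": parent_id, "name": name}
        for i, name in enumerate(components, start=1)
    ]
```

Each sub-entry name would then be sent back through the PubChem lookup, while the parent record is kept only for context.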


Fig. 2 (a) Database construction methods and LLM-based workflow for parsing chemical composition; (b) overview of the plastic-related chemicals database.

Meanwhile, we evaluated and validated the performance of the LLM-based workflow. A set of 40 records was randomly selected for manual validation of the results obtained from the LLM-based workflow. These entries covered four naming categories of mixtures (clear, ambiguous, complex, others) and three levels of compositional complexity (two components, three components, and four or more components), ensuring representative coverage. The performance of the workflow was assessed using precision, recall, F1 score, and accuracy. Detailed calculation methods are presented in Table S4 and eqn (1)–(4) in section 2.3 of annex 1. This evaluation focused on whether the LLM successfully identified the relevant substances described in the mixture name (such as monomers, reactants, or major components).
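Component-level precision, recall and F1 score can be illustrated by treating the LLM output and the manually verified components of each record as sets of normalized names. This micro-averaged, set-based variant is an assumption for illustration; the exact definitions used in the study, including that of accuracy, are given in Table S4 and eqn (1)–(4) of annex 1 and are not reproduced here.

```python
def _normalize(names):
    """Case-fold and strip names so trivially different spellings compare equal."""
    return {n.strip().lower() for n in names if n.strip()}


def component_scores(records):
    """records: iterable of (predicted_names, reference_names); returns micro-averaged scores."""
    tp = fp = fn = 0
    for predicted, reference in records:
        p, r = _normalize(predicted), _normalize(reference)
        tp += len(p & r)  # components found by the LLM and confirmed manually
        fp += len(p - r)  # components the LLM proposed but the reference lacks
        fn += len(r - p)  # reference components the LLM missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```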

Manual retrieval of the true chemical components of the mixtures was also performed for further verification. A three-step retrieval strategy was applied to the same sampled records using multiple authoritative chemical databases, including SciFinder, European Chemicals Agency (ECHA) Registration Dossiers, ChemicalBook, and the Chemical Encyclopedia. These data sources were selected because they cover regulated substances, industrial chemicals, and substances with documented structural information. First, when a CAS number was available (for example, Data ID: 40 and Data ID: 75), it was used as the primary search keyword. This allowed direct verification of whether the substance corresponded to a specific compound recorded in regulatory databases, an identifiable mixture of esters, or other known chemical components. Second, for entries without CAS numbers (for example, Data ID: 11 237 and Data ID: 11 552), systematic names and industrial descriptions were used to search databases such as SciFinder and ChemicalBook. This approach enabled partial reconstruction of the structural information for certain substances such as resin esters, fatty acid esters, and inorganic salts, while some materials could not be identified due to generic commercial naming conventions. Finally, the retrieved information was compared with the component lists predicted by the large language model, with details provided in section 2.4 of annex 1.

2.3 ML models for predicting chemical toxicity

To comprehensively identify potential high-risk chemicals, we developed ML models to predict the toxicity of chemicals lacking annotations in our database (Fig. 3). The models were trained on data from the PlasticMAP database.1 To frame each endpoint as a binary classification task, the toxicity labels in the source dataset were unified: labels of 0.5 and 1 were merged into “toxic” (1), while the original label 0 remained “non-toxic” (0). We trained separate models for seven key toxicity indicators: carcinogenicity (C), mutagenicity (M), reproductive toxicity (R), carcinogenicity/mutagenicity/reproductive toxicity (CMR), specific target organ toxicity from repeated exposure (STOT_RE), aquatic toxicity (AqTox), and respiratory sensitization (RespSens). The class distributions for each indicator are summarized in Table S7 of annex 1. Other indicators, such as persistence, bioaccumulation, and toxicity (PBT) and endocrine-disrupting properties (EDC), were excluded because of insufficient training data.
Fig. 3 A technical workflow for the prediction of toxicity and functional labels.
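The label unification step can be written as a small mapping. In this sketch, which is an assumption about the data layout rather than the authors' code, empty or NaN cells mark chemicals without an annotation for a given endpoint; those are the chemicals whose labels the models are later used to impute. Column names such as `C` or `STOT_RE` are illustrative.

```python
import math
from typing import Optional


def binarize_label(value) -> Optional[int]:
    """Unify source labels: 0.5 or 1 -> 1 ("toxic"), 0 -> 0 ("non-toxic"), missing -> None."""
    if value is None or value == "":
        return None
    v = float(value)
    if math.isnan(v):  # NaN marks a chemical without an annotation for this endpoint
        return None
    if v in (0.5, 1.0):
        return 1
    if v == 0.0:
        return 0
    raise ValueError(f"unexpected label value: {value!r}")


def split_by_annotation(rows, endpoint):
    """Separate labelled rows (for training) from unlabelled rows (to be imputed)."""
    labelled, unlabelled = [], []
    for row in rows:
        y = binarize_label(row.get(endpoint))
        if y is None:
            unlabelled.append(row)
        else:
            labelled.append((row, y))
    return labelled, unlabelled
```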

The classification models were implemented as a three-layer multilayer perceptron (MLP): molecular feature vectors served as inputs to two fully connected hidden layers with Rectified Linear Unit (ReLU) activations and dropout regularization, followed by an output layer that produced either a single logit (single-task configuration) or seven logits simultaneously (multi-task configuration). The models were trained with binary cross-entropy loss and the Adam optimizer, using mini-batch updates and early stopping based on validation loss. Hyperparameters were tuned by random search, and five-fold stratified cross-validation was applied for model evaluation. The ML models are available in the accompanying open-source code.
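The architecture can be sketched as a dependency-free forward pass: two ReLU hidden layers with inverted dropout during training, followed by one logit (single-task) or seven logits (multi-task), scored with binary cross-entropy computed from logits. Hidden sizes, dropout rate and initialization are hypothetical defaults, and skipping missing targets in the multi-task loss is an assumption. Training with Adam, mini-batches and early stopping would in practice use a deep learning framework such as PyTorch and is omitted here.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Uniform He-style initialisation (a placeholder choice, not the authors' setting)."""
    bound = math.sqrt(6.0 / n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p and rescale the survivors."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]


class ToxMLP:
    """Features -> two ReLU hidden layers -> n_tasks logits (1 = single-task, 7 = multi-task)."""

    def __init__(self, n_features, hidden=(256, 128), n_tasks=1, p_drop=0.2, seed=0):
        rng = random.Random(seed)
        self.layers = [
            init_layer(n_features, hidden[0], rng),
            init_layer(hidden[0], hidden[1], rng),
            init_layer(hidden[1], n_tasks, rng),
        ]
        self.p_drop = p_drop

    def forward(self, x, training=False, rng=None):
        rng = rng or random.Random()
        h = x
        for layer in self.layers[:-1]:
            h = relu(linear(h, layer))
            if training:  # dropout is active only while training
                h = dropout(h, self.p_drop, rng)
        return linear(h, self.layers[-1])  # raw logits; apply a sigmoid for probabilities


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy from logits; targets of None (missing labels) are skipped."""
    terms = [
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
        if y is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0
```

The `max(z, 0) - z*y + log1p(exp(-|z|))` form is the numerically stable rewrite of `-[y log σ(z) + (1 - y) log(1 - σ(z))]`, which avoids overflow for large logits.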

To optimize model performance, we systematically explored multiple molecular representations: extended-connectivity fingerprints (ECFP),38 RDKit topological fingerprints (RDKFP), and Molecular ACCess System (MACCS) keys.39 When combining multiple fingerprint types, we applied principal component analysis (PCA) to reduce noise and mitigate overfitting (Fig. S4). In addition to traditional fingerprints, we also evaluated embeddings from two self-supervised graph models: GROVER,40 a graph transformer trained on over 10 million molecular graphs to capture local and global structure, and MolCLR,41 which uses contrastive learning on augmented molecular graphs to distinguish similar from dissimilar molecules. The detailed parameterization of each molecular representation and the exact feature dimensions used for model input are summarized in section 3.2 of annex 1. We either extracted hidden-layer embeddings from these models as fixed molecular descriptors for the MLP classifier, or fine-tuned them directly on our toxicity prediction tasks.

To provide a more rigorous evaluation of the final model, we performed a new random split of the dataset, holding out 10% of the data as an entirely independent test set. The remaining 90% of the data were used for training and validation following the same procedure as before.
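The evaluation protocol (a random 10% hold-out test set, then five-fold stratified cross-validation on the remaining 90%) can be sketched with the standard library. Function names and seeds are hypothetical; the stratification here deals each class round-robin across folds so that every fold keeps roughly the same class ratio, which is one common way to implement it rather than necessarily the authors' choice.

```python
import random
from collections import defaultdict


def random_holdout(n, test_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 and hold out test_frac of them as an independent test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def stratified_kfold(labels, indices, k=5, seed=0):
    """Yield (train, val) index lists over `indices`, keeping class ratios similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        # Deal members round-robin; the running offset keeps fold sizes balanced overall.
        for pos, i in enumerate(members):
            folds[(offset + pos) % k].append(i)
        offset += len(members)
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])
```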

To investigate the relationship between predicted toxicity endpoints and molecular structures, we performed a substructure-level enrichment analysis on the model's predicted labels. This analysis aimed to identify substructures that are statistically overrepresented in toxic compounds, providing mechanistic insight into the model's decision-making process. The analysis was conducted across the entire database of molecules, using the predicted toxicity labels generated by our model. Specifically, we selected the ECFP segment from the input feature vector for analysis. For each ECFP bit, a 2 × 2 contingency table (Table 1) was constructed to assess its association with toxicity.

Table 1 Contingency table for substructure–toxicity association analysis

| | Substructure present | Substructure absent |
|---|---|---|
| Toxic | N1 | N2 |
| Non-toxic | N3 | N4 |


The table entries represent the number of molecules in each category. The odds ratio (OR) and probability value (p value) were computed for each bit, and the p-values were adjusted using the Benjamini–Hochberg (BH) False Discovery Rate (FDR) method.
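For a single ECFP bit, the odds ratio follows directly from Table 1 as OR = (N1 × N4)/(N2 × N3). The text does not state which test produced the p values, so the sketch below assumes a two-sided Fisher exact test, a common choice for 2 × 2 tables, and adds a 0.5 Haldane–Anscombe correction when a cell is zero (also an assumption). The Benjamini–Hochberg step then adjusts all per-bit p values jointly.

```python
from math import comb


def odds_ratio(n1, n2, n3, n4, correction=0.5):
    """OR = (N1*N4)/(N2*N3) for Table 1; adds `correction` to every cell if any cell is zero."""
    if 0 in (n1, n2, n3, n4):
        n1, n2, n3, n4 = (v + correction for v in (n1, n2, n3, n4))
    return (n1 * n4) / (n2 * n3)


def fisher_exact_p(n1, n2, n3, n4):
    """Two-sided Fisher exact test for [[N1, N2], [N3, N4]] with fixed margins."""
    row1, row2, col1 = n1 + n2, n3 + n4, n1 + n3
    total = comb(row1 + row2, col1)

    def prob(a):  # hypergeometric probability that the toxic/present cell equals a
        return comb(row1, a) * comb(row2, col1 - a) / total

    p_obs = prob(n1)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables at most as likely as the observed one (small tolerance for rounding).
    return min(1.0, sum(prob(a) for a in support if prob(a) <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals):
    """BH-adjusted p values (q values), returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def enrichment(bit_tables):
    """bit_tables: {bit: (N1, N2, N3, N4)} -> {bit: (OR, p, q)}."""
    bits = list(bit_tables)
    ps = [fisher_exact_p(*bit_tables[b]) for b in bits]
    qs = benjamini_hochberg(ps)
    return {b: (odds_ratio(*bit_tables[b]), p, q) for b, p, q in zip(bits, ps, qs)}
```

SciPy provides library equivalents in `scipy.stats.fisher_exact` and `scipy.stats.false_discovery_control`.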

Based on the predicted values, the seven toxicity indicators for each chemical were aggregated into a single score to assess its overall toxicity risk. Chemicals with a final aggregated score of 0 were classified as low-risk, those with a score of 1 as medium-risk, and those with a score greater than 1 as high-risk. For each third-level functional category, the average toxicity values of all its member chemicals were calculated to represent that category's toxicity profile. For every toxicity indicator, we constructed an 83 × 2 contingency table of third-level functional labels versus presence or absence of toxicity and applied Pearson's chi-squared test of independence. As the main output of this test, we reported the p value, which measures the probability of observing a chi-squared statistic at least as extreme as the one obtained under the null hypothesis that functional class and the toxicity endpoint are independent. We additionally calculated Cramér's V as an effect size to characterize the strength of the association between functional classes and each toxicity endpoint.
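The risk tiers and the functional-class association test can be sketched as follows, assuming the aggregated score is the sum of the seven binary predictions (the text specifies only the thresholds). The chi-squared statistic uses the usual expected counts (row total × column total / n), and Cramér's V is sqrt(χ² / (n · min(r − 1, c − 1))), which for an r × 2 table reduces to sqrt(χ²/n). Classes with no chemicals are dropped because their expected counts are zero. The p value is left to a library routine such as `scipy.stats.chi2.sf(chi2, dof)` or `scipy.stats.chi2_contingency`; all function names here are hypothetical.

```python
import math


def risk_tier(predictions):
    """Assumes the aggregated score is the sum of seven binary (0/1) endpoint predictions."""
    score = sum(predictions)
    if score == 0:
        return "low"
    return "medium" if score == 1 else "high"


def contingency(pairs):
    """pairs: (third_level_label, toxic 0/1) per chemical -> {label: [non-toxic, toxic]}."""
    counts = {}
    for label, toxic in pairs:
        counts.setdefault(label, [0, 0])[toxic] += 1
    return counts


def chi2_with_cramers_v(table):
    """Pearson chi-squared statistic, degrees of freedom and Cramér's V for an r x c table."""
    rows = [r for r in table if sum(r) > 0]  # empty classes have zero expected counts
    n = sum(sum(r) for r in rows)
    if n == 0:
        return {"chi2": 0.0, "dof": 0, "cramers_v": 0.0}
    col_tot = [sum(col) for col in zip(*rows)]
    chi2 = 0.0
    for r in rows:
        row_tot = sum(r)
        for observed, ct in zip(r, col_tot):
            expected = row_tot * ct / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(rows), sum(1 for c in col_tot if c > 0)
    k = min(n_rows, n_cols) - 1
    v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return {"chi2": chi2, "dof": (n_rows - 1) * (n_cols - 1), "cramers_v": v}
```

For one endpoint, `chi2_with_cramers_v(list(contingency(pairs).values()))` yields the statistic for the 83 × 2 table.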

2.4 Fuzzy and exact search for predicting functional labels

The prediction of functional labels relies on the assumption that additives within the same third-level functional category share characteristic chemical structures or functional groups. Therefore, by identifying the common characteristic structures or functional groups within manually labeled data for each third-level category, these patterns can be extrapolated to assign functional labels to other chemicals. Two approaches were proposed to support functional label identification (Fig. 3). The fuzzy search method can be fully automated, making it well-suited for rapid, large-scale labeling. In contrast, the exact search method requires manually defined labeling criteria but provides higher prediction accuracy.
2.4.1 Fuzzy search (common sequence method). The MACCS fingerprint is a 166-bit vector, with each bit corresponding to a predefined chemical substructure or feature (for example, the 84th bit represents an amine group). If a molecule contains the corresponding feature, the bit is set to 1; otherwise, it is set to 0. We proposed a statistical consistency-based screening method to identify MACCS fingerprint bits with high occurrence frequencies within each third-level functional category. Assume that N chemicals belong to a given third-level functional category (k). The molecular fingerprint (f) of each chemical (j) can be represented as shown in eqn (1), where L = 166. We set the frequency threshold θ = 0.75, meaning that a given bit (i) must be set to 1 in at least 75% of the samples to be considered part of the conserved sequence. For each bit position, the occurrence frequency of bit 1 across all samples is calculated as Pi (eqn (2)). Finally, the conserved MACCS fingerprint sequence (Ck) for each third-level functional category is obtained using eqn (3) and (4).
 
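$$\mathbf{f}_j = \left(f_{j,1}, f_{j,2}, \ldots, f_{j,L}\right), \quad f_{j,i} \in \{0, 1\}, \quad j = 1, \ldots, N \tag{1}$$

$$P_i = \frac{1}{N} \sum_{j=1}^{N} f_{j,i}, \quad i = 1, \ldots, L \tag{2}$$

$$c_{k,i} = \begin{cases} 1, & P_i \geq \theta \\ 0, & P_i < \theta \end{cases} \tag{3}$$

$$C_k = \left(c_{k,1}, c_{k,2}, \ldots, c_{k,L}\right) \tag{4}$$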

Subsequently, unlabeled chemicals (j′) were labeled by comparing their structural similarity to the conserved sequences. For each third-level functional category, similarity was computed only at positions where $c_{k,i} = 1$ (eqn (5)). The degree of similarity between the conserved sequence and the target molecule's structure was quantified using a local Jaccard similarity approach (eqn (6)). For each unlabeled molecule, its similarity ($S_{j',k}$) to all third-level functional categories was calculated, and the top three categories with the highest similarity scores were identified. In the final database, if the target molecule already had an assigned functional label, this original label was retained; otherwise, its label was supplemented with the highest-scoring predicted category.

 
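$$I_k = \left\{\, i \in \{1, \ldots, L\} \mid c_{k,i} = 1 \,\right\} \tag{5}$$

$$S_{j',k} = \frac{\sum_{i \in I_k} f_{j',i}\, c_{k,i}}{\sum_{i \in I_k} \max\left(f_{j',i}, c_{k,i}\right)} = \frac{1}{\lvert I_k \rvert} \sum_{i \in I_k} f_{j',i} \tag{6}$$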
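Eqns (2)–(6) translate into a few list operations. In this sketch, fingerprints are plain 0/1 lists (166 entries for MACCS keys; a shorter length is used in the usage check), category names are hypothetical, and ties in the top-three ranking are broken alphabetically, which is an assumption. Because every conserved position has c_{k,i} = 1, the local Jaccard score reduces to the fraction of conserved bits that the target molecule also sets.

```python
def conserved_sequence(fingerprints, theta=0.75):
    """Eqns (2)-(4): c_{k,i} = 1 if bit i is set in at least a fraction theta of the category."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    freq = [sum(fp[i] for fp in fingerprints) / n for i in range(length)]  # P_i
    return [1 if p >= theta else 0 for p in freq]


def local_jaccard(fp, conserved):
    """Eqns (5)-(6): Jaccard similarity evaluated only on the conserved positions I_k."""
    positions = [i for i, c in enumerate(conserved) if c == 1]
    if not positions:
        return 0.0
    intersection = sum(1 for i in positions if fp[i] == 1)
    # On I_k every conserved bit is 1, so the union equals the number of conserved positions.
    return intersection / len(positions)


def top_categories(fp, conserved_by_category, k=3):
    """Rank categories by similarity (ties broken alphabetically) and keep the top k."""
    scored = [(local_jaccard(fp, c), name) for name, c in conserved_by_category.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


def assign_label(existing_label, fp, conserved_by_category):
    """Keep a curated label if present; otherwise use the highest-scoring category."""
    if existing_label:
        return existing_label
    best = top_categories(fp, conserved_by_category, k=1)
    return best[0][1] if best else None
```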

To systematically evaluate the performance of the fuzzy search method, we randomly selected 40 samples from the fuzzy search results for manual review. These entries covered a range of chemical categories and degrees of structural complexity. For each chemical, we retrieved documented functional uses, structural characteristics, and industrial applications from PubChem, the Hazardous Substances Data Bank (HSDB), ECHA dossiers, ChemicalBook, technical data sheets, and relevant literature. The outputs of the fuzzy search were then compared against the collected evidence, and each entry was scored by expert assessment on a three-level scale. A score of 1 indicated that the second- and third-level functional labels produced by the fuzzy search were consistent with the primary documented uses of the substance; a score of 0.5 indicated that the predicted functions were related to the recorded uses only at the level of broad application categories; and a score of 0 indicated that the predicted functional labels were clearly inconsistent with the documented uses.

2.4.2 Exact search (rule-based additive identifier). We also implemented a rule-based approach to identify third-level functional categories with clearly defined structural characteristics. Because not all third-level functional categories are suitable for this method, we focused on plasticizers, the highest-volume and most structurally diverse class of plastic additives, as a case study. The 21 plasticizer subcategories were grouped into three types: structurally non-distinct categories (whose structures are not uniquely associated with plasticizing function), structurally ambiguous categories (which have no distinctive substructure), and categories supported by exact search. For the latter, we developed a set of expert-defined SMILES arbitrary target specification (SMARTS) patterns that capture the characteristic structural motifs of the corresponding plasticizer subtypes: phthalates, terephthalates, isophthalates, adipic acid esters, azelaic acid esters, fumaric acid esters, citric acid esters, trimellitates, itaconic acid esters, maleic acid esters, oleates, sebacic acid esters, and epoxy derivatives.

For each unlabeled chemical, molecular validity was first assessed from its SMILES representation to ensure that it parsed into a valid single-component molecular structure. Next, each functional category was associated with a specific set of predefined SMARTS patterns, which were matched against each candidate molecule. The matching logic included direct substructure presence checks (e.g., phthalates), coexistence of multiple patterns (e.g., epoxy derivatives contain both an epoxide group and a long-chain ester group), exclusion of specific patterns (e.g., sebacic acid esters exclude cyclic esters), and stereochemical constraints (e.g., maleic acid esters must conform to specific cis/trans configurations). If a molecule satisfied the rule set for a given category, it was assigned the corresponding third-level functional label. Each category was processed independently, so a molecule could match multiple categories when applicable.
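The matching logic of the exact search can be separated from the SMARTS patterns themselves. In the sketch below, each hypothetical rule lists patterns that must all match (direct presence or coexistence) and patterns that must not match (exclusion); stereochemical constraints are assumed to be encoded in the patterns. The pattern names are placeholders, not the authors' expert-defined SMARTS, and the caller supplies the matcher, for example a wrapper around RDKit's `Chem.MolFromSmarts` and `Mol.HasSubstructMatch`. The single-component check is a crude stand-in for full SMILES parsing.

```python
# Hypothetical rule set: each name stands for a SMARTS query (not the authors' patterns).
RULES = {
    "phthalates": {"all_of": ["phthalate_diester"], "none_of": []},
    "epoxy derivatives": {"all_of": ["epoxide", "long_chain_ester"], "none_of": []},
    "sebacic acid esters": {"all_of": ["sebacate_diester"], "none_of": ["cyclic_ester"]},
    "maleic acid esters": {"all_of": ["cis_butenedioate_diester"], "none_of": []},
}


def is_single_component(smiles):
    """Crude stand-in for toolkit parsing: non-empty and no '.'-separated fragments."""
    return bool(smiles) and "." not in smiles


def classify(smiles, matches, rules=RULES):
    """Return every category whose rule the molecule satisfies.

    `matches(smiles, pattern_name) -> bool` is supplied by the caller (e.g. a SMARTS matcher).
    """
    if not is_single_component(smiles):
        return []
    labels = []
    for category, rule in rules.items():  # each category is checked independently
        required = all(matches(smiles, p) for p in rule["all_of"])
        excluded = any(matches(smiles, p) for p in rule["none_of"])
        if required and not excluded:
            labels.append(category)
    return labels
```

Keeping the rule table as data makes it straightforward to add the remaining exact-search categories or to audit why a molecule received a given label.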

To evaluate the performance of the rule-based exact search method, we randomly selected two chemicals from each third-level functional category supported by the exact search for manual inspection. For each chemical, the assessment was conducted in three steps. First, we examined whether the molecular structure identified by the exact search conformed to the structural rules defined by the SMARTS patterns (structural score). Second, using the same procedure as in the manual validation of the fuzzy search, we evaluated whether the identified chemical is actually used as a plasticizer (plasticizer function score). Finally, an overall score was assigned from the structural and plasticizer function scores: 1 indicated that both structure and function were correct, 0.5 indicated correct structure but mismatched function, and 0 indicated that both structure and function were inconsistent.

3. Results

3.1 Overview of plastic-related chemical database and performance of LLM-based workflow for parsing chemical composition

Through the multi-source data collection and LLM-based processing workflow described above, a comprehensive plastic-related chemical database was established (Fig. 2b). The database compiles a total of 18 138 plastic-related chemicals, sourced from technical reports (17 071 entries), books (763 entries), enterprise product disclosures (220 entries), and scientific literature (84 entries). The LLM-based workflow parsed composition information for 4846 records that previously lacked clear compositional data, ultimately adding 7326 new entries. The number of entries increased because each original record describing a mixture was disaggregated, as far as possible, into multiple entries representing its primary components. For example, the entry “Phosphoric acid, mixed esters with [1,1′-biphenyl]-4,4′-diol and phenol” (Data ID: 75) was processed into three separate records with PubChem names: Phosphoric Acid (Data ID: 75_1), 4,4′-dihydroxybiphenyl (Data ID: 75_2), and Phenol (Data ID: 75_3). The original entry described a complex mixture of esterification products (such as monoesters, diesters, and triesters) along with unreacted starting materials, making it difficult to represent with a precise chemical structure. In the absence of specific information on the degree of esterification, the LLM-based approach simplified the entry by representing it in terms of its unreacted components, which approximately reflect the chemical composition of the mixture.

The precision, recall, F1 score, and accuracy of the LLM-based workflow for parsing chemical composition reached 0.92, 0.78, 0.85, and 0.74, respectively (Fig. 4a and Table S5 in annex 1). These results indicate that the workflow performs well in identifying the correct components of mixtures, achieving high precision with relatively few false-positive identifications. The performance stratification across naming patterns and compositional complexity (Fig. 4b and Fig. S1–S3 in annex 1) further demonstrates the robustness of the approach: mixtures with clear or well-structured naming patterns consistently achieved perfect or near-perfect precision, whereas entries with ambiguous or otherwise irregular naming patterns showed greater variability. It is noteworthy that even for mixtures containing three components or four or more components, the precision and F1 scores remained high when their naming definitions were clear. For mixtures belonging to complex naming categories, the LLM-based workflow demonstrated strong extraction performance (precision = 1; F1 score = 1). Error cases were analyzed for the two types with noticeably lower precision: the ambiguous naming category with two components (ambiguous-2) and the others naming category with three components (others-3). For ambiguous-2, the errors occurred because the model did not output all acids and polyols as separate components; instead, it returned only a mixed ester or fatty acid ester. For others-3, names such as “all salts of ⋯” indicate multiple metal salts, while the model returned only the organic compound and did not list all metal cations. These results suggest that the model performs robustly for well-defined nomenclature but faces challenges in multi-layered or ambiguous naming patterns.


Fig. 4 (a) Performance of LLM-based workflow for parsing chemical composition. (b) Heatmap of LLM-based extraction performance (precision) across naming categories and component complexities.

Manual verification against the true chemical components of mixtures showed that 62.5% of the mixtures lacked CAS numbers or sufficient naming detail, making it impossible to retrieve definitive structural information from databases (Table S6). This exercise also exposed the limitation of manual verification: reconstructing a single mixture entry required more than one hour of expert effort. In contrast, the proposed LLM-based workflow rapidly identified structural characteristics or reactants associated with the textual names. Although the workflow cannot perform full structural elucidation, the LLMs provided an efficient preliminary interpretation. Among entries with retrievable information, 13% were correct predictions, 27% captured correct side-chain structures but incorrect overall structures, 20% showed simplified representations or missing components, and 40% were identified as reaction components. This substantially reduced manual workload and supported downstream classification, filtering, and property prediction. These results demonstrate the practical value of the LLM-assisted database construction framework, particularly for large-scale datasets where traditional manual verification is often incomplete or infeasible.

Ultimately, a database comprising 20 618 plastic-related chemical entries was established. For each entry, the chemical name, data source, molecular structure (as available in PubChem), and 17 basic property fields were recorded. The database contains 14 813 entries with clearly defined molecular formulas, which form the core dataset for toxicity and functional label prediction. The remaining 5805 entries without molecular formulas fall into the following categories: unknown reaction products (346 entries), complex polymers (1147 entries), complex mixtures of reaction products and polymers (942 entries), mixtures that cannot be clearly represented by defined molecular compositions (1198 entries), complex mixtures with unknown causes (991 entries), and other records whose molecular composition could not be determined owing to various complexities (1181 entries). Among the chemicals with defined molecular formulas, 13 527 entries represent pure substances, while 1286 entries correspond to components derived from 613 mixtures. Of these chemicals, 41.2% were classified as additives and 1.2% as monomers; notably, the remaining 57.6% still lack clearly defined functional information.

3.2 Toxicity prediction performance of plastic-related chemicals

The results indicated that multi-task learning exhibited inferior overall performance compared to single-task models (Fig. S5). This may be attributed to underlying mechanistic heterogeneity among different toxicity indicators, which could diminish the effectiveness of shared feature representations in multi-task modeling. Consequently, single-task models were ultimately selected as the primary prediction framework. Fig. S6 summarizes our random hyperparameter search across fingerprint length and radius, optimizer parameters, learning rate and scheduling strategies, network architecture settings, dropout rate, and molecular representation methods.
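A random hyperparameter search of the kind summarized in Fig. S6 can be sketched as follows. The parameter names and candidate values in `SEARCH_SPACE` are hypothetical placeholders, and `evaluate` stands in for training a single-task model and returning its mean cross-validated AUC; a fixed seed makes the sampled configurations reproducible.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical search space for illustration; the ranges actually explored are in Fig. S6.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "representation": ["ECFP", "MACCS", "ECFP+RDKFP+MACCS+PCA"],
    "fp_length": [1024, 2048, 4096],
    "fp_radius": [2, 3],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "scheduler": ["constant", "step", "cosine"],
    "hidden_layers": [(256,), (512, 128), (1024, 256)],
    "dropout": [0.1, 0.3, 0.5],
}


def sample_configs(space: Dict[str, List[Any]], n_trials: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Draw n_trials configurations, choosing each hyperparameter independently and uniformly."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n_trials)]


def random_search(
    space: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[float, Optional[Dict[str, Any]]]:
    """Return (best score, best configuration); `evaluate` should return mean validation AUC."""
    best_score, best_config = float("-inf"), None
    for config in sample_configs(space, n_trials, seed):
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config
```

Random search samples each dimension independently, so it covers many values of the influential hyperparameters with a modest number of trials, which suits a search space that mixes discrete choices such as the molecular representation with continuous ones such as the learning rate.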

Fig. 5a presents the average performance of six representative molecular representation methods under five-fold cross-validation (complete results are provided in Table S8). Overall, the model combining three types of structural fingerprints with dimensionality reduction (ECFP + RDKFP + MACCS + PCA) achieved the best performance, with an average validation AUC (area under the ROC curve) exceeding 0.8, followed by the MACCS-only model. These results suggest that integrating multiple types of structural fingerprints enhances the model's ability to capture diverse structural features, and that PCA-based dimensionality reduction further removes redundant information, improving predictive performance. In contrast, pretrained models such as GROVER and MolCLR performed worse overall than traditional fingerprint-based representations in this task. A plausible explanation is that pretrained models are primarily designed to capture global structural patterns and graph-level semantics during pretraining; with the relatively limited labeled toxicity data available in this study, fine-tuning was likely insufficient to transfer these features effectively. Additionally, certain toxicity indicators depend heavily on specific functional groups within molecules, and fingerprint-based representations encode such information more directly and explicitly.
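The validation AUC used to compare representations can be read as the probability that a randomly chosen toxic chemical receives a higher score than a randomly chosen non-toxic one. The Python sketch below computes it in that rank-based (Mann–Whitney) form, counting ties as one half, and averages it over cross-validation folds. It should agree with `sklearn.metrics.roc_auc_score` for binary labels, although the quadratic loop is meant for illustration rather than for large datasets; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_cv_auc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average AUC over cross-validation folds, each given as (labels, predicted scores)."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)
```

An AUC of 0.5 corresponds to random ranking, which is the baseline against which the validation AUCs of 0.78 to 0.86 should be read.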


Fig. 5 Performance of ML models for toxicity prediction. (a) Comparison of validation AUC values of different molecular representation methods on different toxicity indicators; AUC-ROC curves for the best-performing ML models corresponding to AqTox indicator (b), CMR (c), C (d), R (e), M (f), STOT_RE (g), and RespSens (h).

In this study, the purpose of the toxicity prediction component is not to develop a new state-of-the-art quantitative structure–activity relationship (QSAR) model, but to efficiently impute missing toxicity labels for plastic-related chemicals in our database. In line with this goal, a multilayer perceptron (MLP) classifier built on molecular fingerprints offers several practical advantages. The model is lightweight, fast to train, and straightforward to deploy across multiple endpoints. Its outputs are probabilities for the toxic class, which users without a machine learning background can readily interpret and use directly for chemical screening, risk flagging, or priority ranking. We also experimented with more complex pretrained graph encoders, such as GROVER and MolCLR, but these models did not outperform the fingerprint-based MLP under our cross-validation protocol, which further supports the choice of a compact architecture for this task. Moreover, the training data are derived from the PlasticMAP database, a domain-specific resource curated for plastic-related substances whose label definitions and chemical space differ from those of generic toxicity benchmarks. Our focus is therefore on achieving reliable performance and computational efficiency within this specialized domain, rather than on maximizing accuracy on broad benchmark datasets.
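The sketch below shows, under simplifying assumptions, how an MLP turns a binary fingerprint into a toxic-class probability and how such probabilities can be used for risk flagging and priority ranking. The weights are hypothetical placeholders (in practice they are learned during training), only one hidden layer is shown, dropout is omitted because it is inactive at inference time, and the 0.5 flagging threshold is likewise an illustrative assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Affine layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_toxic_probability(fingerprint, w1, b1, w2, b2) -> float:
    """Fingerprint bits -> ReLU hidden layer -> single logit -> toxic-class probability."""
    hidden = [max(0.0, h) for h in dense(fingerprint, w1, b1)]
    return sigmoid(dense(hidden, w2, b2)[0])


def flag_and_rank(probabilities: Dict[str, float], threshold: float = 0.5) -> List[Tuple[str, float, bool]]:
    """Sort chemicals by predicted probability (highest first) and flag those >= threshold."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, p, p >= threshold) for name, p in ranked]
```

Because each endpoint has its own single-task model, the same ranking step would be applied separately to the probabilities for each toxicity indicator.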

Fig. 5b–h displays the receiver operating characteristic (ROC) curves and corresponding AUC values for the best-performing model on each toxicity indicator. All models achieved AUC values close to 1 on the training set, while validation AUCs ranged from 0.78 to 0.86. On the independent test partition of the final train/validation/test split, the models maintained comparable performance, with AUC values broadly consistent with those observed during cross-validation (Table S9). Overall, the ML models performed well across multiple toxicity tasks, substantially outperforming the random-classification baseline (AUC = 0.5) and effectively distinguishing between toxic and non-toxic classes. Notably, for toxicity indicators C and M, the validation AUCs reached 0.85 and 0.83, respectively, indicating good discriminative performance. Furthermore, a substructure-level enrichment analysis revealed several structural motifs that were strongly associated with predicted toxicity. To visualize these associations, we constructed substructure-level volcano plots (Fig. S7). Substructures with an odds ratio (OR) greater than 2 and a false discovery rate (FDR)-adjusted q value below 0.05 were considered toxicological markers, reflecting both a strong association with toxicity and statistical significance after multiple-testing correction.
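The marker criterion (OR greater than 2 and FDR-adjusted q value below 0.05) can be sketched for a set of substructures as follows. The statistical test is not specified in this section, so this Python sketch assumes a one-sided Fisher exact test for enrichment, adds 0.5 to every cell when computing the odds ratio to avoid division by zero, and applies the Benjamini–Hochberg procedure; the function names and the 2 × 2 counts in the test are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d) counts of a 2 x 2 table


def odds_ratio(a: int, b: int, c: int, d: int, pseudo: float = 0.5) -> float:
    """Odds ratio with `pseudo` added to every cell (Haldane-Anscombe correction).

    a: toxic with substructure, b: non-toxic with substructure,
    c: toxic without substructure, d: non-toxic without substructure.
    """
    return ((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo))


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p value: P(X >= a) under the hypergeometric null."""
    with_sub, toxic, n = a + b, a + c, a + b + c + d
    upper = min(with_sub, toxic)
    tail = sum(comb(with_sub, k) * comb(n - with_sub, toxic - k) for k in range(a, upper + 1))
    return tail / comb(n, toxic)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def toxicity_markers(tables: Dict[str, Table], or_min: float = 2.0, q_max: float = 0.05) -> List[str]:
    """Substructures with odds ratio > or_min and q value < q_max."""
    names = list(tables)
    q_values = benjamini_hochberg([fisher_greater(*tables[n]) for n in names])
    return [n for n, q in zip(names, q_values) if odds_ratio(*tables[n]) > or_min and q < q_max]
```

In a volcano plot, log(OR) would go on the horizontal axis and -log10(q) on the vertical axis, so the markers are the points in the upper-right region defined by the two thresholds.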

3.3 Performance of the fuzzy search and exact search methods for functional label prediction

The fuzzy search method was applied across all third-level functional categories to extract “common sequences” (detailed results are provided in Table S18 of annex 1). A key advantage of this method is its broad applicability: it can be applied to any functional category. However, its performance depends on the quality of the labeled data and the representativeness of the molecular structures. The method is also flexible, as the strictness of the “common” definition can be tuned through the threshold parameter θ. Fig. 6a shows the frequency histogram of Jaccard similarity scores between unlabeled chemicals and the common sequences of known categories. The distribution is approximately normal, with a mean of 0.71 and a standard deviation of 0.20. Taking scores of 0.8 or higher as the high-similarity region, 4860 chemicals were matched. The most frequently matched categories included antioxidants, plasticizers, and heat stabilizers (Fig. 6b). The similarity scores for third-level functional labels identified by fuzzy search are shown in Fig. S8 of annex 1. In a manual sampling evaluation, 45% of the fuzzy search predictions were correct (detailed results are provided in Table S19 of annex 1).
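The fuzzy search step can be illustrated with fingerprints represented as sets of on-bit indices. This Python sketch assumes, for illustration only, that the common sequence of a category consists of the bits present in at least a fraction θ of its labeled chemicals, so a larger θ yields a stricter, smaller common sequence. Each unlabeled chemical is then compared with every common sequence by Jaccard similarity and kept when the score reaches the 0.8 cutoff; the function names, the default θ, and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def common_sequence(fingerprints: List[Set[int]], theta: float) -> Set[int]:
    """Bits that are on in at least a fraction `theta` of a category's labeled chemicals."""
    counts: Dict[int, int] = {}
    for fp in fingerprints:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    needed = theta * len(fingerprints)
    return {bit for bit, n in counts.items() if n >= needed}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection divided by size of the union of two on-bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_match(
    unlabeled: Dict[str, Set[int]],
    labeled: Dict[str, List[Set[int]]],
    theta: float = 0.8,
    cutoff: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """All (chemical, category, score) pairs whose Jaccard score reaches the cutoff."""
    commons = {cat: common_sequence(fps, theta) for cat, fps in labeled.items()}
    hits = []
    for name, fp in unlabeled.items():
        for cat, common in commons.items():
            score = jaccard(fp, common)
            if score >= cutoff:
                hits.append((name, cat, score))
    return hits
```

Because every bit counts equally in the Jaccard index, a frequent but uninformative bit contributes as much as a rare, category-defining one, which is one reason the manual sampling precision of fuzzy search stays moderate.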
Fig. 6 Results of fuzzy search for functional labels. (a) Histogram of similarity scores for plastic additive second- and third-level functional labels with Gaussian fitting; (b) distribution of high-similarity pairs (≥0.8) categorized by second-level functional labels.

The rule-based additive identifier is suitable for additive categories with clear structural characteristics, enabling high-confidence molecular recognition. Among the 21 third-level functional labels related to plasticizers, 4 categories share their defining patterns with other additives (e.g., stearate plasticizers, whose characteristic structures are also common in heat stabilizers), and 4 categories lack distinctive structural features (e.g., chlorinated plasticizers, where the chlorine atom is prevalent across a wide range of chemicals). The remaining 13 categories are classified as exact search supported categories (Fig. 7a). Their typical SMARTS patterns are detailed in section 5.1 of annex 1. Starting from 83 labeled chemicals, exact search found 395 new matches across these 13 categories (Table S20). For instance, 111 unlabeled chemicals were identified using the SMARTS pattern of ortho-phthalates (Fig. 7b). The exact search results for each category (the top 50 matched molecules per category) are shown in Fig. S9–S21 of annex 1. In a manual sampling evaluation, 96% of the sampled exact search results were correct according to the structural score. Among the incorrect cases, one anthraquinone-type dye was misclassified as a terephthalate plasticizer owing to its structural similarity to ester-based compounds. When both the structural score and the plasticizer function score were considered, 71% of the exact search predictions were correct (detailed results are provided in Table S21 of annex 1). The decrease relative to the structural score alone is mainly attributable to the lack of explicit evidence in the literature. For example, several structurally correct esters (such as divinyl adipate, mixed adipate esters, and certain long-chain esters) do not have clearly documented uses as plasticizers, although their structural features are consistent with known plasticization mechanisms. Long-chain esters, in particular, can intercalate between polymer chains, weaken intermolecular interactions, and provide greater mobility through their flexible aliphatic segments; these effects lower the glass transition temperature of the polymer and increase its flexibility. Although the exact search method achieved higher correctness than fuzzy search (71% versus 45%), its applicability is restricted to structurally distinctive categories and it is slower to execute. It is also important to note that the functional labels obtained through both fuzzy and exact search require additional verification before being used in practical applications.
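An exact search of this kind can be sketched with RDKit substructure matching (`Chem.MolFromSmarts`, `Chem.MolFromSmiles`, and `Mol.HasSubstructMatch`). The two SMARTS patterns below are simplified illustrations written for this example, not the patterns listed in section 5.1 of annex 1: each requires two ester groups on adjacent (ortho) or opposite (para) ring carbons, which is what separates phthalates from terephthalates. The function name is hypothetical, and the sketch requires RDKit to be installed.

```python
from typing import List

from rdkit import Chem

# Simplified, hypothetical SMARTS patterns written for this illustration; the
# patterns used in the study are listed in section 5.1 of annex 1.
PATTERNS = {
    # two ester groups (O bonded to any carbon) on adjacent benzene carbons
    "ortho-phthalate": "[#6]OC(=O)c1ccccc1C(=O)O[#6]",
    # two ester groups on opposite (para) benzene carbons
    "terephthalate": "[#6]OC(=O)c1ccc(cc1)C(=O)O[#6]",
}
QUERIES = {name: Chem.MolFromSmarts(smarts) for name, smarts in PATTERNS.items()}


def exact_search(smiles: str) -> List[str]:
    """Return the names of all patterns that occur as substructures of the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return [name for name, query in QUERIES.items() if mol.HasSubstructMatch(query)]
```

A SMARTS match checks structure only, so a hit such as an ester that has never been documented as a plasticizer still needs functional evidence; this is why the combined structural and functional score is lower than the structural score alone.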


Fig. 7 Results of exact search for functional labels. (a) Distribution of third-level functional labels among plasticizer-related chemicals based on whether they support exact search; (b) examples of manually labeled data, SMARTS patterns, and exact search results for phthalate and terephthalate plasticizers.

4. Discussion

Establishing an effective science–policy interface is essential for meeting the broad scientific and technical demands that arise throughout treaty negotiations and implementation, particularly in the development of science-based criteria for identifying “chemicals of concern”.36 However, in determining which chemicals fall under the “of concern” category, functional attributes, an essential factor that should be prioritized, are often overlooked. The global socio-economic system continues to exhibit a clear and sustained demand for plastics as a foundational functional material.2 The function-driven nature of plastics inherently necessitates the use of specific additives that fulfill indispensable performance requirements. Therefore, it is more rational to formulate management strategies that first ensure the fulfillment of material functionality, and then incorporate assessments of “chemicals of concern” attributes.

For example, Fig. 8 highlights acid binding agents (A22-C80), which exhibit notable toxicity signals in the assessment. If regulatory decisions were based solely on toxicity without accounting for functional attributes, the entire class of associated chemicals could be indiscriminately categorized as “chemicals of concern”. However, acid binding agents play a critical role in plastic processing by neutralizing residual or process-generated acidic impurities, thereby significantly enhancing polymer thermal stability, material compatibility, and long-term performance. Imposing blanket restrictions without regard for their function could introduce greater systemic risks, including polymer degradation, catalyst deactivation, and equipment corrosion. Therefore, for acid binding agents and other additive categories with defined roles, an eco-design approach is more appropriate: prioritizing the development of low-toxicity, low-impact alternative chemicals while gradually phasing out traditional additives.


Fig. 8 Toxicity and functional label mapping of plastic additives.

In contrast, certain additive categories exhibit significant variation in toxicity among their subtypes. For instance, phthalate plasticizers (A01-C00) and chlorinated plasticizers (A01-C18) show markedly higher toxicity levels compared to other plasticizer subcategories (Fig. 8). In the case of plasticizers, plastic flexibility requirements can still be met by scaling up production and application of lower-toxicity alternatives. Furthermore, for additive categories with substantial internal toxicity variation, a combined policy approach, including command-and-control, fiscal, and market-based instruments, is recommended to promote a systematic transition toward greener alternatives. Command-and-control policies can set mandatory targets for safer additives through legislation. Fiscal policies may impose environmental taxes on “chemicals of concern” and offer subsidies for green alternatives. Market-based instruments could include incentive mechanisms such as allowing manufacturers adopting green additives to offset a portion of their carbon emissions, thereby linking green additive adoption with emissions trading systems and strengthening the market momentum for industry-wide green transitions.

The functional–toxicological relationship presented in Fig. 8 and Table S22 offers intuitive evidence for identifying the functional classes of plastic additives that should be prioritized for substitution or strategic retention. No direct correlation is observed between second-level functional labels (A01–A24) and toxicity. However, the box plots show a high density of outliers across various toxicity endpoints, suggesting that specific functional structures may influence toxicity. For the third-level functional labels (C00–C82), Pearson's chi-squared tests were used to examine the null hypothesis that functional class and each toxicity endpoint are independent. For CMR, C, R, STOT_RE and AqTox, the tests strongly rejected the null hypothesis of independence (p value < 0.001), with Cramér's V values between 0.46 and 0.64, indicating moderate to strong associations between functional classes and these toxicity labels (Table S23). In contrast, for M and RespSens, independence could not be rejected at the current sample size (p value > 0.3, Cramér's V around 0.33), suggesting weaker or less clearly detectable functional patterns for these endpoints. The heatmap of positive prediction fractions (Fig. S22) further shows that several third-level functional labels carry a disproportionately high burden of multiple toxicity indicators, supporting the conclusion that function is a necessary and informative perspective for identifying chemicals of concern. Although functional attributes do not directly determine toxicity, quantifiable pattern relationships may exist between functional demands and toxicological characteristics. These findings provide valuable insights for the future development of function-driven green chemical design strategies and the establishment of classification-based regulatory frameworks grounded in functional attributes.
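The association measures can be reproduced from a contingency table whose rows are third-level functional labels and whose columns are predicted non-toxic and toxic counts for one endpoint. The Python sketch below computes the Pearson chi-squared statistic without continuity correction and Cramér's V; the function names are hypothetical. The p value additionally requires the chi-squared distribution with (r - 1)(c - 1) degrees of freedom, which `scipy.stats.chi2_contingency` provides (it applies Yates' correction only when there is a single degree of freedom).

```python
import math
from typing import Sequence


def chi2_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-squared statistic for an r x c table (no continuity correction)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:  # skip empty rows/columns, which carry no information
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1))), between 0 and 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2_statistic(table) / (n * k))
```

With a binary toxicity column, min(r, c) - 1 equals 1, so Cramér's V reduces to the square root of chi2 divided by the number of chemicals; this normalization is what makes V comparable across endpoints with different numbers of labeled chemicals.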

5. Conclusion and outlook

Understanding the scientific information of plastic-related chemicals is essential for addressing chemical-related challenges in plastic pollution. As emphasized by the ongoing development of a legally binding instrument to end plastic pollution,3,4 existing knowledge gaps hinder our ability to comprehend and respond to the global plastic crisis. Bridging these gaps is critical, and the advancement of AI technologies offers a promising pathway forward. This study presents a series of AI-based technical solutions aimed at enriching data on the chemical composition, functional categories, and toxicity of plastic-related chemicals. All results were consolidated into a new plastic-related chemical database, presented in annex 2. The supplementation of high-quality data serves as a pivotal foundation, enabling the effective application of LLMs and ML techniques in this domain.

For the AI framework proposed in this study, more capable frontier LLMs and ML models trained on larger high-quality datasets could further enhance chemical component identification. Although the current fuzzy search framework expands coverage, it assumes that all fingerprint bits contribute equally. In practice, structural features differ in discriminative power, and uniform weighting can dilute signals from common motifs. Quantifying feature importance through SHAP values and using them as positional weights represents a promising avenue.42 Emphasizing high-contribution positions would focus the method on informative substructures, improving both predictive accuracy and interpretability. The resulting attributions would clarify which motifs determine functional labels and guide rational substitution by distinguishing indispensable fragments from redundant ones. Integrating contribution-based analysis is therefore a natural direction for methodological refinement.
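As a sketch of the weighting idea, the Jaccard similarity used in fuzzy search can be generalized so that each fingerprint bit contributes an importance weight instead of 1. The weights are assumed here to be per-bit importances such as mean absolute SHAP values; how they would be computed, the function name, and the example values in the test are hypothetical. With equal weights the function reduces to the ordinary Jaccard index.

```python
from typing import Dict, Set


def weighted_jaccard(a: Set[int], b: Set[int], weights: Dict[int, float], default: float = 1.0) -> float:
    """Weighted Jaccard similarity between two sets of on-bit indices.

    Each bit contributes weights[bit] (or `default` if absent) instead of 1, so
    shared high-importance bits raise the score more than shared common ones.
    With all weights equal, this reduces to the ordinary Jaccard index.
    """
    total = sum(weights.get(bit, default) for bit in a | b)
    if total == 0:
        return 0.0
    return sum(weights.get(bit, default) for bit in a & b) / total
```

For two fingerprints sharing bits 1 and 2 and differing in bits 3 and 4, the unweighted score is 0.5, whereas weighting the shared bits at 5.0 and the differing bits at 0.5 raises it to 10/11, so agreement on informative motifs dominates the score.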

Despite its practical utility for rapid toxicity label imputation within plastic-related chemical datasets, the ML component proposed in this study is not intended to function as a general-purpose hazard assessment tool. Its applicability to broader toxicological domains remains limited, as the model is tailored specifically to the annotation rules and data curation structure of PlasticMAP. The ML model is also limited by the lack of an independent external validation dataset. Future research could focus on developing toxicity prediction models with broader generalizability across diverse chemical domains. One direction is to integrate cross-domain and multi-source toxicological evidence by combining molecular structural features with in vitro assay data and text descriptions related to chemical reactivity. At the same time, standards for toxicity data on plastic chemicals need to be unified and a benchmark dataset established. Methodologically, explainable AI approaches may enhance interpretability and transparency, and active learning strategies can support continuous improvement as new data become available. In addition, diffusion-based molecular representations, multi-fidelity learning schemes, and multimodal foundation models capable of jointly encoding chemical structures and toxicological text represent promising avenues for improving model transferability and applicability across chemical hazard assessment contexts. Future research should also build upon more mechanistically grounded toxicological frameworks. In particular, integrating adverse outcome pathways, which link molecular initiating events (MIEs) through key events (KEs) to adverse outcomes (AOs), offers a promising direction for enhancing both scientific interpretability and regulatory relevance.
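Among active learning strategies, uncertainty sampling is one of the simplest: the model requests labels for the chemicals it is least sure about. The Python sketch below, offered as an illustration rather than as a component of the present framework, selects the unlabeled chemicals whose predicted toxic-class probability lies closest to 0.5; the function name and chemical identifiers are hypothetical.

```python
from typing import Dict, List


def uncertainty_sample(probabilities: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` chemicals whose predicted toxic probability is closest to 0.5.

    Ties in uncertainty are broken by name so the selection is deterministic.
    """
    ranked = sorted(probabilities, key=lambda name: (abs(probabilities[name] - 0.5), name))
    return ranked[:budget]
```

Once curators label the selected chemicals, they would be added to the training set and the model retrained, so each labeling round targets the region of chemical space where the current model is least certain.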
Within the scope of this study, supplementing toxicity labels using lightweight ML models and conducting explainable analyses to identify structural features contributing to high toxicity already meets the intended research objectives. Further efforts to model the relationship between molecular structure and the probability of reaching a specific AO are a necessary direction for future research, but they fall outside the scope of the present work. We provide the reported MIE, KE, and AO information corresponding to the seven toxicity endpoints used in this study in section 3.7 of annex 1 (Tables S10–S17). Systematically inferring the potential MIEs of plastic-related chemicals from their molecular structures, followed by mapping different MIEs onto the corresponding KE-to-AO networks, will strengthen the scientific evidence chain for hazard identification and help refine future risk assessment frameworks.

For plastic-related chemicals, the next step in advancing foundational scientific knowledge is to incorporate full life-cycle information to better define “chemicals of concern” criteria.43 Toxicity reflects only the intrinsic hazard of a chemical, without capturing its environmental impact across the entire life cycle.5 Before a chemical enters the use phase, environmental burdens can arise from upstream processes such as crude oil extraction, intermediate synthesis, and final product formulation. Existing studies have shown that plastic additives contribute substantially to the cradle-to-gate life cycle impacts of PVC plastics.22 Therefore, expanding the availability of life-cycle environmental impacts for plastic-related chemicals is essential. Recently reported ML approaches that predict life cycle assessment outcomes from molecular structure provide a promising technical pathway.44–47 Additionally, understanding the material metabolism of chemicals across their life cycle is equally important. Current “chemicals of concern” criteria often focus on toxicity per unit mass without considering exposure levels or usage volumes, which may result in biased conclusions. Additives retained in plastics through recycling may persist in the socio-economic system as legacy substances, potentially causing long-term harm.8–10 As such, integrating regional or global-scale chemical flow and stock data is crucial for determining whether a chemical class should be considered “of concern”. To support more systematic and dynamic risk assessments, efforts should be made to couple plastic-related chemical data with detailed chemical material flow databases.

Furthermore, the specific applications of additives within the socio-economic system, as captured in plastic-related chemical databases, remain to be clarified. Only around 1700 chemicals have been linked to primary polymers in existing databases.1 Primary polymers serve as critical carriers for plastic additives, and their compatibility is quantitatively defined through plastic formulations. Plastic formulations link application scenarios to additive selection, combining primary polymers and additives to meet functional requirements. These formulations are generally derived from experimental evaluations and applied in real-world manufacturing practices. A given plastic product often has many viable formulation options (see Table S24 of annex 1 for illustrative cases), owing to variability in additive performance. For example, plasticizer performance can be quantified using a “plasticization efficiency coefficient”, whereas flame retardants are commonly evaluated based on the limiting oxygen index of the polymer-additive composite. Linking additives to a comprehensive formulation and function database would greatly enhance scientific understanding of their real-world applications. Leveraging LLMs to automatically extract formulation data and performance indicators from literature, patents, and textbooks represents a technically feasible solution.48,49 This approach not only enables high-throughput data processing but also supports standardized extraction of information across diverse textual sources. Ultimately, integrating these data-driven approaches will enable more comprehensive and efficient management of plastic additives, driving future advancements in sustainable plastic material design and regulatory frameworks.

Author contributions

Conceptualization, K. Z., X. P.; methodology, K. Z., X. P., and R. T.; validation, K. Z., and X. P.; formal analysis, K. Z., X. P., and R. T.; investigation, K. Z., X. P., R. T., Y. S., and S. H.; resources, S. H. and X. W.; writing – original draft, K. Z., X. P., and R. T.; writing – review & editing, K. Z., X. P., and H. F.; writing – revision & organization, K. Z., X. P., R. T. and H. F.; visualization, K. Z., X. P., and R. T.; supervision, S. H., and X. W. K. Z., X. P., and R. T. contributed equally to this paper.

Conflicts of interest

The authors declare no competing financial interest.

Data availability

The data supporting this article have been included as part of the supplementary information (SI). Supplementary information: annex 1: details of methods and results (Fig. S1–S22 and Tables S1–S24); annex 2: plastic-related chemical database. See DOI: https://doi.org/10.1039/d5gc04822b.

The code of ML model for this study is available on GitHub (https://github.com/MadderFlowers/plastic_toxicity).

Acknowledgements

This work is supported by the Carbon Neutrality and Energy System Transformation (CNEST) Program led by Tsinghua University, the Scientific Research Innovation Capability Support Project for Young Faculty (ZYGXQNJSKYCXNLZCXM-E7) and the Tsinghua University Initiative Scientific Research Program.

References

  1. H. Wiesinger, Z. Wang and S. Hellweg, Environ. Sci. Technol., 2021, 55, 9339–9351.
  2. R. Geyer, J. R. Jambeck and K. L. Law, Sci. Adv., 2017, 3, e1700782.
  3. Intergovernmental Negotiating Committee on Plastic Pollution, The United Nations Environment Programme.
  4. Plastics Science; UNEP/PP/INC.1/7, The United Nations Environment Programme, 2022.
  5. K. Zhao, Z. Zhang, Y. Li, H. Fu, X. Peng, R. Tao, Z. Li, W. Wen, X. Xie and S. Hu, Resour., Conserv. Recycl., 2025, 217, 108213.
  6. A. Marhoon, M. L. H. Hernandez, R. G. Billy, D. B. Müller and F. Verones, Environ. Sci. Technol., 2024, 58, 8336–8348.
  7. J. R. Jambeck and I. Walker-Franklin, One Earth, 2023, 6, 600–606.
  8. J. Wang, F. K. S. Chan, M. F. Johnson, H. K. Chan, Y. Cui, J. Chen and W.-Q. Chen, Environ. Sci. Technol., 2025, 59, 1631–1646.
  9. Y. Cui, J. Chen, Z. Wang, J. Wang and D. T. Allen, Environ. Sci. Technol., 2022, 56, 11006–11016.
  10. M. Bi, W. Liu, X. Luan, M. Li, M. Liu, W. Liu and Z. Cui, Environ. Sci. Technol., 2021, 55, 13980–13989.
  11. V. Fauvelle, M. Garel, C. Tamburini, D. Nerini, J. Castro-Jiménez, N. Schmidt, A. Paluselli, A. Fahs, L. Papillon, A. M. Booth and R. Sempéré, Nat. Commun., 2021, 12, 4426.
  12. S. Vincoff, B. Schleupner, J. Santos, M. Morrison, N. Zhang, M. M. Dunphy-Daly, W. C. Eward, A. J. Armstrong, Z. Diana and J. A. Somarelli, Environ. Sci. Technol., 2024, 58, 10445–10457.
  13. M. Klotz, S. Schmidt, H. Wiesinger, D. Laner, Z. Wang and S. Hellweg, Environ. Sci. Technol., 2024, 58, 18686–18700.
  14. H. Wiesinger, C. Bleuler, V. Christen, P. Favreau, S. Hellweg, M. Langer, R. Pasquettaz, A. Schönborn and Z. Wang, Environ. Sci. Technol., 2024, 58, 1894–1907.
  15. B. Carney Almroth, T. Dey, T. Karlsson and M. Wang, Science, 2023, 382, 525–525.
  16. T. Dey, L. Trasande, R. Altman, Z. Wang, A. Krieger, M. Bergmann, D. Allen, S. Allen, T. R. Walker, M. Wagner, K. Syberg, S. M. Brander and B. C. Almroth, Science, 2022, 378, 841–842.
  17. K. L. Law, M. J. Sobkowicz, M. P. Shaver and M. E. Hahn, Nat. Rev. Mater., 2024, 9, 657–667.
  18. Z. Wang and A. Praetorius, Environ. Sci. Technol. Lett., 2022, 9, 1000–1006.
  19. B. Carney Almroth, S. E. Cornell, M. L. Diamond, C. A. de Wit, P. Fantke and Z. Wang, One Earth, 2022, 5, 1070–1074.
  20. W. Xingwei, W. Wang and Q. Liu, Plastic Additives and Formulation Technology, Chemical Industry Press, China, 2016.
  21. M. Schiller, PVC Additives: Performance, Chemistry, Developments, and Sustainability, Carl Hanser Verlag GmbH & Co. KG, Munich, Germany, 2015, DOI: 10.3139/9781569905449.fm.
  22. K. Zhao, X. Wang, Z. Zhang, Y. Li, S. Li, E. Tian, X. Xie, H. Fu and S. Hu, Environ. Sci. Technol., 2024, 58, 16386–16398.
  23. J. Bing, J. Zhao and Y. Bao, Polyvinyl chloride resins and their applications, Chemical Industry Press, China, 2012.
  24. L. Gong, D. Zheng and J. Li, Polyvinyl chloride plastic additives and formulation design technology, Sinopec Press, China, 2010.
  25. W. Wang and Y. Yan, Plastic Formulas, Chemical Industry Press, China, 2008.
  26. M. Wagner, L. Monclús, H. P. H. Arp, K. J. Groh, M. E. Løseth, J. Muncke, Z. Wang, R. Wolf and L. Zimmermann, State of the science on plastic chemicals – Identifying and addressing chemicals and polymers of concern, Zenodo, 2024.
  27. D. Lithner, A. Larsson and G. Dave, Sci. Total Environ., 2011, 409, 3309–3324.
  28. K. J. Groh, T. Backhaus, B. Carney-Almroth, B. Geueke, P. A. Inostroza, A. Lennquist, H. A. Leslie, M. Maffini, D. Slunge, L. Trasande, A. M. Warhurst and J. Muncke, Sci. Total Environ., 2019, 651, 3253–3268.
  29. J. N. Hahladakis, C. A. Velis, R. Weber, E. Iacovidou and P. Purnell, J. Hazard. Mater., 2018, 344, 179–199.
  30. E. Fries, T. Grewal and R. Sühring, Environ. Sci.: Processes Impacts, 2022, 24, 1945–1956.
  31. H. Wiesinger, A. Shalin, X. Huang, A. Siegrist, N. Plinke, S. Hellweg and Z. Wang, Environ. Sci. Technol. Lett., 2024, 11, 1147–1160.
  32. N. Aurisano, R. Weber and P. Fantke, Curr. Opin. Green Sustain. Chem., 2021, 31, 100513.
  33. UNEP/PP/INC.2/INF/5, Chemicals in Plastics - A Technical Report, 2023.
  34. W. Chen, Y. Gong, M. McKie, H. Almuhtaram, J. Sun, H. Barrett, D. Yang, M. Wu, R. C. Andrews and H. Peng, Environ. Sci. Technol., 2022, 56, 14627–14639.
  35. X.-P. Li, G.-Y. Huang, S.-Q. Qiu, D.-Q. Lei, C.-S. Wang, L. Xie and G.-G. Ying, Environ. Sci. Technol., 2024, 58, 121–131.
  36. M. Spring, P. Schröder, A. Popovici, N. O’Meara, I. Corsi, S. Aliani, K. Boodhoo, J. Gobin, A. Godoy-Faúndez, A. Kahru, C. Luscombe, S. M. Praveena, A. Mustapha Olaitan, P. Wang, R. Al Bakain and F. Sakellariadou, Nat. Sustain., 2025, 8, 728–730.
  37. Y. Ou, J. Li and T. Han, Plastic Additive Performance and Selection Quick Reference Handbook, National Defense Industry Press, Beijing, 2012.
  38. D. Rogers and M. Hahn, J. Chem. Inf. Model., 2010, 50, 742–754.
  39. J. L. Durant, B. A. Leland, D. R. Henry and J. G. Nourse, J. Chem. Inf. Comput. Sci., 2002, 42, 1273–1280.
  40. Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang and J. Huang, presented in part at the Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2020.
  41. Y. Wang, J. Wang, Z. Cao and A. Barati Farimani, Nat. Mach. Intell., 2022, 4, 279–287.
  42. X. Chen, M. Nian, F. Zhao, Y. Ma, J. Yao, S. Wang, X. Chen, D. Li and M. Fang, Environ. Sci. Technol., 2025, 59, 7187–7199.
  43. C. Wang, K. Zhao, Y. Li and S. Hu, ACS Sustainable Chem. Eng., 2025, 13, 13399–13413.
  44. G. Wernet, S. Hellweg, U. Fischer, S. Papadokonstantakis and K. Hungerbühler, Environ. Sci. Technol., 2008, 42, 6717–6722.
  45. R. Song, A. A. Keller and S. Suh, Environ. Sci. Technol., 2017, 51, 10777–10785.
  46. S. You, Y. Sun, X. Wang, N. Ren and Y. Liu, Environ. Sci. Technol., 2023, 57, 3434–3444.
  47. D. Zhang, Z. Wang, C. Oberschelp, E. Bradford and S. Hellweg, ACS Sustainable Chem. Eng., 2024, 12, 2700–2708.
  48. X. Peng, Y. S. Tew, K. Zhao, C. Wang, R. A. Li, S. Hu and X. Wang, Green Chem. Eng., 2025, 6, 572–581.
  49. F. Cheng, J. Huang, H. Li, B. I. Escher, Y. Tong, M. König, D. Wang, F. Wu, Z. Yu, B. W. Brooks and J. You, Environ. Sci. Technol. Lett., 2023, 10, 1004–1010.

Footnote

These authors contributed equally to this paper.

This journal is © The Royal Society of Chemistry 2026