Open Access Article
Emma Pajak a, David Walz b, Olga Walz b, Laura Marie Helleckes a, Klaus Hellgardt c and Antonio del Rio Chanona *a
aThe Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, Imperial College London, London SW7 2AZ, UK. E-mail: a.del-rio-chanona@imperial.ac.uk
bBASF SE, Ludwigshafen, Germany
cDepartment of Chemical Engineering, Imperial College London, London SW7 2AZ, UK
First published on 6th February 2026
The chemical industry is increasingly prioritizing sustainability, with a focus on reducing its carbon footprint to achieve net zero. By 2026, the Together for Sustainability consortium will require reporting the biogenic carbon content (BCC) in chemical products, posing a challenge as the BCC depends on feedstocks, value chain configuration, and process-specific variables. While carbon-14 isotope analysis can measure the BCC, it is impractical for continuous industrial monitoring. This work presents CarAT (Carbon Atom Tracker), an automated methodology for calculating the BCC across industrial value chains, enabling iterative and accurate sustainability reporting. The approach leverages existing Enterprise Resource Planning data in three stages: preparing value chain data, performing atom mapping in chemical reactions using chemistry language models, and applying a linear program to calculate the BCC given known inlet compositions. The methodology is validated on a 27-node industrial toluene diisocyanate value chain. Three scenarios are analyzed: a base case with all fossil feedstocks, a case incorporating a renewable feedstock, and a butanediol value chain with a recycle stream. The results are visualized using Sankey diagrams, showing the flow of carbon attributes across the value chain. The key contribution is a scalable, automated framework for BCC calculation that can update as industrial conditions change. CarAT enables chemical manufacturers to comply with upcoming sustainability mandates while supporting carbon neutrality goals by facilitating the systematic substitution of fossil carbon with biogenic alternatives. By providing transparent, auditable tracking of carbon sources throughout production networks, this framework empowers the broader chemical industry to make data-driven decisions for achieving net-zero targets and accelerating the transition to sustainable manufacturing.
Green foundation
1. This research advances green chemistry by providing an automated framework to trace the BCC in complex industrial value chains, fostering accountability in feedstock sourcing and transitions toward carbon neutrality.
2. Using a synergy of machine-learning-based atom mapping and linear optimization, the framework allows for accurate BCC calculations that can be updated continuously to reflect value chain changes in sourcing, processing, and distribution.
3. In future work, the framework could be leveraged for scenario analysis and value chain optimization to address a market demand for high-BCC products, thus facilitating a transition toward non-fossil feedstocks.
An intrinsic property of chemical value chains is their interconnectedness, arising from synthesis pathways that draw on reactants from various sources, thereby linking the pathways of different products.9 These chains can be vast and complex, with some synthesis pathways relying on a series of intricate chemical transformations to produce desired compounds. Recycle streams add further complexity by creating additional loops and interactions within the production process. Major chemical manufacturers such as BASF, Dow Chemical, Shell Chemicals, and Mitsubishi Chemical operate extensive global value chains comprising numerous interconnected pathways, encompassing hundreds of thousands of nodes and yielding thousands of diverse commercial products.10
This lack of molecular transparency poses significant challenges for sustainability assessment and reporting. Without visibility into the chemical transformations occurring at each production stage, it becomes difficult to:
• Track the origin and fate of carbon atoms through complex reaction networks.
• Assess the potential for substituting fossil-derived inputs with biogenic alternatives.
• Calculate accurate sustainability metrics that require a molecular-level understanding.
• Identify optimization opportunities for reducing the environmental impact.
The TfS Guidelines provide a specialized framework for calculating PCFs for chemical products, ensuring adherence to internationally recognized standards for greenhouse gas accounting and environmental assessment. These guidelines align with Principle 7 of Green Chemistry, which advocates for the use of renewable feedstocks rather than depleting ones, by providing mechanisms to track and incentivize the transition from fossil to biogenic carbon sources throughout chemical value chains.
A critical upcoming requirement from the TfS Guidelines is the reporting of a product's Biogenic Carbon Content (BCC), starting in 2026.6,11 Biogenic carbon is defined by the World Business Council for Sustainable Development as “carbon derived from living organisms or biological processes, but not fossilized materials or fossil sources”.12 Typical sources include trees, plants, and soil, which absorb CO2 as a natural part of their life cycle.13
The introduction of BCC reporting serves a crucial purpose: it supports the estimation of end-of-life emissions, which fall outside the cradle-to-gate scope of PCF calculations (see Fig. 1). The cradle-to-gate boundary encompasses all emissions from raw material extraction (Scope 3 upstream) through production processes (Scope 1 and 2) to the factory gate, but excludes downstream emissions from product use and disposal (Scope 3 downstream). For example, a product whose carbon is entirely biogenic would contribute no fossil CO2 emissions through combustion or degradation, regardless of the end-of-life treatment.
Fig. 1 System boundary definition for Product Carbon Footprint (PCF) calculations showing the cradle-to-gate scope. The PCF includes Scope 1 (direct emissions from owned or controlled sources), Scope 2 (indirect emissions from purchased energy), and upstream Scope 3 emissions (from purchased goods and services). Downstream Scope 3 emissions from product use and end-of-life disposal are excluded from PCF calculations but can be estimated using BCC data. Figure adapted from TfS.6
The ability to quickly calculate and recalculate BCC becomes increasingly critical as manufacturing landscapes evolve. As resilient supply chains expand the availability of biogenic raw materials,14 manufacturers need rapid BCC assessments for scenario analysis and credible end-of-life emission estimates. Similarly, as net-zero chemical pathways scale up “carbon capture, low-carbon hydrogen, carbon storage, biomass utilization”15 and other technologies, manufacturers will require a BCC framework that can quickly update with changes in routes, feedstocks, and recycling to preserve transparent, auditable product-level attribution.
Calculating the BCC for products within complex value chains presents technical challenges. Currently, this calculation is only feasible when the chemical structure clearly differentiates biogenic carbon atoms from non-biogenic ones. For instance, in an ethoxylated fatty acid – typically synthesized from a fatty acid and ethylene oxide – the fatty acid component is generally derived from vegetable oil (biogenic), while ethylene oxide may be fossil-derived. The biogenic content is then determined by the fraction of carbon originating from the fatty acid.
The complexity increases for products requiring multiple intermediates, which may have their feedstocks replaced with biogenic or recycled materials. Accurate BCC calculation demands comprehensive knowledge of all upstream reactions, including value chain configuration, raw material composition, recycle streams, and process-specific nuances affecting product composition. As BCC depends on these dynamic variables, any upstream changes necessitate recalculation – a significant burden in global value chains where sustainable feedstocks, process setups, and efficiencies frequently change.
Chemical manufacturers face a critical challenge: they must obtain certification for their sustainability metrics, yet recalculating BCC each time upstream changes occur (such as feedstock modifications or process efficiency improvements) is impractical. An alternative approach involves developing a general computational methodology that decouples the dynamic variables of the value chain from the calculation process. By certifying the methodology itself, manufacturers automatically receive certification for all subsequent calculations performed using the approved procedure. BASF has successfully employed this strategy for PCF calculations through its SCOTT methodology,11 creating a strong precedent for developing similar approaches for BCC.
Existing carbon flow accounting approaches have largely operated at aggregated or process levels. For example, Ohno et al. (2018)19 used a waste input–output material flow analysis to quantify materially retained carbon across Japan's economy, offering valuable macro-level insight but relying on historical data that cannot dynamically resolve chemical transformations or carbon origins. Likewise, process-level studies such as Kätelhön et al. (2019)20 have evaluated the climate-change mitigation potential of carbon capture and utilization (CCU) technologies, providing system-wide perspectives on emission reduction. While these static, mass-balance-based frameworks remain essential for understanding aggregate carbon stocks and process optimization, they are not designed to dynamically attribute carbon flows or distinguish between fossil and biogenic origins within chemical reaction networks. This capability is increasingly crucial as manufacturers must continuously reassess carbon attribution in response to changing feedstocks, process configurations, and value-chain designs. The present work addresses this gap by introducing a molecular-level methodology that enables atom-level carbon tracing and produces outputs compatible with broader assessment tools such as LCA and techno-economic analysis (TEA), providing a consistent, data-driven foundation for future sustainability evaluations.
The framework will decouple the computational method from value chain data to achieve methodological certification – similar to BASF's SCOTT approach for PCF – rather than relying on product-specific certification. To calculate a product's BCC or any elemental attribute share, the methodology traces atoms through the value chain to their point of origin. The proposed framework derives molecular-level insights by leveraging existing business-focused ERP data, avoiding the need to establish new datasets from scratch.
To achieve these overarching aims, the following objectives are defined:
• Identify and curate an industrial value chain case study that encapsulates challenges faced at scale to demonstrate and validate the methodology.
• Assess and implement an AI-assisted approach to propose atom mappings of value chain reactions.
• Formulate a method for dynamically computing elemental attribute shares of materials based on changing inputs (e.g., feedstock composition, value chain configuration, etc.).
Beyond immediate sustainability reporting requirements, this methodology supports broader carbon neutrality goals by facilitating the substitution of fossil-derived inputs with biogenic or recycled alternatives, as discussed by Beer et al. (2025).21 Such transparency provides decision-makers with an auditable basis for reducing Scope 1 and Scope 3 emissions while working toward net-zero targets.
A key aspect of this work is the application of existing machine learning models for atom mapping, which significantly reduces the manual burden of tracking chemical reactions across entire product portfolios. By applying state-of-the-art atom mapping algorithms to industrial value chains, we created a modular, parameter-agnostic workflow that enables rapid recalculations whenever feedstock compositions or value chain parameters change.
Although BCC is an established sustainability metric, CarAT provides the practical means to realize it across large-scale industrial chemical value chains. By bridging molecular-level carbon tracing with process-level assessment tools such as LCA and TEA, the framework operationalizes the BCC calculation in a way that aligns with established sustainability methodologies.
This integration of machine learning and systems-level optimization ensures compliance with TfS requirements while making proactive decarbonization strategies more practical, paving the way for more flexible, transparent, and ultimately greener chemical value chains.
• Industrial case studies (Section 2.1): creating a graph representation of the value chain, and pre-processing value chain data.
• Atom mapping of chemical reactions (Section 2.2): atom mapping chemical reactions using a Chemistry Language model, enabling atom tracing across a production node.
• Value chain model construction and optimization (Section 2.3): formulating and solving a linear program to determine the BCC of each substance in the value chain.
A realistic example, which is representative of the base-to-speciality chemical industry, is used to demonstrate and verify the framework. For this, an atom mapping machine learning model, RXNMapper, published by IBM,22 is leveraged.
Finally, this framework builds on confidential industrial concepts currently under review in a BASF patent application.23 Specifically, the concepts and terms, bill of materials, bill of substances, and bill of atoms (to be introduced in this section) are included within the scope of the application.
It is important to note that the industrial ERP dataset used in this work does not explicitly specify the chemical reactions occurring at each production step. Instead, it lists the materials entering and leaving each production node, representing the observed input–output relationships in real manufacturing systems. While this information defines the structural connectivity of the value chain, it does not capture how atoms are redistributed between reactants and products. Therefore, molecular-level tracing is required to determine how carbon atoms are transferred across transformations, motivating the use of atom mapping as described in Section 2.2.
In the context of the value chain, D represents virtual tanks, which serve as mix nodes. These virtual tanks are not physical containers where chemical reactions occur; rather, they are conceptual nodes introduced to segregate the convergence of a chemical from different sources before it enters the actual chemical reactions. Here D = {d1, d2, …, dm}, and each di ∈ D signifies a specific virtual tank (e.g., d1 and d2 in Fig. 3).
Conversely, T = {t1, t2, …, tn} denotes production nodes where chemical reactions and transformations occur. Each tj ∈ T represents a production step, such as synthesis, separation, formulation, or relabeling (based on ERP data). A triplet t can have one or more input materials that are consumed and one or more materials that are produced, where g denotes the main product and p represents materials (products, byproducts, and reactants). The given value chain structure is such that one production facility can host more than one triplet tj; this can be a consequence of it being a multi-purpose plant, or of there being multiple production versions. Each triplet tj in the value chain is uniquely identifiable by the ERP data code (c, b, g), where c, b, and g represent the company code, business process, and main product, respectively, e.g., (c1, b1, g1) for t1 in Fig. 3. Furthermore, each product can be disaggregated into constituent substances s. Each mix node di is indexed by (c, p); for instance, (c1, p1) is the identifier for d1 in Fig. 3. Additionally, e denotes the chemical element of interest (e.g., carbon) and a denotes the elemental attributes (e.g., biogenic, fossil, and recycled). In this work, e exclusively refers to carbon, though the CarAT framework is generalizable to other elements.
Edges E between nodes in D and T represent material flows between virtual tanks and production facilities. For edges (di, tj) ∈ E, the attribute α (input ratio) is defined as the kilograms of material from di consumed per kilogram of main output at tj (see Fig. 3). Conversely, for edges (tj, di) ∈ E, the attribute μ (consumption mix share) indicates the fraction of the mixture in di originating from tj (see Fig. 3).
The value chain is thus modeled as a bipartite directed graph, where edges connect virtual tanks and production nodes. This structure is applied to construct the 27-node TDI value chain. Atom mapping is required only at production nodes, where chemical transformations occur. In contrast, mix nodes (virtual tanks) involve no chemical changes and therefore do not require atom-level tracing.
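The bipartite structure described above can be sketched as a small data model. This is an illustrative sketch only, not the authors' implementation: the class names `Duplet`, `Triplet`, and `ValueChain`, the codes `COMP1` and `PLNT1`, and the single-source mix share of 1.0 are all hypothetical, while the two input ratios are taken from Table 3.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Duplet:
    """Mix node (virtual tank), indexed by (company code, product)."""
    c: str
    p: str


@dataclass(frozen=True)
class Triplet:
    """Production node, indexed by (company code, business process, main product)."""
    c: str
    b: str
    g: str


@dataclass
class ValueChain:
    # alpha[(d, t)]: kg of material from duplet d consumed per kg of main output at t
    alpha: dict = field(default_factory=dict)
    # mu[(t, d)]: fraction of the mixture in duplet d that originates from triplet t
    mu: dict = field(default_factory=dict)

    def add_input(self, d, t, alpha):
        # Bipartite rule: input edges only run from a mix node to a production node.
        if not (isinstance(d, Duplet) and isinstance(t, Triplet)):
            raise TypeError("input edges must run from a Duplet to a Triplet")
        self.alpha[(d, t)] = alpha

    def add_output(self, t, d, mu):
        # Bipartite rule: output edges only run from a production node to a mix node.
        if not (isinstance(t, Triplet) and isinstance(d, Duplet)):
            raise TypeError("output edges must run from a Triplet to a Duplet")
        self.mu[(t, d)] = mu

    def inlet_duplets(self):
        """Mix nodes that are consumed somewhere but have no incoming edge."""
        fed = {d for (_, d) in self.mu}
        consumed = {d for (d, _) in self.alpha}
        return consumed - fed

    def mix_share_totals(self):
        """Sum of mu over the sources of each fed mix node (should be 1)."""
        totals = {}
        for (_, d), share in self.mu.items():
            totals[d] = totals.get(d, 0.0) + share
        return totals


# Hypothetical codes; the ratios 0.53 and 0.52 are the Table 3 input ratios.
tda_tank = Duplet("COMP1", "PROD31")
co_tank = Duplet("COMP1", "PROD10")
tdi_tank = Duplet("COMP1", "PROD29")
tdi_node = Triplet("COMP1", "PLNT1", "PROD29")

chain = ValueChain()
chain.add_input(tda_tank, tdi_node, alpha=0.53)
chain.add_input(co_tank, tdi_node, alpha=0.52)
chain.add_output(tdi_node, tdi_tank, mu=1.0)
```

Keeping α on duplet-to-triplet edges and μ on triplet-to-duplet edges mirrors the two edge attributes defined in the text, and the type checks make it impossible to connect two nodes of the same kind.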
$$\mathrm{C_6H_5CH_3 + 2\,HNO_3 \rightarrow C_6H_3(NO_2)_2CH_3 + 2\,H_2O} \tag{1}$$
Subsequently, catalytic hydrogenation employing a nickel catalyst reduces the nitro groups of 2,4-dinitrotoluene into amine groups, yielding 2,4-diaminotoluene (TDA):
$$\mathrm{C_6H_3(NO_2)_2CH_3 + 6\,H_2 \rightarrow C_6H_3(NH_2)_2CH_3 + 4\,H_2O} \tag{2}$$
In the final stage, TDA undergoes phosgenation to produce TDI alongside hydrochloric acid:
$$\mathrm{C_6H_3(NH_2)_2CH_3 + 2\,COCl_2 \rightarrow C_6H_3(NCO)_2CH_3 + 4\,HCl} \tag{3}$$
This synthesis requires phosgene, derived through a sub-branch of the value chain beginning with the steam reformation of methane-rich natural gas to form syngas, primarily composed of hydrogen and carbon monoxide. Purified carbon monoxide subsequently reacts with chlorine gas, forming the phosgene necessary for the final synthesis step.
• Bill of materials: a dataset of the recipes at each production node tj, indicating the input ratios of each reactant, defined as the kilograms of material from the duplet consumed per kilogram of main output at the connected triplet, along with the corresponding output ratios of products p.
• Bill of substances: this adds a further layer of granularity to the bill of materials; it is a dataset of all substances s for a given production node/set of production nodes.
• Bill of atoms at a substance-level, ϕs′se: a dataset that designates the share of atoms of chemical element e in product substance s that originates from reactant substance s′.
• Bill of atoms at a material-level, ψp′s′pse: a dataset that designates the share of atoms of chemical element e in product substance s in product material p that originates from reactant substance s′ in reactant material p′; this distinction is particularly important when materials are not pure but are mixtures containing multiple substances.
• Consumption mix table: it is a dataset of all s for a given mix node/set of production nodes.
Note that substances are represented by a Simplified Molecular Input Line Entry System (SMILES), which is a string notation that allows a user to represent a chemical structure in a computer-readable format.27
To aid explanation, Tables 3 and 4 show an example bill of materials and bill of substances, respectively, for the TDI production node illustrated in Fig. 4, where toluene diamine (TDA) reacts with CO to form TDI. The corresponding bill of atoms is presented in Section 2.3.1. Note that while toluene appears in the node diagram (Fig. 4), it is omitted from the tables as it was only present in trace amounts below the threshold used in the workflow; substances below this threshold are excluded from further processing to reduce noise and computational burden.
| Index | Description |
|---|---|
| a | Elemental attributes (e.g., biogenic, fossil, etc.) |
| b | Business process, anonymized coding: PLNT b |
| c | Company code, anonymized coding: COMP c |
| e | Chemical element (e.g., carbon) |
| g | Main product, same structure as p |
| p | Product, anonymized coding: PROD p |
| s | Substance, represented by a SMILES |

| Notation | Description |
|---|---|
| D | Set of mix nodes |
| d | A mix/virtual tank node, or duplet |
| T | Set of production nodes |
| t | A production node, or triplet |
| E | Set of value chain edges |
| V | Set of all nodes |

| Reaction role | Material | Material text | Ratio |
|---|---|---|---|
| Reactant | PROD31 | TDA | 0.53 |
| Reactant | PROD19 | Sodium hydroxide | 0.04 |
| Reactant | PROD6 | Chlorine | 0.46 |
| Reactant | PROD10 | Carbon monoxide | 0.52 |
| Product | PROD36 | HCL | 0.63 |
| Product | PROD29 | TDI | 1.16 |

| Reaction role | Material | Material text | SMILES | Ratio |
|---|---|---|---|---|
| Product | PROD29 | TDI | Cc1ccc(N=C=O)cc1N=C=O | 1.16 |
| Product | PROD36 | HCL | Cl | 0.56 |
| Product | PROD36 | HCL | O=C=O | 0.02 |
| Product | PROD36 | HCL | [C-]#[O+] | 0.02 |
| Product | PROD36 | HCL | N#N | 0.03 |
| Reactant | PROD10 | Carbon monoxide | [C-]#[O+] | 0.52 |
| Reactant | PROD6 | Chlorine | ClCl | 0.46 |
| Reactant | PROD19 | Sodium hydroxide | [Na+].[OH-] | 0.02 |
| Reactant | PROD19 | Sodium hydroxide | O | 0.02 |
| Reactant | PROD31 | TDA | Cc1ccc(N)cc1N | 0.53 |
Although not the case here, it is possible to have more than one reactant or product entry with the same substance, i.e., a SMILES. This can arise when two different materials share a substance, e.g., if a substance has two different sources. For calculations, the bill of substances can be aggregated such that one entry represents the cumulative amount of each substance for reactants and another entry for products. For example, during atom mapping, the stoichiometric coefficients are estimated for each reaction, necessitating the calculation of moles for each chemical species.
Determining the BCC of a molecule requires tracing each carbon atom back to its various source materials, distinguishing between fossil-based and renewable sources. This tracing involves retracing the pathway of each carbon atom from reactants through various chemical transformations to the final product in the value chain. The value chain itself is derived from industrial ERP data, which define material inputs and outputs but do not specify the underlying reaction mechanisms. Therefore, comprehensive atom-to-atom mapping for each chemical transformation within the value chain is essential. This methodology enables the precise tracking of carbon atoms from inlet materials to final products, ensuring an accurate assessment of BCC.
Several commercially available AAM tools could be used to automate the atom mapping of value chain reactions. However, based on comprehensive benchmarking against popular AAM tools including ChemAxon Automapper, Indigo, RDTool, NameRXN, and RXNMapper, the RXNMapper tool distinguishes itself with an efficient unsupervised-learning transformer model approach.22 It achieved the highest accuracy of the AAM tools and was also the second fastest algorithm – an important factor if such a model were to be deployed at the scale of an entire industrial value chain.29
RXNMapper was selected due to its demonstrated capability to handle intricate reaction details, including stereochemistry and unbalanced reactions, essential for accurately mapping diverse chemical transformations relevant to this study. Benchmark studies report that RXNMapper achieves high accuracy, correctly mapping 99.4% of a test set comprising 49 000 unbalanced patent reactions sourced from USPTO. Furthermore, it exhibits superior performance compared to other atom mapping tools such as Indigo35 and Mappet,36 providing the fastest inference times at 7.7 ms per reaction using a GPU.22
A detailed overview of RXNMapper, including its architecture based on transformer neural networks, training methodology, and performance evaluation, is provided in the SI.
Fig. 5 Atom mapping workflow: convert ERP data to molecular structures, construct reaction SMILES, apply RXNMapper to generate atom-mapped SMILES, and then convert it to a bill of atoms format.
Step 1: ERP data preprocessing
The workflow begins by preprocessing ERP data to identify chemical species and convert them into molecular structures. This process generates a bill of substances for the node (Table 4), which lists all relevant species as canonical SMILES strings. These molecular structures form the foundation for subsequent analysis.
Step 2: Reaction SMILES construction
In the second stage, a reaction SMILES is constructed – a linear string notation that encodes chemical transformations from input substances to output products. The standard reaction SMILES syntax follows the format: [reactants] > [reagents] > [products].
This work adopts a simplified approach: all input substances are included in the reactant section, while the reagent section remains empty. This simplification streamlines the parsing process without affecting the results, as RXNMapper only annotates atoms in reactants and products. This generic structure enables efficient computational analysis, facilitating tasks such as reaction prediction, optimization, and data mining.37
Since RXNMapper requires a reaction SMILES with only one product substance, multiple reaction SMILES strings must be constructed for nodes with multiple products. For a production node with j reactant substances and k product substances, k reaction SMILES strings are generated, each containing the same set of reactants but differing in the product species, as shown in eqn (4). The stoichiometric coefficients are estimated using the mole quantity of each substance (ns), calculated using eqn (5):
![]() | (4) |
![]() | (5) |
• λps is the mass ratio of substance s in product p.
• αp is the input ratio (kg of material from duplet consumed per kg of main output at connected triplet).
• Ms is the molar mass of substance s.
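To make Step 2 concrete, the sketch below builds one reaction SMILES per product and estimates mole quantities. It rests on assumptions: the mole quantity is taken as n_s = λps·αp/Ms, which is inferred from the symbol definitions above and is not a reproduction of eqn (5); the helper names are hypothetical; the inlet materials are treated as pure (λ = 1); the molar masses are standard rounded values; and how the estimated moles are turned into stoichiometric coefficients in the reaction string is not shown here.

```python
# Molar masses in g/mol (equivalently kg/kmol), standard rounded values.
MOLAR_MASS = {
    "Cc1ccc(N)cc1N": 122.17,          # TDA
    "[C-]#[O+]": 28.01,               # carbon monoxide
    "ClCl": 70.90,                    # chlorine
    "Cc1ccc(N=C=O)cc1N=C=O": 174.16,  # TDI
    "Cl": 36.46,                      # hydrogen chloride
}


def moles_per_kg_output(mass_fraction, input_ratio, smiles):
    """Assumed form n_s = lambda_ps * alpha_p / M_s (kmol per kg of main output)."""
    return mass_fraction * input_ratio / MOLAR_MASS[smiles]


def reaction_smiles_per_product(reactants, products):
    """One 'reactants>>product' string per product, with an empty reagent section."""
    left = ".".join(reactants)
    return [f"{left}>>{product}" for product in products]


reactants = ["Cc1ccc(N)cc1N", "[C-]#[O+]", "ClCl"]
products = ["Cc1ccc(N=C=O)cc1N=C=O", "Cl"]

rxns = reaction_smiles_per_product(reactants, products)

# Input ratios from Table 3; purity of 1.0 is an assumption for illustration.
n_tda = moles_per_kg_output(1.0, 0.53, "Cc1ccc(N)cc1N")
n_co = moles_per_kg_output(1.0, 0.52, "[C-]#[O+]")

for rxn in rxns:
    print(rxn)
print(f"CO : TDA mole ratio = {n_co / n_tda:.2f}")
```

Only the ratios between mole quantities matter for estimating coefficients, so the unit of the absolute value is not important as long as the molar masses share one unit.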
Step 3: Atom mapping and generation of a bill of atoms
The constructed reaction SMILES is passed to the RXNMapper model, which returns an atom-mapped reaction SMILES. Fig. 2 displays both the unmapped reaction SMILES and visualization of the mapped reaction output for the TDI production node. For enhanced interpretability, the mapped reactions can be visualized using CDK Depict38 or RDKit,39 as shown in the third stage of Fig. 5. To calculate the BCC, the atom mapping must be translated into a “bill of atoms” format.23 The atom mapping directly yields the substance-level atom bill, denoted ϕs′se, which represents the share of atoms of chemical element e in output substance s that originated from input substance s′. Table 5 presents the complete bill of atoms for the TDI production node. Note that for this framework, the bill of atoms is only required for carbon-containing materials.
| Reactant material | Reactant SMILES | Product material | Product SMILES | Element | Atom count | Atom share |
|---|---|---|---|---|---|---|
| PROD10 | [C-]#[O+] | PROD36 | O=C=O | O | 1 | 0.50 |
| PROD10 | [C-]#[O+] | PROD36 | O=C=O | C | 1 | 1.00 |
| PROD31 | Cc1ccc(N)cc1N | PROD29 | Cc1ccc(N=C=O)cc1N=C=O | C | 7 | 0.78 |
| PROD31 | Cc1ccc(N)cc1N | PROD29 | Cc1ccc(N=C=O)cc1N=C=O | H | 6 | 1.00 |
| PROD31 | Cc1ccc(N)cc1N | PROD29 | Cc1ccc(N=C=O)cc1N=C=O | N | 2 | 1.00 |
| PROD10 | [C-]#[O+] | PROD29 | Cc1ccc(N=C=O)cc1N=C=O | O | 2 | 1.00 |
| PROD10 | [C-]#[O+] | PROD29 | Cc1ccc(N=C=O)cc1N=C=O | C | 2 | 0.22 |
| PROD6 | ClCl | PROD36 | Cl | Cl | 1 | 1.00 |
| PROD6 | ClCl | PROD36 | Cl | H | 1 | 1.00 |
| PROD31 | Cc1ccc(N)cc1N | PROD36 | N#N | N | 2 | 1.00 |
| PROD19 | O | PROD36 | O=C=O | O | 1 | 0.50 |
| PROD10 | [C-]#[O+] | PROD36 | [C-]#[O+] | O | 1 | 1.00 |
| PROD10 | [C-]#[O+] | PROD36 | [C-]#[O+] | O | 1 | 1.00 |
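The conversion from an atom-mapped reaction SMILES to substance-level atom shares can be illustrated with a short parser. This is a sketch under stated assumptions: the mapped string below is written by hand for illustration and is not actual RXNMapper output, the function names are hypothetical, and the regular expression only handles bracket atoms that carry a map number and no isotope prefix.

```python
import re
from collections import Counter

# Matches bracket atoms such as [CH3:1], [c:2], [cH:3], [C-:10] and captures
# the element symbol and the atom-map number.
ATOM = re.compile(r"\[([A-Z][a-z]?|[a-z]{1,2})[^\]:]*:(\d+)\]")


def mapped_atoms(smiles):
    """Return {map_number: element} for every mapped atom in a SMILES fragment."""
    return {int(num): el.capitalize() for el, num in ATOM.findall(smiles)}


def bill_of_atoms(mapped_rxn, reactant_labels, element="C"):
    """Share of `element` atoms in the product that come from each reactant label."""
    reactants, product = mapped_rxn.split(">>")
    origin = {}
    for label, fragment in zip(reactant_labels, reactants.split(".")):
        for num in mapped_atoms(fragment):
            origin[num] = label
    product_atoms = mapped_atoms(product)
    total = sum(1 for el in product_atoms.values() if el == element)
    counts = Counter(
        origin[num]
        for num, el in product_atoms.items()
        if el == element and num in origin
    )
    return {label: (n, n / total) for label, n in counts.items()}


# Hand-written mapping of TDA + 2 CO -> TDI (illustrative, not model output).
mapped_rxn = (
    "[CH3:1][c:2]1[cH:3][cH:4][c:5]([NH2:6])[cH:7][c:8]1[NH2:9]"
    ".[C-:10]#[O+:11].[C-:12]#[O+:13]"
    ">>[CH3:1][c:2]1[cH:3][cH:4][c:5]([N:6]=[C:10]=[O:11])"
    "[cH:7][c:8]1[N:9]=[C:12]=[O:13]"
)
labels = ["Cc1ccc(N)cc1N", "[C-]#[O+]", "[C-]#[O+]"]

carbon_bill = bill_of_atoms(mapped_rxn, labels, element="C")
for label, (count, share) in carbon_bill.items():
    print(f"{label}: {count} C atoms, share {share:.2f}")
```

Repeating the same label for both CO molecules aggregates their contributions, which reproduces the carbon rows for TDI in Table 5 (7 atoms, 0.78 from TDA; 2 atoms, 0.22 from CO).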
This section will first present a detailed example of calculating the BCC for a single production node. Subsequently, the value chain system will be defined, and a suitable method for solving the system will be selected and discussed. In addition to the graph notation introduced in Table 2, Table 1 defines indices required for the methodology.
![]() | (6) |
With the material-level atom bill in place, two further equations are required to fully define a system to determine the BCC of a value chain system. These equations focus on calculating the elemental attribute share β. Eqn (7) is specific to calculating the share of attribute a (e.g., fossil, biogenic, etc.) of chemical element e in substance s in material p within the production node (c, b, g). It does so by summing the attribute contributions from each incoming mix node denoted (c′, p′), weighted by the material-level atom bill ψp′s′pse.
![]() | (7) |
Eqn (8) calculates the attribute share a of element e in substance s in material p within the mix node (c, p). It does so by summing the attribute contributions from each incoming production node, denoted (c′, b′, g′), weighted by the consumption mix share (i.e., how much from the mix node comes from each production node), μc′b′g′cp.
![]() | (8) |
$$\mathrm{C_7H_{10}N_2 + 2\,CO \rightarrow C_9H_6N_2O_2}$$
Fig. 6 Example of a TDI production node with impure inlet materials, corresponding to triplet t = (c, b, g).
TDA, carbon monoxide (CO), and TDI are represented as substances 1, 2, and 3, respectively. In this case, CO is present in both input materials:
• Material 1: 80% TDA, 20% CO → λ11 = 0.8, λ12 = 0.2
• Material 2: 100% CO → λ22 = 1
Let the corresponding input ratios be:
The substance-level atom bill, calculated using the atom mapper tool, ϕs′se, is:
Since substance 1 (TDA) is only present in material 1, the material-level and substance-level atom bills are equivalent:
However, for CO, which is split across both materials, the material-level atom bills are calculated using eqn (6):
Assuming material 1 is entirely fossil-derived and material 2 is entirely biogenic, let A be the set of elemental attributes considered (e.g., fossil, biogenic), and in this example, let x∈A denote biogenic carbon:
$$\beta_{c11\mathrm{C}x} = \beta_{c12\mathrm{C}x} = 0, \qquad \beta_{c22\mathrm{C}x} = 1$$
The BCC for this TDI node is then computed as:
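$$\beta_{c33\mathrm{C}x} = \beta_{c11\mathrm{C}x}\,\psi_{1133\mathrm{C}} + \beta_{c12\mathrm{C}x}\,\psi_{1233\mathrm{C}} + \beta_{c22\mathrm{C}x}\,\psi_{2233\mathrm{C}}$$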
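The single-node calculation can be checked numerically with the sketch below. Two things are assumptions rather than values from the text: the input ratios `alpha` (0.6 and 0.3 are hypothetical), and the rule used to split the substance-level share of CO between the two materials, which here is proportional to the mass of CO each material supplies. The substance-level carbon shares 7/9 and 2/9 follow from Table 5, and the mass fractions are those listed for materials 1 and 2.

```python
# Substance-level carbon shares into TDI (7 of 9 carbons from TDA, 2 of 9 from CO).
phi = {"TDA": 7 / 9, "CO": 2 / 9}

# Mass fractions of each substance in each inlet material (from the example).
lam = {("M1", "TDA"): 0.8, ("M1", "CO"): 0.2, ("M2", "CO"): 1.0}

# Input ratios in kg per kg of main output: HYPOTHETICAL values for illustration.
alpha = {"M1": 0.6, "M2": 0.3}


def material_level_bill(phi, lam, alpha):
    """Split each substance-level share across materials by supplied mass (assumed rule)."""
    supplied = {key: alpha[key[0]] * frac for key, frac in lam.items()}
    per_substance = {}
    for (_, substance), mass in supplied.items():
        per_substance[substance] = per_substance.get(substance, 0.0) + mass
    return {
        (material, substance): phi[substance] * mass / per_substance[substance]
        for (material, substance), mass in supplied.items()
    }


psi = material_level_bill(phi, lam, alpha)

# Material 1 is entirely fossil, material 2 entirely biogenic.
beta_in = {("M1", "TDA"): 0.0, ("M1", "CO"): 0.0, ("M2", "CO"): 1.0}

# Biogenic carbon share of TDI: inlet shares weighted by the material-level bill.
bcc_tdi = sum(beta_in[key] * psi[key] for key in psi)
print(f"BCC of TDI = {bcc_tdi:.3f}")  # about 0.159 for the assumed input ratios
```

With these assumed ratios only part of the CO is biogenic, so the result lies below the 2/9 ceiling that would apply if every CO-derived carbon atom were biogenic.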
The BCC system can be formulated as a feasibility problem, wherein the objective is not to optimize a particular function, but rather to identify values for the elemental attribute shares of each substance that satisfy a set of constraints. In practice, however, industrial datasets often contain inconsistencies or incomplete information that may render the constraint set infeasible. To accommodate such cases, slack variables are introduced, allowing for controlled violations of the constraints. This enables the model to yield a solution even when exact feasibility is not attainable, while also quantifying the extent of any deviations.
In this formulation, the objective function is defined to minimize the total system slack. This drives the solution towards that of the original feasibility problem under the assumption of fully consistent and accurate data. Moreover, the magnitude and location of slack values provide diagnostic insight by identifying specific constraints where data limitations are most pronounced. As such, the use of slack variables offers both computational robustness and practical interpretability, making the approach particularly valuable in industrial contexts where data uncertainty is common.
The LP formulation was implemented using the Python MIP package40 with the CBC (COIN-OR branch and cut) solver,41 which is open-source and suitable for LPs.
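For intuition about how attribute shares propagate through a network with a recycle loop, the toy below may help. It is deliberately not the LP: as a dependency-free stand-in it applies plain fixed-point iteration to balance relations of the kind described for eqn (7) and (8), without slack variables, which is adequate only when the data are consistent. The network, node names, and all numerical shares are hypothetical, and each node carries a single biogenic carbon share rather than the full (p, s, e, a) indexing.

```python
def solve_bcc(inlet, psi, mu, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration on a toy value chain.

    inlet: {mix_node: known biogenic share}
    psi:   {production_node: {mix_node: carbon share drawn from that mix node}}
    mu:    {mix_node: {production_node: consumption mix share}}
    """
    beta_d = {d: inlet.get(d, 0.0) for d in set(mu) | set(inlet)}
    beta_t = {t: 0.0 for t in psi}
    for _ in range(max_iter):
        # Production nodes: weight incoming mix nodes by the bill of atoms.
        new_t = {
            t: sum(w * beta_d[d] for d, w in sources.items())
            for t, sources in psi.items()
        }
        # Mix nodes: weight incoming production nodes by the mix share.
        new_d = dict(beta_d)
        for d, sources in mu.items():
            new_d[d] = sum(w * new_t[t] for t, w in sources.items())
        change = max(
            max(abs(new_t[t] - beta_t[t]) for t in new_t),
            max(abs(new_d[d] - beta_d[d]) for d in new_d),
        )
        beta_t, beta_d = new_t, new_d
        if change < tol:
            break
    return beta_t, beta_d


# Hypothetical network: t1 makes an intermediate from fossil, biogenic, and
# recycled carbon; t2 converts it to product and returns a recycle stream.
inlet = {"d_fossil": 0.0, "d_bio": 1.0}
psi = {
    "t1": {"d_fossil": 0.5, "d_bio": 0.3, "d_rec": 0.2},
    "t2": {"d_int": 1.0},
}
mu = {
    "d_int": {"t1": 1.0},
    "d_prod": {"t2": 1.0},
    "d_rec": {"t2": 1.0},
}

beta_t, beta_d = solve_bcc(inlet, psi, mu)
print(f"BCC of product = {beta_d['d_prod']:.3f}")  # 0.3 / (1 - 0.2) = 0.375
```

The recycle stream raises the product share above the 0.3 supplied directly by the biogenic inlet, because part of the carbon that re-enters t1 is itself biogenic. With inconsistent data the iteration has no mechanism to absorb violations, which is the role the slack variables play in the LP.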
| (9a) |
![]() | (9b) |
![]() | (9c) |
![]() | (9d) |
![]() | (9e) |
$$\beta_{cbgpsea} \in [0, 1], \quad \forall\, c, b, g, p, s, e, a \tag{9f}$$
$$\beta_{cpsea} \in [0, 1], \quad \forall\, c, p, s, e, a \tag{9g}$$
![]() | (9h) |
![]() | (9i) |
The objective function (9a) minimizes slack across the entire value chain, encouraging efficient use or elimination of slack variables. Table 6 summarizes the decision variables and parameters used in the LP. Slack variables are denoted z (positive) and q (negative), with subscripts indicating context: zcpse and qcpse for duplets, and zcbgpse and qcbgpse for triplets.
| Notation | Description |
|---|---|
| $\beta_{cbgpsea}$ | Fraction of elemental attribute a of chemical element e in substance s, material p, at production node (c, b, g) |
| $\beta_{cpsea}$ | Fraction of elemental attribute a of chemical element e in substance s, material p, at mix node (c, p) |
| $z_{cbgpse}$ | Positive slack variable for chemical element e in substance s, material p, at production node (c, b, g) |
| $q_{cbgpse}$ | Negative slack variable for chemical element e in substance s, material p, at production node (c, b, g) |
| $z_{cpse}$ | Positive slack variable for chemical element e in substance s, material p, at mix node (c, p) |
| $q_{cpse}$ | Negative slack variable for chemical element e in substance s, material p, at mix node (c, p) |
| $\mu_{c'b'g'cp}$ | Mix node share, i.e., the fraction of a virtual tank (c, p) sourced from a production node (c′, b′, g′) |
| $\psi_{p's'pse}$ | Bill of atoms, i.e., the fraction of chemical element e in substance s in product p, sourced from substance s′ in product p′ |
| $D_0 = \{\, d_i \in D \mid \delta^{-}(d_i) = \emptyset \,\}$ | (10) |
Here, $d_i$ refers to a particular mix node in $D$, and $\delta^{-}(d_i)$ denotes the set of inlet edges into node $d_i$; if this set is empty, then $d_i$ is an inlet node. For simplicity in formulation, a structural assumption is imposed: the value chain must start and end with mix nodes. Hence, the values of $\beta_{cpsea}$ at the inlet mix nodes in $D_0$ must be specified and set to a known constant $h$:
![]() | (11) |
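The inlet set of eqn (10) can be found directly from the graph structure. The sketch below is a minimal illustration, assuming the value chain is stored as a list of directed edges with mix nodes as duplets (c, p) and production nodes as triplets (c, b, g). The node labels, the edge list and the `inlet_mix_nodes` helper are hypothetical and not part of the CarAT package.

```python
# Hypothetical bipartite value chain: each edge runs from a source node to a
# target node. Mix nodes are duplets (c, p); production nodes are triplets (c, b, g).
edges = [
    (("C1", "feed_A"), ("C1", "B1", "G1")),
    (("C1", "feed_B"), ("C1", "B1", "G1")),
    (("C1", "B1", "G1"), ("C1", "product_P")),
]

# Collect every duplet that appears in the graph.
mix_nodes = {node for edge in edges for node in edge if len(node) == 2}


def inlet_mix_nodes(mix_nodes, edges):
    """Return D0: the mix nodes that are never the target of an edge."""
    has_inlet_edge = {target for _, target in edges}
    return {d for d in mix_nodes if d not in has_inlet_edge}


d0 = inlet_mix_nodes(mix_nodes, edges)

# Known attribute shares h must be supplied for every inlet node
# (illustrative values: an all-fossil carbon inlet).
h = {node: {"fossil": 1.0, "biogenic": 0.0} for node in d0}
print(sorted(d0))
```

Checking that `h` covers every node in `d0` before building the LP is a cheap way to catch missing inlet specifications early.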
• Eqn (9d) and (9e) ensure that for any chemical substance, the sum of carbon attributes (including fossil and biogenic) equals one, incorporating slack variables.
• Eqn (9f) and (9g) set bounds on elemental attribute shares for production and mix nodes, ensuring that β values are non-negative and do not exceed one.
• Eqn (9h) defines non-negative bounds for the positive and negative slack variables in production nodes. Eqn (9i) establishes analogous bounds for mix nodes.
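For orientation, one way to write the objective, the normalisation constraints and the slack bounds described above is sketched here. This is a reading of the prose descriptions using the notation of Table 6, not a reproduction of the published equations: the sign convention of the slack terms (+z, −q) is an assumption, the assignment of each line to a specific equation number is deliberately left open, and the balance constraints linking nodes are omitted.

```latex
\begin{align}
\min \quad
  & \sum_{c,b,g,p,s,e} \left( z_{cbgpse} + q_{cbgpse} \right)
  + \sum_{c,p,s,e} \left( z_{cpse} + q_{cpse} \right) \\
\text{s.t.} \quad
  & \sum_{a} \beta_{cbgpsea} + z_{cbgpse} - q_{cbgpse} = 1
  && \forall c, b, g, p, s, e \\
  & \sum_{a} \beta_{cpsea} + z_{cpse} - q_{cpse} = 1
  && \forall c, p, s, e \\
  & z_{cbgpse},\; q_{cbgpse} \ge 0
  && \forall c, b, g, p, s, e \\
  & z_{cpse},\; q_{cpse} \ge 0
  && \forall c, p, s, e
\end{align}
```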
• Base case – SI: A TDI value chain with entirely fossil-derived carbon input. TDI product BCC = 0%.
• Case 1 – section 3.1: A TDI value chain with 100% biogenic natural-gas feed. TDI product BCC = 22%.
• Case 2 – section 3.2: A BDO value chain featuring a recycle stream; 75% biogenic acetylene and 50% biogenic BDO inlet. Butanediol product BCC = 38%.
The three-stage CarAT methodology is fully implemented in a Python package and demonstrated here through three worked scenarios. For each case, the linear program results are visualized as a Sankey diagram, which clearly conveys the carbon attribute flows and the bipartite graph structure of the value chain. The edge widths are illustrative and not scaled to mass flow, as proportional weighting was found to reduce interpretability. Importantly, the objective function (the total system slack) is numerically zero for all three scenarios, indicating successful LP convergence to a solution that satisfies every constraint without slack. The complete analysis, including value chain data, solver setup, and visualization scripts, is available in a public GitHub repository,† ensuring full reproducibility and enabling others to apply CarAT to new chemical systems. The base case scenario, being structurally simple and yielding a zero BCC by design, is discussed in the SI (Section 6).
To better visualize the flow of elemental attributes, particularly the flow of biogenic carbon, a color-coding scheme was implemented. In the diagram, dark blue bars represent mix nodes and light blue bars represent production nodes. Pale yellow links represent the flow of non-carbon-containing compounds, which are still included to provide a full picture of the value chain. To differentiate between carbon and non-carbon-containing compounds, a link is shown for each substance (SMILES) transferred between the two nodes. The thickness of the links entering the mix nodes is proportional to the consumption mix share of that substance, $\mu_{c'b'g'cp}$. Similarly, the thickness of the links entering the production nodes is proportional to the input ratio of that substance at that node (c, b, g).
Importantly, fossil carbon is shown in gray, and green links represent biogenic carbon, with darker green signifying a larger share of biogenic carbon and paler green representing a smaller share. As expected, the carbon in the resulting TDI product stream from this value chain is 100% fossil-based.
Since TDI consists of nine carbon atoms – two sourced from the 100% biogenic carbon monoxide and seven from the fossil-derived TDA – the resulting TDI has a BCC of 22%, indicated by a lighter shade of green. This scenario demonstrates the capability of calculating the BCC of a product when using mixed sources of carbon feedstock in its synthesis.
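The 22% figure follows from simple atom counting, which the short sketch below reproduces. The atom counts and source shares are those stated in the text; the dictionary layout and variable names are illustrative only.

```python
# Carbon atoms in TDI grouped by origin:
# (number of carbon atoms, biogenic share of the source).
carbon_sources = {
    "carbon monoxide": (2, 1.0),  # 100% biogenic
    "TDA": (7, 0.0),              # fossil-derived
}

total_atoms = sum(n for n, _ in carbon_sources.values())
biogenic_atoms = sum(n * share for n, share in carbon_sources.values())
bcc = biogenic_atoms / total_atoms

print(f"TDI BCC = {bcc:.1%}")  # 2 of 9 carbon atoms, i.e. about 22%
```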
The recycle stream adds complexity and reflects the reality of industrial value chains where such flows are common. The framework correctly apportions biogenic carbon through the recycle path, demonstrating its robustness to cyclic topologies.
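To see why a recycle loop remains a well-posed linear problem, consider a toy loop with one mix node and one production node. The sketch below is not the Case 2 value chain and not the CarAT solver: every number is hypothetical, and the loop is solved by fixed-point iteration and by substitution purely to show that both routes give the same share, whereas the LP solves all balances simultaneously.

```python
# Hypothetical two-node loop (values are illustrative, not from Case 2).
mu_fresh, mu_recycle = 0.8, 0.2    # mix-node shares; sum to one
psi_mix, psi_other = 0.5, 0.5      # bill of atoms of the product; sum to one
beta_fresh, beta_other = 1.0, 0.0  # known biogenic shares at the inlets


def solve_by_iteration(tol=1e-12, max_iter=1000):
    """Apply the two balances repeatedly until the shares stop changing."""
    beta_mix = beta_prod = 0.0
    for _ in range(max_iter):
        new_mix = mu_fresh * beta_fresh + mu_recycle * beta_prod
        new_prod = psi_mix * new_mix + psi_other * beta_other
        if abs(new_prod - beta_prod) < tol and abs(new_mix - beta_mix) < tol:
            return new_mix, new_prod
        beta_mix, beta_prod = new_mix, new_prod
    raise RuntimeError("recycle loop did not converge")


# Closed form obtained by substituting the mix balance into the product balance.
beta_prod_exact = (psi_mix * mu_fresh * beta_fresh + psi_other * beta_other) / (
    1.0 - psi_mix * mu_recycle
)

beta_mix, beta_prod = solve_by_iteration()
print(f"product share: iterated {beta_prod:.4f}, closed form {beta_prod_exact:.4f}")
```

The iteration converges because each pass through the loop scales the remaining error by the product of the recycle share and the bill-of-atoms fraction, which is below one here.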
1. Domain expert supersession of automated mapping. In industrial deployment, chemists can flag and correct mis-mapped reactions during data onboarding, ensuring that critical pathways are traced correctly. Having some form of confidence in atom mapping – such as that provided by the more recent LocalMapper30 – could help guide targeted spot checks.
2. Reaction-class-specific fine-tuning. If systematic errors are observed for a given reaction class, additional training on that subset can be leveraged to improve accuracy – see Section 5.1.
RXNMapper's default token limit (512) was sufficient for the value chains analyzed in this work. However, as reactions become more complex downstream in an industrial value chain, the impact of impurities, solvents, and other non-reacting compounds could make the size limitation more significant. To address this, initial efforts could involve using RXNMapper without considering stoichiometry, as the model has demonstrated good accuracy even with unbalanced reactions.43 Furthermore, careful curation of the reaction SMILES might be beneficial, such as removing non-essential compounds while preserving the core chemical reaction and essential reactants.
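The curation step suggested above can be sketched as a small pre-processing check. The example assumes reaction SMILES in the `reactants>>products` form and uses a regex tokenizer of the kind commonly applied to SMILES; RXNMapper's own tokenizer may count slightly differently, so the count is an estimate. The helper names, the spectator list and the example reaction are hypothetical.

```python
import re

# Regex SMILES tokenizer; ASSUMPTION: an approximation of how a reaction
# language model splits its input, not RXNMapper's exact tokenizer.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)
TOKEN_LIMIT = 512  # default limit quoted in the text


def count_tokens(rxn_smiles):
    """Estimate the number of tokens in a reaction SMILES string."""
    return len(SMILES_TOKEN.findall(rxn_smiles))


def strip_spectators(rxn_smiles, spectators):
    """Drop listed non-reacting species from both sides of 'reactants>>products'.

    Matching is by exact string, so spectators must be written exactly as
    they appear in the reaction SMILES.
    """
    reactants, products = rxn_smiles.split(">>")

    def keep(side):
        return ".".join(s for s in side.split(".") if s not in spectators)

    return f"{keep(reactants)}>>{keep(products)}"


# Hypothetical example: an esterification written with a toluene solvent.
rxn = "CC(=O)O.OCC.Cc1ccccc1>>CC(=O)OCC.O.Cc1ccccc1"
curated = strip_spectators(rxn, spectators={"Cc1ccccc1"})

print(count_tokens(rxn), "->", count_tokens(curated), "tokens")
print("within limit:", count_tokens(curated) <= TOKEN_LIMIT)
```

Running such a check before atom mapping would flag reactions that risk exceeding the limit and show how much is gained by removing spectator species.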
To meet this challenge, we have developed CarAT, a framework that integrates enterprise-level data with a chemistry language model and linear optimization to automate carbon tracing at scale. By mapping carbon atoms from feedstocks through each stage of production and solving for BCC via a linear program, CarAT decouples data from methodology. This structure allows manufacturers to obtain methodological certification, enabling automatic recalculations as operational parameters evolve.
Validation was carried out across three scenarios that increased in complexity: a fossil-only TDI value chain, the same chain with a 100% biogenic natural gas inlet, and a butanediol value chain incorporating a recycle stream and partial biogenic inputs. These scenarios tested the framework's capacity to adapt to key industrial features, including mixed feedstocks and recycling loops, with results confirming both robustness and scalability. Having demonstrated its robustness through these case studies, CarAT is now being implemented within BASF to proactively trace the carbon attributes of its product portfolio. This industrial adoption affirms CarAT's potential for large-scale deployment and highlights the chemical sector's pressing demand for scalable carbon-tracing solutions.
Further to supporting compliance with sustainability reporting requirements, CarAT enables informed decision-making for decarbonization. Tracing the biogenic carbon content and quantifying the impact of raw material composition on the product-level BCC facilitate the substitution of fossil-based carbon with biogenic alternatives. This, in turn, guides internal decisions and enhances value chain transparency, enabling Scope 3 emission reductions through more informed upstream raw material choices.
The transition to a low-carbon chemical industry will not occur through a single, definitive shift between fossil and biogenic feedstocks, but rather through a gradual and heterogeneous evolution in carbon sourcing. In practice, manufacturers will have to evaluate feedstocks and process routes with differing PCFs, considering factors such as availability, regional infrastructure, energy intensity, and overall life-cycle performance. CarAT provides a consistent and auditable basis for calculating BCC and tracing carbon flows across all such scenarios, including hybrid pathways where feedstocks of different origins and carbon intensities are combined within complex production networks. Importantly, these results are intended to complement process-level assessments: the atom-level tracing enabled by CarAT can be coupled with techno-economic or life-cycle analysis frameworks to evaluate trade-offs between the carbon origin, energy use, and yield. In particular, integration with established LCA platforms such as OpenLCA or Brightway would allow CarAT-calculated BCC to be imported and directly incorporated into cradle-to-grave assessments. This coupling would create a seamless link between molecular-level carbon tracing and full life-cycle environmental evaluation. By providing molecular-level transparency across value chains, CarAT supports chemical manufacturers in making informed, data-driven decisions toward lower-carbon production systems.
By clearly aligning with key Green Chemistry principles, particularly around substituting fossil feedstocks with biogenic or recycled sources, CarAT also offers a concrete step toward net-zero goals. Its capacity for rapid recalculation facilitates real-time adjustments in sourcing and operational strategies, making sustainable innovation both more transparent and more feasible across the chemical industry.
Second, the linear program underlying CarAT is generalizable beyond carbon. With suitable data, the same methodology could be extended to trace other elemental attributes such as nitrogen, recycled content, or toxic elements. This would enable more comprehensive sustainability assessments across chemical value chains.
Third, future work will explore the inverse optimization problem: how to allocate biogenic raw materials across a value chain to achieve a desired biogenic carbon content in the final product. This has clear relevance for setting and meeting emission targets across value chains.
Taken together, these developments will support the evolution of CarAT into a more robust and flexible tool for sustainability analysis within value chains, with the potential to assist industry in its transition towards net zero.
Supplementary information (SI) is available. The SI includes a schematic of the full TDI value chain case study used in this work, along with further details on the choice of chemistry language model (RXNMapper) and the base case Sankey diagram (all fossil carbon inlet streams). See DOI: https://doi.org/10.1039/d5gc04348d.
Footnote |
| † https://github.com/EmPajak21/CarAT. |
| This journal is © The Royal Society of Chemistry 2026 |