Open Access Article
Maximilian G. Hoepfner†ab, Dion Jakobs†a, Lucas F. Santos ab and Gonzalo Guillén-Gosálbez*ab
aInstitute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland. E-mail: gonzalo.guillen.gosalbez@chem.ethz.ch
bNCCR Catalysis, Zürich CH-8093, Switzerland
First published on 14th April 2026
Life cycle assessment (LCA) has become the prevalent tool to quantify the impact of chemical processes, yet data gaps remain a major obstacle towards its widespread adoption. Existing LCA databases cover a few thousand, mostly high-production-volume, chemicals; however, fine chemicals are often underrepresented. Here we introduce an augmented LCA (AuLCA) framework based on chemical reaction networks (CRN), mass-based impact propagation, and first principles-based energy estimations to predict the life cycle inventories and impacts of chemicals. By applying AuLCA to four case studies, we find good agreement with commercial data, with the accuracy level depending on the chemical reaction network's size and density. Overall, AuLCA is intended to support sustainable decision-making across chemical scales, particularly in early-stage decisions on chemical reaction pathways selection.
Green foundation
1. We provide an algorithm to cover data gaps in life cycle assessment, the fundamental tool to quantify the environmental impact of chemical systems investigated in Green Chemistry. By combining reaction networks with mass-based allocation and first-principles energy consumption approximations, we estimate life cycle impacts to guide the discovery of greener synthesis routes, even when only scarce data are available.
2. This work estimates the life cycle footprint of thousands of chemicals that are missing in commercial databases. Our approach simplifies the time-consuming LCA data collection phase, covering chemicals that previously lacked LCA data and were hard to model due to the multiple synthesis steps involved.
3. Beyond improving our LCA augmentation algorithm, i.e., by refining solvent and yield estimates, the crucial next step is to optimize synthesis pathways based on sustainability criteria and make the refined tool available to the Green Chemistry community so LCA becomes more widely adopted in the field. This will allow benchmarking chemical systems, including synthesis routes, to identify greener alternatives, providing measurable insights and trends to support sustainable development.
Despite the clarity of this overarching goal, the path towards a truly sustainable chemical industry remains uncertain. The current chemical industry accounts for 10% of global greenhouse gas (GHG) emissions and is classified as a hard-to-abate sector, making it a critical target for sustainability-driven actions.4 Multiple green chemical technologies are being investigated, whose sustainability performance needs to be quantified using metrics to support experimental research, technology deployment, and policymaking.
The environmental impact of chemical routes was originally assessed via mass- and energy-based process level metrics such as the E-factor, Atom Economy, or Process Mass Intensity (PMI).5–7 Although these metrics address key environmental aspects such as resource utilization and waste prevention, they provide limited information on the full environmental footprint of complex molecules throughout their lifecycle.8,9 Complementing these metrics, Life Cycle Impact Assessment (LCIA) methods10 quantify the environmental, human health, and resources impacts of chemical systems across their full life cycle, encompassing the resource extraction, manufacturing, transportation, and end-of-life phases. LCA studies allow identifying environmentally detrimental processes,11–13 the occurrence of burden-shifting (collateral damage) across environmental categories,14–16 and the most critical parameters affecting environmental impacts.17,18
However, completing an LCA is a time- and resource-intensive task that requires detailed accounting of all material and energy flows along the life cycle of the reference product, most of which are hard to collect in practice. Consequently, LCAs are frequently only completed retrospectively for processes fully characterized and developed, thus reducing the opportunity for LCA results to influence early-stage chemical exploration.4,19 Although some early-stage LCAs have been conducted,11,20–22 performing full LCAs for hypothetical synthetic routes remains challenging due to lack of data. Moreover, even LCAs of already existing chemicals may face many data gaps,23 as discussed below, thus hampering the sustainable chemicals transition.
LCA databases, such as ecoinvent,24 only contain hundreds to thousands of chemicals, representing a small fraction of the over 279 million registered substances,25 which severely limits sustainability assessments. This is particularly true in fine chemicals (e.g., active compounds in pharma, pesticides, additives, etc.), whose synthesis typically involves multiple reaction steps entailing diverse reagents, solvents, and catalysts, which are seldom publicly disclosed.
Several predictive LCA (streamlined LCA) methodologies have been developed to cover LCA data gaps.26–29 They estimate cradle-to-gate or gate-to-gate LCA impacts using approximations, where the recent trend is to leverage machine learning algorithms to correlate basic features (e.g., molecular structure, thermodynamic properties, etc.) with environmental impacts.30–32 Commonly used machine learning methods include artificial neural networks,19,32–35 support vector machines and Gaussian process regressors (SVM/GPRs),33,36 and, more recently, transformers,29 amongst others.37,38 Additionally, optimization-based methods28,39 and similarity matrices40 have also been applied to the same problem.
Most streamlined LCA methodologies are based on regression approaches calibrated with a training set of chemical footprints. These methods typically rely on the chemical's structure and properties, while the underlying reaction pathway is not explicitly considered, even though it plays a key role in the chemical's footprint. Ethylene, for example, could be synthesized from naphtha in a steam cracker or produced via dehydration of bioethanol, leading to completely different environmental footprints.
Moreover, such regression tools are trained for specific classes of chemicals and impact metrics, thus providing lower accuracies when extrapolating beyond the training set.
Here, we address these limitations by developing a novel Augmented Life Cycle Assessment (AuLCA) methodology that integrates chemical reaction networks, first-principles-based energy estimations, and mass-based propagation of LCA data. Using data from ecoinvent version 3.9.1,24 we show that AuLCA provides sensible predictions, more so when the reaction pathway is known.
Overall, AuLCA aims to facilitate the broad application of LCA to support better informed, transparent, and reliable sustainable decision-making across chemical scales.
Fig. 1 Four-step framework of the AuLCA tool: (A) goal and scope definition, (B) data curation and network construction, (C) impact augmentation and (D) data analysis and validation of predictions.
Ideally, we would employ chemical databases to build CRNs, iteratively expanding the molecules in the corpus as many times as desired via chemical reactions retrieved from the chemical database. Alternatively, open databases (e.g., USPTO and the CJHIF chemical reaction dataset (CRD)42) might be used instead. Despite being free, the latter often contain fewer reactions, thus preventing the use of some molecules in the corpus (i.e., those not appearing in any reaction in the open CRN) and requiring some additional approximations in the calculations, as discussed later in the article.
The goal of the analysis is to estimate the LCIs for the nodes in the network with unknown footprint connected to the nodes in the corpus with known footprint.
Several data curation strategies might be required to prepare the CRNs before data augmentation (section 1 of the SI) to ensure they can be used for LCA augmentation, regardless of their source.
Ultimately, all chemicals in SUi will belong to the set SKi after completing the calculations in the last iteration of the data augmentation step. At iteration zero, set SKi contains the chemicals whose LCI data have been retrieved from the commercial database (e.g., the corpus), denoted here as chemicals in the set SEI. Therefore, at iteration 0, SK0 = SEI applies.
Moreover, we define additional sets used in the derivation of the algorithm. Specifically, for all reactions r in the network (∀r∈R), we define the set of reactants linked to the incoming nodes n (SINr) based on their mass-based stoichiometric coefficient νn,r ∈ ℝ, with r∈R and n∈S, as given below:
| SINr = {n∈S|νn,r < 0} ∀r∈R | (1) |
Conversely, for products (SOUTr) of a reaction r, we have:
| SOUTr = {n∈S|νn,r > 0} ∀r∈R | (2) |
Similarly, the sets of reactions that produce (RPRODn) or consume (RCONSn) chemical node n, respectively, are defined as follows:
| RPRODn = {r∈R|n∈SOUTr} ∀n∈S | (3) |
| RCONSn = {r∈R|n∈SINr} ∀n∈S | (4) |
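For illustration, the set definitions in eqns (1)–(4) can be sketched in Python; the toy network and mass-based coefficients below are hypothetical (they mimic the reaction in Fig. 2), not data from the article:

```python
# Sketch of eqns (1)-(4): deriving the reaction/node sets from mass-based
# stoichiometric coefficients nu[(n, r)]. Negative coefficients mark
# reactants (incoming nodes), positive ones mark products (outgoing nodes).
from collections import defaultdict

# Hypothetical reaction "a" turning chemicals 1, 2 into 3, 4 (kg per kg basis)
nu = {
    ("1", "a"): -0.6, ("2", "a"): -0.4,   # reactants of reaction a
    ("3", "a"): 0.7,  ("4", "a"): 0.3,    # products of reaction a
}

def build_sets(nu):
    S_IN, S_OUT = defaultdict(set), defaultdict(set)     # eqns (1), (2)
    R_PROD, R_CONS = defaultdict(set), defaultdict(set)  # eqns (3), (4)
    for (n, r), v in nu.items():
        if v < 0:
            S_IN[r].add(n)      # n is a reactant of r
            R_CONS[n].add(r)    # r consumes n
        elif v > 0:
            S_OUT[r].add(n)     # n is a product of r
            R_PROD[n].add(r)    # r produces n
    return S_IN, S_OUT, R_PROD, R_CONS

S_IN, S_OUT, R_PROD, R_CONS = build_sets(nu)
print(sorted(S_IN["a"]), sorted(R_PROD["3"]))
```
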
Fig. 2 gives an example of a reaction (a) with two reactants (1, 2) and two products (3, 4). In iteration i = 0 of the algorithm, only chemicals 1 and 2 are modelled with LCIs retrieved from a commercial database, while in the next iteration, i = 1, we compute the LCIs for chemicals 3 and 4, which then join set SKi.
The data augmentation algorithm departs from the molecules in the commercial databases, i.e., the corpus, but in performing the data augmentation it may face data gaps (i.e., the footprint of some reactants might be missing when attempting to compute the footprint of the reaction products). This might happen because the chemical database contains reactions that depart from a molecule in the corpus but require additional reactants missing in the corpus. Hence, a strategy is required to rank the reactions based on data availability, prioritizing data augmentation in those reactions where more information is at hand. Therefore, to guide the LCI prediction across the CRN, we rank the reactions r and nodes n in the set SUi (those whose footprint is yet to be estimated) based on the herein introduced Availability Factor (AF).
The AF quantifies the amount of LCI data available in each reaction yielding each compound in SUi. In each iteration, the compounds and synthesis routes with the highest AF are prioritized for data augmentation.
Hence, parameter AFPi,n is first defined in every iteration i for every node n as follows:
| AFPi,n = 1 if n∈SKi, 0 if n∈SUi ∀i, ∀n∈S | (5) |
The set of nodes connected to reaction r, denoted by Nr, is defined as follows:
| Nr = SINr∪SOUTr | (6) |
The AF for node n, in reaction r, at iteration i, is calculated as:
| AFi,n,r = (Σn′∈Nr\{n} AFPi,n′)/(|Nr| − 1) ∀i, ∀n∈SUi, ∀r∈RPRODn | (7) |
For each iteration i, we define the maximum AF over all reactions r involving nodes n as:
| Mi = maxr∈R, n∈Nr∩SUi AFi,n,r ∀i | (8) |
Finally, we can define RAFMaxi as the set of all reactions that actually reach the maximum AF in iteration i:
| RAFMaxi = {r∈R|n∈Nr, AFi,n,r = Mi} ∀i | (9) |
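The AF ranking of eqns (5)–(9) can be sketched as follows; the normalisation (fraction of the *other* nodes in Nr whose LCI is already known) is our reading of the text and may differ from the paper's exact definition, and the toy network is hypothetical:

```python
# Hedged sketch of the AF ranking (eqns (5)-(9)): for each unknown node n
# and each reaction r producing it, count the share of the other nodes of r
# with known LCI, then return all (n, r) pairs reaching the maximum.
def availability_factor(n, r, known, N):
    """AF_{i,n,r}: share of the other nodes in N_r whose LCI is known."""
    others = N[r] - {n}
    if not others:
        return 0.0
    return sum(1.0 for m in others if m in known) / len(others)

def rank_next(unknown, known, N, R_PROD):
    """Return (max AF, candidate (node, reaction) pairs), i.e., eqns (8)-(9)."""
    best, cands = -1.0, []
    for n in unknown:
        for r in R_PROD.get(n, ()):
            af = availability_factor(n, r, known, N)
            if af > best:
                best, cands = af, [(n, r)]
            elif af == best:
                cands.append((n, r))
    return best, cands

# Toy network of Fig. 2: reaction "a" turns {1, 2} into {3, 4}
N = {"a": {"1", "2", "3", "4"}}
R_PROD = {"3": {"a"}, "4": {"a"}}
best, cands = rank_next({"3", "4"}, {"1", "2"}, N, R_PROD)
print(best)
```

Both products tie here (two of the three other nodes are known), so either could be computed next.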
After identifying the order in which calculations will be performed, we next conduct the impact augmentation. Here, our goal is to determine the life cycle inventories (and impacts LCAIk,n) of the missing chemicals. Impacts can be obtained from the LCI using characterisation factors CFj,k that map flow j, of all environmental flows J, to the impact category index k of all impact categories K.
| LCAIk,n = Σj∈J CFj,k·LCIj,n ∀k∈K, ∀n∈S | (10) |
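This characterisation step is a weighted sum over the inventory flows; a minimal illustration (flows, categories, and the methane factor are made-up example values, not data from the article):

```python
# Minimal illustration of eqn (10): impacts as characterisation-factor-
# weighted sums of inventory flows. All numbers are illustrative only.
LCI = {"CO2": 1.0, "CH4": 0.02}            # kg of flow j per kg of chemical n
CF = {("CO2", "GWP100"): 1.0,              # kg CO2-eq per kg CO2 (assumed)
      ("CH4", "GWP100"): 29.8}             # kg CO2-eq per kg CH4 (assumed)

def impact(LCI, CF, k):
    """LCAI_{k,n} = sum_j CF_{j,k} * LCI_{j,n} for one chemical n."""
    return sum(CF.get((j, k), 0.0) * LCI[j] for j in LCI)

gwp = impact(LCI, CF, "GWP100")
print(round(gwp, 3))  # 1.0 + 0.02 * 29.8
```
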
The life cycle inventory (LCIj,n) for every node n∈S in the reaction network is given by the LCIs embodied in the reactants of the reaction yielding the missing chemical, plus the LCI flows linked to the chemical transformation, including the reaction energy and the energy required in the downstream separations. In what follows, LCIRMj,n,r is the inventory given by the mass-based allocation of the LCIs of the corresponding reactants in node n for reaction r, LCIRj,n,r is the inventory linked to the reaction energy in reaction r for node n, while the LCI connected to the separation energy of reaction r for node n is denoted by LCISj,n,r.
As different reactions might point to the same chemical (i.e., different alternative pathways might yield the same molecule), we shall compute the average of LCIs estimated across pathways as follows:
| LCIj,n = (1/|RPRODn|)·Σr∈RPRODn (LCIRMj,n,r + LCIRj,n,r + LCISj,n,r) ∀j∈J, ∀n∈SUi | (11) |
The reaction energy inventory LCIRj,n,r is calculated using the mass-based enthalpies of formation ΔhF,n of all reactants and products of the reaction, normalized by the stoichiometry of the node of interest. Moreover, once the energy requirements are computed, they are converted into the corresponding LCI using parameter LCIHEATj (i.e., the inventory of the heating or cooling agent, retrieved from an environmental database).
| LCIRj,n,r = LCIHEATj·(Σn′∈Nr νn′,r·ΔhF,n′)/νn,r ∀j∈J, ∀r∈R, ∀n∈SOUTr | (12) |
Similarly, the separation energy inventory LCISj,n,r is computed from the energy and solvent requirements for separations and the LCI of energy provision LCIHEATj, also retrieved from an environmental database.
| LCISj,n,r = HEATn,r·LCIHEATj ∀j∈J, ∀r∈R, ∀n∈SOUTr | (13) |
Here we estimate parameter HEATn,r, which quantifies the energy requirements for separations, following the heuristics in Gani et al.43 In essence, such heuristics provide suitable separation technologies for product n in reaction r. Once a suitable separation technology is identified, heuristics for energy and solvent requirements are applied. Herein, we focus on distillation, liquid–liquid extraction and recrystallization.
Finally, the inventory embodied in reactants LCIRMj,n,r is computed assuming a mass-based allocation method based on mass-based stoichiometry coefficients combined with the reactants’ life cycle inventories LCIj,n.
| LCIRMj,n,r = (Σn′∈SINr |νn′,r|·LCIj,n′)/(Σn″∈SOUTr νn″,r) ∀j∈J, ∀r∈R, ∀n∈SOUTr | (14) |
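The mass-based allocation of eqn (14) can be sketched as follows: the reactants' inventories are pooled and split over the products in proportion to their mass-based stoichiometry, i.e., each kilogram of product carries the same embodied inventory. This is our reading of the allocation rule, and the numbers are purely illustrative:

```python
# Hedged sketch of eqn (14): pool the reactants' per-kg inventories and
# allocate them over the products by mass. All inputs are illustrative.
def lci_rm(n, r, nu, LCI, S_IN, S_OUT, flows):
    """Embodied reactant inventory per kg of product n in reaction r."""
    # total inventory carried in by the reactants (per kg of reference flow)
    pool = {j: sum(abs(nu[(m, r)]) * LCI[m].get(j, 0.0) for m in S_IN[r])
            for j in flows}
    total_product_mass = sum(nu[(m, r)] for m in S_OUT[r])
    # mass allocation: every kg of product gets the same share of the pool
    return {j: pool[j] / total_product_mass for j in flows}

# Hypothetical reaction a: 0.6 kg A + 0.4 kg B -> 0.7 kg P + 0.3 kg Q
nu = {("A", "a"): -0.6, ("B", "a"): -0.4, ("P", "a"): 0.7, ("Q", "a"): 0.3}
LCI = {"A": {"CO2": 2.0}, "B": {"CO2": 1.0}}  # per-kg inventories (assumed)
S_IN, S_OUT = {"a": {"A", "B"}}, {"a": {"P", "Q"}}
res = lci_rm("P", "a", nu, LCI, S_IN, S_OUT, ["CO2"])
print(round(res["CO2"], 3))
```
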
Hence, the overall algorithm (Algorithm 1) can be written in compact form as follows.

| Algorithm 1: augmented life cycle assessment (AuLCA) | | |
|---|---|---|
| 1 | i ← 0 | # Start at iteration 0 |
| 2 | SKi ← SEI | # Init. data |
| 3 | SUi ← S \ SKi | |
| 4 | Initialize LCAIk,n, LCIj,n for n∈SKi | # Corpus of LCA data |
| 5 | While SUi ≠ ∅ | |
| 6 | r ← RAFMaxi | # Ranking |
| 7 | Compute LCAIk,n, LCIj,n | |
| 8 | SKi ← SKi ∪ {n} | |
| 9 | SUi ← SUi \ {n} | |
| 10 | i ← i + 1 | # Next iteration |
| 11 | Return LCAIk,n | |
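Algorithm 1 can be condensed into a short Python sketch. For brevity, the energy terms of eqns (12)–(13) are omitted (only the mass-based propagation of eqn (14) is carried through) and the AF ranking is reduced to the simplest criterion of "all reactants known"; the reaction and the corpus number are hypothetical, not ecoinvent values:

```python
# Compact, hedged sketch of Algorithm 1 (mass propagation only).
def aulca(reactions, corpus_lci, flows):
    """reactions: {r: (reactants {chem: kg}, products {chem: kg})};
    corpus_lci: {chem: {flow: value per kg}} for the known corpus SEI."""
    known = dict(corpus_lci)                     # SK0 <- SEI
    changed = True
    while changed:                               # loop until SU stops shrinking
        changed = False
        for r, (ins, outs) in reactions.items():
            if not all(m in known for m in ins):  # crude "AF = 1" criterion
                continue
            # pool reactant inventories (eqn (14) numerator)
            pool = {j: sum(kg * known[m].get(j, 0.0) for m, kg in ins.items())
                    for j in flows}
            total_out = sum(outs.values())        # eqn (14) denominator
            for n in outs:
                if n not in known:                # move n from SU to SK
                    known[n] = {j: pool[j] / total_out for j in flows}
                    changed = True
    return known

# Hypothetical example: ethanol dehydration to ethylene (made-up numbers)
reactions = {"a": ({"ethanol": 1.64}, {"ethylene": 1.0, "water": 0.64})}
corpus = {"ethanol": {"CO2": 1.5}}  # illustrative footprint, not ecoinvent data
lci = aulca(reactions, corpus, ["CO2"])
print(round(lci["ethylene"]["CO2"], 3))
```

Note that, under pure mass allocation, both products (ethylene and water) inherit the same embodied inventory per kilogram.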
Fig. 3 illustrates how Algorithm 1 works within a simple reaction network containing seven chemicals (1–7) and three reactions (a, b, c). In iteration i = 0, all chemicals in the corpus SEI = {1, 2, 4} are added to SKi, while SUi = {3, 5, 6, 7} ≠ ∅. Based on the availability factor, node n = 3 in reaction r = a is selected next, since reaction a attains the highest availability factor (a ∈ RAFMaxi). After computing LCAIk,n from LCIj,n, the algorithm proceeds to the next iterations until the loop stops with SUi = ∅. See more detailed examples in section 2 of the SI.
Fig. 3 Schematic of Algorithm 1, based on a simple reaction network with three reactions (a, b, c) and seven chemicals, i.e., ten nodes in total (chemicals and reactions).
Fig. 4 shows that the predictions closely match the references (R2 = 0.97), with an RMSE of 0.74 kgCO2-eq per kg, an MAE of 0.52 kgCO2-eq per kg, and a mean relative error of 11%. Similar results are obtained when analysing the energy-related emissions rather than the total emissions (Table 4 of the SI). This analysis indicates that, when reaction pathways are known, predictions are accurate, suggesting that the mass allocation and gate-to-gate calculations work well in the cases analysed.
Fig. 4 Prediction performance of AuLCA based on manually selected chemicals from ecoinvent using the same reaction pathway. Values given in kgCO2-eq per kg.
| Networks | Open-source (OS) | | | Reaxys© |
|---|---|---|---|---|
| Case study | CS I | CS II | CS III | CS IV |
| Reactions, r | 10 000 | 100 000 | 300 000 | 308 500 |
| Chemicals, S | 16 000 | 140 000 | 360 000 | 310 000 |
| SK0 | 49 | 98 | 122 | 236 |
| SKLOOV | 12 | 34 | 41 | 110 |
We define the training set, i.e., the set SK0 corresponding to the corpus for each case, as the intersection of all chemicals S in the graph (CRN) and the precomputed chemicals from ecoinvent v3.9.1, SEI (SK0 = S ∩ SEI). Since the set of all chemicals S within the CRN differs for each case study, the training set differs too. In addition, this set needs to be filtered, yielding a validation dataset SKLOOV that is used in the LOOV. In particular, single-atom compounds, PFAS, small complex molecules (e.g., SiCl4), inorganic chemicals, and heavy halogenated molecules were excluded from SK0 to build the validation set (SKLOOV ⊆ SK0; full filtering criteria in section 3.2 of the SI). Again, due to differences among CRNs, the number of validation chemicals in SKLOOV also differs across cases, affecting the comparisons.
During the LOOV, the LCI of one chemical in the validation set is removed from the training set SK0 and then predicted with AuLCA for each case, using the remaining known data for the other chemicals in the augmentation. Hence, the entire data augmentation procedure is repeated as many times as there are chemicals in the validation set, leaving one of them out of the analysis at a time.
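The LOOV loop described above can be sketched as follows; `predict_lci` stands in for a full AuLCA run and is hypothetical, as are the toy values used in the check:

```python
# Sketch of the leave-one-out validation (LOOV): each validation chemical is
# removed from the corpus, re-predicted, and the errors are aggregated.
def loov(validation, corpus, predict_lci):
    """validation: {chem: reference GWP}; corpus: {chem: GWP};
    predict_lci(chem, reduced_corpus) -> predicted GWP (placeholder)."""
    errors = []
    for chem, ref in validation.items():
        reduced = {c: v for c, v in corpus.items() if c != chem}  # drop chem
        pred = predict_lci(chem, reduced)
        errors.append(pred - ref)
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    return mae, rmse

# Toy check with a constant predictor (illustrative only)
val = {"x": 2.0, "y": 4.0}
mae, rmse = loov(val, {"x": 2.0, "y": 4.0, "z": 1.0}, lambda c, d: 3.0)
print(mae, rmse)
```
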
Table 2 summarizes the results. We find that larger networks often lead to better prediction performance, although recall that such performance is computed over training sets of different sizes. Specifically, in the OS case studies, moving from 10 000 to 100 000 and 300 000 reactions increased the size of the training set SK0 from 49 to 98 and 122, respectively. Note that the size of the training set SK0 does not increase in the same proportion as the network size, e.g., for a tenfold increase (CS I vs. CS II), the training set only doubles. This moderate growth of SK0 can be explained by the fact that a high number of reactions in the OS networks do not contain any ecoinvent chemicals present in SEI. Consequently, expanding the network does not guarantee a commensurate expansion of the initial nodes in SK0. Regardless, performance tends to improve with larger training sets, i.e., a larger corpus SK0, which improves data accuracy and enables broader coverage of chemicals within the CRN, as discussed below. Furthermore, a higher degree of interconnection, especially within larger CRNs, allows for the identification of more alternative synthesis pathways, whereas the validation data typically rely on a single selected route. Since our approach averages values across all available pathways within the CRN, the prediction is strongly influenced by the diversity of the considered synthesis routes (i.e., the number of routes in the network, some of which are more industrially relevant than others). Sparse CRNs, on the other hand, might be unable to predict the right synthesis route due to a lack of sufficient reaction connections within the network. Hence, there is a clear trade-off regarding network size: larger networks average values over a wider range of routes (only some of them industrially relevant), whereas smaller ones might be unable to identify the most realistic pathway. This trade-off is discussed in more depth later in the article.
| Networks | Open-source (OS) | | | Reaxys© |
|---|---|---|---|---|
| Case study | CS I | CS II | CS III | CS IV |
| RMSE [kgCO2-eq per kg] | 3.80 | 3.65 | 2.79 | 2.97 |
| MAE [kgCO2-eq per kg] | 2.53 | 2.52 | 1.98 | 2.27 |
| MRE [%] | 72.3 | 81.1 | 59.3 | 72.1 |
| R2 | 0.28 | 0.11 | 0.41 | −0.08 |
RMSE and MAE both improve when increasing the CRN and SK0 size (Fig. 5). For example, comparing CS I and CS III, the RMSE is reduced from 3.80 to 2.79 kgCO2-eq per kg, while the MAE decreases from 2.53 to 1.98 kgCO2-eq per kg. The MRE follows a similar trend from CS I to CS III.
Fig. 5 Correlations and residuals analysis of the case studies: (A) case study I, (B) case study II, (C) case study III, (D) case study IV. Values given in kgCO2-eq per kg. A detailed overview of the results can be found in Table 2. The grey area visualizes an acceptable deviation from the reference data of up to +100%/−50%. The sets of all chemicals used for the LOOV can be found in the SI.
The Reaxys©-based CRN strategy in CS IV shows comparable prediction accuracy, with an RMSE of 2.97 kgCO2-eq per kg and an MAE of 2.27 kgCO2-eq per kg, but a notably higher MRE of 72.1%. Note, however, that the training set SK0 and the validation set SKLOOV are nearly twice as large as in CS III, despite a similar number of reactions. This is because of the way we build the CRN using Reaxys© data, which allows more ecoinvent chemicals to be included in the corpus. The high MRE in CS IV arises because AuLCA tends to overestimate the GWP of low-GWP molecules, which are more frequent in the Reaxys©-based CRN due to the larger number of reactions involving such compounds.
While the previous metrics indicate good predictive performance, the R2 values across the case studies appear comparatively modest and inconsistent. These results can be attributed to the specific distribution of the validation data: the low variance within the validation sets significantly penalizes the R2 calculation, resulting in low values even when the absolute deviations are comparably small. This effect is particularly evident in CS IV, which includes multiple low-GWP compounds with a correspondingly low variance in the validation data. Consequently, these R2 values should be interpreted in the context of the data's narrow range rather than as a lack of model accuracy.
Fig. 5 (top) shows all the predicted values for the different cases. As seen, most of the chemicals in all four case studies are predicted with an error in the range of +100%/−50% (grey area in Fig. 5). However, the relative number of chemicals falling outside the grey area only slightly decreases as we move to larger networks (from 33.3% in CS I to 30.0% in CS IV). As in Table 2, the trend of the violin plots (Fig. 5, bottom) shows increasing prediction accuracy from CS I to CS III. In CS IV, however, the violin plot also reveals a slight overestimation, i.e., a distribution centered slightly above zero, for the reasons previously discussed.
Fig. 6 Prediction error analysis for LOOV chemicals in CS III. A similar trend is observed for case studies CS I, CS II and CS IV (see the SI).
To identify any biases in AuLCA, we analyse the prediction errors in Fig. 6 (for CS III only; the other cases, which behave similarly, are provided in Fig. 8 of the SI), finding positive and negative prediction errors of similar magnitude. Hence, no strong systematic over- or underestimation is found across the predictions, suggesting the absence of any systematic bias. Further evaluation shows that overestimation with higher errors is mainly associated with large-scale, industrially synthesized, olefin-like chemicals, for which very well-established routes exist. Instead, their footprint is predicted by AuLCA using more complex synthesis pathways, thereby leading to larger GWPs. This mostly happens in CS I–III, as the dataset contains only patents, some of which might never have been deployed at scale and may greatly differ from current industrial practice, often leading to more synthesis steps and larger emissions.
Moreover, in CS IV, the standard routes to produce chemicals are averaged together with other, less common (and more complex) ones, likewise leading to overpredictions. Conversely, underestimation often occurs in molecules (e.g., acetylene) requiring reactants missing in the network (for instance, oxygen contributes 90% of acetylene's GWP) that are therefore assumed through proxies (see eqn (S13) in the SI), which tend to underestimate their true value. This emphasizes the importance of the AF ranking to produce more accurate estimates, as discussed next.
Without setting an AF threshold, all chemicals can be computed. Introducing a modest threshold of 0.25 (i.e., at least 25% of reactants and products in each reaction must be known) reduces coverage: CS II and CS III are slightly affected, with 79% and 82% of chemicals still computable, whereas CS I drops to 68%. This effect is even more pronounced for a threshold of 0.5. Higher AF thresholds, e.g., >0.75, leave only a negligible fraction of chemicals computable for all cases.
This result indicates the poor degree of interconnection within smaller CRNs, such as in CS I. Here, most chemicals are computed using a limited number of reactions strongly affected by an AF threshold, whereas this does not happen to the same extent in CS II and CS III, which contain more reactions. Regardless of the case, approximations (eqn (1) in the SI) are required for making predictions in all networks, albeit less so in larger networks due to more interconnections. Notably, Reaxys© provides more alternative routes and many links between chemicals, and is thus virtually insensitive to the AF threshold.
Fig. 8 provides an illustrative network example for the chemical morpholine in CS II and CS III. Imposing larger AFs leads to more convoluted networks because the algorithm tries to avoid proxies. This requires expanding the synthesis routes further to fully connect the ecoinvent chemicals with the target molecule without relying on as many approximations. Notably, only two reactions remain the same in both cases, while the number of ecoinvent chemicals (stars) used in the predictions is similar. The selected pathways for CS III are, hence, longer, involving more reactions and chemicals to bypass proxies (triangles).
Fig. 8 Visualization of sub-networks for CS II (green, left) and CS III (blue, right), identifying the computational graph for the case study chemical morpholine.
With a larger repertoire of reactions, missing compounds can be inferred from preceding steps via impact propagation, thereby avoiding the use of less accurate proxies. In smaller CRNs, such as the 100k OS example (green, left side), fewer reactions are available, so missing chemicals cannot be inferred and must be assumed through proxies: CS III requires only two assumptions to compute morpholine's GWP, whereas CS II requires six. This behavior underlines the advantage of larger networks in reducing the number of necessary proxies. However, larger networks also lead to more complex routes that might differ from more direct ones, particularly when analyzing well-established bulk chemicals involving fewer synthesis steps, like those in ecoinvent. Numerical examples indicate that avoiding proxies is more critical than minimizing pathway length, although the effect can be chemical-specific. Proxies fail to distinguish high- and low-impact compounds because they assign the average impact of the reactants. Consequently, it is preferable to predict missing compounds via longer, information-preserving routes, provided the added uncertainty remains low, as these better capture compound-specific impacts.
Energy requirements in separations clearly exceed those of the reactions. This is because of the often mild conditions at the reaction step, leading to small energy needs compared with the requirements of energy-intensive separation processes, particularly distillation. Reaction energy impacts related to cooling duties for exothermic reactions were neglected, as were energy losses. Moreover, energy requirements in separations are more spread out due to the different processes assumed for the individual reactions in a pathway.
However, our approach is inherently limited by the quality and diversity of the input chemicals in the corpus and the CRNs employed in the LCA data propagation. In particular, the distribution and variance of chemicals in the corpus could be improved by including more fine chemicals. In addition, chemical diversity and network connectivity in the CRNs could be enhanced to maximize overall coverage and, thus, the predictive power.
A key next step is the integration of multiple reaction databases, potentially combining foundational CRNs, such as in CS IV, with more specialized CRNs, such as in CS I to CS III. This integration will expand the algorithm's capabilities across a broader chemical space and allow it to operate with higher AF thresholds. Moreover, gate-to-gate estimates will have to be refined, including the footprint of solvents and catalysts while considering more accurate yields. Based on the observed predictive performance, the AuLCA algorithm could provide a robust platform for generating preliminary LCA estimates. It could be leveraged to guide sustainable decision-making, with an emphasis on the selection of synthesis routes, particularly in the early stages of chemical process development, where data are often limited. Furthermore, the next version of the AuLCA framework could be provided in the form of a toolbox for LCA practitioners and chemists to generate environmental footprint estimates for a plethora of chemicals. Owing to the open data structure, AuLCA will be able to support user-defined data tailored to specific setups, including the integration of multiple LCIA methods, e.g., ReCiPe 2016.
Overall, AuLCA could support early-stage design decisions in a plethora of applications, with strong emphasis on guiding the sustainable scale-up of active compounds production in the pharma industry. Specifically, automating the evaluation of alternative routes, currently performed manually, would enable chemists and process engineers to identify and prioritize the most sustainable synthesis options more efficiently, helping to advance the goals of Green Chemistry in both research and industrial practice.
Footnotes
† The authors M. G. H. and D. J. contributed equally.
‡ Copyright © 2022 Elsevier Limited except certain content provided by third parties. Reaxys is a trademark of Elsevier Limited.
This journal is © The Royal Society of Chemistry 2026