Jun Zhang a, Qin Wang *b, Huaqiang Wen a, Vincent Gerbaud c, Saimeng Jin a and Weifeng Shen *a
a School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China. E-mail: shenweifeng@cqu.edu.cn
b School of Chemistry and Chemical Engineering, Chongqing University of Science and Technology, Chongqing 401331, China. E-mail: wangq356@mail2.sysu.edu.cn
c Laboratoire de Génie Chimique, Université de Toulouse, CNRS, INP, UPS, Toulouse, France
First published on 6th December 2023
Green solvent design is usually a multi-objective optimization problem that requires identification of a set of solvent molecules to balance multiple, often trade-off, properties. At the same time, process constraints need to be addressed since solvent properties impact the process feasibility like in the extractive distillation separation process. Hence, a green solvent multi-objective optimization framework is proposed with EH&S properties, process constraints, and energy consumption analysis, where the molecular design optimization model relies upon the ability of the proposed infinite dilution activity coefficient (IDAC) direct prediction model to accurately predict process properties in addition to molecular properties. The process properties are short-cut properties of the extractive distillation process, namely selectivity and solution capacity. To this end, the proposed IDAC direct prediction model is employed to prepare molecule pairs with selectivity and solution capacity improvement constraints to train the molecular multi-objective optimization model, which can learn the optimization path from the pre-set molecule pairs and then optimize a given solvent via the prediction of a disconnection site and molecular fragment addition or removal at that site. An extractive distillation process to separate a cyclohexane/benzene mixture is taken as an example to demonstrate the proposed framework. As a result, three candidate green solvents are optimized and designed to recover benzene from mixtures of benzene and cyclohexane. The proposed green solvent multi-objective optimization framework is flexible enough to be employed in other chemical separation processes, where solvent property assessment is needed to evaluate the feasibility and performance of the processes.
Besides, in separation processes, the process feasibility sets additional constraints on the solvent. Hence, a search simultaneously combining molecular and process constraints is a challenge, which is the purpose of our study, and which would be facilitated by using model-based approaches to optimize the structure of solvent molecules. But successive optimal solvent design first followed by an optimal process design bears a risk of error propagation that could rule out the whole procedure.
In this case, we proposed a molecular multi-objective optimization model to purposefully modify the structure of solvent molecules with some drawbacks (such as EH&S negative impact) to obtain the green solvent with the desired separation performance rather than simply utilizing a molecular generative model to enlarge the chemical space for subsequent solvent screenings with multi-index constraints. The multi-objective optimization model can learn the optimization path from the pre-set molecule pairs. Every pair of molecules (Mx and My) in the pre-set molecule pairs had similar molecular structures and only had a single different disconnection site, but the scores of both selectivity and solution capacity of My were at least 20% larger than those of Mx. The prepared pre-set molecular pairs were used to train the proposed molecular multi-objective optimization model, which can learn the difference between the molecular pairs and can learn the optimization path from Mx to My. To prepare the molecule pairs, an improved deep learning-based IDAC direct prediction model trained over a COSMO-SAC database was developed for predicting the selectivity and solution capacity of the molecule pairs. The proposed IDAC direct prediction method can provide superior predictive performance compared with the IDAC indirect prediction method, which first predicted the VCOSMO and 51 σ-profile and then calculated the IDAC using the COSMO-SAC model. The indirect IDAC prediction method resulted in more information lost during the prediction and COSMO-SAC calculation processes. The improved deep learning-based direct IDAC prediction model was integrated with the molecular multi-objective optimization model to form the proposed green solvent multi-objective and multi-scale optimization framework with EH&S properties and process constraints, and energy and economic analysis. 
The proposed green solvent multi-objective and multi-scale optimization framework can: (1) simultaneously optimize multiple trade-off properties such as the selectivity and solution capacity of the solvent; (2) learn from the pre-set molecule pairs that have similar molecular structures but have differences in their properties of interest; (3) visualize the optimization path of the solvent's molecular structure; and (4) accurately and directly predict the IDAC of the molecules.
The paper is organized as follows: the next section (Section 2) gives a non-exhaustive overview of solvent design issues, related computer-aided approaches, and connections to some process design issues for extractive distillation processes. Section 3 describes the integrated molecular multi-objective and multi-scale optimization framework. Section 4 describes and evaluates the performance of the improved model for the direct prediction of the infinite dilution activity coefficient using deep learning techniques. Section 5 introduces the molecular multi-objective optimization model. Section 6 is an illustrative case study about solvent optimization and design for an extractive distillation process.
In support of any computer-aided solvent design approach, one needs to access solvent property values. Evaluation of the solvent thermodynamic properties requires measuring them or calculating them using property estimation models because they are more appropriate in a preliminary process design phase. Hence, property calculation or estimation models play a significantly important role in model-based solvent design methods since they can correlate molecular structures with solvent thermodynamic properties. For any property of interest in a real process, there exist a variety of property models, and choosing the most suitable models is a key step.9 Each model bears different accuracies, predictive capabilities, and computation costs.10 The property estimation methods mainly include descriptor-based methods,11 group-contribution (GC) methods,12 quantum mechanical (QM) methods,13 and deep learning (DL) methods.14,15 For example, for extractive distillation, the process we select for illustration, one real property of relevance might be stated as having a preferential affinity with one of the compounds in the mixture to be separated. It can be evaluated by various models, by comparing the similarity of the Hansen solubility parameter values between the solvent and the molecule of interest, a simple correlative model with no access to temperature dependency; by solving the thermodynamic phase equilibrium for computing solubility with temperature dependency; or by comparing interaction surface potentials, like the COSMO sigma potential curves, which requires quantum mechanics calculations. The GC method is one of the most widely utilized and efficient techniques to evaluate macroscopic physicochemical properties. 
However, the performance of first-order GC models, with contributions regressed over experimental data directly related to the occurrence of simple chemical groups like –CH2, –OH, etc., is sometimes weakened because they cannot take account of proximity effects or distinguish between isomers.3,16 To address these issues, second- and third-order GC models have been developed for discriminating the structural isomers,17 but they still cannot distinguish many stereoisomers such as cis/trans isomers.18 These issues can be tackled with quantum mechanical-based (QM-based) solvation models, such as COSMO-RS19,20 and COSMO-SAC.21,22 With only a few parameters such as the surface charge density profile (σ-profile) and the cavity volumes (VCOSMO), the COSMO-based models can achieve a decent accuracy for the calculation of thermodynamic properties. However, the initial QM calculations bear a heavy computational cost and are highly time-consuming, and even unrealistic when exploring the vast search space of solvent molecules.23 To this end, the GC-COSMO techniques have been proposed as a shortcut to more efficiently access the VCOSMO and σ-profile.24,25 However, due to the inherent GC limitations, these GC-COSMO techniques not only have difficulties in appropriately handling isomers and proximity effects but are also limited in the variety of functional groups available in open-source databases. With the availability of the COSMO-type databases (e.g., the VT-200526), as alternatives, deep learning-based (DL-based) techniques27–30 can be applied as another shortcut to obtain the σ-profile and VCOSMO.31,32 However, the VT-2005 database only contains 1431 compounds, which may not be enough to train a DL-based prediction model with satisfying generalization ability. Additionally, such DL-based prediction models are developed to predict the VCOSMO and the σ-profile, and then the predicted parameters are used to calculate the IDAC. 
This indirect IDAC calculation process could lead to a decline in accuracy.
Once property estimation models are available, computer-aided molecular design33 (CAMD) is an effective approach for screening existing solvents and designing new ones. In CAMD, pre-prepared molecular functional groups are assembled to generate potential solvents through mixed-integer linear programming (MILP), mixed-integer non-linear programming (MINLP), or stochastic algorithms with objective functions and constraints (such as molecular structural, property, and process operating constraints).34–37 However, with the increase in the number of preselected functional groups, the CAMD method may face the problem of combinatorial explosion.3,38
Recent advancements in the domain of artificial intelligence have accelerated the development and application of techniques for inverse molecular design.39–42 For instance, molecular generation models have been applied in many fields.38,43 Molecular graph generation techniques44 as an outstanding representative have become one of the most widely adopted approaches for molecular design. Recently, a fragment-based hierarchical encoder–decoder model for molecular generation was proposed by Jin et al.45 Fragments extracted from the training molecules were analogous to the molecular functional groups used in the group-contribution methods. The molecular fragments could integrate knowledge from the chemistry domain interpretability into the model.46 The molecules can be optimized by predicting a disconnection site and performing molecular fragment addition or removal at that site. However, this model cannot simultaneously optimize multiple trade-off properties of the solvent molecules. Therefore, this kind of single objective optimization model is very difficult to couple with the multi-dimensional and highly nonlinear chemical separation process. Although there are deep molecular optimization models labeled “multi-objective”,47,48 these models usually aggregate multiple objectives into a single scalar objective.
However, solvent property knowledge is only a first step in the design of a performing separation process, for which the process model can be highly nonlinear because the process feasibility is often directly related to the characteristics of the solvent. For example, there are some trade-off properties such as the selectivity and solution capacity that are not perfectly correlated and, therefore, molecular multi-objective optimization cannot be addressed by these models. Hence, some authors have explored the simultaneous design of the solvent and the process attributes in a so-called reverse engineering computer aided molecule and process design (CAMPD) approach. For example, some authors have proposed a framework for the integrated design of a solvent and extractive distillation process by solving a multi-objective optimization problem addressing constraints related to thermodynamic process feasibility, along with process operation, a process model, and molecular constraints,49 or a more rigorous rate-based model.50 In these studies, the property prediction in a molecular scale is addressed using COSMO approaches while the process model can be a pinch-based model based on a minimum solvent flow rate and minimum energy demand49 or a more rigorous rate-based model.50 The use of such process models is relevant for an accurate process design but there exist simpler criteria for assessing extractive distillation feasibility, such as solvent capacity and selectivity,51 which are further related to infinite dilution activity coefficients (IDAC), and univolatility curves.52 In this study, we propose a molecular multi-objective and multi-scale optimization framework for the combined molecular and process design with the predicted process constraints (solvent selectivity and capacity based on IDAC) where the process-related properties are directly used to train the molecular structure optimization model, with the help of deep-learning techniques.
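The link between the IDAC and the two short-cut process properties can be illustrated with a minimal sketch. The source does not spell out the formulas, so the code assumes the commonly used definitions at infinite dilution, namely selectivity as the ratio of the two IDACs and capacity as the reciprocal of the IDAC of the component retained by the solvent. The function name and the numerical values are hypothetical and only serve the illustration.

```python
def selectivity_and_capacity(gamma_inf_a: float, gamma_inf_b: float) -> tuple[float, float]:
    """Return (selectivity, capacity) of a solvent at infinite dilution.

    gamma_inf_a: IDAC of component A (e.g. cyclohexane) in the solvent.
    gamma_inf_b: IDAC of component B (e.g. benzene) in the solvent.
    Assumed definitions (not stated in the text):
        selectivity = gamma_inf_a / gamma_inf_b
        capacity    = 1 / gamma_inf_b
    """
    if gamma_inf_a <= 0 or gamma_inf_b <= 0:
        raise ValueError("activity coefficients must be positive")
    selectivity = gamma_inf_a / gamma_inf_b
    capacity = 1.0 / gamma_inf_b
    return selectivity, capacity


# Hypothetical IDAC values, chosen only to show the trade-off:
# a solvent can be very selective yet dissolve little of component B.
for name, g_a, g_b in [("solvent 1", 6.0, 1.5), ("solvent 2", 20.0, 8.0)]:
    s, c = selectivity_and_capacity(g_a, g_b)
    print(f"{name}: selectivity = {s:.2f}, capacity = {c:.3f}")
```

If the predicted quantity is a logarithm of the IDAC rather than the IDAC itself, the values would have to be converted back before applying these ratios.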
Fig. 1 The green solvent multi-objective and multi-scale optimization framework towards the extractive distillation processes.
The proposed framework for a green solvent design will be applied to an extractive distillation process to separate cyclohexane and benzene mixtures (in Section 6).
Fig. 2 The schematic diagram for IDAC calculation. (a) An indirect method (IM). (b) A direct method (DM).
First, the IDACs of the 2130 compounds in benzene and cyclohexane are calculated using the COSMO-SAC model with their VCOSMO and σ-profile information from the UD database. Subsequently, the hybrid representations28,32 of the 2125 compounds (five additional compounds are used as the external validation data) are utilized as input to train the feedforward neural network for the IDAC prediction in benzene and cyclohexane (IDAC-benzene and IDAC-cyclohexane) as shown in Fig. 3. The message-passing neural network (MPNN) is a graph neural network, which consists of two phases, namely, the message-passing phase and the readout phase.28 In the message-passing phase, the MPNN updates information on the directed bonds, as shown in Fig. 3. In the readout phase, a readout function is utilized to provide a vector representation of the molecular structure. The MPNN learned descriptors mainly focus on the local information about molecular structure due to the message updating mechanism. Therefore, the molecule level 200 dimensional RDKit calculated descriptors (as shown in Fig. 3) that can capture the global information of the molecular structure are employed to integrate with the MPNN learned features to form the molecular hybrid representation, which can retain the molecular local and global information as much as possible. The data split setting for training the two proposed models is 0.8:0.1:0.1. The early stopping technique is employed to avoid overfitting. Finally, the 10-fold cross-validation (10-fold CV) method is applied to improve the stability of the two proposed models. In this study, the hidden size of MPNN, the depth of MPNN, the layer number of FNN, and the dropout of FNN are optimized using the Bayesian optimization method embedded in the Python package hyperopt.56
| Hyperparameters | Range | IDAC-benzene | IDAC-cyclohexane |
|---|---|---|---|
| Hidden size | [300,3000] | 1200 | 1300 |
| Depth | [2,7] | 6 | 6 |
| Dropout | [0,0.4] | 0.0 | 0.0 |
| Number of layers | [1,5] | 3 | 3 |
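The 0.8:0.1:0.1 data split mentioned above can be sketched as follows. The sketch assumes a random split into training, validation, and test subsets (the text does not say whether the split is random or structure-based), and the function name and seed are hypothetical.

```python
import random


def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and cut them into three disjoint subsets.

    The first two subset sizes are rounded down; the remainder goes to
    the last subset so that every sample is used exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(fractions[0] * n_samples)
    n_valid = int(fractions[1] * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# 2125 compounds are used for model development in this work.
train, valid, test = split_indices(2125)
print(len(train), len(valid), len(test))
```

With 2125 compounds this rounding gives subsets of 1700, 212, and 213 compounds. In a 10-fold CV setting the same routine would simply be repeated with ten different seeds or fold assignments.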
In this study, three evaluating metrics, i.e. the mean absolute error (MAE), the mean squared error (MSE), and the coefficient of determination (R2), were adopted as the evaluation criteria. The prediction performance of the IM and DM with the UD database is summarized in Table 2. In addition to the FNN model, the prediction performance using random-forest and support-vector machine approaches is also summarized in Table 2 to explore which machine learning approach is more suitable for IDAC prediction. The optimal hyperparameter combinations of the random forest and support vector machine-based approaches are detailed in Table S3.† Based on the statistical analysis, the FNN-based models (IM and DM models) had superior predictive performance over the random forest and support vector machine-based models. The performance of the 10-fold CV of the proposed DM models for the IDACs in benzene and cyclohexane prediction on the test sets was better than that of the IM predictive model.
| Model and target | 10-fold CV MAE | 10-fold CV MSE | 10-fold CV R2 |
|---|---|---|---|
| (a) IM 32 (FNN-based model) | | | |
| IDAC-benzene | 0.1216 ± 0.0140 | 0.0720 ± 0.0163 | 0.8651 ± 0.0338 |
| IDAC-cyclohexane | 0.1755 ± 0.0180 | 0.1435 ± 0.0341 | 0.9123 ± 0.0198 |
| (b) DM (FNN-based model) | | | |
| IDAC-benzene | 0.1146 ± 0.0108 | 0.0506 ± 0.0084 | 0.9036 ± 0.0128 |
| IDAC-cyclohexane | 0.1652 ± 0.0173 | 0.1126 ± 0.0262 | 0.9257 ± 0.0226 |
| (c) Random forest-based model | | | |
| IDAC-benzene | 0.2224 ± 0.0198 | 0.1581 ± 0.0221 | 0.6985 ± 0.0314 |
| IDAC-cyclohexane | 0.3381 ± 0.0264 | 0.3394 ± 0.0689 | 0.7814 ± 0.0367 |
| (d) Support vector machine-based model | | | |
| IDAC-benzene | 0.2516 ± 0.0164 | 0.1456 ± 0.0206 | 0.7213 ± 0.0389 |
| IDAC-cyclohexane | 0.3571 ± 0.0263 | 0.2965 ± 0.0476 | 0.8089 ± 0.0196 |
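For reference, the three evaluation metrics and the "mean ± deviation" summary over folds can be written in a few lines. This is a minimal sketch using the standard definitions of MAE, MSE, and R2; whether the reported deviation is a sample or population standard deviation is not stated in the text, so the sample standard deviation is assumed here, and the per-fold scores in the example are hypothetical.

```python
from statistics import mean, stdev


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize_folds(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical per-fold MAE values, for illustration only.
fold_mae = [0.11, 0.12, 0.10, 0.13, 0.12, 0.11, 0.12, 0.10, 0.13, 0.11]
avg, dev = summarize_folds(fold_mae)
print(f"10-fold CV MAE = {avg:.4f} ± {dev:.4f}")
```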
In addition to the above-mentioned statistical analysis, five molecules in the external validation dataset were utilized as examples to evaluate the ability of the proposed predictive models to discriminate the stereoisomers and structural isomers and to deal with complex molecules. In this work, the external validation dataset consisted of N,N-diethylaniline (a complex compound), p-xylene and o-xylene (structural isomers), and cis-3-hexene and trans-3-hexene (cis/trans isomers). The heteroatomic nitrogen in N,N-diethylaniline has an inducing effect on the delocalized π electron system of the aromatic ring, which could lead to a poor prediction performance by some quantitative structure–property relationship (QSPR) models.31 The predictive performance of the IM and DM models is tabulated in Table 3. Regarding N,N-diethylaniline, the proposed DM models can achieve a better predictive performance than the IM models. Regarding the structural and cis/trans isomers, both the IM and DM models have a satisfactory ability to differentiate isomers. Additionally, we visualized the chemical space of the training and external dataset by projecting the Morgan fingerprints (radius = 2, 1024 bits) of the molecules onto the 2D space (as shown in Fig. 4) via the t-SNE approach.57 As shown in Fig. 4, the five external compounds were not very similar to the well-represented molecules in the training dataset. Moreover, the five external compounds were scattered in different regions of the chemical space of the training dataset. Therefore, the proposed DM models had decent IDAC predictive performance and good generalization ability.
| Compound name | IDAC-cyclohexane (QM-derived value) | IDAC-cyclohexane (IM) | IDAC-cyclohexane (DM) | IDAC-benzene (QM-derived value) | IDAC-benzene (IM) | IDAC-benzene (DM) |
|---|---|---|---|---|---|---|
| N,N-Diethylaniline | 0.3215 | 0.2398 | 0.3028 | −0.1047 | −0.0745 | −0.1033 |
| p-Xylene | 0.2230 | 0.2194 | 0.2295 | 0.0008 | 0.0109 | −0.0004 |
| o-Xylene | 0.2654 | 0.2683 | 0.2472 | 0.0007 | −0.0037 | 0.0014 |
| cis-3-Hexene | 0.0550 | 0.0505 | 0.0463 | 0.1860 | 0.1847 | 0.2072 |
| trans-3-Hexene | 0.0457 | 0.0475 | 0.0505 | 0.2325 | 0.2165 | 0.2183 |
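The comparison in Table 3 can be condensed into a single number per model by averaging the absolute deviations from the QM-derived values. The short sketch below does this with the values copied from Table 3; the averaging over five compounds is our own illustration and is not a metric reported in the text.

```python
# Values copied from Table 3: (compound, QM-derived value, IM, DM).
TABLE_3 = {
    "IDAC-cyclohexane": [
        ("N,N-diethylaniline", 0.3215, 0.2398, 0.3028),
        ("p-xylene", 0.2230, 0.2194, 0.2295),
        ("o-xylene", 0.2654, 0.2683, 0.2472),
        ("cis-3-hexene", 0.0550, 0.0505, 0.0463),
        ("trans-3-hexene", 0.0457, 0.0475, 0.0505),
    ],
    "IDAC-benzene": [
        ("N,N-diethylaniline", -0.1047, -0.0745, -0.1033),
        ("p-xylene", 0.0008, 0.0109, -0.0004),
        ("o-xylene", 0.0007, -0.0037, 0.0014),
        ("cis-3-hexene", 0.1860, 0.1847, 0.2072),
        ("trans-3-hexene", 0.2325, 0.2165, 0.2183),
    ],
}

IM_COLUMN, DM_COLUMN = 2, 3  # positions of the model predictions in each row


def mean_abs_error(rows, model_column):
    """Mean absolute deviation of one model from the QM-derived values."""
    return sum(abs(row[model_column] - row[1]) for row in rows) / len(rows)


for target, rows in TABLE_3.items():
    im_error = mean_abs_error(rows, IM_COLUMN)
    dm_error = mean_abs_error(rows, DM_COLUMN)
    print(f"{target}: IM MAE = {im_error:.4f}, DM MAE = {dm_error:.4f}")
```

On these five compounds the mean absolute deviation is roughly 0.019 (IM) against 0.011 (DM) in cyclohexane and 0.012 (IM) against 0.008 (DM) in benzene. For individual isomers the IM value can still be the closer one, for example for o-xylene in cyclohexane.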
Based on the predictive performance analysis mentioned above, the proposed DM models had a better generalization ability than the IM models. Additionally, the proposed IDAC prediction models can discriminate isomers, including structural isomers and cis/trans isomers, and can deal with complex compounds such as hetero-atom compounds.
The processed ChEMBL dataset contained 179 477 compounds. Each compound was restricted to contain 10–50 root atoms and only had atoms in {H, B, C, N, O, F, Si, P, S, Cl, Br, and I}. The training molecule pairs were constructed as follows. First, 18 155 molecules were identified from the processed ChEMBL dataset with root atoms of not more than 12. We adopted the 12 root atom threshold because larger molecules usually have a higher normal boiling point, and a molecule with a high normal boiling point is not suitable for use as a solvent to separate the benzene and cyclohexane via extractive distillation. Second, $C_{18\,155}^{2} = 164\,792\,935$ molecule pairs (Mx and My) were constructed from 18 155 processed molecules. Third, 1 590 350 molecule pairs had similarities, sim(Mx, My) ≥ 0.4. The similarities of the molecule pairs can be measured by the Tanimoto coefficient over 2048-dimension binary Morgan fingerprints with radius 1. The similarity threshold was adopted because the proposed molecular optimization model needed the training molecule pairs with only one fragment different at one disconnection site, which can improve the learning efficiency of the molecular optimization model. Fourth, the DF-GED algorithm was used to extract molecule pairs that had only one fragment different at one disconnection site. 100 629 molecule pairs were extracted from 1 590 350 pairs of molecules. Fifth, among the 100 629 pairs of molecules, we selected the molecule pairs that met the following property constraints: the selectivity score of My should be improved by at least 20% compared with Mx in a molecule pair, that is,

$$\frac{\mathrm{selectivity}(M_y) - \mathrm{selectivity}(M_x)}{\mathrm{selectivity}(M_x)} \geq 20\% \qquad (1)$$

and the solution capacity score of My should likewise be improved by at least 20% compared with Mx, that is,

$$\frac{\mathrm{capacity}(M_y) - \mathrm{capacity}(M_x)}{\mathrm{capacity}(M_x)} \geq 20\% \qquad (2)$$

As a result, 35 496 molecule pairs (detailed in Table S4 in the ESI†) were identified that can be used as the training data with the property constraints.
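The pair-construction steps can be sketched in plain Python. The sketch checks the number of candidate pairs, computes a Tanimoto coefficient on sets of "on" fingerprint bits, and applies the 20% improvement rule to both properties. It assumes positive property scores, replaces real Morgan fingerprints by small hand-written bit sets, and leaves out the DF-GED step; all function names and example values are hypothetical.

```python
from math import comb


def tanimoto(bits_x: set, bits_y: set) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    union = len(bits_x | bits_y)
    return len(bits_x & bits_y) / union if union else 0.0


def improved_by(score_x: float, score_y: float, min_gain: float = 0.20) -> bool:
    """True if score_y exceeds score_x by at least min_gain (relative).

    Assumes score_x is positive, so the relative gain is well defined.
    """
    return score_x > 0 and score_y >= (1.0 + min_gain) * score_x


def keep_pair(mol_x: dict, mol_y: dict, min_similarity: float = 0.4) -> bool:
    """Apply the similarity threshold and both property constraints."""
    return (
        tanimoto(mol_x["bits"], mol_y["bits"]) >= min_similarity
        and improved_by(mol_x["selectivity"], mol_y["selectivity"])
        and improved_by(mol_x["capacity"], mol_y["capacity"])
    )


# Number of unordered pairs among the 18 155 filtered molecules.
print(comb(18155, 2))

# Hypothetical molecules described by a few fingerprint bits and scores.
mol_x = {"bits": {1, 5, 9, 12}, "selectivity": 2.0, "capacity": 0.50}
mol_y = {"bits": {1, 5, 9, 20}, "selectivity": 2.6, "capacity": 0.65}
print(keep_pair(mol_x, mol_y))
```

A pair is kept only if both properties improve, which is what lets the downstream model learn edits that do not trade one property against the other.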
In the hierarchical encoding process, a molecule can be represented by a hierarchical graph with three layers,45 i.e., an atom layer, an attachment layer, and a fragment layer, as seen in Fig. 5a. The details of the fragment extraction approach and the hierarchical encoding method were introduced in the studies presented by Jin et al.45 and Chen et al.46 In the hierarchical molecular representation framework, a molecule graph $G$ can be represented as a set of fragments $S_1, \ldots, S_n$, and their attachments $A_1, \ldots, A_n$. Each attachment $A_i$ in this layer denotes a specific attachment configuration of fragment $S_i$, including the connection information between $S_i$ and one of its neighbor fragments. In the atom layer, a molecule can be depicted as graph $G_x = (V_x, E_x)$, where $V_x$ and $E_x$ represent the atoms and corresponding bonds in Mx. In the attachment layer, molecule Mx is constituted by a series of fragments $S_1, \ldots, S_n$ extracted from the Mx. In the fragment layer, a molecule Mx is represented as a tree-constructed graph $T_x$. The tree-constructed representation can be depicted as $T_x = (\mathcal{V}_x, \mathcal{E}_x)$,44 where all the fragments in Mx are extracted as nodes in $\mathcal{V}_x$; nodes that share atoms are connected with edges in $\mathcal{E}_x$. The encoder encodes the molecule pairs (Mx, My) as graphs ($G_x$ and $G_y$) using message passing networks, and as tree-constructed graphs ($T_x$ and $T_y$) using tree message passing networks.
Fig. 5 The schematic diagram of the hierarchical molecule (a) encoder, (b) decoder, and (c) multi-objective optimization process.
In the hierarchical decoding process, the decoder conducts a series of modified operations that optimize Mx into My, as seen in Fig. 5b. The details of the hierarchical decoding method are introduced in the studies presented by Jin et al.45 and Chen et al.46 First, the decoder performs disconnection attachment prediction (DAP) to find an attachment $n_d$ in the fragment tree $T_x$ of Mx as the disconnection site. Second, at the neighbors of $n_d$, the decoder performs fragment-removing prediction (FRP) to remove fragments attached to $n_d$. Third, an intermediate representation (IMR) for the remaining scaffold $M^*$ is produced after the fragment removal operation. Fourth, over $M^*$, the decoder conducts new fragment attachment (NFA) prediction iteratively to optimize Mx into My. The optimal graph edit paths can be identified by the DF-GED algorithm.60
By learning from the selectivity and solution capacity of improved molecule pairs (training molecule pairs), the hierarchical molecular multi-objective optimization model can realize the multi-objective optimization of the solvent molecules as illustrated in Fig. 5c.
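The edit operations of the decoder (choose a disconnection site, remove a fragment there, attach a new one) can be pictured on a toy fragment tree. The sketch below is only a data-structure illustration under our own assumptions: the class, its methods, and the fragment labels are hypothetical, the site is chosen by hand rather than predicted, and no neural network or chemistry toolkit is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentTree:
    """Toy tree whose nodes are fragments and whose edges are attachments."""
    fragments: dict = field(default_factory=dict)  # node id -> fragment label
    edges: set = field(default_factory=set)        # frozenset({node, node})

    def add_fragment(self, label, attach_to=None):
        """Add a fragment node, optionally attached to an existing node."""
        node = max(self.fragments, default=-1) + 1
        self.fragments[node] = label
        if attach_to is not None:
            if attach_to not in self.fragments:
                raise KeyError(f"unknown site {attach_to}")
            self.edges.add(frozenset((attach_to, node)))
        return node

    def neighbors(self, node):
        """Return the ids of all fragments attached to the given node."""
        return {n for edge in self.edges if node in edge for n in edge if n != node}

    def remove_fragment(self, site, node):
        """Remove a leaf fragment that is attached to the disconnection site."""
        if frozenset((site, node)) not in self.edges:
            raise ValueError("fragment is not attached to the site")
        if self.neighbors(node) != {site}:
            raise ValueError("only leaf fragments can be removed in this toy")
        self.edges.remove(frozenset((site, node)))
        del self.fragments[node]


# Furfural pictured as a furan ring carrying a formyl fragment.
tree = FragmentTree()
ring = tree.add_fragment("furan ring")
formyl = tree.add_fragment("CHO", attach_to=ring)

# Fragment addition at the ring (the hand-picked disconnection site).
methyl = tree.add_fragment("CH3", attach_to=ring)
print(sorted(tree.fragments.values()))
```

Here the addition of a methyl fragment at the ring loosely mirrors the step from furfural to a methyl furfural; in the actual model the site and the fragment are predicted from the learned molecule pairs.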
Taking the five common industrial solvent molecules as inputs of the molecular multi-objective optimization model, 20 optimized solvent molecules are generated for every single widely used solvent (as seen in Fig. 6a–e) via the trained molecular multi-objective optimization model introduced in Section 5. Accordingly, 100 optimized solvent molecules are generated as tabulated in Table S5 in the ESI.†
| Names a | SMILES | Melting point/K | Boiling point/K |
|---|---|---|---|
| a17 | O=Cc1ccc(O)c(O)c1 | 413.15 | 550.15 |
| a18 | Cc1coc(C=O)c1 | 303.15 | 455.15 |
| a19 | Cc1ccc(C=O)o1 | 293.15 | 460.15 |
| b2 | CSc1cc(=O)[nH]c(=O)[nH]1 | 523.15 | 578.15 |
| b15 | O=S1(=O)C=C2NCNC2C1 | 473.15 | 564.15 |
| b20 | O=S1(=O)C=C2NC=NC2C1 | 473.15 | 547.15 |
| d2 | O=C(O)CCC(=O)O | 461.15 | 546.15 |
| d19 | CCCCC(C)=O | 217.65 | 400.75 |
| e7 | O=C(O)CC1CCC(=O)N1 | 430.15 | 570.15 |
| e16 | O=C(O)CC1CC(CS)NC1=O | 430.15 | 559.15 |

a The names correspond to the serial numbers in Fig. 6.
To further screen solvents that would make the extractive distillation process feasible, the residue curve analyses of the 3 screened solvents were conducted and the results are shown in Fig. 7. According to a review by Gerbaud et al.,52 the combined analysis of residue curve (RC) maps and the univolatility line can help evaluate whether a solvent is suitable for mixture separation via extractive distillation.68 As illustrated in the RC maps, every single curve originates from the azeotrope point and terminates in the pure component. Additionally, there is one distillation region for each of the three RC maps. In the residue curve map, A or B is a saddle point of the distillation region and cannot be obtained by azeotropic distillation. On the other hand, the univolatility line splits the ternary diagram into two volatility order regions for all three solvents. With the feeding of the solvent at another location than the main feed, the extractive distillation process enables the most volatile component in the volatility order regions to be obtained where the solvent is found.52 This is the case for cyclohexane with the 3 green candidate solvents. Therefore, it is possible to separate the benzene/cyclohexane mixtures as pure products, first by removing cyclohexane from the extractive distillation column, then by recovering benzene as a distillate from the regeneration column where a high-purity solvent is obtained at the bottom and then recycled to the extractive distillation column. The intersection point xp of the isovolatility curve with the triangle edge largely determines the minimum usage of the solvent.52,69,70 The lower the xp, the smaller the amount of solvent required. As we can see, the molar amount of 2-hexanone used is more than that of 4-methyl furfural and 5-methyl furfural. 
The results of the combined residue curve and univolatility analyses can further prove that the proposed IDAC predictive models can achieve reliable and accurate prediction performance.
Fig. 7 The residue curve maps of (1) 4-methyl furfural, (2) 5-methyl furfural, and (3) 2-hexanone in the cyclohexane (A)/benzene (B) mixtures.
Additionally, the information on the rat oral toxicity and bioconcentration factor is tabulated in Table 6. The results indicate that there is a trade-off between energy consumption and sustainable performance (such as EH&S properties), where a decrease in energy consumption usually comes at the expense of sustainability. The toxicity of 4-methyl furfural and 5-methyl furfural is reduced by about 95% compared with furfural. The bioconcentration factor of 2-hexanone is reduced by about 62% compared with DMF. Policies worldwide are moving the application of chemical separation processes in the direction of green chemistry.6 It is worth noting that the reboiler temperatures of the extraction and regeneration columns with 2-hexanone are lower than 150 °C. However, the reboiler temperatures of the extraction and regeneration columns with 4-methyl furfural and 5-methyl furfural are both higher than 150 °C. This means that the reboilers using 2-hexanone can use low pressure steam while the reboilers using the other two solvents need to use medium pressure steam.
In summary, 4-methyl furfural, 5-methyl furfural, and 2-hexanone can be used as candidate green solvents to isolate mixtures of cyclohexane and benzene with extractive distillation. In this study, to evaluate the validity of the green solvent multi-objective optimization framework, only 20 molecules were generated from every widely used solvent. More candidate green solvents will be identified if more molecules are optimized and generated for every widely used solvent.
A deep hierarchical molecular multi-objective optimization model was developed to learn the optimization path from our pre-set molecule pairs (Mx and My) and generate new solvents by fragment addition or removal. Every pair of molecules in the pre-set molecule pairs had similar molecular structures, but the scores of both selectivity and solution capacity of My were at least 20% larger than those of Mx. To prepare the molecule pairs, an improved deep learning-based IDAC direct prediction model trained over a COSMO-SAC database was developed for calculating the selectivity and solution capacity of the molecule pairs. The IDAC direct predictive model with the ability to discriminate stereoisomers achieved a better prediction performance than the IDAC indirect predictive model. As a result, 35 496 molecule pairs were identified that can be used as training data to train the deep hierarchical molecular multi-objective optimization model. Finally, the proposed IDAC prediction model and molecular multi-objective optimization model were integrated into a green solvent multi-objective and multi-scale optimization framework with EH&S properties and process constraints.
The proposed green solvent multi-objective and multi-scale optimization framework was applied to an extractive distillation process to separate the mixtures of cyclohexane and benzene. The results showed that 4-methyl furfural, 5-methyl furfural, and 2-hexanone can be utilized as candidate green solvents. Among the three solvents, 4-methyl furfural and 5-methyl furfural are derivatives of furfural. Interestingly, the branching of methyl to the furan ring could significantly reduce the toxicity of furfural. This could be due to the steric effect resulting from the aromatic ring substitution. 2-Hexanone was obtained by optimizing the structure of DMF. The dialkylation of the carbonyl carbon in DMF can not only improve the selectivity and solution capacity but also reduce the ecological hazards. This is because amide compounds play a very important role in the growth and metabolism of microorganisms and help microbes get enough protein and other important metabolites, thus promoting their growth and reproduction, which could have a negative impact on the environment.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3gc04354a
This journal is © The Royal Society of Chemistry 2024