Enhancing molecular safety and health assessment via index smoothing and prioritisation
Received
14th August 2017
, Accepted 17th October 2017
First published on 17th October 2017
In recent decades, safety and health aspects have received much attention in the field of computeraided molecular design (CAMD). To satisfy the more stringent requirements of regulating agencies, a product must not only achieve its desired function as defined by the customers but also meet safety and health criteria to ensure that the risk posed to users can be minimised to acceptable levels. One of the promising methods to efficiently measure the safety and health aspects of chemical substances is the application of inherent safety and occupational health indexes. These indexes consist of multiple safety and health hazard parameters to assess the physicochemical properties of molecules. The properties are divided into multiple subranges, each of which is assigned a penalty value that corresponds to its degree of hazard. In this work, the penalty values are computed to ensure a smooth transition from one category to another. The quantification of the overall safety and health performance is improved by combining the ordered weighted averaging (OWA) operator method with the analytic hierarchy process (AHP). A case study on solvent design for carotenoid extraction from palm pressed fibre (PPF) illustrates the proposed methodology.
Design, System, Application
The proposed CAMD framework aims to generate molecules that are optimised with respect to property functionalities and the aspects of safety and health. The molecular functionalities are defined by specifying a set of target properties to ensure that the generated molecules are able to demonstrate the desired performance and characteristics. Meanwhile, the safety and health attributes of the molecules are evaluated by inherent safety and health indexes, which utilise a scoring scheme to reflect on the inherent hazard level of the molecules. The main highlight of this work is to enhance the scoring scheme of the indexes and improve the approach to quantify the overall hazard level displayed by a molecule. Structural constraints are introduced to ensure that only feasible molecules with no free bonds can be generated. An optimisation technique known as the fuzzy optimisation formulation is employed to simultaneously optimise the target properties and the overall safety and health index score. This methodology is capable of serving as a screening tool for any molecular design problem to synthesise molecules with high property functionality performance and at the same time exhibit favourable safety and health characteristics.

1. Introduction
In chemical process design and development, the selection of chemicals has a crucial impact on plant safety and occupational health. In cases where hazardous chemicals are utilised in a process, the most common approach to handle such risks is to install an addon layer of protection (LOP) via technological safety barriers. However, several past accidents such as the Flixborough (1974) and Piper Alpha (1987) disasters have occurred due to the failure of the LOP in controlling hazards. This is because safety barriers do not actually eliminate hazards.^{1} Instead of merely mitigating the hazards, the best way is to handle these safety and health concerns by eliminating them at their sources.^{2} Thus, one important approach to this problem is to replace hazardous chemicals with less harmful ones.
The use of chemicals with intrinsically reduced risks is one of the basic principles of inherent safety design (ISD), which was proposed by Kletz^{3} to eliminate or minimise hazards present in the plant. The fundamental aim of ISD is to avoid the use of hazardous substances, minimise the inventories of such substances, and consider simpler processes with milder process conditions.^{4} Many research works have been conducted to evaluate inherent safety in the process plant, mainly the selection of a chemical process route during the early design stage. Edwards and Lawrence^{5} were the first to develop the Prototype Index for Inherent Safety (PIIS) by using several safety parameters to measure the safety level of different process routes in the conceptual design stage. These parameters (or subindexes) were then formulated into an index scoring scheme to quantify the hazard level of the process routes. Following this first attempt, Heikkilä^{6} extended the PIIS by introducing the Inherent Safety Index (ISI) to also assess the inherent safety in process route selection, but during the preliminary design phase. A wider range of safety subindexes was considered in ISI.
Meanwhile, the goal of inherent occupational health is derived from inherent safety, which is mainly to reduce occupational health risks to workers in the workplace environment. Hassim and Edwards^{7} presented the Process Route Healthiness Index (PRHI) for the assessment of occupational health hazards posed by alternative chemical process routes. Hassim and Hurme^{8} then developed the Inherent Occupational Health Index (IOHI) for the evaluation of occupational health hazards by focusing on process conditions and health properties of chemicals during the R&D phase. Likewise, IOHI employs several health subindexes to measure the health risk level through the allocation of penalty scores to the alternative process routes.
To adopt the concept of inherent safety and occupational health, the hazardous chemicals used in a process route have to be replaced with less harmful ones. The computeraided molecular design (CAMD) technique has been adequately employed for the exploration of molecular structures with minimised risk that are also best suited for a particular chemical process.^{9} The two main design criteria in the CAMD framework are product functionalities and its inherent safety and health level. Chemicalrelated subindexes from the inherent safety and health indexes are adopted to assist in the design of an inherently safer and healthier molecule. For each subindex, the physical and chemical properties of molecules are used to measure the potential hazards. The properties in these subindexes are divided into multiple subranges, and numerical penalty scores are then assigned to the subranges. The general profile for the assignment of subindex scores is shown in Fig. 1.

 Fig. 1 General trend for subindex score allocation.  
As shown in Fig. 1, p_{L} and p_{U} are the lower and upper bounds of the feasible property range, respectively. There are two property boundary values in the graph, namely p_{b1} and p_{b2}, which are the points where the scores change from one value to another. This score allocation method has two weaknesses. First, property values that fall in the same subrange are deemed to possess a similar hazard level. For instance, using the example in Fig. 1, there are three different molecules with property values of p_{1}, p_{2} and p_{3} plotted in the graph. Both p_{1} and p_{2} fall in the subrange of “p_{b1} to p_{b2}.” These two property values are considered to have the same hazard level, as they share a similar subindex score of I_{B}. However, as p_{1} is located near the lower edge of the subrange while p_{2} is almost at the other edge of the subrange, they should not be considered to exhibit the same level of hazard. Secondly, the other main limitation of this current score allocation method is the discontinuity at the boundary values, which may distort the comparison of alternatives that are near these limits. This discontinuity may create an artificial distinction between otherwise similar molecules that belong to adjacent categories near a sharp boundary. This issue may affect the reliability of CAMD in generating a molecule with minimised safety and health risk. The two molecules with property values of p_{2} and p_{3} as shown in Fig. 1 are used to illustrate this limitation. Both p_{2} and p_{3} are located considerably nearer to each other as compared to the distance between p_{1} and p_{2}. However, the former pair possesses different subindex scores as they are separated by a sharp boundary where the score changes abruptly. As a result, p_{2} and p_{3} fall in two adjacent subranges even though they have a relatively small difference in property value. Hence, there is a need to revise the scores at the edges of the subrange to enhance the representation of the molecular hazard level.
In addition, the current method of quantifying the inherent risk level of a molecule is by summing up all the subindex scores from the selected safety and health subindexes. Since the aim of CAMD is to find a suitable molecular candidate with reduced hazard, one of the design objectives is to minimise the summation of all subindex values for reduced hazards.^{9} However, a molecule with the lowest total index score may not necessarily display a low risk level with respect to all subindexes. This is because one of its subindexes may exhibit a very high score, while the remaining subindexes have relatively lower scores to compensate for the highscoring subindex. In reality, any molecule demonstrating a highly hazardous attribute with respect to a specific subindex (e.g. highly toxic) is usually deemed a dangerous chemical, and that particular molecule is therefore screened out in the chemical selection phase. In addition, this summation of scores method presumes that all subindexes have equal impact on the plant safety and occupational health of the workers. Hence, there is a need to address this limitation on hazard quantification so that the more hazardous or greater impact subindex will be penalised to a greater extent.
In this paper, the two weaknesses resulting from the current index scoring scheme will be addressed by smoothing the scorings near the boundary values. This eliminates the allocation of similar scores within a particular subrange and removes the sharp property boundaries. This modification of index scoring is able to enhance the comparison of molecular hazard levels in CAMD. As for the limitation on hazard quantification, a weight factor will be allocated to each subindex when determining the overall risk level of the molecules. This ensures that the severity of each subindex is taken into account when the CAMD programming is searching for a lowrisk molecule. However, one challenge here is to identify a systematic technique to decide on the appropriate values for the weight factors. This paper will explore the utilisation of the analytic hierarchy process (AHP), which is a structured multiattribute decision analysis approach, to determine a set of suitable weights assigned to the subindexes.
To summarise, this work will improve the CAMD framework with the integration of safety and health aspects developed by Ten et al.^{9} The two improvements made are the smoothing of subindex scorings at their respective property boundary values and the allocation of weight factors to each subindex to distinguish the impact level of different subindexes. The structure of this paper is organised as follows. Section 2 presents the background and methodology of the CAMD approach. The consideration of safety and health aspects into CAMD by different works is also discussed. Section 3 details the concept and mathematical algorithm of AHP. Section 4 illustrates the methodology of this work which combines AHP with the ordered weighting operator (OWA) for determining the weighting factor for each subindex as shown in section 4.3. The OWA approach enables conditional weighting (i.e., a priority can be calibrated based on the weaker features of any given alternative, thus resulting in a more conservative solution). The smoothing of subindex scores is depicted in section 4.5. Section 5 presents the application of the proposed methodology in a case study, which is to identify the best solvent to extract carotenoids found in the residual oil in palm pressed fibre (PPF).
2. Computeraided molecular design (CAMD)
Over the years, the concept of safety, health and environment criteria has received much attention in the increasingly demanding global business environment. One way to attain sustainable development in the chemical industries is the discovery and development of new chemicals such as industrial solvents, refrigerants, and drugs to increase the process efficiency while also minimising the potential adverse impacts on safety, health and the environment. In these days, the exploration of new chemicals can be carried out using the CAMD technique, which is deemed to be an efficient tool for the rapid search of a molecule or molecular structure from a given set of molecule building blocks that matches the desirable target properties.^{10} In order to estimate the target properties of molecules, most CAMD methods have relied on group contribution based property prediction methods. The general form of a group contribution method (GCM) model developed by Marrero and Gani^{11} is given by the following equation: 
 (1) 
where C_{i} is the contribution of the firstorder group of type i with N_{i} occurrences, D_{j} the contribution of the secondorder group of type j with M_{j} occurrences and E_{k} the contribution of the thirdorder group of type k with O_{k} occurrences. The lefthand side of eqn (1) is a function f(P) to estimate property P, which is distinct for each property P.
For the past two decades, numerous works on CAMD have considered the aspects of safety, health and environment when designing chemicals. Some earlier works have considered the replacement of chlorofluorocarbons (CFCs) as refrigerants with less harmful substitutes, since the use of CFCs has resulted in the depletion of the ozone layer. Duvedi and Achenie^{12} presented a mixed integer nonlinear programming (MINLP) approach in the search for alternative safe refrigerants to replace Freon 12. Karunanithi et al.^{13} imposed safety property constraints to develop solvents for solution crystallisation. Papadokonstantakis et al.^{14} considered several design factors such as thermodynamic performance, environmental impact, hazard evaluation, and economic analysis with predictive molecularbased approaches, CAMD, and process modelling for the synthesis of a solventbased postcombustion CO_{2} capture system. PalmaFlores et al.^{15} adopted safety and health parameters for the working fluid design of a lowtemperature waste heat recovery with the organic Rankine cycle (ORC). Gerbaud et al.^{16} introduced health, safety and security requirements for the substitution of aprotic highly dipolar solvents with itaconic acid derivatives. In most of these CAMD problems, the safety, health and environmental properties are implemented as constraints or limits, where molecules that do not satisfy the constraints are excluded. However, Lampe et al.^{17} stated that a tradeoff between the performance of the excluded molecules and the investment cost for the installation of safety barriers should be assessed. Thus, simultaneous optimisation of product performance and their safety and health attributes as presented by Ten et al.^{9} is considered in this paper for synthesising molecules that meet both criteria.
3. Analytic hierarchy process (AHP)
As mentioned in section 1, the AHP method is applied for the allocation of weight factors to all safety and health subindexes. The AHP is described by Saaty^{18} as a theory of relative measurement that is now widely used for many decisionmaking problems. The application of AHP has three main principles, namely, problem decomposition, comparative judgements and synthesis of priorities. The stage of decomposition begins by breaking down a problem into smaller elements and structuring them into a hierarchy with different levels, where each level contains a finite number of elements. The top of the hierarchy is usually the objectives of the decisionmaking problem, the intermediate levels are represented by several criteria on which the subsequent levels depend, while the lowest level contains a list of potential alternatives. Each element in the hierarchy will serve as the criterion for all elements of the level below.^{19}
In the stage of comparative judgements, pairwise comparisons of the relative strength or importance of n elements in the same hierarchy level are carried out with respect to a common aspect in the level above. The decision maker can compare any two elements (e.g. E_{i} and E_{j}) and assign a numerical scale a_{ij} as the ratio of their relative importance (i.e., w_{i}/w_{j}). If two elements being compared have equal importance, then a_{ij} would be given a value of one. If element E_{i} is considered to have higher importance than E_{j}, then a_{ij} would be greater than one. Once n(n − 1)/2 pairwise comparisons are completed among all elements, a positive reciprocal square matrix containing the comparative judgments is obtained as shown in eqn (2).

 (2) 
In the comparison matrix, the reciprocal property a_{ji} = 1/a_{ij} (where a_{ij} > 0) always holds true for j = 1, 2, …, n and i = 1, 2, …, n, in which n represents the number of elements in the level. A measurement scale of 1 to 9 as shown in Table 1 is used to assign the numerical value to each a_{ij}.^{20} For instance, if element E_{i} is deemed to be strongly more important than E_{j}, then its a_{ij} value would be 5. Its corresponding reciprocal, a_{ji}, will then be equivalent to 1/5.
Table 1 The fundamental AHP scale^{20}
Numerical scale 
Definition (explanation) 
1 
Equal importance (two elements contribute equally to the objective) 
3 
Moderate importance of one over another (experience and judgement slightly favour one element over another) 
5 
Essential or strong importance (experience and judgement strongly favour one element over another) 
7 
Very strong importance (an element is strongly favoured and its dominance demonstrated in practice) 
9 
Extreme importance (an element is strongly more favoured over another with the highest possible order of affirmation) 
2, 4, 6 and 8 
Intermediate values between the two adjacent judgements (the judgement falls between two levels) 
Reciprocals 
If element i has one of the above numbers allocated to it when compared with element j, then j has the reciprocal value when compared with i 
The relative strength or importance of all elements being compared has to be identified from the comparison matrix. To do this, a vector of ratioscale priorities or weights, w, has to be computed from the matrix. First, in order to determine vector w, Saaty^{21} has suggested the principal eigenvalue method as shown in eqn (3):
where
A is the pairwise comparison matrix and
λ_{max} is the maximum eigenvalue of matrix
A. The solution for vector
w is determined numerically by raising the matrix
A to a sufficiently large power, then summing over the rows and normalising them to obtain the weight vector
w = (
w_{1},
w_{2}, …,
w_{n})
^{T}. The summation of the weights in vector
w is equivalent to one. The value of
λ_{max} can then be identified through
eqn (3). The next step is to determine the consistency of the comparison matrix. A matrix is considered to be perfectly consistent when its elements fulfil the following condition:

a_{ij}·a_{jk} = a_{ik} ∀i,j,k  (4) 
Given a problem with three elements where element E_{i} is considered to be more important than E_{j}, and E_{j} is more important than E_{k}. The problem is deemed to be inconsistent if element E_{k} has higher importance than E_{i}. Saaty^{21} has introduced eqn (5) and (6) to measure the extent of the deviation from consistency of the comparison matrix:

 (5) 

 (6) 
where CR is the consistency ratio, CI is the consistency index, and RI is a random consistency index that can be referred to Saaty.
^{21} The value of CR should be less than 0.1 or 10% for the deviation from consistency to be acceptable. If the calculated CR does not fall within this range, decision makers are asked to revise their comparative judgements. Since the AHP numerical scale as shown in
Table 1 ranges from one to nine, it is suggested that one should not consider more than seven elements for pairwise comparison in order to ensure the validity of numerical comparisons. In case of a problem with a large number of elements, hierarchical decomposition should be carried out by grouping the elements into comparability classes of approximately seven elements each.
^{19}
In the synthesis of priorities stage, the priorities are determined for each level beginning from the second level to the bottom level. Given an example decisionmaking problem which contains three levels is shown in Fig. 2. The top level is represented by the objective. In its second level, there are two criteria known as C1 and C2, in which the local priorities calculated by pairwise comparison with respect to the objective are 0.667 and 0.333, respectively. Meanwhile in the third level, each criterion is provided with two alternatives known as A1 and A2. If the local priorities of A1 and A2 with respect to criterion C1 were 0.75 and 0.25, respectively, while the local priorities of A1 and A2 with respect to C2 were 0.2 and 0.8, respectively; then the global priorities for A1 and A2 are synthesised by multiplying the local priorities by the priority of their respective criterion in the level above and then adding them for each element in a level according to the criteria it affects.^{19} Thus, the global priority for A1 can be calculated by (0.667 × 0.75) + (0.333 × 0.2) = 0.567, while the global priority for A2 is determined by (0.667 × 0.25) + (0.333 × 0.8) = 0.433. Alternative A1 is chosen as the preferred solution as it has a larger global priority than A2. In this work, the application of AHP to determine the weight factors is further illustrated in section 4.3.

 Fig. 2 Hierarchy for an example decisionmaking problem.  
4. Methodology
4.1 Design objective and problem formulation
In this first stage, the design goal of a chemical product design problem is specified. In order to achieve the goal, suitable target properties are identified to ensure that the final synthesised product can function and behave in the desired manner. In most molecular design problems, the target properties are expressed by the physicochemical properties of the molecule. The target property will be selected either as an objective function to be optimised or as a property constraint to be fulfilled. A property range is introduced into each target property as shown in eqn (7) to ensure that the product can function optimally: 
v^{L}_{p} ≤ V_{p} ≤ v^{U}_{p} ∀p ∈ P  (7) 
where p represents target property, V_{p} denotes target property value, while v^{L}_{p} and v^{U}_{p} are the lower and upper bounds of the range, respectively.
4.2 Safety and health assessment
In this stage, the evaluation tool to measure the safety and health characteristics of the molecule is determined. As mentioned in section 1, all inherent safety and health indexes are developed by identifying parameters that have a significant contribution to the arising of adverse safety and health impacts. In this section, the parameters are adopted to measure the safety and health performance of the molecule. For the safety aspect, some of the prominent inherent indexes are the PIIS,^{5} ISI,^{6} and iSafe.^{22} Meanwhile for the health aspect, the two established inherent indexes for the early design stage are the PRHI^{7} and IOHI.^{8} Each safety or health parameter is listed as a subindex, in which a subindex value will be allocated depending on the degree of potential hazard. A higher subindex value indicates that the molecule exhibits a higher intrinsic hazard for that particular subindex. The subindexes are mainly assessed by using the physicochemical properties of molecules. For example, the volatility subindex from the IOHI as shown in Table 2 is evaluated by using the boiling point of a molecule. The boiling point can be estimated through the application property prediction method. The allocation of subindex value depends on the input value of boiling point. This type of numerical assessment can be integrated into the mathematicalbased CAMD optimisation model.
Table 2 Volatility (I_{V}) subindex^{8}
Factor 
Scoring detail 
Subindex value 
Volatility, I_{V} 
Liquid and gas 

Very low volatility (T_{b} > 150 °C) 
0 
Low (150 °C ≥ T_{b} > 50 °C) 
1 
Medium (50 °C ≥ T_{b} > 0 °C) 
2 
High (T_{b} ≤ 0 °C) 
3 
The selected safety and health subindexes along with their corresponding properties being assessed are summarised in Table 3. The subindex of toxicity is not chosen as one of the safety subindexes since it is evaluated by the subindex of exposure limit in the health aspect. The subindex scores for the two safety subindexes are taken from the NFPA flammability rating and ISI. Meanwhile, the subindex scores for the five chosen health subindexes are taken from the PRHI, IOHI and NFPA health hazard rating.
Table 3 The selected safety and health subindexes with their properties
Subindex 
Property being assessed 
Safety 

Flammability, I_{FL} 
Flash point (F_{p}) and boiling point (T_{b}) 
Explosiveness, I_{EX} 
Upper and lower explosion limits (UEL and LEL) 
Health 

Viscosity, I_{η} 
Viscosity (η) 
Material phase, I_{MS} 
Boiling point (T_{b}) and melting point (T_{m}) 
Volatility, I_{V} 
Boiling point (T_{b}) 
Exposure limit, I_{EL} 
Permissible exposure limit (PEL) 
Acute health hazard, I_{AH} 
LD_{50} for acute oral toxicity 
4.3 Determination of total weighted index score
Once all the relevant subindexes are identified (see section 4.2), the intrinsic hazard level of a molecule will then be quantified. A total index score, I_{SHI}, is given to the molecule to depict the degree of its hazard. As shown by Ten et al.,^{9} the total index score assigned to a molecule is equivalent to the summation of all seven subindex scores. A lower total index score is favourable as the molecule exhibits a lower magnitude of hazard. This method of measuring hazard treats all subindexes with equal importance, as they all have a similar weight of contribution towards the final total score. Assume that there are two molecules (M1 and M2) with their subindex scores displayed in Table 4.
Table 4 The two molecules with their subindex scores
Molecule 
I
_{FL}

I
_{EX}

I_{η}

I
_{MS}

I
_{V}

I
_{EL}

I
_{AH}

I
_{SHI}

M1 
1 
1 
2 
1 
0 
4 
1 
10 
M2 
2 
2 
2 
1 
1 
2 
1 
11 
From Table 4, it is shown that molecule M1 has a lower hazard than M2 as it has a lower I_{SHI} value. However, the summation of all subindex values to calculate I_{SHI} does not really illustrate the molecular performance in each individual subindex. Among the subindexes, I_{FL}, I_{EX}, I_{EL} and I_{AH} have the most severe penalty score of four, while I_{η}, I_{MS} and I_{V} have the most severe penalty score of three. The variation in score domain among each subindex reflects the importance and impacts of the particular subindex to the plant safety and occupational health. In Table 4, molecule M1 has an extremely high penalty score for I_{EL}, which is also the most severe score for this subindex. Therefore, molecule M1 is deemed to be extremely toxic. However, it still receives a lower I_{SHI} score as its high I_{EL} score is being ‘compensated for’ by the remaining six lowscoring subindexes. Even though molecule M2 has a higher I_{SHI} value, all its seven subindex values can be considered as moderately low. Such compensation cannot be justified since in many conventional molecular design problems, a molecule which is highly hazardous in any safety and health parameter is often screened out during the decisionmaking process.
In order to address this problem, a weighting factor can be introduced to each subindex value in which a higher subindex score (higher adverse impact to the plant) will be assigned with a larger weight. This can ensure that a higherscoring subindex will have a greater contribution to the final weighted I_{SHI} value. This results in an inherently conservative weighing procedure, which can be managed by applying aggregation operators that have been developed to assist in aggregating information. Some of these methods include the max and min operators, arithmetic averaging (AA) operator, weighted AA (WAA) operator, geometric averaging (GA) operator, ordered weighted averaging (OWA) operator, etc.^{23} In this work, the weights are assigned conditionally based on the ordered position of the subindex scores. The allocation of weight in this manner can be adequately done with the application of the OWA operator method introduced by Yager.^{24} An OWA operator of dimension n is a function
with an associated
n vector
w = (w_{1}, w_{2},…, w_{n})^{T} 
where
Besides,

 (8) 
where
b_{j} is the
jth largest of the
a_{i}. The principal characteristic of the OWA operator is the reordering step, in which an argument
a_{i} is not associated with a specific weight
w_{i}, but a weight
w_{i} is associated with a specific rank
i of the arguments.
^{24} Thus, the weights determined
via AHP can be assigned conditionally to the criteria based on the logic that there should be more weight placed on the weaker and more critical features of a given alternative. This approach can thus yield a more conservative approach than simply assigning fixed weights to the criteria. For the measurement of safety and health aspects of the molecules, a subindex with a weaker performance (or a more severe score) is assigned a higher weight to penalise the high hazard condition as demonstrated by the molecules. The quantification of the overall hazard exhibited by the molecules can be done using
eqn (8), where the righthand side is the summation of all multiplication between the subindex values and their corresponding weights. Among the seven subindexes, the one with the largest penalty score is multiplied by the largest weight, while the subindex with the second largest penalty score will be multiplied by the second largest weight and so on. The function on the lefthand side is the total weighted index score,
I_{SHI,w}. In order to determine the seven weight factors, a systematic technique known as analytic hierarchy process (AHP) can be utilised. Previously, Yager and Kelman
^{25} have come out with an extension of AHP using the OWA operator, which integrates linguistic quantifiers of the OWA operator to enhance decision making considered in the AHP. Lamata
^{26} ranked the alternatives in multiattribute decision making using fuzzy AHP with OWA operators. Boroushaki and Malczewski
^{27} proposed the incorporation of a geographic information system (GIS) and the extension of AHP using the quantifierguided OWA procedure.
The purpose of AHP in this work is to determine the weights introduced to the subindexes, which represent the impact or contribution of the specific subindex to I_{SHI,w}. In the first stage of AHP, the problem is presented in the form of a hierarchy that contains several elements. The hierarchy for this weight determination problem is shown in Fig. 3.

 Fig. 3 Hierarchy for the weight determination of subindexes.  
In Fig. 3, the topmost level is the goal of this AHP problem, which is to determine the weight of the subindexes. The subindex with the highest scoring is named “rank 1 subindex” or SI_{R1}, the second highest scoring subindex is defined as “rank 2 subindex” or SI_{R2} and so on. The seven subindexes are the elements which contribute to the final I_{SHI,w} value. In the next step, pairwise comparisons are considered among all seven subindexes to assess the relative importance and impact towards the final weighted index score. The complete pairwise comparison matrix is shown in Table 5.
Table 5 Pairwise comparison matrix

SI_{R1} 
SI_{R2} 
SI_{R3} 
SI_{R4} 
SI_{R5} 
SI_{R6} 
SI_{R7} 
SI_{R1} 
1 
2 
3 
4 
5 
6 
7 
SI_{R2} 
1/2 
1 
2 
3 
4 
5 
6 
SI_{R3} 
1/3 
1/2 
1 
2 
3 
4 
5 
SI_{R4} 
1/4 
1/3 
1/2 
1 
2 
3 
4 
SI_{R5} 
1/5 
1/4 
1/3 
1/2 
1 
2 
3 
SI_{R6} 
1/6 
1/5 
1/4 
1/3 
1/2 
1 
2 
SI_{R7} 
1/7 
1/6 
1/5 
1/4 
1/3 
1/2 
1 
The first row of the matrix in Table 5 shows the pairwise comparisons made between the highestscoring subindex with the other subindexes. This highestscoring subindex is deemed to have an intermediate level of importance between “equal importance” and “moderate importance” over the second highestscoring subindex; thus, the allocated numerical scale for this comparison is two. Meanwhile, the highestscoring subindex is considered to have a very strong importance compared to the lowestscoring subindex; thus, a numerical value of seven is assigned to this comparison. The numerical values in the first column are equivalent to the reciprocal values of the first row, respectively. The diagonal values in the matrix are equal to unity as the subindex being evaluated is compared with itself.
In the synthesis of priorities stage, the vector of weights w has to be determined from the comparison matrix in Table 5. Through the maximum eigenvalue method as given by eqn (3), the weight vector w is calculated as shown below:

 (9) 
From vector w in eqn (9), the value of λ_{max} calculated in eqn (3) is 7.1955. The next step is to identify the extent of the deviation from consistency of the comparison matrix. By substituting the λ_{max} value into eqn (5), the calculated value of CI is 0.0326. According to Saaty,^{21} the value of RI is 1.32 when the number of elements n is 7. Therefore, the value of CR determined from eqn (6) is 0.0247 or 2.47%, which is lower than the 10% tolerance. Thus, the intensity of inconsistency of the matrix is acceptable. The seven weights are now introduced to eqn (8) in order to determine the value of I_{SHI,w} of molecules M1 and M2 (Table 4) using eqn (10):

I_{SHI,w} = 0.3543SI_{R1} + 0.2399SI_{R2} + 0.1587SI_{R3} + 0.1036SI_{R4} + 0.0676SI_{R5} + 0.0448SI_{R6} + 0.0312SI_{R7}  (10) 
In order to apply the OWA operator in eqn (8), the subindex scores of the two molecules have to be sorted in descending order (highest to smallest value) as shown in Table 6. A higherranking subindex score is then multiplied by a higher weight, and vice versa. The summation of the subindex scores multiplied by their corresponding weights will produce I_{SHI,w}. In the CAMD model, constraints have to be introduced to help allocate the descending order of subindex values for SI_{R1} to SI_{R7} given in eqn (10). Binary integer variables are used to formulate the constraints, which are given by eqn (11) to (14):

b_{i1}I_{FL} + b_{i2}I_{EX} + b_{i3}I_{η} + b_{i4}I_{MS} + b_{i5}I_{V} + b_{i6}I_{EL} + b_{i7}I_{AH} = SI_{Ri} i = 1…7  (11) 

b_{i1} + b_{i2} + b_{i3} + b_{i4} + b_{i5} + b_{i6} + b_{i7} = 1 i = 1…7  (12) 

b_{1i} + b_{2i} + b_{3i} + b_{4i} + b_{5i} + b_{6i} + b_{7i} = 1 i = 1…7  (13) 

SI_{Ri} ≥ SI_{Rj} i = 1…6, j = i + 1  (14) 
where
b_{ij} (for
i = 1…7 and
j = 1…7) is the binary integer variable. As shown in
Table 6, molecule M2 now has a lower
I_{SHI,w} as compared to M1 since M2 does not have any subindexes with high scoring. On the other hand, the single high score of four for molecule M1 contributes to ‘magnifying’ its extremely hazardous condition for exposure limit (
I_{EL}), which leads to higher
I_{SHI,w}. It can be noted that molecules with multiple maximumscoring subindexes will have considerably large
I_{SHI,w} values. Therefore, the introduction of weights using the OWA operator helps to enhance the final total index score to accurately represent the overall intrinsic hazard level of a molecule.
Table 6 Total weighted index score I_{SHI,w} of the two molecules
Subindex 
Molecule M1 
Molecule M2 
Weight 
SI_{R1} 
4 
2 
0.3543 
SI_{R2} 
2 
2 
0.2399 
SI_{R3} 
1 
2 
0.1587 
SI_{R4} 
1 
2 
0.1036 
SI_{R5} 
1 
1 
0.0676 
SI_{R6} 
1 
1 
0.0448 
SI_{R7} 
0 
1 
0.0312 
I
_{SHI}

10 
11 
— 
I
_{SHI,w}

2.2717 
1.8566 
— 
4.4 Identification of property prediction models
In this stage, all the properties that are listed as target properties in section 4.1 and the subindex properties shown in Table 3 in section 4.2 must be identified. Since the identity of a molecule is still unknown at this stage, property prediction methods are integrated into the CAMD optimisation model to assist in estimating all the listed properties. Based on the properties listed in Table 3, GCM models are available to estimate the flash point (F_{p}), normal boiling point (T_{b}), normal melting point (T_{m}), viscosity (η), permissible exposure limit (PEL), and LD_{50} for acute oral toxicity. PEL and LD_{50} (oral rat) can be estimated using GCM models provided by Hukkerikar et al.;^{28} while F_{p}, T_{b} and T_{m} are predicted using models proposed by Hukkerikar et al.^{29} Meanwhile, η can be determined through the GCM model developed by Conte et al.^{30} Both flammability limits, LFL and UFL, are identified by applying GCM models from Frutiger et al.^{31} The GCM models to predict the eight mentioned properties are listed in Table 7.
Table 7 GCM for selected properties
Property p 
f(p) in eqn (1) 
Universal constants 
PEL (mol m^{−3}) 
−logPEL 
— 
LD_{50} (mol kg^{−g}) 
−logLD_{50} − A_{LD50} − B_{LD50}M_{w} 
A
_{LD50} = 1.9372; B_{LD50} = 0.0016 
F
_{p} (K) 
F
_{p} − F_{p0} 
F
_{p0} = 170.7058 K 
T
_{b} (K) 
exp(T_{b}/T_{b0}) 
T
_{b0} = 244.5165 K 
T
_{m} (K) 
exp(T_{m}/T_{m0}) 
T
_{m0} = 143.5706 K 
η (cP) 
lnη 
— 
LFL (vol%) 
ln(LFL/LFL_{0}) 
LFL_{0} = 4.5315 (vol%) 
UFL (vol%) 
ln(UFL/UFL_{0}) 
UFL_{0} = 129.9552 (vol%) 
4.5 Smoothing subindex scores
Based on the established inherent safety and health indexes, the allocation of subindex scores is based on subjective scaling and weighting. The physical and chemical properties assessed in the subindexes are divided into subjective ranges, where each range is then assigned a distinct subindex score depending on the authors' judgement. As mentioned in section 1, the main limitation of this method is the discontinuity at the boundary values, which is illustrated by the scenario in Fig. 1. The comparison of two property values near the same boundary value but located at different subranges is not efficient as there exists discontinuity at the boundary value where the score switches abruptly from one value to another. This issue is addressed in this section, where the subindex scores at the boundary value regions will be smoothened to ensure that there is continuity for the allocation of subindex scores at any property value. The resulting general profile for the smoothened subindex scores is shown in Fig. 4.

 Fig. 4 Smoothened subindex scores.  
In Fig. 4, the grey regions represent the smoothened regions where linear slopes are introduced to transit the scores from one subindex value to another. The wideness of the smoothened region is determined by the value of δ. This value will identify the lower and upper bounds of the smoothened region for each boundary value. All smoothened regions should have a similar δ value for consistency purposes. To determine δ, a 10% margin of the property boundary values is applied. In general, a design factor of 10% is used for process flows to provide some flexibility in process operation.^{32} Therefore, a 10% safety/health factor is calculated for all boundary values in Fig. 4. Let us assume that the safety/health factors for p_{b1} and p_{b2} are δ_{1} and δ_{2}, respectively. From Fig. 4, p_{b2} has a larger numerical value than p_{b1}, so δ_{2} is larger than δ_{1}. As mentioned previously, a similar δ value must be applied for all boundary values. In order to ensure that all boundary values can attain a minimum 10% factor, the selected δ value should take the largest value between δ_{1} and δ_{2}. In this case, δ is equivalent to δ_{2}. In other words, the δ value applied for a particular subindex is the 10% factor of the largest boundary (numerical) value.
4.6 Disjunctive programming to assign subindex scores
In this section, the algorithm to assign a subindex score to a molecule depending on its property value is developed. Based on the modified subindex in Fig. 4, the assigned subindex score, I_{p}, is given by: 
 (15) 
Based on eqn (15), the property value p is divided into five intervals. The subindex score in each interval is defined by a distinct function. Thus, the subindex score from the lower property value bound p_{L} to the upper property value bound p_{U} is not represented by a single continuous function. An algorithm known as disjunctive programming can be employed, which uses discontinuous functions to describe abrupt changes over a certain decision variable.^{33} This algorithm works by first determining which interval a specific property value p falls in. Once the interval is identified, its corresponding discontinuous score function will be applied to calculate the subindex score to p. Binary integer variables are used to model these discontinuous functions. The I_{p} function in eqn (15) is converted to the following mixedinteger formulation using four binary integer variables (b_{1}, b_{2}, b_{3} and b_{4}):

 (16) 
where the binary integer variables are subjected to the following criteria:

 (17) 

 (18) 

 (19) 

 (20) 
The binary integer variables in eqn (16) act as switches to only select a single score function at a time to determine the subindex score. To ensure that the model allocates the correct value to the binary integer variables in order to satisfy criteria (17) to (20), the following constraints are also introduced:

[p_{L} − (p_{b1} − δ)]b_{1} < p − (p_{b1} − δ) ≤ [p_{U} − (p_{b1} − δ)](1 − b_{1})  (21) 

[p_{L} − (p_{b1} + δ)]b_{2} < p − (p_{b1} + δ) ≤ [p_{U} − (p_{b1} + δ)](1 − b_{2})  (22) 

[p_{L} − (p_{b2} − δ)]b_{3} < p − (p_{b2} − δ) ≤ [p_{U} − (p_{b2} − δ)](1 − b_{3})  (23) 

[p_{L} − (p_{b2} + δ)]b_{4} < p − (p_{b2} + δ) ≤ [p_{U} − (p_{b2} + δ)](1 − b_{4})  (24) 
4.7 Molecular group selection and structural constraints
In GCM, a molecule is formed by combining and bonding several molecular building groups, such as CH_{3}, CH_{2}, OH, CH_{3}CO, CHNH_{2}, etc. The full list for all possible firstorder molecular building groups can be found in Hukkerikar et al.^{28,29} The selection of the appropriate molecular groups is dependent on the nature of the design problem. For example, molecular groups containing the nitrogen element must be selected for the design of aminebased molecules. Next, the octet rule of structural feasibility developed by Odele and Macchietto^{34} as shown by eqn (25) is included to ensure that the generated molecule does not contain free bonds: 
 (25) 
where v_{i} is the valence of group i, and N_{i} is the number of occurrences for firstorder group i. The variable g is assigned a value of 1 or 0 for acyclic compounds and monocyclic compounds, respectively. In addition, the mathematical structural constraints proposed by Churi and Achenie^{35} are also introduced into the model to eliminate a combination of infeasible structures and to ensure that only a single molecular structure is produced.
4.8 Optimisation model formulation
In this stage, the optimisation model is developed by including all procedures from sections 4.1 to 4.7. The main design objective in this work is to maximise the product performance and to minimise the safety and health hazards posed by the molecules. Since there are multiple design criteria in this work, the CAMD problem has now become a multiobjective problem. In order to ensure that the two criteria do not compromise one another, a tradeoff between them has to be considered. Optimal solutions can be generated from a multiobjective problem through the utilisation of a fuzzy optimisation algorithm. This approach is effective in solving multicriteria problems without compromising either one of the criteria; thus, all design criteria can be optimised simultaneously. In order to achieve high product performance, the target properties selected as objective functions in section 4.1 will be optimised. On the other hand, the molecular safety and health level is measured by the total weighted index score I_{SHI,w} in section 4.3 [eqn (10)]. High performance in terms of safety and health attributes can be attained by minimising I_{SHI,w}. To apply fuzzy optimisation, first a degree of satisfaction for target property λ_{p} is assigned to each target property; while a degree of satisfaction for inherent safety and health λ_{I} is introduced into I_{SHI,w}. Both λ_{p} and λ_{I} are represented by linear membership functions as shown in eqn (26) to (28): 
 (26) 

 (27) 

 (28) 
where v^{L}_{p} and v^{U}_{p} are the property lower bound and upper bound, respectively, while I^{L} and I^{U} are the lower bound and upper bound of I_{SHI,w}, respectively. Eqn (26) is only applied for the target property to be maximised, while eqn (27) is intended to minimise the target property. For eqn (26), the aim is to maximise the target property value. When V_{p} is lower than or equivalent to v^{L}_{p}, the aim is not satisfied; thus λ_{p} would be zero. The aim is achieved when V_{p} is higher than or equivalent to v^{U}_{p}; hence λ_{p} takes the value of one. In between v^{L}_{p} and v^{U}_{p}, a linear membership function is used where λ_{p} increases linearly from zero at v^{L}_{p} to one at v^{U}_{p}. The reverse trend is applied to eqn (27) to minimise the target property. Meanwhile, λ_{I} is calculated only by eqn (28) as I_{SHI,w} is always minimised to reduce the intrinsic hazard level of the molecules.
Next, all λ_{p} and λ_{I} have to be maximised simultaneously to achieve a higher degree of satisfaction for all design criteria. In this work, the max–min aggregation method developed by Zimmermann^{36} is utilised to maximise all the degrees of satisfaction. This method assists in maximising the least satisfied degree of satisfaction, λ, to ensure that all λ_{p} and λ_{I} can be satisfied partially to at least the degree of λ. The following constraints are included for max–min aggregation, where the principal objective is to maximise the value of λ:
5. Case study: design of solvent to extract carotenoids from palm pressed fibre (PPF)
5.1 Design objective and problem formulation
Palm pressed fibre (PPF) is the residual byproduct after the extraction of crude palm oil (CPO) from palm fresh fruit bunches (FFBs). The PPF is normally burned as fuel for power generation in palm oil mills^{37} or transported to the plantation for field mulching.^{38} However, Choo et al.^{39} found that the residual oil present in PPF contains a substantial amount of carotenoids (4000–6000 ppm), vitamin E (2400–3500 ppm), and sterols (4500–8500 ppm). Currently, hexane is the conventionally used solvent for extraction as it offers a high carotenoid extraction yield and has a convenient T_{b}, which is relatively low to limit heat consumption during carotenoid recovery but high enough to prevent much solvent losses during extraction.^{40} However, hexane is toxic to aquatic life, is highly flammable and may result in fatality if swallowed or inhaled.
In this case study, the objective is to identify a solvent that can replace hexane to extract carotenoids from the residual oil found in PPF. The solvent should be able to lower the energy requirement needed for evaporation. Thus, it must have a low boiling point (T_{b}) and low heat of vaporisation (H_{v}) so less energy is needed to heat the solvent to its T_{b} and subsequently vaporise it. In addition, carotenoids should have a high solubility in the developed solvent. The Hansen solubility parameter (HSP) can be applied to calculate the solubility of carotenoids in the solvent. The three parameters in the HSP are δ_{d}, δ_{p} and δ_{h}, which represent dispersion, polar and hydrogen bonding, respectively. Another parameter known as the distance of a solvent from the centre of the Hansen solubility sphere, R_{a}, can be calculated by eqn (31).^{41}

R_{a}^{2} = 4(δ_{d,A} − δ_{d,B})^{2} + (δ_{p,A} − δ_{p,B})^{2} + (δ_{h,A} − δ_{h,B})^{2}  (31) 
In eqn (31), components A and B represent the solute (carotenoids) and solvent, respectively. A smaller R_{a} is desired as it indicates a greater affinity between the carotenoids and solvent. Next, the solvent must also achieve the desirable attributes for its safety and health aspects, such as low flammability and toxicity. These can be measured using the inherent safety and health subindexes selected in Table 3. The total weighted index score, I_{SHI,w} in eqn (10), will be used to determine the inherent hazard level posed by the solvent. A low I_{SHI,w} score is preferred for an inherently safer and healthier solvent.
Overall, the four main objective functions in this case study are to minimise T_{b}, minimise H_{v}, minimise R_{a}, and minimise I_{SHI,w}. Since the conventionally used hexane is highly flammable and toxic to aquatic life, the flash point (F_{p}) and acute toxicity (96 h LC_{50} to fathead minnow) are selected as property constraints. The lower bound of F_{p} is set at −13.3 °C, which is 10 °C higher than that of hexane (−23.3 °C). As for acute toxicity LC_{50}, the lower bound is set at 100 mg l^{−1}, which indicates that the solvent is not harmful to the aquatic environment as defined by the United Nations' Globally Harmonised System of Classification and Labelling of Chemicals (GHS).
5.2 Safety and health assessment
As mentioned in section 4.2, the safety and health assessment of the solvent are measured using the selected safety and health subindexes that include flammability (I_{FL}), explosiveness (I_{EX}), viscosity (I_{η}), material phase (I_{MS}), volatility (I_{V}), exposure limit (I_{EL}), and acute health hazard (I_{AH}). All these subindexes are taken from the inherent safety and occupational health indexes found in the literature. When developing the subindexes, the range of the subindex scores is assigned based on the importance of the specific subindex to the plant safety^{6} and the magnitude of impacts resulting from chemical exposure.^{8} A more significant subindex will be allocated to a larger scoring range. The scores of I_{FL} and I_{AH} are obtained from NFPA ratings, while scores for I_{MS}, I_{V} and I_{EL} are taken from IOHI.^{8} Meanwhile, the subindex values of I_{EX} and I_{η} are referred from ISI^{6} and PRHI,^{7} respectively. Four subindexes (I_{FL}, I_{EX}, I_{EL} and I_{AH}) contain the largest scoring range of four, while two subindexes (I_{η} and I_{MS}) have the smallest scoring range of two. It should be noted that both flammability and explosiveness are the crucial parameters affecting the safety factors of the chemical. For occupational health, the parameters of exposure limit and acute health hazard, which measure the toxicity of the chemical, will cause more significant health impacts on the plant workers.
Among the selected subindexes, their lowest scores vary from zero to one while their highest scores may vary from three to four. The scorings for all subindexes can be considered consistent as there are no subindexes with disproportionately high or low scores. In this work, the applied subindex scores will be in the initial form. No score normalisation is carried out on the seven subindexes, as such a step will cause all subindexes to be considered to have equal importance. Eqn (10) is applied to calculate I_{SHI,w}, where the seven subindex scores are first sorted in descending order. The largest score is then multiplied by the largest weight, the second largest score is multiplied by the second largest weight, and so on. The proposed method to quantify the overall hazard of a solvent ensures that a particular subindex displaying a higher hazard is penalised more; thus it has a larger impact on the I_{SHI,w} score.
5.3 Identification of property prediction models
In section 5.1, the properties chosen as objective functions and constraints are T_{b}, H_{v}, R_{a}, F_{p} and LC_{50}. R_{a} can be calculated from the three parameters of HSP: δ_{d}, δ_{p} and δ_{h}. Their corresponding property prediction methods must be identified in this stage to allow the model to estimate the property values. For T_{b} and F_{p}, their GCM models have been provided as shown in Table 7. Meanwhile, H_{v}, δ_{d}, δ_{p} and δ_{h} are estimated by methods presented by Hukkerikar et al.,^{29} while acute toxicity LC_{50} (96 h to fathead minnow) is predicted by a model introduced by Martin and Young.^{42}
5.4 Smoothing subindex scores
In this section, the smoothing of the flammability subindex (I_{FL}) is demonstrated. I_{FL} evaluates the tendency of a material to burn in air. The allocation of I_{FL} scores is shown in Table 8, where the two properties assessed by this subindex include flash point (F_{p}) and boiling point (T_{b}). There are three boundary values for F_{p} (93.4 °C, 37.8 °C and 22.8 °C), while there is only a single boundary value for T_{b} (37.8 °C). Since there is only one T_{b} boundary, the I_{FL} subindex scores can be presented under two T_{b} scenarios, with the graphical representations shown in Fig. 5.
Table 8 Flammability (I_{FL}) subindex (NFPA)
Factor 
Score information 
Penalty 
Flammability, I_{FL} 
Nonflammable 
0 
F
_{p} ≥ 93.4 °C 
1 
F
_{p} < 93.4 °C 
2 
F
_{p} < 37.8 °C 
3 
F
_{p} < 22.8 °C and T_{b} < 37.8 °C 
4 

 Fig. 5 Subindex scores of I_{FL} when (a) T_{b} < 37.8 °C and (b) T_{b} ≥ 37.8 °C.  
Fig. 5 shows the graphical illustration for the I_{FL} subindex under two T_{b} conditions. For both scenarios, the subindex scores are similar above an F_{p} of 22.8 °C. However, the scores differ below an F_{p} of 22.8 °C, in which the scores increase to four in scenario 1 (T_{b} < 37.8 °C) while remaining at three in scenario 2 (T_{b} ≥ 37.8 °C). Thus, there will only be two F_{p} boundary values in scenario 2 (37.8 °C and 22.8 °C). At all boundary values, the I_{FL} scores switch abruptly from one value to another. In this section, the scores at the boundary regions will be smoothened to address the discontinuity issue mentioned in section 4.5. The entire smoothing will be carried out in two parts; the first part will be conducted in terms of F_{p}, while the next part will be in terms of T_{b}. As mentioned previously, the boundary values for F_{p} are 93.4 °C, 37.8 °C and 22.8 °C. The 10% margins of the three boundary values are 9.34 °C, 3.78 °C and 2.28 °C, respectively. From the three margin values, the largest value, 9.34 °C, is chosen as the overall F_{p} margin. The selection of the largest margin value ensures that it attains at least a 10% factor of all boundary values. Next, a smoothing range is determined for each boundary value, where the lower bound is calculated by subtracting 9.34 °C from the boundary value, while the upper bound is determined by adding 9.34 °C to the boundary value. The smoothed regions are shown by the grey zones in Fig. 6. In these regions, the allocation of scores will be modified so that the scores can transit continuously from one value to another. For each region, a linear slope is introduced where it begins from the I_{FL} score at the lower bound and decreases linearly to the I_{FL} score at the upper bound. With the application of linear slopes, the smoothened I_{FL} scores in terms of F_{p} are shown in Fig. 7. However, there exists an overlapping of two smoothing ranges (shown by the darker grey zone) in Fig. 7(a) as the lower bound for the 37.8 °C smoothing range is lower than the upper bound for the 22.8 °C range. As a result, two linear slopes (illustrated by the two dotted lines) are present in the overlapping range. However, any property value in the feasible property range can only be represented by a single score. Hence, a composite curve is used on the darker grey region as shown in Fig. 7(a) to merge the two linear slopes in which the score transits linearly from the higher score at the lower bound (28.46 °C) to the lower score at the upper bound (32.14 °C). Since the I_{FL} scores are now smoothened in terms of F_{p}, the next step is to further smoothen the scores with regard to T_{b}.

 Fig. 6 Determining the range to smoothen in the I_{FL} subindex when (a) T_{b} < 37.8 °C; (b) T_{b} ≥ 37.8 °C.  

 Fig. 7 Smoothened I_{FL} scores when (a) T_{b} < 37.8 °C; (b) T_{b} ≥ 37.8 °C.  
As shown by the two scenarios in Fig. 5 to 7, T_{b} has one boundary value of 37.8 °C. The margin value applied in T_{b} is the 10% factor of 37.8 °C, which is equivalent to 3.78 °C. The smoothened range for T_{b} is between 34.02 °C and 41.58 °C (37.8 ± 3.78 °C). Since scenario 1 is intended for a T_{b} value below 37.8 °C, while scenario 2 represents I_{FL} scores for a T_{b} value above or equal to 37.8 °C; the scores in Fig. 7(a) now are applied for a T_{b} value below or equal to 34.02 °C, whereas the scores in Fig. 7(b) are utilised for a T_{b} value above or equal to 41.58 °C. The scores in Fig. 7(a) and (b) will now serve as the lower and upper bound values for the T_{b} smoothing range, respectively. Fig. 8 shows the combined I_{FL} scores from Fig. 7(a) and (b).

 Fig. 8 Combined I_{FL} scores.  
From Fig. 8, at F_{p} below 32.14 °C, the I_{FL} scores at or below the lower bound of the T_{b} smoothened range (34.02 °C) differ from the scores at or above the upper bound (41.58 °C). The two scores then converge at F_{p} of 32.14 °C, and the scores remain similar above that F_{p} value. Therefore, for a molecule with F_{p} above or equivalent to 32.14 °C, the I_{FL} score is only dependent on its F_{p} regardless of the T_{b} value. For a molecule with F_{p} lower than 32.14 °C, if its T_{b} falls within the smoothing range (between 34.02 °C and 41.58 °C), an interpolation technique is required to determine its I_{FL} score. First, both the T_{b} lower and upper bound scores are identified from Fig. 8 based on the F_{p} value of the molecule. The lower and upper bound scores now represent the I_{FL} scores at T_{b} of 34.02 °C and 41.58 °C, respectively. Based on the estimated T_{b} of the molecule calculated by the GCM model in Table 8, the I_{FL} score for the molecule can be determined through interpolation between the lower and the upper bound scores. For molecules with T_{b} below 34.02 °C or 41.58 °C, the scores are directly determined from the curves in Fig. 8. The overall smoothened I_{FL} scores are summarised in Table 9. The other subindexes are also smoothened using the same technique, which are shown in the Appendix. I_{MS} is the only subindex that is not smoothened, since it measures the material state of the molecule without the need to divide the properties into subranges. The I_{MS} scores for gas, liquid and solid are 1, 2 and 3, respectively.
Table 9 Smoothened I_{FL} scores
F
_{p} (°C) 
I
_{FL,low} (when T_{b} ≤ 34.02 °C) 
I
_{FL,up} (when T_{b} ≥ 41.58 °C) 
F
_{p} ≥ 102.74 °C 
1 
1 
84.06 °C ≤ F_{p} ≤ 102.74 °C 


47.14 °C ≤ F_{p} ≤ 84.06 °C 
2 
2 
32.14 °C ≤ F_{p} ≤ 47.14 °C 


28.46 °C ≤ F_{p} ≤ 32.14 °C 


13.46 °C ≤ F_{p} ≤ 28.46 °C 

3 
F
_{p} ≤ 13.46 °C 
4 
3 
Based on Table 9, the final I_{FL} score can be determined by eqn (32):

 (32) 
5.5 Molecular group selection and structural constraints
In this section, the selection of firstorder molecular building blocks is based on the molecular structure of the conventionally used solvent to extract carotenoids. The chosen molecular groups include CH_{3}, CH_{2}, CH, C, OH, CH_{3}CO, CH_{2}CO, CH_{3}O, CH_{2}O, CHO, CH_{3}COO, CH_{2}COO, CH_{2} (cyclic), CH (cyclic), C (cyclic) and O (cyclic). To ensure that feasible molecules are synthesised, the structural constraints listed in section 4.7 are introduced into the model. The maximum number of molecular groups used to generate the molecules is set at 10. In addition, a constraint is also considered on molecular groups containing O: 
n_{OH} + n_{CH3CO} + n_{CH2CO} + n_{CH3O} + n_{CH2O} + n_{CHO} + n_{CH3COO} + n_{CH2COO} + n_{O(cyc)} ≤ 2  (33) 
where n_{OH}, n_{CH3CO}, n_{CH2CO}, n_{CH3CO}, n_{CH2O}, n_{CHO}, n_{CH3COO}, n_{CH2COO} and n_{O(cyc)} represent the number of occurrences of OH, CH_{3}CO, CH_{2}CO, CH_{3}O, CH_{2}O, CHO, CH_{3}COO, CH_{2}COO and O (cyclic) present in the molecule, respectively.
5.6 Optimisation model formulation
In the final stage, the four design objective properties are then converted into their respective property operator, Ω_{p}, as shown in Table 10. Ω_{p} is equivalent to the simple function f(p) for each target property p, which is exactly the lefthand side of eqn (1). The purpose of altering the form is to reduce the nonlinearity equations in the model. The next step is to optimise the property operators of the four objective properties, which are Ω_{Tb}, Ω_{Hv}, Ω_{Ra} and Ω_{ISHI,w}. Since all four property operators are to be minimised, the linear membership functions given by eqn (27) and (28) are applied. The lower and upper bounds for the property operators must be identified by optimising each property operator one at a time. The boundary values are summarised in Table 10.
Table 10 Property operators and their lower and upper bounds
Property p 
Ω_{p}

Lower bound 
Upper bound 
T
_{b}

exp(T_{b}/T_{b0}) 
3.0238 
7.3889 
H
_{v}

H
_{v} − H_{v0} 
11.6891 
36.8800 
R
_{a}

R
_{a}

4.0542 
15.8991 
I
_{SHI,w}

I
_{SHI,w}

1.7463 
2.6553 
Then, λ_{Tb}, λ_{Hv}, λ_{Ra} and λ_{I} are introduced into T_{b}, H_{v}, R_{a} and I_{SHI,w}, respectively, for each of their linear membership functions as shown by eqn (34) to (37). The overall objective of the model is to maximise λ subjected to constraints (29) and (30). To generate multiple solutions, the integer cuts method is applied, whereby additional constraints are added into the model to synthesise alternate molecular structures.^{33}

 (34) 

 (35) 

 (36) 

 (37) 
5.7 Results and discussion
The top six solvents generated by the proposed optimisation model are shown in Fig. 9. The solvents can be categorised into esters (A1 and A3), ketones (A4 and A6) and monocyclic compounds (A2 and A5). Their estimated properties are shown in Table 11, while the subindex values are summarised in Table 12. Solvent A1 is the best solution as it has a low I_{SHI,w} and moderate T_{b} and R_{a}. When comparing the four objective properties individually, solvent A2 performs the best in terms of lowest H_{v} and R_{a}; while solvent A3 offers the lowest I_{SHI,w} and T_{b}. In their work, YaraVarón et al.^{40} showed that ethyl acetate (solvent A3 in this case study) offers a similar carotenoid extraction yield to that of hexane. Therefore, solvents A1, A2 and A5 with lower R_{a} than solvent A3 should exhibit better carotenoid extraction than hexane. Even though solvent A6 does not perform the best with regard to the four objectives, it still offers the highest F_{p}.

 Fig. 9 The generated solvents with their molecular structures.  
Table 11 The six generated solvents with their properties
Solvent 
λ

I
_{SHI,w}

R
_{a}

H
_{v} (kJ mol^{−1}) 
T
_{b} (°C) 
F
_{p} (°C) 
LC_{50} (mg l^{−1}) 
A1 
0.697 
1.957 
7.432 
34.32 
86.2 
9.6 
114.6 
A2 
0.687 
2.031 
4.147 
29.08 
80.1 
−11.9 
176.5 
A3 
0.687 
1.946 
7.761 
32.89 
68.7 
4.5 
208.0 
A4 
0.617 
2.024 
8.593 
30.88 
88.6 
6.9 
853.1 
A5 
0.612 
2.099 
6.804 
31.56 
89.3 
−2.6 
875.7 
A6 
0.597 
2.049 
8.208 
32.07 
109.5 
14.0 
521.9 
Table 12 The six generated solvents with their subindex scores
Solvent 
I
_{FL}

I
_{EX}

I_{η}

I
_{MS}

I
_{V}

I
_{EL}

I
_{AH}

I
_{SHI}

A1 
3 
1 
1 
2 
1 
1.09 
0.81 
9.90 
A2 
3 
1 
1 
2 
1 
0.86 
1.54 
10.41 
A3 
3 
1 
1 
2 
1 
1.02 
0.82 
9.83 
A4 
3 
1 
1 
2 
1 
1.34 
1.21 
10.54 
A5 
3 
1 
1 
2 
1 
1.58 
1.56 
11.14 
A6 
3 
1 
1 
2 
1 
1.48 
1.23 
10.71 
In terms of their individual subindex scores, the difference between their safety and health performance is mainly contributed by I_{EL} and I_{AH}. These two subindexes measured the toxicity of chemicals, where I_{EL} examines the chronic toxicity through inhalation, while I_{AH} assesses acute toxicity through ingestion. I_{SHI} in Table 12 is calculated by summing up the seven subindex values of each solvent. Among the seven subindex values in Table 12, the one with the largest penalty is I_{FL}, which indicates that the solvents are easily flammable. In the event where a molecule with lower flammability is required, additional integer cuts are introduced to the optimisation model. The best solvent with a lower I_{FL} score (less than or equal to 2) is 1ethoxy2butanone with a λ value of 0.463, where its I_{SHI,w}, R_{a}, H_{v}, T_{b} and F_{p} are equivalent to 2.144, 6.599, 37.15 kJ mol^{−1}, 137.7 °C and 53.0 °C, respectively. Its R_{a} is lower than those of the top six solvents in Table 11, but it has a relatively high T_{b} and H_{v}, which means that more energy is needed to vaporise the solvent to recover carotenoids. Even though its I_{FL} has an improved value of 2, its I_{EL} score has worsened to 3, indicating that it has higher toxicity than the top six solvents. This also increases its overall inherent hazard level as demonstrated by its high I_{SHI,w}.
The introduction of index smoothing and prioritisation in this paper enhances the representation of molecular safety and health performance through the calculation of the adjusted index score, I_{SHI,w}. The improvement in safety and health measurement can be demonstrated by comparing the results in Table 12 with the calculation of I_{SHI,i} of the six solvents by omitting both index smoothing and prioritisation. In other words, the initial index scoring schemes are used to calculate I_{SHI,i} by summing up the subindex scores without allocating weight factors to them. The calculated I_{SHI,i} and the individual subindex values are shown in Table 13. In Table 13, the I_{FL}, I_{EX}, I_{η}, I_{MS} and I_{V} scores of the six solvents remain unchanged, as their property values do not fall near the property boundaries of the five subindexes. As for I_{EL} and I_{AH}, the results in Table 12 show that the property values (PEL and LD_{50}) of the solvents fall near the property boundaries or the edges of their respective property subranges, so their I_{EL} and I_{AH} scores are not discrete values. In Table 13, all solvents but A5 have a similar I_{EL} score of one, while the I_{AH} scores of all solvents but A2 and A5 share the same value of one. Solvents A1, A3, A4 and A6 exhibit the lowest I_{SHI,i} score of 10, while A2 and A5 possess higher hazard levels with total index scores of 11 and 12, respectively. As four solvents share the same I_{SHI,i}, it is difficult to distinguish their inherent hazard level. This limitation is addressed with the proposed index smoothing and prioritisation in this paper, where the hazard level of each solvent can be calculated by I_{SHI,w} as shown in Table 11.
Table 13 The six generated solvents (without index smoothing and prioritisation) with their subindex scores
Solvent 
I
_{FL}

I
_{EX}

I_{η}

I
_{MS}

I
_{V}

I
_{EL}

I
_{AH}

I
_{SHI,i}

A1 
3 
1 
1 
2 
1 
1 
1 
10 
A2 
3 
1 
1 
2 
1 
1 
2 
11 
A3 
3 
1 
1 
2 
1 
1 
1 
10 
A4 
3 
1 
1 
2 
1 
1 
1 
10 
A5 
3 
1 
1 
2 
1 
2 
2 
12 
A6 
3 
1 
1 
2 
1 
1 
1 
10 
From the results, the smoothing of scores at property boundary offers a better comparison of hazard level among the solvents. For instance, the calculated I_{SHI,i} values in Table 13 show that solvent A6 is less hazardous than A2. However, the calculated I_{SHI,w} values in Table 11 demonstrate otherwise. This is because the PEL and LD_{50} (used to measure I_{EL} and I_{AH}, respectively) of solvent A6 are near the upper boundary (higher degree of hazard) of their respective subranges; thus both its I_{EL} and I_{AH} scores as shown in Table 12 are penalised more than those in Table 13. The same happens to solvent A2 where its I_{AH} score in Table 12 is penalised more than that in Table 13. Meanwhile, the PEL of solvent A2 falls near the lower boundary (lower degree of hazard) of its I_{EL} subrange; thus its I_{EL} score as displayed in Table 12 is penalised less than that in Table 13. This results in the total I_{SHI,w} of solvent A6 being greater than that of A2. The smoothing of scores also addresses the limitation of comparing two solvents with property values near one another but are separated by a sharp property boundary. For instance, the I_{EL} scores of solvents A5 and A6 without index smoothing are two and one, respectively, in which their score difference is one. Their PEL values are close to one another but they are separated by a shape scoring edge, which results in their I_{EL} scores being different. After the smoothing of subindexes, their I_{EL} score difference is reduced to 0.1, as determined from Table 12. Hence, this revision of the subindex scorings allows the comparison of molecular hazard among different alternatives to be done more accurately.
Meanwhile, the introduction of index prioritisation by assigning different weights to subindexes takes into account the severity of individual subindex scores when quantifying the overall hazard level. To illustrate this improvement, the values of I_{SHI} as shown in Table 12 are compared with their corresponding I_{SHI,w} values in Table 11. It is noted that a higher I_{SHI} value does not guarantee a larger I_{SHI,w} value. Even though solvent A2 has a lower I_{SHI} than solvent A4, it does not display the same relationship for their corresponding I_{SHI,w} values. As the six solvents only have distinct I_{EL} and I_{AH} values, this discussion is focused only on these two subindexes. For solvent A2, its I_{AH} value is higher than its I_{EL} score, while it is the opposite for solvent A4. Since solvent A2 has a more severe I_{AH} score (1.54) than the I_{EL} score of solvent A4 (1.34), solvent A2 is penalised more with the application of eqn (10) to determine I_{SHI,w}. Thus, the results prove that the adjusted index score takes into account the severity of each subindex to help distinguish the safety and health attributes of the molecules.
In conclusion, the revision of the index scoring scheme improves the measurement and comparison of the inherent hazard level among molecules. First, the smoothing of scores at property boundaries in the subindexes can enhance the representation of molecular hazard level near the edges of the subrange. In a specific subrange, a molecular property that is near the upper edge with a higher hazard level will be penalised more, and vice versa. As a result, when this CAMD problem aims to minimise the total index score, the modification on the index scoring scheme allows the CAMD programming to opt for a solution that is closer to the lower edge (lower hazard) within a particular subrange, as opposed to a solution with property at the upper edge or middle region of the subrange. In another case, if there are two neighbouring solutions that may demonstrate a somewhat similar hazard level but are separated by the sharp scoring edge, the smoothing of scores at the boundary allows their score difference to be reduced. As the hazard levels of these two solutions are closer or slightly similar to each other, this allows the other design objectives to have a higher influence on the selection of the final optimal molecule. Meanwhile, the second improvement with index prioritisation allows the CAMD programming to consider the severity of each subindex score when minimising I_{SHI,w}. As a subindex with a larger score is penalised to a greater extent, the CAMD programming will strive to minimise the severity of each subindex score in order to generate a more conservative solution with respect to all safety and health subindexes. In general, the two discussed improvements allow a better reflection of molecular hazard level in order to synthesise an optimal molecule with enhanced quality in terms of property functionalities and safety and health performance.
Conclusions
In this work, inherent safety and health subindexes are integrated into the CAMD framework to measure the safety and health performance of the generated molecules. The main highlight is the development of an enhanced method to quantify the overall safety and health attributes of the molecules by applying an OWA operator to conditionally assign weight factors to the subindexes based on severity rank. Higher severity subindex scores are given higher weights because these reflect the more critical or problematic features of the molecules. The AHP approach is employed to assist in determining the weight factors given to subindexes with different severity levels. In addition, a smoothing technique is developed to reduce the granularity and potential distortion caused by having discrete, stepwise hazard ranges. The smoothening region is determined by using a safety/health margin of 10%, and a linear transition slope is introduced for the score allocation. A case study has been considered to replace hexane for the extraction of carotenoids found in PPF. The results showed that the top six generated solvents offer high performance with respect to product functionality and the safety and health aspects. The adjusted index score I_{SHI,w} as calculated by the OWA operator takes into account the severity of all subindex scores to enhance the representation of overall hazard level as demonstrated by the solvents. For future work, the presented CAMD framework to design an optimal molecule with reduced risk can be extended to an integrated process–product design problem. Besides optimising the current design objectives (product functionalities and the molecular safety and health level), this problem also aims to optimise the process conditions and the process safety and health level. Instead of only applying the chemicalrelated subindexes, the inherent safety and health level of the process route can be examined by including the processrelated subindexes. The proposed integrated process–product design framework can serve as a screening tool to design the optimal chemical and process operating conditions that demonstrate high product and process performance and favourable safety and health attributes.
Conflicts of interest
There are no conflicts of interest to declare.
Appendix: Smoothened inherent safety and health subindexes
Table A.1 Smoothened explosiveness (I_{EX}) subindex
Factor 
Score information 
Penalty 
Explosiveness, I_{EX}S = (UEL–LEL) vol% 
0 ≤ S ≤ 13 
1 
13 ≤ S ≤ 27 
(S − 13)/14 + 1 
27 ≤ S ≤ 38 
2 
38 ≤ S ≤ 52 
(S − 38)/14 + 2 
52 ≤ S ≤ 63 
3 
63 ≤ S ≤ 77 
(S − 63)/14 + 3 
77 ≤ S ≤ 100 
4 