Ali Bagheri*ᵃ and Christian Cremonaᵇ
ᵃ Faculty of Science, Engineering and Technology, Swinburne University of Technology, Melbourne, Australia. E-mail: a-bagheri@hotmail.com
ᵇ Director, Technical Division, Bouygues TP, Guyancourt, France
First published on 3rd June 2020
This work evaluates the application of machine learning in the formulation of construction materials. The aim is to introduce a feasible approach to classify geopolymer samples made via an additive manufacturing technique. Using an experimentally acquired conversion factor of 2.95, this study employs the popular recursive-partitioning functions rpart and ctree to build separate classification models, which are then compared. According to the findings, both functions can create classification models for 3D-printed geopolymers, with positive predictive values of up to 100% for the ctree function and up to 81% for the rpart function. However, the rpart function, with 70% cumulative accuracy, performed slightly better than the ctree function, with 63%. The content of slag and the ratio of boron ions sit at the roots of the ctree and rpart decision trees respectively, which implies their significance for the compressive strength of the samples.
Nonetheless, a huge amount of ordinary Portland cement has been consumed in these projects, which results in high autogenous shrinkage, heat of hydration, and cost. Moreover, cement manufacturing is known as an industry with high greenhouse gas emissions. The associated CO2 emission, in addition to embodied energy consumption, deteriorates the sustainability performance of 3D-printed concrete structures. Geopolymer has been recognised as a promising construction material for the 3D printing process due to its fast-setting, cost-effective and eco-friendly nature.5,6 Apart from these, the fire resistance7 and durability8,9 of geopolymers make them superior to conventional cement composites.
Geopolymers are normally formed by activating aluminosilicate resources in a caustic environment.10 Silicate and hydroxide compounds of alkali metals, such as sodium silicate and sodium hydroxide, are commonly utilised to activate silicon and aluminium species.11 Geopolymers have been studied using a large variety of aluminosilicate resources and alkaline solutions.12–18 This diversity in the two main components of geopolymers is one of their most interesting facets. Iron-making slag and fly ash, as the aluminosilicate resources, and the combination of sodium silicate and sodium hydroxide, as the activator, are the most popular constituents.
Despite the benefits of geopolymers, using silicate compounds can be disadvantageous not only because of environmental problems but also because of their corrosive character. Hence, changing the composition of the alkaline activator in order to introduce new binders has been the topic of many studies.19–23 Many efforts have been made to substitute the silicon and aluminium atoms of the geopolymer matrix with other elements. Boron-based geopolymer is one such binder, introduced in previous studies.24
The large volume of data produced in many engineering disciplines, especially civil and construction engineering, can be used to learn patterns and classifications. As learning from data is a complex procedure, computational methods are necessary. The use of machine learning on data produced in materials design, and its significance, is discussed in ref. 34–38. Supervised machine learning is a group of modern computational approaches that can be considered for data classification.25–27 The conditional inference trees (ctree) and recursive partitioning (rpart) methods are supervised machine learning functions that are frequently utilised for data mining. The aim of this study is to build a model that predicts a target variable according to certain input variables.28 Recursive-partitioning (RP) algorithms approximate a regression relationship through binary recursive partitioning in a conditional inference framework. These algorithms work in the following stages. Step (1): test the global null hypothesis of independence between the input variables and the target. The algorithm stops if this hypothesis cannot be rejected. Otherwise, it selects the input variable with the strongest association with the target variable, measured by the p-value of a test of the partial null hypothesis of independence between a single input and the response. Step (2): implement a binary split on the chosen input variable. Step (3): recursively repeat the previous steps. Step (4): create a visual flowchart mapping the entire classification process.
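The staged procedure can be illustrated with a minimal Python sketch (the study itself used R). It assumes Gini impurity as the split criterion, as in CART-style partitioning; the ctree variant would instead choose the split variable from the p-values of independence tests, which is not implemented here. The toy data, function names and the `max_depth` stopping parameter are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Search every feature and midpoint threshold for the lowest weighted impurity."""
    best = None  # (feature index, threshold, weighted impurity)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively apply binary splits until a node is pure or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):
        return {"leaf": majority}  # no split reduces the impurity
    j, threshold, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the binary rules from the root down to a terminal node."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

# Hypothetical toy data: (%S, B/AA) -> strength class. Not the study's dataset.
rows = [(0, 0.20), (0, 0.25), (0, 0.02), (10, 0.01), (70, 0.04), (100, 0.00)]
labels = ["A", "A", "C", "C", "D", "D"]
tree = grow(rows, labels)
print(predict(tree, (0, 0.22)), predict(tree, (5, 0.015)), predict(tree, (80, 0.03)))
```

The nested dictionary returned by `grow` plays the role of the flowchart in Step (4): each internal node stores one binary rule, and each leaf stores the majority class of the samples that reach it.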
Given the large number of independent variables, predicting the compressive strength of printed geopolymer samples without machine learning incurs a high level of error. For instance, the strength of samples classified into four categories can be predicted only with 75% error. The use of machine learning reduces this error significantly, as shown later in this work. The most efficient way is to learn from the existing data through machine learning. One approach is to hold the printing variables constant and investigate the effective factors of the mix. Another is to hold the mix constant and change the printing parameters. The current study focuses on the former approach. Among the effective parameters, the content of fly ash, the content of ground granulated blast furnace slag (GGBFS), and the ratios of boron ions, silicon ions and sodium ions in the alkaline solution have the most significant impact on the compressive strength.19
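The 75% figure is consistent with a uniform random guess among four classes, although the text does not state how it was derived, so that reading is an assumption. The short sketch below works through the arithmetic and, for comparison, computes the error of always predicting the most frequent class, using the class counts reported in Table 2.

```python
# Class counts as reported in Table 2 of this paper.
class_counts = {"A": 34, "B": 24, "C": 22, "D": 34}
total = sum(class_counts.values())

# Assumption: prediction "without a machine" is modelled as a uniform random guess.
uniform_guess_error = 1.0 - 1.0 / len(class_counts)

# Alternative baseline: always predict the most frequent class.
majority_error = 1.0 - max(class_counts.values()) / total

print(f"uniform guess error: {uniform_guess_error:.0%}")
print(f"majority-class error: {majority_error:.1%}")
```

Either baseline gives a reference point against which the reported tree accuracies can be read.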
Statistic | %F | %S | B/AA | Si/AA | Na/AA |
---|---|---|---|---|---|
Minimum | 0.00 | 0.00 | 0.00 | 0 | 0.273 |
Maximum | 100 | 100 | 0.310 | 0.500 | 1.00 |
Mean | 66.9 | 33.1 | 0.088 | 0.224 | 0.521 |
STD | 44.9 | 44.5 | 0.096 | 0.176 | 0.175 |
A total of 114 targets were measured, and the average conversion factor of 1.95 was applied to the dataset. Fig. 1 presents the statistics of the dependent variable (compressive strength) as a one-dimensional distribution graph of the target values. The box plot in this figure shows the distribution of the targets across their quartiles, highlighting the mean value and the outliers. The whiskers, the lines extending vertically from the boxes, indicate dispersion beyond the upper and lower quartiles, and any point outside the whiskers is considered an outlier.
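The quantities shown by such a box plot can be computed as in the following sketch. It assumes the common convention that whiskers extend to the most extreme points within 1.5 times the interquartile range of the box; the paper does not state which whisker rule was used. The strength values and the function name are hypothetical.

```python
import statistics

def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR convention (an assumption)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # lowest point still inside the fences
        "whisker_high": inside[-1],  # highest point still inside the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical compressive strengths in MPa, not the study's measurements.
strengths = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = box_plot_summary(strengths)
print(summary)
```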
To examine the relationship of the predictors with the response variable, an initial exploratory data visualisation is performed, as illustrated by the plot matrix of Fig. 2. It includes the scatter plots of each pair of parameters, in addition to their density plots and the correlation coefficients between pairs of variables. The density plots visualise the distribution of a variable over a continuous interval. These charts are a variation of a histogram that uses kernel smoothing to chart values, giving a smoother distribution by smoothing out the noise. The peaks of the density plots show where the values of the variables are concentrated over the interval.
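The two quantities in the plot matrix, the pairwise correlation coefficient and the smoothed density, can be sketched as follows. The example assumes the Pearson coefficient and a Gaussian kernel with a hand-picked bandwidth; the paper does not state which coefficient, kernel or bandwidth was used. All data values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def gaussian_kde(point, sample, bandwidth):
    """Density estimate at `point`: the average of Gaussian bumps centred on each observation."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((point - s) / bandwidth) ** 2) for s in sample)

# Hypothetical values, not the study's measurements.
slag = [0, 0, 30, 50, 70, 100]
strength = [3, 4, 12, 18, 23, 34]

r = pearson(slag, strength)
density_at_cluster = gaussian_kde(3.5, strength, bandwidth=2.0)  # between two close observations
density_in_gap = gaussian_kde(28.0, strength, bandwidth=2.0)     # far from any observation
print(round(r, 3), density_at_cluster > density_in_gap)
```

A density peak appears wherever several observations lie within roughly one bandwidth of each other, which is how the density plots reveal where the values are concentrated.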
According to Fig. 2, no strong correlation is observed between any of the variables and the response values. Without a reliable correlation between the dependent and independent variables, a linear regression model would be inaccurate. However, the regression problem can be transformed into a classification problem; in other words, the compressive strength values to be predicted can be grouped into discrete brackets. The target data visualised in Fig. 1 are therefore divided into four classes, A, B, C and D, as described in Table 2. The “cut” function, which breaks up a continuous variable into intervals, performs this division.
Class | A | B | C | D |
---|---|---|---|---|
Compressive strength (MPa) | <5 | ≥5 & <10 | ≥10 & <15 | ≥15 |
Number of data in each class | 34 | 24 | 22 | 34 |
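The binning in Table 2 can be reproduced with a short sketch. The Python version below is only an illustration of the left-closed intervals implied by the table (for example, a strength of exactly 5 MPa falls in class B); it is not the authors' R call. Note that R's `cut` closes intervals on the right by default, so matching Table 2 would presumably require `right = FALSE`, which is an inference since the paper does not list the arguments used.

```python
from bisect import bisect_right

BREAKS = [5.0, 10.0, 15.0]       # class boundaries in MPa, from Table 2
CLASSES = ["A", "B", "C", "D"]

def strength_class(strength_mpa):
    """Map a compressive strength to its class using left-closed bins, e.g. [5, 10) -> B."""
    return CLASSES[bisect_right(BREAKS, strength_mpa)]

examples = [4, 5, 9, 14.9, 15, 23]
print([strength_class(s) for s in examples])
```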
The present study relies on R, a programming environment freely available to researchers (r-project.org), to create classification trees of the compressive strength of 3D-printed boroaluminosilicate geopolymers.
$$H_0^{j}: D(Y \mid X_j) = D(Y)$$
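A partial null hypothesis of this form can be tested by permutation: if the response is independent of an input, shuffling the class labels should not change the strength of their association. The sketch below illustrates the idea with a simple between-class statistic, used here as a stand-in for the test statistic of the conditional inference framework, and with a Bonferroni-style adjustment across inputs. The statistic, the data, the seed and all names are assumptions made for illustration.

```python
import random

def between_class_statistic(x, y):
    """Sum over classes of n_c * (class mean - overall mean)^2 for input x and labels y."""
    overall = sum(x) / len(x)
    groups = {}
    for value, label in zip(x, y):
        groups.setdefault(label, []).append(value)
    return sum(len(g) * (sum(g) / len(g) - overall) ** 2 for g in groups.values())

def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Estimate the p-value of H0: D(Y | X) = D(Y) by shuffling the labels."""
    rng = random.Random(seed)
    observed = between_class_statistic(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if between_class_statistic(x, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Hypothetical data: one input that separates the classes and one that does not.
y = ["A"] * 6 + ["D"] * 6
inputs = {
    "x_informative": [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26],
    "x_noise": [7, 3, 9, 1, 5, 8, 2, 6, 4, 10, 12, 11],
}

p_values = {name: permutation_p_value(x, y) for name, x in inputs.items()}
# Bonferroni-style adjustment: multiply each p-value by the number of inputs tested.
adjusted = {name: min(1.0, len(inputs) * p) for name, p in p_values.items()}
selected = min(adjusted, key=adjusted.get)
print(adjusted, selected)
```

If the smallest adjusted p-value stays above the chosen significance level, the global null hypothesis is not rejected and the node is not split; otherwise the input with the smallest value is chosen for the split.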
Interpreting the ctree decision tree (DT) allows the prediction of optimal geopolymer formulations with high strength. The formulation is defined by the DT rules along with the corresponding population and accuracy values. Table 3 provides this formulation based on the rules of the created tree.
Strength category | Population of the bin (%) | Accuracy (%) | Rule |
---|---|---|---|
A [<5] MPa | 32 | 70 | %F > 70 & B/AA > 0.048 & Si/AA ≤ 0.45 |
B [5–10] MPa | 8 | 100 | 9 < %F ≤ 70 & B/AA > 0.048 & Si/AA ≤ 0.45 & 30 < %S ≤ 0.91 |
C [10–15] MPa | 8 | 56 | Si/AA > 0.45 & %S ≤ 0.91 |
C [10–15] MPa | 26 | 47 | B/AA ≤ 0.048 & Si/AA ≤ 0.45 & %S ≤ 0.91 |
C [10–15] MPa | 15 | 56 | Na/AA > 0.524 & %S > 0.91 |
D [>15] MPa | 11 | 59 | Na/AA ≤ 0.524 & %S > 0.91 |
At first glance, this formulation confirms the significant contribution of slag to the geopolymer mix design. The slag-dominated mix designs (samples with a high content of slag) result in higher compressive strength. This is confirmed by Fig. 3, where sub-node 9, with 26% of the population, ends in categories C and D. Among these samples, those with higher ratios of sodium in the alkali activator have lower compressive strength: the dominant class changes from D at terminal node 10 to C at terminal node 11 as the ratio of sodium ions rises from below 0.524 to above 0.524. This phenomenon has previously been observed in boron-based geopolymers in ref. 19 and 22. It is reported that an increase in the content of boron can raise the contribution of sodium ions and deteriorate the compressive strength.19 Conversely, the lower the slag content, the lower the compressive strength, as seen by comparing sub-nodes 2 and 9. Furthermore, an increase in the ratio of silicate to above 0.45 increases the compressive strength (from less than 5 MPa for A to 10–15 MPa for C). The contribution of silicate to the mix design is vital for strength development, as silicate is not only necessary for the initiation of polycondensation reactions but also increases crosslinking in geopolymerisation. Moreover, increasing the ratio of boron ions in the alkaline solution from below 0.048 (terminal node 4) to higher values (terminal nodes 6 and 7) lowers the compressive strength from class C to classes A and B.
More details about the performance of the model can be obtained from the confusion matrices in Tables 4 and 5. The true positive rates and false negative rates of the predictions made by the ctree function are stated in Table 4. According to Table 4, the maximum true positive rate of 83% belongs to samples with a compressive strength between 10 MPa and 15 MPa. It implies that a large portion of these samples, which constitute almost one-third of the population, are predicted correctly. However, the maximum false negative rate of 64% relates to the samples with a compressive strength between 5 MPa and 10 MPa, which comprise one-fifth of the population. Table 5 presents the same predictions normalised by predicted class rather than by actual class, giving the positive predictive values and false discovery rates. Given the number of predictions in each category and the corresponding positive predictive values, which range from 51% to 100%, the total true prediction value of this approach is 63%.
Predicted class | Actual A (%) | Actual B (%) | Actual C (%) | Actual D (%) |
---|---|---|---|---|
A | 71 | 28 | 11 | 0 |
B | 0 | 36 | 0 | 0 |
C | 20 | 36 | 83 | 63 |
D | 9 | 0 | 6 | 37 |
True positive rate (%) | 71 | 36 | 83 | 37 |
False negative rate (%) | 29 | 64 | 17 | 63 |
Predicted class | Actual A (%) | Actual B (%) | Actual C (%) | Actual D (%) | Positive predictive value (%) | False discovery rate (%) |
---|---|---|---|---|---|---|
A | 70 | 19 | 11 | 0 | 70 | 30 |
B | 0 | 100 | 0 | 0 | 100 | 0 |
C | 12 | 16 | 51 | 21 | 51 | 49 |
D | 25 | 0 | 17 | 58 | 58 | 42 |
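The two normalisations used in Tables 4 and 5 can be made explicit with a small sketch. Starting from a matrix of raw counts, dividing the diagonal by the column totals (actual classes) gives the true positive rates, while dividing it by the row totals (predicted classes) gives the positive predictive values. The counts below are hypothetical, since the paper reports percentages only.

```python
CLASSES = ["A", "B", "C", "D"]

# Hypothetical counts, indexed as counts[predicted][actual]. Not the study's raw data.
counts = {
    "A": {"A": 7, "B": 2, "C": 1, "D": 0},
    "B": {"A": 0, "B": 4, "C": 0, "D": 0},
    "C": {"A": 2, "B": 3, "C": 8, "D": 3},
    "D": {"A": 1, "B": 1, "C": 1, "D": 7},
}

def true_positive_rate(cls):
    """Correct predictions divided by all samples that actually belong to cls (column sum)."""
    actual_total = sum(counts[predicted][cls] for predicted in CLASSES)
    return counts[cls][cls] / actual_total

def positive_predictive_value(cls):
    """Correct predictions divided by all samples predicted as cls (row sum)."""
    predicted_total = sum(counts[cls].values())
    return counts[cls][cls] / predicted_total

for cls in CLASSES:
    tpr = true_positive_rate(cls)
    ppv = positive_predictive_value(cls)
    print(f"{cls}: TPR={tpr:.0%} FNR={1 - tpr:.0%} PPV={ppv:.0%} FDR={1 - ppv:.0%}")

# Overall accuracy: the diagonal divided by the total number of samples.
accuracy = sum(counts[c][c] for c in CLASSES) / sum(sum(row.values()) for row in counts.values())
print(f"accuracy={accuracy:.0%}")
```

The false negative rate and the false discovery rate are the complements of the true positive rate and the positive predictive value respectively, which is why each pair in the tables sums to 100%.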
Optimal geopolymer formulations with high strength can likewise be predicted by interpreting the rpart DT. The DT rules, along with the corresponding population and probability values, define the geopolymer formulation. Table 6 provides this formulation based on the rules of the tree created by the rpart function.
Strength category | Population of the bin (%) | Probability (%) | Rule |
---|---|---|---|
A [<5] MPa | 32 | 69 | %F > 81 & B/AA ≥ 0.063 |
B [5–10] MPa | 8 | 100 | 5 < %F ≤ 81 & B/AA ≥ 0.063 |
B [5–10] MPa | 16 | 44 | 0.024 > B/AA & %S < 30 & Si/AA < 0.48 |
C [10–15] MPa | 8 | 89 | 0.024 < B/AA < 0.063 & %S < 30 & Si/AA < 0.48 |
C [10–15] MPa | 16 | 78 | B/AA < 0.063 & %S < 30 & Si/AA < 0.48 |
D [>15] MPa | 11 | 67 | 5 > %F & B/AA ≥ 0.063 |
D [>15] MPa | 11 | 58 | Si/AA ≥ 0.48 & B/AA < 0.063 |
It is notable that, unlike the ctree function, the rpart function has not used the ratio of sodium ions to create the prediction model. The root node uses the ratio of boron to split the samples into two main sets. One set contains fly ash-based samples with a high amount of boron, which are graded in classes A and B, with compressive strengths lower than 5 MPa and between 5 and 10 MPa respectively. The second set includes samples with a very low ratio of boron (lower than 0.024), samples with a high content of slag, and specimens with a high ratio of silicate. The efficiency of the rpart model can be assessed from the confusion matrices in Tables 7 and 8. The true positive rates and false negative rates of the predictions are stated in Table 7, according to which the maximum true positive rate of 79% was obtained for samples with a compressive strength greater than 15 MPa. On the other hand, 37% of the 3D-printed geopolymer samples with a compressive strength between 10 MPa and 15 MPa are predicted incorrectly. Given the number of observations in each class of compressive strength and their true positive rates, 70% of the observations are predicted in the correct category of compressive strength. Table 8 reports the positive predictive values for correct predictions and the false discovery rates for incorrect predictions. Of the almost one-third of all predictions that fall within class A, 70% were correct. The highest positive predictive value of 81%, however, relates to class C, with a compressive strength between 10 MPa and 15 MPa. This category made up 24% of all predictions. Given the number of predictions in each category and the corresponding positive predictive values, which range from 63% to 81%, the total true prediction value of this approach is 70%.
Predicted class | Actual A (%) | Actual B (%) | Actual C (%) | Actual D (%) |
---|---|---|---|---|
A | 71 | 28 | 11 | 0 |
B | 20 | 68 | 9 | 0 |
C | 0 | 4 | 63 | 21 |
D | 9 | 0 | 17 | 79 |
True positive rate (%) | 71 | 68 | 63 | 79 |
False negative rate (%) | 29 | 32 | 37 | 21 |
Predicted class | Actual A (%) | Actual B (%) | Actual C (%) | Actual D (%) | Positive predictive value (%) | False discovery rate (%) |
---|---|---|---|---|---|---|
A | 70 | 19 | 11 | 0 | 70 | 30 |
B | 26 | 63 | 11 | 0 | 63 | 37 |
C | 0 | 4 | 81 | 15 | 81 | 19 |
D | 13 | 0 | 25 | 63 | 63 | 37 |
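The total true prediction value can be read as a weighted average of the positive predictive values, with each class weighted by its share of the predictions. The sketch below checks this reading against the tabulated figures. It assumes that the "population of the bin" columns of Tables 3 and 6 give the share of predictions per terminal node, which the paper does not state explicitly. Under that assumption the results land near, though not exactly on, the reported 63% and 70%; the differences may stem from rounding in the tabulated percentages.

```python
def weighted_accuracy(prediction_share, ppv):
    """Average of the positive predictive values, weighted by the share of predictions."""
    total_share = sum(prediction_share.values())
    return sum(prediction_share[c] * ppv[c] for c in ppv) / total_share

# rpart: shares summed per class from Table 6, positive predictive values from Table 8.
rpart_share = {"A": 32, "B": 8 + 16, "C": 8 + 16, "D": 11 + 11}
rpart_ppv = {"A": 70, "B": 63, "C": 81, "D": 63}

# ctree: shares summed per class from Table 3, positive predictive values from Table 5.
ctree_share = {"A": 32, "B": 8, "C": 8 + 26 + 15, "D": 11}
ctree_ppv = {"A": 70, "B": 100, "C": 51, "D": 58}

rpart_accuracy = weighted_accuracy(rpart_share, rpart_ppv)
ctree_accuracy = weighted_accuracy(ctree_share, ctree_ppv)
print(f"rpart: {rpart_accuracy:.1f}%  ctree: {ctree_accuracy:.1f}%")
```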
No. | Formulation (F–S–B–Si–Na)ᵃ | Actual class (compressive strength, MPa) | Predicted class by ctree | Predicted class by rpart |
---|---|---|---|---|
1 | 100–0–0.21–0.313–0.428 | A (4) | A | A |
2 | 100–0–0.259–0.26–0.418 | A (2) | A | A |
3 | 100–0–0.22–0.33–0.45 | A (2) | A | A |
4 | 100–0–0.23–0.324–0.446 | A (3) | C | B |
5 | 100–0–0.11–0.34–0.55 | B (9) | C | B |
6 | 100–0–0.19–0.39–0.42 | B (5) | C | B |
7 | 100–0–0.31–0.11–0.58 | A (1) | C | B |
8 | 100–0–0.31–0.08–0.61 | A (1) | C | B |
9 | 100–0–0.235–0.237–0.528 | A (3) | C | B |
10 | 100–0–0.102–0.334–0.564 | B (7) | C | B |
11 | 100–0–0.232–0.298–0.47 | A (3) | A | A |
12 | 100–0–0.255–0.344–0.401 | A (4) | A | A |
13 | 100–0–0.219–0.306–0.475 | A (4) | A | A |
14 | 100–0–0.198–0.295–0.507 | A (2) | A | A |
15 | 100–0–0.187–0.301–0.512 | A (4) | A | A |
16 | 30–70–0.044–0.396–0.56 | D (23) | D | D |
17 | 0–100–0–0.417–0.583 | D (34) | D | D |
18 | 50–50–0.086–0.382–0.532 | D (19) | D | D |
19 | 50–50–0.098–0.318–0.584 | D (17) | D | C |
20 | 30–70–0.038–0.389–0.573 | D (24) | D | C |
Total true predictions | | | 13 | 14 |

ᵃ F = %F; S = %S; B = B/AA; Si = Si/AA; Na = Na/AA.
The accuracy of each model can be assessed by dividing the number of true predictions by the total number of test samples. Accordingly, the accuracies of the ctree and rpart functions are 65% (13 of 20) and 70% (14 of 20) respectively, in close agreement with the 63% and 70% obtained above.
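This calculation can be reproduced directly from the class labels listed in Table 9. In the sketch below, each string holds the 20 labels in row order; the variable and function names are illustrative only.

```python
# Class labels transcribed from Table 9, rows 1 to 20.
actual = list("AAAABBAAABAAAAADDDDD")
predicted_ctree = list("AAACCCCCCCAAAAADDDDD")
predicted_rpart = list("AAABBBBBBBAAAAADDDCC")

def accuracy(actual_labels, predicted_labels):
    """Number of matching labels divided by the total number of samples."""
    hits = sum(a == p for a, p in zip(actual_labels, predicted_labels))
    return hits / len(actual_labels)

ctree_accuracy = accuracy(actual, predicted_ctree)
rpart_accuracy = accuracy(actual, predicted_rpart)
print(f"ctree: {ctree_accuracy:.0%}, rpart: {rpart_accuracy:.0%}")
```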
The false discovery rates obtained relate mainly to the complexity of the correlation between the compressive strength and the covariates. This study can be an excellent starting point for developing a guide or standard that maps 3D-printed boron-based geopolymer samples into categories based on compressive strength.
This journal is © The Royal Society of Chemistry 2020