Zhe Li,
Shunhao Huang and
Juan Chen*
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, PR China. E-mail: jchen@mail.buct.edu.cn
First published on 24th October 2019
Chlorine is a common disinfectant for natural water, but it reacts with ammonia nitrogen to form chloramines, which affects the accuracy of free chlorine measurement. In this case, total chlorine can be used as an indicator to evaluate the content of the effective disinfectant. In this article, a novel method to detect total chlorine in water using an electrode array is proposed. We made the total chlorine sensor and captured the cyclic voltammetry curves of the electrode at different concentrations of chloramines. Principal component analysis and a peak sampling method were used to extract features from the cyclic voltammetry curves, and total chlorine prediction models were established with the support vector machine and the extreme learning machine. The results show that the best predictive power was achieved by support vector regression with principal component analysis (R2 = 0.9689). This study provides a simple method for determining total chlorine under certain conditions and can likely be adapted to monitor disinfection and water treatment processes as well.
Free chlorine consists of hypochlorous acid and hypochlorite ions. Hypochlorous acid is the main effective component of free chlorine; it reacts with various amino acids and nucleic acids in viruses.3,13 However, when natural water contains ammonia, hypochlorous acid reacts with it to form chloramines.14 Although chloramines also have disinfecting effects, they cannot be detected as free chlorine. One way to solve this problem is to measure total chlorine. At present, there are several methods to measure total chlorine, such as colorimetry,15 gas chromatography,16 ion chromatography,17 inductively coupled plasma mass spectrometry,18,19 inductively coupled plasma emission spectrometry20 and electrode methods. The traditional electrode methods for total chlorine are based on the iodometric method.21 Iodine and acid need to be added during the test, which complicates the measurement. Although some companies have developed other types of total chlorine electrodes, there are still few reports on electrode methods for total chlorine detection.
Total chlorine is the sum of free chlorine and combined chlorine. In other words, total chlorine is composed of hypochlorous acid, hypochlorite ion, monochloramine, dichloramine and trichloramine. Previous studies found that the concentrations of monochloramine, dichloramine, trichloramine and hypochlorite ions are related to the current value at a corresponding sweep potential.22,23 However, the relationship between these characteristics and total chlorine has not been clearly established. In addition, hypochlorous acid, hypochlorite ion, monochloramine, dichloramine and trichloramine are all sensitive to pH changes, which makes it more complicated to calculate total chlorine from the current values. The soft-sensing technique, which predicts hard-to-measure variables from easy-to-measure variables, may be suitable for measuring total chlorine by the electrode method.
In this study, we made total chlorine electrode arrays, designed the experiment and used the experimental data to establish a model that predicts the total chlorine concentration in water. The features of the electrode's cyclic voltammetry curves were extracted by principal component analysis (PCA) and a peak sampling (PS) method. Kernel extreme learning machine (KELM) and support vector regression (SVR) were used to establish total chlorine prediction models, and the prediction abilities of the different models were compared. To the best of our knowledge, this method of measuring total chlorine has not been reported before.
As shown in Fig. 1, the electrode array contains five electrodes. The working electrode was a platinum cylinder, 1 mm in diameter and 5 mm in height. The counter electrode was a platinum ring concentric with the working electrode, 20 mm in diameter and 2 mm in width. The working electrode and the counter electrode were integrated on one end of a polytetrafluoroethylene rod. It is important to note that during cyclic voltammetry, the solution in the flow cell must stop flowing to ensure that the diffusion process on the electrode surface is stable. The reference electrode was an Ag/AgCl (3 M KCl) electrode. The other two electrodes are the pH (Hach pHC101, connected to a Hach HQ440D) and temperature (PT100, 4-wire, connected to an Advantest TR2114 digital thermometer) compensation electrodes. An overflow pipe was designed into the flow cell to suppress the potential impact of the liquid level on the measurement.
Cl2 + H2O ⇌ HClO + Cl− + H+ | (1) |
NaClO + H2O ⇌ HClO + Na+ + OH− | (2) |
HClO ⇌ ClO− + H+ (pK(HClO, 25 °C) = 7.54) | (3) |
As shown by eqn (3), the concentration of hypochlorite ions is affected by the pH value of the solution. When the pH is below 5.5, the main component of the solution is hypochlorous acid. When the pH is above 9.5, hypochlorite ions are the main component of the solution.
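The pH dependence in eqn (3) can be illustrated with a short calculation. The sketch below assumes ideal dilute-solution behaviour and uses only the pK value of 7.54 quoted in eqn (3); the function name `hclo_fraction` is introduced here for illustration.

```python
def hclo_fraction(ph: float, pka: float = 7.54) -> float:
    """Fraction of free chlorine present as HClO at a given pH.

    From eqn (3): [ClO-]/[HClO] = 10**(pH - pK), so the HClO fraction
    is 1 / (1 + 10**(pH - pK)). Activity corrections are ignored.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (5.5, 7.54, 9.5):
    f = hclo_fraction(ph)
    print(f"pH {ph}: HClO fraction {f:.3f}, ClO- fraction {1 - f:.3f}")
```

Under these assumptions the HClO fraction is about 99% at pH 5.5 and about 1% at pH 9.5, which matches the two limits described in the text.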
6ClO− + 3H2O ⇌ 2ClO3− + 4Cl− + 6H+ + 3/2O2 + 6e− | (4) |
At a specific potential (about 1.1 V), the electrode reaction can be written as formula (4), and there is a fixed relationship between the reaction current and the concentration of hypochlorite ion.25,26 Therefore, it is feasible to estimate the free chlorine by measuring temperature, pH (above 5.5) and concentration of hypochlorite ion of the solution.
If ammonia is present in the solution, monochloramine, dichloramine and trichloramine are formed in sequence. In simplified form, the reactions can be described as follows.
NH3 + ClO− → NH2Cl + OH− | (5) |
NH4+ + 2HClO → NHCl2 + 2H2O + H+ | (6) |
NH4+ + 3HClO → NCl3 + 3H2O + H+ | (7) |
When the pH of solution changes, monochloramine, dichloramine and trichloramine will convert to each other and the reaction equation can be written as follows.
2NH2Cl + H+ ⇌ NHCl2 + NH4+ | (8) |
3NHCl2 + H+ ⇌ 2NCl3 + NH4+ | (9) |
On the surface of the electrode, electrochemical reduction of the chloramines occurs in sequence within a certain voltage range, and the currents of those reactions can be observed in the cyclic voltammetry curve. Those reactions are described below.
NCl3 + H+ + 2e− → NHCl2 + Cl− | (10) |
NHCl2 + H+ + 2e− → NH2Cl + Cl− | (11) |
NH2Cl + 2H+ + 2e− → NH4+ + Cl− | (12) |
Although the curve features are not obvious, this study still tries to estimate the total chlorine by processing and calculating the cyclic voltammetry curves. In addition, during the process of adding chlorine, excessive hypochlorite will react with monochloramine to form nitrogen, as shown in formula (13).
2NH2Cl + HClO → N2↑ + 3H+ + 3Cl− + H2O | (13) |
When the chlorine/ammonia-nitrogen ratio (Cl2/N mass ratio) is higher than 7.6, breakpoint chlorination occurs.27 After the breakpoint, the combined chlorine decreases as the free chlorine increases, and finally the total chlorine consists almost entirely of free chlorine. Therefore, the chlorine/ammonia-nitrogen ratio is a necessary factor in the total chlorine measurement model.
It can be seen from eqn (5)–(9) and (11) that multiple reactions may occur simultaneously while chlorine is added, and the reaction equilibrium is easily affected by changes in pH, and even in temperature and pressure. At the same time, the addition of a large amount of chlorine leads to breakpoint chlorination, resulting in the decomposition of chloramine. Therefore, pH, ammonium ion content and chlorine/ammonia-nitrogen ratio were selected as factors in this experiment. The experiment was carried out at about 20 °C without precise temperature control, so the water temperature was recorded during the experiment and used as one of the inputs of the prediction model. Because of the overflow pipe, the liquid level in the flow cell did not change dramatically, so the influence of pressure was not considered in this paper.
The experiment was divided into 65 groups, as shown in Table 1. For the first 50 experiments, every 10 groups had the same ammonium concentration and pH value. First, a certain amount of sodium hypochlorite was added to the ammonium chloride solution to set the chlorine/ammonia-nitrogen ratio between 0 and 15. After that, the pH value was adjusted to the design value by adding hydrochloric acid or sodium hydroxide. When the pH value of the solution was stable, cyclic voltammetry was performed, the pH and temperature of the solution were recorded, and the total chlorine value of the solution was measured by the DPD method. As a special case, ammonium chloride was not added in the last 15 groups. In this case, the total chlorine value of the solution is equal to the free chlorine value. In groups 51 to 55, the concentration of total chlorine increased gradually, while in groups 56 to 60 and 61 to 65, only the pH value of the solution was adjusted. As before, cyclic voltammetry was performed in the last 15 experiments and total chlorine, pH and temperature were recorded.
Group | pH | Temperature (°C) | Ammonia-nitrogen (mg L−1) | Ratio (chlorine/ammonia-nitrogen) | Total chlorine (mg L−1) | Cyclic voltammetry curve |
---|---|---|---|---|---|---|
1 | 9.46 | 20.8 | 10 | 0.00 | 0.0 | Fig. S1(1) |
2 | 9.47 | 21.2 | 10 | 0.97 | 4.0 | Fig. S1(2) |
3 | 9.52 | 21.6 | 10 | 2.14 | 8.8 | Fig. S1(3) |
4 | 9.50 | 21.7 | 10 | 2.58 | 10.6 | Fig. S1(4) |
5 | 9.53 | 21.9 | 10 | 3.79 | 15.6 | Fig. S1(5) |
6 | 9.48 | 22.0 | 10 | 4.62 | 19 | Fig. S1(6) |
7 | 9.51 | 22.0 | 10 | 5.56 | 16.8 | Fig. S1(7) |
8 | 9.49 | 22.0 | 10 | 12.75 | 21.2 | Fig. S1(8) |
9 | 9.49 | 22.0 | 10 | 13.48 | 24.2 | Fig. S1(9) |
10 | 9.49 | 22.0 | 10 | 14.60 | 28.8 | Fig. S1(10) |
11 | 8.50 | 19.9 | 20 | 0.00 | 0.0 | Fig. S1(11) |
12 | 8.52 | 20.2 | 20 | 0.34 | 2.8 | Fig. S1(12) |
13 | 8.49 | 20.9 | 20 | 1.20 | 9.9 | Fig. S1(13) |
14 | 8.51 | 21.5 | 20 | 2.48 | 20.4 | Fig. S1(14) |
15 | 8.54 | 21.7 | 20 | 3.21 | 26.4 | Fig. S1(15) |
16 | 8.53 | 21.9 | 20 | 3.77 | 31 | Fig. S1(16) |
17 | 8.48 | 21.9 | 20 | 6.84 | 12.5 | Fig. S1(17) |
18 | 8.48 | 22.1 | 20 | 7.23 | 6.0 | Fig. S1(18) |
19 | 8.52 | 22.2 | 20 | 9.12 | 12.5 | Fig. S1(19) |
20 | 8.50 | 22.1 | 20 | 10.27 | 22 | Fig. S1(20) |
21 | 7.45 | 20.0 | 30 | 0.00 | 0.0 | Fig. S1(21) |
22 | 7.44 | 20.3 | 30 | 0.87 | 10.8 | Fig. S1(22) |
23 | 7.51 | 20.7 | 30 | 1.78 | 22 | Fig. S1(23) |
24 | 7.50 | 21.0 | 30 | 2.43 | 30 | Fig. S1(24) |
25 | 7.46 | 21.2 | 30 | 3.48 | 43 | Fig. S1(25) |
26 | 7.49 | 21.2 | 30 | 5.94 | 41 | Fig. S1(26) |
27 | 7.45 | 21.0 | 30 | 6.26 | 33 | Fig. S1(27) |
28 | 7.44 | 20.7 | 30 | 7.56 | 1.0 | Fig. S1(28) |
29 | 7.51 | 20.6 | 30 | 8.38 | 9.6 | Fig. S1(29) |
30 | 7.51 | 20.7 | 30 | 9.58 | 24.5 | Fig. S1(30) |
31 | 6.50 | 17.5 | 40 | 0.00 | 0.0 | Fig. S1(31) |
32 | 6.5 | 18.0 | 40 | 1.03 | 17 | Fig. S1(32) |
33 | 6.53 | 18.5 | 40 | 1.94 | 32 | Fig. S1(33) |
34 | 6.54 | 18.8 | 40 | 3.10 | 51 | Fig. S1(34) |
35 | 6.50 | 19.2 | 40 | 3.83 | 63 | Fig. S1(35) |
36 | 6.51 | 19.6 | 40 | 4.25 | 70 | Fig. S1(36) |
37 | 6.50 | 19.5 | 40 | 5.72 | 62 | Fig. S1(37) |
38 | 6.50 | 19.6 | 40 | 5.96 | 54 | Fig. S1(38) |
39 | 6.55 | 19.4 | 40 | 7.39 | 6.8 | Fig. S1(39) |
40 | 6.53 | 19.9 | 40 | 8.94 | 22 | Fig. S1(40) |
41 | 5.50 | 18.9 | 50 | 0.00 | 0.0 | Fig. S1(41) |
42 | 5.46 | 19.5 | 50 | 0.87 | 18 | Fig. S1(42) |
43 | 5.54 | 19.8 | 50 | 1.94 | 40 | Fig. S1(43) |
44 | 5.57 | 20.2 | 50 | 3.28 | 54 | Fig. S1(44) |
45 | 5.52 | 20.4 | 50 | 4.25 | 70 | Fig. S1(45) |
46 | 5.48 | 20.5 | 50 | 5.04 | 83 | Fig. S1(46) |
47 | 5.53 | 20.6 | 50 | 5.17 | 80 | Fig. S1(47) |
48 | 5.51 | 20.6 | 50 | 5.56 | 67 | Fig. S1(48) |
49 | 5.49 | 20.4 | 50 | 7.08 | 17 | Fig. S1(49) |
50 | 5.54 | 19.8 | 50 | 8.82 | 20 | Fig. S1(50) |
51 | 5.51 | 16.8 | 0 | — | 0.0 | Fig. S1(51) |
52 | 7.75 | 17.0 | 0 | — | 6.2 | Fig. S1(52) |
53 | 8.09 | 17.0 | 0 | — | 9.4 | Fig. S1(53) |
54 | 8.32 | 17.5 | 0 | — | 19.6 | Fig. S1(54) |
55 | 8.68 | 17.9 | 0 | — | 24.5 | Fig. S1(55) |
56 | 7.89 | 18.2 | 0 | — | 24.5 | Fig. S1(56) |
57 | 7.42 | 18.3 | 0 | — | 24.5 | Fig. S1(57) |
58 | 7.04 | 18.5 | 0 | — | 24.5 | Fig. S1(58) |
59 | 6.49 | 18.7 | 0 | — | 24.5 | Fig. S1(59) |
60 | 4.54 | 18.8 | 0 | — | 24.5 | Fig. S1(60) |
61 | 7.02 | 19.0 | 0 | — | 44 | Fig. S1(61) |
62 | 7.40 | 19.0 | 0 | — | 44 | Fig. S1(62) |
63 | 7.84 | 19.2 | 0 | — | 44 | Fig. S1(63) |
64 | 8.55 | 19.2 | 0 | — | 44 | Fig. S1(64) |
65 | 9.37 | 19.2 | 0 | — | 44 | Fig. S1(65) |
When ammonium ions are absent from the solution, the cyclic voltammetry curve reflects the redox process of hypochlorite ions, as shown in Fig. 2(a). In this figure, the black solid line is group 52 (total chlorine 6.2 mg L−1), the red dotted line is group 53 (9.4 mg L−1), the blue dash-dotted line is group 54 (19.6 mg L−1) and the green dashed line is group 55 (24.5 mg L−1). There was an obvious oxidation peak of hypochlorite ions around 1 V and an obvious reduction peak of hypochlorite ions around −0.3 V. The peak current of these two peaks is proportional to the concentration of hypochlorite, which indicates that cyclic voltammetry can effectively detect the presence of hypochlorite ions. Fig. 2(b) shows another case: four groups with similar total chlorine values but completely different chloramine contents. In this figure, the black solid line is group 6 (total chlorine 19 mg L−1), the red dotted line is group 14 (20.4 mg L−1), the blue dash-dotted line is group 23 (22 mg L−1) and the green dashed line is group 32 (17 mg L−1). The curves clearly diverge in the region below −0.2 V, which reflects the difference in reduction potential caused by the changes in chloramine concentration. When the total chlorine and chloramine of the solution change at the same time, the characteristics of the cyclic voltammetry curve become difficult to distinguish directly, as shown in Fig. 2(c). In this figure, the black solid line is group 44 (total chlorine 54 mg L−1), the red dotted line is group 46 (83 mg L−1), the blue dash-dotted line is group 48 (67 mg L−1) and the green dashed line is group 49 (17 mg L−1).
As can be seen from the figure, the cyclic voltammetry curves differ greatly from one another. In the reduction curves in particular, multiple reduction reactions occur simultaneously, which makes it difficult to measure the concentration of a single substance. In this case, establishing a prediction model of total chlorine is a potential solution to total chlorine measurement.
Fig. 3 shows the flow diagram of the modeling procedure. The procedure consists of two stages: preprocessing of the original data, followed by establishment and evaluation of the prediction model. Each original data sample contains pH, temperature and a cyclic voltammetry curve, for a total of 2102 dimensions. During data preprocessing, only the 2100-dimensional cyclic voltammetry curve data were reduced by PCA or the peak sampling method; the pH value and temperature in the original dataset remained unchanged. These two values and the extracted features constitute a new data set for the establishment of the prediction model.
In the first stage, the features of the cyclic voltammetry curve were extracted by the peak sampling and PCA methods. For the former, the peak voltages of the cyclic voltammetry curve need to be determined first. As shown in the work of K. A. S. Pathiratne25 and F. Terzi,23 the oxidation peak of hypochlorite ion appears at 1030 mV, and five reduction peaks appear at −108 mV/−350 mV, 0 mV/380 mV and 200 mV, representing the electroreduction of monochloramine, dichloramine and trichloramine, respectively. By looking up the cyclic voltammetry data file, the current values at the six peak voltages in each curve can be obtained. These six current values and the corresponding pH and temperature data are used as the new dataset in the subsequent modeling process. In this way, the original data are reduced to eight dimensions. Here, we call this the peak sampling method.
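The peak sampling step can be sketched as a simple lookup. This is a minimal illustration, not the authors' implementation: it assumes each curve is available as separate anodic and cathodic sweeps, that the oxidation peak is read from the anodic sweep and the five reduction peaks from the cathodic sweep, and that the sample nearest to each peak potential is taken. The names `current_at` and `peak_sampling_features` are hypothetical.

```python
from typing import Sequence

# Peak potentials (mV) quoted in the text: one oxidation peak, five reduction peaks.
PEAK_POTENTIALS_MV = (1030.0, -108.0, -350.0, 0.0, 380.0, 200.0)


def current_at(potential_mv: float,
               sweep_potentials_mv: Sequence[float],
               sweep_currents: Sequence[float]) -> float:
    """Return the current at the sampled potential closest to potential_mv."""
    idx = min(range(len(sweep_potentials_mv)),
              key=lambda i: abs(sweep_potentials_mv[i] - potential_mv))
    return sweep_currents[idx]


def peak_sampling_features(ph, temperature_c, anodic, cathodic):
    """Build the 8-dimensional vector [pH, T, i_ox, i_red1, ..., i_red5].

    anodic and cathodic are (potentials_mv, currents) pairs for the two sweeps.
    Assumption: oxidation current comes from the anodic sweep, reduction
    currents from the cathodic sweep.
    """
    ox, *reds = PEAK_POTENTIALS_MV
    features = [ph, temperature_c, current_at(ox, *anodic)]
    features += [current_at(p, *cathodic) for p in reds]
    return features
```

Because only six points per curve are kept, any shift of a peak with pH or chloramine type is not captured by this representation.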
The other data processing method is PCA. In this method, the pH value and temperature in the original dataset also remain unchanged. The 2100-dimensional original cyclic voltammetry curve dataset is converted by a linear transformation into a set of linearly uncorrelated variables28 and replaced with fewer composite variables while losing as little of the original information as possible.29 The original cyclic voltammetry curve data with dimension p can be expressed as X = (x1, x2, x3, …, xp). To give all data the same weight in the calculation, the data must be standardized according to the following formula (14):30,31
(14) |
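A standardization of this kind is commonly written as the z-score below. The notation is assumed here rather than taken from eqn (14): x_ij is the i-th sample of the j-th variable, n is the number of samples, and the mean and standard deviation are computed per variable.

```latex
x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^{2}}
```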
To verify whether the PCA method is valid on the data set, the Kaiser–Meyer–Olkin (KMO) test was carried out. The KMO index can be calculated by the following formula31,32 (15):
(15) |
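In its usual form, the KMO index compares simple and partial correlations between variables. The symbols below are assumptions made for illustration: r_ij is the simple correlation coefficient and a_ij the partial correlation coefficient between variables i and j.

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

Values close to 1 indicate that the partial correlations are small relative to the simple correlations, so the variables share enough common variance for PCA to be useful.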
With this method, each cyclic voltammetry curve was reduced to four principal components. These four principal components and the corresponding pH and temperature data are used as the new dataset in the subsequent modeling process. In this way, the original data are reduced to six dimensions. Here, we call this the PCA method.
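The PCA feature extraction described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the curves are assumed to be stored one per row, standardization is the per-column z-score, and the function names `pca_features` and `build_feature_matrix` are hypothetical.

```python
import numpy as np


def pca_features(curves: np.ndarray, n_components: int = 4):
    """Project standardized CV curves onto their first principal components.

    curves: (n_samples, n_points) array, one cyclic voltammetry curve per row.
    Returns (scores, explained_ratio) for the first n_components components.
    """
    mean = curves.mean(axis=0)
    std = curves.std(axis=0, ddof=1)
    std[std == 0] = 1.0                 # leave constant columns unscaled
    z = (curves - mean) / std           # z-score each potential point
    # SVD of the centered data: rows of vt are the principal axes (loadings).
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained_ratio = s**2 / np.sum(s**2)
    scores = z @ vt[:n_components].T
    return scores, explained_ratio[:n_components]


def build_feature_matrix(ph, temperature, curves, n_components=4):
    """Stack [pH, T, PC1..PCk] column-wise; pH and temperature stay unchanged."""
    scores, _ = pca_features(np.asarray(curves, dtype=float), n_components)
    return np.column_stack([ph, temperature, scores])
```

With 65 curves of 2100 points each and four components, `build_feature_matrix` would return a 65 × 6 matrix, matching the six-dimensional samples described in the text.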
In the second stage, SVR and KELM were used to model the data samples obtained by the two feature extraction methods mentioned above. The resulting models are called principal component analysis support vector regression (PC-SVR), peak sampling support vector regression (PS-SVR), principal component analysis kernel extreme learning machine (PC-KELM) and peak sampling kernel extreme learning machine (PS-KELM). Because KELM processes large data sets extremely quickly, this algorithm was also used to model the original data directly.
(16) |
Defined here:
(17) |
The output of SLFN can be expressed by the following formula:
Hβ = T′ | (18) |
The output of ELM can be expressed by the following formula:
f(x) = h(x)β = h(x)HT(HTH)−1T′ | (19) |
To avoid singularity of HTH, a variable I/C is introduced, where I is the identity matrix and C is the penalty factor.35 The following equation can be obtained:
(20) |
Substituting eqn (20) into eqn (19), the output of ELM can be written as:
(21) |
By introducing a kernel function, the dot product of vectors in the high-dimensional space can be computed in the low-dimensional space. The kernel function is defined as k(x,y) = 〈ϕ(x),ϕ(y)〉 and the kernel matrix is constructed as Ω = HTH, Ωi,j = 〈h(xi),h(xj)〉. The operations HTH and h(x)HT in the high-dimensional space can therefore be expressed through the kernel matrix Ω in the low-dimensional space. KELM uses the kernel function to replace the random weight coefficients between the input layer and the hidden layer in ELM. Hence, there is no need to set the number of hidden layer nodes, which improves the generalization and accuracy of the model.36 The output function of KELM can be represented as:
(22) |
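For illustration, the KELM output can be sketched in NumPy using the form commonly given in the KELM literature, f(x) = [K(x,x1), …, K(x,xN)](I/C + Ω)^−1 T, with an RBF kernel. This is a sketch under those assumptions, not the authors' implementation; the class name `KELM`, the helper `rbf_kernel` and the default `gamma` are placeholders.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """K(x, y) = exp(-gamma * ||x - y||^2) for every row pair of a and b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KELM:
    """Kernel extreme learning machine for regression (commonly cited form)."""

    def __init__(self, C: float = 5.0, gamma: float = 0.1):
        self.C, self.gamma = C, gamma   # gamma default is a placeholder value

    def fit(self, X, t):
        self.X = np.asarray(X, dtype=float)
        omega = rbf_kernel(self.X, self.X, self.gamma)   # kernel matrix Omega
        n = omega.shape[0]
        # alpha = (I/C + Omega)^-1 T; solve() avoids forming the inverse.
        self.alpha = np.linalg.solve(np.eye(n) / self.C + omega,
                                     np.asarray(t, dtype=float))
        return self

    def predict(self, X):
        k = rbf_kernel(np.asarray(X, dtype=float), self.X, self.gamma)
        return k @ self.alpha
```

The only free parameters are the penalty factor C and the kernel parameter; no hidden-layer size appears, in line with the description above. A large C makes the model interpolate the training targets almost exactly, which is one way over-fitting can arise on small data sets.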
Let the spatial regression function be the following equation:
f(x) = w·Φ(x) + b | (23) |
In order to improve the generalization ability of the model and prevent over-fitting, slack variables ξi and ξi* are introduced, and the objective function becomes:
The constraint conditions can be written as:
After that, Lagrange functions and kernel functions are introduced and converted into dual forms, and the regression function can be expressed as:
(24) |
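For reference, the standard ε-insensitive SVR formulation has the form below. It is given here as the textbook version under the assumption that the model follows it; the symbols ε (width of the insensitive zone), αi and αi* (Lagrange multipliers) and K (kernel function) are notation assumed for this illustration, while w, Φ, b, C, ξi and ξi* follow the text.

```latex
\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right)

\text{s.t.}\quad
\begin{cases}
y_i - w\cdot\Phi(x_i) - b \le \varepsilon + \xi_i \\
w\cdot\Phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ge 0
\end{cases}

f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b
```

Only samples with nonzero αi − αi* (the support vectors) contribute to f(x).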
As can be seen from Table 2, the cumulative variance of the first four principal components (PC1–PC4) reaches 95.39%. Therefore, the first four principal components are selected to replace the original cyclic voltammetry curve data set. These four principal components and the corresponding pH and temperature data are used as the new features in the subsequent modeling process. With PCA, the final feature vector is u = [pH, T, PC1, PC2, PC3, PC4].
PC | Variance proportion/% | Cumulative variance proportion/% |
---|---|---|
PC1 | 55.7055 | 55.7055 |
PC2 | 23.8902 | 79.5957 |
PC3 | 8.6415 | 88.2372 |
PC4 | 7.1534 | 95.3906 |
PC5 | 1.7478 | 97.1384 |
PC6 | 1.0480 | 98.1864 |
PC7 | 0.7110 | 98.8974 |
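The choice of four components can be reproduced from the variance proportions in Table 2. The sketch below assumes a 95% cumulative-variance threshold, which is an illustrative cut-off rather than a value stated in the text; the function name `components_needed` is hypothetical.

```python
# Variance proportions (%) of PC1..PC7 as listed in Table 2.
VARIANCE_PCT = [55.7055, 23.8902, 8.6415, 7.1534, 1.7478, 1.0480, 0.7110]


def components_needed(variance_pct, threshold_pct=95.0):
    """Return (k, cumulative) for the smallest k leading components whose
    cumulative variance proportion reaches threshold_pct."""
    cumulative = 0.0
    for k, share in enumerate(variance_pct, start=1):
        cumulative += share
        if cumulative >= threshold_pct:
            return k, cumulative
    raise ValueError("threshold not reached with the listed components")


k, cumulative = components_needed(VARIANCE_PCT)
print(k, round(cumulative, 4))
```

With the Table 2 values, the threshold is first reached at the fourth component, with a cumulative proportion of 95.3906%.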
The loading curves of the first four principal components were obtained through calculation, as shown in Fig. 4. In the range from 0.9 V to 1.2 V, PC2 exhibits high overall loading, with a peak potential of 1.050 V. This peak potential is consistent with the oxidation voltage of hypochlorite ions on the electrode surface (formula (4)), which shows that PCA successfully captured the concentration changes of hypochlorite ions. In the voltage region corresponding to the reduction curve, PC2 exhibits high overall loading in the range of 1.2 V to 0.8 V, with a peak at 0.895 V, which may indicate that chlorine gas in solution was reduced here.40 The reaction equation is shown below:
Cl2(aq) + 2e− → 2Cl− | (25) |
PC1 exhibits high loading over the whole range of 0.8 V to −0.6 V. Only three small peaks appear within this relatively flat curve. The first peak, at 0.656 V, may represent the reduction of hypochlorous acid according to the following equation:40
HClO(aq) + H+ + 2e− → Cl− + H2O | (26) |
The loading curve from 0.6 V to −0.6 V may represent the overall reduction process of chloramine. Consistent with Barbara Piela's results,22 the position and gradient of the chloramine reduction peak in the cyclic voltammetry curve may change with pH and chloramine type, indicating that the peak value alone is not sufficient to judge the chloramine concentration. This is the main difference between PCA and the peak sampling method.
The grid search method was used to find the optimal parameters for the data set processed by PCA. For PC-KELM, the number of hidden layer nodes is determined automatically. In this paper, the radial basis function (RBF) kernel K(x,y) = exp(−γ‖x − y‖²) was selected, and the optimal result of the grid search is penalty coefficient C = 5 and hyperparameter γ = −3. For PC-SVR, the same RBF kernel was selected, and the optimal result of the grid search is penalty coefficient C = 3 and hyperparameter γ = 0.0221.
A similar optimization was used for the peak sampling data. For PS-KELM, the optimal result of the grid search is penalty coefficient C = 5 and hyperparameter γ = −0.5. For PS-SVR, the optimal result of the grid search is penalty coefficient C = 3 and hyperparameter γ = 0.3536.
The KELM method with the raw data set as direct input was optimized in the same way, and the result is penalty coefficient C = 5 and hyperparameter γ = 9.
• Root mean square error (RMSE) is a widely used general-purpose measure of prediction error. It can be written as follows:
(27) |
• Mean absolute error (MAE) reflects the size of the actual prediction error. It can be written as follows:
(28) |
The smaller the MAE is, the better the model performs.
• Coefficient of determination (R2) represents the proportion of the variation in the dependent variable that can be explained by the independent variables. It can be written as follows:
(29) |
The larger R2 is, the better the model performs.
• Nash–Sutcliffe efficiency coefficient (NSE) represents how closely the plot of true values against predicted values follows the 1:1 line. It can be written as follows:
(30) |
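The four indicators above can be computed as in the sketch below, which uses their common textbook definitions. One assumption is made explicit: R2 is taken as the squared Pearson correlation between measured and predicted values, since it is reported separately from NSE; the function names are introduced here for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Squared Pearson correlation of measured vs. predicted (assumed definition)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares about the mean."""
    mt = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - sse / sst
```

Under these definitions a constant offset between predicted and measured values leaves R2 at 1 but lowers NSE, which is why the two indicators can differ for the same model.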
Table 3 shows the performance comparison of the algorithms used in this article. For the training data, all the models showed high R2 and NSE values, and the prediction of the total chlorine concentration was in good agreement with the experimental results. We mainly care about the generalization performance of the model on unknown data, that is, the performance on the testing set, as shown in Fig. 5. The SVR models (Fig. 5(a) and (b)), in particular the one built on the PCA-reduced data, show high prediction performance for the total chlorine concentration in the testing set. This is because the core goal of SVR is to find support vectors, so the number of samples actually participating in model construction is far smaller than the number of samples given, which suits small-sample conditions. The KELM models (Fig. 5(c) and (d)), which need to traverse all samples, tend to fit the training set closely but predict the testing set poorly, because there are not enough samples to adjust each weight, resulting in over-fitting.
Method | Data set | RMSE | MAE | R2 | NSE |
---|---|---|---|---|---|
PC-SVR | Training data | 0.0549 | 0.0301 | 0.9883 | 0.9879 |
PC-SVR | Testing data | 0.0900a | 0.0619a | 0.9689a | 0.9639a |
PS-SVR | Training data | 0.0328 | 0.0204 | 0.9949 | 0.9949 |
PS-SVR | Testing data | 0.1071 | 0.0776 | 0.9587 | 0.9505 |
PC-KELM | Training data | 0.0893 | 0.0669 | 0.9701 | 0.9686 |
PC-KELM | Testing data | 0.1131 | 0.0825 | 0.9524 | 0.9434 |
PS-KELM | Training data | 0.1550 | 0.1168 | 0.9103 | 0.9072 |
PS-KELM | Testing data | 0.1972 | 0.1501 | 0.8705 | 0.8417 |
KELM | Training data | 0.0083b | 0.0069b | 0.9999b | 0.9998b |
KELM | Testing data | 0.3516 | 0.2939 | 0.5666 | 0.3737 |
a The optimal result of the testing set. b The optimal result of the training set.
Fig. 5 Relationship between actual values and predicted values of total chlorine for the testing set. (a) PC-SVR model. (b) PS-SVR model. (c) PC-KELM model. (d) PS-KELM model. (e) KELM model.
In addition, the models built on the PCA data sets generally outperform those built on the peak sampling data sets. This is because PCA processes the entire cyclic voltammetry curve and preserves the original information to the maximum extent, while peak sampling is more likely to miss useful information that cannot be observed by the naked eye. Richer feature information benefits model training.
The dimension of the data is another factor that affects model training. As shown in Fig. 5(c)–(e) and Table 3, the training set results of the KELM model on the original data are optimal on most indicators, but its testing set results are the worst. The KELM model shows severe over-fitting when modeling the original data, and modeling with PCA features is clearly better than modeling with the original data. This indicates that feature extraction from the cyclic voltammetry curve is a necessary and important part of the measurement model.
(31) |
Method | Detection limit (mg L−1) | Detection range (mg L−1) | Average quoted error (%) |
---|---|---|---|
PC-SVR | 2.42 | 2.42–83 | 3.8 |
PS-SVR | 9.62 | 9.62–83 | 3.9 |
PC-KELM | 15.45 | 15.45–83 | 4.6 |
PS-KELM | 35.40 | 35.40–83 | 11.5 |
KELM | 53.68 | 53.68–83 | 23.7 |
No. | Tap water: PC-SVR (mg L−1) | Tap water: DPD method (mg L−1) | Tap water: deviation (mg L−1) | Wastewater: PC-SVR (mg L−1) | Wastewater: DPD method (mg L−1) | Wastewater: deviation (mg L−1) |
---|---|---|---|---|---|---|
1 | 2.6 | 0.03 | 2.53 | 4.5 | 0 | 4.5 |
2 | 7.3 | 4.6 | 2.7 | 7.7 | 3.5 | 4.2 |
3 | 13.8 | 11.2 | 2.6 | 12.9 | 9.1 | 3.8 |
4 | 16.8 | 15.9 | 0.9 | 19.8 | 15.2 | 4.6 |
5 | 20.9 | 18.7 | 2.2 | 24.5 | 21.2 | 3.3 |
6 | 26.3 | 24.8 | 1.5 | 11.1 | 9.4 | 1.7 |
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9ra06609h |
This journal is © The Royal Society of Chemistry 2019 |