F. S. Grasel *ab, M. C. A. Marcelo c and M. F. Ferrão c
aTANAC S/A, Rua Torbjorn Weibull, 199, 95780-000, Montenegro – RS, Brazil. E-mail: fsgrasel@gmail.com
bPrograma de Pós-graduação em Engenharia e Tecnologia de Materiais, Pontifícia Universidade Católica do Rio Grande do Sul, Avenida Ipiranga, 6681, 90619-900, Porto Alegre – RS, Brazil
cInstituto de Química, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, 91501-970, Porto Alegre – RS, Brazil
First published on 17th March 2016
One way to produce tannins is through their extraction from trees, which are an abundant resource and are safe for the environment and human health. Tannins can thus be produced in an environmentally friendly way from renewable sources. Their applications have also expanded as new materials are developed. In this study, a methodology was developed for the identification and classification of six commercial tannin extracts by type of plant (chestnut, valonea, tara, myrobalan, quebracho and black wattle) using multivariate analysis of digital images acquired with a commercial scanner. The first two principal components of the principal component analysis showed a well-defined separation of the extracts into six distinct classes, and hierarchical cluster analysis corroborated this separation. Support vector machine discriminant analysis (SVM-DA) and partial least squares discriminant analysis (PLS-DA) gave good classification results. The SVM-DA algorithm performed better than PLS-DA, with both sensitivity and specificity of 100%. Multivariate analysis of scanned images of vegetable tannin extracts offers results equivalent to those obtained by FTIR and NIR without the need to invest in expensive and sophisticated equipment, in accordance with green chemistry principles.
The term “tannin” was first used in the late 18th century by the French chemist Armand Séguin to describe the chemical agents responsible for the fabrication of leather.2,7 Tannins are classified into two groups according to their chemical structure: hydrolysable and condensed tannins.6 Hydrolysable tannins are esters of gallic acid or glycosylated ellagic acids in which the sugar hydroxyl groups are esterified with phenolic acids.8,9 The ellagic tannins are much more frequent than the gallic tannins, and it is likely that the biphenyl system of hexahydroxydiphenic acid results from the oxidative coupling of two gallic acids.10,11 Widely found in the vegetable kingdom, condensed tannins, or proanthocyanidins, are oligomers of flavan-3-ol and flavan-3,4-diol units linked through the C4–C6 and C4–C8 positions of the structure.8,12
Concerning the environment, the development of new materials derived from bio-based sources is a necessity.13 When searching for new and better materials, all relevant ecological guidelines (requiring the replacement of petroleum-based products) have to be taken into consideration.14 One way to produce tannins is through their extraction from trees, which are an abundant resource and are safe for the environment and human health.15,16 Thereby, it is possible to produce tannins in an environmentally friendly way due to their renewable sources. Also, the applications of these compounds have increased in order to obtain new materials.17 Among the new applications are uses related to carbon foams,18,19 adhesives for particle and fiberboards,20,21 insulation materials,22,23 superparamagnetic biochar24 and anti-fouling agents.25–27
The chemical composition of the tree extracts is very complex and difficult to characterize because there is a wide variation in the extract composition from one plant to another. Complex and expensive techniques such as HPLC-MS, ESI-MS, MALDI-TOF, 13C-NMR, and UPLC-MS/MS have been used to characterize these structures.28–34 Recently, Grasel et al.1 identified different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), well-defined separation can be observed between the extracts. The FTIR technique associated with multivariate analysis proved to be a quick, easy and reliable technique to identify the extracts.
Accurate identification of tannin extracts by direct inspection of FTIR spectra is complex and time consuming, depends on the skill of the analyst, and may lead to erroneous results. The use of multivariate analysis enables a more rapid and precise identification based on statistical methods that simultaneously analyze multiple measurements for each individual or object under investigation. In another study, Grasel and Ferrão35 used NIR spectroscopy associated with PLS-DA for the classification of tannin extracts with 100% sensitivity and specificity. PLS-DA was performed in order to sharpen the separation between groups of observations by rotating PCA components until a maximum separation among classes was obtained, as well as to understand which variables carry the class-separating information.
The analysis of digital images associated with multivariate analysis is a cheap, fast and non-destructive method that does not require equipment such as FTIR or NIR spectrometers. When associated with multivariate analysis, the results obtained from digital images acquired by scanners, cell phones, digital cameras and webcams are as accurate as instrumental results obtained by other methods.36–40
Costa et al. proposed a methodology requiring no reagents or sophisticated equipment, based on the principles of green chemistry, that uses digital images and pattern recognition techniques to classify biodiesel according to oil type (cottonseed, sunflower, corn, or soybean).36 Color histograms extracted from the digital images in the RGB (red, green and blue), HSI (hue, saturation and intensity) and grayscale channels, and their combinations, were used as analytical information and statistically evaluated by SIMCA, PLS-DA, and SPA-LDA. The classification models provided good results (up to 95% for all approaches) in terms of accuracy, sensitivity, and specificity in both the training and test sets.
Santos and Pereira-Filho proposed the use of digital imaging as an alternative method for the identification and quantification of milk adulteration. Bovine milk samples were spiked with tap water, whey, hydrogen peroxide, synthetic urea and synthetic milk at different levels of adulteration.37 By using an inexpensive scanner as the analytical instrument, the proposed strategy offered a promising alternative to assess milk quality using a simple, rapid and non-destructive method. SIMCA and KNN classification models discriminated the control milk samples from several potential adulterants at levels of adulteration of >5% v/v.
In this study, a new methodology based on digital images and pattern recognition techniques is proposed for the classification of tannins according to their origin (chestnut, valonea, quebracho, black wattle, tara and myrobalan). Color histograms in the RGB, HSI and grayscale channels were extracted from the digital images, used as analytical information and then statistically evaluated using PLS-DA and SVM-DA.
![]() | (1) |
The number of samples of each extract in the training and test sets, for the Kennard–Stone and Duplex algorithms, was as follows:

Extract | Training (Kennard–Stone) | Training (Duplex) | Test (Kennard–Stone) | Test (Duplex) |
---|---|---|---|---|
Black wattle | 7 | 8 | 3 | 2 |
Quebracho | 6 | 5 | 3 | 4 |
Tara | 3 | 4 | 4 | 3 |
Chestnut | 4 | 6 | 4 | 2 |
Myrobalan | 6 | 5 | 1 | 2 |
Valonea | 4 | 2 | 1 | 3 |

The sensitivity and specificity were calculated according to eqn (2) and (3):
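$$\text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100 \qquad (2)$$

$$\text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100 \qquad (3)$$

where TP, FN, TN and FP denote the numbers of true positives, false negatives, true negatives and false positives for the modeled class.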
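As a rough illustration of how sensitivity and specificity are obtained for one modeled class, the sketch below counts true and false positives and negatives from two label lists. It assumes the usual definitions (true positives over all members of the class, true negatives over all non-members) and a single predicted label per sample; the function name and the labels are hypothetical and are not the study's data.

```python
def sensitivity_specificity(true_labels, predicted_labels, target_class):
    """Return (sensitivity %, specificity %) for one modeled class.

    Assumes at least one sample inside and one outside the target class.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == target_class:
            if predicted == target_class:
                tp += 1  # class member recognised as such
            else:
                fn += 1  # class member missed
        else:
            if predicted == target_class:
                fp += 1  # non-member wrongly accepted
            else:
                tn += 1  # non-member correctly rejected
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical labels, for illustration only.
truth = ["chestnut", "chestnut", "valonea", "valonea", "tara", "tara"]
predicted = ["chestnut", "chestnut", "chestnut", "valonea", "tara", "tara"]
sens, spec = sensitivity_specificity(truth, predicted, "chestnut")
```

In this toy case one valonea sample is accepted by the chestnut model, so the sensitivity stays at 100% while the specificity drops, which is the same pattern of false positives discussed for the PLS-DA models.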
Fig. 1 Digital images for each type of tannin extract acquired by a commercial scanner. (a) Chestnut. (b) Myrobalan. (c) Quebracho. (d) Tara. (e) Valonea. (f) Black wattle.
PC number | Variance (%) | Cumulative variance (%) |
---|---|---|
1 | 84.18 | 84.18 |
2 | 12.95 | 97.13 |
3 | 2.81 | 99.94 |
4 | 0.06 | 100.00 |
Fig. 2 shows the score plot of PC1 (84.18%) versus PC2 (12.95%). PC1 separates two main groups: the black wattle, quebracho, chestnut and valonea extracts have negative scores, and the tara and myrobalan extracts have positive scores. This separation reflects the color lightness of each group: the lightest tannins (myrobalan and tara) are separated from the darkest tannins (black wattle, quebracho, chestnut, and valonea). PC2 separates the myrobalan, chestnut and valonea extracts, with positive scores, from the tara, quebracho and black wattle extracts, with negative scores. The chestnut and valonea have a darker coloration than the black wattle and quebracho. Likewise, the myrobalan has a much more intense shade than the tara. Also, according to Fig. 2, all samples are correctly grouped in their own classes.
Fig. 3 shows the loadings of PC1 and PC2 for each variable. It is important to emphasise that the formation of the groups observed in Fig. 2, as well as their separation, is directly related to the signs of the loadings. In Fig. 3, all of the selected variables influence the PC1 separation: R, G, B, g% and b% have positive values, and r% has a negative value. For PC2, the variable g% has a positive value and the greatest influence on the separation observed in Fig. 2; the remaining variables have little influence on this PC compared with g%.
Therefore, according to Fig. 2 and 3, PC1 is responsible for the separation of the tannin extracts by lightness, based on all six of the variables studied. Moreover, PC2 separates the myrobalan, chestnut, and valonea samples on its positive side from the tara, quebracho and black wattle samples on its negative side, based mainly on the g% variable.
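To make the six color variables concrete, the sketch below computes mean R, G and B values for a patch of pixels together with relative values r%, g% and b%. The definition used for the relative values (each channel's share of R + G + B) is an assumption made for illustration, since the defining equation is not available in this text; the function name and pixel values are hypothetical.

```python
def color_variables(pixels):
    """Mean R, G, B plus relative r%, g%, b% for a list of (R, G, B) pixels."""
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    total = mean_r + mean_g + mean_b
    # Assumed definition: each relative value is the channel's share of R + G + B.
    return {
        "R": mean_r,
        "G": mean_g,
        "B": mean_b,
        "r%": 100.0 * mean_r / total,
        "g%": 100.0 * mean_g / total,
        "b%": 100.0 * mean_b / total,
    }


# Hypothetical pixels: a dark brown patch and a light beige patch.
dark = color_variables([(90, 50, 30), (110, 70, 50)])
light = color_variables([(220, 200, 160), (240, 220, 180)])
```

Under this assumed definition, the lighter patch has higher R, G, B, g% and b% but a lower r% than the darker one, which agrees with the signs of the PC1 loadings described above.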
Fig. 4 Dendrogram of the digital images of the tannin extracts: black wattle (W), quebracho (Q), tara (T), myrobalan (M), valonea (V) and chestnut (C).
The results observed in the HCA corroborate those observed in the PCA: the groups that are closer in the PCA are also more similar in the HCA. Also, all of the samples of the same extract are grouped together, which indicates that these samples may be correctly classified by supervised techniques. Compared with the spectroscopic techniques, the color results from PCA and HCA were very similar to those observed in the multivariate analysis of the structural data by FTIR1 and NIR,35 although the grouping reflects colorimetric similarity rather than chemical similarity.
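The way a dendrogram is built can be sketched with a small agglomerative clustering routine. The version below uses Euclidean distance and single linkage, both of which are assumptions for illustration because the text does not state the distance or linkage used; the function name and the scores are hypothetical.

```python
import math


def single_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical 2-D scores: two tight groups that are far apart.
scores = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 2.0)]
merges = single_linkage(scores)
```

Samples within each tight group merge first at small distances, and the two groups join only at the last step at a much larger distance, which is what the branch heights of a dendrogram encode.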
Modeled class of tannins | PLS-DA | SVM-DA | SIMCA | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kennard–Stone | Duplex | Kennard–Stone | Duplex | Kennard–Stone | Duplex | ||||||||||||||
Sens (%) | Spec (%) | Sens (%) | Spec (%) | Sens (%) | Spec (%) | Sens (%) | Spec (%) | Sens (%) | Spec (%) | Sens (%) | Spec (%) | ||||||||
Tr | Ts | Tr | Ts | Tr | Ts | Tr | Ts | Tr | Ts | Tr | Ts | ||||||||
1 | Chestnut | 100 | 100 | 88.5 | 100 | 100 | 94.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
2 | Myrobalan | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 75 | 100 |
3 | Quebracho | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
4 | Tara | 100 | 100 | 100 | 100 | 100 | 97.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 66 | 100 |
5 | Valonea | 100 | 100 | 100 | 100 | 100 | 97.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
6 | Black wattle | 100 | 100 | 78.3 | 75 | 100 | 97.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
a Spec = specificity; Sens = sensitivity; Tr = training set; Ts = test set.
Fig. 5 shows the PLS-DA results obtained with leave-one-out cross-validation, three latent variables (LVs) and the Kennard–Stone algorithm, with the PLS-DA threshold shown as the top line. According to Fig. 5, all analysed samples were assigned to their own classes (sensitivity of 100%). The specificity was 88.5% for the chestnut class and 78.3% for the black wattle class, and 100% for all of the other classes. In the chestnut classification, some valonea extracts were false positives because their color is similar to that of the chestnut class, a resemblance that derives from the chemical and physical similarity of their composition.49,50 In the black wattle classification, chestnut and quebracho samples were false positives, probably because the black wattle extract has a color intermediate between those of the chestnut and quebracho extracts. The PLS-DA classification accuracy with the Duplex algorithm was worse than with the Kennard–Stone algorithm. The specificities for the chestnut, tara, valonea and black wattle classes were almost perfect. However, in terms of sensitivity, black wattle training samples were misclassified, again probably because of their intermediate color.
Fig. 5 Natural tannin extract classification by PLS-DA of the digital images. (1) Chestnut.
On the other hand, the classification results for SIMCA with the Kennard–Stone algorithm were perfect in terms of both sensitivity and specificity. Two PCs were used for the black wattle, quebracho, tara and myrobalan classes, whereas only one PC was used for the chestnut and valonea classes. This improvement over the PLS-DA results with the same separation algorithm may arise because two categories were aligned along the same direction in the multivariate space, causing a masking problem in PLS-DA. However, when SIMCA was used with the Duplex algorithm, it yielded results similar to those of PLS-DA. Two PCs were used for the chestnut class and one PC for each of the other classes. Only two samples, one myrobalan and one tara, were misclassified, probably because they lie farther from their class centroid along PC1 (Fig. 2). The Kennard–Stone algorithm placed these same distant samples in the training group, which explains its better classification accuracy.
Fig. 6 shows the results obtained for SVM-DA with Kennard–Stone separation. The cost parameter was 316227.8 for the Kennard–Stone algorithm and 100 000 for the Duplex algorithm, and the gamma parameter was 1 × 10⁻⁵ in both cases. All samples analysed using the SVM-DA algorithm were classified correctly, with 100% sensitivity and specificity for both separation methods.
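For orientation, the role of the gamma value can be sketched with a Gaussian (RBF) kernel. The text does not name the kernel, so the RBF form is an assumption suggested by the presence of a gamma parameter; the function name and the color values are hypothetical. The reported costs coincide with 10^5.5 and 10^5, which may indicate a logarithmically spaced parameter search, although this is not stated.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


gamma = 1e-5           # value reported in the text
cost_ks = 10 ** 5.5    # about 316227.8, the reported Kennard-Stone cost
cost_duplex = 10 ** 5  # 100 000, the reported Duplex cost

# Hypothetical mean (R, G, B) values for a dark and a light extract.
similarity = rbf_kernel((100.0, 60.0, 40.0), (230.0, 210.0, 170.0), gamma)
identical = rbf_kernel((100.0, 60.0, 40.0), (100.0, 60.0, 40.0), gamma)
```

With such a small gamma the kernel decays slowly, so even clearly different colors keep a moderate similarity; a large cost then penalises training errors heavily, which fits a small, well-separated data set.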
Fig. 6 Natural tannin extract classification by SVM-DA of the digital images. (1) Chestnut.
The superior classification achieved by SVM-DA is mainly due to the flexibility of the algorithm and its ability to create a generalized model, even for small training groups. Its high efficiency for robust classification is attributed to the appropriate use of kernel functions. The separation algorithm that led to the best classification rate was Kennard–Stone, possibly because it chose the more diverse samples for the training group, whereas Duplex produced training and test groups that were more equally dispersed.
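The tendency of Kennard–Stone to pick diverse samples can be seen in a minimal sketch of the selection rule: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. Euclidean distance, the function name and the scores are assumptions for illustration, and whether the study applied the split per class or to the whole data set is not stated.

```python
import math


def kennard_stone(samples, n_train):
    """Return indices of n_train (>= 2) samples picked by the Kennard-Stone rule."""
    n = len(samples)
    dist = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_train and remaining:
        # Pick the candidate whose nearest selected sample is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical 1-D scores for one class.
scores = [(0.0,), (1.0,), (2.0,), (5.0,), (10.0,)]
train = kennard_stone(scores, 3)
test = [i for i in range(len(scores)) if i not in train]
```

The extreme samples end up in the training set and the test set is left with interior samples, so the model is evaluated on points that lie inside the region it was trained on.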
Multivariate analysis of vegetable tannin extracts through the scanned images offers equivalent results to those obtained by FTIR and NIR without the need to invest in expensive and sophisticated equipment. The methodology developed for the analysis of tannin extracts obeys the principles of green chemistry, requiring no reagent, and being fast, non-destructive and inexpensive.
This journal is © The Royal Society of Chemistry 2016