GC-MS combined with multivariate analysis for the determination of the geographical origin of Elsholtzia rugulosa Hemsl. in Yunnan province

Elsholtzia rugulosa Hemsl., a Chinese herbal medicine, may have the potential to treat COVID-19. Geographical origin has a significant influence on the quality and application of E. rugulosa. In this paper, gas chromatography-mass spectrometry (GC-MS) was combined with principal component analysis (PCA), hierarchical cluster analysis (HCA), and other multivariate statistical analyses to identify the origin of E. rugulosa. The results showed that the volatile components of E. rugulosa from different origins were significantly different. PCA and HCA clearly distinguished the E. rugulosa of Lijiang and Fumin, whereas the samples from Dali and Yongsheng could be distinguished but with a certain overlap. The correlation between different components was investigated using the Pearson correlation coefficient. The results showed that the characteristic component of E. rugulosa, Elsholtzia ketone, is regulated by terpenoid metabolism. The discriminant functions for the different origins were constructed by Fisher stepwise discrimination, and their initial verification accuracy and leave-one-out cross-validation accuracy were 100% and 87.5%, respectively.

Origin traceability is an essential guarantee of the authenticity of a product and of the rights and interests of consumers. There are many research reports on the origin traceability of agricultural products such as wine, cheese, beef, rice, and wheat. [19][20][21][22] The techniques used for origin traceability include isotope analysis, multielement analysis, infrared spectroscopy, LC-MS, and DNA methods. Gas chromatography-mass spectrometry (GC-MS) offers good reproducibility, high sensitivity, and excellent separation and identification of volatile components. [23] The GC-MS determination of volatile components can be used to make a preliminary evaluation of the medicinal efficacy of herbal medicines and, when combined with chemometric methods, can realize the traceability of their origin.
The primary aim of this research was to trace the origin of E. rugulosa from different origins. For the first time, GC-MS analysis, combined with multivariate statistical methods such as principal component analysis (PCA), hierarchical cluster analysis (HCA), and correlation analysis, was used to analyze the volatile components of E. rugulosa from different origins. The geographical origin discriminant functions for E. rugulosa were constructed by Fisher stepwise discriminant analysis.
Samples were collected in the fall. All samples were dried in the sun and ground to obtain the powder.

GC-MS analysis
A 0.1 g ground sample was weighed into a 10 mL centrifuge tube, and 1.5 mL of a 1 mg mL⁻¹ solution of deuterated toluene in ethyl acetate was added. The mixture was then ultrasonically treated for 10 min, centrifuged at 4000 rpm for 8 min, and filtered, and 500 μL of the supernatant was transferred to a chromatography injection vial for the GC-MS test.
Gas chromatography (TRACE 1310, Thermo Scientific, USA) coupled with mass spectrometry (ISQ 7000, Thermo Scientific, USA) was employed to evaluate the volatile components in the sample. The column used was a DB-35 GC column (30 m × 0.25 mm × 0.25 μm) (Agilent, USA). The inlet temperature was 280 °C, and the helium gas flow rate through the column was 1 mL min⁻¹. The injection volume was 1 μL, and the split ratio was 10 : 1. The initial oven temperature was 80 °C; the oven was heated to 275 °C at a rate of 60 °C min⁻¹, then raised to 295 °C at a rate of 1 °C min⁻¹, and held for 1 min. The transfer line and the ion source temperatures were 300 and 280 °C, respectively. The ionization mode was electron impact at 70 eV. The solvent delay was 2 min.
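As a quick consistency check on the oven program, the total run time can be worked out from the ramp rates and the final hold. The following is a minimal sketch that uses the temperatures and rates exactly as stated above; the function name `oven_program_minutes` is hypothetical and is not part of any instrument software.

```python
def oven_program_minutes(initial_c, ramps, final_hold_min=0.0):
    """Return the total oven program time in minutes.

    ramps is a list of (target_temperature_C, rate_C_per_min) tuples,
    applied in order starting from initial_c.
    """
    total = 0.0
    current = initial_c
    for target, rate in ramps:
        if rate <= 0:
            raise ValueError("ramp rate must be positive")
        total += abs(target - current) / rate
        current = target
    return total + final_hold_min


# Values as stated in the method: 80 -> 275 at 60 C/min, 275 -> 295 at 1 C/min, 1 min hold.
run_time = oven_program_minutes(80, [(275, 60), (295, 1)], final_hold_min=1)
print(f"Total oven program time: {run_time:.2f} min")
```

With the stated program, almost all of the run time is spent in the slow second ramp.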

Data pre-treatment and statistical analysis
The data were pre-processed as follows: [31]

$$C_i = \frac{A_i}{A_0} \times \frac{M_0}{M}$$

where $C_i$ represents the content of the measured odor component; $A_i$ is the peak area of each compound; $A_0$ represents the peak area of the internal standard substance; $M_0$ represents the mass of the internal standard substance; and $M$ is the mass of the sample taken. The data were then used for statistical analysis.
Analysis of variance (ANOVA) was carried out for each component using SPSS 22.0 software (IBM, US). The significance level was P < 0.05. SPSS 22.0 was also used for correlation analysis, Fisher discriminant analysis, initial validation, and LOOCV. PCA was performed with SIMCA-P 11 (Sartorius Stedim, Germany). Hierarchical cluster analysis (HCA), the heatmap, and other plots were produced with OriginPro 2017 (OriginLab, US).
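The pre-treatment step can be illustrated with a short sketch. It assumes the usual internal-standard relation implied by the symbol definitions, namely that the content equals the peak-area ratio A_i / A_0 multiplied by the mass ratio M_0 / M. The peak areas below are hypothetical, and the internal standard mass of 1.5 mg is an assumption derived from 1.5 mL of a 1 mg mL⁻¹ solution.

```python
def relative_content(peak_area, is_peak_area, is_mass, sample_mass):
    """Semi-quantitative content of one compound from an internal standard.

    Implements C_i = (A_i / A_0) * (M_0 / M); the unit of the result is
    (unit of is_mass) per (unit of sample_mass).
    """
    if is_peak_area <= 0 or sample_mass <= 0:
        raise ValueError("internal standard area and sample mass must be positive")
    return (peak_area / is_peak_area) * (is_mass / sample_mass)


# Hypothetical peak areas; M_0 = 1.5 mg (assumed), M = 0.1 g as in the sample preparation.
content = relative_content(peak_area=2.0e6, is_peak_area=1.0e6,
                           is_mass=1.5, sample_mass=0.1)
print(f"Content: {content:.1f} mg per g of sample")
```

Because the result is a ratio to the internal standard, it is a relative content and not an absolute calibration.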
Under the high signicance of difference level (F > 50, P < 0.05), the E. rugulosa pharmaceutical components with the highest relative content of the four origins were Elsholtzia ketone in Fumin, (1R,7S,E)-7-isopropyl-4,10dimethylenecyclodec-5-enol in Dali, squalene in Lijiang, and these compounds can be labeled as origin characteristic medicinal components. However, in this condition, Yongsheng E. rugulosa has no pharmaceutical components whose content is signicantly bigger than that of other origins.

PCA and HCA analysis
PCA is an unsupervised classication method, which can objectively and directly reect the classication of samples. 37 It is commonly used as a reduction analysis method in multistatistical analysis. PCA was performed for 4 different geoorigins of E. rugulosa using the Simca, and the results are shown in Fig. 3. As shown in Fig. 3, PCA rst primary component variance interpretation rate is 47.2%, and the second primary component is 21.3%. Its total variance interpretation rate was 78.5%. As seen in the PCA plots, Fumin and Lijiang can be well distinguished when clustered together. However, Dali and Yongsheng can be partially distinguished, and the E. rugulosa of the two origins are very close.
It shows that the differences in volatile components of Fumin and Lijiang E. rugulosa can be clearly distinguished by PCA, while the difference of Dali and Yongsheng components is small. It may be due to their geographical and climatic characteristics. Fumin has a typical subtropical mountain monsoon climate with the lowest altitude, while Lijiang has a low-latitude warm temperate plateau monsoon climate with the highest altitude, so these two origins of E. rugulosa have obvious characteristics, respectively. 25,26 Dali and Yongsheng belong to the transition from the subtropical mountain monsoon climate to the low-latitude warm temperate mountain monsoon climate, and the altitudes are relatively close. 25,26 Therefore, the E. rugulosa characteristics of the two regions are relatively similar, and the difference is not obvious.
In terms of administrative division, Lijiang county and Yongsheng county are under the jurisdiction of Lijiang Autonomous Prefecture, while Dali county is under the jurisdiction of Dali Autonomous Prefecture. At the same time, Yongsheng E. rugulosa has similar characteristics to Dali, but is different from Lijiang county, indicating that the plants' characteristics are mainly inuenced by geographical and climatic rather than administrative divisions.
The chemical composition of plant, indeed, is partially related to geographical and climatic conditions that are different among different geographical areas. 24,38,39 Geographical and climatic conditions affect plants directly via the regulation of their biosynthetic pathways or indirectly via their effects on vine physiology and phenology. 38 Ahmed found that the seasonality, water, geography, light factors, altitude, and temperature can cause 50% variation in secondary metabolites. 39 Rienth research on grapes showed that temperature, water, and solar radiation govern the synthesis and degradation of primary (sugars, amino acids, organic acids, etc.) and secondary (phenolic and volatile avor compounds and their precursors) metabolites. 38 In the future, the inuence of various geographical and climatic conditions on the metabolites of E. rugulosa will be systematically studied.
HCA is also an unsupervised, objective classification method. [37] It classifies samples according to their degree of similarity and thus reflects the implicit similarity between samples. It combines the most similar samples according to the degree of similarity between observations or variables and clusters the samples in a successive aggregation manner until all samples are finally clustered into one class. Generally, the smaller the critical value, the more similar the samples are. [37] HCA was carried out using OriginPro 2017 for E. rugulosa of the 4 origins. As shown in Fig. 3, when the critical value is between 45.2 and 89.3, Fumin E. rugulosa can be distinguished from that of the other three origins. When the critical value is between 14.3 and 18.4, Lijiang E. rugulosa can be distinguished. When the critical value is between 18.4 and 45.2, Dali and Yongsheng E. rugulosa can be divided into two types, but with a certain intersection. Therefore, the Fumin and Lijiang E. rugulosa were well separated and each clustered into its own category. In contrast, the Dali and Yongsheng E. rugulosa were poorly separated and clustered into three categories with a certain crossover. The results were consistent with the PCA results. In addition, it can be seen from Fig. 4 that the distance between Fumin and the other three origins is the largest, and the difference is the most obvious. The Lijiang samples are closer to the samples of Dali and Yongsheng, and these samples are more similar.

The data are expressed as the mean ± the standard deviation. "-": not detected.
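The successive aggregation and the role of the critical value can be sketched as follows. This is a minimal agglomerative clustering in plain Python, assuming Euclidean distance and average linkage; the distance measure and linkage actually used in OriginPro are not stated in the text, and the points are toy values.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between two clusters of point indices."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_at(points, critical_value):
    """Merge the closest clusters until the next merge would exceed critical_value."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        distance, a, b = best
        if distance > critical_value:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
tight = cluster_at(toy_points, critical_value=2.0)
loose = cluster_at(toy_points, critical_value=10.0)
print(tight)
print(loose)
```

Raising the critical value merges more samples into fewer clusters, which is why different ranges of the critical value isolate different origins.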

Correlation analysis
The correlation analysis of different components in E. rugulosa is of great significance for studying the correlations, metabolic pathways, and metabolic network among E. rugulosa metabolites. It can be used to understand the influence of geographical and climatic conditions on metabolites and the interactions between different metabolites.
In this paper, the Pearson correlation coefficients between the compounds with a significance level of P < 0.05 were calculated, and the results are shown in Fig. 5. A correlation was considered significant when the absolute value of the correlation coefficient was above 0.85 and P < 0.05. Under this condition, among the 1596 compound pairs formed by a total of 57 compounds, 341 pairs were significantly correlated, of which 300 were positively correlated and 41 were negatively correlated (Fig. 5). In general, positively correlated metabolite pairs have similar chemical composition, biological function, and homogeneous characteristics. Among them, the characteristic component of E. rugulosa, Elsholtzia ketone, is positively correlated with 2-methylpropyl 3-methylbutanoate, methyl 5-hydroxyiminopentanoate, 3-methyl-butanoic acid, and caryophyllene, and negatively correlated with (1R,7S,E)-7-isopropyl-4,10-dimethylenecyclodec-5-enol, isoaromadendrene epoxide, (−)-isolongifolol, and beta-dehydroelsholtzia ketone, among others. Elsholtzia ketone and 3-methyl-butanoic acid (hemiterpene derivatives) and caryophyllene (sesquiterpene) are all terpenoid metabolites and are positively affected by terpenoid biosynthesis. [16] Dehydroelsholtzia ketone may be obtained by dehydrogenation of Elsholtzia ketone, so the two are negatively correlated. Therefore, Elsholtzia ketone is regulated by terpenoid metabolism. The results show that different geographical conditions have a great impact on terpenoid metabolic pathways.
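The pair count and the correlation screen can be illustrated with a short sketch. It checks that 57 compounds form 1596 unordered pairs and filters toy profiles by |r| > 0.85; the P value criterion is omitted because it requires the t distribution, and the compound names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(profiles, threshold=0.85):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    hits = []
    for (name_a, xa), (name_b, xb) in combinations(profiles.items(), 2):
        r = pearson(xa, xb)
        if abs(r) > threshold:
            hits.append((name_a, name_b, r))
    return hits


n_pairs = math.comb(57, 2)  # number of unordered pairs among 57 compounds

# Hypothetical contents of four compounds across four samples.
toy_profiles = {
    "compound_A": [1.0, 2.0, 3.0, 4.0],
    "compound_B": [2.0, 4.0, 6.0, 8.5],
    "compound_C": [4.0, 3.0, 2.0, 1.0],
    "compound_D": [1.0, 3.0, 2.0, 3.0],
}
hits = strong_pairs(toy_profiles)
positive = [h for h in hits if h[2] > 0]
negative = [h for h in hits if h[2] < 0]
print(n_pairs, len(positive), len(negative))
```

Splitting the hits by the sign of r mirrors the positive and negative pair counts reported above.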

E. rugulosa origin traceability functions and verification
From the above analysis, it can be seen that the volatile compounds of E. rugulosa from different origins have certain characteristics. Fisher discriminant analysis is based on the idea of variance. [40] The principle is to minimize the variance within groups and maximize the variance between groups. Fisher discrimination is the most commonly used classic discrimination method. Compared with "black box" discrimination methods such as neural networks and fuzzy mathematics, Fisher discrimination is a "white box" method: it can give the most influential components and their relative weights. [40] Therefore, it is often used in the discriminant analysis of samples.
To further trace the origin of E. rugulosa from different origins, we constructed the origin traceability functions using Fisher discriminant analysis. The GC-MS data were imported into the SPSS software, and stepwise analysis was then used to build the Fisher discriminant functions. The nine compounds finally selected as the representative variables of the functions are shown in Table 2.
The results of the functions are shown in Table 2. The four discriminant functions formed by these 9 indicators account for 82.0%, 13.5%, 3.4%, and 1.1% of the total variance, respectively, and the cumulative explained variance is 100%. The resulting Fisher discriminant functions are the origin discriminant functions for the 4 origins of E. rugulosa. In practical application, it is only necessary to substitute the contents of the 9 indicators detected in a blind sample into these functions; the origin whose function gives the largest value is assigned as the origin of the sample.
The validity of the model was verified by initial validation and LOOCV, and the results are shown in Table 3. In the initial verification results, all E. rugulosa from the four origins were correctly classified, and the initial validation accuracy was 100%. In the initial verification, the training set and the verification set are the same set, so the results are biased and the accuracy is relatively high. Therefore, it is necessary to verify the model with LOOCV. [41,42] Leave-one-out cross-validation (LOOCV) is a special case of cross-validation. In each validation, all the data except a single observation are used for training, and the model is tested on that single observation. An accuracy estimate obtained using LOOCV is known to be almost unbiased, and the method is widely used when the available data are scarce. [41,42] In the LOOCV results shown in Table 3, the E. rugulosa of Lijiang and Fumin were all correctly classified, and the LOOCV accuracy was 100%. For the E. rugulosa of Dali, three samples were classified as Dali and one as Yongsheng, and the LOOCV accuracy was 75%. For the E. rugulosa of Yongsheng, three samples were classified as Yongsheng and one as Dali, and the LOOCV accuracy was 75%. The overall leave-one-out cross-validation accuracy of the model was 87.5%.
The LOOCV results were consistent with the PCA and HCA results; the misclassifications may be caused by the similarity of the volatile components of Yongsheng and Dali E. rugulosa. This indicates that GC-MS combined with multivariate statistical analysis has certain limitations for origin traceability: when the geography and climate of the origins are similar, the differences in the volatile components of the herbs may not be obvious, and the classification is prone to misjudgment. However, herbal medicines from different origins with insignificant differences in chemical composition also have small differences in efficacy. Therefore, this misjudgment of origin has a limited impact on the efficacy and use of Chinese herbal medicines.
To sum up, the E. rugulosa origin traceability based on Fisher's stepwise discriminant analysis achieved an initial verification accuracy of 100% and a LOOCV accuracy of 87.5%, which allows the origin of most E. rugulosa samples to be identified.
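The decision rule described above (evaluate one linear function per origin and pick the largest value) can be sketched as follows. The intercepts, coefficients, and compound names are purely hypothetical placeholders and are not the values in Table 2; only two variables are used to keep the example short.

```python
def classify_origin(sample, functions):
    """Assign the origin whose linear function gives the largest value.

    functions maps origin -> (intercept, {compound_name: coefficient}).
    sample maps compound_name -> measured content.
    """
    scores = {}
    for origin, (intercept, coefficients) in functions.items():
        scores[origin] = intercept + sum(
            coef * sample[name] for name, coef in coefficients.items()
        )
    best_origin = max(scores, key=scores.get)
    return best_origin, scores


# Hypothetical coefficients for illustration only (NOT taken from Table 2).
toy_functions = {
    "Fumin": (-10.0, {"compound_1": 2.0, "compound_2": 0.1}),
    "Dali": (-5.0, {"compound_1": 0.5, "compound_2": 1.0}),
    "Lijiang": (-8.0, {"compound_1": 0.2, "compound_2": 1.5}),
    "Yongsheng": (-4.0, {"compound_1": 0.6, "compound_2": 0.8}),
}
blind_sample = {"compound_1": 9.0, "compound_2": 1.0}
predicted, scores = classify_origin(blind_sample, toy_functions)
print(predicted, scores)
```

With the real model, the dictionary would hold the nine selected compounds and the fitted coefficients for each of the four origins.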
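The leave-one-out procedure can be sketched in a few lines. The example below uses a nearest-centroid classifier as a simple stand-in for the Fisher discriminant model (an assumption made only to keep the sketch self-contained), runs LOOCV on toy data, and then recomputes the overall accuracy from the per-origin counts reported in the text, assuming four samples per origin.

```python
import math


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def distance(label):
        centroid = [s / counts[label] for s in sums[label]]
        return math.sqrt(sum((c - v) ** 2 for c, v in zip(centroid, x)))

    return min(sums, key=distance)


def loocv_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data: two well-separated hypothetical groups.
toy_x = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0],
         [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.0, 5.1]]
toy_y = ["A"] * 4 + ["B"] * 4
toy_accuracy = loocv_accuracy(toy_x, toy_y, nearest_centroid_predict)

# Correctly classified samples per origin as described in the text (4 samples each, assumed).
reported_correct = {"Lijiang": 4, "Fumin": 4, "Dali": 3, "Yongsheng": 3}
overall_accuracy = sum(reported_correct.values()) / 16
print(toy_accuracy, overall_accuracy)
```

The same loop works with any classifier that can be refit on the reduced training set, which is how the stepwise Fisher model would be validated in practice.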

Conclusions
In this study, volatile components were determined in 16 E. rugulosa flower samples from 4 regions: Lijiang, Dali, Yongsheng, and Fumin. The GC-MS data were analyzed using ANOVA. The results showed that more than 17 active ingredients were identified in E. rugulosa. The origin-characteristic medicinal components of E. rugulosa are Elsholtzia ketone in Fumin, (1R,7S,E)-7-isopropyl-4,10-dimethylenecyclodec-5-enol in Dali, and squalene in Lijiang, while Yongsheng has no medicinal component with an obviously high content. PCA and HCA show that the E. rugulosa samples of Fumin and Lijiang can be clearly classified, whereas Dali and Yongsheng E. rugulosa can be partially distinguished but with a certain overlap. This is attributed to the large difference in geography and climate between Fumin and Lijiang, while Dali and Yongsheng lie in the middle of the geographical and climatic transition zone with a certain crossover. In addition, the Pearson correlation coefficient was used to study the correlation between the chemical components of E. rugulosa. The results showed that the characteristic component of E. rugulosa, Elsholtzia ketone, is regulated by terpenoid metabolism. Effective identification of E. rugulosa origin was achieved by Fisher stepwise discriminant analysis, and the initial verification and LOOCV accuracies of the E. rugulosa origin traceability functions are 100% and 87.5%, respectively.
GC-MS combined with multivariate statistical analysis can not only determine the volatile components of Chinese herbal medicines but also realize their origin traceability, so it can be widely used in efficacy evaluation and origin traceability. The volatile compounds of E. rugulosa are rich in active antiviral ingredients, which may be effective in preventing, combating, and/or easing COVID-19; this will be studied further in the future.

Conflicts of interest
There are no conicts to declare.