DOI: 10.1039/D4MO00161C (Research Article)
Mol. Omics, 2025, 21, 7–18
Competing technologies: determining the geographical origin of strawberries (Fragaria × ananassa) using laboratory based near-infrared spectroscopy compared to a simple portable device†
Received 18th August 2024, Accepted 21st November 2024
First published on 22nd November 2024
Abstract
The application and development of fast and simple screening methods for the authentication of foods has increased continuously in recent years. A widely used analytical technique is Fourier transform near-infrared spectroscopy (FT-NIR). Despite the simple use of FT-NIR analysis, the examinations are usually carried out on a stationary benchtop device in a laboratory. To perform the analysis directly on site, however, small, cost-effective and mobile NIR devices for food analysis are crucial. In this study, both a benchtop NIR instrument and a handheld NIR device with a lower resolution and a narrower analyzed wavenumber range were applied to differentiate strawberries from different geographical origins. Distinguishing German and non-German strawberries using linear discriminant analysis (LDA) yielded an accuracy of 91.9% and 84.0% with the benchtop and the handheld device, respectively. Relevant variables could be assigned to lipids, carbohydrates and proteins. Overall, our study demonstrated for the first time that determining the geographical origin of strawberries by NIR spectroscopy is also possible with a handheld device.
1. Introduction
Fourier transform near-infrared spectroscopy (FT-NIR) is an established method for the simple and rapid authentication of food, as it is a robust technique that requires minimal sample preparation. It has proven to be a comprehensive analytical tool in omics technology, particularly in metabolomics. Compared to other spectroscopic techniques, NIR spectroscopy has limited resolution despite its broad applicability. In omics research, NIR spectroscopy is therefore often used as a complementary technology in combination with high-resolution techniques such as LC-MS.1 Furthermore, it is an environmentally friendly method, as no solvents or hazardous chemicals need to be used. For this reason, small, handheld NIR devices that are portable and can be operated with a laptop or smartphone have increasingly come onto the market in recent years.2 To assess the performance of this option for mobile analysis, a detailed comparison with established benchtop devices is necessary. In recent studies, NIR spectroscopy has been used not only to analyse food ingredients but also to answer more complex questions about authenticity. For example, the geographical origin of green coffee beans, almonds, soybeans and hazelnuts was successfully determined.3–6 For foods with a low water content, it is possible to measure either the whole food or a homogenized and freeze-dried powder. Although freeze-drying adds workload, it leads to better classification rates.7 Determining the origin of water-rich samples without freeze-drying, on the other hand, has not been possible up to now, as the intense water bands lead to unspecific spectra with little information.8,9 For this reason, the established approach of measuring freeze-dried powder was chosen to enable a comparison between different NIR devices. All of these example studies were performed on benchtop NIR devices, which normally require a laboratory with a stable temperature.
However, there are already a few comparative studies in the area of food authentication that were carried out with handheld devices. The first publication dealt with the adulteration of South African honey and compared several handheld NIR devices based on their classification performance, but only a small number of samples were used.10 In addition, the performance of different handheld NIR devices was compared for the analysis of the adulteration of coriander seeds with salt, sawdust and starch using regression models.11 Furthermore, handheld and benchtop NIR devices were applied to distinguish different truffle varieties with 100% accuracy, which was to be expected, as the differences between truffle varieties are large and other analytical techniques have also achieved very good results.12–15 Moreover, a trend towards separating thirty truffle samples according to their geographical origin was observed.12 Recently, a study was published in which FT-NIR analyses of Spanish ham from different types of pigs and feeds were compared using handheld and benchtop instruments. Sixty ham samples from two ham types were compared, achieving classification rates of 50–92% depending on the NIR device.16 In all four comparative studies, different handheld devices were used and compared with the established benchtop devices. However, as far as we are aware, there are no studies that have used a large sample set to perform an in-depth comparison between a benchtop and a handheld NIR device for the authentication of food regarding geographical origin.
Strawberries are grown and traded worldwide, and their price depends on their origin. It is therefore important to have a fast and cost-effective authentication process in place to differentiate between geographical origins in order to prevent and detect food fraud. Important strawberry-growing countries for the European market are Spain, Greece, Germany and Egypt. Germany in particular has a high strawberry consumption and imports 127 000 tons of strawberries annually, making it an important target market for strawberry exporters.17
To date, almost all available handheld NIR devices measure in diffuse reflectance mode, which is particularly suitable for powders and granulates.18,19 The measurement of liquid samples, which would require an NIR device operating in transmission mode, is not part of this study, as handheld transmission devices have not yet achieved any commercial significance.9 Furthermore, freeze-drying can avoid the negative influence of water on the NIR spectrum, whose bands often superimpose relevant signals.7
The focus of this work is on the detailed comparison of a benchtop and a handheld NIR device using a dataset of 198 strawberry samples from four different countries (Germany: 110 samples, Egypt: 22, Greece: 21 and Spain: 45). In addition, the measuring ranges of the two NIR devices are analysed with regard to their information content. For data evaluation, the two NIR devices are compared using principal component analysis (PCA), and a classification is performed using linear discriminant analysis (LDA). In addition, an analysis of variance (ANOVA) is carried out for each variable of the datasets to identify the most relevant wavenumbers for distinguishing the sample groups. In this way, both NIR devices can be compared on the basis of their spectra, classification rates and the information content of the significant variables.
2. Materials and methods
2.1. Sample acquisition
A total of 198 strawberry samples were used for the study: 110 German, 22 Egyptian, 21 Greek and 45 Spanish samples. The German samples were collected directly in the field, which guarantees that the sample material is authentic. For this purpose, 500 g of strawberries were collected from various rows and plants in a field to cover the variance of one location as broadly as possible. The samples from Egypt, Greece and Spain were provided by traders and distributors. To achieve a robust comparison of the two NIR devices, a sample pool was selected that covers various exogenous parameters such as different harvest years (2021–2024), varieties and regions (Table S1 in the ESI†); a summary of this information is shown in Table 1. To determine the influence of the water content on the NIR spectra of intact, homogenized and freeze-dried sample material, 12 randomly selected strawberry samples were used.
Table 1 Summary of the meta-information of the analyzed samples
Origin | Samples | Harvest year | Samples | Cultivation | Samples | Variety | Samples
Egypt | 22 | 2021 | 12 | Organic | 37 | Clery | 12
Germany | 110 | 2022 | 71 | Conventional | 161 | Asia | 7
Greece | 21 | 2023 | 107 | | | Fortuna | 6
Spain | 45 | 2024 | 8 | | | Verdi | 5
| | | | | | Alegro | 4
| | | | | | Malwina | 4
2.2. Sample preparation
Sample preparation started by removing the stems, quartering the fruits and flash-freezing them in liquid nitrogen. The quartered strawberries were stored at −20 °C until further processing. To homogenize the samples, they were mixed with the same amount of dry ice and ground in a knife mill (Grindomix GM 300, Retsch, Haan, Germany) using the following program: pre-grinding for 1 minute at 1000 rpm with the blade running in reverse to protect it, followed by main grinding for a total of 2 minutes at 4000 rpm. The dry ice was then allowed to sublime during storage at −20 °C for approx. three days. To make the strawberries suitable for NIR analysis, all samples were converted into a powder by freeze-drying with a freeze-dryer (Beta 2-8 LSCplus, Martin Christ, Osterode am Harz, Germany). This procedure has already been used in a large number of NIR studies and improves classification, even for matrices with low water content.7 The samples were freeze-dried for a total of four days and stirred with a glass rod on the third day to allow the water to escape completely. After freeze-drying, the samples were ground in a mortar (Haldenwanger, Waldkraiburg, Germany) and stored at −80 °C until further analysis. The sample preparation was identical for both NIR devices.
2.3. Near-infrared spectroscopy
The analysis of the 198 freeze-dried strawberry samples was performed according to standardized workflows of our group.20 A total of 1.25 ± 0.10 g of freeze-dried sample material was weighed directly into glass vials (52.0 mm × 22.0 mm × 1.2 mm, Nipro Diagnostics Germany GmbH, Ratingen, Germany) and equilibrated to 22 ± 2 °C. Benchtop measurements were performed on a Bruker instrument (TANGO, Bruker Optics, Bremen, Germany) equipped with an integrating sphere and controlled with the OPUS software. For each spectrum, 50 scans were recorded in the wavenumber range of 11 550–3950 cm−1 with a resolution of 4 cm−1. Five technical replicates were measured per sample.
The novel NIR scanner “SenoCorder” from Senorics (Dresden, Germany) was used as the handheld NIR device for the analysis of the 198 strawberry samples. This device uses a Sensosense chip that emits the NIR radiation generated by two tungsten lamps and detects the radiation reflected from the sample. The device uses narrow-band sensors made of an organic, radiation-absorbing material, each of which responds to a single wavenumber, so that no monochromator is required.21 The chip carries 16 pixels, which enables the simultaneous analysis of 16 different wavenumbers and measurements within 2 seconds. Hence, the data generated by this device contain 16 values (spacing between 30 and 36 nm) in the wavenumber range of 8446–5938 cm−1 (second overtone), as it has been shown that the most relevant information for such studies lies in this range.7,22 Due to the large distance between the measuring points, the handheld NIR device has a low spectral resolution; its S/N ratio is 4500:1. The NIR scanner has automatic temperature control, which makes it ideal for robust use outside the laboratory. Ten technical replicates were measured for each sample to cover the sample's variance as well as possible. Between the replicate measurements, the glass vial was shaken thoroughly and its contents levelled evenly.
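The channel spacing is quoted in nm while the range is quoted in cm−1. The short sketch below converts the stated range with λ [nm] = 10^7 / ν̃ [cm−1] and divides it by the 15 gaps between 16 channels. It assumes the 16 channels span the full range end to end; the actual channel positions are not evenly spaced, so this only checks that the average spacing is consistent with the stated 30–36 nm.

```python
# Convert the handheld device's stated range (cm^-1) to wavelengths (nm) and
# estimate the mean channel spacing. Assumption: the 16 channels span the full
# range end to end (real channel positions are not evenly spaced).

def wavenumber_to_nm(wavenumber_cm: float) -> float:
    """lambda [nm] = 1e7 / wavenumber [cm^-1]."""
    return 1e7 / wavenumber_cm


N_CHANNELS = 16
HIGH_CM, LOW_CM = 8446.0, 5938.0  # range reported for the handheld device

short_nm = wavenumber_to_nm(HIGH_CM)  # high wavenumber = short wavelength
long_nm = wavenumber_to_nm(LOW_CM)
mean_spacing_nm = (long_nm - short_nm) / (N_CHANNELS - 1)

print(f"{short_nm:.0f}-{long_nm:.0f} nm, mean spacing {mean_spacing_nm:.1f} nm")
# prints "1184-1684 nm, mean spacing 33.3 nm"
```

The resulting mean spacing of about 33 nm falls inside the quoted 30–36 nm band.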
In addition to the freeze-dried samples, 12 intact strawberry samples and the same samples in homogenized form were measured with both NIR devices. For the intact fruits, NIR spectra were recorded at 5 points on the surface of each individual strawberry. The homogenized, non-freeze-dried strawberries were analysed by transferring 2.0 g of each sample into a glass vial. The freeze-dried material of these 12 samples was measured in the same way as all other freeze-dried samples.
2.4. Determination of the water content
To verify that the freeze-drying process was successful, the water content of 12 strawberry samples was determined before and after freeze-drying using a drying balance (SMART 6, CEM Corporation, Matthews, USA). One intact strawberry from each sample was used to determine the dry matter and the resulting water content of the fresh fruit. To determine the water content of the freeze-dried strawberries, 1.25 g of each freeze-dried sample was dried with the drying balance and the resulting water content was calculated.
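Drying balances of this kind usually report the water content as the relative mass loss on drying. Assuming the balance used here follows this convention (the text does not state the formula), with m_wet the mass before drying and m_dry the residual dry mass:

```latex
w = \frac{m_{\mathrm{wet}} - m_{\mathrm{dry}}}{m_{\mathrm{wet}}} \times 100\,\%,
\qquad
\text{dry matter} = 100\,\% - w
```

For instance, a hypothetical 1.25 g portion that weighs 1.20 g after drying would give w = (1.25 − 1.20)/1.25 × 100 % = 4 %.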
2.5. NIR spectra pre-processing
For the data of the benchtop device, multiplicative scatter correction (MSC) was first applied to normalize the data and reduce scattering effects. The data were then binned, i.e., five adjacent variables were averaged into one bin, which reduced the 3720 variables to 744 NIR wavenumber bins and lowers background noise. Binning does not necessarily result in a loss of information, as neighbouring variables are strongly correlated.5 In the final step, the arithmetic mean of the five spectra of each sample was calculated. For pre-processing of the data from the handheld NIR device, MSC and mean centering of the 10 spectra of each sample were performed. The entire data pre-processing was performed using Unscrambler 11 (Aspen Technology Inc., Bedford, MA, USA) and MATLAB R2020b (The MathWorks Inc., Natick, MA, USA).
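The benchtop pre-processing chain (MSC, binning of five adjacent variables, averaging of the replicates of each sample) can be sketched in NumPy as below. This is an illustration, not the Unscrambler/MATLAB workflow actually used: the choice of the mean spectrum as MSC reference is an assumption (the reference is not stated), and the function names `msc`, `bin_variables` and `average_replicates` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction (rows = spectra, columns = variables).

    Each spectrum x is fitted against a reference r as x ~ a + b * r by least
    squares and corrected to (x - a) / b. By default r is the mean spectrum.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty(spectra.shape, dtype=float)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns [slope, intercept]
        corrected[i] = (x - a) / b
    return corrected


def bin_variables(spectra: np.ndarray, width: int = 5) -> np.ndarray:
    """Average each run of `width` adjacent variables into one bin.

    Trailing variables that do not fill a complete bin are dropped.
    """
    n_bins = spectra.shape[1] // width
    trimmed = spectra[:, : n_bins * width]
    return trimmed.reshape(spectra.shape[0], n_bins, width).mean(axis=2)


def average_replicates(
    spectra: np.ndarray, sample_ids: Sequence[str]
) -> Tuple[np.ndarray, np.ndarray]:
    """Return the unique sample IDs and the mean spectrum of each sample."""
    ids = np.asarray(sample_ids)
    unique_ids = np.unique(ids)
    means = np.vstack([spectra[ids == s].mean(axis=0) for s in unique_ids])
    return unique_ids, means


# Demo with synthetic data: 2 samples x 5 replicates x 3720 variables, each
# replicate being an offset and scaled copy of the same underlying spectrum.
rng = np.random.default_rng(0)
base = 1.0 + np.sin(np.linspace(0.0, 6.0, 3720))
raw = np.vstack(
    [rng.uniform(-0.2, 0.2) + rng.uniform(0.8, 1.2) * base for _ in range(10)]
)
replicate_ids = ["S1"] * 5 + ["S2"] * 5
sample_ids, features = average_replicates(bin_variables(msc(raw)), replicate_ids)
print(sample_ids, features.shape)  # ['S1' 'S2'] (2, 744)
```

Because the synthetic replicates differ only by an additive offset and a multiplicative factor, MSC maps them onto the same corrected spectrum, which is exactly the scatter effect the correction is meant to remove.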
2.6. Statistical analysis
For the multivariate analyses, R (V 4.3.3) and RStudio (V 2023.12.1) were used. PCA was performed with scaling using the package factoextra (V 1.0.7). For LDA, the packages mlr3 (V 0.17.2), mlr3verse (V 0.2.8) and mlr3filters (V 0.7.1) were used. All classification models were validated with a 10-fold cross-validation repeated 100 times, and the results of all folds and repetitions were averaged. In the first run of a 10-fold cross-validation, the model is trained with 90% of the dataset and tested with the remaining 10%. In the second run, the next 10% of the dataset is used as the test set, and so on until every sample has been used once for testing. Repeating this procedure with different random partitions ensures that the method is validated with many different test sets. To identify the important variables, an analysis of variance was performed with the ANOVA filter function of the package mlr3filters, which includes a correction for multiple testing. All variables with a p-value of 0.05 or lower were considered significant.
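The repeated 10-fold cross-validation of an LDA classifier can be sketched in NumPy as follows. This is not the authors' mlr3 pipeline: the LDA is a textbook pooled-covariance implementation, the pseudo-inverse for singular covariance matrices (more variables than samples) is an assumption, the folds are unstratified random partitions, and the confusion matrix is pooled over all folds, which may differ from how the mlr3 results were aggregated. All function names are hypothetical.

```python
import numpy as np


def lda_fit(X, y):
    """Fit LDA: class means, class priors and a pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    residuals = X - means[np.searchsorted(classes, y)]
    cov = residuals.T @ residuals / (len(y) - len(classes))
    # Pseudo-inverse (an assumption here) keeps the fit defined when the
    # pooled covariance is singular, e.g. with more variables than samples.
    return classes, means, priors, np.linalg.pinv(cov)


def lda_predict(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, means, priors, cov_inv = model
    scores = (
        X @ cov_inv @ means.T
        - 0.5 * np.sum((means @ cov_inv) * means, axis=1)
        + np.log(priors)
    )
    return classes[np.argmax(scores, axis=1)]


def repeated_kfold_lda(X, y, k=10, repeats=100, seed=0):
    """Mean test accuracy and row-normalised confusion matrix (in %) of LDA
    over `repeats` random k-fold partitions of the data."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    confusion = np.zeros((len(classes), len(classes)))
    accuracies = []
    for _ in range(repeats):
        order = rng.permutation(len(y))  # new random partition per repetition
        for test_idx in np.array_split(order, k):
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False  # each sample is tested once per repetition
            pred = lda_predict(lda_fit(X[train], y[train]), X[test_idx])
            accuracies.append(np.mean(pred == y[test_idx]))
            rows = np.searchsorted(classes, y[test_idx])
            cols = np.searchsorted(classes, pred)
            np.add.at(confusion, (rows, cols), 1)  # true class x predicted class
    percent = 100 * confusion / confusion.sum(axis=1, keepdims=True)
    return float(np.mean(accuracies)), classes, percent


# Demo on synthetic, well-separated two-class data (not the strawberry data).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 1.0, (40, 5)), rng.normal(2.0, 1.0, (30, 5))])
y_demo = np.array(["German"] * 40 + ["non-German"] * 30)
acc, classes, table = repeated_kfold_lda(X_demo, y_demo, k=10, repeats=5)
print(f"accuracy {acc:.3f}")
print(classes, table.round(1))
```

Each row of the returned table gives the percentage of samples of one true class assigned to each response class, which is the layout used in Tables 2 and 3.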
3. Results and discussion
3.1. NIR spectra of strawberries
To determine the differences between the data obtained from the two NIR spectrometers, the averaged spectra were investigated. The water contents in Table S2 (ESI†) show that freeze-drying was carried out correctly. Fig. 1A shows the FT-NIR spectra of the freeze-dried strawberry samples measured with the benchtop device in the wavenumber range of 11 550–3950 cm−1.
Fig. 1 NIR spectra of freeze-dried strawberries obtained by the benchtop (A) and handheld (B) device.
The spectra show signals between 8600 and 8200 cm−1, which could be assigned to the second overtone of C–H stretching vibrations and the second overtone of the HC–CH stretching vibration, e.g., of unsaturated fatty acids. Additionally, signals between 5800 and 5600 cm−1 could be associated with aliphatic hydrocarbons, potentially resulting from the first overtone of the symmetric CH2 vibration and C–H stretching. Furthermore, bands at 4350 cm−1 could be caused by the second overtone of the C–H stretching and bending vibration of CH2. Bands between 7000 and 6000 cm−1 are also visible and could be associated with the first overtone of the C–H stretching vibration and the third overtone of the C=O stretching vibration. In addition, they could be assigned to the first overtone of the O–H stretching vibration, the N–H stretching vibration of proteins, and the first overtone of the symmetric and asymmetric O–H stretching vibrations. These absorption bands are characteristic of amines, amides and proteins. The signal at ∼5200 cm−1 could be assigned to the O–H stretching vibration and the H–O–H bending vibration of water. Bands at 4800 cm−1 could be assigned to O–H, C–O and C=O–O stretching vibrations as well as to a combination band of CONH2, which is typical for carbohydrates and proteins.7,23
Fig. 1B shows the NIR spectra of freeze-dried strawberries obtained by the handheld device, covering the spectral range between 8500 and 6000 cm−1, which can be associated with the second overtone of the O–H, N–H and C–H stretching vibrations of proteins, water and lipids.23 Since the handheld NIR device records only 16 data points over a comparatively narrow wavenumber range, its spectra can be interpreted only to a limited extent; detailed information about the chemical profile of strawberries can therefore only be derived from the benchtop FT-NIR spectra.
3.2. Principal component analysis
Principal component analysis was applied to the datasets obtained by the benchtop and handheld device containing 744 and 16 variables, respectively. The scores of the principal components (PCs) 1 and 2, 1 and 3, as well as the loadings of PC 1, PC 2 and PC 3 are shown in Fig. 2 for the data obtained by the benchtop and handheld device.
Fig. 2 Results of the PCA conducted on the NIR spectra obtained by the benchtop (A + C + E + G + I) and handheld (B + D + F + H + J) device. The scores of the first two principal components (A + B), the scores of the first and third principal component (C + D), as well as the loadings (E + F + G + H + I + J) of the first, second and third principal component are shown, with the labelled maxima and minima. The scores were coloured according to the origin of the samples from Germany or other countries (Egypt, Greece and Spain).
For the data of the benchtop device, the predominantly negative scores on PC 3 (Fig. 2C) separate the German samples from the non-German samples, which have predominantly positive scores. When the same plot is coloured by the individual non-German countries (Fig. 3C), it becomes apparent that many of the non-German samples with negative scores on the third principal component are Egyptian samples.
Fig. 3 Results of the PCA conducted on the NIR spectra of the benchtop (A + C) and handheld (B + D) device. The scores (A + B) of the first two principal components as well as the first and third principal component (C + D) are shown and the scores were coloured according to the origin of the samples from Egypt, Germany, Greece and Spain.
The differentiation between German and Egyptian samples is therefore difficult on the basis of PCA, while German samples are clearly separated from Spanish and Greek samples by the scores of the third principal component. The loadings of this principal component (Fig. 2I) show high values in the spectral regions 8600–8300 cm−1, 6900–6600 cm−1, 5900–5200 cm−1 and 4900–4800 cm−1, which are associated with lipids, proteins and carbohydrates (see previous section), suggesting that different molecule classes are responsible for this differentiation. The scores of the first two PCs (Fig. 2A and 3A), which account for more than 80% of the explained variance, however, do not allow the samples of different geographical origins to be distinguished. This plot shows a broad distribution of the German samples, which could be explained by their comparatively high diversity. This diversity could be caused by the broad geographical origin of the samples within Germany, since they originated from 13 different federal states, or by the large number of varieties, since the German samples comprised 41 different varieties.
The results of the PCA applied to the dataset of the handheld NIR device are shown in Fig. 2B, D, F, H and J. Most of the German samples show positive scores on the first PC and negative scores on the second PC, while the non-German samples show the opposite (Fig. 2B). In contrast to the data from the benchtop device, the main variances of the dataset, adding up to about 79%, are therefore relevant for characterizing the German samples, while PC 3 does not contain any relevant information (Fig. 2D). Again, however, when the non-German samples are coloured according to their country of origin, it is evident that Egyptian samples in particular have scores similar to those of German samples, and that Greek and Spanish samples show quite similar spectra. Thus, the PCA of the handheld data also distinguishes between two groups, German/Egyptian and Spanish/Greek samples, rather than between the individual countries. The loadings show that the spectral regions 8440–7800 cm−1, 7390–6882 cm−1 and around 6300 cm−1 (Fig. 2F), assigned to aliphatic hydrocarbons and proteins (see previous section), and the region around 6600 cm−1 (Fig. 2H), assigned to proteins, have a great influence on the first and second PC, respectively. The handheld analyses therefore indicate that proteins in particular are relevant for differentiating between German/Egyptian and Spanish/Greek samples.
When using unsupervised PCA, no complete separation between all the strawberry samples of the different origins could be obtained, neither by using the data of the benchtop nor the handheld device. Hence, a supervised approach was applied for classification in the following sections.
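A PCA "with scaling" as described in Section 2.6 can be sketched with NumPy's SVD. The sketch assumes that scaling means unit-variance autoscaling of every variable (as in R's `prcomp(..., scale. = TRUE)`, whose results factoextra typically visualises); it is not the authors' R code, and the function name `pca_autoscaled` is hypothetical. Note that the sign of each PC is arbitrary, so scores and loadings may be mirrored relative to Fig. 2.

```python
import numpy as np


def pca_autoscaled(X, n_components=3):
    """PCA via SVD after mean-centring and scaling every variable to unit variance.

    Returns the scores (samples x PCs), the loadings (variables x PCs) and the
    fraction of total variance explained by each returned PC. Variables with
    zero variance must be removed first, otherwise the scaling divides by zero.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)  # s sorted in descending order
    scores = U[:, :n_components] * s[:n_components]   # equals Z @ loadings
    loadings = Vt[:n_components].T
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained


# Demo on random data shaped like a handheld dataset (samples x 16 channels).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(60, 16))
scores, loadings, explained = pca_autoscaled(X_demo)
print(scores.shape, loadings.shape, explained.round(3))
```

With autoscaling, each of the 744 benchtop bins or 16 handheld channels contributes equally to the total variance, so the loadings reflect correlation structure rather than raw absorbance magnitude.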
3.3. Differentiation of German and non-German strawberries
For differentiation of German and non-German samples, LDA was applied, and the results are shown in Table 2. The classification based on data of the benchtop device resulted in an accuracy of 91.9% with a correct classification rate of 93.6 and 89.9% for German and non-German samples, respectively. For the differentiation based on the data of the handheld device, an accuracy of 84.0% was obtained and 83.7 and 84.4% of the German and non-German samples were correctly classified, respectively. As expected, the benchtop device achieves a better accuracy for the identification of German samples, but the results of the much simpler and cheaper handheld device are also comparatively good.
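As a consistency check, the overall accuracy can be approximated as the sample-weighted mean of the class-wise correct-classification rates r_k with class sizes n_k. This is an approximation under that assumption: the reported values are means over all folds and repetitions, so small rounding differences are expected. For the handheld device (Table 2):

```latex
\mathrm{Acc} \approx \frac{\sum_k n_k\, r_k}{\sum_k n_k}
  = \frac{110 \times 83.7\,\% + 88 \times 84.4\,\%}{110 + 88}
  \approx 84.0\,\%
```

The same calculation with the benchtop rates (93.6% and 89.9%) gives about 92.0%, within rounding of the reported 91.9%.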
Table 2 Results of the classification for the differentiation of German and non-German samples analysed with the benchtop and handheld device. An LDA with 10-fold cross validation and 100 repetitions was applied and the mean values of all repetitions in percent are shown together with the total number of samples
Device | True class | Response: Germany [%] | Response: Other origin [%] | Number of samples
Benchtop | Germany | 93.6 | 6.4 | 110
Benchtop | Other origin | 10.1 | 89.9 | 88
Handheld | Germany | 83.7 | 16.3 | 110
Handheld | Other origin | 15.6 | 84.4 | 88
3.4. Differentiation by country of origin
Subsequently, the potential of NIR spectroscopy to distinguish specific geographical origins was analysed, and four-class LDA models were trained to distinguish between strawberry samples from the four countries. The results are shown in Table 3. The accuracy for the benchtop device was 80.5%. Egyptian and German samples were correctly classified in 79.2% and 93.3% of cases, respectively, while 16.4% of the Egyptian samples were wrongly assigned to the German group. Misclassifications into the German class thus mainly concerned Egyptian samples and were rare for the other two classes. One reason could be that German samples are overrepresented in the dataset, and in such cases machine learning methods often tend to assign samples incorrectly to the majority class. However, the fact that approximately 5% of the German samples were classified as Egyptian also indicates a similarity between the NIR spectra of these two classes, which was also observed in the PCA in the previous section. The correct classification rates of the Greek and Spanish samples were 42.4 and 69.1%, respectively, with frequent misclassifications of 49.2% and 24.2% between these two classes. This could be explained by a similar sample composition caused by similar climatic conditions in the Spanish and Greek growing regions, as the Spanish samples originated mainly from the Huelva region and the Greek samples from the Pyrgos region. Both regions are located near the coast at approximately 37° north latitude and are characterized by a very similar maritime climate. The similar climatic conditions in Spain and Greece go hand in hand with similar cultivation methods: on large farms, strawberries are grown using irrigation systems that are strongly geared towards large production volumes. In contrast, Germany and Egypt have positioned themselves as producers of high-quality strawberries, which in turn leads to similar growing methods in these two countries.
Several studies have already shown that similar exogenous influences such as climatic conditions can lead to similar plant metabolomes.24–26 For example, Hou et al. demonstrated that high temperatures change the glycerolipid composition of soybeans by reducing the proportion of polyunsaturated fatty acids.27–29
Table 3 Results of the classification for the differentiation by country of origin analysed with the benchtop and handheld device. An LDA with 10-fold cross validation and 100 repetitions was applied and the mean values of all repetitions in percent are shown together with the total number of samples
Panel | Device | True class | Response: Egypt [%] | Response: Germany [%] | Response: Greece [%] | Response: Spain [%] | Number of samples
A | Benchtop | Egypt | 79.2 | 16.4 | 0.0 | 4.4 | 22
A | Benchtop | Germany | 4.9 | 93.3 | 0.4 | 1.5 | 110
A | Benchtop | Greece | 0.5 | 7.9 | 42.4 | 49.2 | 21
A | Benchtop | Spain | 1.9 | 4.8 | 24.2 | 69.1 | 45
B | Handheld | Egypt | 52.7 | 36.1 | 0.0 | 11.2 | 22
B | Handheld | Germany | 10.3 | 82.7 | 0.5 | 6.5 | 110
B | Handheld | Greece | 0.0 | 0.6 | 55.3 | 44.1 | 21
B | Handheld | Spain | 5.5 | 16.0 | 21.3 | 57.1 | 45
The results from the handheld device (Table 3B) show a total classification accuracy of 71.8%, with 52.7, 82.7, 55.3 and 57.1% of the Egyptian, German, Greek and Spanish samples correctly assigned. The accuracies are thus clearly lower than for the data obtained by the benchtop device. The handheld results, however, show the same misclassification patterns, as German and Egyptian samples, and Greek and Spanish samples, were often confused with each other. It can therefore be concluded that the data from the handheld device are of limited use for differentiating strawberry samples by individual country of origin.
Overall, the comparatively low classification rates for individual countries of origin could be due to the variance of the individual groups not being sufficiently represented in the model. This may be particularly relevant for the very similar samples from Spain and Greece, and the results could improve if a larger number of samples were used to train the models. This is supported by the fact that German and non-German samples, a distinction based on a much larger number of samples per class, can be differentiated with higher accuracy. In addition, the misclassifications of 21.3 and 44.1% between the Greek and Spanish classes could be caused by the similar climatic and geographic conditions discussed above. The models were also checked using a completely independent test dataset, as fully independent test datasets are considered a high standard for method verification. The results of the independent test datasets for both NIR devices can be found in Tables S5 and S6 (ESI†). With the benchtop device, accuracies of 95.0% and 80.0% were achieved for the two- and four-class models, respectively; for the handheld NIR device, the accuracies were 85.0% and 70.0%. Because of the limited total number of samples, the test datasets comprise only a small number of samples, which is a limitation of this type of validation. These accuracies were comparable to those of the repeated cross-validation, demonstrating that both validation methods lead to similar results. The independent test datasets showed the same confusions between Greek and Spanish samples and between Egyptian and German samples that had already been observed and discussed for the cross-validation. Because the cross-validation results are more comparable, they were used for the class comparison.
3.5. Selection and interpretation of the relevant variables
In order to identify the variables relevant for the classification, an ANOVA was performed, and the results are reported in Tables S2 and S3 (ESI†). Fig. 4A and B show the averaged spectra with the respective variables marked that have a p-value of less than 0.05 and are therefore relevant for the classification of geographical origin.
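The per-variable ANOVA behind this variable selection can be sketched in NumPy by computing the one-way F statistic for every spectral variable. This is an illustration, not the mlr3filters implementation: p-values would follow from the F distribution with (k − 1, n − k) degrees of freedom, e.g. via `scipy.stats.f.sf(f_values, *dof)`, and since the text does not state which multiple-testing correction is applied, any specific choice (such as Bonferroni) would be an assumption. The function name `anova_f` is hypothetical.

```python
import numpy as np


def anova_f(X, y):
    """One-way ANOVA F statistic for every column (spectral variable) of X.

    Returns the F values and the degrees of freedom (k - 1, n - k), where k is
    the number of classes and n the number of samples.
    """
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    f_values = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_values, (k - 1, n - k)


# Worked example: two groups, one informative and one uninformative variable.
X_demo = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0],
                   [4.0, 4.0], [5.0, 5.0], [6.0, 3.0]])
y_demo = ["DE", "DE", "DE", "ES", "ES", "ES"]
f_values, dof = anova_f(X_demo, y_demo)
print(f_values, dof)  # F = 13.5 for the first variable, 0.0 for the second; dof (1, 4)
```

In the worked example, the first variable has group means of 2 and 5 (between-group sum of squares 13.5, within-group mean square 1.0), whereas both group means of the second variable equal the grand mean, so it carries no class information.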
Fig. 4 Mean NIR spectra obtained by the benchtop FT-NIR device (A) and the handheld NIR device (B), with the significant variables selected via ANOVA highlighted (orange). In addition, the spectra of the benchtop FT-NIR device (C) and the handheld NIR device (D) averaged by country are shown, with the relevant areas enlarged.
The benchtop FT-NIR spectra (Fig. 4A) show two relevant spectral regions, between 7250 and 6500 cm−1 and between 5250 and 4250 cm−1. That these regions are indeed relevant for differentiating geographical origins is confirmed by the spectra of the individual countries in Fig. 4C: the averaged spectra of German and Egyptian samples show higher intensities between 7250 and 6500 cm−1, while those of Spanish and Greek samples show higher intensities between 4750 and 4250 cm−1. Of these two relevant regions, the handheld device only covers the region around 7000 cm−1, which is also marked as relevant in Fig. 4C. The absence of information about the spectral range between 5250 and 4250 cm−1 in this device is probably the reason for its lower accuracy in differentiating the geographical origins.
To examine the class differences in more detail, representative variables of the different datasets were analysed. Boxplots of these variables are shown in Fig. 5 for the data obtained by the benchtop device. The variable at 4340.82 cm−1 (Fig. 5A), which could be assigned to aliphatic hydrocarbons, e.g., from lipids, generally shows lower intensities for German samples and higher intensities for non-German samples. However, the distinction is not clear-cut, as some German samples also show high intensities and some non-German samples low intensities. Considering the intensity values of the individual countries (Fig. 5D), it becomes clear that many Egyptian samples have low intensity values, just like the German samples. This figure also shows that the Spanish samples have slightly lower intensities than the Greek samples, although the intensities of these two groups overlap considerably. The boxplot of this variable thus shows a relatively clear distinction between the two groups of samples from Germany/Egypt and Spain/Greece, but also overlaps within these groups. The same can be observed for the variable at 6902.04 cm−1 (Fig. 5F), which was assigned to proteins: the former group shows comparatively low intensities and the latter group high intensities, consistent with the classification results in the previous sections and the results of the PCA in Section 3.2.
Fig. 5 Boxplots of the variables at 4340.82, 4881.62 and 6902.04 cm−1, which were selected for the differentiation of the different geographical origins of strawberries using the benchtop device. The respective intensities for the two-class (A–C) and four-class models (D–F) are shown.
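The per-class summaries behind boxplots such as those in Fig. 5 can be computed directly. The sketch below uses hypothetical helpers, assuming country labels `"Germany"`, `"Egypt"`, `"Spain"` and `"Greece"` and a samples × variables array. It picks the measured variable closest to a target wavenumber and returns the five-number summary per class. Plotting libraries often draw whiskers at 1.5 × IQR rather than at the minimum and maximum, so this only approximates what a rendered boxplot shows.

```python
import numpy as np


def nearest_variable(wavenumbers, target):
    """Index of the measured variable closest to the target wavenumber (cm^-1)."""
    wn = np.asarray(wavenumbers, dtype=float)
    return int(np.argmin(np.abs(wn - target)))


def to_two_class(countries):
    """Collapse the four country labels into German / non-German classes."""
    return ["German" if c == "Germany" else "non-German" for c in countries]


def five_number_summary(spectra, labels, wavenumbers, target):
    """(min, Q1, median, Q3, max) of one spectral variable for each class."""
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    j = nearest_variable(wavenumbers, target)
    summary = {}
    for cls in np.unique(labels):
        values = spectra[labels == cls, j]
        q = np.percentile(values, [0, 25, 50, 75, 100])
        summary[str(cls)] = tuple(float(v) for v in q)
    return summary
```

Calling `five_number_summary(X, countries, wn, 4340.82)` gives the four-class view (as in Fig. 5D); passing `to_two_class(countries)` instead gives the two-class view (as in Fig. 5A).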
For other foods, the protein band has already been shown to be relevant for classification. For example, an accuracy of 90.0% was achieved for the origin classification of grain maize based solely on the protein band.30 This is possible because samples from individual countries can differ significantly in their protein and fat content. For strawberries, it has already been shown that protein concentrations differ between cultivation methods, while the variety has only a limited influence. As cultivation methods vary between countries and the peak season falls in different months, this could affect the content of different proteins and fats. In addition, the NaCl content of the soil can affect amino acid metabolism: in strawberries, increased NaCl stress has been associated with a lower protein content. The soil therefore influences amino acid metabolism and protein profiles.31
For the data obtained by the handheld device, boxplots of the two variables with the lowest p-values, at 5938 cm−1 and 6734 cm−1, are shown in Fig. 6A–D. These variables, which could be assigned to proteins, also allow a comparatively clear distinction between the Egyptian/German and the Spanish/Greek samples. The variable at 5938 cm−1 (Fig. 6A) shows relatively high values for the first group and relatively low values for the second, while the variable at 6734 cm−1 (Fig. 6B) shows the opposite. These variables make the comparatively successful separation of German and non-German samples with the handheld device plausible.
Fig. 6 Boxplots of the variables at 5938 and 6734 cm−1, which were selected for the differentiation of the different geographical origins of strawberries using the handheld device. The respective intensities for the two-class (A and B) and four-class models (C and D) are shown.
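Ranking variables by p-value, as done for the handheld data, can be sketched as follows. The test behind the reported p-values is not stated in this section, so Welch's two-sample t-test (German vs non-German) via `scipy.stats.ttest_ind` is an assumption, and `rank_variables_by_pvalue` is a hypothetical helper. With hundreds of spectral variables and no multiple-testing correction, the raw p-values serve here only as a ranking criterion, not as evidence of significance.

```python
import numpy as np
from scipy import stats


def rank_variables_by_pvalue(spectra, labels, wavenumbers, positive="Germany", k=2):
    """Return the k (wavenumber, p-value) pairs with the smallest p-values
    from Welch's t-test comparing the `positive` class against all others."""
    X = np.asarray(spectra, dtype=float)
    is_pos = np.asarray(labels) == positive
    # Column-wise test: one t statistic and one p-value per spectral variable.
    _, p = stats.ttest_ind(X[is_pos], X[~is_pos], axis=0, equal_var=False)
    order = np.argsort(p)[:k]  # NaN p-values (constant variables) sort last
    wn = np.asarray(wavenumbers, dtype=float)
    return [(float(wn[i]), float(p[i])) for i in order]
```

For a four-class view, a one-way ANOVA per variable (`scipy.stats.f_oneway`) would be the analogous choice; either way, the top-ranked variables are only candidates to inspect with boxplots.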
Since the variables of the benchtop data do not show a clearer distinction between the classes, the better performance of the benchtop device seems to result mainly from the combination of several variables, including those from the spectral range not covered by the handheld device. It should be noted, however, that the dataset has a limitation: only 12 samples from Germany were available for 2021, so a classification driven by the harvest year cannot be completely ruled out. To minimize this influence, additional samples from all four countries, including Germany, were collected in the remaining three years. Another parameter that could influence the classification is organic cultivation. Only 37 samples from Spain, produced for the German market, were organic, whereas there were no organic samples from Egypt and Greece. Organic food is of great importance in Germany and is therefore part of a standard market sample set. Consequently, it cannot be ruled out that samples were categorized as German or Spanish because of their organic cultivation. However, the low rate of confusion between German and Spanish samples suggests that this effect is not present.
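The argument that the benchtop device gains mainly from combining variables, rather than from any single clearly separating variable, can be illustrated with a two-class Fisher linear discriminant. The NumPy sketch below is not the study's pipeline. It uses synthetic data in which one variable alone separates the classes poorly because it shares a large common fluctuation with a second variable, while the combination of both separates them almost perfectly. For real NIR spectra, which have far more variables than samples, LDA is usually preceded by PCA or variable selection; the small ridge term is only a numerical safeguard, and the printed accuracies are resubstitution values, not validated ones.

```python
import numpy as np


def fit_fisher_lda(X, y):
    """Two-class Fisher LDA: w = S_w^-1 (m1 - m0), with the decision
    threshold at the midpoint of the projected class means."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    X0, X1 = X[~y], X[y]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix (atleast_2d handles one variable).
    s_w = np.atleast_2d(
        np.cov(X0, rowvar=False) * (len(X0) - 1)
        + np.cov(X1, rowvar=False) * (len(X1) - 1)
    )
    w = np.linalg.solve(s_w + 1e-9 * np.eye(s_w.shape[0]), m1 - m0)
    threshold = 0.5 * (m0 + m1) @ w
    return w, threshold


def predict(X, w, threshold):
    """True for samples projected above the threshold (class y == True)."""
    return np.asarray(X, dtype=float) @ w > threshold


# Synthetic example: a shared fluctuation z masks the class shift in x1.
rng = np.random.default_rng(1)
n = 200
y = np.arange(n) % 2 == 0
z = rng.normal(size=n)
x1 = z + 0.5 * y + 0.05 * rng.normal(size=n)
x2 = z + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])

for name, cols in [("x1 only", [0]), ("x1 + x2", [0, 1])]:
    w, t = fit_fisher_lda(X[:, cols], y)
    acc = np.mean(predict(X[:, cols], w, t) == y)
    print(f"{name}: training accuracy {acc:.2f}")
```

Here the discriminant effectively learns the difference x1 − x2, which cancels the shared fluctuation; no single boxplot of x1 or x2 would reveal this.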
The variety is known for only about half of the samples, the most common being Clery (12 samples), Asia (7) and Fortuna (6). Differences in variety could also influence the classification: Clery and Asia, for example, occur only in German samples, while Fortuna occurs in Egyptian, Greek and Spanish samples. However, varieties are adapted to local climatic conditions and harvest times, so samples of the same varieties cannot be obtained from all regions. The varieties are therefore part of the regional authenticity of the samples; an influence of the varieties on the classification cannot be excluded and can only be minimized by including as many varieties as possible. Comparing the origin classification with results for other water-rich matrices reveals some similarities. For example, the origin of asparagus was classified with an accuracy of 88.5% across five classes. This model performs better than the four-class model presented here (80.5%), but asparagus is a more homogeneous matrix: only a few varieties are grown commercially and cultivation methods are very similar between countries, which leads to lower variance and therefore better classification.
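A simple check for the confounding factors discussed above (harvest year, organic cultivation and variety) is to list which factor levels occur in only one country. The Python sketch below uses a hypothetical helper, and the example records are illustrative placeholders loosely mirroring the varieties named in the text; they are not the study's sample list.

```python
from collections import defaultdict


def origin_specific_levels(records):
    """Given (origin, level) pairs, return {level: origin} for every level
    that occurs in exactly one origin. Unknown levels (None) are skipped."""
    origins_per_level = defaultdict(set)
    for origin, level in records:
        if level is not None:
            origins_per_level[level].add(origin)
    return {
        level: next(iter(origins))
        for level, origins in origins_per_level.items()
        if len(origins) == 1
    }


# Illustrative records only; they are not the study's sample list.
records = [
    ("Germany", "Clery"), ("Germany", "Asia"), ("Germany", None),
    ("Egypt", "Fortuna"), ("Greece", "Fortuna"), ("Spain", "Fortuna"),
    ("Spain", None),
]
print(origin_specific_levels(records))  # {'Clery': 'Germany', 'Asia': 'Germany'}
```

The same function applies to harvest years or cultivation type, e.g. records of `(country, year)`: a year present for only one country flags the kind of confounding that the additional sampling in the remaining three years was meant to reduce.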
The classification of Korean and non-Korean soybeans was also successful, with an accuracy of 95.9%, comparable to the 91.9% achieved here for German and non-German strawberries. Furthermore, the protein and fat bands of the NIR spectrum were identified as relevant for the classification of soybeans. Freeze-drying was also applied to less water-rich foods such as soybeans so that the relevant protein and fat bands were not obscured by water bands. Thus, the sample processing, the results and the relevant wavelengths identified here are consistent with previous studies. Other established methods for determining the origin of food, such as LC and GC coupled with mass spectrometry, have a higher resolution and can therefore potentially distinguish better between countries.32,33 However, these methods involve greater effort, for example a higher consumption of chemicals.34 For reasons of sustainability, the shift towards chemical-free analysis such as FT-NIR is therefore important. The NIR method developed here could be used as a rapid screening method in routine laboratories to identify suspicious samples, as well as for incoming goods inspection, where a fast and environmentally friendly method is particularly valuable.
Overall, the results show that a handheld device measuring only a few variables is sufficient for a rough differentiation between German and non-German strawberries in a quick initial on-site analysis. For a more precise classification of the geographical origin of strawberries, however, a benchtop device with higher resolution and a wider spectral range should be used.
4. Conclusion
In this study, the geographical origin of strawberries was classified by NIR spectroscopy using a benchtop and a handheld device. German and non-German samples were separated with accuracies of 91.9% and 84.0% for the benchtop and handheld device, respectively, while the differentiation between individual countries of origin reached accuracies of 80.5% and 71.8%, respectively. A more detailed analysis of the relevant variables showed that the better performance of the benchtop instrument was due both to its higher resolution and to the larger spectral range analysed. The handheld device offers lower acquisition costs and laboratory-independent use, whereas the benchtop NIR device proved superior for more complex classification tasks. Nevertheless, our results indicate that the handheld device can be used for a quick and rough on-site analysis of whether strawberries originate from Germany or not.
Author contributions
Conceptualization, J. B., F. S. and M. F.; methodology, J. B. and F. S.; software, J. B. and F. S.; validation, J. B. and F. S.; formal analysis, J. B., F. S., K. B. and S. S.; investigation, J. B., F. S. and M. C.; resources, S. S. and M. F.; data curation, J. B. and F. S.; writing – original draft preparation, J. B. and F. S.; writing – review & editing, K. B., S. S., M. C. and M. F.; visualization, J. B. and F. S.; supervision, M. F.; project administration, M. F.; funding acquisition, M. F. and S. S.
Data availability
The NIR spectroscopy datasets created as part of the current study are available in the Zenodo repository at https://doi.org/10.5281/zenodo.13331169.
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This IGF Project (01IF22909N) of the research association FEI, Godesberger Allee 125, 53175 Bonn, is/was funded as part of the program for promoting joint industrial research (IGF) by the Federal Ministry for Economic Affairs and Climate Protection based on a resolution of the German Bundestag. The authors thank Robin Dammann, Lina Cvancar, Nils Wax, Marie-Sophie Müller and Christian Ahlers for their help with sampling and helpful discussions.
Notes and references
1. H. Lösel, J. Brockelt, F. Gärber, J. Teipel, T. Kuballa, S. Seifert and M. Fischer, Comparative Analysis of LC-ESI-IM-qToF-MS and FT-NIR Spectroscopy Approaches for the Authentication of Organic and Conventional Eggs, Metabolites, 2023, 13, 882.
2. J. Müller-Maatsch and S. M. van Ruth, Handheld devices for food authentication and their applications: A review, Foods, 2021, 10, 2901.
3. A. Giraudo, S. Grassi, F. Savorani, G. Gavoci, E. Casiraghi and F. Geobaldo, Determination of the geographical origin of green coffee beans using NIR spectroscopy and multivariate data analysis, Food Control, 2019, 99, 137–145.
4. B. Richter, M. Rurik, S. Gurk, O. Kohlbacher and M. Fischer, Food monitoring: Screening of the geographical origin of white asparagus using FT-NIR and machine learning, Food Control, 2019, 104, 318–325.
5. J. H. Lee, J. M. An, H. J. Kim, H. C. Shin, S. H. Hur and S. H. Lee, Rapid discrimination of the country origin of soybeans based on FT-NIR spectroscopy and data expansion, Food Analytical Methods, 2022, 15, 3322–3333.
6. N. Shakiba, A. Gerdes, N. Holz, S. Wenck, R. Bachmann, T. Schneider, S. Seifert, M. Fischer and T. Hackl, Determination of the geographical origin of hazelnuts (Corylus avellana L.) by Near-Infrared spectroscopy (NIR) and a Low-Level Fusion with nuclear magnetic resonance (NMR), Microchem. J., 2022, 174, 107066.
7. M. Arndt, M. Rurik, A. Drees, K. Bigdowski, O. Kohlbacher and M. Fischer, Comparison of different sample preparation techniques for NIR screening and their influence on the geographical origin determination of almonds (Prunus dulcis MILL.), Food Control, 2020, 115, 107302.
8. H. Lösel, N. Shakiba, S. Wenck, P. Le Tan, M. Arndt, S. Seifert, T. Hackl and M. Fischer, Impact of Freeze-Drying on the determination of the geographical origin of almonds (Prunus dulcis Mill.) by near-infrared (NIR) spectroscopy, Food Analytical Methods, 2022, 15, 2847–2857.
9. L. Wang, D.-W. Sun, H. Pu and J.-H. Cheng, Quality analysis, classification, and authentication of liquid foods by near-infrared spectroscopy: A review of recent research developments, Crit. Rev. Food Sci. Nutr., 2017, 57, 1524–1538.
10. A. Guelpa, F. Marini, A. Du Plessis, R. Slabbert and M. Manley, Verification of authenticity and fraud detection in South African honey using NIR spectroscopy, Food Control, 2017, 73, 1388–1396.
11. C. McVey, U. Gordon, S. A. Haughey and C. T. Elliott, Assessment of the analytical performance of three near-infrared spectroscopy instruments (benchtop, handheld and portable) through the investigation of coriander seed authenticity, Foods, 2021, 10, 956.
12. C. Kappacher, B. Trübenbacher, K. Losso, M. Rainer, G. K. Bonn and C. W. Huck, Portable vs. Benchtop NIR-sensor technology for classification and quality evaluation of black truffle, Molecules, 2022, 27, 589.
13. M. Creydt and M. Fischer, Food authentication: Truffle species classification by non-targeted lipidomics analyses using mass spectrometry assisted by ion mobility separation, Mol. Omics, 2022, 18, 616–626.
14. K. Losso, H. Wörz, C. Kappacher, S. Huber, T. Jakschitz, M. Rainer and G. K. Bonn, Rapid quality control of black truffles using Direct Analysis in Real Time Mass Spectrometry and Hydrophilic Interaction Liquid Chromatography Mass Spectrometry, Food Chem., 2023, 403, 134418.
15. T. Segelke, S. Schelm, C. Ahlers and M. Fischer, Food authentication: Truffle (Tuber spp.) species differentiation by FT-NIR and chemometrics, Foods, 2020, 9, 922.
16. M. Hernández-Jiménez, I. Revilla, A. M. Vivar-Quintana, J. Grabska, K. B. Beć and C. W. Huck, Performance of benchtop and portable spectroscopy equipment for discriminating Iberian ham according to breed, Curr. Res. Food Sci., 2024, 100675.
17. Statistisches Bundesamt, Import von Erdbeeren [Import of strawberries], https://www-genesis.destatis.de/genesis/online (accessed 31 March 2024).
18. H. Cen and Y. He, Theory and application of near infrared reflectance spectroscopy in determination of food quality, Trends Food Sci. Technol., 2007, 18, 72–83.
19. M. C. Pasikatan, J. L. Steele, C. K. Spillman and E. Haque, Near infrared reflectance spectroscopy for online particle size analysis of powders and ground materials, J. Near Infrared Spectrosc., 2001, 9, 153–164.
20. A. Drees, J. Brockelt, L. Cvancar and M. Fischer, Rapid determination of the shell content in cocoa products using FT-NIR spectroscopy and chemometrics, Talanta, 2023, 256, 124310.
21. H. Yan, M. de Gea Neves, I. Noda, G. M. Guedes, A. C. Silva Ferreira, F. Pfeifer, X. Chen and H. W. Siesler, Handheld near-infrared spectroscopy: State-of-the-art instrumentation and applications in material identification, food authentication, and environmental investigations, Chemosensors, 2023, 11, 272.
22. M. Arndt, A. Drees, C. Ahlers and M. Fischer, Determination of the geographical origin of walnuts (Juglans regia L.) using near-infrared spectroscopy and chemometrics, Foods, 2020, 9, 1860.
23. J. Workman and L. Weyer, Practical guide and spectral atlas for interpretive near-infrared, CRC, 2012.
24. Deutscher Wetterdienst, Climate Table of Athens, Attica, Greece, https://www.dwd.de/DWD/klima/beratung/ak/ak_167160_kt.pdf (accessed 19 March 2024).
25. Deutscher Wetterdienst, Climate Table of Huelva, Andalusia, Spain, https://www.dwd.de/DWD/klima/beratung/ak/ak_083830_kt.pdf (accessed 19 March 2024).
26. D.-M. Ma, S. V. S. Gandra, R. Manoharlal, C. La Hovary and D.-Y. Xie, Untargeted metabolomics of Nicotiana tabacum grown in United States and India characterizes the association of plant metabolomes with natural climate and geography, Front. Plant Sci., 2019, 10, 470291.
27. R. F. Wilson, Seed composition, in Soybeans: improvement, production, and uses, 2004, vol. 16, pp. 621–677.
28. G. Hou, G. R. Ablett, K. P. Pauls and I. Rajcan, Environmental effects on fatty acid levels in soybean seed oil, J. Am. Oil Chem. Soc., 2006, 83, 759–763.
29. R. G. Upchurch, Fatty acid unsaturation, mobilization, and regulation in the response of plants to stress, Biotechnol. Lett., 2008, 30, 967–977.
30. D. Schütz, J. Riedl, E. Achten and M. Fischer, Fourier-transform near-infrared spectroscopy as a fast screening tool for the verification of the geographical origin of grain maize (Zea mays L.), Food Control, 2022, 136, 108892.
31. A. J. Keutgen and E. Pawelzik, Quality and nutritional value of strawberry fruit under long term salt stress, Food Chem., 2008, 107, 1413–1420.
32. P. Zhong, X. Wei, X. Li, X. Wei, S. Wu, W. Huang, A. Koidis, Z. Xu and H. Lei, Untargeted metabolomics by liquid chromatography-mass spectrometry for food authentication: A review, Compr. Rev. Food Sci. Food Saf., 2022, 21, 2455–2488.
33. G. Sammarco, D. Bardin, F. Quaini, C. Dall'Asta, J. Christmann, P. Weller and M. Suman, A geographical origin assessment of Italian hazelnuts: Gas chromatography-ion mobility spectrometry coupled with multivariate statistical analysis and data fusion approach, Food Res. Int., 2023, 171, 113085.
34. M. Tobiszewski, Metrics for green analytical chemistry, Anal. Methods, 2016, 8, 2993–2999.
This journal is © The Royal Society of Chemistry 2025