Open Access Article
Onkar Sarma a and Kavya Dashora* ab
a Agricultural-Nanotechnology Lab, Center for Rural Development and Technology, IIT Delhi, Delhi, 110016, India. E-mail: kdashora@rdat.iitd.ac.in
b Yardi School of Artificial Intelligence (Yardi ScAI), IIT Delhi, India
First published on 2nd February 2026
Blending various types of teas with complementary biochemical profiles offers a promising path for enhancing the sensory quality of tea blends. Traditionally, blending is done by professional tea tasters, which makes it dependent on human sensitivity, time-consuming and difficult to scale. Thus, the prediction of tea's sensory quality and identification of its biochemical drivers by reducing the cost of experimentation and expediting the blending process could have tremendous techno-economic value. In this study, we curated a meta-dataset from our own experiments involving blends of four major varieties of Assam tea. Their key biochemical composition was analyzed, and sensory score was estimated using a lexicon-based descriptive technique with a semi-trained consumer-focused panel. Combining it with multiple machine learning models, we showed that while an increase in the (+)-C, TP, TR, and pH values led to a lower sensory score, higher protein/TP, TSS, organic acid, CAF, CAF/TP and TF/TR values resulted in an increased sensory score. Further, we adopted a game theory-based model agnostic interpretation technique called SHAP (Shapley Additive exPlanations) to identify features contributing to higher sensory scores and their relative significance. The key quality indicators were found to be the (+)-C, protein/TP, protein, TP, TSS, TR, citric acid, malic acid, and ascorbic acid contents. By integrating a consumer-centric evaluation with interpretable machine learning, we demonstrated how meta dataset, cutting-edge machine learning techniques, and model interpretability methods could be seamlessly integrated to reduce the number of experiments, minimize dependency on expert intuition, and enable automated quality assessment for developing superior tea blends.
Sustainability spotlight
This study addresses the critical need for objective and efficient sensory evaluation in the tea industry by analyzing key biochemical features alongside consumer-driven sensory scores from a semi-trained panel. By leveraging artificial intelligence, this study reveals how each biochemical feature influences quality characteristics, thereby enabling manufacturers to optimize tea blends for superior taste. Adopting a semi-trained consumer panel as an alternative to expert tasters significantly reduces costs and time while analyzing consumers' perspectives about the product at the production level. This sustainable advancement aligns with UN SDGs 9 (industry, innovation, and infrastructure), 12 (responsible consumption and production), and 8 (decent work and economic growth), promoting innovation, resource efficiency, and inclusive economic practices.
Tea blending has long been a practice in the tea industry to develop products with superior sensory qualities. Tea blending is performed by mixing two or more varieties and grades of tea from different regions or batches to maintain its sensory quality as per consumer demand.6 Most studies have investigated the blending of black tea variants belonging to different grades. For instance, Xia fused images and spectral features to predict the sensory characteristics of different blends of black tea.7 Ling employed a digital blending strategy in which NIR spectra of different blends of black tea were used to predict the sensory score.8 Tie addressed the blending problem by developing hierarchical spatial clustering-based algorithms that achieved a universal, low-cost, and efficient blending program.9 Turgut compared three NIR equipment for calibrating biochemicals with spectral data for the sensory prediction of black tea.10 Recent studies, such as those by Chen and co-authors, have experimented with blending fresh tea leaves of three clonal varieties to manufacture black tea with improved flavor and aroma.11 Most of these studies have investigated a particular tea type, and there is a lack of reports on blends made from differently processed teas, such as green tea, white tea, oolong tea, and black tea. The underlying biochemical properties responsible for each tea type's contribution to a blend's sensory attributes have not been explored thoroughly. Moreover, previous studies that employed machine learning (ML) for sensory prediction primarily relied on trained sensory panels to generate sensory data. Although such expert panels offer consistency and domain knowledge, they may not accurately reflect the preferences of everyday consumers. In contrast, an alternative and more scalable approach would be to train consumers directly to form a semi-trained panel and analyze the sensory profile of tea infusions using the Check-All-That-Apply method. 
This Semi Trained-Check-All-That-Apply (ST-CATA) approach is important in the context of the tea industry, where market success is largely influenced by consumer preferences, acceptance, and purchase behaviors. Accurately predicting the sensory quality and identifying its biochemical drivers can thus offer significant techno-economic benefits by reducing the experimental costs and accelerating the development of superior tea blends. However, this remains a challenge due to the inherent subjectivity of sensory evaluation and the complex, non-linear relationship between biochemical composition and human perception. To address this and make the process more objective, replicable, and aligned with market needs, there is a pressing need for a cost-effective, consumer-inclusive methodology that enables the automated assessment of tea quality.
This study aims to explore the effects of incorporating diverse aroma and flavor profiles from differently processed teas into a product and to assess consumer preferences. Various blends of differently processed teas were prepared and analyzed for key biochemical features. Then, a lexicon-based descriptive sensory evaluation technique was used, employing a semi-trained panel to assess the blend's sensory score. Finally, the ML models were trained to predict the sensory grades of the tea blends. This study aims to address the following key questions:
• What are the key biochemical features and their order of relevance in influencing tea sensory quality?
• How can interpretable ML be integrated with sensory data obtained from a semi-trained panel to make tea sensory assessment more objective?
• What is the effect of differently processed teas on the overall sensory quality of a tea blend?
This research will aid tea producers in choosing blend ingredients and offer a scientific framework for consumers to identify high-quality tea. Furthermore, in the context of tea research and development, understanding the connections between biochemical content and the sensory qualities of tea can aid in developing superior blends and optimizing processing strategies to elevate the overall quality of processed tea.
Four varieties of Assam tea (Camellia assamica), including green tea, white tea, oolong tea, and black tea (CTC), were procured from the Tea Auction Centre, Guwahati, Assam. These tea varieties included multiple Tocklai vegetative cultivars (developed at the Tocklai Tea Research Institute, Assam, India).
| Sample code | Green tea (g) | White tea (g) | Oolong tea (g) | Black tea (g) |
|---|---|---|---|---|
| 01 | 0.5 | 0.5 | 2 | 1 |
| 02 | 1.5 | 0.5 | 1.5 | 0.5 |
| 03 | 1 | 2 | 0.5 | 0.5 |
| 04 | 1.5 | 1.5 | 0.5 | 0.5 |
| 05 | 1 | 0.5 | 2 | 0.5 |
| 06 | 1 | 0.5 | 1 | 1.5 |
| 07 | 0.5 | 1 | 1 | 1.5 |
| 08 | 1 | 1.5 | 1 | 0.5 |
| 09 | 0.5 | 0.5 | 0.5 | 2.5 |
| 10 | 1.5 | 0.5 | 0.5 | 1.5 |
| 11 | 0.5 | 1 | 0.5 | 2 |
| 12 | 1.5 | 1 | 0.5 | 1 |
| 13 | 0.5 | 0.5 | 1.5 | 1.5 |
| 14 | 1 | 0.5 | 1.5 | 1 |
| 15 | 0.5 | 2 | 0.5 | 1 |
| 16 | 1 | 1 | 1 | 1 |
| 17 | 0.5 | 1.5 | 1.5 | 0.5 |
| 18 | 0.5 | 0.5 | 1 | 2 |
| 19 | 1 | 0.5 | 0.5 | 2 |
| 20 | 1 | 1.5 | 0.5 | 1 |
| 21 | 1 | 1 | 0.5 | 1.5 |
| 22 | 0.5 | 1 | 2 | 0.5 |
| 23 | 0.5 | 2 | 1 | 0.5 |
| 24 | 0.5 | 1.5 | 1 | 1 |
| 25 | 2 | 0.5 | 1 | 0.5 |
| 26 | 2 | 1 | 0.5 | 0.5 |
| 27 | 1.5 | 1 | 1 | 0.5 |
| 28 | 2 | 0.5 | 0.5 | 1 |
| 29 | 1 | 1 | 1.5 | 0.5 |
| 30 | 1.5 | 0.5 | 1 | 1 |
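The masses listed in the table all appear to lie on a 0.5 g grid, with every tea present at 0.5 g or more and each blend totaling 4 g. The following minimal Python sketch enumerates every candidate blend on such a grid. The step, minimum, and total are inferred from the tabulated values and are assumptions, not a stated design rule, and `candidate_blends` is a hypothetical helper name.

```python
from itertools import product

def candidate_blends(total_g=4.0, step_g=0.5, min_g=0.5, n_teas=4):
    """Enumerate blends on a fixed-step grid (assumed design, inferred from the table)."""
    # Work in integer "steps" to avoid floating-point comparison issues.
    total_steps = round(total_g / step_g)
    min_steps = round(min_g / step_g)
    blends = []
    for combo in product(range(min_steps, total_steps + 1), repeat=n_teas):
        if sum(combo) == total_steps:
            blends.append(tuple(k * step_g for k in combo))
    return blends

blends = candidate_blends()
print(len(blends))
# Sample 01 from the table: green, white, oolong, black (g)
print((0.5, 0.5, 2.0, 1.0) in blends)
```

Under these assumptions the grid contains 35 candidate blends, of which the table lists 30.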
Samples were prepared for sensory analysis based on ISO standards for brewing.22,23 Each blend (0.4 g) was packed in a disposable fiber tea bag and placed in a porcelain cup maintained at 50 °C, to which 100 mL of boiled water was added. The total sensory score, out of 100, was calculated from five sensory-evaluation factors using the formula of Xiong:24
Sensory score = (tea appearance × 25%) + (infusion color × 10%) + (aroma × 25%) + (taste × 30%) + (solubility × 10%)
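The weighted sum above can be sketched in Python as follows. The weights come from the formula. The assumption that each factor is scored on a 0–100 scale, the dictionary keys, and the example panel scores are illustrative and not taken from the study.

```python
# Weights taken from the sensory score formula; they sum to 100%.
WEIGHTS = {
    "tea_appearance": 0.25,
    "infusion_color": 0.10,
    "aroma": 0.25,
    "taste": 0.30,
    "solubility": 0.10,
}

def sensory_score(factor_scores):
    """Weighted total; each factor is assumed to be scored on a 0-100 scale."""
    missing = set(WEIGHTS) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)

# Hypothetical panel scores for one blend (illustrative values only).
example = {
    "tea_appearance": 80,
    "infusion_color": 70,
    "aroma": 85,
    "taste": 90,
    "solubility": 60,
}
print(round(sensory_score(example), 2))
```

Because the weights sum to one, the total stays on the same 0–100 scale as the individual factors.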
Four ML models, namely extreme gradient boosting (XGB), support vector machine (SVM), logistic regression (LR), and multilayer perceptron (MLP), were calibrated and compared for their ability to classify the samples into the categories obtained from K-medoids clustering. The methodology involved in the modelling and interpretation of the results is depicted in Fig. 1.
Fig. 1 Flow diagram depicting the methodology of dataset building and modelling for tea quality grade prediction.
XGB builds an ensemble of decision trees sequentially, with each new tree fitted to correct the errors of the trees before it. Repeating this process allows XGB to capture the intricate relationships between quality metrics and progressively improve its predictions. LR is a supervised learning method typically used for classification; it uses a sigmoid function to map the output of a linear model to the probability of a specific category. SVM is suited to classification problems and works by finding an optimal separating hyperplane in a high-dimensional space.33 MLP is a type of artificial neural network consisting of at least one hidden layer; it is suitable for modelling complex, nonlinear relationships and can handle noisy and diverse data.
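As an illustration of the sigmoid mapping used by LR, the sketch below converts a linear score into a class probability. It covers only the prediction step, not model fitting, and the coefficients, bias, and feature values are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_high_grade_probability(features, weights, bias):
    """Probability of the positive class from a linear score (prediction step only)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and scaled feature values, for illustration only.
p = predict_high_grade_probability([0.6, 0.2], weights=[1.5, -2.0], bias=0.1)
print(round(p, 3))
```

A sample would then be assigned to the high-grade class when this probability exceeds a chosen threshold, commonly 0.5.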
Hyperparameters for each model, such as the number of trees, learning rate and maximum depth of trees for XGB, kernel and penalty parameters for SVM, solver, penalty, and iteration number for LR, were optimized using Bayesian optimization, which iteratively assessed various parameter combinations. Five-fold cross-validation was employed to obtain the final performance metrics for model evaluation. The performance of each model was evaluated based on four aspects: accuracy, precision, recall, and F1 score. The model with the best performance was chosen for further analysis. The mathematical definitions of these performance metrics are as follows:
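$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \tag{1}$$

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \tag{2}$$

$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{3}$$

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively (in these equations only, TP does not refer to polyphenol content).
The F1 score is the harmonic mean of the precision and recall metrics:

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$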
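The four metrics can be computed from binary confusion counts as in the sketch below. The counts are hypothetical and are not results from this study. Here `tp` means true positives, not polyphenol content.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for illustration only (not results from this study).
m = classification_metrics(tp=8, tn=12, fp=2, fn=3)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The guards against zero denominators return 0.0 by convention; other conventions exist for the undefined cases.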
The violin plots (Fig. 2) present the distribution pattern of L-theanine, protein, TSS, TP, C, CAF, total organic acid, pH, TF, and TR in both grades. The plot width shows the density of data along with a comparative demonstration of data distribution across both grades. The embedded box plot depicts minimum, first quartile (Q1), mean, third quartile (Q3), and maximum values. Points appearing outside the box were designated as “outliers”.
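The summary statistics behind such a box plot can be sketched with the Python standard library. The quartile interpolation method and the 1.5 × IQR fence used here to flag outliers are assumptions, since the source does not state which conventions were used, and the data values are hypothetical.

```python
import statistics

def box_summary(values):
    """Quartiles, mean, extremes and 1.5*IQR outliers (Tukey fences, assumed)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
        "iqr": iqr,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical L-theanine contents (mg per g DW), for illustration only.
summary = box_summary([1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 3.9])
print(summary["q1"], summary["q3"], summary["outliers"])
```

Reporting the IQR together with the mean, as done in the text that follows, gives both the spread of the central half of the data and its central tendency.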
There was a difference in the distribution pattern of L-theanine content among the low-grade and high-grade samples (Fig. 2a). Although the mean content of the two groups was similar, the high-grade samples exhibited a wider and higher distribution range. In high-grade samples, L-theanine was in the interquartile range (IQR) of 1.64–2.26 mg g−1 dry weight (DW) and an average of 1.96 mg g−1 DW. L-Theanine was in the IQR of 1.77–2.16 mg g−1 DW for the low-grade samples with an average of 1.95 mg g−1 DW. A similar range of 2–5 mg g−1 of L-theanine content is noted in commercial tea, and it further varies based on the growing location and processing conditions of the tea.34,35
The protein and TSS contents in the high-grade samples were higher than those of the low-grade samples. The IQR for protein was 0.07–0.091 mg g−1 DW with an average content of 0.086 mg g−1 DW in high-grade samples and 0.066–0.079 mg g−1 DW with an average content of 0.072 mg g−1 DW in low-grade samples (Fig. 2b). TSS was in the IQR of 2.172–2.877 mg g−1 DW with an average of 2.574 mg g−1 DW in high-grade samples and in the IQR of 1.867–2.709 mg g−1 DW with an average of 2.246 mg g−1 DW in low-grade samples (Fig. 2c). A higher content of L-theanine, TSS, and protein indicates a fresh, mellow, and sweet taste, which improves the sensory quality of tea. Studies have reported that high-grade tea contains more soluble sugar and L-theanine, which is in line with this study's findings.4,25 In our study, high-grade samples (samples 21, 23, and 24), which contained higher proportions of white tea, showed L-theanine levels of up to 2–2.7 mg g−1. The higher amount of L-theanine in the high-grade samples was likely contributed by amino acid-rich white tea. Similarly, higher TSS and protein contents may have counterbalanced the aversive taste and ultimately improved the taste score of the high-grade samples.
There was an opposite trend observed for TP and C content among the samples. TP was in the IQR of 36.3–44.5 mg GAE per g DW (average of 39.9 mg GAE per g) in high-grade samples and 42.4–45.9 mg GAE per g (average of 45 mg GAE per g) in low-grade samples (Fig. 2d). C varied with an IQR of 0.17–0.26% (average of 0.23%) in the low-grade sample and an IQR of 0.12–0.18% with an average of 0.16% in the high-grade sample (Fig. 2e). The astringent and bitter taste of green tea generally comes from its higher polyphenol and C content. Therefore, the higher TP and C in the lower-grade samples were due to the higher proportions of green tea compared to oolong and black tea. Nonetheless, the distribution of C in high-grade samples is not normal and suggests the potential contribution of a higher C quantity by white tea. CAF in low-grade samples was in the IQR of 6.3–8.7% (average of 7.5%) and 6.6–7.8% (average of 7.3%) in high-grade samples. There was not much variation in the CAF content among both the grades (Fig. 2f). CAF content mainly depends on plant maturity and physiological processes. Therefore, this feature is not affected by the blending process and maintains the ideal range for the tea to have favorable sensory characteristics.
The contents of citric acid, malic acid, ascorbic acid, oxalic acid, gallic acid, and succinic acid were summed to calculate the total organic acid content (Fig. 2g). For the lower grade, total organic acid was in the IQR of 15.3–18.2 mg g−1 with an average of 16.4 mg g−1. For high grade, the IQR was 20–24.9 mg g−1 with a mean quantity of 21.5 mg g−1 DW. The pH was in an IQR of 4.73–4.93 (average of 4.86) for low-grade samples and an IQR of 4.6–4.85 (average of 4.72) for high-grade samples (Fig. 2h). The lower pH in high-grade samples was attributed to their higher organic acid content. Moreover, the pH of tea infusions depends on the proton concentration of water, which directly measures the sour intensity of the tea infusion.26 The average quantity of organic acid was present in the order of succinic acid (4.12 mg g−1) > malic acid (3.87 mg g−1) > citric acid (3.53 mg g−1) > oxalic acid (2.91 mg g−1) > ascorbic acid (2.54 mg g−1) > gallic acid (2.35 mg g−1). A similar trend of high succinic acid is found in tea, which provides the sour and light umami note, and citric, oxalic, and malic acids provide sour and gentle astringency to tea infusion.26
TF was in the IQR of 0.15–0.23% (average of 0.19%) for low-grade samples and in the IQR of 0.13–0.33% (average 0.23%) for high-grade samples (Fig. 2i). In an opposing trend, TR was in the IQR of 0.9–1.8% (average of 1.3%) for low-grade samples and an IQR of 0.5–1.6% (average of 1.1%) for high-grade samples (Fig. 2j). Depending on processing conditions, TF ranges from 0.5% to 2% of DW and TR ranges from 6% to 18% DW.27 Contrastingly, TR content was very low in the sample blends in this study.
Fig. 3 Comparative demonstration of the performance parameters of the four ML models. The parameters are accuracy, the weighted average of precision, recall score, and F1 score.
The performance of the models is summarized in Fig. 4 by the confusion matrices. Confusion matrices offer comprehensive insight into the performance of a classification task by comparing the model's predictions with actual values. They reveal whether a model is struggling with a particular class and how often it confuses one class with another. The predicted categories are denoted by columns and the actual categories by rows, as shown in Fig. 4. If a simple 80:20 train-test split is used, and a single confusion matrix is generated based on the test data, the matrix heavily depends on the specific test data chosen. This dependency introduces high variance, which can lead to an unreliable estimate of the model's overall performance. To mitigate this issue, a 5-fold cross-validation approach was employed. This produced 5 confusion matrices, which were then averaged and normalized to create a final confusion matrix. This approach reduces the variance that can arise from using a single test set, offering a more accurate reflection of the model's performance.
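The averaging and normalization of per-fold confusion matrices can be sketched as follows. Row-wise normalization (each actual class summing to 1) is an assumption, since the normalization scheme is not specified, and the per-fold counts are hypothetical.

```python
def average_normalized_confusion(matrices):
    """Average per-fold confusion matrices, then normalize each row to sum to 1.

    Rows are actual classes and columns are predicted classes.
    Row normalization is an assumption; the scheme is not stated in the text.
    """
    n_folds = len(matrices)
    n_classes = len(matrices[0])
    mean = [
        [sum(m[i][j] for m in matrices) / n_folds for j in range(n_classes)]
        for i in range(n_classes)
    ]
    normalized = []
    for row in mean:
        row_total = sum(row)
        normalized.append([v / row_total if row_total else 0.0 for v in row])
    return normalized

# Hypothetical per-fold counts, rows = actual (low, high), columns = predicted.
folds = [
    [[3, 0], [1, 2]],
    [[2, 1], [1, 2]],
    [[3, 0], [0, 3]],
    [[2, 1], [1, 2]],
    [[3, 0], [1, 2]],
]
cm = average_normalized_confusion(folds)
for row in cm:
    print([round(v, 3) for v in row])
```

With row normalization, the diagonal entries read directly as per-class recall averaged over the folds.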
The SVM model closely mirrored XGB in terms of true positives and true negatives, implying that both models were efficient in correctly predicting both sample grades. MLP exhibited the highest false positive and false negative rates. The predictive performance of the models was higher for low-grade samples than for high-grade samples. This suggests noise and high subjective variation among sample scores, especially in the high-grade category. Moreover, high-grade samples might depend more on other features, such as the ester-to-non-ester catechin ratio, volatile organic compounds, and flavonol glycosides. Including these features could further improve the true positive rate, i.e. the predictive performance for high grades. XGB's superior performance likely stems from its sequential boosting process, in which each new tree corrects the errors of the preceding ones. SVM was also effective on this dataset, handling noisy patterns and identifying the hyperplane that maximizes the margin between the two classes. XGB was therefore selected for feature interpretation in this study.
The SHAP value in Fig. 5 indicates that protein/TP, TSS, citric acid, ascorbic acid, oxalic acid, gallic acid, CAF, CAF/TP, and TF/TR were positively related to the model prediction of high grade. An elevated level of these features, denoted by red color, corresponds to a high-grade sample, as evidenced by a positive SHAP value. Protein/TP ratio is an important feature that demonstrated a positive relation with tea sensory score, as depicted by the SHAP summary plot (Fig. 5). A higher protein/TP means more protein content, which can eventually bind with polyphenols to alter their structure and influence how they interact with taste receptors.36 The mode of interaction between protein and polyphenol depends on several factors, such as temperature, their structure, and, most importantly, the protein/TP ratio. Polyphenols, such as EGCG, saturate the interaction sites in proline-rich proteins by binding when the protein/TP ratio is low. Then, polyphenols bridge the saturated soluble protein molecules to form a colloid; finally, haze formation takes place to negatively influence tea appearance.37 It is important to note that these interactions are highly dependent on several abiotic factors, including tea brewing time, infusion temperature, pH, and ionic strength, whose interactive effect makes it a complex and dynamic system requiring in-depth exploration to explain their influence on tea sensory phenomena.4,38 TSS is a vital component that adds sweetness to tea infusion.39 Moreover, a variety of soluble sugars continuously interact and form complexes with other biochemicals to develop the unique taste of tea. This is the reason why TSS was observed to have a positive relation with sensory scores in our model.
Organic acids showed a positive association with the sensory quality of tea. Succinic acid and gallic acid have been reported as umami-enhancing constituents in tea liquor, where they can intensify the umami perception of amino acids.40 In addition, citric, ascorbic, malic, and succinic acids act as natural antioxidants that can decrease the pH of the infusion and limit the formation of hydrogen peroxide, thereby contributing to an overall improvement in the taste profile of the tea liquor.41 The extent of influence of different organic acids on the sensory of tea blends was citric acid > malic acid > ascorbic acid > oxalic acid > gallic acid > succinic acid.
CAF is an alkaloid that imparts bitterness and was found to improve the sensory quality of tea blends. The fact that white tea contains more CAF42 supports our observation of a positive correlation between CAF and high-grade samples that had higher proportions of white tea. CAF and TP are among the principal metabolites that form the distinct sensory characteristics of tea. Their ideal ratio is crucial to a balanced note of both bitterness and astringency in high-grade tea. Caffeine forms complexes with polyphenols through hydrogen bonding. This interaction prevents polyphenols from binding to salivary proteins, ultimately enhancing the overall taste by improving umami flavors and reducing bitterness.43 Given that green tea contains more TP and black tea has more CAF, adequate proportions of both green and black tea can improve sensory quality by contributing a balanced CAF/TP ratio in a blend. The relationships among TF, TR, their ratio, and tea sensory quality are well established. TF influences not only the appearance and taste of tea but is also a critical determinant of market value.27 CAF in combination with TF contributes to the characteristic briskness of tea liquor.44 The samples with high sensory scores had a positive relationship with the TF/TR ratio, which is in line with several prior findings.27,31,45
The SHAP summary plot depicts C, TP, TR, and pH as negatively related features, and C as the most important feature in the classification of tea grades. This is because C is one of the primary compounds in tea that contributes to its intense and aversive taste. A high-grade sample should have a balanced C to amino acid ratio so that the unpleasant taste from C can be balanced by umami or sweet notes.30 It is also clear from Fig. 5 that higher TP content corresponds to a negative SHAP value, i.e. low-grade sample. TP had a prominent effect on model output, and Shapley interprets this feature as having a depleting effect on the sensory quality of tea blends. TR, as a highly polymerized pigment, is primarily responsible for the depth and intensity of liquor color. The total TR content was found to have an opposing relationship with sensory quality, which is in accord with the findings of Ngure.46 In addition, the high-grade sample had a lower pH compared to the low-grade ones due to their higher organic acid content.
The relation of protein, malic acid, succinic acid, L-theanine, TF, and TP/theanine with sensory was not well-defined in the SHAP summary plot. These features demonstrated low and high values along both positive and negative directions along the X-axis. Generally, protein is related to the good sensory characteristics of tea infusion. However, high protein can negatively affect tea's appearance at high temperatures.47 This could be attributed to the low feature value depicted in Fig. 5 at the extreme right. Higher contents of citric acid, ascorbic acid, oxalic acid, and gallic acid were observed in the higher grade sample. However, malic acid and succinic acid demonstrated a complex relationship with a mix of high and low feature values corresponding to low grade (Fig. 5). The concentration of malic acid increases as tea freshness decreases.48 This explains the high value of this feature associated with the model prediction of low grade due to a decrease in freshness. Additionally, a study has reported a decrease in malic acid levels by 85.8% and an increase in succinic acid content by 8.42-fold after fermentation.49 Hence, blends that involve black and oolong teas tend to have low malic acid but higher succinic acid content. Usually, succinic acid improves tea taste by imparting sourness and enhancing umami notes; however, different proportions of tea varieties contribute to varying levels of succinic acid, with no direct relationship to the sensory quality of tea blends.
L-Theanine accounts for 40–70% of the total free amino acids in tea and is a primary contributor to its freshness and umami taste.50 Besides flavor, L-theanine acts as a crucial precursor for aroma formation during tea processing, making it a key quality indicator for premium teas.51 Minimal variations were found in the L-theanine content among the tea blends, which suggests that tea processing conditions have little impact on this metabolite. However, factors such as growing location and conditions could affect the synthesis and accumulation of L-theanine in tea leaves.50 Such confounding factors and the limited variation in L-theanine content across tea blends make it difficult for the model to clearly visualize its influence in the SHAP summary plot.
TF content did not emerge as a significant predictor of blend quality. TF is a reddish-yellow and bright red pigment, which mainly contributes to liquor strength, brightness, and the characteristic golden ring.44 Different proportions of TF fractions (for example, galloyl TF) with different astringency levels are believed to form the basis of variation in the perception of astringency and umami notes among tea varieties.52 In this study, high TF feature values for blends predicted as high grade most likely reflect their richer appearance and higher brightness although TF simultaneously hampered taste due to its low astringent threshold.31
High amino acid content and a lower TP level correspond to less bitterness and therefore a favorable taste although the relationship is highly non-linear. TP to theanine ratio has been linked to a high sensory score with fresh, soft, and more brisk infusion characteristics. Nonetheless, the TP/theanine ratio demonstrated a mixed nature with sensory quality in this study. This shows that the proportions of TP and L-theanine do not directly influence the sensory quality of tea. There can be other interacting features, such as acidic glycoproteins; flavonols, such as kaempferol, myricetin, and quercetin; alkaloids, such as theobromine and theophylline; and volatile compounds, that were not considered in this study. In general, numerous studies have correlated a low TP/theanine ratio to favorable tea infusion characteristics, such as freshness and mellowness.4,29,30 Besides being a taste indicator of tea infusion, TP/theanine content is also a crucial parameter in judging tea variety.53 Therefore, the TP/theanine ratio can be used as an indicator of the proportions of tea varieties in a blend.
SHAP reveals only the extent to which each feature contributes to the model's predictions; it does not necessarily clarify the true causes of specific outcomes. For instance, if a model misinterprets the effect of TP on tea sensory attributes because of a confounding variable such as TSS (TP–TSS complexes positively influence sensory scores, whereas TP alone has a negative relation with them), a SHAP summary plot might misleadingly suggest that higher TP levels are linked to better sensory scores. Such a result would contradict the findings of experimental studies. Therefore, it is essential to combine SHAP's interpretations with experimental knowledge because SHAP alone is not suitable for identifying the actual causes behind specific events.
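To make the notion of a feature's contribution concrete, the sketch below computes exact Shapley values for a toy three-feature model by enumerating all feature coalitions. This is not the algorithm used by the SHAP library for tree models, which relies on faster specialized methods. The model, feature values, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

x = [0.9, 0.2, 0.7]         # hypothetical scaled feature values
baseline = [0.5, 0.5, 0.5]  # hypothetical reference values

def model(v):
    # Hypothetical score: two additive terms plus one interaction term.
    return 2.0 * v[0] - 1.0 * v[1] + v[0] * v[2]

def value_fn(coalition):
    # Features outside the coalition are replaced by their baseline value.
    mixed = [x[k] if k in coalition else baseline[k] for k in range(len(x))]
    return model(mixed)

phi = shapley_values(value_fn, 3)
print([round(v, 3) for v in phi])
```

The values sum to the difference between the model output for `x` and for the baseline, which is the additivity property that SHAP summary plots rely on. They describe the model, not the underlying chemistry, which is why the caution above applies.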
Further elaboration on the relation between biochemical features and tea grade has been done by the SHAP dependence plot (Fig. 6) with the top nine features. The reason for focusing only on the top-ranked nine features is to gain clear insight into the model behavior. It also helps center the narratives around the features that have the highest relevance to tea sensory and avoids less relevant details. The SHAP dependence plot displays feature values on the X-axis and their corresponding SHAP values on the Y-axis for various data points. This approach enables the interpretation of the importance of a feature and its interactions with other features as their values change. This technique captures the actual scenario of how changes in the feature value are related to the model's prediction. The SHAP dependence plot visualizes how a feature affects the outcome and how this effect can change depending on the context and interactions with other features. The scatter plots were color coded as red for positive outcome, i.e. high-grade classification and blue for negative outcome i.e. low-grade classification. This highlights the influence of a particular feature on the SHAP value of a target outcome. The red trendline depicting data fit is based on third order polynomial regression. The density and sparseness of the plotted histograms represent the accuracy of this analysis, with dense areas depicting a more accurate prediction. Histograms of the X-axis and Y-axis are on the top and right sides of the diagram, respectively.
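A third-order polynomial trendline of the kind described can be fitted by ordinary least squares. The sketch below uses the normal equations with Gaussian elimination in plain Python. It is an illustration under the assumption of a standard least-squares fit, not the authors' plotting code, and the data points are synthetic.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit; returns coefficients c0..c_degree."""
    n = degree + 1
    # Normal equations: (V^T V) c = V^T y, with V the Vandermonde matrix.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [b] for row, b in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

# Synthetic (feature value, SHAP value) pairs lying exactly on a known cubic.
xs = [0.0, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85, 1.0]
ys = [0.5 - 3.0 * x + 4.0 * x ** 2 - 1.0 * x ** 3 for x in xs]
coeffs = fit_polynomial(xs, ys)
print([round(c, 4) for c in coeffs])
```

A cubic can capture one rise and one fall across the feature range, which suits the non-monotonic patterns discussed for the dependence plots. With real, noisy SHAP values the curve is only a smoothed guide.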
Tea catechin showed a complex relation with sensory score as confirmed by the SHAP dependence plot (Fig. 6a). Non-ester type catechins, such as +(−) C, contribute less bitterness and improve tea flavor compared to ester type catechins, which are abundant in green tea.30 The vertical spread in the values of C in the dependence plot suggested an interactive effect with polysaccharide, protein and other biochemicals in tea. Furthermore, an increase in C content after 0.8 (Fig. 6a) indicated an interesting observation, which was not clear from the SHAP summary plot (Fig. 5). This could be attributed to several reasons, such as a high non-ester-to-ester catechin ratio, tea particle size, and high water extract. High-grade samples in this study had a low ratio of green to white/black tea varieties, which implies a balanced ratio of esterified to non-esterified catechin, and therefore a higher sensory score with a pleasant aftertaste. Studies have suggested that tea processing conditions, such as hot air drying and drum rolling at low temperatures, induce the isomerization of EGC, EC, ECG, and gallic acid by polyphenol oxidase and other hydrolases to convert cis-catechins to trans-catechins to ultimately form CG, C, and GCG, resulting in a lower bitterness index.54 This could be a reason for the high-grade sample to have high C and a lower esterified-catechin to simple-catechin ratio, which promotes its mellow and brisk taste. In a recent attempt, EGC content in tea blends was found to be increased with a smaller particle size (100–120 µm) although it induced less bitterness compared to the whole leaf tea.55 In our study, blending was done after grinding and sieving the tea samples through a mesh size of 150 µm. These particles inside the tea bag rapidly release C, improving its content without compromising its taste. The rise in C after 0.8 (Fig. 6a) can also be attributed to its rapid release along with other water-soluble carbohydrates and theanine from smaller-sized particles. The resultant water extract could have synergistically helped mask bitterness and aversive taste from the total catechin.56
A high protein/TP ratio was related to a higher sensory index. The dependence plot shows a slight decrease over the 0.1–0.25 range, followed by a sharp rise in sensory score as the protein/TP value increases, as depicted in Fig. 6b. The ratio of protein to polyphenol in tea dictates the nature of the interaction between these two compounds and their subsequent effects on sensory properties. Studies have suggested that polyphenols form bonds with proteins through multisite and multidentate interactions, resulting in a lower TP content.57 EGCG and ECG are the main tea polyphenols that induce astringency: by interacting with salivary proteins, they reduce the lubrication these proteins provide.36 However, externally added proteins can alter such polyphenolic properties to reduce bitterness and astringency in tea. The concentrations of EGCG and ECG were found to be reduced by 70–80% in the presence of proteins, such as casein and albumin.57 Therefore, a higher ratio of protein to TP in tea overpowers the puckering sensation of polyphenols and improves the overall sensory quality, as depicted in Fig. 6b. However, excess protein in tea can lead to cream formation and negatively impact the appearance of tea at higher temperatures.47
Protein also exhibited a positive relation with the tea sensory score, except for a slight fall across the 0.2–0.4 range (Fig. 6c). Proteins containing L-amino acid residues, such as Val–Phe and Val–Gly–Val, impart umami and sweet taste.58 The kokumi peptides contain γ-glutamyl peptides, which are known to intensify sweet and umami taste.59 The umami taste of processed tea is mainly attributed to the presence of pyroglutamic acid, pyroglutamyl peptides, and some other peptides, particularly aspartic acid, serine, threonine, and α-glutamyl-di and tri peptides.60 Besides taste improvement, proline and hydroxyproline in tea also participate in Strecker degradation to form pyrrole, which provides tea with a good flavor and aroma.61 Studies have also mentioned polyphenol–protein–polysaccharide interactions that can significantly influence the sensory, functional, and nutritional properties of the food system.62
Flavonoids represent the major polyphenol class that contributes to the sensory quality of tea. Flavonoids can be subclassed into several categories, of which flavanols are primarily catechins.3 Catechins form the bulk of the TP in tea and provide the characteristic astringency and bitterness of tea infusions. The negative effect of TP on the sensory quality of tea blends is apparent from the SHAP summary plot (Fig. 5) and dependence plot (Fig. 6d). The vertical spread of SHAP values across the 0.4–0.6 range in Fig. 6d indicates that the model predicted both high and low grades at similar TP values. This underscores the complex relationship between tea sensory quality and TP content. TP alone negatively affects tea sensory quality, but the complex formation between TP and TSS or polysaccharide-polyphenol conjugates helps develop a more rounded, mellow, and thicker taste profile by masking sharp and harsh taste.63 Beyond catechins, TF and TR form another class of polyphenols that affect the appearance and taste of tea blends. Tea polyphenols also include phenolic acids, hydrolysable tannins, flavonols and their glycosyl derivatives, which impart fresh sour and bitter notes, colour stability, and general strength to the tea liquor.3
TSSs, such as monosaccharides and disaccharides, are the key compounds that are closely related to sweetness and therefore correlate with a higher sensory score (Fig. 6e). TSSs also reduce the perceived bitterness and astringency of catechins, alkaloids, and flavonol glycosides by increasing the best estimate threshold for those compounds.64 Tea leaves naturally contain glucose, fructose, maltose, oligofructose and other soluble sugars whose content and composition change during tea processing.5 Therefore, differently processed teas can contribute varying TSS levels that enhance the overall sweetness and mellow flavor of tea blends.
TR showed an inverse relation with sensory quality (Fig. 6f). TR is responsible for the mouthfeel of tea liquor, but a high content of TR can decrease the brightness and taste of tea infusion.27 The percentage of TR in tea blends generally varied from 0.5% to 1.8% depending on the ratio of tea varieties, pH, and drying temperatures, with a stronger dependence on the conditions of fermentation time and temperature of black and oolong tea in blends. TR mainly consists of the high-molecular-weight oxidation products of catechins, which are oligomers and polymers. Beyond that, the degradation products of theacitrin-like intermediate compounds may also contribute to the color of tea infusion.31 It has been reported that both TR and TF are positively related to the aftertaste of astringency, while TR is negatively related to bitterness.65 On cooling, the interaction between TF, TR, and CAF may form cream, thereby causing discoloration, precipitation, and loss of tea liquor stability, which seriously affects visual appeal, flavor, and color.31 This further substantiates the findings of this study: TR has a negative relation with sensory quality.
Citric acid (Fig. 6g) and ascorbic acid (Fig. 6i) exhibited a positive association with the sensory quality of the tea blends. Malic acid also displayed a direct link with the SHAP score but slightly fell after 0.5 (Fig. 6h). Citric acid and ascorbic acid are established contributors to the sour taste of tea infusions. Moderate levels of sour compounds can enhance taste fullness, while excessive concentrations may have a negative effect on the perceived quality.48 The SHAP analysis also showed a generally positive contribution of organic acids to sensory quality. Although malic and succinic acids had a mixed effect in the SHAP summary plot (Fig. 5), some studies have mentioned that succinic acid, along with gallic acid, can enhance the umami taste contributed by amino acids.40 Additionally, citric, malic, ascorbic, and oxalic acids have been documented to positively contribute to taste while being negatively associated with turbidity and cream formation in tea liquor.66 The citric acid curve displayed a plateau across the 0.1–0.4 range (Fig. 6g), suggesting potential interactive effects with other biochemical constituents. Up to a certain concentration, citric and malic acids interact to enhance the perceived sweetness of sugars, and sweet components suppress the initial perception of sourness from organic acids and reduce sourness sensitivity.67 This complex interplay between organic acids and soluble sugars can be used as an effective strategy for optimizing the sensory characteristics of tea blends.
It is important to note that the wide vertical spread of SHAP values for all features suggests that these biochemical compounds do not independently influence sensory characteristics. One possible explanation for this is the varying levels of these metabolites found in the different tea varieties used in the blends. These varieties differ in terms of processing methods, raw materials, and biochemical composition, which allows blends to have the same C, protein, or TSS levels but different sensory scores. The spread may also reflect noise and instrumental errors during data acquisition. Furthermore, the TeaBioSens dataset is not comprehensive, and the narrow gap in sensory scores between the two grades may have restricted the ability of the XGBoost model to accurately capture the true relationship.
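The link between feature interactions and vertical spread in a dependence plot can be illustrated with a minimal sketch. It computes exact Shapley values by enumerating feature subsets, with absent features replaced by a single baseline value. The toy model, its coefficients, the feature labels, and the zero baseline are all hypothetical and are not the study's XGBoost model or data; SHAP tooling for tree models typically averages over background data rather than using one baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption made for this illustration).
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


# Hypothetical toy model: the score falls with feature 0 ("C"),
# but the interaction term softens that penalty when feature 1 ("TSS") is high.
def toy_score(z):
    c, tss = z
    return -1.0 * c + 0.5 * tss + 0.8 * c * tss


baseline = [0.0, 0.0]
low_tss = shapley_values(toy_score, [0.9, 0.1], baseline)
high_tss = shapley_values(toy_score, [0.9, 0.9], baseline)
print(low_tss[0], high_tss[0])  # same C value, different attribution to C
```

In this sketch both samples share the same C value, yet the attribution to C differs because of the interaction term. Points like these would sit at the same x position but different heights in a dependence plot.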
Our study specifically focuses on tea blends using four varieties of Assam tea, but our methodology of combining experimental data with interpretable ML can be adapted to other tea blends. The underlying framework can capture complex, non-linear interactions between biochemical content and sensory scores, which are similar across different tea blends. However, it depends on the availability of sufficient experimental data, which is essential to train ML models reliably. This is why we emphasize the need for community datasets to enable the generalization of predictive models across tea blends.
The XGB model successfully captured the multivariate relationships between experimental biochemical features, calculated features, and the sensory scores of tea blends. The XGB model was selected for its superior performance metrics, while the relative importance of input features was determined by SHAP, a game-theoretic methodology that provides interpretability for any ML model output. The top nine important features influencing the sensory quality were the C, protein/TP, protein, TP, TSS, TR, citric acid, malic acid, and ascorbic acid contents. All of these except C, TP, and TR had a positive impact on sensory scores that grew with the feature value.
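A common convention for turning per-sample SHAP values into a global feature ranking is to average their absolute values for each feature. The sketch below illustrates that convention only; whether the study ranked features in exactly this way is an assumption. The function name `rank_features` and the numbers in `toy_shap` are hypothetical and are not study results.

```python
def rank_features(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (largest first)."""
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP values: rows are blends, columns are features.
names = ["C", "TSS", "TR"]
toy_shap = [
    [-0.40, 0.20, -0.10],
    [0.30, 0.10, -0.05],
    [-0.50, 0.30, -0.15],
]

ranking = rank_features(toy_shap, names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Averaging absolute values measures how strongly a feature moves predictions regardless of direction. The sign of the effect still has to be read from the summary or dependence plots.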
A limitation of the TeaBioSens dataset is its small sample size and the exclusion of failed experiments and certain features, such as volatile organic compounds, catechin subtypes, and the ratio of esterified to non-esterified catechin. These might contribute to the challenge of obtaining conclusive insights into the effects of features, like succinic acid, L-theanine, TF, and TP/theanine, on sensory scores through SHAP analysis. Presently, the dataset has been supplemented with data from our experimental findings. Further extension of the dataset by including factors such as harvesting season, processing parameters and other secondary metabolites can shed more light on the complex roles of these features in tea sensory modulation. This article encourages industry professionals and researchers to evaluate the effectiveness of TeaBioSens and add more features to this public dataset to further strengthen and generalize the model's predictive ability for other tea blends. Considering the importance of consumer preferences and acceptance in the tea market, engaging a semi-trained consumer panel using the Check-All-That-Apply method allows for practical data collection from the consumer's perspective. Inclusion of more volunteers in ST-CATA analysis will help overcome the limitations of this small dataset, allow ML models to better learn the subjective variations in sensory scores, and further improve their predictive performance. This method eliminates the need for professional tea tasters to evaluate every tea grade and batch, significantly reducing costs for the tea industry.
Supplementary information (SI): “Tea Biochemical & Sensory Dataset (TeaBioSens)” is a meticulously curated compilation of data including important biochemical features that influence tea sensory quality. This dataset consists of thirty different tea blends, made from four major processed tea varieties, their biochemical content, and their sensory scores obtained from a semi-trained panel using the lexicon-based quantitative descriptive technique. TeaBioSens comprises a total of 600 data points. Features such as TSS, protein, TP, CAF, (+)-C, TF, TR, pH, citric acid, malic acid, ascorbic acid, oxalic acid, gallic acid, succinic acid, and L-theanine were directly estimated from experiments, and the ratios TP/theanine, TF/TR, protein/TP, and CAF/TP were calculated. See DOI: https://doi.org/10.1039/d5fb00580a.
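The four calculated features named above can be derived from the measured ones with a few lines of code. The following is a minimal sketch: the dictionary keys, the helper name `add_ratio_features`, and the sample values are hypothetical and do not reflect the actual column names or measurements in TeaBioSens.

```python
def add_ratio_features(sample):
    """Return a copy of `sample` with the four ratio features added.

    A zero denominator yields NaN instead of raising an error
    (a design choice made for this sketch).
    """
    def ratio(numerator, denominator):
        d = sample[denominator]
        return sample[numerator] / d if d else float("nan")

    enriched = dict(sample)
    enriched["TP/theanine"] = ratio("TP", "L-theanine")
    enriched["TF/TR"] = ratio("TF", "TR")
    enriched["protein/TP"] = ratio("protein", "TP")
    enriched["CAF/TP"] = ratio("CAF", "TP")
    return enriched


# Hypothetical measurements for one blend (arbitrary but consistent units).
blend = {
    "TP": 20.0,
    "L-theanine": 2.0,
    "TF": 0.5,
    "TR": 1.0,
    "protein": 4.0,
    "CAF": 3.0,
}

enriched = add_ratio_features(blend)
print(enriched["TP/theanine"], enriched["TF/TR"])
```

Both terms of a ratio need to be expressed in the same units before dividing for the ratio to be comparable across blends.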
This journal is © The Royal Society of Chemistry 2026