Raj Singh,a C. Nickhil,*a Konga Upendar,b Poonam Mishra a and Sankar Chandra Deka a
aDepartment of Food Engineering and Technology, School of Engineering, Tezpur University, Napaam, Assam, India. E-mail: nickhil@tezu.ernet.in
bSmart Farm Machinery, Centurion University, Visakhapatnam, Andhra Pradesh, India
First published on 8th October 2025
The precise prediction of fruit maturity is essential for determining the optimal harvest time. It helps to reduce postharvest losses and maintain consistent fruit quality for consumers. Traditional methods for assessing maturity depend largely on manual inspection, a process that is subjective, time-consuming, and prone to human error. Deep learning approaches, particularly convolutional neural networks (CNNs), offer a promising alternative by automating classification with high precision and consistency. This research seeks to identify the most effective deep-learning algorithms for predicting the maturity of mandarin oranges. In this study, the performance of four convolutional neural network architectures (EfficientNet-B0, ResNet50, VGG16, and a Custom CNN) was investigated for the classification of mandarin oranges into three maturity levels: unripe, ripe, and overripe. The primary dataset comprised 1095 images, with each category containing 365 images. On the primary image dataset, the models achieved accuracies of 98% for both EfficientNet-B0 and ResNet50, 83% for VGG16, and 99% for the Custom CNN. By comparing these models on a balanced dataset, this work offers a practical guide for researchers and practitioners on selecting models for assessing fruit maturity. Notably, EfficientNet-B0, ResNet50, and the Custom CNN exhibited significantly higher success rates than VGG16 and existing models, making them particularly suitable for the development of an efficient automated system for harvesting and sorting mandarin oranges in the near future. The results aim to identify potential applications for improving agricultural practices, quality assessment, and overall efficiency in the food industry.
Sustainability spotlight: This research introduces a deep learning-based classification model for determining the maturity stages of mandarin oranges, enabling accurate, real-time, and non-destructive assessment. By enhancing harvest timing and reducing post-harvest losses, the approach supports resource-efficient practices, minimizes food waste, and contributes to a more sustainable and data-driven citrus supply chain.
Nevertheless, the peel constitutes a substantial portion, approximately 40% to 50% of the wet fruit mass, representing a promising reservoir of bioactive elements, such as ascorbic acid, carotenoids, phenolic compounds, and pectins, with flavonoids being particularly concentrated. Surprisingly, the peel surpasses the juice in vitamin C content, as indicated by the USDA National Nutrient Database, and displays remarkable antioxidant properties compared to other fruit parts.7 Despite citrus peels often being treated as agricultural waste, they prove to be a potential source of valuable secondary plant metabolites and essential oils.8 Beyond its economic value, the Khasi mandarin holds significance due to its nutritional and medicinal contributions to both human and domestic animal health. Given the nutritional importance of citrus fruits in the human diet, efforts have been made to determine the optimal fruit maturity for enhanced nutritional value, considering factors such as internal changes in fruit flesh and external coloration of the peel during development, growth, and maturity.9,10
The mandarin orange is characterized by its high fragility and a brief shelf life of 1–2 weeks at room temperature, making postharvest management challenging and resulting in substantial losses for farmers and the overall economy.11 Effective temperature control is crucial during postharvest processes to preserve the overall quality of fresh citrus fruits, playing a pivotal role in their postharvest performance.12 Inappropriate harvesting maturity leads to physiological disorders during storage, negatively impacting their shelf life and quality. Accurate determination of the fruit's maturity stage at harvest is essential for minimizing postharvest losses.13,14 Achieving the proper maturity stage is critical for optimal harvesting and shelf-life preservation. The international market underscores the significance of both external and internal quality standards for Khasi mandarin fruits.11 Fruits are typically harvested when they reach harvestable maturity, a period determined through various computational and physical methods. This maturity stage is specific to each variety, with a fixed duration from full bloom to harvesting, allowing for the establishment of calendar dates for fruit plucking in orchards. While color change is a visible indicator of maturity, environmental factors such as temperature, relative humidity, and sunlight can influence the ripening timeline. Meeting market demands often necessitates delayed maturity. The final and crucial phase of fruit development is its maturity stage, characterized by heightened biological activity, intense metabolic processes, and cellular changes that manifest in alterations to texture, color, aroma, and flavor. Notably, the determination of commercial maturity indices in citrus fruits proves highly variable, contingent on factors such as the cultivation region, market demand, and specific varieties.15 Ensuring the fruit reaches the appropriate harvestable maturity extends its storage life and enhances postharvest storage quality.16
Deep learning models have emerged as a transformative tool in the field of image recognition. Their ability to autonomously learn hierarchical features from extensive datasets positions them as formidable candidates for complex visual classification tasks, particularly in precisely distinguishing different ripeness stages of fruits.17–19,44 By training the model on a diverse set of mandarin images representing various maturity levels, the deep learning system can generalize patterns and make accurate predictions, overcoming the limitations inherent in human-dependent sorting methods. The integration of deep learning into agricultural practices not only streamlines the sorting process but also addresses scalability challenges. As the demand for fruits continues to increase, automated systems based on deep learning can handle large volumes, ensuring a consistent and efficient classification process. This not only benefits farmers and producers but also contributes to minimizing post-harvest losses and enhancing overall supply chain management.20,21 Fig. 1 shows the foundational framework of the CNN model designed for fruit maturity classification. In this model, every input image undergoes processing through the first, second, and third fully connected layers before ultimately reaching the output layer.
Several studies have examined machine learning and deep learning techniques for fruit classification and maturity detection (Table 1). Early methods relied on traditional image processing and feature extraction techniques, such as color, texture, and shape analysis. However, these methods often struggled with varying lighting conditions and background differences. More recent research has introduced deep learning methods, especially CNNs, to improve accuracy and reliability. Nonetheless, many existing studies either concentrate on single model architectures, use smaller or less varied datasets, or fail to assess the real-world effects of classification errors in agricultural settings.
| Crop/Fruit | Method/Model | Accuracy | Limitations | References |
|---|---|---|---|---|
| Cherry | CNN model | 99.40% | Limited generalization | 22 |
| Multi-class fruit detection | R-CNN model | 95.00% | Sensitive to background | 23 |
| Citrus fruit | CNN model | 94.55% | Lacked augmentation | 24 |
| Papaya | KNN with HOG feature | 99.50% | Small dataset | 25 |
| Banana | EfficientNet model | 98.60% | Limited generalization | 26 |
Many studies have explored image processing and deep learning for orange fruit maturity stage detection. The successful implementation of a deep learning-based system for orange maturity classification holds profound significance. It promises to streamline the supply chain, reduce labor costs and human errors, and contribute to sustainable agriculture practices.27,28 Enhanced accuracy in maturity assessment enables precise harvesting, minimizing waste, optimizing resource allocation, and ultimately improving the quality of oranges reaching consumers. Carolina et al. used image processing techniques to identify the degree of maturity in oranges,29 while Saha et al.30 applied a deep learning approach to classify and identify diseases in orange fruit. Using multi-modal input data, a deep learning approach, “Deep Orange”, was developed to detect and segment individual fruits, including oranges.40 Asriny et al.31 proposed a classification model using Convolutional Neural Networks (CNNs) to classify orange images, achieving an accuracy of 96% with the ReLU activation function. These studies collectively demonstrate the potential of image processing and deep learning for accurately and efficiently detecting the maturity stage of the orange fruit. Through this research, we aim to contribute valuable insights into developing and deploying deep learning models for the classification of oranges by maturity, paving the way for more accurate, scalable, and cost-effective solutions in the agricultural landscape.
Unlike previous studies in unrelated fields, our research addresses a practical agricultural problem: the accurate classification of fruit maturity. This work aims to reduce postharvest losses and improve sorting efficiency. The objectives of the present research are to explore the feasibility of leveraging deep learning algorithms such as EfficientNet-B0, ResNet50, VGG16, and a Custom CNN to classify Khasi mandarin by maturity and then develop a robust and accurate deep learning model capable of discerning the maturity stage of Khasi mandarin. In this study, we examined four CNN architectures—EfficientNet-B0, ResNet50, VGG16, and a Custom CNN—to determine the best model for classifying mandarin orange maturity. These networks were chosen because they follow different design philosophies. EfficientNet-B0 focuses on parameter efficiency through compound scaling, ResNet50 incorporates residual learning for deeper architectures, VGG16 serves as a popular benchmark for image classification tasks, and the Custom CNN was created to fit the specific needs of the dataset. By comparing these models using a balanced dataset, this work offers both a benchmark and practical guidance for researchers and practitioners in choosing models for assessing fruit maturity.
The novelty of this study lies in its direct comparison of several deep learning architectures for classifying mandarin orange maturity under uniform experimental conditions. Unlike previous studies that often examine a single architecture, our approach exposes the trade-offs in accuracy, efficiency, and reliability among different CNN designs. The results show that EfficientNet-B0, ResNet50, and the Custom CNN perform better than traditional models such as VGG16, and indicate their potential for future use in automated harvesting and sorting systems. This work assists readers by offering clear recommendations for selecting algorithms and by highlighting the role of AI in promoting sustainable farming practices.
| Specification | Details |
|---|---|
| Image sensor | C525 webcam |
| Frame rate | 30 fps @ 640 × 480 pixels |
| Connector | 1× USB 2.0 |
| System requirements | Computer with 512 MB RAM or more; 200 MB hard-drive space; USB 1.1 port (2.0 recommended) |
| Operating system | Windows XP (SP2 or higher), Windows Vista, or Windows 7 (32-bit or 64-bit) |
Fig. 3 The ripening stages of mandarin oranges are classified into three categories: (a) unripe, (b) ripe, and (c) over-ripe.
| Category | Primary dataset: training | Primary dataset: testing |
|---|---|---|
| Unripe (0) | 292 | 73 |
| Ripe (1) | 292 | 73 |
| Overripe (2) | 292 | 73 |
| Total images | 876 | 219 |
The fully connected layers comprise 25 088, 4096, and 4096 neurons, implemented in sequential steps. The classification task involves three classes, utilizing a softmax classifier to simplify the process by computing the probability of all labels and delivering intuitive results, as depicted in Fig. 6. Finally, the research model is implemented using a Python script with the Keras and TensorFlow libraries.33
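As an illustration, the following is a minimal Keras sketch of a classifier with a fully connected head of the sizes described above (25 088 → 4096 → 4096 neurons and a three-class softmax output), which correspond to the standard VGG16 top layers; the use of ImageNet weights, the frozen backbone, and the optimizer settings are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal sketch (assumption): VGG16 backbone with the fully connected head
# described above (25 088 -> 4096 -> 4096 -> 3-way softmax output).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional backbone (illustrative choice)

model = models.Sequential([
    base,
    layers.Flatten(),                       # 7 x 7 x 512 = 25 088 features
    layers.Dense(4096, activation="relu"),
    layers.Dense(4096, activation="relu"),
    layers.Dense(3, activation="softmax"),  # unripe / ripe / overripe
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```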
This study devised a Custom CNN model with three convolutional layers and three max-pooling layers. The input image size is specified as 224 × 224 × 3 for the primary dataset. The chosen loss function is cross-entropy, and the Adam optimizer is selected for its stability in weight and bias updates. Accuracy is monitored as the evaluation metric to balance training and validation performance. The final output layer employs the softmax activation function. The schematic representation of the proposed CNN model is depicted in Fig. 7.
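A minimal Keras sketch consistent with this description (three 3 × 3 convolutional layers, three 2 × 2 max-pooling layers, a 224 × 224 × 3 input, the Adam optimizer, cross-entropy loss, and a softmax output) is given below; the filter counts and dense-layer width are illustrative assumptions, as the exact values are given in Fig. 7.

```python
# Minimal sketch (assumption): Custom CNN with three 3x3 convolutional layers,
# three 2x2 max-pooling layers, and a three-class softmax output, as described
# above. The filter counts (32/64/128) and dense width are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, models

custom_cnn = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # dense width is an assumption
    layers.Dense(3, activation="softmax"),  # unripe / ripe / overripe
])

custom_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                   loss="categorical_crossentropy",
                   metrics=["accuracy"])
```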
$$\text{Precision} = \frac{TP}{TP + FP} \quad (1)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (3)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (4)$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
The data-augmented dataset was divided into training, validation, and testing subsets in an 80 : 10 : 10 ratio, as shown in Fig. 8. Google Colab was used as the training environment, and Table 4 presents the corresponding training parameters.
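A minimal sketch of one way to perform such a stratified split with scikit-learn is shown below; the function, variable names, and default split fractions are illustrative assumptions rather than the exact procedure used in this study.

```python
# Minimal sketch (assumption): stratified split of image paths and labels into
# training / validation / testing subsets; fractions follow the ratio above.
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, val_frac=0.10, test_frac=0.10, seed=42):
    """Return (train, val, test) tuples of (paths, labels)."""
    train_p, rest_p, train_y, rest_y = train_test_split(
        paths, labels, test_size=val_frac + test_frac,
        stratify=labels, random_state=seed)
    val_p, test_p, val_y, test_y = train_test_split(
        rest_p, rest_y, test_size=test_frac / (val_frac + test_frac),
        stratify=rest_y, random_state=seed)
    return (train_p, train_y), (val_p, val_y), (test_p, test_y)
```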
Fig. 8 Flowchart illustrating the process of dividing the data-augmented dataset for model training.
| Hyperparameter | Value |
|---|---|
| Epoch | 200 |
| Batch size | 8 |
| Optimizer | SGD |
| Learning rate | 0.0001 |
| Image size | 224 × 224 |
To address the relatively small dataset size, data augmentation was applied during training, including random rotations (±20°), horizontal and vertical flips, zooming (up to 20%), brightness adjustments (±15%), and random shifts (up to 10%). Hyperparameters were optimized through trial and error using a validation split of the training data. The final configuration utilized the Adam optimizer with a learning rate of 0.0001, a batch size of 8, and categorical cross-entropy as the loss function. Training was performed for 200 epochs, with early stopping implemented to prevent overfitting.
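The following is a minimal Keras sketch that maps the stated augmentation ranges and training settings onto ImageDataGenerator and model.fit; the directory path, validation fraction, and early-stopping patience are illustrative assumptions.

```python
# Minimal sketch (assumption): augmentation and training settings mapped onto
# Keras' ImageDataGenerator, following the ranges described above.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,              # random rotations (+/- 20 degrees)
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.20,                # zooming up to 20%
    brightness_range=(0.85, 1.15),  # brightness adjustment (+/- 15%)
    width_shift_range=0.10,         # random shifts up to 10%
    height_shift_range=0.10,
    validation_split=0.10,          # hold-out validation fraction (assumed)
)

train_flow = train_gen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=8,
    class_mode="categorical", subset="training")
val_flow = train_gen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=8,
    class_mode="categorical", subset="validation")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

# `model` is any of the compiled networks sketched earlier (e.g. the Custom CNN)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(train_flow, validation_data=val_flow,
                    epochs=200, callbacks=[early_stop])
```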
In evaluating the performance of the employed architectures, this research relies on the precision, recall, F1-score, and accuracy metrics derived from the experiments conducted within the study. Table 5 provides a comprehensive summary of the performance of the four architectures: EfficientNet-B0, ResNet50, and VGG16, alongside a Custom CNN tailored explicitly to the task. The reported accuracy rates provide a nuanced understanding of each architecture's efficacy. The evaluation reveals intriguing performance variations: EfficientNet-B0 and ResNet50 both demonstrate high accuracy rates of 98%, highlighting their effectiveness in capturing nuanced features related to the maturity of mandarin oranges. ResNet50 performed particularly well in identifying ripe and overripe fruits, with near-perfect accuracy and F1-scores. The residual connections in ResNet50 likely helped it learn deep hierarchical features effectively, reducing vanishing-gradient problems during training. This suggests that ResNet50 is ideal when accuracy is more important than computational cost.
| CNN architecture | Class | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|---|
| EfficientNetB0 | Unripe | 98% | 98% | 97% | 98% |
| | Ripe | 100% | 97% | 99% | |
| | Overripe | 97% | 100% | 98% | |
| ResNet50 | Unripe | 97% | 100% | 99% | 98% |
| | Ripe | 100% | 98% | 98% | |
| | Overripe | 99% | 100% | 100% | |
| VGG16 | Unripe | 84% | 91% | 88% | 83% |
| | Ripe | 95% | 46% | 62% | |
| | Overripe | 79% | 100% | 88% | |
| Custom CNN | Unripe | 100% | 100% | 100% | 99% |
| | Ripe | 100% | 97% | 99% | |
| | Overripe | 98% | 100% | 99% | |
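For reference, per-class precision, recall, and F1-scores such as those in Table 5 can be obtained with scikit-learn, as in the brief sketch below; the variable names (y_true for the integer test labels, y_prob for the model's predicted probabilities) are illustrative.

```python
# Minimal sketch (assumption): per-class precision, recall and F1 for the three
# maturity classes, computed with scikit-learn from the test-set predictions.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

class_names = ["Unripe", "Ripe", "Overripe"]

# y_true: integer test labels; y_prob: probabilities from model.predict()
y_pred = np.argmax(y_prob, axis=1)
print(classification_report(y_true, y_pred, target_names=class_names))
print("Overall accuracy:", accuracy_score(y_true, y_pred))
```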
VGG16, although exhibiting a lower accuracy of 83%, still demonstrates reasonable overall performance; however, it showed considerable variability across classes. While it performed reasonably well on unripe and overripe fruits, its performance on ripe fruits dropped sharply, with recall as low as 46%. This indicates that it often misclassifies ripe fruits, possibly because its deeper yet less efficient architecture lacks residual or scaled connections. These results highlight the limitations of VGG16 in managing fine intra-class differences compared with more modern architectures.
Interestingly, the Custom CNN model outshines the others with an accuracy of 99%, emphasizing the importance of tailored architectures for specific classification tasks. The proposed Custom CNN model, comprising three convolutional layers with 3 × 3 filters in each layer, incorporated 2 × 2 max-pooling layers, a softmax classifier for image data classification, and five hidden layers. Although the Custom CNN is a very small network with only a few convolutional and dense layers, it produced results comparable to, or better than, those of EfficientNet-B0, ResNet50, and VGG16. The Custom CNN achieved perfect classification in the unripe category and high precision and recall across all classes, similar to EfficientNet-B0 and ResNet50. This shows the promise of tailored lightweight architectures for specific applications. The overall accuracy of 99% suggests that, with careful design, even simpler CNNs can perform as well as advanced pre-trained networks, which is beneficial for deployment on devices with limited resources.
As it has fewer training parameters than EfficientNetB0, ResNet50, and VGG16, the model takes less time to train and exhibits lower latency during testing. This customized approach highlights the importance of domain-specific design choices in achieving optimal results and underscores the nuanced intricacies of agricultural image classification.
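The parameter-count argument can be checked directly in Keras, as in the short sketch below; the backbones are instantiated untrained with three output classes purely to compare model sizes, and the Custom CNN variable refers to the network sketched earlier.

```python
# Minimal sketch (assumption): comparing the number of parameters of the
# standard backbones with the lightweight Custom CNN defined earlier.
from tensorflow.keras import applications

networks = {
    "EfficientNet-B0": applications.EfficientNetB0(weights=None, classes=3),
    "ResNet50": applications.ResNet50(weights=None, classes=3),
    "VGG16": applications.VGG16(weights=None, classes=3),
    "Custom CNN": custom_cnn,  # from the earlier sketch
}
for name, net in networks.items():
    print(f"{name}: {net.count_params():,} parameters")
```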
The loss and accuracy curves in Fig. 9, plotted with the number of epochs on the x-axis and the corresponding values on the y-axis, provide a visual account of the training process and model evolution. These curves offer insights into convergence patterns, optimal training epochs, and the impact of hyperparameters on the learning dynamics of the models. The loss curves illustrate each model's ability to minimize error during training, showcasing trends and potential challenges in the optimization process. Simultaneously, the accuracy curves reveal each model's proficiency in correctly classifying mandarin oranges at various maturity stages over the epochs.
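Curves of this kind correspond to quantities stored in the History object returned by Keras' model.fit; a minimal matplotlib sketch for producing such plots is shown below, with the key names assuming the default Keras metric labels.

```python
# Minimal sketch (assumption): plotting loss and accuracy curves from the Keras
# History object returned by model.fit().
import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["loss"], label="training loss")
    ax1.plot(history.history["val_loss"], label="validation loss")
    ax1.set_xlabel("Epoch"); ax1.set_ylabel("Loss"); ax1.legend()
    ax2.plot(history.history["accuracy"], label="training accuracy")
    ax2.plot(history.history["val_accuracy"], label="validation accuracy")
    ax2.set_xlabel("Epoch"); ax2.set_ylabel("Accuracy"); ax2.legend()
    plt.tight_layout()
    plt.show()
```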
Fig. 9 Training and validation performance graphs for: (a) EfficientNet-B0, (b) ResNet50, (c) VGG16, and (d) Custom CNN.
In addition to accuracy, precision, recall, and F1-scores, we calculated the area under the receiver operating characteristic curve (AUC) to provide a better measure of classification performance. The AUC values for EfficientNet-B0, ResNet50, and the Custom CNN were all above 0.97, while VGG16 achieved an AUC of 0.89. These results further confirm the strong performance and reliability of EfficientNet-B0, ResNet50, and the Custom CNN in distinguishing maturity stages of mandarin oranges.

The confusion matrices for each model were analyzed to examine misclassification patterns. Most errors occurred between the ripe and overripe categories, whereas misclassifications between unripe and the other classes were relatively rare. This suggests that the visual differences between ripe and overripe fruits are more subtle, leading to occasional overlap in classification. From an agricultural viewpoint, misclassifying fruit ripeness can lead to serious problems. When ripe fruits are wrongly labeled as overripe, they may be rejected from the market too soon, causing waste after harvest. Conversely, if overripe fruits are classified as ripe, they enter the supply chain with a shorter shelf life, which can lead to quality issues and consumer complaints. Additionally, if unripe fruits are mistakenly identified as ripe, they might be picked or sold too early, resulting in poor taste, reduced consumer satisfaction, and ultimately lower profits for producers.

The comparison of the four architectures shows clear differences in how well they classify fruit ripening stages. EfficientNet-B0 consistently achieved high precision, recall, and F1-scores for the unripe, ripe, and overripe stages, demonstrating its ability to balance feature extraction and computational efficiency. Its lightweight design, with compound scaling, helps it capture detailed visual features without overfitting, which explains its stable performance.

In comparison to previously discussed studies, our proposed system employing EfficientNet-B0, ResNet50, VGG16, and a Custom CNN has demonstrated superior accuracy levels. For instance, Arampongsanuwat et al.34 reported accuracy rates of 77.50% for VGG16, 75% for AlexNet, 79% for ResNet50, and 78% for InceptionV3 in mangosteen ripeness classification, whereas our proposed system achieved notably higher accuracy rates of 83% for VGG16 and 98% for ResNet50, showcasing the superior performance of our approach in this context. Similarly, Kusakunniran et al.35 attained a remarkable accuracy of 100% for DenseNet121, EfficientNet-B0, ResNet50, and VGG16; our proposed system yielded slightly lower accuracy rates of 98% for EfficientNet-B0 and ResNet50, and 83% for VGG16, which, while lower than the referenced study, remain competitive. Furthermore, Al-Masawabe et al.36 achieved a perfect accuracy of 100% in classifying bananas by ripeness category, while our VGG16 model reached 83% on a comparable task. Additionally, Mahmood et al.37 reported accuracies of 94.17% and 97.65% for AlexNet and VGG16 in classifying jujube fruits into maturity categories using actual and augmented images, respectively, whereas our system achieved a lower accuracy of 83% for VGG16 in a similar classification task.
Moreover, Nasiri et al.38 achieved an impressive accuracy rate of 98.4% using VGG16 for classifying mandarin oranges based on maturity levels, whereas our proposed system achieved 83% for the same task, indicating comparatively lower performance. While our proposed system may exhibit slightly lower accuracy rates than some referenced studies, it still demonstrates competitive performance across various classification tasks. It is worth noting that the standard accuracy of our models, computed by averaging accuracy over the actual and augmented datasets, provides a comprehensive measure of their overall performance; the methods reported in other studies underwent similar standard accuracy computations, facilitating a meaningful comparison across approaches. Overall, the results indicate that modern, scalable architectures such as EfficientNet and ResNet, along with Custom CNNs designed for specific domains, are more effective for classifying fruit ripening stages than older architectures such as VGG16. In addition, the near-perfect accuracies imply that the proposed models are ready for real-world applications, such as automated grading systems in post-harvest management and supply chain monitoring. Future research could focus on testing how well these models generalize across different lighting conditions, backgrounds, and fruit types to further establish their practical use.
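For completeness, the AUC values and confusion matrices discussed above can be computed with scikit-learn, as in the brief sketch below; y_true and y_prob are illustrative names for the integer test labels and the predicted class probabilities.

```python
# Minimal sketch (assumption): macro-averaged one-vs-rest AUC and confusion
# matrix for the three maturity classes, using scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# y_true: integer test labels; y_prob: class probabilities from model.predict()
y_pred = np.argmax(y_prob, axis=1)
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
print(f"Macro-averaged AUC: {auc:.3f}")
print(cm)
```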