R. Deepa,a Midhun P. Mathew,*b S. Baskar,c and Abubeker K. M.*d
aAssistant Professor, Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur Campus, Chennai, Tamil Nadu, India. E-mail: deepar2@srmist.edu.in
bAssistant Professor, Department of Computer Science and Engineering, Amal Jyothi College of Engineering (Autonomous), Kanjirappally, Kerala, India. E-mail: midhunpmathew@amaljyothi.ac.in
cAssistant Professor, Department of Electronics and Communication Engineering, Karpagam Academy of Higher Education, Coimbatore, Tamil Nadu, India. E-mail: connectbaskar@gmail.com
dAssociate Professor, Department of Electronics and Communication Engineering, Amal Jyothi College of Engineering (Autonomous), Kanjirappally, Kerala, India. E-mail: kmabubeker82@gmail.com
First published on 19th November 2025
The early diagnosis of plant leaf diseases is crucial to sustainable agricultural management, as it minimises crop damage and reduces pesticide use. This paper presents Leaf Net (L-Net), a new lightweight convolutional neural network for the detection and classification of leaf diseases in apple, bell pepper, and grape. The model employs depthwise separable convolutions to capture features efficiently, an ensemble activation function to enrich non-linearity, and a Modified Adamax optimiser to improve convergence. The datasets combine publicly available repositories with custom annotated images, which were pre-processed and augmented to enhance generalisability. A plant-wise split cross-validation approach was used for training and evaluation, with a partitioning scheme designed to avoid data leakage and increase the practical applicability of the results. L-Net obtained a classification accuracy of 99.8% and an AUC of 1.00. Although variability in precision-recall metrics indicates room for improvement at the class level, L-Net proved compatible with low-power edge platforms such as the Raspberry Pi and NVIDIA Jetson Nano, demonstrating its feasibility for in-field detection. The model thus enables timely and precise plant disease diagnosis, supports targeted pesticide application and crop management, and in turn fosters the adoption of sustainable agricultural practices. Future research will focus on cross-crop studies and real-world scaling of L-Net to enhance its robustness.
Sustainability spotlight
This research contributes to sustainable agriculture by presenting L-Net, a lightweight and highly accurate deep learning model for the early identification and classification of leaf diseases in bell pepper, grape, and apple plants. By enabling real-time, low-computation disease detection on resource-constrained devices, L-Net empowers farmers with cost-effective, scalable, and autonomous solutions for crop monitoring. The architecture's integration of depthwise separable convolutions minimizes energy consumption and enhances processing efficiency, making it suitable for deployment in remote and rural farming environments. This innovation supports precision agriculture by reducing chemical usage through timely disease intervention and improving yield with minimal environmental impact, thereby aligning with the United Nations Sustainable Development Goals (SDGs) related to Zero Hunger (SDG 2) and Responsible Consumption and Production (SDG 12).
This research presents L-Net, a lightweight neural network architecture implemented from scratch to detect and differentiate foliar diseases, on both the frontal and dorsal leaf surfaces, of apple, grape, and bell pepper crops from close range or ground level. Because it is lightweight, power-efficient, and highly accurate, the L-Net model is suited to rapid deployment in real agricultural fields, outperforming competing models. The model underwent extensive testing spanning real-life conditions and achieved over 98% accuracy on both training and test sets, demonstrating its effectiveness. This contribution can be summarised in the following key points:
• Development and fine-tuning of a new lightweight CNN architecture (L-Net), trained from scratch to achieve the highest accuracy in leaf disease detection across multiple crops: apple, grape, and bell pepper.
• Detection of frontal and dorsal foliar diseases under real-world conditions involving occlusions and variations in illumination. The proposed architecture is highly efficient and surpasses deep learning models deployed for broader applications.
The rest of the paper is organised as follows: Section 2 discusses the literature on plant disease classification using deep learning techniques. Section 3 describes the model development, followed by the methodology in Section 4, which details L-Net's architecture and the construction of the dataset. Section 5 describes the experimental setup and analysis of the results, while Section 6 presents model analysis with an accompanying discussion. Finally, Section 7 offers conclusions and outlines directions for future research.
Other developments have targeted real-time and graphics processing unit (GPU) accelerated models. Rahman et al.12 developed a real-time system for monitoring and detecting plant diseases based on deep learning techniques. Wang et al.13 focused on the isolation and identification of the causative agent of gummy stem blight disease of cucumber, contributing to the understanding of plant pathology. Alongside these developments, transformer models and federated learning methods have also emerged. Chai et al.14 proposed the PlantAIM model, which combines global attention and local features for identifying plant diseases. Hari and Singh15 presented an adaptive knowledge-transfer method using federated deep learning for plant disease detection, addressing privacy and efficiency concerns.
Liu et al.16 developed an advanced YOLOv5-based model, achieving 92.7% mAP in apple leaf disease detection. In a recent study, a novel mobile-optimised lightweight model, YOLOv8n–GGi, was designed for apple leaf disease detection in natural environments.17 The model was optimised through the application of GhostConv, C3Ghost, GAM, and BiFPN modules, achieving 86.9% mAP. Advanced DL models, particularly Swinv2-Base, are achieving 100% accuracy in early diagnosis and identification, highlighting the ease of integrating deep learning into agriculture.18 In ref. 19, real-time grape leaf disease detection using the MobileNetV3Large model was developed and deployed on the NVIDIA Jetson Nano edge platform, achieving 99.66% training accuracy and 99.42% test accuracy. Ref. 20, spanning various ML and DL frameworks for early plant leaf disease detection, emphasises the EfficientNet family of models, which attained 98.12% accuracy for image classification at modest computational cost. Sustainable agriculture and food security hinge on accurate leaf disease detection. An ANFIS-integrated CNN model with local binary pattern (LBP) features was used for enhanced detection of bell pepper leaf diseases; with LBP, the model achieved exceptional accuracy surpassing 99%, revealing its potential for dependable agricultural applications.21 Table 1 summarises the literature.
| Sl. No. | Model/Study | Dataset used | Advantage | Disadvantage | Accuracy (%) | Precision (%) | Research gap |
|---|---|---|---|---|---|---|---|
| 1 | MobileNetV2 (ref. 5) | Plant village | Efficient on mobile and edge devices | Relies on transfer learning; lower class-specific precision | 98.2 | 96.3 | Limited crop-specific optimisation |
| 2 | EfficientNetB0 (ref. 8) | Plant village | Balanced accuracy and model size | Higher inference latency on low-end devices | 98.6 | 97.2 | Not optimised for real-time agricultural field use |
| 3 | DBESeriesNet (ref. 4) | Plant village | Tailored for leaf disease classification | Moderate model size; lacks edge evaluation | 98.0 | 96.9 | No hardware constraint analysis or field testing |
| 4 | DSC-TransNet (ref. 11) | Plant village | GPU-enabled, real-time detection | Needs expensive hardware, not practical for low-resource areas | 97.4 | 96.5 | High deployment cost and energy demand |
| 5 | Dense-inception + attention (ref. 10) | Plant village | Improved spatial attention and feature localisation | High complexity and latency | 97.8 | 97.0 | No sustainability or deployment framework was discussed |
In contrast, the proposed L-Net model for embedded systems accounts for hardware constraints and spatial contextual patterns. L-Net meets its primary accuracy target for the embedded domain while using depthwise separable convolution, a novel ensemble activation function, and a customised optimiser. Moreover, most prior work ignores performance under multi-crop, multi-disease, and class-imbalance conditions. This work aims to close that gap through strategic augmentation, balanced partitioning, and a class-level evaluation framework. Evidence for the practicality of this work is provided in Section 5, which compares inference times for L-Net against baseline CNN, MobileNet, and EfficientNet architectures.
| X′ = (W_d × X) × W_p | (1) |
| f(x) = ½(GELU(x) + LeakyReLU(x, α)) | (2) |
| GELU(x) = 0.5x(1 + erf(x/√2)) | (3) |
| LeakyReLU(x, α) = max(αx, x) | (4) |
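For illustration, the ensemble activation of eqn (2)–(4) translates directly into a few lines of TensorFlow. This is a minimal sketch, assuming a TensorFlow/Keras implementation; the function name `ensemble_activation` and the default negative slope `alpha=0.1` are illustrative choices, not values reported by the authors.

```python
import tensorflow as tf

def ensemble_activation(x, alpha=0.1):
    """Ensemble activation of eqn (2): the mean of GELU and Leaky ReLU.

    `alpha` is the Leaky ReLU negative slope; its value is not stated
    in the text, so 0.1 here is an assumed placeholder.
    """
    return 0.5 * (tf.nn.gelu(x) + tf.nn.leaky_relu(x, alpha=alpha))

# Usage example: apply to a small tensor
y = ensemble_activation(tf.constant([-1.0, 0.0, 2.0]))
```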
The architecture consists of three main convolutional blocks, each containing two depthwise separable convolution layers with 32, 64, and 128 filter sizes, respectively. Batch normalisation with renormalisation follows each block for progressive weight updating, and max pooling with a stride of 2 reduces the spatial dimensions:
| X_{l+1} = maxpool_{2×2}(X_l) | (5) |
| X′ = W × X + b | (6) |
| X_avg = (1/(H × W)) Σ_i Σ_j X_ij | (7) |
| ŷ_i = exp(z_i)/Σ_j exp(z_j) | (8) |
Dropout rates of 0.5 and 0.3 are applied to improve generalisation. The final softmax output layer produces probability distributions over the output classes, trained using the categorical cross-entropy loss:
| L = −Σ_i y_i log(ŷ_i) | (9) |
| m_t = β_1 m_{t−1} + (1 − β_1)g_t | (10) |
| v_t = max(β_2 v_{t−1}, |g_t|) | (11) |
| v_t = v_{t−1} − µg_t + εv_t | (12) |
| θ_t = θ_{t−1} − η(m_t/(v_t + ε)) | (13) |
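The Modified Adamax update of eqn (10)–(13) can be expressed as a single parameter step. The NumPy sketch below follows the equations literally; the hyperparameter defaults (β₁, β₂, µ, ε) are assumptions, as the paper does not list them explicitly.

```python
import numpy as np

def modified_adamax_step(theta, g, m, v,
                         lr=1e-3, beta1=0.9, beta2=0.999,
                         mu=1e-4, eps=1e-8):
    """One parameter update following eqns (10)-(13).

    theta: parameters, g: gradient, m: first moment, v: second moment.
    All hyperparameter defaults are assumed, not taken from the paper.
    """
    m = beta1 * m + (1.0 - beta1) * g           # eqn (10): first moment
    v_inf = np.maximum(beta2 * v, np.abs(g))    # eqn (11): infinity norm
    v = v - mu * g + eps * v_inf                # eqn (12): modified blend
    theta = theta - lr * m / (v + eps)          # eqn (13): parameter step
    return theta, m, v
```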
| Feature/Model | MobileNetV2 | EfficientNetB0 | Proposed L-Net |
|---|---|---|---|
| Primary purpose | Generic image classification | Generic image classification | Crop-specific plant leaf disease classification |
| Training strategy | Transfer learning | Transfer learning | Trained from scratch |
| Activation function | ReLU6 | Swish | Ensemble (GELU + leaky ReLU) |
| Optimizer | Adam | RMSProp | Modified Adamax |
| Model parameters | ∼3.4 million | ∼5.3 million | ∼1.9 million |
| FLOPs (multiply-add ops) | ∼300 million | ∼390 million | ∼36 million |
| Hardware target | Edge/Cloud | Edge/Cloud | Edge-focused (Jetson Nano and Raspberry Pi) |
| Domain adaptability | Moderate | Moderate | Highly tailored for agricultural datasets |
| Suitability for real-time use | Medium | Medium | Low latency and fast inference |
| Sustainability relevance | Not explicitly addressed | Not explicitly addressed | Explicitly aligned with smart farming and sustainability |
To illustrate the lightweight nature of L-Net, Table 3 presents a full layer-by-layer summary showing output size, parameter count, and estimated FLOPs for each stage in the pipeline. Overall, the model uses roughly 167 000 parameters and executes around 56.9 million FLOPs per forward pass. This slim architecture lowers the memory footprint and accelerates inference, supporting the claim that L-Net is practical for real-time disease screening on edge hardware such as the Raspberry Pi 4B and Jetson Nano.
| Layer type | Output shape | Kernel/Stride | Filters | Params | FLOPs (M) |
|---|---|---|---|---|---|
| Input | 256 × 256 × 3 | — | — | 0 | 0 |
| Depthwise conv 1 | 256 × 256 × 32 | 3 × 3/1 | 32 | 320 | 15.1 |
| Pointwise conv 1 | 256 × 256 × 32 | 1 × 1/1 | 32 | 1024 | 4.2 |
| Max pooling | 128 × 128 × 32 | 2 × 2/2 | — | 0 | 0.8 |
| Depthwise conv 2 | 128 × 128 × 64 | 3 × 3/1 | 64 | 640 | 7.5 |
| Pointwise conv 2 | 128 × 128 × 64 | 1 × 1/1 | 64 | 2048 | 8.1 |
| Max pooling | 64 × 64 × 64 | 2 × 2/2 | — | 0 | 0.4 |
| Depthwise conv 3 | 64 × 64 × 128 | 3 × 3/1 | 128 | 1280 | 6.2 |
| Pointwise conv 3 | 64 × 64 × 128 | 1 × 1/1 | 128 | 8192 | 12.4 |
| Max pooling | 32 × 32 × 128 | 2 × 2/2 | — | 0 | 0.2 |
| 1 × 1 bottleneck conv | 32 × 32 × 32 | 1 × 1/1 | 32 | 4096 | 2.1 |
| Global avg pooling | 1 × 1 × 32 | — | — | 0 | 0.03 |
| FC layer 1 | 512 | — | — | 16 896 | 0.06 |
| Dropout (0.5) | 512 | — | — | 0 | 0 |
| FC layer 2 | 256 | — | — | 131 072 | 0.05 |
| Dropout (0.3) | 256 | — | — | 0 | 0 |
| Softmax output | 6 (classes) | — | — | 2560 | 0.02 |
| Total | — | — | — | 167 128 | 56.9 |
Fig. 1 describes a streamlined deep learning architecture designed for image classification, emphasising low computational requirements and suitability for real-time or edge deployment. The system processes an input image of size 256 × 256 × 3. It begins with feature extraction through a series of depthwise separable convolution layers, which drastically reduce the number of learnable parameters while preserving representational capacity. The initial convolution block contains two depthwise separable convolutions of 32 filters each, followed by batch normalisation to stabilise and improve training dynamics. This scheme is repeated with increasing filter counts of 64 and 128, each followed by batch normalisation and 2 × 2 max pooling to reduce spatial dimensions. The transition from convolutional to fully connected layers is handled by a 1 × 1 Conv2D layer with 32 filters, which performs dimensionality reduction, followed by a 7 × 7 average pooling layer that integrates spatial characteristics. The resulting feature map is flattened and passed to two fully connected layers with 512 and 256 neurons, respectively. An ensemble of nonlinear activation functions is employed to enhance generalisation. To further mitigate overfitting, dropout layers with rates of 0.5 and 0.3 are applied to the dense layers. The final dense layer uses a softmax activation over the output classes. The Modified Adamax optimiser is used, combining the standard Adamax update with a custom learning-rate schedule that adapts across a wide range of datasets to optimise convergence and performance.
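A Keras sketch consistent with this description and with Table 3 is given below. It reuses the `ensemble_activation` helper sketched earlier; the padding choice and the six-way softmax follow our reading of Table 3 (which lists global average pooling rather than the 7 × 7 pooling mentioned in the text), not the authors' released code.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lnet(num_classes=6, input_shape=(256, 256, 3)):
    """Sketch of the L-Net pipeline described in Fig. 1 / Table 3."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        # Each block: two depthwise separable convolutions, then
        # batch renormalisation and 2x2 max pooling (stride 2).
        x = layers.SeparableConv2D(filters, 3, padding="same",
                                   activation=ensemble_activation)(x)
        x = layers.SeparableConv2D(filters, 3, padding="same",
                                   activation=ensemble_activation)(x)
        x = layers.BatchNormalization(renorm=True)(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    # 1x1 bottleneck convolution, then spatial aggregation.
    x = layers.Conv2D(32, 1, activation=ensemble_activation)(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Two dense layers with the stated dropout rates.
    x = layers.Dense(512, activation=ensemble_activation)(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(256, activation=ensemble_activation)(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs, name="l_net")
```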
Upon identifying the best configuration, the L-Net model is trained on the training dataset and assessed on the validation set. Final evaluation is then performed on an unbiased testing dataset using accuracy, precision, recall, F1-score, and AUC to demonstrate the efficacy of the proposed model. This workflow ensures that the developed models are reliable, robust, reproducible, and ready for deployment and further testing.
Table 4 summarises the number of pictures for the various classifications of plant leaf diseases according to plant types such as bell pepper, apple, and grape.
| Sl. No. | Type of plant | Disease type | No. of images |
|---|---|---|---|
| 1 | Bell pepper | Bacterial spot | 797 |
| | | Healthy | 1183 |
| 2 | Grape | Black rot | 944 |
| | | Leaf spot | 861 |
| | | Black measles | 1107 |
| | | Healthy | 399 |
| 3 | Apple | Scab | 504 |
| | | Black rot | 497 |
| | | Cedar apple rust | 220 |
| | | Healthy | 1316 |
This research employs a dataset containing a total of 7828 images, which are labelled and span across three plant species: bell pepper, grape, and apple, considering both healthy and diseased leaves. For bell pepper, the dataset contains two classes: bacterial spot (797 images) and healthy (1183 images), totalling 1980 images. The grape category includes four classes: black rot (944 images), leaf spot (861 images), black measles (1107 images), and healthy (399 images), totalling 3311 images. The apple leaves also consist of four classes: scab (504 images), black rot (497 images), cedar apple rust (220 images), and healthy (1316 images), totalling 2537 images. Overall, the dataset contains a reasonably balanced showcase of many different types of diseased leaves and healthy leaves. This assists in training the deep learning models for evaluating the classification of diseased leaves and is effective in the training process.
The model utilises augmentation techniques, which improve performance on unseen data as the model learns different representations of the same image. Example images modified with augmentation techniques can be seen in Fig. 4. For L-Net, the training set was expanded and diversified with augmentation techniques comprising rotations, shifts, shearing, zooming, and flipping.
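These transforms map directly onto Keras' `ImageDataGenerator`. The sketch below is illustrative: the numeric ranges and the `data/train` directory path are assumptions, since the paper does not report them.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation transforms listed in the text; ranges are assumed.
augmenter = ImageDataGenerator(
    rotation_range=30,        # random rotations
    width_shift_range=0.1,    # horizontal shifts
    height_shift_range=0.1,   # vertical shifts
    shear_range=0.2,          # shearing
    zoom_range=0.2,           # zooming
    horizontal_flip=True,     # flipping
    rescale=1.0 / 255,        # pixel normalisation
)

# Hypothetical directory layout: one sub-folder per disease class.
train_flow = augmenter.flow_from_directory(
    "data/train", target_size=(256, 256), batch_size=9,
    class_mode="categorical")
```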
The augmented dataset overview and its characteristics are given in Table 5. These pre-processing approaches ensure that the model attends to important features rather than irrelevant distortions, reducing the classifier's misclassifications. Moreover, the techniques prevent overfitting by encouraging the model to learn patterns instead of memorising the augmented training data. Integrating these pre-processing and augmentation methods enhanced classification accuracy by 6% to 9% and reduced the number of incorrect identifications relative to the unprocessed images. Furthermore, the increase in dataset size improved the model's precision and recall, especially for the minority classes, helping it identify rare plant diseases confidently. All pre-processing steps were quantitatively evaluated, showing that normalisation and cropping, as integrated workflows, were the most beneficial to feature clarity and performance.
| Sl. No. | Type of plant | Disease type | No. of images |
|---|---|---|---|
| 1 | Bell pepper | Bacterial spot | 5100 |
| | | Healthy | 5200 |
| 2 | Grape | Black rot | 6000 |
| | | Leaf spot | 6300 |
| | | Black measles | 6200 |
| | | Healthy | 6180 |
| 3 | Apple | Scab | 5300 |
| | | Black rot | 5200 |
| | | Cedar apple rust | 5100 |
| | | Healthy | 5110 |
| | | Total | 55 690 |
For bell pepper, augmentation resulted in 5100 images for bacterial spot and 5200 for healthy samples. In the grape category, there are 6000 augmented images of black rot, 6300 of leaf spot, 6200 of black measles, and 6180 healthy images. In apple, the augmented dataset includes 5300 images of scab, 5200 of black rot, 5100 of cedar apple rust, and 5110 healthy images. This meticulous procedure supports balanced classification and improves the model's robustness by simulating different conditions of plant leaves in the real world.
Only augmented images are included in the training set, and care was taken to avoid overlaps between the original and augmented images. This guarantees that no training, validation, or testing set shares any plants or leaves, avoiding bias in the evaluation results. To ensure data integrity, a combination of automated and manual checks was performed to detect sample overlaps and produce a clean, non-redundant, and unbiased dataset for model evaluation. For large-scale data management and quality control, TensorFlow Data Validation (TFDV), part of the TensorFlow Extended (TFX) suite, was used; this step is vital for quality control of the set. The dataset division is shown in Table 6. This ensures unbiased, objective criteria for model evaluation and prevents data contamination.
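As one plausible realisation of this TFDV check, the snippet below infers a schema from a training manifest and validates the validation split against it. The CSV manifest paths are hypothetical; the paper does not describe how its image metadata was exported.

```python
import tensorflow_data_validation as tfdv

# Assumes per-split CSV manifests (e.g. path, label, plant) exported
# from the image folders; the file names here are illustrative.
train_stats = tfdv.generate_statistics_from_csv("manifests/train.csv")
schema = tfdv.infer_schema(statistics=train_stats)

val_stats = tfdv.generate_statistics_from_csv("manifests/val.csv")
anomalies = tfdv.validate_statistics(statistics=val_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # flags drift or leakage suspects between splits
```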
| Sl. No. | Image type | Training | Testing | Validation | Total |
|---|---|---|---|---|---|
| 1 | Bell pepper | 7210 | 1030 | 2060 | 10 300 |
| 2 | Grape | 17 360 | 2480 | 4960 | 24 800 |
| 3 | Apple | 14 497 | 2071 | 4142 | 20 710 |
The dataset utilised for training, validating, and testing the L-Net model consisted of 55 810 images across three plant species: bell pepper, grape, and apple. For bell pepper, 10 300 images were collected, of which 7210 were used for training, 2060 for validation, and 1030 for testing. The grape dataset contained 24 800 images, of which 17 360 were for training, 4960 for validation, and 2480 for testing. The apple dataset contained a total of 20 710 images, including 14 497 training images, 4142 for validation, and 2071 for testing. This stratified split allows balanced and sufficient learning, enhancing the L-Net model's generalisation and classification proficiency across plant species and disease types.
| Sl. No. | Hyperparameter | Setting |
|---|---|---|
| 1 | Input image size | 256 |
| 2 | Batch renormalization | True |
| 3 | Stride | 2 |
| 4 | Dense layer | 512 |
| 5 | Activation | Ensemble activation |
The L-Net model was trained with a specific set of hyperparameters tailored for the best output and training efficiency. The input image dimension was set to 256, providing adequate resolution for feature extraction while remaining computationally efficient. Batch renormalisation was enabled, which helped accelerate and stabilise training by normalising layer inputs. Convolutional layers used a stride of 2 to downsample feature maps and reduce spatial dimensions effectively. In addition, the architecture included a dense layer with 512 neurons, adding a high-capacity representation before the output layer. Notably, the model used an ensemble activation function, which combines several nonlinear activations to improve expressiveness and generalisation. Together, these hyperparameters struck a good balance between model complexity, training stability, and classification accuracy.
The L-Net model was trained and evaluated on Google Colab Pro+, which provided a single NVIDIA A100 Tensor Core GPU with 40 GB of video memory. The virtual environment hosted 16 vCPUs and 85 GB of system RAM, and it ran Python 3.10, TensorFlow 2.12, and CUDA 11.8. Training proceeded for 100 epochs using a batch size of 9 and an initial learning rate of 0.001. During inference, the average prediction latency per image was about 36 milliseconds on the GPU, indicating that the model can classify inputs in near real time. This performance makes L-Net viable for cloud-assisted applications and lightweight edge devices, including Jetson Nano, Raspberry Pi paired with Coral TPU, or mobile platforms, where low latency and modest resource use are essential.
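Putting the reported settings together, a training run might look like the following sketch. `keras.optimizers.Adamax` with the reported initial learning rate stands in for the unpublished Modified Adamax, and `val_flow` is an assumed validation generator built like `train_flow` above but without augmentation.

```python
from tensorflow import keras

model = build_lnet(num_classes=6)

# Adamax stands in here for the paper's Modified Adamax optimiser.
model.compile(
    optimizer=keras.optimizers.Adamax(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy", keras.metrics.AUC(name="auc")],
)

# train_flow comes from the augmentation sketch (batch size 9);
# val_flow is an assumed, non-augmented validation generator.
history = model.fit(train_flow, validation_data=val_flow, epochs=100)
```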
| Activation function | Accuracy (%) | Precision (%) | F1-score (%) | Epochs to converge | Remarks |
|---|---|---|---|---|---|
| ReLU | 98.7 | 97.8 | 97.6 | 40 | Fast but struggled with minority classes |
| Leaky ReLU | 99.0 | 98.5 | 98.3 | 42 | Improved handling of negative activations |
| GELU | 99.1 | 98.6 | 98.5 | 45 | Better smoothness and slower early learning |
| GELU + leaky ReLU | 99.8 | 99.7 | 99.6 | 37 | Best generalization and faster convergence |
Table 8 shows the results of an ablation study conducted for the L-Net model, evaluating the performance and convergence behaviour of different activation functions. The model with ReLU achieved 98.7% accuracy in 40 epochs but struggled to predict minority classes, which hurt generalisation. Leaky ReLU improved on this, scoring 99.0% accuracy with better precision and F1-score thanks to its handling of negative activations, though it converged in 42 epochs. GELU offered smoother activation and strong generalisation, reaching 99.1% accuracy, but its slower early-stage learning delayed convergence to 45 epochs. The ensemble combining leaky ReLU and GELU performed best, raising accuracy to 99.8% with a precision of 99.7% and an F1-score of 99.6% while also converging fastest, in 37 epochs. This study demonstrates that the ensemble activation improves both generalisation and convergence efficiency, making it optimal for L-Net.
| Optimizer | Accuracy (%) | Precision (%) | F1-score (%) | Epochs to converge | Remarks |
|---|---|---|---|---|---|
| Adam | 98.9 | 98.1 | 97.9 | 42 | Fast convergence and moderate overfitting |
| RMSProp | 98.7 | 97.9 | 97.6 | 45 | Stable but slow convergence |
| Modified Adamax | 99.8 | 99.7 | 99.6 | 37 | Best stability, generalisation, and speed |
A comparative evaluation of optimisers on L-Net illustrates discrepancies in performance, convergence speed, and generalisation ability. With the Adam optimiser, L-Net attained an accuracy of 98.9%, with a precision of 98.1% and an F1-score of 97.9%, converging at 42 epochs; convergence was rapid but accompanied by moderate overfitting. RMSProp reached a slightly lower accuracy of 98.7% and converged more slowly, at 45 epochs; it provided stable learning but lagged in both convergence and generalisation. The Modified Adamax optimiser exceeded both, achieving the highest accuracy of 99.8%, a precision of 99.7%, and an F1-score of 99.6%, converging in only 37 epochs. It showed the best training stability, fastest convergence, and most robust generalisation, making it the most beneficial optimisation strategy for the L-Net architecture. These findings demonstrate that the Modified Adamax optimiser significantly improves learning dynamics and classification performance for plant disease detection tasks.
Fig. 5–7 present performance curves showing training and validation accuracy exceeding 0.99 after 20 epochs, indicating effective training with little overfitting overall. Validation accuracy starts above 0.88, demonstrating strong initialisation. Between epochs 10 and 20 there is fluctuation followed by stabilisation, and a dip in validation accuracy between epochs 20 and 40 suggests mild overfitting during some stages. The overall validation scores remain very high (0.97–0.99), showing that the model generalises well with very little bias.
In the first model, the accuracy and loss plots show both high stability and near-flawless performance, with training and validation accuracy curves converging around 99.8%. The minimal gap between the curves, along with consistently low loss values, confirms excellent generalisation in a well-optimised model with no overfitting. In contrast, the second model displays some early-stage instability despite reaching high accuracy in the end. Its validation accuracy and loss curves fluctuate, indicating sensitivity to hyperparameters such as the learning rate. Nevertheless, the model stabilises in later training stages, suggesting it could benefit from additional regularisation methods such as dropout or early stopping. The third model starts lower, at around 82%, but improves steadily over time. Its training and validation curves align closely across all epochs, indicating consistent learning and generalisation. The gradual yet stable convergence highlights the model's robustness and its capability to handle more complex and noisy datasets.
| Model | Accuracy (%) | Parameters (M) | Inference time (ms) | Std dev (±) | Model size (MB) |
|---|---|---|---|---|---|
| L-Net (proposed) | 99.8 | 1.9 | 38 | ±0.12 | 7.6 |
| MobileNetV2 | 98.2 | 3.4 | 45 | ±0.27 | 12.6 |
| EfficientNetB0 | 98.6 | 5.3 | 60 | ±0.21 | 19.1 |
| ViT | 96.7 | 86 | 200+ | ±0.31 | 330 |
| Swin-T | 97.3 | 28 | 110 | ±0.25 | 91 |
Table 11 shows that adding these transformations boosted the model's ability to generalise to unseen data, with the most significant gains in precision and recall. Overall accuracy climbed by 2.6%, precision increased by 3.8%, and recall jumped by 4.1%; the improvements were especially pronounced for rare disease categories that the original pool represented poorly. These results suggest that augmentation partially balanced class distributions and gave the network a richer view of how diseases appear under varied real-world imaging conditions.
| Metric | Without augmentation | With augmentation | Improvement (%) |
|---|---|---|---|
| Accuracy (%) | 97.2 | 99.8 | +2.6 |
| Precision (%) | 95.9 | 99.7 | +3.8 |
| Recall (%) | 95.2 | 99.3 | +4.1 |
| F1-score (%) | 95.5 | 99.5 | +4.0 |
Fig. 10 shows the classification of grape leaves into four groups, achieved with exceptional performance and very high F1-scores. Across apple, bell pepper, and grape leaf disease classification, L-Net showed unmatched accuracy across different evaluation metrics. From the receiver operating characteristic (ROC) curve, L-Net achieves an AUC of 1.00 for all classes, meaning it can differentiate between diseased and healthy leaves without false positives. The model thus performs excellent class discrimination, exceeding traditional deep learning models such as ResNet and Inception-v3 used in other plant disease classification tasks. Reinforcing the model's reliability, the precision-recall (PR) curve shows that precision remained high across recall levels. However, the macro-average PR AUC of 0.74 is notable: while most classes perform very well, performance is imbalanced across classes, so generalisation to under-represented classes may suffer.
L-Net is well calibrated for disease classification, as shown by the calibration curve, since the predicted probabilities nearly match the actual disease occurrence. However, room for improvement remains, given slight deviations for specific classes that could be addressed with temperature scaling or Bayesian uncertainty estimation. The mean average precision (mAP) trends show that L-Net's performance varies across training epochs, especially early in training. Differences in leaf texture and disease severity among the apple, bell pepper, and grape datasets likely cause this variability. Unlike transfer learning-based models, L-Net is trained from scratch on crop-specific data, capturing spatial hierarchies and disease-specific features, which leads to enhanced generalisation.
Fig. 11 shows L-Net results on other datasets that account for real-life changes like light, humidity, and leaf position. The study shows that the model works for bell pepper, grape, and apple leaves with excellent efficiency.
Fig. 11 (a) Bell pepper disease identification using L-Net. (b) Grape disease identification using L-Net.
Overall, L-Net achieves state-of-the-art results in classifying apple, bell pepper, and grape leaf diseases. Its excellent ROC and PR curves, robust calibration, and accuracy trends make L-Net a frontrunner for practical application in plant disease diagnosis. Fig. 12 presents the graphical representation of the above analysis.
Fig. 12 (a) The mean average precision (mAP) values of bell pepper leaves. (b) The mean average precision (mAP) values of grape leaves. (c) The mean average precision (mAP) values of apple leaves.
The model scores an ROC-AUC of 1.00, which suggests that it can cleanly separate positive from negative instances when evaluated overall. However, the more modest macro PR-AUC of 0.74 exposes a hidden imbalance among the individual classes. This gap exists because precision–recall curves emphasise precision and recall alone, so they react strongly when one class appears much less often than others. Because the ROC accounts for true negatives, its score can stay high even when the positive class is rare, giving a deceptively upbeat picture in skewed datasets. PR-AUC keeps a tighter grip on minority performance, punishing drops in either precision or recall, and therefore acts as a tougher sanity check in multi-class scenarios like these.
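The ROC-AUC/PR-AUC contrast described here can be reproduced with scikit-learn. The sketch below assumes `y_true` holds integer test-set labels and `y_score` holds the model's (n_samples, 6) softmax outputs; both names are illustrative.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

def macro_auc_report(y_true, y_score, n_classes=6):
    """Contrast macro ROC-AUC with macro PR-AUC for multi-class outputs.

    y_true: integer labels, y_score: per-class probability matrix,
    e.g. y_score = model.predict(test_flow) on the held-out split.
    """
    y_true_bin = label_binarize(y_true, classes=np.arange(n_classes))
    macro_roc = roc_auc_score(y_true_bin, y_score, average="macro")
    # average_precision_score approximates the area under the PR curve,
    # which penalises weak minority-class recall far more than ROC-AUC.
    macro_pr = average_precision_score(y_true_bin, y_score, average="macro")
    return macro_roc, macro_pr
```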
Examining the class-by-class PR curves in Fig. 13 reinforces these points. Frequent classes such as bell-pepper-bacterial-spot remain strong, while rare classes like grape-leaf-blight and apple-scab pull the macro score down because their recall is weaker. In other words, L-Net performs well on average but stumbles on under-represented classes. To narrow this spread, the next model cycle will add focal loss and adaptive resampling so minority classes receive more attention during training, with the aim of raising the macro PR-AUC.
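A minimal sketch of such a focal loss, assuming a Keras-style `(y_true, y_pred)` loss and the usual γ = 2, α = 0.25 defaults from the focal-loss literature (values the paper does not specify), is shown below.

```python
import tensorflow as tf

def categorical_focal_loss(gamma=2.0, alpha=0.25):
    """Multi-class focal loss; gamma and alpha are assumed defaults."""
    def loss(y_true, y_pred):
        # Clip to avoid log(0) on confident one-hot predictions.
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)
        # Down-weight well-classified examples so rare classes dominate.
        modulating = alpha * tf.pow(1.0 - y_pred, gamma)
        return tf.reduce_sum(modulating * cross_entropy, axis=-1)
    return loss

# Usage: model.compile(optimizer="adamax", loss=categorical_focal_loss())
```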
| Disease class | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| Bell pepper – bacterial spot | 99.8 | 99.9 | 99.85 |
| Bell pepper – healthy | 99.6 | 99.7 | 99.65 |
| Grape – black rot | 99.7 | 99.5 | 99.6 |
| Grape – esca (black measles) | 99.5 | 99.2 | 99.35 |
| Grape – leaf blight | 98.9 | 97.8 | 98.35 |
| Apple – apple scab | 98.2 | 97.5 | 97.85 |
| Apple – black rot | 99.1 | 98.9 | 99.0 |
| Apple – cedar apple rust | 98.4 | 97.2 | 97.8 |
| Apple – healthy | 99.3 | 99.4 | 99.35 |
• A learning-rate scheduler cut the rate to 0.0001 after fifteen epochs.
• Augmentation procedures now emphasise each class through stratified, random, and balanced transformations.
• A dropout layer with a 30% dropout rate and early stopping with ten-epoch patience were added to smooth convergence; a minimal sketch of these settings follows below.
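In Keras terms, these adjustments might be wired up as follows. The exact schedule shape is an assumption where the bullet text is ambiguous (it could be read as a repeated drop every fifteen epochs); this sketch applies a single cut.

```python
from tensorflow import keras

def step_decay(epoch, lr):
    # Cut the learning rate to 0.0001 once fifteen epochs have elapsed,
    # per the first bullet; a single cut is assumed here.
    return 1e-4 if epoch >= 15 else lr

callbacks = [
    keras.callbacks.LearningRateScheduler(step_decay),
    # The 30% dropout lives inside the model itself; early stopping with
    # ten-epoch patience smooths convergence.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                  restore_best_weights=True),
]
# Pass via: model.fit(..., callbacks=callbacks)
```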
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request or available from ref. 22–25.