Khalid Ferji
Université de Lorraine, CNRS, LCPM, F-54000 Nancy, France. E-mail: khalid.ferji@univ-lorraine.fr
First published on 16th July 2025
The rapid and unbiased characterization of self-assembled polymeric vesicles in transmission electron microscopy (TEM) images remains a challenge in polymer science. Here, we present a deep learning-powered detection framework based on YOLOv8, enhanced with Weighted Box Fusion, to automate the identification and size estimation of polymer nanostructures. By incorporating multiple morphologies in the training dataset, we achieve robust detection across unseen TEM images. Our results demonstrate that the model provides accurate vesicle detection within 2 seconds—an efficiency unattainable using traditional image analysis software. The proposed framework enables reproducible and scalable nano-object characterization, paving the way for a general AI-driven automation in polymer self-assembly research.
In recent years, artificial intelligence (AI) has emerged as a powerful tool for accelerating discovery in polymer science.17–21 Convolutional neural networks (CNNs),22 particularly object detection models, have demonstrated remarkable success in recognizing and classifying nanoparticles in TEM images.23,24 Several research efforts have already explored the use of deep learning for the detection and characterization of nanoparticles and nanostructures, highlighting the potential of AI in improving analysis speed and precision.25–27 For example, Kamble et al.28 developed a deep learning model for microstructure recognition in polymer nanocomposites, achieving high accuracy. Similarly, Saaim et al.29 utilized machine learning algorithms to automatically segment and classify nanoparticles in high-resolution TEM images, significantly reducing the workload associated with manual annotation. Another relevant study by Lu et al.30 demonstrated the feasibility of using semi-supervised learning approaches for identifying and differentiating the morphologies of nanostructures, enabling automated classification without extensive manual labelling.
Recent efforts in the field of bioimage analysis have demonstrated the power of open-source tools in democratizing the use of deep learning for microscopy applications. For instance, ilastik—developed by Kreshuk and collaborators—has enabled non-expert users to perform machine learning-based segmentation and classification tasks in a highly interactive environment, significantly reducing the technical barrier for researchers handling complex microscopy data.31 Similarly, Henriques and co-workers contributed to the development of ZeroCostDL4Mic, a platform that simplifies the use of deep learning models in microscopy by leveraging free cloud resources and user-friendly interfaces, thus accelerating the adoption of AI in image-based research workflows.32 These initiatives illustrate how thoughtfully designed tools can facilitate the integration of AI into everyday scientific practice—especially for researchers outside the machine learning community.
While AI-based approaches have been successfully implemented in material characterization,33–35 their application to self-assembled polymers remains rare.36,37 The development of a deep-learning-based approach tailored specifically for polymersome detection could offer a significant breakthrough in polymer and materials sciences. This work aims to provide a proof of concept demonstrating that AI can successfully detect vesicles across diverse TEM datasets and offer users an open-source tool38—DetectNano—to assist them in detecting and evaluating size distribution in a straightforward manner (Fig. 1). Importantly, our goal is also to deliver a concrete training example that can serve as a starting point for polymer scientists—particularly non-specialists in AI—seeking to integrate machine learning into their daily research workflows.
Fig. 1 (A) Class distribution and schematic 3D representations of polymer nanostructures in the dataset, accompanied by representative transmission electron microscopy (TEM) images illustrating each morphological class: large compound nano-objects (LCN), multicompartment vesicles (MCV), thick membrane multicompartment vesicles (TMCV), unilamellar vesicles (V), and the scale bar. (B) Workflow for automated detection, dataset annotation, and morphological analysis of nanostructures from TEM images using multi-model fusion via Weighted Box Fusion (WBF) of three YOLOv8 models (YOLOv8n, YOLOv8s, and YOLOv8m). TEM images were adapted with permission from ref. 3. Copyright 2022, American Chemical Society. |
However, to improve the generalization ability of the AI model and enhance its detection accuracy, we also included additional nanostructures commonly observed in polymer self-assembly. These different morphologies help the model learn to distinguish between various forms and prevent it from overfitting to a single vesicle shape. The three selected additional nanostructures are summarized in Fig. 1A: (i) Multicompartment vesicles (MCV): unlike simple vesicles, these structures contain multiple hydrophilic cores within a single polymer membrane. (ii) Thick membrane multicompartment vesicles (TMCV): these vesicles represent an intermediate state between MCV and larger structures. They have a thicker polymer membrane, which makes them more stable before merging into larger aggregates. (iii) Large compound nano-objects (LCN): these structures are formed when multiple vesicles fuse together, leading to non-spherical morphologies. Their irregular shape differentiates them from traditional vesicles and provides additional complexity for the AI model to learn. These four nanostructures (V, MCV, TMCV and LCN), along with annotated scale bars, constitute the five object classes used to train the model.
Including these different morphologies improves the model's ability to distinguish between subtle structural variations and ensures better performance in real-world datasets. The dataset was built using 65 high-resolution TEM images.3,7,41–44
Training was conducted on PyTorch 2.0 with Ultralytics YOLOv8 using an Intel Core i7-1068NG7 processor under Ubuntu 20.04. The dataset was randomly split into training (72%), validation (17%), and test (11%) sets, ensuring a representative sample for model generalization. Class distributions were preserved across subsets using a fixed random seed to maintain consistency and reproducibility. The training set was used to optimize model parameters, the validation set helped fine-tune hyperparameters and prevent overfitting, while the test set provided an independent evaluation of model performance on unseen data. Key training settings included the use of the Adam optimizer with an initial learning rate of 0.001, an image size of 640 pixels, and a batch size of 1 due to CPU constraints. A total of 85 epochs were used, with data augmentation and image caching enabled to improve convergence.
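The stratified split described above can be sketched as follows. This is an illustrative reconstruction, not the exact script used in this work; the file names and class labels are hypothetical.

```python
import random
from collections import defaultdict

def split_dataset(items, seed=42, ratios=(0.72, 0.17, 0.11)):
    """Split (filename, class) pairs into train/val/test subsets while
    preserving per-class proportions, using a fixed random seed for
    reproducibility as described in the text."""
    random.seed(seed)
    by_class = defaultdict(list)
    for name, cls in items:
        by_class[cls].append(name)
    train, val, test = [], [], []
    for cls, names in by_class.items():
        random.shuffle(names)
        n_train = round(len(names) * ratios[0])
        n_val = round(len(names) * ratios[1])
        train += names[:n_train]
        val += names[n_train:n_train + n_val]
        test += names[n_train + n_val:]  # remainder (~11%) goes to test
    return train, val, test
```

Splitting per class before concatenating is what keeps the class distributions comparable across the three subsets.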
In object detection tasks, models must learn to simultaneously identify the correct class of each object and accurately localize it within the image using bounding boxes. The performance of such models is therefore assessed using a combination of classification and localization metrics.47 The following evaluation metrics were used to monitor and compare model performance throughout training and validation. A brief description of each metric is provided in Table 1 to guide the interpretation of the results presented in Fig. 2 and 3. The detailed computation of these metrics is handled automatically by the Ultralytics YOLOv8 framework.46
Metric | Definition | Purpose and expected trend |
---|---|---
Box loss | Measures the error in predicting object location (bounding box) | Evaluates localization accuracy. Should decrease and approach 0 |
Class loss | Measures the error in classifying detected objects | Assesses how well the model assigns labels. Should decrease and approach 0 |
Distribution focal loss (DFL) | Refines bounding box prediction by focusing on high-confidence areas | Improves regression accuracy. Should decrease and approach 0 |
Precision | Ratio of correct detections to total detections made by the model | Indicates reliability of predictions. Should increase and approach 1 |
Recall | Ratio of correctly detected objects to the total number of ground truth objects | Measures model's ability to find all objects. Should increase and approach 1 |
mAP50 | Mean average precision under a moderate detection threshold | Measures detection accuracy under moderate conditions. Should increase and approach 1 |
mAP50-95 | Average precision over a range of thresholds | Assesses detection performance across varying localization precision levels. Should increase and approach 1 |
The training process aims to minimize the loss functions (e.g., box loss, class loss, and DFL), which reflect errors in object localization and classification. Simultaneously, the objective is to maximize evaluation metrics such as precision, recall, and mean average precision (mAP), which indicate how accurately and comprehensively the model detects nanostructures. These indicators also help highlight trade-offs, such as under-detection (low recall) versus over-detection (low precision), and can signal overfitting if validation performance deteriorates while training accuracy continues to improve.
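The localization metrics in Table 1 all rest on the intersection-over-union (IoU) between predicted and ground-truth boxes. The following minimal sketch shows precision and recall at a fixed IoU threshold (the "50" in mAP50); the box format and greedy matching rule are simplifications of what the Ultralytics framework computes internally.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, truths, iou_thr=0.5):
    """Greedy one-to-one matching of predicted boxes to ground-truth
    boxes at a fixed IoU threshold."""
    matched, tp = set(), 0
    for p in preds:
        for i, t in enumerate(truths):
            if i not in matched and iou(p, t) >= iou_thr:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall
```

A spurious extra detection lowers precision without affecting recall, while a missed object lowers recall without affecting precision, which is the trade-off discussed below.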
The training and evaluation of the YOLOv8 models for polymer nanostructure detection reveal distinct strengths and trade-offs between speed, accuracy, and generalization. Each model exhibits unique characteristics that make it more suitable for specific tasks, yet they remain complementary in their contributions to robust detection performance.
During training, all three models demonstrated a gradual reduction in loss values, with YOLOv8n stabilizing the fastest (Fig. 2). This model, being the smallest in terms of parameters, converged more quickly and maintained a relatively low training loss, indicating efficient learning with minimal overfitting. In contrast, YOLOv8m, with its significantly larger number of parameters, exhibited greater fluctuations in loss, suggesting a more complex optimization process. The longer training time of YOLOv8m (226 minutes) compared to those of YOLOv8n (48 minutes) and YOLOv8s (117 minutes) reflects the computational intensity required for more refined feature extraction. Despite its slower convergence, YOLOv8m's higher recall suggests that it is better at detecting a broader range of nanostructures, albeit at the cost of increased false positives. YOLOv8s, as an intermediary model, balanced both precision and recall, exhibiting moderate convergence speed and a stable reduction in loss values.
The performance metrics provide further insight into the models’ strengths. YOLOv8n excels in precision, achieving the highest score across most nanostructure classes, meaning that its predictions are highly reliable with fewer false positives. However, its recall is lower, indicating that while it detects structures with high confidence, it may miss some instances. On the other hand, YOLOv8m demonstrates superior recall, making it advantageous for detecting more instances of nanostructures, even if some false positives are introduced. YOLOv8s, positioned between these two extremes, achieves a well-balanced trade-off, making it a versatile option when both precision and recall are equally important.
The model-specific performance across different nanostructure classes further supports this complementarity (Fig. 3). YOLOv8m tends to perform better in detecting LCN and MCV, where structural complexity can challenge smaller models. Its ability to capture fine details makes it particularly useful for these intricate structures. Meanwhile, YOLOv8n performs exceptionally well in detecting V and scale bars, where distinct and well-defined edges allow for higher confidence in detection. YOLOv8s, once again, serves as a middle ground, performing consistently across all classes without being heavily biased toward either precision or recall.
Given these observations, it becomes evident that each model has distinct advantages depending on the detection criteria and computational constraints. Rather than favouring a single model, a more effective strategy is to leverage their complementary strengths. By combining the precision of YOLOv8n, the balanced performance of YOLOv8s, and the high recall of YOLOv8m, an optimized detection framework can be achieved. To this end, implementing Weighted Box Fusion (WBF) provides a means to integrate the predictions of all three models, capitalizing on their respective advantages while mitigating their individual weaknesses. This ensemble approach is expected to enhance both detection robustness and reliability, ensuring a more accurate and generalizable characterization of polymer nanostructures in TEM images. Details of the WBF implementation, including the fusion logic and parameters, are provided in our public code repository on Zenodo.39
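As a rough illustration of the fusion idea—not the repository implementation, which follows the full WBF algorithm of Solovyev et al.—overlapping boxes from the three models can be clustered by IoU and averaged with confidence weights:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(model_outputs, iou_thr=0.55):
    """Simplified weighted box fusion: boxes of one class, each given as
    (x1, y1, x2, y2, score), are clustered by IoU across models, then
    coordinates are averaged weighted by confidence. The fused score is
    damped when fewer models agree on a box."""
    boxes = sorted((b for out in model_outputs for b in out),
                   key=lambda b: -b[4])
    clusters = []
    for b in boxes:
        for c in clusters:
            if iou(c[0][:4], b[:4]) >= iou_thr:
                c.append(b)
                break
        else:
            clusters.append([b])
    fused = []
    for c in clusters:
        w = sum(b[4] for b in c)
        coords = tuple(sum(b[i] * b[4] for b in c) / w for i in range(4))
        fused.append(coords + (w / len(c),))
    return fused
```

Unlike non-maximum suppression, which discards all but one of a group of overlapping boxes, this averaging retains positional evidence from every model.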
The impact of WBF on nanostructure detection is illustrated in Fig. 4, where detection outputs from YOLOv8n, YOLOv8s, and YOLOv8m, and their fusion via WBF, are compared on the same unseen TEM image. Detected nanostructures are highlighted by bounding boxes, enabling a direct visual comparison of detection behavior across models. YOLOv8m tends to produce more detections but often assigns incorrect classes (misclassifications), while YOLOv8n is more conservative. WBF applies Soft Non-Maximum Suppression (Soft-NMS) to merge overlapping predictions and balance these extremes, resulting in a cleaner output with improved spatial localization. Examples of misclassified detections are highlighted with white arrows. These improvements are further reflected in the detection counts and confidence scores per class, as shown in Fig. 4E and F.
One of the key advantages of using YOLOv8 for nanostructure detection is its ability to provide automated size estimation. Unlike traditional manual methods (using ImageJ for instance), where individual objects must be segmented and measured—often requiring extensive time and user input—YOLOv8 enables rapid and systematic size quantification with minimal effort. By leveraging the bounding box dimensions, vesicle diameters can be efficiently estimated in real time, making this approach particularly well-suited for high-throughput nanostructure characterization.
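The pixel-to-nanometre conversion enabled by detecting the embedded scale bar can be sketched as follows; the box format and function name are illustrative, not the DetectNano API.

```python
def sizes_in_nm(detections, scalebar_box, scalebar_nm):
    """Convert bounding-box dimensions from pixels to nanometres using
    the detected scale-bar box. All boxes are (x1, y1, x2, y2); the
    scale bar's known physical length (scalebar_nm) is read from the
    image annotation."""
    bar_px = scalebar_box[2] - scalebar_box[0]  # scale-bar width in pixels
    nm_per_px = scalebar_nm / bar_px
    # the mean of box width and height approximates the diameter of a
    # roughly spherical vesicle
    return [((b[2] - b[0]) + (b[3] - b[1])) / 2 * nm_per_px
            for b in detections]
```

This averaging of width and height is also why bounding-box-based estimates are least reliable for irregular, non-spherical structures such as LCN, as noted below.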
As shown in Table 2, individual YOLO models exhibit variations in size estimations, particularly for LCN, where YOLOv8m tends to overestimate sizes. WBF refines these measurements by merging predictions across models, reducing variability and ensuring more consistent and reliable size estimates. Compared to manual ImageJ analysis, WBF values are in good agreement, especially for MCV and vesicles, while slight differences for LCN reflect known limitations of bounding-box-based estimation for irregular or compound structures. Notably, generating all class-specific size statistics with YOLOv8 and WBF required less than two seconds, whereas manual measurement of the same image in ImageJ took over 30 minutes, clearly demonstrating the efficiency advantage of our automated approach.
Model | LCN (nm) | TMCV (nm) | MCV (nm) | V (nm)
---|---|---|---|---
YOLOv8n | 357.6 ± 107 | 122.8 ± 26 | 202.2 ± 32.8 | 80.6 ± 18
YOLOv8s | 334.2 ± 143 | 129.0 ± 32 | 209.8 ± 36 | 83.4 ± 18.8
YOLOv8m | 374.8 ± 136.4 | 128.4 ± 26.4 | 208.8 ± 37.6 | 87.0 ± 19.2
WBF | 308 ± 104 | 122.4 ± 26 | 203.8 ± 32.6 | 84.0 ± 18.2
ImageJ | 301 ± 63 | 132.1 ± 25.9 | 202.1 ± 41.6 | 78.2 ± 23.2
Several recent studies have explored deep learning-based approaches for nanoparticle or nanostructure detection in TEM images.23,28–30 These works mainly target rigid inorganic materials and rely on segmentation or classification strategies rather than real-time object detection. In contrast, this work focuses on soft polymeric morphologies such as vesicles and multicompartment structures, and leverages YOLOv8 combined with WBF to enhance detection robustness across morphologies. To our knowledge, no existing framework addresses this specific application space while enabling automated size estimation using embedded scale bars. This highlights the complementary nature and originality of the present approach.
As illustrated in Fig. 5, our model successfully detects vesicles across different datasets, demonstrating high confidence and accurate size estimation despite variations in contrast and imaging artifacts. The confidence score distributions remain consistent with the results obtained on our test dataset, reinforcing the model's reliability in identifying self-assembled nanostructures beyond the initial training conditions. Notably, the size distribution remains coherent with the expected vesicle dimensions, further validating the robustness of the detection approach.
Fig. 5 Detection, size distribution, and confidence analysis of vesicles using the WBF-enhanced YOLOv8 model in unseen TEM images extracted from the literature. The middle panels show the corresponding size distributions (in nm) and the right panels present the distribution of detection confidence scores, along with the average ± standard deviation. Example 1 was adapted with permission from ref. 48. Copyright 2017, American Chemical Society. Example 2 was adapted with permission from ref. 6. Copyright 2011, Royal Society of Chemistry. Example 3 was adapted with permission from ref. 49. Copyright 2021, Wiley-VCH Verlag GmbH & Co. KGaA. |
These results highlight the generalization capability of our framework, emphasizing its applicability to a broad range of TEM datasets. This adaptability is particularly crucial for polymer self-assembly studies, where consistent nanostructure characterization across datasets is essential. Nevertheless, vesicles with distinct or complex morphologies—such as non-spherical aggregates, onion-like vesicles, or structures exhibiting extreme contrast variations—may not be reliably detected by our current model. In addition to morphological variability, image quality factors—such as resolution, signal-to-noise ratio, or contrast inconsistencies—can significantly affect detection confidence and size estimation, particularly for poorly resolved structures. Overcoming these limitations would benefit from targeted strategies, including advanced data augmentation techniques (e.g., synthetic contrast variation, controlled noise addition, and rotational or spatial transformations), transfer learning from larger microscopy datasets, and increasing dataset diversity by integrating publicly available, community-shared annotated TEM images.
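As a concrete illustration of the augmentation strategies mentioned above, a minimal sketch using NumPy could look like the following; this is an assumption about one possible implementation, and a real pipeline would more likely rely on a dedicated augmentation library.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def augment(img):
    """Illustrative augmentations mirroring the strategies in the text:
    synthetic contrast variation, controlled Gaussian noise, and random
    rotation by multiples of 90 degrees. Assumes a 2-D grayscale array
    with intensities in [0, 1]."""
    # contrast variation: rescale intensities around the image mean
    m = img.mean()
    out = np.clip((img - m) * rng.uniform(0.8, 1.2) + m, 0.0, 1.0)
    # controlled noise addition
    out = np.clip(out + rng.normal(0.0, 0.02, img.shape), 0.0, 1.0)
    # rotational transformation (lossless for square TEM crops)
    return np.rot90(out, k=int(rng.integers(0, 4)))
```

Applying such transforms during training exposes the model to contrast and orientation conditions absent from the original 65-image dataset.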
To facilitate such improvements, our framework has been designed to be easily fine-tuned by future users, allowing them to adapt the model to their specific vesicle types by retraining on additional annotated datasets. This flexibility ensures that the model can be continually refined to meet the evolving needs of the polymer and soft matter research community, further extending its applicability to new and emerging nanostructures.
In this study, we developed a YOLOv8-based deep learning model to detect vesicles and other self-assembled nanostructures, enhancing generalization across diverse TEM datasets. Tested on independent TEM images from the literature, the model demonstrated high accuracy in recognizing vesicular structures, reinforcing its potential for automated, scalable, and unbiased nanostructure analysis.
As a proof of concept, DetectNano demonstrates that deep learning models can be effectively trained to analyze soft polymer nanostructures in TEM images. Its ability to provide accurate and reproducible vesicle morphology and size estimation makes it particularly relevant for applications such as drug delivery, where vesicle size influences the circulation time and targeting efficiency,50,51 or synthetic biology, where vesicle-based systems serve as protocells and compartments.52,53 Even in its current form, the framework can support high-throughput screening and quality control tasks in experimental workflows involving vesicular nanocarriers. Furthermore, as an open-source and modular platform, DetectNano is designed to evolve. By providing annotated datasets, pretrained models, and full source code, this framework offers a concrete and accessible entry point for non-specialists in AI to explore deep learning applications in nanoscience. With community-driven contributions, DetectNano could be extended toward more advanced implementations, including real-time analysis pipelines or in situ/flow-based TEM monitoring for continuous nanostructure detection.
Although the current dataset is sufficient to demonstrate proof-of-concept performance, its limited size and source diversity may restrict full generalization to highly heterogeneous TEM conditions. Most images originate from our previous work, potentially introducing bias in contrast and morphology representation. We acknowledge these limitations and recognize the importance of expanding the dataset through public repositories and broader community contributions. This step is essential to move toward robust, generalizable models applicable across varied polymer and nanomaterial systems.
Looking ahead, the next step is to develop a universal model capable of detecting a wide range of polymeric and inorganic nanostructures. Achieving this goal requires a collective effort from the scientific community, emphasizing the need for open-access datasets and collaborative model training. By sharing annotated TEM datasets and uniting efforts across disciplines, the community can accelerate the development of a robust, generalizable AI tool for nanomaterial characterization. Beyond a technical contribution, this study calls for collaborative efforts to harness AI in nanoscience.
This journal is © The Royal Society of Chemistry 2025