A visual language model enabling intelligent nanomaterial scanning electron micrograph annotation

Abstract

Artificial intelligence (AI) has significantly advanced materials research and development through data-driven approaches. However, the large labeled datasets that AI methods require are difficult to obtain because manual labeling is time-consuming and laborious. The morphology of nanomaterials is crucial for the study of their properties, and scanning electron microscopy (SEM) is among the key techniques for morphology characterization. Yet the structural complexity of nanomaterials makes annotating SEM images extremely challenging, and very few labeled images are available. There is therefore an urgent need for automatic pattern recognition of nanomaterial SEM images that does not rely on labeled data. In this paper, we develop the Scanning Electron Microscopy Vision-Language Model (SEM-VLM), a domain-specific adaptation of the vision-language model (VLM) paradigm for nanomaterials science. The model is trained via contrastive learning on SEM image–text pairs extracted from the literature. SEM-VLM outperforms both the general-domain Contrastive Language-Image Pretraining (CLIP) model and random baselines in cross-modal retrieval, as measured by Recall@10 and Recall@50, and keyword searches demonstrate its robust ability to retrieve relevant images. SEM-VLM also achieves high zero-shot classification accuracy through ensemble vision-language alignment, again outperforming CLIP. In few-shot settings, SEM-VLM trained with only 2.1% of the labels outperforms the fully supervised EMCNet (Graph-Nets for Electron Micrograph Classification). Activation mapping analysis reveals precise localization of critical nanoscale features (particles, holes, and probe tips), yielding more interpretable results than conventional approaches while maintaining operational reliability. This multimodal framework reduces the dependence on labeled datasets by orders of magnitude and enables automated, high-precision classification.
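The abstract names two core mechanisms: contrastive pretraining on image–text pairs and zero-shot classification via ensemble vision-language alignment. As a rough illustration of how such a pipeline typically works, the following is a minimal generic CLIP-style sketch in PyTorch, not the authors' released code; the temperature value, prompt ensembling scheme, and embedding dimensions are placeholder assumptions.

```python
# Generic CLIP-style sketch (illustrative only, not SEM-VLM's implementation).
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched pairs sit on the diagonal; contrast images against texts and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

@torch.no_grad()
def zero_shot_classify(image_emb: torch.Tensor,
                       class_prompt_embs: list[torch.Tensor]) -> torch.Tensor:
    """Zero-shot labels via ensemble alignment: average several prompt
    embeddings per class, then pick the class with highest cosine similarity."""
    class_centroids = torch.stack([
        F.normalize(F.normalize(p, dim=-1).mean(dim=0), dim=-1)
        for p in class_prompt_embs
    ])  # (num_classes, D)
    sims = F.normalize(image_emb, dim=-1) @ class_centroids.t()  # (B, num_classes)
    return sims.argmax(dim=-1)

# Toy usage with random embeddings (batch of 4, dimension 512, 5 classes,
# 3 prompts per class):
img = torch.randn(4, 512)
txt = torch.randn(4, 512)
loss = contrastive_loss(img, txt)
preds = zero_shot_classify(img, [torch.randn(3, 512) for _ in range(5)])
```

In this scheme no SEM-specific labels are needed at training time: the supervision signal comes entirely from which caption belongs to which micrograph, which is consistent with the abstract's claim of reducing labeled-dataset dependency.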

Graphical abstract: A visual language model enabling intelligent nanomaterial scanning electron micrograph annotation

Article information

Article type: Paper
Submitted: 17 Jul 2025
Accepted: 15 Sep 2025
First published: 22 Oct 2025

Nanoscale, 2025, Advance Article

Y. Cai and H. Wang, Nanoscale, 2025, Advance Article, DOI: 10.1039/D5NR03027G
