Jun Chen Ng (a, b), Farina Muhamad (b), Pauline Shan Qing Yeoh (b), Ziyi Han (a), Zanlin Qiu (a, c), Khin Wee Lai* (b) and Xiaoxu Zhao* (a, c)
(a) School of Materials Science and Engineering, Peking University, Beijing 100871, China. E-mail: khinwee.lai@um.edu.my; xiaoxuzhao@pku.edu.cn
(b) Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
(c) AI for Science Institute, Beijing, China
First published on 10th December 2025
Thickness measurement of two-dimensional (2D) materials is essential due to their thickness-dependent physical and optical properties. However, current thickness characterization techniques, e.g., Atomic Force Microscopy (AFM), suffer from limitations such as slow scanning, tip–sample artifacts, and low throughput. To address this, an Artificial Intelligence-based pipeline was proposed for estimating the thickness of 2D materials from Optical Microscopy (OM) images, offering a significantly faster and more efficient alternative. OM captures colour contrast due to thin-film interference, explained by Fresnel's law. These colour cues, along with morphological features (area and perimeter), were extracted from the regions of interest (ROIs) segmented using Otsu's thresholding. Several regression models, including Random Forest Regressor (RFR) and a shallow Multi-Layer Perceptron (MLP), were trained on augmented paired OM-AFM data. Both models performed well on representative 2D materials, e.g., In2Se3, under threshold-based segmentation, but only the MLP maintained strong accuracy with automated ROI detection using Cellpose, achieving excellent predictive performance (R2 = 0.947, MSE = 34.580 nm2, MAE = 4.696 nm, RMSE = 5.881 nm). Statistical analysis validated the model's generalizability across segmentation methods. Shapley Additive Explanations (SHAP) identified red and green intensities as key predictors, aligning with thin-film interference theory. Overall, this AI-based model provides a non-destructive, efficient alternative to AFM, allowing precise and continuous thickness estimation from small datasets with high robustness and generalizability.
2D materials such as In2Se3 typically exhibit triangular or hexagonal shapes and exist in multiple phases. These phases can be differentiated based on their band gaps and are commonly characterized by Raman17,18 and X-ray diffraction (XRD) analyses.19,20 For more accurate phase identification at the atomic level, advanced techniques such as high-resolution cross-sectional transmission electron microscopy (TEM) and scanning transmission electron microscopy (STEM) are employed, enabling atom-resolved imaging of the crystalline structures.21–23
Accurate thickness determination is essential for materials such as In2Se3 because their structural, electronic, and optical properties vary significantly with thickness, directly influencing device performance.24 Thickness plays a critical role in governing phase stability by modulating interlayer coupling, strain distribution, and symmetry breaking.25–29 These trends highlight how thickness is not merely a geometric parameter but a key driver of phase selection and functional performance in In2Se3-based applications. Precise thickness quantification is therefore needed to understand and optimize the material's properties for targeted applications.
Consequently, several methods have been introduced to determine the thickness of 2D materials. Among these, AFM is widely used for analyzing material surface properties, including thickness and topography, by providing high-resolution 3D surface profiles.30–32 Initially, AFM was limited to insulating materials, but advancements have evolved it into a versatile technique for characterizing surfaces at the atomic and nanometre scales.33,34 Its ability to deliver absolute height measurements makes it a preferred method for quantifying the thickness of 2D materials. However, AFM suffers from inherent limitations that restrict its practicality for high-throughput and large-area characterization. These include a slow scanning speed, limited vertical range, tip–sample interaction artifacts, and issues such as surface deformation, laser misalignment, tip contamination, and ambiguous contact points.35,36 Furthermore, the relatively low repeatability and acquisition speed hinder the generation of the large, consistent datasets necessary for robust Machine Learning (ML) model development.37
Given these limitations, there is a growing need for automated and scalable alternatives, particularly those driven by Artificial Intelligence (AI), to accelerate the characterization of 2D materials. Even though AI-enhanced AFM workflows have already been explored to accelerate data analysis,38,39 reliance on AFM hardware still imposes throughput constraints. To address this, optical microscopy (OM) has been explored as a faster, non-destructive alternative. Although OM does not provide direct thickness measurement, the thin-film interference effect produces colour contrast that correlates with flake thickness. Early approaches leveraged this optical contrast by creating empirical colour charts or by fitting optical spectra using Fresnel-based models,40,41 but these conventional methods are often labour-intensive and lack scalability. In response, AI has emerged as a powerful tool to automate and accelerate thickness estimation directly from OM images, enabling scalable and data-driven workflows suitable for high-throughput screening.42–46
Building on these insights, this study advances beyond conventional analytical and classification-based methods by implementing an AI-driven approach for continuous thickness estimation. Rather than classifying OM images into discrete layer numbers, this work aims to predict the actual height values, offering more detailed and quantitative information. Focusing specifically on In2Se3, a representative 2D material, the proposed method utilized surface morphological features, namely colour intensity and geometric measurements (area and perimeter), extracted from OM images. Several regression models were evaluated, and the MLP model was selected to learn both linear and non-linear correlations between these features and thickness due to its superior performance, offering an automated, data-driven alternative to manual or physics-based estimation techniques.
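The feature extraction described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the greyscale weights, the assumption that the flake is brighter than the substrate, and the boundary-pixel definition of perimeter are all illustrative choices.

```python
def otsu_threshold(gray_values):
    """Return the grey level (0-255) that maximises the between-class variance."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]            # pixels at or below t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels above t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def extract_features(rgb_image):
    """rgb_image: list of rows, each row a list of (r, g, b) 8-bit tuples."""
    gray = [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]
    t = otsu_threshold([v for row in gray for v in row])
    # Assumption: the flake (ROI) is brighter than the substrate.
    mask = [[v > t for v in row] for row in gray]

    h, w = len(mask), len(mask[0])
    area = perimeter = 0
    sums = [0, 0, 0]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            area += 1
            for c in range(3):
                sums[c] += rgb_image[y][x][c]
            # A ROI pixel counts towards the perimeter if any 4-neighbour
            # lies outside the image or outside the ROI.
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if any(not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]
                   for ny, nx in neighbours):
                perimeter += 1
    mean_rgb = [s / area for s in sums] if area else [0.0, 0.0, 0.0]
    return {"mean_r": mean_rgb[0], "mean_g": mean_rgb[1], "mean_b": mean_rgb[2],
            "area": area, "perimeter": perimeter}
```

The returned dictionary holds the five inputs named in the text (three colour intensities, area, and perimeter); in practice the pixel-based area and perimeter would be converted to physical units using the microscope calibration.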
Among the trained regression models, RFR demonstrated the best performance based on standard regression metrics (SI Table S1). LR and RR performed poorly, even during the training phase, with R2 scores below 0.5, meaning that these models explained less than half of the variance in the data. This poor performance may be attributed to the simplicity of these models, which likely limits their ability to capture complex, non-linear relationships among the features. Of the remaining models, only RFR maintained consistent results between the training and testing phases, showing no signs of overfitting. Both XGBR and SVR, on the other hand, exhibited a significant performance drop in the testing phase, indicating overfitting.
After applying Grid Search Cross Validation (CV) and Randomized Search CV to optimize model performance,48 the training performance of XGBR, RFR, and SVR improved compared to their default settings (SI Tables S2 and S3). Despite the improvement in training metrics, XGBR and SVR still exhibited clear signs of overfitting. Their testing performance remained significantly lower than their training performance, indicating that the models failed to generalize well to unseen data. As such, both models were deemed unsuitable for further implementation for height prediction. For the RFR model, while training performance slightly improved after hyperparameter tuning, the performance on the testing dataset decreased, with the R2 score dropping from 0.997 (default) to 0.976 (Grid Search CV tuned) and 0.949 (Randomized Search CV tuned). This suggests that the default RFR model generalized better than the tuned versions despite having slightly lower training accuracy. The smaller gap between training and testing performances in the default RFR model supports its superior generalization ability.
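As a rough illustration of what a grid search with k-fold cross-validation does, the sketch below scores every value in a hyperparameter grid by its mean validation MSE and keeps the best one. The model is a deliberately tiny stand-in (a 1-D ridge regression through the origin), not any of the regressors used in the study, and all function names are hypothetical.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous validation folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form slope of a 1-D ridge regression through the origin (toy model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_mse(xs, ys, alpha, k=5):
    """Mean validation MSE over k folds for one hyperparameter value."""
    scores = []
    for val_idx in kfold_indices(len(xs), k):
        held_out = set(val_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge_1d(train_x, train_y, alpha)
        errors = [(ys[i] - w * xs[i]) ** 2 for i in val_idx]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / len(scores)


def grid_search(xs, ys, alphas, k=5):
    """Evaluate every alpha in the grid and return the one with the lowest CV MSE."""
    results = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best = min(results, key=results.get)
    return best, results


# Hypothetical noiseless data: y = 2x, so no regularization is needed.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]
best_alpha, scores = grid_search(xs, ys, alphas=[0.0, 1.0, 10.0])
```

A randomized search follows the same loop but samples a fixed number of candidates from the hyperparameter space instead of enumerating the whole grid. Either way, the selected setting is only as good as the validation folds, which is why a held-out test set is still needed to detect the overfitting discussed above.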
Overfitting is a common problem in ML, especially when working with limited datasets. It occurs when the model memorizes the training data rather than learning the underlying patterns or relationships.49 Consequently, the model performs well on training data but fails to generalize to unseen data, leading to poor performance during testing.50 For the MLP, the relatively small dataset used in this study necessitated careful tuning to prevent overfitting (SI Table S4). The MLP was selected because, unlike conventional machine learning algorithms, it incorporates interconnected layers that can capture complex relationships among features. At the same time, the network was kept relatively shallow, making it more suitable for this application by reducing the risk of over-interpreting the relationships between input features. The model's best performance was achieved when the number of neurons per layer decreased from 512 to 64, the learning rate was 0.001, the weight decay was 0.001, and the number of cross-validation folds was set to 5 (model 6). It showed an average R2 of 0.973 across the cross-validation folds, with an MSE of 45.996 nm2, an MAE of 5.743 nm, and an RMSE of 6.782 nm. These values indicate strong model performance in predicting the AFM height. The model's generalization ability was further validated using the test dataset, achieving an R2 of 0.947, an MSE of 34.580 nm2, an MAE of 4.696 nm, and an RMSE of 5.881 nm. Because these values remained consistent with the cross-validation results, there was no evidence of overfitting, demonstrating the model's robustness in AFM height prediction. Although some models, such as models 7 and 13, also showed no signs of overfitting, their predictive performance was inferior to that of the selected model, particularly in terms of MAE, MSE, and RMSE. As a result, model 6 was identified as the most reliable and was selected for further exploration.
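For reference, the four regression metrics quoted above can be computed as in the following minimal sketch. The function name and the sample heights are hypothetical and serve only to show the definitions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, MAE and RMSE for paired true/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(mse),
    }


# Hypothetical AFM heights (nm) and model predictions (nm).
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Because RMSE is the square root of MSE, the reported pairs can be cross-checked directly: the square roots of 45.996 nm2 and 34.580 nm2 agree with the reported RMSE values of 6.782 nm and 5.881 nm to within about 0.001 nm.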
According to SHAP analysis for model interpretability,51 as illustrated in Fig. 2b, the RFR model ranked area as the most influential feature, followed by perimeter and RGB intensity features. However, the relatively low mean SHAP values (<1.0) indicate that RFR does not rely heavily on any single feature, pointing to a more distributed and possibly robust learning behaviour. The dispersion of SHAP values in the summary plot (Fig. 2b) further implies non-linear and inconsistent feature interactions. In contrast, the MLP model exhibits clearer and more concentrated feature dependencies (Fig. 2d). Red intensity emerged as the most influential factor, followed by green intensity, area, blue intensity, and perimeter. Although the exact directional effects of the red and green channels vary, both show stronger contributions than the other features, supporting their importance in the model's decision-making. The dominance of red intensity indicates a strong feature-to-output correlation, consistent with thin-film interference effects in OM imaging. In short, the MLP exhibits more pronounced and interpretable feature contributions than RFR, aligning well with the underlying physical principles. This reinforces the suitability of MLP for AFM height prediction based on OM features, especially when interpretability and physical relevance are critical.
For the MLP model, the Shapiro–Wilk test indicated non-normality of prediction differences between segmentation methods (p < 0.05), leading to Wilcoxon signed-rank testing, which showed no significant difference (p > 0.05). Across both the test and full datasets, predictions from both segmentation methods were statistically consistent with ground truth (p > 0.05), as confirmed by paired t-tests. Effect size analysis supported this finding, with only a small Hedges’ g value, indicating that the automated method produces measurements highly comparable to the original baseline. The MLP's strong regression performance (R2 = 0.978, MSE = 30.112 nm2, RMSE = 5.487 nm, and MAE = 3.820 nm) confirms high predictive accuracy and generalization to automated segmentation, without signs of overfitting. This corroborates the SHAP analysis, which revealed the MLP model's strong reliance on specific features, particularly red and green intensities, followed by area. This reliance allows the MLP model to maintain predictive stability even under segmentation-induced variations.
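The effect-size step can be sketched as follows. The text does not state which variant of Hedges' g was used, so this example assumes the common two-sample form with a pooled standard deviation and the approximate small-sample correction 1 − 3/(4·df − 1); the function name and sample values are hypothetical. The normality and significance tests themselves are available in SciPy (`scipy.stats.shapiro`, `scipy.stats.wilcoxon`, `scipy.stats.ttest_rel`) and are not reproduced here.

```python
import math
import statistics


def hedges_g(group_a, group_b):
    """Hedges' g: Cohen's d with a small-sample bias correction (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / df)
    cohens_d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return cohens_d * correction


# Hypothetical predicted heights (nm) from two segmentation routes.
threshold_preds = [21.0, 35.5, 48.2, 60.1, 72.4]
automated_preds = [20.6, 36.0, 47.9, 60.8, 71.9]
g = hedges_g(threshold_preds, automated_preds)
```

By the usual rule of thumb, |g| below roughly 0.2 is read as a small effect, which is the sense in which the two segmentation routes are described as highly comparable.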
Collectively, these results highlight the practical advantage of the automated segmentation workflow, which achieves prediction performance statistically indistinguishable from the threshold-based approach while offering greater scalability and consistency. Due to the limited sample size, evaluations were conducted on the full dataset, which prevents a fully independent generalization assessment. Nevertheless, consistency across statistical tests, effect-size analysis, and SHAP-based interpretability provides strong evidence that the automated method can reliably replace threshold segmentation in this context. Using the MLP as the baseline, the automated workflow generates predictions in approximately 1 second, allowing for high-throughput applications. Additional evaluation on the held-out validation data confirmed that the model generalizes effectively to unseen data, supporting its use for robust and scalable thickness prediction. The segmented outcome of the augmented images (see SI Fig. 1) shows slight variations in the segmentation masks, which can lead to minor differences in predicted values. However, because the masks are largely consistent, these differences do not result in statistically significant deviations from the ground truth.
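As In2Se3 exhibits ferroelectricity, its spontaneous polarization alters the internal electronic distribution, directly impacting the material's complex dielectric function, ε = ε1 + iε2, where ε1 and ε2 are the real and imaginary parts, respectively. The refractive index, n, is related to the dielectric function by

$$ n = \sqrt{\frac{\sqrt{\varepsilon_1^2 + \varepsilon_2^2} + \varepsilon_1}{2}} $$

so any polarization- or thickness-induced change in ε translates directly into a change in n.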
This thickness-dependent optical behaviour, combined with the material's anisotropy, is crucial for understanding its visual appearance. In2Se3 is an anisotropic material, meaning its optical properties vary with direction.53 One manifestation is birefringence, where an incident light ray splits into two rays with different velocities and polarizations. Fresnel's law describes how reflection and transmission at an interface depend on both the angle and polarization, with the overall complex reflection coefficient for a thin film system given by:

$$ r = \frac{r_{12} + r_{23}\, e^{2i\beta}}{1 + r_{12}\, r_{23}\, e^{2i\beta}} $$

Here, $r_{12}$ and $r_{23}$ are the complex amplitude reflection coefficients at the top and bottom interfaces, respectively (which are polarization- and angle-dependent), and $e^{2i\beta}$ accounts for the phase accumulation across the film, with

$$ \beta = \frac{2\pi n d \cos\theta}{\lambda} $$

where the symbols n, d, θ, and λ are defined below.
While the ferroelectric polarization and anisotropy-driven effects modulate the reflected light intensity, the primary factor contributing to the significant colour contrast observed in the OM images of In2Se3 is the dominant thin-film interference arising from its thickness variation. This interference originates from light reflections at the top and bottom surfaces of the thin film, leading to constructive or destructive interference depending on the optical path difference, governed by:

$$ 2nd\cos\theta = m\lambda $$

where n is the refractive index, d is the film thickness, θ is the refraction angle of the light within the film, m is the order of interference, and λ is the wavelength. As a result, regions of differing thicknesses appear as different colours under OM.40,41 This makes OM a fast, non-destructive, and widely accessible way for estimating the thickness of 2D materials, aligning with the core objective of this study. This contrast-based method has been previously validated for thickness estimation. For instance, it was successfully applied to determine the number of layers in graphene using its refractive index.54
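To make the thickness dependence of the reflected colour concrete, the following sketch evaluates the standard single-film Fresnel reflectance at normal incidence for a range of thicknesses. It is a simplified illustration under stated assumptions, not the authors' optical model: the film index of 3.0, the three channel wavelengths, and the treatment of the substrate as a single semi-infinite medium with index 1.46 are placeholders, whereas the real sample sits on a 285 nm SiO2/Si stack.

```python
import cmath
import math


def interface_r(n_i, n_j):
    """Fresnel amplitude reflection coefficient at normal incidence."""
    return (n_i - n_j) / (n_i + n_j)


def film_reflectance(n_film, d_nm, wavelength_nm, n_top=1.0, n_sub=1.46):
    """Reflectance |r|^2 of a single film between two semi-infinite media.

    Normal incidence is assumed, so cos(theta) = 1 in the phase term.
    """
    r12 = interface_r(n_top, n_film)      # top interface (ambient -> film)
    r23 = interface_r(n_film, n_sub)      # bottom interface (film -> substrate)
    beta = 2.0 * math.pi * n_film * d_nm / wavelength_nm
    phase = cmath.exp(2j * beta)
    r = (r12 + r23 * phase) / (1.0 + r12 * r23 * phase)
    return abs(r) ** 2


# Assumed representative wavelengths (nm) for the red, green and blue channels.
channels = {"red": 620.0, "green": 540.0, "blue": 460.0}
n_film = 3.0  # hypothetical film refractive index, for illustration only

for d in (10.0, 30.0, 50.0, 70.0):
    row = {name: round(film_reflectance(n_film, d, wl), 3)
           for name, wl in channels.items()}
    print(f"d = {d:5.1f} nm -> {row}")
```

Each channel's reflectance oscillates with thickness at its own period of λ/(2n), so the ratio between channels changes as the film thickens, which is the origin of the thickness-dependent colour contrast. A quantitative colour prediction would need the full multilayer stack, wavelength-dependent complex indices, and the illumination and camera response.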
Consistent with previously reported findings, uniform colour in OM images reflects uniform thickness, as explained by thin-film interference theory: regions of equal thickness produce the same interference conditions and thus reflect similar colour.27 Thinner regions appear orange (lighter), while intermediate regions appear blue (darker), forming a colour gradient consistent with varying optical path lengths. Thicker regions often appear white due to broadband constructive interference and additional scattering effects.55 Complementing these optical insights, a positive correlation between PL intensity and In2Se3 thickness was reported, suggesting that both colour contrast in OM and PL response serve as indirect indicators of thickness, shaped by the nanoscale geometry and optical interactions.56
Generally, a greater AFM-measured thickness corresponds to a higher refractive index and a lower bandgap, factors that influence the observed colour via thin-film interference. The refractive index and bandgap are inversely related, as described by the Moss relation, $n^4 E_g = \text{constant}$. A higher refractive index implies more densely packed electronic states and, consequently, a reduced bandgap, as quantum confinement weakens with increasing thickness. The thickness-dependent bandgap of α-phase In2Se3 was experimentally confirmed by using electron energy loss spectroscopy (EELS) to demonstrate that the bandgap increases from 1.44 eV at 48 nm thickness to 1.64 eV at 8 nm.57 These results, supported by Density Functional Theory (DFT) calculations, are consistent with the quantum confinement model. The interlinked changes in thickness, refractive index, and bandgap directly affect the OM colour contrast, making it a valuable proxy for assessing both the thickness and electronic structure. Fig. 3d illustrates how these properties vary with thickness. These observations are reinforced by SHAP analysis, which identified red and green colour intensities as the most influential features in predicting the AFM-measured height. This supports the conclusion that OM contrast, particularly colour, is significantly informative of surface topography.
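As a rough worked example of the Moss relation, the bandgaps quoted above (1.44 eV at 48 nm and 1.64 eV at 8 nm) can be turned into an estimated change in refractive index. The sketch assumes the Moss constant is the same at both thicknesses, which is an idealization; the function name is hypothetical.

```python
def moss_index_ratio(eg_from_ev, eg_to_ev):
    """Ratio n_to / n_from implied by n^4 * Eg = constant."""
    return (eg_from_ev / eg_to_ev) ** 0.25


# Bandgaps quoted in the text: 1.44 eV (48 nm flake) and 1.64 eV (8 nm flake).
ratio = moss_index_ratio(1.44, 1.64)
print(f"n(8 nm) / n(48 nm) = {ratio:.3f}")
```

Under this assumption the ratio is about 0.968, meaning the refractive index of the 8 nm flake is roughly 3% lower than that of the 48 nm flake. The shift is modest but, because n enters the interference phase together with d, it contributes to the colour differences between thin and thick regions.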
Building on the Fresnel interference theory, the observed optical contrast in In2Se3 is closely tied to its thickness-dependent dielectric behaviour. As a ferroelectric material, In2Se3 exhibits spontaneous polarization, which alters its internal electronic distribution and modulates its complex dielectric function. This directly affects the refractive index and consequently the optical contrast observed in OM images. Such contrast is not merely a visual artifact but reflects real changes in the electronic structure, enabling thickness estimation through colour-based analysis.
Moreover, the dielectric constant of In2Se3 has been shown to increase monotonically with the number of layers, saturating at the bulk value beyond eight layers. This indicates a strong correlation between thickness and intrinsic properties such as phase stability, interlayer coupling, and ferroelectric switching behaviour. The proposed AI-based approach, by providing continuous height values rather than discrete layer classification, enables a more nuanced analysis of these thickness-dependent properties. This not only aids in rapid and non-destructive thickness estimation but also facilitates phase identification and material characterization, which are crucial for optimizing the use of In2Se3 in memory devices, optoelectronics, and other semiconductor applications.
To enhance the generalizability of the trained AI model across OM images captured under varying lighting conditions, histogram matching was applied as a preprocessing step to unseen images. This technique adjusts the RGB intensity distribution of the new images to match that of the training dataset. Such correction is particularly critical in OM-based thickness estimation, where contrast variations arising from thin-film interference serve as key predictive features. Differences in illumination or imaging parameters can alter these colour cues, potentially leading to inaccurate predictions. By standardizing the colour distribution, histogram matching improves the consistency of feature extraction and supports more reliable thickness estimation across diverse imaging scenarios. SI Fig. 2 and 3 illustrate the application of this correction under different brightness and lighting conditions, respectively, demonstrating the enhanced robustness and generalization capability of the trained model.
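Histogram matching as described above can be sketched for a single 8-bit colour channel using only the standard library. The text does not specify the implementation used, so this is one common formulation (mapping each source level to the reference level with the same cumulative probability) with hypothetical function names; libraries such as scikit-image provide an equivalent `match_histograms` routine.

```python
from bisect import bisect_left


def cdf_256(pixels):
    """Cumulative distribution over the 256 levels of one 8-bit channel."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / total)
    return cdf


def match_channel(source, reference):
    """Remap source pixel values so their distribution follows the reference."""
    src_cdf = cdf_256(source)
    ref_cdf = cdf_256(reference)
    # For each source level, take the first reference level whose cumulative
    # probability reaches that of the source level.
    lookup = [min(bisect_left(ref_cdf, src_cdf[level]), 255)
              for level in range(256)]
    return [lookup[p] for p in source]


# Hypothetical flattened channel values: a dim new image and a brighter reference.
new_image_red = [10, 10, 20, 20]
training_red = [100, 100, 200, 200]
corrected_red = match_channel(new_image_red, training_red)
```

In a full pipeline the same mapping would be applied independently to the red, green, and blue channels of each unseen image, with the reference distribution taken from the training images.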
Future work could extend this approach to a broader range of 2D materials and incorporate alternative loss functions or physics-informed regularization strategies, such as physics-informed neural networks, to further improve both understanding and predictive accuracy. Since the current model was trained on a limited dataset and relies solely on AFM and OM imaging, future studies could benefit from expanding the dataset and extending the model to different material systems and imaging conditions to better assess its generalizability. Such expansion would help evaluate the model's robustness against variations in surface morphology, imaging artifacts, and sample preparation procedures. Building on the foundation established in this work, the proposed pipeline has significant potential for integration into high-throughput experimental workflows, rapid quality control during material synthesis, and scalable industrial applications where non-destructive and automated characterization, particularly for thickness estimation, is crucial.
2InCl3 + 3Se + 3H2 = In2Se3 + 6HCl↑
After the growth process, the system was naturally cooled to room temperature. The resulting samples were attached to a SiO2/Si substrate (the thickness of SiO2 = 285 nm) for optical characterization. A Nikon optical microscope (ECLIPSE LV100D) was used for morphological observation. The thickness of In2Se3 was measured using AFM with a Bruker Dimension Icon system.
Features were standardized or MinMax scaled based on type. Five conventional regressors (XGBR, RFR, SVR, LR, RR) were tuned through Grid Search CV. The MLP included ReLU activations, dropout, and batch normalization, and was optimized using the Adam optimizer with MSE loss.
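The two scaling schemes mentioned above can be written out as a minimal sketch. Which features received which scaling is not stated beyond "based on type", so the example values and function names below are hypothetical; standardization here uses the population standard deviation, as is conventional for z-score scaling.

```python
import statistics


def standardize(values):
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]      # constant feature carries no information
    return [(v - mu) / sigma for v in values]


def minmax_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature columns: flake areas and mean red intensities.
areas_scaled = standardize([120.0, 340.0, 560.0])
red_scaled = minmax_scale([64.0, 128.0, 192.0])
```

To avoid information leakage, the scaling parameters (mean, standard deviation, minimum, maximum) would normally be computed on the training split only and then reused for validation and test data.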
Supplementary information: discussion on the hyperparameter tuning during model training, the application of histogram matching as a correction step, and its effect on the model performance. See DOI: https://doi.org/10.1039/d5nr03320a.
This journal is © The Royal Society of Chemistry 2026