Open Access Article
Ye Lin†abe, Pingyang Sun†*c, Rongcheng Wu†ab, Shu Gengd, Man Lung Yiue, Zhidong Lib, Fang Chenb, Yu Gaoa, Mingzhe Wang*f, Kaiwen Sun*c and Xiaojing Hao*c
aMolly Wardaguga Institute for First Nations Birth Rights, Faculty of Health, Charles Darwin University, Brisbane, QLD 4000, Australia. E-mail: yu.gao@cdu.edu.au
bThe Data Science Institute, University of Technology Sydney, Sydney, NSW 2007, Australia. E-mail: rongcheng.wu@student.uts.edu.au
cSchool of Photovoltaic and Renewable Energy Engineering, University of New South Wales, Sydney, NSW 2052, Australia. E-mail: xj.hao@unsw.edu.au
dSchool of Chemical Engineering and Australian Centre for Nanomedicine (ACN), The University of New South Wales, Sydney, NSW 2052, Australia. E-mail: shu.geng@unsw.edu.au
eDepartment of Computing, The Hong Kong Polytechnic University, Hong Kong, 999077, China. E-mail: csmlyiu@polyu.edu.hk
fSchool of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China. E-mail: wangmingzhe@xidian.edu.cn
First published on 26th November 2025
Photovoltaic electroluminescence (PVEL) imaging captures material-level degradation in PV modules and offers high-resolution input for machine learning (ML) models to perform automated fault detection and health evaluation, reducing reliance on manual inspection. A simple and efficient ML defect detection model is needed to achieve accurate segmentation for the fine-featured identification of defects in fabricated PV modules. This study proposes a novel enhanced iterative autoencoder (EI-AE), a new model that differs fundamentally from existing approaches, which rely directly on classical ML models for defect detection. The proposed EI-AE, which for the first time introduces an iterative mechanism into the traditional AE framework, features a simple yet effective architecture and achieves accurate unsupervised pixel-level segmentation of all defect types using only normal PVEL images. In addition, few-shot learning can be realized by extending the unsupervised EI-AE with a small number of annotated masks, allowing more detailed functional defect detection while mitigating background interference. A theoretical analysis demonstrates the benefits of the proposed EI-AE over the conventional AE in defect detection. Experimental results further validate its superiority, showing consistently better performance across multiple pixel-level metrics and outperforming both widely used unsupervised and few-shot baseline approaches.
Broader context

Photovoltaic (PV) systems are expanding rapidly worldwide, making reliable and cost-effective maintenance increasingly important. PV electroluminescence (PVEL) imaging provides high-resolution visual data that reveal material-level degradation in PV modules. These images are particularly valuable for machine learning (ML)-based automated fault detection and health assessment, reducing reliance on manual inspection. Achieving accurate segmentation of minute defects in fabricated PV modules requires a defect detection model that is both simple and efficient. This study presents an enhanced iterative autoencoder (EI-AE), a fundamentally new model that, for the first time, incorporates an iterative mechanism into the conventional AE framework. Unlike existing methods that rely directly on classical ML architectures, EI-AE features a simple yet effective design capable of performing fully unsupervised, pixel-level segmentation of all defect types using only normal PVEL images. Furthermore, by extending the unsupervised EI-AE with a small set of annotated masks, the framework supports few-shot learning, enabling more detailed functional defect identification while suppressing background interference.
Machine learning (ML) can automatically detect defects in PVEL images by learning complex patterns,4,5 outperforming conventional image processing.6,7 While most ML methods address defect detection,8–13 advanced approaches perform defect segmentation to localize defective regions, as illustrated in Fig. 1. ML has been applied to (i) correlating defects with power output,14,15 (ii) detecting defects before lamination,16,17 (iii) enhancing image quality,18,19 and (iv) identifying defects in assembled modules.8,20–22 PVEL images of assembled PV modules contain rich spatial and intensity information that reflects subtle material and manufacturing defects, making them highly suitable for automated analysis.8 To identify defects in PVEL images of finished PV modules, three types of defect detection ML approaches can be utilized: (i) image-level binary and multi-class classification, (ii) bounding box-based object localization, and (iii) segmentation. This work focuses on the defect identification of fabricated PV modules, and adopts segmentation23 for pixel-level localization of complex defects to support automated EL image inspection.
Although image-level binary/multi-class classification is the most basic ML method for EL image defect detection, it assumes that all defect categories are fully defined and mutually exclusive. Convolutional neural networks (CNNs) dominate both tasks,8–10 employing architectures such as VGG16,11–13 high-resolution network (HRNet),9,10,24 and the combination of ResNet152, Xception, and coordinate attention (CA).25 Feature enhancement and model compression are achieved via the incorporation of histogram of oriented gradients (HoG)12 and knowledge distillation13 into VGG16. The current multi-class classification efforts mainly rely on CNN,20,26–36 support vector machine (SVM),26,30 and random forest (RF).30,37 In addition, transfer learning with compact architectures has been adopted to utilize pre-trained features,29 while architectural combinations33 and particle swarm optimization (PSO)35 further enhance accuracy and reduce model complexity. Other approaches include modified VGG19,28 unsupervised clustering,27 generative adversarial network (GAN)-based augmentation,32 fuzzy logic integration,34 and defect localization by YOLO.36 Unlike classification, bounding box methods detect multiple defects per module. Fusion of Faster regions with CNN features (R-CNN) and region-based fully convolutional network (R-FCN) outputs based on intersection over union (IoU) consistency improves accuracy and reduces false detections.21 Incorporating a complementary attention network (CAN) into a Faster R-CNN's region proposal network further enhances defect extraction.5 Mask R-CNN with a ResNet-101-FPN backbone detects fourteen defect types.38 However, the classification and bounding box-based defect localization tasks are often limited in providing detailed spatial information, which segmentation can overcome by delivering pixel-level defect mapping for precise assessment.
For the segmentation task, two paradigms can be utilized: (1) approaches that use a separate feature extractor followed by a segmentation procedure,22,39–41 and (2) end-to-end segmentation networks.42–46 Classification networks, such as ResNet1840 and ResNet50,22,41 can serve as backbone feature extractors, with their outputs passed to a segmentation head, such as autoencoder (AE)39 and DeepLabv3,41 for pixel-wise prediction. Instead of performing explicit segmentation, a ResNet-50 trained for classification is used to generate intermediate activation maps,22 whose spatial responses are interpreted as segmentation.
End-to-end segmentation networks are task-optimized for accurate, dense defect detection in PVEL images. A GAN has been used to produce more realistic reconstructions of normal samples through adversarial training.42 However, GANs often suffer from training instability and typically require larger datasets, which limits their practicality in industrial settings. Different encoder–decoder NN architectures have also been explored for PVEL image defect detection, including standard U-Net,44,46 U-Net with an attention mechanism,43 PSPNet46 and DeepLabv3+.46 Multiple encoder and decoder combinations have been compared,45 with Mobile-net, ResNet, VGG-net, and U-net as encoders and U-net, FCN-net, PSP-net, and SegNet as decoders. In addition, wavelet analysis has been used to handle non-stationary textures in the segmentation of PVEL images,47 while K-Net has been used as a baseline method in segmentation tasks.48
Nevertheless, current studies on defect detection in PVEL images using end-to-end segmentation networks still rely on the direct use of classical NN models or their simple combinations. These approaches often lack adaptability to subtle or complex defect patterns, especially in cases involving irregular morphology or background interference. To address this limitation, a novel enhanced iterative autoencoder (EI-AE) is proposed in this study to achieve simple and accurate unsupervised defect segmentation of PVEL images using only normal samples during training. The proposed EI-AE utilizes U-Net49 as encoder and decoder blocks, while iterative operations50 are implemented in each encoder and decoder to (i) tighten function space constraints (enhancing the ability to generalize from normal PVEL image patterns), (ii) prevent defect memorization (preventing the model from incorrectly learning and reconstructing latent defects in normal-looking PVEL images), and (iii) improve multi-scale information representation (accurately detecting defects of varying sizes in PVEL images). In addition, by incorporating a multi-image fusion structure, the proposed EI-AE can be adapted to detect more specific defects using a few-shot approach with only a limited number of annotated functional defect masks.
[…] 543 near-infrared images (11 353 good images) of solar cells featuring various internal defects and heterogeneous backgrounds. The dataset includes one defect-free category and 12 distinct defect types, namely black cores, corner defects, cracks (non-star), finger interruptions, fragments, horizontal dislocations, printing errors, scratches, short-circuit defects, star cracks, thick lines, and vertical dislocations (two images of each category are shown in Fig. 2). The PVEL-AD dataset, with its long-tail distribution of defect types, provides a challenging yet realistic benchmark for evaluating unsupervised and few-shot learning approaches. It is particularly well suited for testing models such as the proposed EI-AE, which perform segmentation using only normal samples and can be extended with limited annotations to detect rare functional defects, addressing the annotation bottleneck in practical PV quality inspection.
Furthermore, to evaluate the segmentation performance under different settings of our proposed EI-AE, we manually annotated two types of segmentation masks (Mask A and Mask B) based on this dataset. All defects are captured in Mask A without explicitly defining the defect categories, while four specific defect types are annotated in Mask B. These additional annotations allow a more comprehensive assessment of segmentation accuracy and robustness.
Fig. 4 Functional defect masks by manual annotation (Mask B): (a) cracks (non-star), (b) finger interruptions, (c) scratches, and (d) star cracks.
By refining the coarse bounding-box labels into pixel-precise segmentations, this annotation process ensures that our evaluations concentrate on those defects most critical to PV module reliability. Consequently, this approach enables an in-depth assessment of model performance in detecting and characterizing functionally significant defects, thereby facilitating more targeted strategies for quality control in smart manufacturing processes.
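Let the image space be $\mathcal{X} \subset \mathbb{R}^{h \times w \times c}$, where $h$, $w$, and $c$ indicate the height, width, and number of channels, respectively. For the defect detection task, a training set $\mathcal{X}_{\text{train}}$ includes $i$ normal EL images without abnormalities, while a test set $\mathcal{X}_{\text{test}}$ consists of $t$ defective EL images, where $\mathcal{X}_{\text{train}} = \{x_1, x_2, \dots, x_i\}$, $\mathcal{X}_{\text{test}} = \{x_1, x_2, \dots, x_t\}$, and $x \in \mathcal{X}$. The learning objective is to develop a model capable of performing pixel-level segmentation to detect defective regions within the test images $\mathcal{X}_{\text{test}}$.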
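A convolutional AE is trained to minimize the reconstruction error $L_{\text{recon}}$ on normal samples as:

[Eq. (1): definition of $L_{\text{recon}}$, not recoverable from the source]

where $\hat{x}_i$ denotes the reconstruction of the input image $x_i$, expressed as:

$$\hat{x}_i = f_D(f_E(x_i)), \tag{2}$$

where $f_E$ is the encoder network ($\mathcal{X} \rightarrow \mathcal{Z}$) that processes the input image to obtain a latent representation $Z$ ($\mathcal{Z} \subset \mathbb{R}^{h' \times w' \times c'}$, $h' < h$, $w' < w$, $c' \geq c$); $f_D$ is the decoder network ($\mathcal{Z} \rightarrow \mathcal{X}$), which takes the latent representation $Z$ produced by the encoder and reconstructs the input image. A deep convolutional AE with depth $N$ can be represented as:

$$F_{\text{AE}}(x) = f_{D_1}(f_{D_2}(\dots f_{D_N}(f_{E_N}(\dots f_{E_2}(f_{E_1}(x)))))). \tag{3}$$

Moreover, the parameters of each encoder and decoder block are given by $\xi_{f_{E_n}}$ and $\xi_{f_{D_n}}$ ($n = 1, \dots, N$), respectively.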
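As a minimal sketch of the reconstruction objective, the snippet below averages a per-image $\ell_p$ error over a set of images. The averaging, the choice $p = 2$, and the names `lp_norm`, `reconstruction_loss`, and `toy_autoencoder` are illustrative assumptions rather than the exact form of eqn (1); the `autoencoder` argument simply stands in for $f_D(f_E(\cdot))$.

```python
from typing import Callable, List, Sequence

Image = List[float]  # a flattened image, used here only for illustration


def lp_norm(values: Sequence[float], p: float = 2.0) -> float:
    """Return the l_p norm of a flat sequence of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def reconstruction_loss(
    images: Sequence[Image],
    autoencoder: Callable[[Image], Image],
    p: float = 2.0,
) -> float:
    """Mean l_p distance between each image and its reconstruction."""
    errors = []
    for x in images:
        x_hat = autoencoder(x)
        errors.append(lp_norm([a - b for a, b in zip(x, x_hat)], p))
    return sum(errors) / len(errors)


def toy_autoencoder(x: Image) -> Image:
    """Hypothetical stand-in: pull every pixel halfway towards the image mean."""
    mean = sum(x) / len(x)
    return [0.5 * v + 0.5 * mean for v in x]


normal_images = [[0.8, 0.8, 0.8, 0.8], [0.6, 0.6, 0.6, 0.6]]
defective_images = [[0.8, 0.8, 0.0, 0.8]]  # one dark (defective) pixel

loss_normal = reconstruction_loss(normal_images, toy_autoencoder)
loss_defect = reconstruction_loss(defective_images, toy_autoencoder)
```

In this toy setting the uniform (normal) images are reproduced almost exactly, while the image with a dark pixel yields a clearly larger error, which is the behaviour a reconstruction-based detector relies on.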
However, traditional AEs face several critical limitations when applied to industrial PVEL image defect detection:
(1) With a limited number of normal EL images, the AE is prone to overfitting, resulting in poor generalization.54 Although the training objective minimizes the reconstruction loss as in eqn (1), the network will memorize training examples when the training set $\mathcal{X}_{\text{train}}$ is small:

$$f_D(f_E(x_i)) \approx x_i, \quad \forall x_i \in \mathcal{X}_{\text{train}}. \tag{4}$$

Since anomalies are absent in training, the model fails to generalize to unseen test samples $\mathcal{X}_{\text{test}} \notin \mathcal{X}_{\text{train}}$, making it unreliable for defect detection.
(2) When a high representation capacity is present in the latent space, the AE reconstructs both normal and defective samples accurately, making defect detection ineffective.55 If the encoder $f_E$ maps inputs to a high-capacity latent space $\mathcal{Z}$, then for any input $x$, an expressive decoder can reconstruct it perfectly:

$$z_i = f_E(x_i), \quad x_i \approx f_D(z_i). \tag{5}$$

Since defective data points $x_{\text{anom}}$ are also mapped to similar latent representations, their reconstructions remain accurate:

$$\| x_{\text{anom}} - f_D(f_E(x_{\text{anom}})) \|_p \approx 0. \tag{6}$$
This contradicts the assumption that anomalies should have high reconstruction errors and thus reduces the effectiveness of defect detection.
(3) A standard AE reconstructs anomalies at a single resolution scale and lacks multi-scale feature extraction, which limits its ability to differentiate complex anomalies from normal variations.56,57 The single encoding-decoding operation is expressed as:

$$\hat{x}_i = f_D(f_E(x_i)). \tag{7}$$
[…] $\mathcal{T}$ iterations, while 5 iterations are used in this model.

A modified U-Net is implemented as an encoder in the compression stage by replacing the output layer of the standard U-Net with a convolutional layer with kernel size 2 and stride 2, so that the spatial dimensions (height and width) of the original input are reduced by half. This process is similar to the encoder in a standard AE, where the input is compressed in the compression layer.

However, unlike a standard AE, our approach employs a modified U-Net (U-Net-E) that iteratively refines the encoding process through $\mathcal{T}$ self-iterations within the encoder, continuously compressing the input into a lower-dimensional representation while sharing a common encoder $f_E$ with parameters $\xi_{f_E}$:

$$S^{(j)}_{\text{ItC}} = f_E\left(S^{(j-1)}_{\text{ItC}}; \xi_{f_E}\right), \quad j \in \{1, 2, 3, \dots, \mathcal{T}\}, \tag{8}$$

where the shared encoder $f_E$ transforms the input from one resolution level to a lower one, starting from the input image, $S^{(0)}_{\text{ItC}} = x$. The compression depth is limited to a maximum of $\mathcal{T} = 5$ for 1024 × 1024 input images, as further downsampling leads to excessively small feature maps and prevents the model from functioning.
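The effect of repeated stride-2 compression on the spatial size can be checked with a few lines of arithmetic. The helper name `compression_sizes` is hypothetical; the sketch only assumes that each iteration halves the height and width, as described for the stride-2 output layer.

```python
from typing import List


def compression_sizes(input_size: int, num_iterations: int, stride: int = 2) -> List[int]:
    """Spatial size (height or width) after each stride-`stride` compression step."""
    sizes = []
    size = input_size
    for _ in range(num_iterations):
        if size % stride != 0:
            raise ValueError(f"size {size} is not divisible by stride {stride}")
        size //= stride
        sizes.append(size)
    return sizes


sizes = compression_sizes(1024, 5)
print(sizes)  # [512, 256, 128, 64, 32]
```

After five iterations a 1024 × 1024 input is reduced to 32 × 32, so any further step would produce feature maps smaller than 32 × 32.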
The reconstruction stage takes as input the output $S^{(\mathcal{T})}_{\text{ItC}}$ of the iterative compression stage. Similar to the U-Net-E in the encoder, we also modify a standard U-Net into a new U-Net (U-Net-D) by replacing the output layer with a transposed convolution (deconvolution) layer with a kernel size of 2 and a stride of 2, upsampling the low-dimensional features back to the original input size.

Multiple self-iterations are performed in the new decoder to progressively upsample the low-dimensional features into high-dimensional representations, through $\mathcal{T}$ iterations using a shared decoder $f_D$ with parameters $\xi_{f_D}$:

$$S^{(k)}_{\text{ItR}} = f_D\left(S^{(k-1)}_{\text{ItR}}; \xi_{f_D}\right), \quad k \in \{1, 2, 3, \dots, \mathcal{T}\}, \tag{9}$$

where $S^{(0)}_{\text{ItR}} = S^{(\mathcal{T})}_{\text{ItC}}$, and the spatial dimensions are gradually restored to the original resolution. The shared decoder $f_D$ maps data from a lower resolution level to a higher one. By performing $\mathcal{T}$-step iterative reconstruction, the defective image is reconstructed to its normal state $S^{(\mathcal{T})}_{\text{ItR}}$. Moreover, by successively subtracting the reconstructed normal images $S^{(k)}_{\text{ItR}}$ from the input defective images in the test set $\mathcal{X}_{\text{test}}$, defect maps can be generated:

[Eq. (10): defect map definition, not recoverable from the source]
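The subtraction step can be illustrated with a small sketch. It assumes that the reconstructions have already been resized to the input resolution and that the per-pixel absolute differences are averaged, which may differ from the exact form of eqn (10); `abs_difference`, `defect_map`, and `threshold` are hypothetical helper names.

```python
from typing import List, Sequence

Image = List[List[float]]  # 2D grayscale image with values in [0, 1]


def abs_difference(a: Image, b: Image) -> Image:
    """Pixel-wise absolute difference of two equally sized images."""
    return [
        [abs(pa - pb) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def defect_map(image: Image, reconstructions: Sequence[Image]) -> Image:
    """Average the absolute differences between the input and each reconstruction."""
    diffs = [abs_difference(image, rec) for rec in reconstructions]
    height, width = len(image), len(image[0])
    return [
        [sum(d[r][c] for d in diffs) / len(diffs) for c in range(width)]
        for r in range(height)
    ]


def threshold(score_map: Image, tau: float) -> List[List[int]]:
    """Binarize a score map: 1 marks a pixel flagged as defective."""
    return [[1 if v > tau else 0 for v in row] for row in score_map]


defective = [[0.9, 0.1], [0.9, 0.9]]           # dark pixel at row 0, column 1
reconstructions = [
    [[0.9, 0.9], [0.9, 0.9]],                  # fully "repaired" reconstruction
    [[0.9, 0.7], [0.9, 0.9]],                  # partially repaired reconstruction
]
scores = defect_map(defective, reconstructions)
mask = threshold(scores, tau=0.5)
```

Only the dark pixel receives a high score, so the thresholded mask isolates it from the unchanged background.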
However, the current defect maps reflect all the defects (dark regions) in the PVEL images, which may not be ideal for industrial detection in specific scenarios. In practice, industrial detection often aims to focus on critical failures, such as significant cracks, finger interruptions, and scratches, while ignoring less significant defects. Simply subtracting the reconstructed normal image from the defective image extracts both true anomalies and background clutter (false positives). To prioritize true defects, multi-image fusion detection is further used, which selectively emphasizes significant anomalies while minimizing the impact of background-induced noise in the final defect maps. The implementation details are provided in the following section.
The reconstructed $\mathcal{T}$ images and the input image (in total $\mathcal{T} + 1$ images in each set) are fed into a 3D U-Net (U-Net-Seg), as shown in the red concatenation paths of Fig. 5, which can be expressed as:

[Eq. (11): not recoverable from the source]

[Eq. (12): not recoverable from the source]

[…] $\mathcal{T}$ reconstructed images instead of a single one. This approach enhances structural consistency throughout the reconstruction process. The detailed configurations of pseudo masks and real functional defect masks are provided in Section 4.1.
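[…] $\xi_{f_{E_n}}$ and $\xi_{f_{D_n}}$. However, the proposed EI-AE uses a shared encoder $f_E$ and a shared decoder $f_D$, each applied $\mathcal{T}$ times:

$$F_{\text{EI-AE}}(x) = f_D^{(k)}\left(f_E^{(j)}(x)\right) = f_D^{(\mathcal{T})}\left(f_D^{(\mathcal{T}-1)}\left(\dots f_D^{(1)}\left(f_E^{(\mathcal{T})}\left(f_E^{(\mathcal{T}-1)}\left(\dots f_E^{(1)}(x)\right)\right)\right)\right)\right), \tag{13}$$

so that the function space of the EI-AE is contained in that of the conventional AE:

$$\mathcal{F}_{\text{EI-AE}} \subset \mathcal{F}_{\text{AE}}. \tag{14}$$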
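The composition in eqn (13) can be sketched with toy one-dimensional operators. Here `downsample` and `upsample` are hypothetical, parameter-free stand-ins for the shared U-Net-E and U-Net-D blocks, chosen only to make the data flow and the resolution changes visible; they are not the networks used in the paper.

```python
from typing import Callable, List

Signal = List[float]


def downsample(signal: Signal) -> Signal:
    """Toy shared encoder: average adjacent pairs (stride 2)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal: Signal) -> Signal:
    """Toy shared decoder: repeat every value twice (stride-2 upsampling)."""
    return [v for v in signal for _ in range(2)]


def ei_ae_forward(
    x: Signal,
    encoder: Callable[[Signal], Signal],
    decoder: Callable[[Signal], Signal],
    num_iterations: int,
) -> List[Signal]:
    """Apply the same encoder, then the same decoder, `num_iterations` times each.

    Returns the intermediate reconstructions, from coarsest to finest.
    """
    state = x
    for _ in range(num_iterations):
        state = encoder(state)
    reconstructions = []
    for _ in range(num_iterations):
        state = decoder(state)
        reconstructions.append(state)
    return reconstructions


signal = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0]  # a single "defect" at index 5
recs = ei_ae_forward(signal, downsample, upsample, num_iterations=3)
lengths = [len(r) for r in recs]
```

Because one function is reused at every level, the local defect is smoothed away in the final reconstruction, whereas a stack of independently parameterized blocks would be free to learn a different mapping at each depth.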
In addition, shared parameters ξfE and ξfD are present in the shared encoder fEi and decoder fDi, respectively. Through gradient accumulation during backpropagation, the parameter sharing enforces constraints that ensure scale consistency:
iterative image compression:

[Eq. (15): not recoverable from the source]

iterative image reconstruction:

[Eq. (16): not recoverable from the source]
The gradients are computed for all iterations and then combined, guiding the learning process to ensure that the parameters learned by the model remain consistent across all layers. This forces the model to learn representations that are consistent in the function space, thereby avoiding overfitting to any particular scale.
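Gradient accumulation under parameter sharing can be verified on a scalar toy model. The sketch below assumes a hypothetical one-parameter "block" $s_j = w \cdot s_{j-1}$ applied repeatedly; it is not the EI-AE network, but it shows that the gradient with respect to a shared parameter is the sum of the contributions from every iteration.

```python
from typing import List


def forward(w: float, x: float, num_iterations: int) -> List[float]:
    """Apply the shared block s_j = w * s_{j-1} and return all states s_0..s_T."""
    states = [x]
    for _ in range(num_iterations):
        states.append(w * states[-1])
    return states


def accumulated_gradient(w: float, x: float, num_iterations: int) -> float:
    """d s_T / d w as a sum of per-iteration contributions (chain rule)."""
    states = forward(w, x, num_iterations)
    grad = 0.0
    for j in range(1, num_iterations + 1):
        direct = states[j - 1]                 # d s_j / d w with s_{j-1} held fixed
        downstream = w ** (num_iterations - j)  # d s_T / d s_j
        grad += downstream * direct
    return grad


def finite_difference(w: float, x: float, num_iterations: int, eps: float = 1e-6) -> float:
    """Central-difference estimate of d s_T / d w, used as a numerical check."""
    upper = forward(w + eps, x, num_iterations)[-1]
    lower = forward(w - eps, x, num_iterations)[-1]
    return (upper - lower) / (2 * eps)


grad_sum = accumulated_gradient(0.9, 2.0, 5)
grad_fd = finite_difference(0.9, 2.0, 5)
```

For this toy model the closed form is $T w^{T-1} x$, so every iteration contributes equally and a single update to the shared parameter reflects all scales at once.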
This analysis demonstrates that the function space of the proposed EI-AE is strictly smaller than that of the conventional AE, which limits its capacity to memorize arbitrary defect patterns.
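[…] $\mathcal{X}_{\text{norm}}$ and $\mathcal{X}_{\text{anom}}$, respectively. The iterative architecture can be interpreted as applying a regularization $R(F_{\text{EI-AE}})$ on $\mathcal{X}_{\text{norm}}$ to enforce consistency step by step, ensuring stable feature representation across $\mathcal{T}$ iterations:

[Eq. (17): not recoverable from the source]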
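The expected reconstruction error for defective EL images in the proposed EI-AE is given by the expectation $\mathbb{E}_{x \in \mathcal{X}_{\text{anom}}}$ and is bounded below by:

$$\mathbb{E}_{x \in \mathcal{X}_{\text{anom}}}\left[\left\| x - f_D^{(\mathcal{T})}\left(f_E^{(\mathcal{T})}(x; \xi_{f_E}); \xi_{f_D}\right) \right\|\right] \geq c \cdot \min\left(d(\mathcal{X}_{\text{norm}}, \mathcal{X}_{\text{anom}})\right), \tag{18}$$

where $\min(d(\mathcal{X}_{\text{norm}}, \mathcal{X}_{\text{anom}}))$ refers to the minimum distance between the normal and defective distributions, and $c$ is a positive constant ($c > 0$) that is a function of the number of iterations $\mathcal{T}$.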
This sets a lower bound on the reconstruction error for anomalies, demonstrating that the used iterative architecture effectively prevents memorizing defective patterns without sacrificing the reconstruction of normal data. The detailed explanation is provided in S1.2 of the SI.
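[…] $\mathcal{T}$-step can capture multi-scale information with improved scale consistency. Starting from the iterative reconstruction stage $S^{(k)}_{\text{ItR}} = f_D^{(k)}(f_E^{(\mathcal{T})}(x))$, the mutual information $\text{Info}(\cdot;\cdot)$ between $\mathcal{T}$ reconstructions is:

$$\text{Info}\left(S^{(k)}_{\text{ItR}}; x\right) \geq \text{Info}\left(S^{(k-1)}_{\text{ItR}}; x\right), \tag{19}$$

[Eq. (20): not recoverable from the source]

[Eq. (21): not recoverable from the source]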
Therefore, the proposed EI-AE inherently captures hierarchical information through iterative operation, allowing it to differentiate between normal and defective patterns. A detailed theoretical explanation can be found in S1.3 of the SI.
The U-Net-Seg network processes this 3D volume and outputs a single-channel prediction of shape h × w × 1, which is subsequently flattened to a 2D defect score map. The network is supervised using a pixel-wise ℓ1 loss between the predicted output and the GT mask, where pixels with value 1 indicate known defective regions. These ground truth masks are derived from the binary masks used for input corruption, guiding the model to focus on regions where the reconstruction deviates from expected normal patterns. In addition, 142 real functional defect masks (1.25% of good images) from the datasets described in Section 2.2 are included to achieve few-shot learning. These real masks will help U-Net-Seg selectively focus on the true defects.
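A rough sketch of how the pseudo-mask supervision could be assembled is given below. The corruption rule (overwriting masked pixels with a fixed value), the stand-in reconstructions, and the helper names `corrupt`, `stack_volume`, and `l1_loss` are assumptions made for illustration; only the use of the corruption mask as the target, the stacking of the input with five reconstructions, and the pixel-wise ℓ1 loss follow the description above.

```python
from typing import List, Sequence

Image = List[List[float]]
Mask = List[List[int]]


def corrupt(image: Image, mask: Mask, fill_value: float = 0.0) -> Image:
    """Overwrite pixels where mask == 1; the mask then serves as the pseudo GT."""
    return [
        [fill_value if m else p for p, m in zip(row_img, row_mask)]
        for row_img, row_mask in zip(image, mask)
    ]


def stack_volume(image: Image, reconstructions: Sequence[Image]) -> List[Image]:
    """Stack the input and its reconstructions into one volume of depth T + 1."""
    return [image] + list(reconstructions)


def l1_loss(prediction: Image, target: Mask) -> float:
    """Mean absolute error between a predicted score map and a binary mask."""
    total, count = 0.0, 0
    for row_pred, row_tgt in zip(prediction, target):
        for p, t in zip(row_pred, row_tgt):
            total += abs(p - t)
            count += 1
    return total / count


normal = [[0.8, 0.8], [0.8, 0.8]]
pseudo_mask = [[0, 1], [0, 0]]
corrupted = corrupt(normal, pseudo_mask)

reconstructions = [normal] * 5              # stand-ins for the five reconstructions
volume = stack_volume(corrupted, reconstructions)

prediction = [[0.1, 0.9], [0.0, 0.0]]       # hypothetical U-Net-Seg output
loss = l1_loss(prediction, pseudo_mask)
```

In the few-shot setting, the same loss would simply be evaluated against the manually annotated masks for the small set of labelled images.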
P-AUROC measures the model's ability to distinguish between normal and defective pixels across all threshold values. P-AP is calculated as the area under the precision–recall (PR) curve, reflecting the trade-off between precision and recall. P-F1 is defined as the harmonic mean of precision and recall at the optimal threshold, providing a single-value summary of detection accuracy.
AUPRO evaluates detection performance across varying false positive rates by measuring how well predicted regions cover ground-truth defects. A-PRO calculates the proportion of ground-truth regions correctly detected, considering a prediction successful if the intersection over union (IoU) exceeds a predefined threshold (0.3 in this study).
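For concreteness, the sketch below computes a pixel-level F1 at the best threshold and an IoU-based region check. The function names are hypothetical, the threshold search simply tries every observed score, and the toy data are invented for illustration; the 0.3 IoU cut-off is the value stated above.

```python
from typing import List, Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], tau: float) -> float:
    """F1 (harmonic mean of precision and recall) when predicting score >= tau."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= tau and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= tau and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < tau and l == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_f1(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (best F1, threshold) over all thresholds taken from the scores."""
    candidates = [(f1_at_threshold(scores, labels, tau), tau) for tau in sorted(set(scores))]
    return max(candidates)


def iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks."""
    intersection = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    union = sum(1 for p, g in zip(pred, gt) if p == 1 or g == 1)
    return intersection / union if union else 0.0


pixel_scores: List[float] = [0.9, 0.8, 0.4, 0.3, 0.1]
pixel_labels: List[int] = [1, 0, 1, 0, 0]
f1, tau = best_f1(pixel_scores, pixel_labels)

region_iou = iou([1, 1, 0, 0], [0, 1, 1, 0])
region_detected = region_iou > 0.3
```

With these toy values the best F1 is 0.8 at a threshold of 0.4, and the example region counts as detected because its IoU of 1/3 exceeds 0.3.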
Fig. 6 (segmentation maps) and Table 1 (metrics) illustrate the comparison results, obtained by directly computing pixel-wise differences between the reconstructed images and their original counterparts. For performance evaluation, the GT is defined using the full defect masks (Mask A) as described in Section 2.1. Although EI-AE achieves slightly lower P-AP (0.6434) and P-F1 (0.6739) compared to DRAEM (0.7248 and 0.7206 respectively), it consistently outperforms all baseline methods in the remaining three metrics: P-AUROC (0.8800), AUPRO (0.4074), and A-PRO (0.8557). These results indicate that EI-AE provides stronger global discrimination capability and better robustness in identifying true functional defects across diverse pixel regions, while DRAEM tends to focus more on pixel-level differences, which may lead to higher precision in localized areas but lower overall consistency and generalization performance. The performance of the conventional AE is poorest because the model tends to take a shortcut during reconstruction by simply copying the input to the output. As a result, it also learns to reconstruct the defects, making it difficult to distinguish them from normal regions [eqn (5) and (6)]. In general, the results demonstrate the strong generalization capability of the proposed EI-AE in scenarios that demand comprehensive detection of diverse defect types in PVEL images.
| Methods | P-AUROC | P-AP | P-F1 | AUPRO | A-PRO |
|---|---|---|---|---|---|
| AE (unsupervised) | 0.5772 | 0.2002 | 0.3757 | 0.0239 | 0.0212 |
| EdgRec (unsupervised) | 0.5448 | 0.2687 | 0.3099 | 0.1515 | 0.4060 |
| DRAEM (unsupervised) | 0.8596 | 0.7248 | 0.7206 | 0.1251 | 0.8202 |
| U-Net (unsupervised) | 0.5349 | 0.2449 | 0.3096 | 0.1232 | 0.3104 |
| U-Net (few-shot)a | 0.7571 | 0.5379 | 0.6207 | 0.1625 | 0.6939 |
| EI-AE (unsupervised) | 0.8800 | 0.6434 | 0.6739 | 0.4074 | 0.8557 |

a U-Net is trained with functional defect masks (Mask B) to enable few-shot segmentation.
Notably, the maximum feasible compression depth is $\mathcal{T} = 5$, as it represents the limit imposed by the 1024 × 1024 input resolution and the downsampling factor of two per stage. Beyond this depth, the feature maps shrink to sizes below 32 × 32, potentially resulting in loss of spatial context, misalignment in the decoder, and numerical instability due to over-compression.
Fig. 8 presents the segmentation results obtained by fusing the input image with five piecewise recovery images as input to U-Net-Seg, where only pseudo masks are used during training [eqn (11)]. Table 2 presents the quantitative comparison of different unsupervised methods for the “Crack (non-star)” category. The proposed EI-AE achieves the best overall performance, with the highest scores in P-AUROC (0.8868), AUPRO (0.6297), and A-PRO (0.8814), indicating superior pixel-wise discrimination and region-level defect localization. While its P-AP (0.1428) and P-F1 (0.2076) are lower than those of the U-Net baseline (0.2503 and 0.3355, respectively), EI-AE maintains a better balance across all metrics. This suggests that EI-AE avoids overfitting to local noise and generalizes better in complex surface defect scenarios. Although integrating U-Net-Seg into the conventional AE significantly improves its performance, it still falls short of EI-AE, confirming the critical role of the embedded iterative operation in enhancing defect localization and suppressing irrelevant features.
| Methods | P-AUROC | P-AP | P-F1 | AUPRO | A-PRO |
|---|---|---|---|---|---|
| AE (unsupervised) | 0.5108 | 0.0034 | 0.0074 | 0.5508 | 0.1512 |
| AE (unsupervised with U-Net-Seg)a | 0.8135 | 0.0651 | 0.1659 | 0.5508 | 0.7641 |
| EdgRec (unsupervised) | 0.6175 | 0.0068 | 0.0226 | 0.2534 | 0.4635 |
| DRAEM (unsupervised) | 0.6666 | 0.0491 | 0.1110 | 0.3368 | 0.5676 |
| U-Net (unsupervised) | 0.7353 | 0.2503 | 0.3355 | 0.4226 | 0.4985 |
| EI-AE (unsupervised)a | 0.8868 | 0.1428 | 0.2076 | 0.6297 | 0.8814 |

a Segmentation head (U-Net-Seg) is included in the conventional AE and the proposed EI-AE.
Although the real masks amount to only 1.25% of the number of good images, their inclusion improves prediction accuracy and yields more precise defect segmentation with reduced background interference [Fig. 9(e)] compared with the EI-AE results in Experiment 2 [Fig. 8(h)]. A relative P-AUROC improvement of 3.04% is achieved in the few-shot setting (0.9138) over the baseline without few-shot learning (0.8868).
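The two percentages quoted here follow directly from the reported figures, as the short check below shows. The helper name `relative_improvement` is hypothetical; the inputs are the P-AUROC values, the 142 real masks, and the 11 353 good images stated in the text.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


gain = relative_improvement(0.9138, 0.8868)   # few-shot vs. unsupervised P-AUROC
mask_fraction = 142 / 11353 * 100             # real masks relative to good images

print(f"{gain:.2f}% {mask_fraction:.2f}%")  # 3.04% 1.25%
```

The 3.04% figure is therefore a relative gain; the absolute P-AUROC difference is 0.0270.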
Table 3 compares the performance of AE, U-Net, and the proposed EI-AE in a few-shot defect segmentation setting. In terms of P-AUROC, EI-AE achieves the highest score of 0.9138, outperforming AE (0.8586) and U-Net (0.7923), indicating improved defect segmentation at the pixel level. For P-AP, EI-AE reaches 0.3425, which remains higher than that of U-Net (0.2870) and noticeably better than that of AE (0.2098), suggesting better precision–recall trade-off. The P-F1 of EI-AE is 0.4208, significantly higher than that of AE (0.3054) and slightly higher than that of U-Net (0.4205), reflecting more accurate segmentation boundaries. In terms of region-aware metrics, EI-AE also demonstrates competitive performance, achieving an AUPRO of 0.7576, slightly lower than that of AE (0.7916) but significantly higher than that of U-Net (0.5446). Similarly, the region-wise A-PRO of EI-AE reaches 0.8185, exceeding that of U-Net (0.6852) and only marginally lower than that of AE (0.8193). These results confirm the effectiveness of the enhanced iterative structure in improving segmentation accuracy and robustness with limited supervision.
To further illustrate how accuracy varies with different sampling sizes in the few-shot setting, we conduct an additional experiment testing 1-, 10-, and 50-shot scenarios, respectively. As shown in Table 4, the 1-shot case performs worse than the unsupervised one (0.8868) because a single labelled sample provides insufficient and potentially misleading supervision, disrupting the model's originally stable feature representation. However, as the number of shots increases, the accuracy improves rapidly and approaches saturation at 50-shot, demonstrating the model's strong few-shot learning capability.
| Methods | EI-AE (1-shot) (%) | EI-AE (10-shot) (%) | EI-AE (50-shot) (%) | EI-AE (142-shot) (%) |
|---|---|---|---|---|
| P-AUROC | 74.5 | 88.6 | 91.5 | 91.4 |
Some major challenges remain for practical deployment. First, building a larger set of high-quality defect-free and defective PVEL images remains difficult, especially for emerging PV materials such as perovskite and CIGS. Second, the current implementation does not explicitly distinguish defect types; integrating a dedicated classifier model would be a promising yet non-trivial step toward enabling automatic defect categorization. Lastly, in real production lines where PV panels may exhibit variations in shape, orientation, and layout, developing a model robust to such variations is still an open challenge. The proposed EI-AE paves the way for the development of more robust and adaptive models capable of handling these practical complexities.
Three experiments, including (1) unsupervised detection of all defects, (2) unsupervised detection with segmentation head, and (3) few-shot detection with segmentation head, progressively demonstrate the superior performance of the proposed EI-AE over conventional methods. The proposed EI-AE reduces the need for extensive labelled data, making it highly suitable for large-scale PV module inspection in industrial settings. Its simple design and adaptability to complex defects also support efficient deployment in real-world manufacturing and maintenance scenarios.
Footnote

† Y. Lin, P. Sun and R. Wu contributed equally to this work.

This journal is © The Royal Society of Chemistry 2026