Haochong Huang†*, Haichao Huang†, Zhiyuan Zheng and Lu Gao
School of Science, China University of Geosciences (Beijing), Beijing, 100083, China. E-mail: hchhuang@cugb.edu.cn
First published on 27th January 2025
This paper introduces the infrared crystal phase and provides unconventional mechanistic insights into what is commonly understood as the “crystal phase”. The critical challenge in obtaining a high-fidelity phase is the unwrapping process, which was addressed using infrared-band holography combined with self-attention deep learning. This method strikes a balance between the theoretical rigor of physical models and the flexibility of data-driven approaches. Specifically, a short-wavelength infrared digital holographic system and algorithm were used to acquire high-quality wrapped phases. The network architecture was then applied for phase unwrapping. In demonstrative applications, static phase-type thickness variation was measured in samples. During moments of intense phase transitions, the microstructural evolution of Na2CO3 crystals was monitored, and the process of perovskite material film formation was observed. The results demonstrated that environmental detection noise and twin images were effectively suppressed, and that phase values continued to vary dramatically after the traditional amplitude signal had stabilized. These discoveries guide the characterization of novel materials and also provide insights into alterations of properties during crystal preparation and growth, which is crucial for the final outcome.
Digital holography (DH) plays a pivotal role in micro-crystal development and material nature analysis.7–9 Compared to the aforementioned approach, DH can provide much richer structural data. The core challenge of this technology lies in phase unwrapping (PU),10 which involves extracting phase information from the intensity measurements of the optical field. Traditional PU methods, such as path-tracking algorithms and minimum norm strategies, continue to be hampered by issues such as path discontinuity and twin-image artifacts.11
To overcome these problems, researchers have explored novel avenues through machine learning techniques.12–14 For instance, Galande et al. attempted to enhance phase recovery quality by combining deep networks with explicit denoisers, yet this approach is parameter-sensitive and poses challenges in dynamic imaging.15 Dyomin et al. studied ZnGeP2 crystal properties within a DH framework, although real-time reconstruction remains an open question.16,17 Moreover, end-to-end hologram-to-phase-map recovery methods rely heavily on high-quality annotated datasets of a particular sample and are limited in their interpretability and flexibility.18–20 In contrast, the PU-generative adversarial network (GAN) proposed by Zhou et al. directly recovers continuous phases from wrapped phases (WP), which is an inspiring approach for the PU of the crystal phase in DH.21,22
Moreover, “phase” is traditionally taken to mean the crystal phase, whereas herein it refers specifically to the phase of the light field studied in the infrared band. Therefore, we present unconventional mechanistic insights into the crystal phase, namely the infrared crystal phase.23 This mode integrates a physical model and algorithm with data-driven methods for high-precision imaging in complex environments. Specifically, an infrared-band digital in-line holographic system is used to capture holograms.24–26 The recorded field is then numerically propagated between the detector plane and the sample plane via a transform. Applying the arctangent function to the resulting complex amplitude yields the WP, which are then unwrapped by the network. Experiments demonstrate that this mode excels in static and dynamic scenarios.27,28
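To make the wrapping step concrete, the following minimal sketch applies the four-quadrant arctangent to a synthetic complex field and then unwraps a single line with the classical Itoh integration rule. It is an illustration only: the function names and the 1D ramp are hypothetical, and the Itoh rule stands in for the network, which is what this work actually uses for unwrapping.

```python
import cmath
import math


def wrap_phase(field):
    """Wrapped phase in (-pi, pi] from a complex field, via atan2(imag, real)."""
    return [math.atan2(u.imag, u.real) for u in field]


def unwrap_1d(wrapped):
    """Itoh-style 1D unwrapping: sum the wrapped neighbour differences.

    Valid only when the true phase changes by less than pi between samples.
    """
    out = [wrapped[0]]
    for prev, cur in zip(wrapped, wrapped[1:]):
        d = cur - prev
        d -= 2 * math.pi * round(d / (2 * math.pi))  # fold the jump into (-pi, pi]
        out.append(out[-1] + d)
    return out


# Synthetic example: a phase ramp that exceeds 2*pi several times.
true_phase = [0.25 * k for k in range(60)]
field = [cmath.exp(1j * p) for p in true_phase]
wrapped = wrap_phase(field)
recovered = unwrap_1d(wrapped)
```

On noise-free data such as this ramp the integration recovers the original values; with noise or aliasing the neighbour-difference assumption breaks, which is the failure mode that motivates learning-based unwrapping.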
During upsampling via transposed convolution, a checkerboard effect issue may emerge,29 disrupting the spatial continuity and accuracy crucial for the PU task. In contrast, using bilinear interpolation, which is based on the weighted averaging of surrounding pixel values, can effectively avert the generation of such unnatural patterns. Moreover, in high-resolution PU assignments, transposed convolution entails intricate convolutional kernel operations with high computational complexity. Nevertheless, the modification can achieve upsampling through linear weighted calculations, thereby reducing the numerical burden of the model to some extent and enhancing operational efficiency.
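The weighted averaging described above can be sketched in a few lines. This is a plain-Python illustration of bilinear upsampling with an align-corners grid, which is one common convention and an assumption here; the function name is hypothetical and the network's actual upsampling layer may use a different grid convention.

```python
def bilinear_upsample(img, out_h, out_w):
    """Upsample a 2D list by bilinear interpolation (align-corners grid)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row index back to a fractional input coordinate.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Weighted average of the four surrounding input pixels.
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bot = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bot)
        out.append(row)
    return out


small = [[0.0, 2.0], [4.0, 6.0]]
up = bilinear_upsample(small, 3, 3)
```

Because every output pixel is a fixed convex combination of its neighbours, there are no learned kernels whose overlap could produce a checkerboard pattern, and no extra parameters are added.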
Finally, the skip connections reintegrate the high-level data from the encoder into the decoder, which significantly boosts detail restoration and the precision of PU. Based on these, the BSRU-Net is introduced. Additionally, the SE is replaced with the Efficient Channel Attention, and the Convolutional Block Attention Module to form BERU-Net and BCRU-Net, respectively. These two variants are used for comparative tests to validate BSRU-Net's reliability and adaptability.30
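For readers unfamiliar with the SE module, the sketch below shows the generic squeeze-and-excitation computation: global average pooling, a two-layer bottleneck, a sigmoid gate, and channel-wise rescaling. The weights, shapes, and function name are hypothetical, biases are omitted, and this is not the trained BSRU-Net layer.

```python
import math


def squeeze_excite(fmap, w1, w2):
    """Generic SE block on a feature map given as C channels of H x W lists.

    w1 has shape (C // r) x C and w2 has shape C x (C // r); both are
    illustrative weights, not values from the paper.
    """
    # Squeeze: one global average per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: multiply every pixel of channel c by its gate s[c].
    scaled = [[[s[c] * px for px in r] for r in ch] for c, ch in enumerate(fmap)]
    return scaled, s


fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]        # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]     # expands back to 2 channel gates
scaled, gates = squeeze_excite(fmap, w1, w2)
```

In this toy case the first channel is emphasized and the second is attenuated, which is the channel re-weighting behaviour that the attention variants compared here implement in different ways.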
The dataset comprises 22 000 phase map pairs, with 20 000 for training or validation and 2000 for testing.
The generation process involves: (1) creating a random initial matrix of 2 × 2 to 24 × 24 dimensions with random values, ensuring data diversity and model generalization to avoid overfitting; (2) upscaling and cropping to 256 × 256 to avoid low peripheral data; (3) scaling of matrix values to [0, 80], with 60% in [0, 60], and the rest evenly split between [60, 70] and [70, 80] to ensure balance within the data set.
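The three steps above can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: bilinear interpolation for the upscaling, a 32-pixel margin that is cropped away, reading the [0, 80] distribution as the maximum phase height of each sample, and forming the wrapped counterpart with the arctangent. All function names are hypothetical.

```python
import math
import random


def bilinear_resize(mat, size):
    """Resize a square matrix to size x size by bilinear interpolation."""
    n = len(mat)
    out = []
    for i in range(size):
        y = i * (n - 1) / (size - 1)
        y0 = int(y); y1 = min(y0 + 1, n - 1); wy = y - y0
        row = []
        for j in range(size):
            x = j * (n - 1) / (size - 1)
            x0 = int(x); x1 = min(x0 + 1, n - 1); wx = x - x0
            top = (1 - wx) * mat[y0][x0] + wx * mat[y0][x1]
            bot = (1 - wx) * mat[y1][x0] + wx * mat[y1][x1]
            row.append((1 - wy) * top + wy * bot)
        out.append(row)
    return out


def sample_peak(rng):
    """Maximum phase height: 60% in [0, 60], 20% in [60, 70], 20% in [70, 80]."""
    r = rng.random()
    if r < 0.6:
        return rng.uniform(0.0, 60.0)
    if r < 0.8:
        return rng.uniform(60.0, 70.0)
    return rng.uniform(70.0, 80.0)


def make_pair(rng, size=256, margin=32):
    # (1) random initial matrix between 2 x 2 and 24 x 24
    n = rng.randint(2, 24)
    seed = [[rng.random() for _ in range(n)] for _ in range(n)]
    # (2) upscale, then crop the central size x size window
    big = bilinear_resize(seed, size + 2 * margin)
    crop = [row[margin:margin + size] for row in big[margin:margin + size]]
    # (3) rescale values to [0, peak]
    lo = min(map(min, crop)); hi = max(map(max, crop))
    span = (hi - lo) or 1.0
    peak = sample_peak(rng)
    true = [[(v - lo) / span * peak for v in row] for row in crop]
    # Wrapped counterpart in (-pi, pi]
    wrapped = [[math.atan2(math.sin(v), math.cos(v)) for v in row] for row in true]
    return true, wrapped, peak


true_phase, wrapped_phase, peak = make_pair(random.Random(0))
```

Varying the seed size changes the spatial frequency content of the resulting surface, which is how the small random matrix contributes to data diversity.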
![]() | (1) |
This study employs the peak signal-to-noise ratio (PSNR) to quantify the discrepancies between the reconstructed and original images. It is defined by eqn (2) and (3):
![]() | (2) |
![]() | (3) |
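As a point of reference, the sketch below computes PSNR from the mean squared error using the standard definition, PSNR = 10·log10(peak²/MSE). Taking the peak as the maximum of the reference map is an assumption, since the exact form of eqn (2) and (3) is not reproduced here; the function names are hypothetical.

```python
import math


def mse(pred, ref):
    """Mean squared error between two equally sized 2D lists."""
    total = sum((p - r) ** 2 for pr, rr in zip(pred, ref) for p, r in zip(pr, rr))
    return total / (len(ref) * len(ref[0]))


def psnr(pred, ref, peak=None):
    """Peak signal-to-noise ratio in dB; peak defaults to max(ref) (assumption)."""
    if peak is None:
        peak = max(map(max, ref))
    err = mse(pred, ref)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


ref = [[0.0, 10.0], [20.0, 40.0]]
pred = [[0.0, 10.0], [20.0, 38.0]]
error = mse(pred, ref)      # one pixel off by 2 over 4 pixels
score = psnr(pred, ref)
```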
The structural similarity index (SSIM) serves as a critical tool for evaluating PU image fidelity. It is defined as eqn (4):
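$$\mathrm{SSIM}(f,\phi)=\frac{(2\mu_f\mu_\phi+c_1)(2\sigma_{f\phi}+c_2)}{(\mu_f^2+\mu_\phi^2+c_1)(\sigma_f^2+\sigma_\phi^2+c_2)} \tag{4}$$
where f is the predicted result and ϕ is the reference phase map, μf and σf² denote the mean and variance of f (and likewise μϕ and σϕ² for ϕ), σfϕ denotes the covariance between f and ϕ, and c1 and c2 denote constants.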
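The quantities named above (means, variances, covariance, and the two stabilizing constants) combine as in the standard SSIM expression. The sketch below evaluates it once over the whole image, whereas practical implementations usually average the index over local windows; the constant values and the function name are assumptions chosen for illustration.

```python
def ssim_global(f, phi, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two equally sized 2D lists."""
    a = [v for row in f for v in row]
    b = [v for row in phi for v in row]
    n = len(a)
    mu_f = sum(a) / n
    mu_p = sum(b) / n
    var_f = sum((x - mu_f) ** 2 for x in a) / n
    var_p = sum((y - mu_p) ** 2 for y in b) / n
    cov = sum((x - mu_f) * (y - mu_p) for x, y in zip(a, b)) / n
    num = (2 * mu_f * mu_p + c1) * (2 * cov + c2)
    den = (mu_f ** 2 + mu_p ** 2 + c1) * (var_f + var_p + c2)
    return num / den


same = ssim_global([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]])
diff = ssim_global([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 5.0]])
```

Identical inputs give an index of 1, and any mismatch in mean, contrast, or structure lowers it.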
SSIM may not be able to fully capture the negative impacts brought about by aliasing, and its sensitivity to local high-frequency detail changes is relatively limited. However, binary error mapping (BEM)31 can quantify the performance of the model in handling these sensitive regions. Thus, BEM is applied according to eqn (5), but it contains defects in the fault tolerance regarding the accuracy of unwrapping (AU) and the precision of recovering high phase when the WP and the RP are very similar. In this study, bias factor α = max (phase value)/200 for low-phase error allowance and 1% exactness for high-phase are introduced. These visually show pixel differences and boost PU evaluation comprehensiveness.
![]() | (5) |
![]() | (6) |
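One possible reading of the tolerance rule described above is sketched below. It marks a pixel as correct when its absolute error is within the larger of the bias factor α = max(phase value)/200 and 1% of the true value, and it takes AU as the fraction of correct pixels. Both the way the two tolerances are combined and the AU definition are assumptions, since eqn (5) and (6) are not reproduced here; the function name is hypothetical.

```python
def binary_error_map(pred, true, rel_tol=0.01, bias_div=200.0):
    """Return (BEM, AU) under an assumed tolerance rule.

    A pixel scores 1 if |pred - true| <= max(alpha, rel_tol * |true|),
    where alpha = max(true) / bias_div; otherwise it scores 0.
    """
    alpha = max(map(max, true)) / bias_div
    bem = [
        [1 if abs(p - t) <= max(alpha, rel_tol * abs(t)) else 0
         for p, t in zip(pr, tr)]
        for pr, tr in zip(pred, true)
    ]
    au = sum(map(sum, bem)) / (len(bem) * len(bem[0]))
    return bem, au


true_map = [[0.0, 10.0], [40.0, 80.0]]
pred_map = [[0.3, 10.5], [40.2, 79.0]]
bem, au = binary_error_map(pred_map, true_map)
```

Under this reading, α dominates at low phase values and the 1% rule dominates at high ones, so small absolute errors near zero are not penalized disproportionately.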
Testing BSRU-Net's generalizability requires real phase values, which are difficult to obtain, yet verifying its output is crucial. Therefore, four complicated phase samples were created: ‘C’, ‘NLP’, ‘RST’, and ‘CUGB’. First, ‘RST’ was more complex and challenging than ‘C’. DLPU's AU for ‘RST’ is 23.3% of that for ‘C’, and it showed instability with high-phase data in Fig. 3(e1) and (e3). However, BERU-Net's AU increased by 17.04%, BCRU-Net's decreased by 7.32%, and BSRU-Net's decreased by only 2.32%, thus achieving 96.56% accuracy. The results underscore BSRU-Net's robust reliability and precision in PU for sophisticated samples.
Then, ‘NLP’ exhibited a higher AU and SSIM across methods, as shown in Fig. 3(d2)–(h2). However, PUMA's output demonstrated that the edge continuity was not satisfactory. When dealing with complex samples (except for ‘NLP’), all its indicator values were inferior to those of other approaches. In addition, large areas of black regions were found in the corresponding BEM diagram. Thus, PUMA is characterized by insufficient generalization ability and low stability.
‘RST’ caused significant negative fluctuations in all networks except BSRU-Net, as shown in Fig. 3(j3). Moreover, ‘CUGB’ exhibited aliasing, and possesses the highest phase height and maximum object distribution, as shown in Fig. 3(a4). However, BSRU-Net achieved a SSIM of 0.9908 and AU of 86.25%, outperforming the others. Concurrently, Fig. 3(j1)–(j4) also shows it undergoing the smallest fluctuation, with an average SSIM of 0.996 and AU stability at 95.35%. Thus, it is a promising tool for real PU applications, effectively mapping from WP to RP.
To enhance the persuasiveness of the BSRU-Net experimental results, several algorithms were selected for comparative analysis. The state of a Natron (Na2CO3 and Na2CO3·10H2O) crystal at a certain moment in the crystallization process was selected for study. The presence of wire-drawing and plaque phenomena in Fig. 4(a7) indicates that PU cannot be effectively carried out.33 Fig. 4(a3) shows aliasing regions. When processing such areas, the methods employed in Fig. 4(a5) and (a6) result in poor unwrapping performance, with both showing discontinuous phase sections. The computational efficiency of minimum cost flow (MCF) is the lowest,34 with the current output time reaching up to 88 seconds. Moreover, as the complexity of the WP increased, its efficiency declined further. In contrast, the unwrapping effect demonstrated in Fig. 4(a4) is superior to the former two. One possible reason is that the LS method achieves PU by minimizing the error over the entire image. This method fully considers the interrelationships among all pixels in the image, which helps maintain the continuity of the phase and thereby suppresses noise effectively. However, the implementation of PUMA is relatively complex, and its graph-cut-based optimization procedure is more sensitive to noise, which reduces the accuracy of the unwrapping results.
In Fig. 4(a9), due to the influence of noise, compared with LS, the curve fluctuations of the PUMA and MCF algorithms are extremely significant. However, BSRU-Net is stable, and its trend is more consistent compared with other line segments, thereby indicating certain advantages in capturing the overall trend of phase changes in crystals. In addition, except for BSRU-Net, there was significant interference in the other algorithms by twin images and uneven light distribution noise outside the object region. The existence of a large amount of salt-and-pepper noise can also be observed from Fig. 4(a3).
To ensure the equity of the comparison, wavelet transform, median filtering, and mean filtering methods were adopted for denoising treatment, and then, the unwrapping operation was subsequently performed. However, the results in Fig. 4(b1)–(d5) show that there is a loss of WP information after denoising, and the unwrapping effects of the corresponding methods are not satisfactory. In subsequent studies, the LS method will be selected for comparison. To avoid the loss of phase information, no denoising treatment will be carried out, and the noise resistance performance of the algorithm will completely depend on the algorithm itself. Current research findings indicate that under such circumstances, BSRU-Net's capability demonstrates certain advantages among them.
Subsequently, the BSRU-Net was applied to measure the phase-type thickness of samples and to explore the variations in the physical properties of sodium carbonate crystals and perovskite crystals. In addition, although the network outperformed the traditional method in simulated data, LS served as a control to bolster the validity of the findings. The trials were as follows. First, the workpiece was laser-engraved with a ‘CUGB’ pattern to a depth of 316 nm, creating subtle height differences from the overall flatness, resulting in significant phase modulation effects. The selected depth of 316 nm was approximately one-fifth of the wavelength of 1550 nm, thereby indicating a very high precision. Comparing Fig. 5(a4) and (a5), the outcome illustrates that CNN effectively reduces background noise and illumination inconsistencies, thus enhancing the fidelity of the workpiece's surface representation. In Fig. 5(b2), the contours of ‘CUGB’ are precisely outlined, and no twin image is detected. Moreover, in the case of this small depth, the phase value was easily overwhelmed by noise. Instead, analysis of phase differences from Fig. 5(a5) revealed an average groove depth of 312.47 nm, and a low error rate of 1.12%, which closely matched the pre-experiment measurements. Furthermore, as evident in Fig. 5(b3), the network-generated result curve exhibited reduced fluctuations and a smoother profile compared to the LS outcome, clearly suppressing the noise impact. These data not only verify the high precision of the net in PU of real samples, but also highlight its outstanding capability to suppress noise and extract refined phase data. These advantages of BSRU-Net are of great significance for the real-time monitoring of crystals.
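The two figures quoted in this paragraph follow directly from the stated depths and wavelength. As a short worked check, using only the 316 nm nominal depth, the 312.47 nm measured depth, and the 1550 nm wavelength given above:

```latex
$$
\frac{316\ \text{nm}}{1550\ \text{nm}} \approx 0.204 \approx \frac{1}{5},
\qquad
\frac{\lvert 316 - 312.47 \rvert}{316} = \frac{3.53}{316} \approx 1.12\%
$$
```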
Fig. 5 (a1) The ‘CUGB’ hologram; (a2–a3) Amplitude and WP. Reprinted and reused with permission from ref. 4 © American Chemical Society. (a4–a5) LS unwrapped phase and BSRU-Net output. (b1–b3) Microscopic photo, 3D view of (a5), and amplitude-phase line-profile comparisons for (a2), (a4), and (a5).
Additionally, 1.5 μL of a 20.6% Na2CO3 solution was used in the experiment to observe the crystal growth process, particularly during moments of intense phase transitions. With this technique, the features of crystals can be detected and characterized, and the stages of crystal growth at those times can be determined, such as the nucleation period in Fig. 6(e1), the growth stage in Fig. 6(e2)–(e4), and the maturation stage in Fig. 6(e5). Notably, regions outside the object showed stripe noise and uneven light, as seen in Fig. 6(d1)–(d5), suggesting that LS is notably susceptible to noise during crystallization. In contrast, BSRU-Net's output at the corresponding time instances demonstrated substantial noise suppression. The LS results also show that, due to the pseudo-lens effect, a phase-free region appears at the edge of the water droplet. However, because the network adopts an end-to-end data-driven processing mechanism, this problem does not exist in the training set.
While continuously learning and adapting to various changes and abnormal situations in the data, the model utilizes its internal parameter adjustment system and feature-processing logic to reasonably repair and reconstruct the phase information at the edge of the droplet, successfully suppressing the influence of this problem. During the crystallization process of Na2CO3, phase shifts arise due to physical changes such as varying crystal part growth rates, environmental factors such as temperature and pressure, and internal crystal stresses, which cause minute changes in the optical path. In DH, it can be leveraged to study crystal growth dynamics and material properties. For instance, the irregularities at the edges of the sample's main region in Fig. 6(e1)–(e5) reflect the actual physical changes, yet this genuine information is obscured by LS, and the edge information is weakened. This suggests that LS fails to capture detailed information on the complex surface of crystals.
As shown in the segments of Fig. 6(f1)–(f5), the middle part contains phase information, yet amplitude cannot represent it. The figure shows that the amplitude curve and the LS phase curve fluctuate extremely violently, indicating that it is susceptible to noise interference. Although the segment trends of BSRU-Net and LS are alike to a certain extent, the phase curve of the network is relatively smooth and exhibits spatial continuity. The experimental results demonstrate that BSRU-Net, harnessing SE attention, pinpoints crucial information, leveraging its architecture to mitigate interference. As a result, its curve distributions more accurately mirror the dynamics and authenticity of complex crystal material.
The crystallization of the CsPbBr3 solution and film formation were also monitored.35 Because the perovskite material may exhibit high photosensitivity in the infrared region, using infrared light of the highest practical intensity can more effectively activate the photoresponse of the material, thereby achieving higher sensitivity during hologram recording. However, the light-gathering effect produces a strong-light area, which interferes with the phase information within that area.
Based on the collected data, the second and third rows of Fig. 7 show the results obtained by performing diffraction propagation on the corresponding holograms at specific moments. The root cause of this effect lies in the limited dynamic range of the infrared detector in the detection system, leading to the hologram acquisition reaching the threshold. However, studying the phase information during the perovskite film-forming process, especially during moments of intense phase transitions, is of great significance for the characterization of new materials. Therefore, regions of interest were selected for research.
Fig. 7(d1)–(d8) shows that the crystal variations are dramatic, with phase shifts leading to uneven light field distribution and noise, such as interference fringe in areas other than the object. In response to this challenge, LS smoothed out the actual physical phenomena, blurring the edges and structural information of the object. In contrast, at the same moment, BSRU-Net effectively suppressed noise such as light spots and characterized the object's contour and structural data more clearly, as shown in Fig. 7(e1)–(e8). The net also successfully suppressed the influence of the pseudo-lens effect.
Moreover, a comparison of the selected line segments indicated that the amplitude values were stable and failed to effectively represent the object information, while the phase data from LS tended to be smooth. However, CsPbBr3 perovskite crystals exhibited satisfactory photoelectric effects, which indicated a complex crystal structure. The line segment changes presented by the BSRU-Net at different moments are more consistent with the actual crystal structure compared to LS in Fig. 7(f1)–(f8).
Finally, based on the line segment changes at different moments during the process of Na2CO3 crystal development and perovskite film formation, it was demonstrated that throughout the crystal growth process, even after the traditional amplitude signal ceased to change, the phase values continued to exhibit dramatic physical phenomena, offering a guide for the preparation and characterization of novel materials.
Ablation experiments assessed model modules' impact on performance, with ‘A’ using nnU-Net, ‘B’ employing RU-Net, ‘C’ utilizing SRU-Net, and ‘D’ applying BSRU-Net. There was more rapid convergence with ‘D’, and lower loss, as shown in Fig. 8(e), indicating the effectiveness of its refinements. On the validation set, ‘D’ displayed the highest stability by the 50th epoch. Table 1 exhibits performance metrics as averages from the test set. ‘B’ increased M-PSNR and M-SSIM by 1.6 and 0.0007 over Set ‘A’. ‘C’ enhanced M-PSNR and decreased M-MSE by 1.79 and 0.098 over ‘B’. Despite having fewer parameters, ‘D’ possessed the strongest expressive capability and superior metrics, reducing parameters by approximately 8% compared to ‘C’, and avoiding overfitting. Thus, BSRU-Net's balance of complexity and performance offers insights for future PU model design.
| Experiment | R | SE | Bilinear | M-PSNR | M-SSIM | M-MSE | Params (M) |
|---|---|---|---|---|---|---|---|
| A | — | — | — | 44.72 | 0.9976 | 0.151 | 93.20 |
| B | ✓ | — | — | 46.32 | 0.9983 | 0.157 | 100.37 |
| C | ✓ | ✓ | — | 48.11 | 0.9984 | 0.059 | 100.38 |
| D | ✓ | ✓ | ✓ | 52.10 | 0.9993 | 0.049 | 92.35 |
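The differences quoted in the ablation discussion can be recomputed directly from the rows of Table 1. The short check below does so; the dictionary layout and variable names are illustrative only.

```python
# Values copied from Table 1 (M-PSNR, M-SSIM, M-MSE, parameters in millions).
table1 = {
    "A": {"psnr": 44.72, "ssim": 0.9976, "mse": 0.151, "params": 93.20},
    "B": {"psnr": 46.32, "ssim": 0.9983, "mse": 0.157, "params": 100.37},
    "C": {"psnr": 48.11, "ssim": 0.9984, "mse": 0.059, "params": 100.38},
    "D": {"psnr": 52.10, "ssim": 0.9993, "mse": 0.049, "params": 92.35},
}

# 'B' relative to 'A'
b_vs_a_psnr = table1["B"]["psnr"] - table1["A"]["psnr"]
b_vs_a_ssim = table1["B"]["ssim"] - table1["A"]["ssim"]

# 'C' relative to 'B'
c_vs_b_psnr = table1["C"]["psnr"] - table1["B"]["psnr"]
c_vs_b_mse = table1["B"]["mse"] - table1["C"]["mse"]

# Parameter reduction of 'D' relative to 'C', in percent
param_cut_pct = 100 * (table1["C"]["params"] - table1["D"]["params"]) / table1["C"]["params"]
```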
Because noise impacts PU accuracy, testing BSRU-Net's noise resistance was essential. This study added Gaussian (GS) and salt-and-pepper (SP) noise to a noise-free dataset with increasing factor ratios (0.2 : 0.2, 0.4 : 0.4, 0.6 : 0.6, 0.8 : 0.8). GS element α (max standard deviation) and SP factor β (noise density) control noise levels. A random standard deviation matrix was used to enhance noise randomness. As α and β increased, noise intensity increased, testing BSRU-Net's performance under various noise conditions. This study also used mixed noise and increasing phase values to simulate real-world noise. Seven samples with different phase heights were tested, and only shown at a max phase of X = 55. Despite increasing noise, the results indicate its output phase images closely match the true images in Fig. 8(c1)–(c5). The PSNR remained stable in Fig. 8(d1) and (d2). SSIM and AU values minimally fluctuated with noise intensity in Fig. 8(a1)–(a3), stabilizing after an initial dip, as shown in Fig. 8(d4) at X = 47.
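A minimal sketch of such a mixed-noise model is given below. It assumes that the random standard deviation matrix is drawn uniformly from [0, α] for each pixel and that salt-and-pepper pixels are set to the minimum or maximum of the clean map with equal probability; neither detail is specified in the text, and the function name is hypothetical.

```python
import random


def add_mixed_noise(phase, alpha, beta, rng):
    """Add Gaussian noise (per-pixel std <= alpha) and salt-and-pepper noise.

    beta is the fraction of pixels replaced by the minimum (pepper) or the
    maximum (salt) of the clean map.
    """
    lo = min(map(min, phase))
    hi = max(map(max, phase))
    noisy = []
    for row in phase:
        new_row = []
        for v in row:
            sigma = rng.uniform(0.0, alpha)   # entry of the random std matrix
            v = v + rng.gauss(0.0, sigma)     # Gaussian component
            u = rng.random()
            if u < beta / 2:                  # pepper
                v = lo
            elif u < beta:                    # salt
                v = hi
            new_row.append(v)
        noisy.append(new_row)
    return noisy


clean = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
noisy = add_mixed_noise(clean, alpha=0.2, beta=0.2, rng=random.Random(0))
```

Raising α and β together, as in the 0.2 : 0.2 to 0.8 : 0.8 ratios above, increases both the spread of the Gaussian term and the density of the impulse term.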
Table 2 shows that Fig. 8(d1) displays the highest metric values, suggesting minimal noise impact at this level. Although the noise became stronger, the network's PU metrics (M-PSNR, M-SSIM, and M-AU) decreased only slightly, with the lowest values being approximately 10.8%, 0.18%, and 2.9% less than the highest, respectively, maintaining high performance. These experiments effectively evaluated BSRU-Net's noise resistance, supporting its stability and reliability in real-time infrared crystal characterization.
| Fig. 8 | M-PSNR | M-SSIM | M-AU |
|---|---|---|---|
| d1 | 49.52 | 0.9990 | 99.70% |
| d2 | 47.08 | 0.9985 | 99.04% |
| d3 | 45.57 | 0.9972 | 98.75% |
| d4 | 44.16 | 0.9985 | 96.79% |
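The relative drop of each metric between its highest and lowest value can be recomputed from Table 2 as follows; the variable and function names are illustrative only.

```python
# Columns of Table 2 for Fig. 8(d1)-(d4).
m_psnr = [49.52, 47.08, 45.57, 44.16]
m_ssim = [0.9990, 0.9985, 0.9972, 0.9985]
m_au = [99.70, 99.04, 98.75, 96.79]   # percent


def relative_drop_pct(values):
    """Percentage by which the lowest value falls below the highest."""
    return 100 * (max(values) - min(values)) / max(values)


psnr_drop = relative_drop_pct(m_psnr)
ssim_drop = relative_drop_pct(m_ssim)
au_drop = relative_drop_pct(m_au)
```

Rounded, these come to about 10.8% for M-PSNR, 0.18% for M-SSIM, and 2.9% for M-AU.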
A comparison was also made with several classic and commonly used algorithms, demonstrating the competitiveness of BSRU-Net. Noise resistance tests were conducted at various intensities, providing data support for accurate PU in real samples. Based on the system, BSRU-Net performed experimental studies on static ‘CUGB’ workpiece thickness measurement. The data (error rate of 1.12%) revealed its accuracy in real samples, and its adeptness at noise reduction and phase detail extraction, providing reliability for the characterization of infrared crystals. BSRU-Net also monitored the microstructural evolution during the growth of Na2CO3 crystals and CsPbBr3 perovskite crystals.
The trials demonstrated the successful untangling of phase information while suppressing environmental detection noise, twin images, and pseudo-lens effects. It is worth noting that the outcomes also illustrate that after the traditional amplitude signals stabilize, changes in phase values continue to dramatically occur. These discoveries show the applicability and practical value of the unconventional insights, and hold significant importance for research on new materials for sodium-ion batteries and advanced materials for perovskite photovoltaic cells.
Footnote
† Haochong Huang and Haichao Huang contributed equally to this work and should be considered co-first authors.
This journal is © The Royal Society of Chemistry 2025