Open Access Article
Yixi Shen†a, Ledu Wang†a, Yan Huang†*a, Xiaolong Zhang†a, Meng Huanga, Huirong Lia, Jing Hea, Aoran Caia, Yang Wanga, Pieter E. S. Smithb, Jun Jiang*ac, Zhuoying Zhu*a and Linjiang Chen*a
aState Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China. E-mail: hyang@ustc.edu.cn; jiangj1@ustc.edu.cn; zyzhuq@ustc.edu.cn; linjiangchen@ustc.edu.cn
bHefei JiShu Quantum Technology Co., Ltd, Hefei 230026, China
cHefei National Laboratory, University of Science and Technology of China, Hefei, 230026, China
First published on 25th March 2026
Bridged azobenzene derivatives are key photo-responsive molecular switches. However, probing and interpreting their microscopic Z ↔ E isomerization mechanisms remain challenging as isolated spectroscopic and computational efforts struggle to establish clear structure–spectrum relationships. We report an integrated, large-language-model (LLM) agent-driven workflow that links literature-guided planning, ab initio molecular dynamics (AIMD) sampling, density functional theory spectral calculations, robotic infrared/Raman measurements, and interpretable machine learning for structural–spectral analysis of bridged azobenzenes. Central to the analysis is an attention-based convolutional neural network (ATT-CNN) that predicts the C–N=N–C dihedral angle directly from vibrational spectra with r = 0.99 and MAE = 5°. Attention maps highlight mechanistically informative bands and support holistic (non-marker-dependent) interpretation; transfer learning extends performance across chemical environments and experimental datasets. LLM agents formulated the research plan and coordinated automated simulations and measurements, whereas neural-network architecture design, training, and comparative benchmarking were performed by human researchers to retain full flexibility for model exploration and ensure rigorous interpretation. To our knowledge, this is the first LLM-agent-planned and -orchestrated mechanistic study unifying literature synthesis, theory, experiment, and machine learning. The resulting strategy advances quantitative insight into azobenzene photoisomerization and provides a generalizable blueprint for AI-driven investigations of dynamic molecular systems.
Despite decades of research, a comprehensive molecular-level understanding of the dynamic Z ↔ E isomerization process across the vast chemical space of bridged azobenzene derivatives and under realistic environmental conditions remains elusive.16–19 This challenge arises from several persistent limitations inherent to traditional approaches. Conventional studies often rely on painstaking, molecule-by-molecule experimental and computational investigations,20 which restrict the systematic exploration of chemical diversity and slow the rational design and optimization of new derivatives. Moreover, experimental efforts to capture the ultrafast dynamics of isomerization are further complicated by the inherently complex and often overlapping spectral features, the high sensitivity of these features to environmental factors, and the difficulty in directly interpreting subtle changes in molecular conformation or distinguishing transient intermediates.21,22
From a theoretical standpoint, the accurate simulation of photochemical dynamics in these systems requires a careful balance between computational precision and feasibility, especially when it comes to modeling non-adiabatic transitions,23,24 explicit solvent effects,24 and rare but critical conformational states.25,26 Moreover, there is often a disconnect between theoretical predictions and experimental measurements; these efforts are frequently pursued in parallel rather than in an integrated, high-throughput, or feedback-driven fashion. As a result, closing the loop between hypothesis, observation, and mechanistic understanding remains a formidable task. Together, these factors create a substantial bottleneck that limits both the development of fundamental insight and the accelerated application of photo-switchable azobenzene-based materials.
Recent advances in laboratory automation, robotics, and artificial intelligence (AI),27–32 most notably large language models (LLMs) and their agent systems, present a transformative opportunity to overcome these long-standing barriers. By integrating autonomous knowledge extraction, hypothesis generation, experimental design, high-throughput data acquisition, and interpretable machine learning, it is now possible to move beyond piecemeal strategies and toward data-driven, AI-augmented collaborative research. In this work, we developed and implemented an integrated workflow for the elucidation of Z ↔ E isomerization dynamics in bridged azobenzenes (Fig. 1), powered by a multi-agent-driven robotic AI chemist platform.31 Our workflow combines agent-guided computational simulation and spectral modeling, automated experimental spectroscopic measurements, and interpretable machine learning to enable automated, high-throughput structural analysis during photoisomerization. By systematically coordinating the actions of literature-mining, research-planning, data-generating, and machine-learning agents, our approach lays the groundwork for autonomous structure–function discovery and rational design of new molecular photoswitches. The results and methodology presented herein seek not only to advance the study of azobenzene photochemistry, but also to serve as a blueprint for autonomous, generalizable research platforms capable of accelerating discovery across the broader landscape of dynamic molecular systems. It should be noted that this study does not directly simulate the excited-state dynamics (such as the evolution prior to the conical intersection), which constitute the key mechanism governing the ultrafast photochemical step. Instead, it focuses on establishing a quantitative “structure–spectrum” relationship based on ground-state conformers and their vibrational spectra.
This research paradigm not only provides a feasible pathway for high-throughput structural analysis of complex photochemical systems but also lays a methodological foundation for subsequent studies that integrate steady-state analysis with excited-state dynamics.
The Literature Reader combined the GraphRAG (Graph Retrieval-Augmented Generation) technique33 with two state-of-the-art large language models, DeepSeek V3 (ref. 34) and DeepSeek R1 (ref. 35), to construct a comprehensive, hierarchical knowledge graph. These open-source models were selected to ensure cost-efficiency for token-intensive workflows and to facilitate reproducibility within the scientific community, avoiding reliance on proprietary ecosystems. This approach enabled multiscale parsing of the literature, capturing not only chemical entities and molecular structures, but also experimental conditions, spectroscopic features, theoretical methodologies, kinetic and thermodynamic data, and mechanistic hypotheses. The resulting knowledge graph was organized such that each publication formed a high-level node, with nested sub-nodes detailing molecular systems, experimental protocols, spectral observations, and computational outcomes. This architecture facilitated the rapid identification of knowledge gaps, precedents, and best practices relevant to our research objectives, mirroring the literature mining and knowledge integration framework previously developed for multi-domain catalyst discovery. To inform subsequent research planning, the Literature Reader was systematically queried to generate targeted reports synthesizing the key theoretical concepts and experimental methodologies in the field, providing a robust contextual foundation for intelligent experimental design.
User: “The idea of generating molecular ‘movie’ from a femtosecond Raman experiment is promising, however, conducting such experiments are resource-extensive, and we would like to know if there are promising methods to generate such trajectories via simulations”.
Agent: “Proposed Research Direction: High-Fidelity Nonadiabatic Molecular Dynamics Simulations… Rationale: While experimental ultrafast spectroscopy (e.g., FSRS) is powerful, its cost/time intensity can be mitigated by computational trajectory generation”.
User: “Are basic neural networks enough, or should we use advanced models like CNNs? … Since molecular vibrations exist, are individual bond lengths inappropriate…? Given our expertise and equipment, Raman spectra is our preferred spectral feature”.
Agent: “Refined Predictive Framework: Direct Dihedral Angle Prediction from Raman Spectra… Core Concept: Develop a 1D-CNN with Spectral Attention focused solely on predicting the bridge dihedral angle… This prioritizes quality over quantity”.
Consistent with our previous work on LLM-driven catalyst discovery, this collaborative process enabled effective fusion of knowledge across theoretical, spectroscopic, and experimental domains. The Research Designer recognized that achieving real-time, mechanistic insight into isomerization required bridging the gap between unobservable molecular parameters (such as the C–N=N–C dihedral angle) and directly measurable vibrational spectra. Thus, the agent proposed a strategy centered on constructing predictive, interpretable machine learning models capable of inferring structural coordinates from IR and Raman data, effectively translating complex spectroscopic signatures into actionable molecular information.
The AIMD simulations were performed using the CP2K software package with the hybrid Gaussian plane wave method. The agent employed the PBE exchange–correlation functional augmented with Grimme's D3 dispersion correction and the DZVP basis set. Metadynamics was used with the dihedral angle as the collective variable to efficiently sample the reaction pathway within the canonical (NVT) ensemble, using a Nosé–Hoover thermostat. From these 5 ps trajectories (0.5 fs timestep), approximately 500 snapshots covering the complete Z-to-E transition were extracted for each system.
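The snapshot count quoted above follows from simple arithmetic: a 5 ps trajectory at 0.5 fs per step contains 10,000 steps, so roughly 500 snapshots correspond to one frame every 20 steps (10 fs). The sketch below illustrates this under the assumption of evenly spaced sampling; the text does not state how the snapshots were actually selected, and `snapshot_indices` is a hypothetical helper, not part of CP2K.

```python
def snapshot_indices(total_time_fs, timestep_fs, n_snapshots):
    """Return evenly spaced MD step indices (assumed sampling scheme)."""
    n_steps = round(total_time_fs / timestep_fs)   # 5000 fs / 0.5 fs -> 10000 steps
    stride = max(1, n_steps // n_snapshots)        # steps between saved frames
    return list(range(0, n_steps, stride))[:n_snapshots]


# 5 ps trajectory, 0.5 fs timestep, about 500 snapshots per system
indices = snapshot_indices(5000.0, 0.5, 500)
print(len(indices), indices[1] - indices[0])
```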
Subsequently, the infrared and Raman spectra for all sampled conformers were computed at the DFT level to generate the spectral dataset. All frequency calculations were carried out using Gaussian 16 with the B3LYP functional and the 6-31+G(d,p) basis set. The resulting harmonic frequencies were uniformly scaled, and spectral lines were broadened with a Lorentzian function (8 cm−1 FWHM) to produce continuous spectra suitable for model training.
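The broadening step can be illustrated with a short sketch. It places a height-normalized Lorentzian of 8 cm−1 FWHM on each scaled harmonic frequency and sums the contributions on a frequency grid. The scaling factor and the grid are not given in the text, so `scale` and `grid` are placeholders, and the height normalization is an assumption rather than the authors' stated choice.

```python
def lorentzian_spectrum(freqs, intensities, grid, fwhm=8.0, scale=1.0):
    """Broaden a stick spectrum onto `grid` (cm^-1) with Lorentzian line shapes.

    `scale` is a uniform frequency scaling factor; its value is a placeholder
    because the source does not report the factor that was used.
    """
    gamma = fwhm / 2.0  # half width at half maximum
    spectrum = []
    for x in grid:
        total = 0.0
        for freq, inten in zip(freqs, intensities):
            centre = scale * freq  # uniformly scaled harmonic frequency
            # Height-normalized Lorentzian: equals `inten` at the line centre
            total += inten * gamma**2 / ((x - centre) ** 2 + gamma**2)
        spectrum.append(total)
    return spectrum


# One line at 1000 cm^-1: half of the peak height is reached 4 cm^-1 away
print(lorentzian_spectrum([1000.0], [2.0], [996.0, 1000.0, 1004.0]))
```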
The automated workflow produced a curated dataset of 4400 structure–spectrum pairs across 11 bridged azobenzene derivatives. With this high-fidelity dataset in hand, the workflow then shifted to the construction and training of interpretable machine-learning models to decode the spectral–structural relationship. To complement and validate the purely computational data, the workflow was extended to incorporate experimental measurements.
To ensure both robust validation and broad transferability, the workflow further incorporated systematic experimental measurements under multiple isomerization conditions, including both thermal and photoinduced processes, with the Experiment Designer and Robot Operator agents coordinating the planning and execution of high-throughput sample preparation and spectroscopic acquisition. This agent-driven integration enabled the capture of a comprehensive range of isomerization behaviors, resulting in a rigorously curated dataset for subsequent machine learning (ML) studies.
[…] C–N=N–C dihedral coordinate. This human–AI coordinated approach guaranteed that every stage of the workflow (from initial research conception and planning, through automated computation and simulation, to manual ML model development and high-throughput experimental validation) remained integrated, data-driven, and systematically executed. The following sections detail the specific computational, ML, and experimental advances that emerged from this multi-agent workflow.
The C–N=N–C dihedral angles were calculated for the 4400 conformations of the 11 bridged azobenzene derivatives (Table S1) generated using AIMD. The IR and Raman spectra of these conformers were obtained via DFT calculations. The compiled dataset of spectrum–dihedral angle pairs was randomly divided into training and testing subsets in an 8:2 ratio. Overall, the attention-based convolutional neural network (ATT-CNN) achieved high accuracy, as evidenced by a correlation (r) of 0.99 between the calculated dihedral angles (φCalc) and those predicted by the neural network (φNN) (Fig. 2a). As shown in Fig. 2b, the φCalc distribution spans a wide range, with dihedral angles of 6.4° (Z conformation) and 146.8° (E conformation) identified as relatively stable.
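For readers unfamiliar with the structural label, the sketch below shows one standard way to compute a C–N=N–C dihedral angle from the Cartesian coordinates of the four atoms. It is a generic geometric recipe, not the authors' code; returning the unsigned angle in the 0°–180° range is an assumption based on the reported values (6.4° and 146.8°).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Unsigned dihedral angle (degrees) for atoms p0-p1-p2-p3, e.g. C-N=N-C."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    norm_b2 = math.sqrt(_dot(b2, b2))
    b2_hat = tuple(c / norm_b2 for c in b2)
    m1 = _cross(n1, b2_hat)
    x, y = _dot(n1, n2), _dot(m1, n2)
    return abs(math.degrees(math.atan2(y, x)))


# Planar cis arrangement gives 0 degrees, planar trans gives 180 degrees
print(dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
print(dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```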
We compared ATT-CNN with a convolutional neural network without an attention layer (CNN), a traditional fully connected neural network (FCNN), a long short-term memory (LSTM) network, and a transformer network. Our results indicate that ATT-CNN best predicts the dihedral angle, yielding both the highest correlation with φCalc (r = 0.99) and the smallest mean absolute error (MAE) (MAE = 5.0°). Removal of the attention layer from ATT-CNN increased the MAE from 5.0° to 8.6° and decreased the correlation coefficient from 0.99 to 0.97, illustrating its importance to ATT-CNN's predictive performance. The MAEs for the LSTM, FCNN, and transformer networks were 7.3°, 10.3°, and 6.3°, respectively, and the correlations between φCalc and φNN for the LSTM, FCNN, and transformer networks were 0.96, 0.94, and 0.98, respectively.
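The two figures of merit used in this comparison, the mean absolute error and the Pearson correlation between φCalc and φNN, can be computed as in the sketch below. The angle values in the usage example are made up for illustration and are not taken from the dataset.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between reference and predicted angles."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Illustrative (invented) angles in degrees
phi_calc = [0.0, 50.0, 100.0, 150.0]
phi_nn = [5.0, 45.0, 105.0, 145.0]
print(mean_absolute_error(phi_calc, phi_nn), pearson_r(phi_calc, phi_nn))
```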
[…] C–N=N–C generally falls within the spectral region with the highest attention layer weight allocation, termed the active location (in cases where multiple N=N stretches exist, the one with the largest amplitude of motion for the directly bonded nitrogens is considered the predominant N=N stretch). Examples of the predominant N=N stretching frequency coinciding with the active location are provided in Fig. 3a–d. Given the role of the C–N=N–C moiety in defining the dihedral angle, its N=N stretching vibration is logically expected to be significant in angle prediction. These examples therefore underscore the attention layer's ability to pinpoint crucial spectral frequency regions.
The relationship between the N=N stretching frequency and the attention layer's weight distribution was confirmed through quantitative analysis (Fig. 4a–d). IR and Raman spectral regions were ranked based on the weights conferred by the attention layer: the highest-weighted region was named active location 1 (AL1), followed by active location 2 (AL2), active location 3 (AL3), and active location 4 (AL4). In our TOP1 analysis, the average frequency of AL1 was defined as FrequencyAL, and it was plotted against the DFT-derived N=N stretching frequency for C–N=N–C (FrequencyDFT) (Fig. 4a). In the TOP2 analysis, whichever of the AL1 and AL2 average frequencies lay nearest to FrequencyDFT was designated FrequencyAL. Similarly, the TOP3 analysis chose the frequency closest to FrequencyDFT from the averages of AL1, AL2, and AL3. Last, the TOP4 analysis extended this selection to the averages of AL1 through AL4. In the TOP2, TOP3, and TOP4 analyses, FrequencyAL was then plotted against FrequencyDFT (Fig. 4b–d).
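The TOP1 to TOP4 selection rule described above can be sketched as follows. Regions are ranked by attention weight to give AL1, AL2, and so on, and the TOPk value of FrequencyAL is the mean frequency, among the k highest-ranked regions, that lies closest to FrequencyDFT. The function names and the input layout (pairs of mean frequency and attention weight) are assumptions made for illustration.

```python
def rank_active_locations(regions):
    """Order region mean frequencies as AL1, AL2, ... by descending attention weight.

    `regions` is a list of (mean_frequency_cm1, attention_weight) pairs
    (a hypothetical input layout).
    """
    ordered = sorted(regions, key=lambda region: region[1], reverse=True)
    return [freq for freq, _ in ordered]


def top_k_frequency(regions, freq_dft, k):
    """FrequencyAL for the TOPk analysis: nearest of AL1..ALk to FrequencyDFT."""
    candidates = rank_active_locations(regions)[:k]
    return min(candidates, key=lambda freq: abs(freq - freq_dft))


# Invented example: AL1 lies far from the DFT stretch, AL2 lies close to it
regions = [(1510.0, 0.20), (1200.0, 0.50), (900.0, 0.10), (1480.0, 0.15)]
for k in (1, 2, 3, 4):
    print(k, top_k_frequency(regions, 1500.0, k))
```

With k = 1 the rule reduces to the TOP1 analysis, which is why a single poorly placed AL1 can lower the TOP1 correlation while the TOP2 to TOP4 values recover.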
Although the correlation between the frequency of the N=N stretch critical for dihedral angle prediction and the highest weighted spectral region frequency was only 0.64 (Fig. 4a), the correlation increased significantly when other highly weighted spectral regions were included (r > 0.9) (Fig. 4b–d). This is also reflected in the average absolute frequency difference between FrequencyAL and FrequencyDFT, which decreases from 46 cm−1 for TOP1 to 26 cm−1 for TOP2 and, as more highly weighted spectral regions are included, remains below 23 cm−1. These data confirm the link between the N=N stretching frequency, vital for predicting the dihedral angle, and the spectral regions prioritized by the attention layer; this connection was previously illustrated with select examples in Fig. 3. The connection explains the enhanced accuracy of the ATT-CNN model that incorporates attention layers, as they effectively highlight the most informative spectral features for analysis.
As illustrated in Fig. S1, this zeroing process was repeated at regular intervals of 4 cm−1 to ensure a comprehensive and unbiased assessment. We used each of these datasets to predict the bridged azobenzene derivative dihedral angle. The MAE of the average SWZ, 8.9°, is similar to that of the untreated data, 8.7°, suggesting that an indiscriminate zeroing out of 16 consecutive absorbance values generally has little impact on dihedral angle prediction. Notably, the maximum MAE of 11.2° was observed when the AL1 absorbance values were zeroed out, marking a 28% increase from the MAE of the untreated data. Zeroing out the N=N stretch absorbance values led to an MAE of 9.9°, a comparatively smaller 14% increase in MAE.
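The zeroing procedure (abbreviated SWZ in the text, presumably a sliding-window zeroing) can be sketched as below: a window of 16 consecutive absorbance values is set to zero, the window is shifted along the spectrum, and one perturbed copy is produced per position. The grid spacing is not stated, so `step` is expressed in grid points and its default of 4 assumes a 1 cm−1 grid; the function name is hypothetical.

```python
def sliding_window_zeroing(spectrum, window=16, step=4):
    """Return (start_index, perturbed_spectrum) pairs, one per window position.

    `step` is in grid points; 4 points equals 4 cm^-1 only if the spectrum is
    sampled every 1 cm^-1 (an assumption, since the spacing is not reported).
    """
    variants = []
    for start in range(0, len(spectrum) - window + 1, step):
        masked = list(spectrum)                      # copy; input stays intact
        masked[start:start + window] = [0.0] * window
        variants.append((start, masked))
    return variants


# A flat 32-point toy spectrum yields windows starting at 0, 4, 8, 12 and 16
toy = [1.0] * 32
print([start for start, _ in sliding_window_zeroing(toy)])
```

Each perturbed spectrum would then be passed to the trained model, and the resulting MAE compared with that of the untreated data.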
These data indicate that, although the AL1 spectral region is far from the N=N stretching frequency, it still plays a critical role in accurately predicting the dihedral angle. To further elucidate the physicochemical nature of these anomalous AL1 regions, we analyzed the 40 cases with the largest deviations (Fig. S6). The results show that the vibrational modes corresponding to these AL1 regions have clear chemical significance, such as C–N coupling, ring breathing vibrations, and ester C=O stretching, confirming that the attention mechanism can dynamically identify the most informative spectral features based on the molecular environment.
Notably, although zeroing out the secondary attention regions (AL2–AL4) led to measurable increases in MAE (10.3°, 9.7°, and 9.4°, respectively), the predictive accuracy of the model remained largely intact. This demonstrates that ATT-CNN does not rigidly rely on a single spectral peak but is capable of capturing synergistic relationships between the primary feature (AL1) and other relevant spectral signals. These secondary features play an indispensable role in calibrating predictions, refining structural information, and providing necessary chemical context. Therefore, although AL1 contains the primary spectral–structure relationship, the inclusion of other attention-weighted regions remains essential for achieving high-precision predictions—this also reflects the model's effectiveness in integrating and utilizing complex spectral features.
The performance comparison in Fig. 5 shows that transfer learning consistently surpasses direct learning. For the target domain consisting of 500 conformers (Fig. 5a), transfer learning on only 20 training examples resulted in a dihedral angle prediction with an MAE below 12°, outperforming direct learning on 500 training samples. To validate ATT-CNN's applicability to experimental data, we used time-resolved IR and Raman spectroscopy to monitor dihedral angle changes in bridged azobenzene under 400 nm illumination and azobenzene under heating at 70 °C (Figs. S11, S13, S17 and S18). Since the experimental data lacked explicit dihedral angle labels, we employed a semi-supervised learning approach: the model was first fine-tuned using a loss function that penalized deviations from the general increasing trend of predicted dihedral angles with illumination or heating time.
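The exact form of the trend-based loss is not given in the text. One simple way to penalize departures from an increasing trend, shown here purely as an illustrative assumption, is a hinge penalty on every decrease between consecutive time-ordered predictions.

```python
def trend_penalty(pred_angles):
    """Mean hinge penalty on decreases between consecutive predictions.

    `pred_angles` must be ordered by illumination or heating time. The penalty
    is zero for a non-decreasing sequence and grows with the size of each drop.
    This is a hypothetical loss, not the one reported by the authors.
    """
    drops = [max(0.0, earlier - later)
             for earlier, later in zip(pred_angles, pred_angles[1:])]
    return sum(drops) / len(drops) if drops else 0.0


print(trend_penalty([10.0, 20.0, 30.0]))        # monotonic increase, no penalty
print(trend_penalty([10.0, 30.0, 20.0, 40.0]))  # one 10 degree drop is penalized
```

In a real fine-tuning run the same expression would be written with differentiable tensor operations so that gradients can flow back to the network weights.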
For bridged azobenzene, 19 experimental datasets were collected (9 for fine-tuning, 10 for testing); the refined model successfully captured the increasing trend in the testing set under illumination (Fig. 5c, r = 0.935). For azobenzene, 20 datasets were collected (10 for fine-tuning, 10 for testing); the refined model captured the increasing trend in the testing set under heating (Fig. 5d, r = 0.958). These results reflect the macroscopic structural evolution process based on ensemble-averaged spectra, verifying the model's capability to track changes in product distribution driven by light (or heat), and further demonstrate that ATT-CNN can generalize from theoretical training to experimental data with good robustness and generalizability.
[…] C–N=N–C dihedral angle, with a mean absolute error as low as 5°. The attention mechanism within ATT-CNN reliably identifies and weights the most informative spectral features, enabling robust and interpretable structural assignments even amid common experimental variations or in the absence of traditional marker peaks.
Notably, the holistic learning approach of ATT-CNN allows it to extract structural information from complex spectral data without relying exclusively on any single feature, as demonstrated by successful predictions even when N=N stretch regions were omitted. Further, transfer learning to new domains demonstrated the model's strong generalizability and adaptability, underscoring its potential as a broadly applicable tool for automated structure determination.
Altogether, our results lay the groundwork for a new approach of data-driven, interpretable, and human–AI collaborative workflows for real-time spectroscopic structure analysis. By harnessing both the accessibility and information-rich nature of vibrational spectroscopy, this strategy opens the door to accelerated mechanistic discovery in ultrafast dynamics, responsive materials, and beyond. Continued expansion of ATT-CNN's training and application to a wider chemical space will further enhance its power, paving the way for transformative advances in the automated interpretation of complex spectroscopic data across diverse chemical systems.
Finally, we acknowledge that the present workflow is grounded in ground-state simulations and steady-state spectroscopy. Although this approach does not capture the ultrafast excited-state dynamics underlying the initial photochemical step, it successfully establishes a robust mapping between readily accessible vibrational spectra and the key structural descriptor (the C–N=N–C dihedral angle) of the relaxed photoproducts. This enables the quantitative tracking of macroscopic isomerization progress and photostationary states, which is directly relevant for evaluating photoswitch performance and guiding molecular design. Future integration with time-resolved spectroscopy and nonadiabatic dynamics simulations could further extend the framework to probe the real-time photochemical trajectory.
Footnote
† These authors contributed equally: Y. S., L. W., Y. H., and X. Z.
This journal is © The Royal Society of Chemistry 2026