End-to-end molecular structure elucidation from multimodal NMR spectra images using vision transformers

Chao Han; Xiaolin Pan; Yingkai Zhang

doi:10.1039/D6SC02352E

Chao Han,^a Xiaolin Pan^a and Yingkai Zhang*^abc

Author affiliations

* Corresponding authors

^a Department of Chemistry, New York University, New York 10003, USA
E-mail: yingkai.zhang@nyu.edu

^b Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, USA

^c NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, China

Abstract

Nuclear magnetic resonance (NMR) spectroscopy is a cornerstone technique for molecular structure elucidation, but interpreting complex NMR spectra remains challenging and often relies on expert-driven, heuristic workflows. Recent deep learning approaches have enabled spectrum-to-structure prediction, yet many depend on peak-level text annotations that discard informative intensity patterns and are not readily extended to higher-dimensional NMR experiments. Here, we present NMRViT, a spectral Vision Transformer framework for NMR-driven molecular structure elucidation that operates directly on raw spectral signals of both one-dimensional (¹H, ¹³C) and two-dimensional (HSQC) spectra. Trained on a large-scale simulated dataset, NMRViT achieves strong performance across single-modality and multimodal NMR spectral input configurations, and provides, to our knowledge, one of the first end-to-end benchmarks for HSQC-based structure prediction. We systematically evaluate both zero-shot transfer and fine-tuning from large-scale simulated data on two experimental benchmarks covering 1D and 2D NMR spectra, highlighting the simulation–experiment domain gap and showing that fine-tuning with only a small number of experimental samples can substantially reduce this gap. To further improve candidate ranking, we introduce a chemical-shift-based post-processing strategy that re-ranks candidate structures using local spectral evidence, yielding consistent performance gains in both zero-shot and fine-tuning settings. Together, these results establish raw-spectrum vision transformers, combined with lightweight experimental adaptation and chemically informed re-ranking, as a practical framework for automated molecular structure elucidation from multimodal NMR data.
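As a rough illustration of the re-ranking idea mentioned in the abstract, the sketch below re-orders candidate structures by combining a generator score with a chemical-shift mismatch penalty. It is not the authors' procedure: the `Candidate` fields, the symmetric nearest-neighbour distance, the `weight` parameter, and the peak values are all assumptions made for this example, and the predicted shifts are presumed to come from some external shift predictor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    smiles: str                    # candidate structure
    model_score: float             # hypothetical generator log-score (higher is better)
    predicted_shifts: List[float]  # hypothetical 13C shifts in ppm from an external predictor


def shift_mismatch(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean nearest-neighbour distance (ppm) between two peak lists."""
    if not observed or not predicted:
        return float("inf")
    obs_to_pred = sum(min(abs(o - p) for p in predicted) for o in observed) / len(observed)
    pred_to_obs = sum(min(abs(p - o) for o in observed) for p in predicted) / len(predicted)
    return 0.5 * (obs_to_pred + pred_to_obs)


def rerank(candidates: Sequence[Candidate], observed: Sequence[float],
           weight: float = 0.1) -> List[Candidate]:
    """Sort candidates by model score minus a weighted shift-mismatch penalty."""
    return sorted(
        candidates,
        key=lambda c: c.model_score - weight * shift_mismatch(observed, c.predicted_shifts),
        reverse=True,
    )


# Illustrative numbers only, not experimental data.
observed_13c = [18.2, 58.3]
candidates = [
    Candidate("COC", -1.0, [59.0]),        # best generator score, poor peak agreement
    Candidate("CCO", -1.2, [18.0, 58.0]),  # lower generator score, good peak agreement
]
ranked = rerank(candidates, observed_13c)
print([c.smiles for c in ranked])
```

In this toy case the candidate whose predicted shifts agree with both observed peaks moves ahead of the one the generator originally preferred. The value of `weight` controls how strongly spectral agreement can override the generator score and would need tuning on held-out data.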

Chemical Science
