Minji
Kim
a,
Jung Ho
Lee
*a and
Keunhong
Jeong
*b
aDepartment of Chemistry, Seoul National University, Seoul 08826, Republic of Korea. E-mail: jungho.lee@snu.ac.kr
bDepartment of Physics and Chemistry, Korea Military Academy, Seoul 01805, Republic of Korea. E-mail: doas1mind@kma.ac.kr
First published on 17th November 2025
Accurate prediction of 15N NMR chemical shifts in flexible peptide systems remains challenging. We present an ensemble-based computation protocol combining density functional theory with replica-exchange molecular dynamics. This approach outperformed single-structure predictions with deviations of 2.5–5.6 ppm for most residues and 1.1 ppm for leucine in a model peptide.
However, experimental acquisition of 15N NMR data faces significant challenges due to low natural abundance (0.365%) and relatively small gyromagnetic ratio, resulting in poor sensitivity compared to proton detection.3 Recent advances in density functional theory (DFT) have enabled increasingly accurate predictions of NMR chemical shifts across diverse molecular systems.4–6 AI-based tools have also demonstrated high accuracy in chemical shift prediction in previous studies.7–10 However, these AI-based approaches face inherent limitations in accurately incorporating specific solvent environments into their predictive models, whereas DFT-based methodologies enable explicit consideration of solvation effects through quantum mechanical calculations.
Computational frameworks specifically designed for nitrogen nuclei in aqueous environments remain largely underexplored. While reliable scaling factors for 15N chemical shift predictions have been established, these calibrations primarily relied on gas-phase calculations or non-aqueous solvation models.11–15 Therefore, the development of a comprehensive methodology specifically validated for aqueous systems is crucial for practical applications in water-based biological systems. The challenge becomes even more complex when dealing with intrinsically disordered proteins (IDPs), which lack rigid three-dimensional structures and exhibit dynamic conformational variability. These proteins exist as ensembles of multiple conformers, with observed NMR signals representing ensemble-averaged value over all accessible conformations.16 Our investigation focuses on the short model peptide, Asp–Gln–Leu–Gly–Lys (DQLGK), corresponding to residues 98–102 of α-synuclein IDP (Fig. 1).
This pentapeptide is expected to form an ensemble of multiple conformers. Boltzmann-weighted averaging is therefore anticipated to improve the accuracy of chemical shift predictions for such flexible systems.
In this study, we first establish an optimal computational protocol by systematically evaluating17 882 functional/basis set combinations across 19 small organic molecules (43 nitrogen datapoints) in water3 to determine accurate scaling factors for aqueous environments. Using this optimized protocol, we predict Boltzmann-averaged 15N chemical shifts for a pentapeptide and validate their accuracy against experimental data. Representative conformers composing the ensemble are derived from replica-exchange molecular dynamics (REMD)18 simulations, and the chemical shifts are calculated with density functional theory (DFT). This approach ensures reliable predictions for flexible peptide systems in biologically relevant conditions.
To obtain experimental 15N chemical shift for the pentapeptide, we performed 1H–15N HSQC, 1H–1H TOCSY, and 1H–1H NOESY experiments. All NMR experiments in this work were performed on a 600 MHz NMR spectrometer equipped with a z-gradient cryoprobe. 15N chemical shifts of backbone amides were measured from the HSQC spectrum (Fig. S1, SI), and the five peaks were assigned to each residue by TOCSY (Fig. S2, SI) and NOESY (Fig. S3, SI) spectra. All samples were 6 mM peptides (without 15N enrichment) in 20 mM sodium phosphate (pH 6.0), 20 mM sodium chloride, 5% D2O, and measured at 20 °C.
The conformational ensemble of the pentapeptide was derived via REMD simulation in GROMACS 24.02.19–26 The simulation employed the AMBER99SB force field27 and the SPC/E water model.28 Simulations were run for 40 ns with a 2 fs time step. Trajectories were clustered with the GROMOS method,29 yielding 12 clusters. The lowest-energy frame from each cluster was selected as its representative conformer.
Each conformer was first geometry optimized and then subjected to 15N chemical shift calculations, using the same DFT functional/basis set identified as optimal during the scaling factor optimization. Then, Gibbs free energy was calculated using DFT B3LYP30–32/6-31G(d,p).33–35 All DFT calculations were performed using Gaussian16 Rev. A.0336 on the NURION supercomputer (KISTI) using Knights Landing nodes with Intel Xeon Phi 7250 processors (68 cores, 96 GB RAM).
Geometry optimization was the most time-consuming step, with CPU times ranging from 40 to 100 days, and up to 200 days for the representative conformer of the second cluster. In contrast, NMR calculations typically required CPU time about 7 hours, and frequency calculations took approximately 8 days on average.
Scaling factor optimization used datasets of small molecules (Table S1, SI). The isotropic shielding tensors were calculated with a broad range of DFT methods. For geometry optimization, 3 functional/basis set were employed, whereas the NMR calculations explored 294 combinations (49 functionals × 6 basis sets) (Tables S2–S4, SI). Subsequently, we carried out linear regressions between the experimental chemical shifts and the computed shielding tensors for each case (Tables S5–S7, SI). The optimal scaling factor were adopted for the peptide study.
Through systematic evaluation, we establish an optimal computational protocol specifically calibrated for aqueous environments (Fig. 2); B3LYP[GD3BJ]30–32,37/6-311++G(d,p)38–46 for geometry optimization and LC-ωHPBE47–50/6-31+G(d,p)33–35,38,39,51–55 for NMR calculations. The linear regression yielded σ = −1.015δ − 112.304 with R2 = 0.992, demonstrating excellent agreement between the fitted slope −1.015 and the theoretical slope −1 in δsample = σref − σsample.56
Thermodynamic analysis using DFT-calculated Gibbs free energies revealed substantial differences in conformer stability. The three most populated conformers represent 54.75%, 31.73%, and 10.76% of the ensemble, respectively, amounting to 97.24% of the total population (Fig. 3).
The calculation of ensemble-averaged 15N chemical shifts involved Boltzmann-weighted averaging of individual conformer contributions according to their thermodynamic populations. Comparison between calculated ensemble-averaged chemical shifts and experimental values obtained from 1H–15N HSQC spectroscopy showed good agreement (Fig. 4).
Ensemble averaging improved prediction accuracy compared to using only the dominant 3rd cluster's chemical shifts. (Fig. S6, SI) While the 3rd cluster exhibited an MAE (Mean Absolute Error) of 5.1 ppm, the ensemble-averaged chemical shifts yielded a reduced MAE of 3.6 ppm. This finding supports the view that the peptide exists as a dynamic mixture of interconverting conformers and that the experimentally observed chemical shifts represent an ensemble-averaged property.
The computational predictions showed deviations ranging from 2.5 to 5.6 ppm for most backbone nitrogen atoms (Asp, Gln, Gly, Lys), with all calculated values appearing up-field relative to experimental measurements, except for glutamine. Notably, the leucine residue demonstrated exceptional agreement with only 1.1 ppm deviation from the experimental value. Averaged 15N shift deviations of 3.6 ppm is slightly smaller than those reported in prior 15N DFT study – averaged deviation of 4.0 in ubiquitin and 4.6 ppm in GB3.57
In contrast to the four residues, the central residue exhibits accurate prediction results. The accuracy of quantum calculations for peptides in aqueous solution is critically dependent on the proper treatment of solvent system. For terminal residues, particularly the charged aspartic acid and lysine residues, implicit solvation models may inadequately capture the specific hydrogen bonding networks and electrostatic interactions with water molecules. Recent computational studies have demonstrated that explicit solvation models are essential for accurate description of peptide-solvent interactions, particularly for systems involving strong hydrogen bonding and charged residues.58,59 The central leucine residue, being surrounded by neighbouring residues and possessing a hydrophobic side chain, experiences less solvent interaction, making implicit solvation models more suitable for this residue.
In conclusion, we have successfully determined scaling factors for NMR chemical shift calculations in aqueous environments, demonstrating their reliability and accuracy for both small molecule and peptide systems. This systematic validation provides a robust framework for future applications in predicting NMR parameters of biomolecules in their aqueous environment.
The Boltzmann-weighted averaging of 15N chemical shifts from 12 representative conformers yielded good agreement with experimental HSQC measurements, with deviations of 2.5–5.6 ppm for most residues and an improved accuracy (1.1 ppm) for leucine. These results validate the ensemble approach for capturing the dynamic nature of intrinsically disordered proteins and demonstrate that experimental chemical shifts represent population-weighted averages over accessible conformational states.
The computational protocol developed in this work provides a reliable foundation for future investigations of larger and more flexible systems, bridging the gap between theoretical predictions and experimental NMR observations. This methodology offers valuable insights into how conformational dynamics modulate 15N chemical shifts in biomolecules for which conformational flexibility is central to biological function.
| This journal is © the Owner Societies 2026 |