Dynamic Protein Structures in Solution: Decoding the Amide I Band with 2D-IR Spectral Libraries and Machine Learning
Abstract
The dynamic three-dimensional structures of proteins dictate their function, but accessing structures in solution at physiological temperatures is challenging. Ultrafast 2D-IR spectroscopy of the protein amide I band produces, within minutes, a spectral fingerprint that derives directly from the 3D backbone structure, using microlitres of label-free sample in aqueous (H₂O) solution and with picosecond time resolution. However, transforming 2D-IR fingerprints into quantitative, solution-phase protein structures relies on decoding the fundamental link between the atomistic structure and the 2D spectrum. We demonstrate a top-down approach to solution-phase protein structure determination that combines 2D-IR spectral libraries with machine learning (ML). Using a dataset of 6732 spectra of 35 proteins in H₂O that span a range of structures, support vector machine (SVM) models classified unknown protein samples according to structural content and measured quantities of α-helix and β-sheet with an RMS error of ≤ 7 %. We also demonstrate the potential for hybrid 2D-IR-ML tools to predict the number and length of helices in a protein and to identify the presence of parallel and antiparallel β-sheets from the 2D-IR fingerprint. These results lay the groundwork for rapid, quantitative analysis of dynamic protein structures under physiologically relevant conditions.
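To make the quoted accuracy figure concrete, the following minimal Python sketch shows how an RMS error between predicted and reference secondary-structure content could be computed. It assumes the error is expressed in percentage points of α-helix (or β-sheet) content; the function name `rms_error` and all numerical values are hypothetical placeholders chosen for illustration and are not data or code from the study.

```python
import math


def rms_error(predicted, reference):
    """Root-mean-square error between two equal-length sequences of percentages."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder values for illustration only; NOT data from the study.
reference_helix = [10.0, 40.0, 70.0]  # % alpha-helix taken from reference structures
predicted_helix = [13.0, 36.0, 70.0]  # % alpha-helix returned by a hypothetical model

error = rms_error(predicted_helix, reference_helix)
print(f"RMS error: {error:.2f} percentage points")
```

Under this reading, a model meeting the ≤ 7 % figure would, on the stated assumption, return an `rms_error` of at most 7 for the proteins it was tested on.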