MCR-ALS analysis of two-way UV resonance Raman spectra to resolve discrete protein secondary structural motifs

John V. Simpson a, Gurusamy Balakrishnan b and Renee D. JiJi *a
aUniversity of Missouri, Department of Chemistry, 601 S. College Ave., Columbia, MO 65211, USA. E-mail: jijir@missouri.edu; Fax: +1 573 882 2754; Tel: +1 573 882 8949
bPrinceton University, Department of Chemistry, Princeton, NJ, USA

Received 19th August 2008, Accepted 24th October 2008

First published on 25th November 2008


Abstract

The ability of ultraviolet resonance Raman (UVRR) spectroscopy to monitor a host of structurally sensitive protein vibrational modes, the amide I, II, III and S regions, makes it a potentially powerful tool for the visualization of equilibrium and non-equilibrium secondary structure changes in even the most difficult peptide samples. However, it is difficult to unambiguously resolve discrete secondary structure-derived UVRR spectral signatures independently of one another, as each contributes an unknown profile to each of the spectrally congested vibrational modes. This limitation is compounded by the presence of aromatic side chains, which introduce additional overlapping vibrational modes. To address this, we have exploited an often overlooked tool for alleviating this spectral overlap: the differential excitability, in the deep UV, of the vibrational modes associated with α-helices and coil moieties. The differences in the resonance enhancements of the various structurally associated vibrational modes yield an added dimensionality in the spectral data sets, making them multi-way in nature. Through a ‘chemically relevant’ shape-constrained multivariate curve resolution-alternating least squares (MCR-ALS) analysis, we were able to deconvolute the complex amide regions in the multi-excitation UVRR spectrum of the protein myoglobin, giving us potentially useful ‘pure’ secondary structure-derived contributions to these individual vibrational profiles.


1. Introduction

The application of X-ray and more advanced NMR techniques to derive atomic resolution structures in a variety of protein structure types has truly brought forth the structural era in biochemistry and biophysics. However, even with the plethora of information gained, limitations in these techniques have become conspicuous in that many proteins' secondary structure contents are not static enough within the time-scale of these techniques to be easily determined. These voids have brought about the re-emergence and advancement of classical protein secondary structure sensitive spectroscopic techniques, such as circular dichroism (CD), IR and Raman scattering-derived spectroscopies. One such vibrational technique enjoying a revolution in use and interpretation is ultraviolet resonance Raman (UVRR) spectroscopy. UVRR is proving to have tremendous potential to study protein secondary structures and their dynamics in even the most difficult or problematic protein samples. Early work has established that UVRR can be used effectively to quantify relative amounts of α-helical, β-sheet and various disordered types of secondary structures.1,2 Recently, for instance, Huang et al. have added β-turn structure to the types of secondary structures that are quantifiable at excitation wavelengths below 200 nm.3 UV-excited Raman spectroscopy is unique in that selective enhancement of chromophores of interest, such as amide and/or aromatic side chains, eliminates interference from the rest of the protein and thus simplifies the spectral analysis. Further, water is a weak Raman scatterer, so the UVRR spectra of proteins are nearly free of water interference, allowing structural studies to be performed in aqueous, non-deuterated, environments. The structural sensitivity of vibrational spectroscopies (IR- and Raman-based) is largely derived from the polypeptide backbone amide group. The amide group, –CO–NH–, gives rise to several discrete vibrational modes, collectively termed amide modes.
The position and intensity of these modes shift with changes in the ϕ and ψ dihedral angles of the polypeptide backbone.4 There have been several good reviews on the sensitivity of the amide modes to protein secondary structure.5–9 Deep-UV excitation (<210 nm) results in resonance enhancement of the amide I, II, III and S modes, along with the aromatic side chains (tyrosine, phenylalanine and tryptophan).5 The position and intensity of the four amide modes are dependent upon the secondary structure of the protein with their relative contributions being proportional to the relative amount of each conformation. Likewise, the position and intensity of the various aromatic modes will be dependent upon their environment (hydrophobic/buried or hydrophilic/exposed) and their relative abundance.

UVRR has been applied to protein folding and unfolding studies under a host of conditions of varying pH and ionic strength.10–14 Additionally the kinetics of structural transitions from one type to another have also been studied through the use of temperature-4,15–19 and pH-jumps.20

While changes in the position, intensity and overall spectral shape of the amide modes are dependent on secondary structure, they remain highly overlapped in the UVRR spectra of proteins. Additionally, the UVRR bands from the aromatic residues tyrosine, tryptophan, and phenylalanine also overlap the amide regions, complicating the spectra further. Typically, individual spectra are systematically deconvoluted using non-linear least squares (NLLS). However, these methods are dependent on initialization parameters and constraints. When the individual Raman bands are not sufficiently resolved, the choice of initialization parameters and constraints often dictates the final solution rather than the spectroscopic data themselves. Alternatively, multivariate methods could easily be applied to the aforementioned data, greatly simplifying the analysis process. Furthermore, additional components may be included to account for contributions from overlapping aromatic modes or other non-random spectral interferents.

With all the advantages of multivariate analysis, it is surprising that the application of chemometric methods to bilinear UVRR data is a relatively recent phenomenon.21–23 As resonance enhancement occurs when the excitation wavelength is at or near the electronic absorption band of the analyte, the excitation wavelength may be tuned to differentially excite the various structures. The observed absorption maxima of the amide transitions in the polypeptide backbone are similar to those observed for N-methylacetamide (NMA), which has three electronic transitions at 165 nm (S3 ← S0), 185 nm (S2 ← S0) and 210 nm (S1 ← S0).5 However, the absorption maxima vary slightly with the conformation of the peptide backbone.24 Likewise, the molar scattering ratio (excitation profile) of the backbone amide is conformation dependent.1,3,25,26 Thus, every protein UVRR spectrum will be dependent upon both the excitation wavelength and secondary structural composition of the protein. Varying the excitation wavelength or secondary structural composition would result in bilinear UVRR data that could be analyzed using a multitude of multivariate methods.

Bilinear model-based chemometric techniques have been shown to be extremely useful in the investigation of mixture-based chemical systems, especially factor analysis-based techniques like multivariate curve resolution (MCR).27 Coupling MCR with iterative algorithms, such as alternating least squares (ALS), has produced techniques which are flexible enough to cope with many different kinds of data structures and chemical problems.28–32 For example, MCR-ALS has been successfully applied to resolve kinetic concentration profiles from reaction-based chemical sensors,33 intermediates in protein folding34 and the solvation of solutes as a function of solvent composition.35

In this study, UVRR spectra of a single protein, myoglobin, collected at six different deep-UV excitation wavelengths, were used as a novel multivariate data set. Myoglobin is a well-studied model protein with a mostly α-helical structure. Thus, myoglobin is an attractive model system for the determination of a ‘pure’ α-helix basis spectrum and the associated steady state amide mode peak positions. Through the use of a novel chemically relevant shape constraint, imposed within the MCR-ALS algorithm, analysis of this multi-way data revealed what we believe are pure α-helical spectral profiles for the amide I, III and S regions versus excitation wavelength. Although a unique α-helical profile could not be resolved for the amide II region, the resolved spectral profiles could be used as starting parameters for UVRR analysis in the future.

2. Experimental

2.1 Sample preparation

Myoglobin was dissolved in 0.1 M phosphate buffer at pH 7 to a final concentration of 0.2 mM. Sodium perchlorate (0.2 M) was added to the sample as an internal intensity standard. Myoglobin and sodium perchlorate were purchased from Sigma (St. Louis, Missouri). The average laser power was 1.5 mW for all excitation wavelengths and each spectrum was averaged for 15 minutes.

2.2 Raman spectroscopy

The UV Raman spectrometer used in data collection was previously described by Spiro and co-workers.36 Specifically, the excitation source was the fourth harmonic of a tunable Ti:Sapphire laser (Photonics International), which was pumped by a frequency-doubled Nd:YLF laser. The sample solution was circulated through a wire-guided cell and held at a constant 25 °C, using a circulating water bath and a water-jacketed reservoir. The UVRR spectra of myoglobin were collected at six excitation wavelengths for a total of six spectra (193, 197, 200, 203, 206, 210 nm). All the spectra were calibrated using the Raman spectra of cyclohexane and acetone.

3. Data analysis

3.1 Data pre-processing

In Raman dispersive instruments, the mechanical grating drive follows a sinusoidal function and the dispersion along the spectral axis is linear with respect to ‘wavelength (nm)’ and non-linear with respect to ‘wavenumber (cm−1)’. As a result, an instrument with 5 nm spectral bandpass will cover about 1308 cm−1 at 196 nm (193–198) excitation whereas the same instrument will cover about 1140 cm−1 at 210 nm excitation. Chemometric methods are based on matrix operations and require the same number of data points (equal length vectors) for spectra measured at multiple wavelengths. For this study, periodic data points were removed at higher excitation wavelengths (λex > 193 nm) and the spectra were aligned to within 0.5 cm−1 based on their associated calibration. Alignment to within 0.5 cm−1 was considered sufficient for this study as the spectral resolution ranged from 1.35 cm−1 at 193 nm to 1.13 cm−1 at 210 nm, at least twice the possible error in the alignment. Although removal of a nominal number of selected data points could potentially reduce the spectral resolution and introduce slight alignment errors, the simplicity of this method and the minimal distortion to the original spectral data led us to choose this method for our initial studies. In future studies, the effect of alternate methods, including the use of linear and spline interpolants to obtain equal numbers of points among different excitation wavelengths, will be investigated.
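The wavelength-to-wavenumber argument above can be checked with a few lines of Python. This is only a sketch: the function names `raman_shift_span` and `decimate_to_length` are hypothetical, and the evenly spaced point removal is an assumed reading of "periodic" removal rather than the authors' actual routine.

```python
def raman_shift_span(start_nm, window_nm=5.0):
    """Width in cm^-1 of a window_nm-wide wavelength window starting at start_nm."""
    def to_wavenumber(nm):
        return 1.0e7 / nm  # nm -> absolute wavenumber in cm^-1
    return to_wavenumber(start_nm) - to_wavenumber(start_nm + window_nm)

def decimate_to_length(values, target_len):
    """Drop evenly spaced points (no interpolation) until target_len points remain."""
    n = len(values)
    if target_len > n:
        raise ValueError("target_len exceeds the number of points")
    if target_len == 1:
        return [values[0]]
    # Evenly spaced indices from the first to the last point.
    idx = [round(k * (n - 1) / (target_len - 1)) for k in range(target_len)]
    return [values[i] for i in idx]

span_193 = raman_shift_span(193.0)   # the 193-198 nm window discussed in the text
span_210 = raman_shift_span(210.0)   # the same 5 nm window placed at longer wavelength
short_axis = decimate_to_length(list(range(10)), 5)
```

Because the same 5 nm window spans fewer wavenumbers at longer wavelength, spectra recorded on a fixed detector must be brought to a common length before they can be stacked into one matrix.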

3.2 Non-linear least squares (NLLS) analysis

For comparison, each spectrum was initially deconvoluted using a NLLS fitting algorithm. NLLS analysis was performed in the Matlab environment (Mathworks, Natick, MA) using the standard NLLS optimization function available with the optimization toolbox. The spectral profiles were constrained to have a mixture of Gaussian/Lorentzian distributions. Each band was defined by four parameters: position (center), height, width, and fraction of Lorentzian content. The fitted spectral profiles from the UVRR spectrum excited at 193 nm (Fig. 1b) were used as the initialization parameters for the constrained MCR-ALS models. This will be discussed in more detail in the next section.
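A four-parameter band of the kind described above can be sketched as a weighted sum of a Gaussian and a Lorentzian sharing the same center, height and width. The exact mixing form used in the study is not stated, so the linear mix below, the interpretation of `width` as full width at half maximum, and the name `mixed_band` are all assumptions made for illustration.

```python
import math

def mixed_band(x, center, height, width, frac_lorentz):
    """Gaussian/Lorentzian mix; `width` is taken as the FWHM of both parts."""
    u = (x - center) / (width / 2.0)            # distance in half-widths
    gauss = math.exp(-math.log(2.0) * u * u)    # equals 0.5 at u = 1
    lorentz = 1.0 / (1.0 + u * u)               # equals 0.5 at u = 1
    return height * (frac_lorentz * lorentz + (1.0 - frac_lorentz) * gauss)

def model_spectrum(x, bands):
    """Sum of bands, each given as (center, height, width, frac_lorentz)."""
    return sum(mixed_band(x, *b) for b in bands)

# Hypothetical two-band amide I model evaluated at one Raman shift.
bands = [(1652.0, 1.0, 37.0, 0.0), (1683.0, 0.5, 44.0, 0.7)]
value_at_1660 = model_spectrum(1660.0, bands)
```

An NLLS routine would adjust the four parameters of every band until the summed model matches the measured spectrum, which is why the result depends on the starting values when bands overlap strongly.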
(a) Normalized deep-UVRR spectra of myoglobin, collected at excitation wavelengths from 193 nm (light gray) to 210 nm (black). Spectra were normalized to the internal intensity standard, perchlorate (ClO4−). (b) Non-linear least squares fit () of the UVRR spectrum of myoglobin (), excited at 193 nm. A series of mixed Gaussian/Lorentzian line profiles (dashed lines) were used to fit the entire experimental spectrum. Components denoted with black dashed lines (), were used as initialization profiles for the respective amide region. The residuals are denoted with in gray ().
Fig. 1 (a) Normalized deep-UVRR spectra of myoglobin, collected at excitation wavelengths from 193 nm (light gray) to 210 nm (black). Spectra were normalized to the internal intensity standard, perchlorate (ClO4). (b) Non-linear least squares fit (ugraphic, filename = b814392g-u1.gif) of the UVRR spectrum of myoglobin ([thick line, graph caption]), excited at 193 nm. A series of mixed Gaussian/Lorentzian line profiles (dashed lines) were used to fit the entire experimental spectrum. Components denoted with black dashed lines ([dash dash, graph caption]), were used as initialization profiles for the respective amide region. The residuals are denoted with in gray (ugraphic, filename = b814392g-u2.gif).

3.3 Multivariate curve resolution-alternating least squares analysis (MCR-ALS)

3.3.1 MCR-ALS analysis. The ALS algorithm that we have utilized was written in-house and is based on that outlined by Bro and Sidiropoulos.37 For clarity, matrices are represented as bold type uppercase letters, vectors are represented as bold type lowercase letters and scalars are presented in italics according to the standard notation reviewed by Kiers.38 Given that UVRR spectra collected at multiple excitation wavelengths are bilinear in nature, the resultant data matrix (X, I × J) may be decomposed into the Raman spectral profiles (A, I × N matrix) and excitation profiles or cross-sections (D, J × N matrix) plus an error matrix (E) of the same dimensions as the original data matrix. The data matrix, X, may then be reproduced according to eqn (1):33
 
$$\mathbf{X} = \mathbf{A}\mathbf{D}^{T} + \mathbf{E} \qquad (1)$$

Eqn (1) is solved iteratively using an ALS algorithm. This optimization procedure requires reasonable initial estimates of either the excitation (D) or spectral (A) profiles.39 At each iteration of the optimization, A or D is optimized, while the opposing matrix is held constant.
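The alternating update can be illustrated with a minimal pure-Python sketch. It assumes a two-component model (so a 2 × 2 inverse suffices), uses the least-squares updates D = XᵀA(AᵀA)⁻¹ and A = XD(DᵀD)⁻¹ that follow from eqn (1), and enforces non-negativity by clipping to an arbitrary small floor. The helper names and the synthetic data are hypothetical, and this is not the authors' in-house code.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def inv2(M):
    """Inverse of a 2 x 2 matrix (enough for a two-component model)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def clip(M, floor=1e-12):
    """Non-negativity: replace anything below `floor` with `floor`."""
    return [[v if v > floor else floor for v in row] for row in M]

def als(X, A, n_iter=50):
    """Alternate between updating D with A fixed and A with D fixed."""
    Xt = transpose(X)
    for _ in range(n_iter):
        D = clip(matmul(matmul(Xt, A), inv2(matmul(transpose(A), A))))
        A = clip(matmul(matmul(X, D), inv2(matmul(transpose(D), D))))
    return A, D

# Synthetic noise-free data: 4 spectral points, 3 excitation wavelengths, 2 components.
A_true = [[1.0, 0.2], [0.8, 0.5], [0.4, 0.9], [0.1, 1.0]]
D_true = [[1.0, 0.3], [0.7, 0.6], [0.4, 1.0]]
X = matmul(A_true, transpose(D_true))
A0 = [[1.0, 0.25], [0.75, 0.5], [0.45, 0.85], [0.15, 1.0]]  # perturbed starting guess
A_fit, D_fit = als(X, A0)
X_hat = matmul(A_fit, transpose(D_fit))
max_resid = max(abs(x - xh) for rx, rh in zip(X, X_hat) for x, xh in zip(rx, rh))
```

In this toy case the product of the fitted profiles reproduces X, but the individual profiles need not equal the ones used to generate the data. That rotational ambiguity is the reason additional constraints, such as the band-shape constraint, are useful.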

As mentioned above, the spectral profiles (A) were initialized using the deconvoluted spectral bands from NLLS analysis of the UVRR spectrum of myoglobin (λex = 193 nm). Subsequently, the excitation profiles or relative cross-sections were calculated according to eqn (2):

 
$$\mathbf{D} = \mathbf{X}^{T}\mathbf{A}(\mathbf{A}^{T}\mathbf{A})^{-1} \qquad (2)$$
Non-negativity constraints were applied to both the spectral and excitation profiles, by setting all negative values to an infinitesimally small value. Although setting negative values to zero or a relatively small value is not a least squares solution, it was sufficient to eliminate convergence to unreasonable solutions. Additionally, the spectral profiles were constrained to a mixed Gaussian/Lorentzian profile by minimizing $\|\mathbf{C}_n - \mathbf{d}_n\mathbf{a}_n^{T}\|$ subject to the vector $\mathbf{a}_n$ being a Gaussian/Lorentzian profile. $\mathbf{C}_n$ represents the two-dimensional UVRR spectrum of the nth component (eqn (3)),

$$\mathbf{C}_n = \mathbf{d}_n\mathbf{a}_n^{T} + \mathbf{E} \qquad (3)$$
where $\mathbf{E} = \mathbf{X} - \mathbf{D}\mathbf{A}^{T}$. The pure spectral profile of the nth component was calculated according to eqn (4):

$$\mathbf{a}_n = \mathbf{d}_n^{T}\mathbf{C}_n(\mathbf{d}_n^{T}\mathbf{d}_n)^{-1} \qquad (4)$$

Bro and Sidiropoulos showed that minimization of $\|\mathbf{C}_n - \mathbf{d}_n\mathbf{a}_n^{T}\|$ was equivalent to minimizing $\|\boldsymbol{\beta} - \mathbf{a}_n\|$ where $\boldsymbol{\beta} = (\mathbf{d}_n^{T}\mathbf{C}_n)/(\mathbf{d}_n^{T}\mathbf{d}_n)$ and $\mathbf{a}_n$ is constrained to unimodality, non-negativity, etc.37 This allows the easy implementation of the Gaussian/Lorentzian peak shape constraint through the incorporation of a NLLS optimization step that minimizes $\|\boldsymbol{\beta} - \mathbf{a}_n\|$ subject to $\mathbf{a}_n$ being a Gaussian/Lorentzian profile. The use of non-linear optimization methods to enforce chemically relevant constraints has been used previously to determine reaction rate constants from three-way LC-DAD data.40
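The equivalence can be verified numerically. The sketch below assumes the component matrix is arranged as excitation × Raman shift (J × I), so that the product of the excitation profile d and the spectral profile a is conformable. It then checks that the residual for any trial profile equals the residual at β plus (dᵀd) times the squared distance between β and the trial profile. The numbers and function names are invented for illustration.

```python
def beta_projection(C, d):
    """beta = (d^T C) / (d^T d), the unconstrained spectral profile."""
    dd = sum(v * v for v in d)
    n_shift = len(C[0])
    return [sum(d[j] * C[j][i] for j in range(len(d))) / dd for i in range(n_shift)]

def sq_norm_residual(C, d, a):
    """Squared Frobenius norm of C - d a^T."""
    return sum((C[j][i] - d[j] * a[i]) ** 2
               for j in range(len(d)) for i in range(len(a)))

# Hypothetical component matrix: 3 excitation wavelengths x 3 Raman shifts.
C = [[0.9, 2.1, 1.2], [0.5, 1.0, 0.4], [0.2, 0.6, 0.3]]
d = [1.0, 0.5, 0.25]
beta = beta_projection(C, d)

a_trial = [0.8, 2.0, 1.0]          # any candidate (e.g. shape-constrained) profile
dd = sum(v * v for v in d)
lhs = sq_norm_residual(C, d, a_trial)
rhs = sq_norm_residual(C, d, beta) + dd * sum((b - a) ** 2 for b, a in zip(beta, a_trial))
```

Since the first term on the right does not depend on the trial profile, fitting a band shape to β alone gives the same minimizer as fitting it to the full matrix, which keeps the nested NLLS step small.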

3.3.2 Initialization of nested NLLS optimization. Each spectral profile was constrained to a Gaussian/Lorentzian profile and each component was described by four parameters: position, height, width, and fraction of Lorentzian. To minimize the influence of user-defined parameters on the final model, each peak parameter was initialized using pre-defined or systematically determined values. Peak width and fraction of Lorentzian content were initialized using pre-defined generic values of 50 cm−1 and 0.5, respectively. The peak height parameter was initialized as one half the maximum intensity. The peak center was estimated by fitting the cumulative sum of the unconstrained nth spectral profile (β) to a sigmoidal function (eqn (5), Fig. 2)
 
[eqn (5): sigmoidal function of x with parameters ymin, ymax and x0; equation image b814392g-t1.gif] (5)
where ymin and ymax are the minimum and maximum intensity of the curve, x is the spectral range, and x0 is the center of the estimated curve. Thus, the peak position, x0, was initialized using the experimental data rather than an initial guess. This method was more efficient than using the position of maximum spectral intensity. This was most likely due to the higher level of noise in the unconstrained spectral profiles of highly overlapped modes, such as in the amide III region.
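The idea of reading the peak center off the cumulative sum can be sketched without a full sigmoid fit. The code below swaps the least-squares sigmoid fit for a simpler stand-in: it returns the point at which the cumulative sum crosses the midpoint between its first and last values, which coincides with x0 for a symmetric sigmoid. The function name and the synthetic Gaussian band are hypothetical.

```python
import math

def center_from_cumsum(x, y):
    """Half-rise point of the cumulative sum of y, linearly interpolated in x."""
    cum, total = [], 0.0
    for v in y:
        total += v
        cum.append(total)
    half = 0.5 * (cum[0] + cum[-1])   # midway between the curve's start and end
    for k in range(1, len(cum)):
        if cum[k - 1] < half <= cum[k]:
            t = (half - cum[k - 1]) / (cum[k] - cum[k - 1])
            return x[k - 1] + t * (x[k] - x[k - 1])
    raise ValueError("cumulative sum never crosses its midpoint")

# Synthetic band: Gaussian centered at 1652 cm^-1 with a 37 cm^-1 FWHM, 1 cm^-1 steps.
x = [1600.0 + k for k in range(101)]
y = [math.exp(-math.log(2.0) * ((xi - 1652.0) / 18.5) ** 2) for xi in x]
x0_estimate = center_from_cumsum(x, y)
```

Summation averages over many points, so the estimate is less sensitive to a single noisy maximum than taking the position of the largest intensity. The running sum lags the true integral by about half a sampling step, which is small compared with the band widths considered here.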

Fig. 2 (a) Second unconstrained component from MCR-ALS fitting of the amide I region. (b) Cumulative sum of the second unconstrained component from MCR-ALS fitting of the amide I region (thick line) and the corresponding fitted sigmoidal curve (dashed line).
3.3.3 Convergence. The convergence criterion was based on the scaled sum of the squared differences (SSD) between successive iterations according to eqn (6), where $\mathrm{Vec}(\hat{\mathbf{X}}_i)$ is the vectorized form of the matrix $\hat{\mathbf{X}}_i$ at the ith iteration. SSD represents the difference between the current ($\hat{\mathbf{X}}_i$) and previous ($\hat{\mathbf{X}}_{i-1}$) estimates of the data matrix X. As SSD becomes small, it indicates that the difference between successive estimates is decreasing. The fit was considered converged when SSD fell below a sufficiently small value, signaling that successive estimates were essentially identical.
 
[eqn (6): scaled sum of squared differences (SSD) between successive estimates of X; equation image b814392g-t2.gif] (6)
SSD values were also calculated for the excitation and spectral profiles. Convergence was defined as the point at which all three of the calculated SSD values fell below the pre-defined threshold.
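A convergence test of this kind might look like the sketch below. The exact scaling in eqn (6) is not recoverable from the text, so dividing by the sum of squares of the previous estimate is an assumption, as are the threshold value and the function names.

```python
def vec(M):
    """Flatten a matrix (list of rows) into one list."""
    return [v for row in M for v in row]

def scaled_ssd(current, previous):
    """Squared difference between successive estimates, scaled by the previous one.

    The scaling by sum(previous**2) is an assumed choice, not taken from the source.
    """
    num = sum((c - p) ** 2 for c, p in zip(current, previous))
    den = sum(p * p for p in previous)
    return num / den

def converged(ssd_values, tol=1e-6):
    """True only when every monitored SSD (data, spectral, excitation) is below tol."""
    return all(s < tol for s in ssd_values)

X_prev = [[1.0, 1.0], [2.0, 2.0]]
X_curr = [[1.0, 1.0], [2.0, 2.001]]
ssd_x = scaled_ssd(vec(X_curr), vec(X_prev))
```

Requiring all three SSD values to fall below the threshold guards against stopping while the reconstructed data are stable but the individual profiles are still drifting.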
3.3.4 Model evaluation. The quality of each model was determined by its ability to accurately represent the data. Evaluation of the model was performed by two methods: visual inspection and using the sum of the squared residuals (SSR), where $\mathbf{A}\mathbf{D}^{T}$ corresponded to the model and X corresponded to the experimental data (eqn (7)):

$$\mathrm{SSR} = \mathrm{Vec}(\mathbf{X} - \mathbf{A}\mathbf{D}^{T})^{T} \times \mathrm{Vec}(\mathbf{X} - \mathbf{A}\mathbf{D}^{T}) \qquad (7)$$
This operation was performed iteratively; a decrease in the SSR indicated an improvement in the quality of the fit of the model to the experimental data. Furthermore, the SSR value was used to evaluate the performance of the unconstrained, fully- and partially-constrained models with respect to one another.

3.4 Post-processing

Calculation of the relative cross-sections (σ) for each component was performed using eqn (8):41
 
$$\sigma_N = \sigma_S\,\frac{I_N}{I_S}\,\frac{C_S}{C_N}\left(\frac{\nu_0 - \nu_S}{\nu_0 - \nu_N}\right)^{4} \qquad (8)$$

The cross-section of each spectral band was determined from the ratio of peak intensities (IN/IS), concentrations (CS/CN) along with a frequency term, where the letters N and S designate sample and internal standard, respectively. The frequency term is composed of the ratio of the difference between the laser excitation frequency (ν0) and the vibrational frequencies of the sample (νN) and standard (νS), to the fourth power. In effect, the frequency term utilizes the ratio of the selected peak position for the sample and the peak position of the internal standard. All of these terms are multiplied by the cross-section of the internal standard (σS), which was taken from Balakrishnan et al.36 Cross-sections are typically expressed in units of millibarns/(molecule·steradian).
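The calculation described above can be sketched as follows, assuming eqn (8) takes the common internal-standard form σN = σS (IN/IS)(CS/CN)[(ν0 − νS)/(ν0 − νN)]⁴. The function name, the 932 cm−1 position assumed for the perchlorate band and the other input values are placeholders for illustration, not values from the study.

```python
def relative_cross_section(i_n, i_s, c_n, c_s, nu0, nu_n, nu_s, sigma_s):
    """Cross-section of a sample band relative to an internal standard.

    i_*: peak heights, c_*: concentrations, nu0: laser frequency (cm^-1),
    nu_n / nu_s: Raman shifts of the sample and standard bands (cm^-1).
    """
    freq_term = ((nu0 - nu_s) / (nu0 - nu_n)) ** 4
    return sigma_s * (i_n / i_s) * (c_s / c_n) * freq_term

nu0 = 1.0e7 / 193.0   # 193 nm excitation expressed in cm^-1
# Equal intensities and concentrations isolate the frequency correction alone.
ratio = relative_cross_section(
    i_n=1.0, i_s=1.0, c_n=1.0, c_s=1.0,
    nu0=nu0, nu_n=1652.0, nu_s=932.0,   # 932 cm^-1 is an assumed ClO4- band position
    sigma_s=1.0,
)
```

With deep-UV excitation the frequency correction is only a few percent, so the result is dominated by the intensity and concentration ratios.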

Although cross-sections are often determined from peak area rather than peak height, peak area can be much more susceptible to error arising from inaccuracies in the estimation of the Lorentzian content (see Table 1). Inadequate baseline correction can result in over-estimation of the Lorentzian content, which results in over-estimation of the cross-sections due to the more extended wings of Lorentzian bands. Given that the peak width and fraction of Lorentzian content should be independent of excitation wavelength, and are modeled as such with MCR-ALS, the ratio of peak intensities rather than areas was used in this study. Although this will result in differences in the relative magnitude of the cross-sections, it will have no effect on the shape of the excitation profile, which should be similar to the shape of the UV absorption spectrum.25,26,36

Table 1 Peak parameter estimation from NLLS analysis of UVRR spectra of myoglobin at each excitation wavelength

                  Excitation wavelength/nm
Parameter         193    197    200    203    206    210    x̄      σ
Amide III
  Center/cm−1     1262   1264   1265   1265   1268   1258   1264   3
  Width/cm−1      55     55     55     38     55     55     52     7
  % Lorentzian    0      0      79     85     54     0      36     41

  Center/cm−1     1294   1300   1302   1299   1303   1299   1299   3
  Width/cm−1      29     29     29     29     29     29     29     0
  % Lorentzian    0      0      0      0      0      69     11     28

  Center/cm−1     1328   1330   1335   1333   1336   1334   1333   3
  Width/cm−1      55     38     38     38     38     38     41     7
  % Lorentzian    0      0      100    100    90     100    65     50

Amide S
  Center/cm−1     1397   1397   1392   1385   1384   ᵃ      1391   6
  Width/cm−1      36     38     38     38     38     ᵃ      38     1
  % Lorentzian    0      36     100    100    100    ᵃ      67     47

Amide II
  Center/cm−1     1512   1518   1519   1515   1521   1515   1517   3
  Width/cm−1      37     23     31     26     43     38     33     8
  % Lorentzian    100    0      28     0      0      0      21     40

  Center/cm−1     1551   1554   1555   1552   1552   1554   1553   2
  Width/cm−1      26     32     27     29     32     33     30     3
  % Lorentzian    100    100    100    100    100    100    100    0

Amide I
  Center/cm−1     1650   1655   1656   1653   1653   1654   1653   2
  Width/cm−1      49     45     48     38     34     36     42     6
  % Lorentzian    82     41     13     0      9      0      24     32

  Center/cm−1     1683   1685   1697   1684   1682   1686   1686   5
  Width/cm−1      29     37     38     33     29     25     31     5
  % Lorentzian    47     50     0      0      0      0      16     25

ᵃ The intensity of the amide S band was too low to fit a band in this region in the 210 nm excited UVRR spectrum of myoglobin.


4. Results and discussion

4.1 NLLS analysis of UVRR spectra

A common approach to spectral deconvolution is to fit each spectrum to a series of Gaussian-, Lorentzian- or Gaussian/Lorentzian-line shapes using NLLS optimization methods. These methods usually produce excellent approximations of a global vibrational spectral line shape, as can be seen in Fig. 1b. However, the minimized solution depends upon parameters defined by the user at the outset of the optimization, including the number of components, the initial peak parameters (height, width, position, shape) and the various constraints to be applied to these peak parameters during the optimization. One means by which to alleviate this inherent pre-analysis user bias is to globally fit data sets where individual components can vary independently as a function of some designed variable, such as excitation wavelength, pH or temperature. This added dimensionality limits the final solution to a global minimum, although still as a function of the initialization parameters chosen. UVRR spectra of myoglobin, collected at multiple excitation wavelengths, comfortably fulfill this strategic design principle (Fig. 1a).

The results of NLLS fits of these data (Table 1) clearly show that the combination of random noise and non-random error, possibly arising from small changes in the alignment or calibration of the instrument, significantly influences the fitted parameters. In general, we find that NLLS estimates of lower intensity bands have a higher degree of error associated with them than more intense spectral features, as evidenced by Raman bands arising from non-helical secondary structure, such as the minimally present β-turn and coiled regions (see Table 3 later for classifications), having the greatest degree of variance. Specifically, the predicted position of the non-helically associated amide S band varied by more than 10 cm−1 (λ193 nm: 1397 cm−1, λ206 nm: 1384 cm−1). Likewise, the estimated position of the second component in the amide I region varied by 15 cm−1 (λ200 nm: 1697 cm−1, λ206 nm: 1682 cm−1). These values are in stark contrast to the standard deviations of the predicted peak position, width and shape for the relatively intense α-helical-derived amide II band, which were 2 cm−1, 3 cm−1 and 0%, respectively.
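The spread quoted above can be reproduced directly from the band centers listed in Table 1. The short check below uses the sample standard deviation from Python's `statistics` module. Whether the table was built with the sample or the population form is not stated, but both round to the same integers for these rows.

```python
from statistics import mean, stdev

# Fitted band centers (cm^-1) copied from Table 1.
amide_s_center = [1397, 1397, 1392, 1385, 1384]          # 193-206 nm (no fit at 210 nm)
amide_i_second = [1683, 1685, 1697, 1684, 1682, 1686]    # 193-210 nm

def summarize(values):
    """Rounded mean, rounded sample standard deviation and full range."""
    return round(mean(values)), round(stdev(values)), max(values) - min(values)

amide_s_summary = summarize(amide_s_center)    # (1391, 6, 13)
amide_i_summary = summarize(amide_i_second)    # (1686, 5, 15)
```

The ranges of 13 and 15 cm−1 are a sizeable fraction of the fitted band widths, which illustrates how unstable single-spectrum fits are for weak bands.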

The arbitrary influence of noise and instrumental variations on the resolved spectral features of minor secondary structural components highlights one of the primary limitations of the application of NLLS fitting of vibrational spectra, which is that each spectrum is being analyzed in isolation. An alternate fitting algorithm, MCR-ALS, is ideal as chemically relevant constraints are easily incorporated into the ALS minimization and a series of spectra may be analyzed as a whole. However, reasonable estimates of the spectral or excitation profiles are required to initialize the MCR-ALS minimization. In this work, these initial profiles were derived from the peak parameters obtained from the NLLS fit of the UVRR spectrum of myoglobin with the lowest excitation wavelength (λex = 193 nm).

4.2 Effect of constraint level on MCR-ALS model

As a proof of principle, the MCR-ALS method described in the Data analysis section was first applied to only the amide I (1640–1730 cm−1) region of the pre-processed data (Fig. 3) in an attempt to resolve what can be thought of as pure amide I features associated with the α-helical and non-helical (random coil + β-turn) secondary structures. In order to robustly evaluate the effectiveness of chemically relevant constraints, three models (an unconstrained, a fully constrained, and a partially constrained variant) were applied to the amide I region (Fig. 3a–c). Both the partially constrained and fully constrained models incorporated the aforementioned Gaussian/Lorentzian peak shape constraint in the ALS optimization. Each spectral component, as it is optimized iteratively, represents the best global fit of the data set under a specified set of constraints. All models were evaluated based on the SSR values and the nearness to the assumption that the amide I region should have two discrete, unimodal, amide I components corresponding to the α-helical and non-helical portions of the protein's structure.3,18 The results are summarized in Table 2.
Fig. 3 Spectral reconstruction of the amide I region (λex = 193 nm) using (a) an unconstrained, (b) fully constrained and (c) partially constrained model. The original spectrum (thick line) and residuals (dashed line) at 193 nm excitation are shown in black. The fitted components and predicted spectrum are shown in gray.
Table 2 Effect of constraint level on the fitted peak parameters and SSR for the amide I region

Constraint level        Height         Center/cm−1   Width/cm−1   % Lorentzian   SSR
Unconstrainedᵃ          4.63 × 10⁴     1652          39           19             1.97 × 10⁸ ᵇ
                        2.35 × 10⁴     1685          36           100

Fully constrained       4.60 × 10⁴     1652          37           0              5.04 × 10⁸
                        2.63 × 10⁴     1684          35           100

Partially constrained   3.66 × 10⁴     1652          37           0              1.95 × 10⁸
                        2.10 × 10⁴     1683          44           73

ᵃ Peak parameters for the unconstrained model were obtained by fitting the resolved unconstrained components to single Gaussian/Lorentzian peak profiles using NLLS.
ᵇ The SSR based on the unconstrained components.


4.2.1 Unconstrained model. The unconstrained model produced an excellent fit of the amide I region (Fig. 3a). However, the second component was clearly not unimodal, with a minor maximum at the edge of the spectral region (∼1640 cm−1) and another larger maximum at ∼1685 cm−1. The maximum at ∼1640 cm−1 is likely due to a small contribution from the neighboring aromatic bands. Resonance enhancement of tyrosine and phenylalanine is quite strong in the deep-UV.36,42 Attempts in previous studies have been made to subtract these contributions; however, subtraction does not completely remove the aromatic bands from the UVRR spectrum.3,43
4.2.2 Fully constrained model. The same initialization profiles were used for the fully constrained model as used previously for the unconstrained model, with the exception that each spectral profile was constrained to have a Gaussian/Lorentzian profile. Comparing the fully constrained model (Fig. 3b) to the unconstrained model in Fig. 3a, it is clear that the fully constrained model results in a poorer fit of the amide I region. Furthermore, the fully constrained model resulted in the highest SSR value (Table 2) as compared to the unconstrained and partially constrained models. This indicates that the fully constrained model is too rigid and some contribution from the neighboring aromatic modes must be incorporated in any model of this region.
4.2.3 Partially constrained model. In order to account for non-random, non-unimodal spectral contributions, an unconstrained component was introduced. The partially constrained model is generated using the same initialization parameters as the fully constrained model, with the first two amide I components subjected to a Gaussian/Lorentzian shape constraint. However, to account for the non-random putatively aromatic mode contribution, a third component was introduced lacking any type of shape constraint. This unconstrained component was generated by subtracting the fitted amide I profiles from the amide I region of the UVRR spectrum of myoglobin (λex = 193 nm). The residuals were then used to initialize the unconstrained component. This unconstrained component (Fig. 3c) easily accounts for the influence of the neighboring aromatic modes as well as an additional small previously unmodeled contribution in the 1680–1690 cm−1 region of the spectra.

MCR-ALS analysis of the amide I region from all six excitation wavelengths yielded two amide I components, 1652 and 1683 cm−1 (Table 2, Fig. 3c), consistent with the presence of α-helical and non-helical structures, respectively.18 Interestingly, the mean values of 1653 (± 2) and 1686 (± 5) cm−1 (Table 1) from the NLLS analyses are quite close to the predicted peak positions from MCR-ALS analysis.

4.3 MCR-ALS with NLLS optimization of Gaussian/Lorentzian shape constraints

The amide II, III and S regions were also analyzed using a partially constrained model. Each component was classified as either α-helical or non-helical according to previous studies7,14,44 (Table 3). Two amide bands were resolved in the amide II region, occurring at 1520 and 1550 cm−1, respectively (Fig. 4). These values are similar to those obtained by Huang et al. (1525 and 1552 cm−1) for α-helices.3 A unique non-helical component was not resolvable in the amide II region. This is likely due to the direct overlap of the amide II bands from α-helices and non-helical coil structures, which are both predicted to occur at 1552 cm−1.3 As with the amide I region, the predicted peak positions from MCR-ALS compare well with the mean values from the NLLS analyses, which are 1517 (± 3) and 1553 (± 2) cm−1 (Table 1).
Column 1: Spectral reconstruction of the amide I, II, III and S regions from the sum () of the individual components () from the partially constrained model, multiplied by the cross-section at 193 nm excitation. The original spectrum () and residuals () are shown in black. Column 2: estimated cross-section for the α-helical (black) and non-helical (gray) portions of the protein. The component at 1550 cm−1, which is thought to have contributions from both the α-helical and non-helical structures, is shown in both gray and black. Cross-sections (σ) are expressed in units of millibarns/(molecule·steradian).
Fig. 4 Column 1: Spectral reconstruction of the amide I, II, III and S regions from the sum of the individual components from the partially constrained model, multiplied by the cross-section at 193 nm excitation. The original spectrum (thick line) and residuals (dashed line) are shown in black. Column 2: estimated cross-section for the α-helical (black) and non-helical (gray) portions of the protein. The component at 1550 cm−1, which is thought to have contributions from both the α-helical and non-helical structures, is shown in both gray and black. Cross-sections (σ) are expressed in units of millibarns/(molecule·steradian).
Table 3 Results from MCR-ALS fitting of the amide regions using a partially constrained model
Region      Center/cm−1   Width/cm−1   % Lorentzian   Classification
Amide III   1254          32           0              Coil/turn
Amide III   1298          54           0              α-Helix
Amide III   1345          40           0              α-Helix
Amide S     1397          44           74             Coil/turn
Amide II    1520          40           0              α-Helix
Amide II    1550          38           100            α-Helix, coil/turn
Amide I     1652          37           0              α-Helix
Amide I     1682          44           72             Coil/turn


The amide III region warrants special attention, as it is by far the most spectrally congested. With α-helical, β-turn and disordered structures present in the protein, this region could have up to seven different components.2,3,18 It has been shown that the amide III position is dependent upon the ψ dihedral angle;45 Huang et al. resolved amide III bands for β-turn and coil structures at 1244 and 1253 cm−1, respectively.3 Thus, this region was modeled with both three and four constrained components plus one unconstrained component. Models with >4 components generally failed to converge to reasonable solutions, indicating that there was insufficient resolution in these data to resolve discrete β-turn and unordered amide III profiles. It is likely that the excitation profiles of the β-turn and unordered structures are too overlapped to be resolved, given the limited number of excitation wavelengths in this study.

The four-component model (three constrained, one unconstrained) yielded three peaks at 1254, 1298 and 1345 cm−1 (Fig. 4). The peaks at 1298 and 1345 cm−1 have been assigned previously to α-helical structures,3,18 while the contribution at 1254 cm−1 is consistent with published unordered structure contribution assignments.3 It has been proposed that α-helical structures have an additional lower frequency amide III band at ∼1263 cm−1.4,26 Indeed, the main α-helical band of poly(L-lysine) at 1293 cm−1 has a shoulder at 1276 cm−1. Most likely, this band is not resolvable from the main band at 1298 cm−1 because their excitation profiles are too similar to be uniquely resolved. Unlike the amide I and II regions, the predicted peak positions are slightly shifted from the mean values of the NLLS analyses (Table 1). For example, the mean value of the highest frequency component from NLLS analyses is 1333 (± 3) cm−1, 12 cm−1 downshifted from the position predicted by MCR-ALS. However, previous studies report that the higher frequency amide III component occurs at ∼1340 cm−1,3,18,44 closer to the value predicted by the MCR-ALS model.
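The comparison between the MCR-ALS peak positions and the NLLS mean values quoted so far can be tabulated with a short calculation. The numbers are copied from the text; the helper name `shift_summary` is hypothetical, and the ± values are treated simply as the quoted spread, since their statistical definition is given with Table 1 rather than here.

```python
# (region, MCR-ALS center, NLLS mean, NLLS quoted spread), all in cm^-1,
# copied from the values quoted in the text.
comparisons = [
    ("amide I", 1652, 1653, 2),
    ("amide I", 1683, 1686, 5),
    ("amide II", 1520, 1517, 3),
    ("amide II", 1550, 1553, 2),
    ("amide III", 1345, 1333, 3),
]

def shift_summary(rows):
    """Return (region, MCR-ALS center, shift, |shift| / quoted spread) per row."""
    out = []
    for region, mcr, nlls, spread in rows:
        shift = mcr - nlls  # positive: MCR-ALS position lies above the NLLS mean
        out.append((region, mcr, shift, abs(shift) / spread))
    return out

summary = shift_summary(comparisons)
for region, mcr, shift, ratio in summary:
    print(f"{region:9s} {mcr} cm^-1: shift {shift:+d} cm^-1 "
          f"({ratio:.1f}x the quoted spread)")
```

The amide I and II positions differ from the NLLS means by 1 to 3 cm−1, whereas the highest frequency amide III component differs by 12 cm−1, four times the quoted spread of the NLLS value.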

As the amide S band is only resonance enhanced in non-helical structures,14,46 only one amide S component was expected in this region. Therefore, the amide S region was modeled using one constrained and one unconstrained component (Fig. 4). The constrained component was centered at 1397 cm−1, which is 6 cm−1 higher than the mean of the NLLS analyses (Table 1, Fig. 4). It should be noted that the contribution from non-helical conformations is stronger at lower excitation wavelengths, and at these wavelengths, NLLS does predict an amide S band at 1397 cm−1 (Table 1). These results suggest that simultaneous analysis of bilinear UVRR spectra using MCR-ALS with shape constraints provides a distinct advantage over sequential analysis by NLLS.
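For readers who wish to experiment with this type of analysis, the following is a minimal sketch of a partially constrained MCR-ALS loop, in which the data matrix D (excitation wavelengths × Raman shifts) is approximated by the product of excitation profiles C and spectral profiles S. It is not the algorithm used in this study. The assumptions are: the line shape `band` is a weighted sum of a Gaussian and a Lorentzian with a common center and width; the shape constraint is imposed by replacing each constrained spectral profile with its NLLS fit (here via `scipy.optimize.curve_fit`); non-negativity is imposed by clipping at zero, a simple stand-in for a true non-negative least squares solver; and all data, starting values and bounds are synthetic and hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def band(x, center, width, frac_lorentz, height):
    """Gaussian/Lorentzian mix sharing one center and FWHM (assumed form)."""
    u = (x - center) / width
    gauss = np.exp(-4.0 * np.log(2.0) * u ** 2)
    lorentz = 1.0 / (1.0 + 4.0 * u ** 2)
    return height * (frac_lorentz * lorentz + (1.0 - frac_lorentz) * gauss)

def fit_band(x, profile, p0):
    """NLLS fit of one band to a profile; the bounds are illustrative choices."""
    lower = [x.min(), 5.0, 0.0, 0.0]
    upper = [x.max(), 150.0, 1.0, np.inf]
    popt, _ = curve_fit(band, x, profile, p0=p0, bounds=(lower, upper))
    return popt

def mcr_als(D, x, band_params, S_free, n_iter=30):
    """Partially constrained MCR-ALS for D (excitations x shifts) ~ C @ S.T."""
    params = [np.array(p, dtype=float) for p in band_params]
    n_con = len(params)
    S = np.column_stack([band(x, *p) for p in params] + [S_free])
    C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
    for _ in range(n_iter):
        # Spectral step: unconstrained least squares, then constraints per column.
        S = np.linalg.lstsq(C, D, rcond=None)[0].T
        for k in range(n_con):
            p0 = params[k].copy()
            p0[3] = max(S[:, k].max(), 1e-6)  # restart the height from the LS profile
            params[k] = fit_band(x, S[:, k], p0)
            S[:, k] = band(x, *params[k])     # impose the shape constraint
        S[:, n_con:] = np.clip(S[:, n_con:], 0.0, None)  # free part: non-negativity only
        # Excitation-profile step: least squares followed by clipping at zero.
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
    return C, S, params

# Synthetic data: two amide-like bands plus one band standing in for an aromatic mode.
x = np.linspace(1580.0, 1720.0, 141)
true_S = np.column_stack([
    band(x, 1652.0, 37.0, 0.0, 1.0),
    band(x, 1682.0, 44.0, 0.72, 1.0),
    band(x, 1615.0, 15.0, 0.0, 1.0),
])
true_C = np.array([  # six excitation wavelengths, shortest first (made-up values)
    [1.00, 1.00, 0.50],
    [0.85, 0.60, 0.30],
    [0.70, 0.30, 0.60],
    [0.55, 0.15, 0.90],
    [0.45, 0.10, 1.00],
    [0.35, 0.05, 0.80],
])
D = true_C @ true_S.T

# Slightly wrong starting bands; the free component starts from the residual
# of the first spectrum, as in the initialization described in Section 4.2.3.
guess = [(1651.0, 39.0, 0.1, 1.0), (1683.0, 42.0, 0.6, 1.0)]
S_free0 = D[0] - sum(band(x, *p) for p in guess)

C, S, params = mcr_als(D, x, guess, S_free0)
rel_err = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
```

Because all excitation wavelengths are fitted at once, each constrained band has a single center, width and Lorentzian fraction across the whole data set, whereas sequential NLLS fitting allows these parameters to vary from spectrum to spectrum.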

4.4 Pure α-helical and non-helical UVRR spectral profiles

An advantage of bilinear methods is the ability to produce estimates of the resolved pure component spectral profiles. Estimates of the pure spectral profiles for the α-helical and non-helical portions of myoglobin at each excitation wavelength are shown in Fig. 5. The amide II band at 1550 cm−1 has contributions from both α-helical and non-helical structures. Therefore, the fractional intensities of the α-helical and non-helical contributions to the 1550 cm−1 band were estimated relative to the amide I excitation profiles. Interestingly, at 193 nm, 63% of the 1550 cm−1 band intensity was estimated to arise from α-helical structure and 37% from non-helical structures. At 210 nm, 82% of the intensity was estimated to arise from α-helical structure and only 18% from the non-helical structures, illustrating the greater selectivity for α-helical spectral signatures at longer wavelengths and non-helical structures at shorter wavelengths.
Fig. 5 Estimated pure UVRR spectral profiles of the (a) α-helical and (b) non-helical portions of myoglobin at each excitation wavelength.
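One way a shared band such as the 1550 cm−1 component could be apportioned between the two structure types is sketched below. The text does not give the details of the estimate, so this is only one plausible reading, stated as an assumption: the excitation profile of the shared band is written as a non-negative combination of an α-helical and a non-helical reference excitation profile, and the α-helical fraction is computed at each wavelength. The function name `split_band` and all numerical values are hypothetical and synthetic; they are not the profiles resolved in this study.

```python
import numpy as np

def split_band(profile, helix_ref, coil_ref):
    """Split an excitation profile into helix-like and coil-like parts.

    Solves profile ~ a * helix_ref + b * coil_ref by least squares, clips the
    coefficients at zero, and returns them with the helix fraction per wavelength.
    """
    A = np.column_stack([helix_ref, coil_ref])
    coef = np.clip(np.linalg.lstsq(A, profile, rcond=None)[0], 0.0, None)
    helix_part = coef[0] * helix_ref
    coil_part = coef[1] * coil_ref
    return coef, helix_part / (helix_part + coil_part)

# Synthetic reference profiles, ordered from shortest to longest wavelength.
helix_ref = np.array([1.00, 0.85, 0.70, 0.55, 0.45, 0.35])
coil_ref = np.array([1.00, 0.60, 0.30, 0.15, 0.10, 0.05])
band_1550 = 2.0 * helix_ref + 1.0 * coil_ref  # made-up shared band

coef, helix_fraction = split_band(band_1550, helix_ref, coil_ref)
```

With these synthetic profiles the α-helical fraction rises steadily from the shortest to the longest wavelength, the same qualitative trend as the 63% and 82% values estimated at 193 and 210 nm.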

The estimated α-helical profiles are qualitatively similar to previous estimates of the pure spectral profiles from multi-protein studies.2,3 In addition, the estimated α-helical profiles are strikingly similar to the UVRR spectrum of poly(L-lysine) in its α-helical conformation at high pH and in 40% trifluoroethanol (TFE), an α-helix promoting solvent that also dehydrates the protein.18 Likewise, the non-helical pure spectral profiles are qualitatively similar to the unordered UVRR profiles resolved previously,2,3 although the non-helical amide I band is shifted about 20 cm−1 higher than the amide I band associated with unordered structures.2,3

4.5 Estimation of α-helical and non-helical cross-sections

The resolved excitation profiles were converted to cross-sections using eqn (8). The calculated α-helical and non-helical cross-sections (Fig. 4, column 2) both increase with decreasing excitation wavelength, consistent with previous observations.25,26 The non-helical cross-sections exhibited a steep increase from 203 to 197 nm, similar to the estimated cross-sections25 and absorption profile24 of poly(L-lysine) in the random coil conformation. The α-helical cross-sections increased more gradually from 210 to 193 nm, similar to the estimated cross-sections26 of a 21-residue, primarily alanine, peptide and the absorption profile of poly(L-lysine)24 in their α-helical conformations. However, the α-helical cross-sections calculated in this study lacked a shoulder around 205 nm, which was observed by Sharma et al.26 This is probably due to the limited number of excitation wavelengths. In addition, at 192 nm, the molar extinction coefficient of α-helices is approximately half that of unordered or β-sheet structures.24 This is reflected in the calculated cross-sections at the lowest excitation wavelength (193 nm) of the α-helical bands at 1298 and 1652 cm−1 with respect to the non-helical bands at 1254 and 1682 cm−1.

5. Conclusion

Through the use of a partially constrained MCR-ALS algorithm, we have demonstrated the successful deconvolution of the amide I, II, III and S regions of the two-dimensional multi-excitation UVRR spectrum of myoglobin without prior subtraction of the aromatic bands. In effect, we were able to identify ‘pure’ excitation wavelength-dependent α-helical and non-helical-derived UVRR spectral features for the first time and estimate the pure UVRR spectral profiles of the α-helical and non-helical portions of myoglobin at each excitation wavelength (Fig. 5). Additionally, we managed to remove the user-defined and instrumentation-based biases that occur with traditional spectral fitting methods. This represents a significant advancement from standard NLLS fitting methods, which commonly require prior knowledge of peak position, shape, etc., as well as rigid constraints to achieve reproducible results. The NLLS implementation of chemically relevant Gaussian/Lorentzian shape constraints merges established NLLS fitting methods with the advantages of multivariate analysis. With the exception of the amide II band at 1550 cm−1, each amide component was assigned to either α-helical or non-helical secondary structures. Finally, this study highlights the potential of incorporating multiple excitation wavelengths in resonance Raman spectroscopy to aid in the resolution of highly overlapped spectral components, with minimal user input.

Acknowledgements

The authors would like to thank Professor Thomas G. Spiro and his lab for providing the UVRR spectra used in this study and Dr. Jason W. Cooley for reviewing the many drafts of this paper. This project was funded in part by the University of Missouri Research Council.

References

  1. R. A. Copeland and T. G. Spiro, J. Am. Chem. Soc., 1986, 108, 1281–1285.
  2. Z. Chi, X. G. Chen, J. S. W. Holtz and S. A. Asher, Biochemistry, 1998, 37, 2854–2864.
  3. C. Y. Huang, G. Balakrishnan and T. G. Spiro, J. Raman Spectrosc., 2006, 37, 277–282.
  4. A. V. Mikhonin, S. V. Bykov, N. S. Myshakina and S. A. Asher, J. Phys. Chem. B, 2006, 110, 1928–1943.
  5. J. C. Austin, T. Jordan and T. G. Spiro, in Biomolecular Spectroscopy, ed. R. J. H. Clark and R. E. Hester, Wiley & Sons Ltd., New York, 1993, vol. 20, pp. 55–127.
  6. A. Barth and C. Zscherp, Q. Rev. Biophys., 2002, 35, 369–430.
  7. S. Krimm and J. Bandekar, in Advances in Protein Chemistry, ed. C. B. Anfinsen, J. T. Edsall and F. M. Richards, Academic Press, New York, 1986, vol. 38, pp. 181–365.
  8. A. T. Tu, in Spectroscopy of Biological Systems, ed. R. J. H. Clark and R. E. Hester, Wiley & Sons Ltd., New York, 1986, vol. 13, pp. 47–111.
  9. G. Balakrishnan, C. L. Weeks, M. Ibrahim, A. V. Soldatova and T. G. Spiro, Curr. Opin. Struct. Biol., 2008, 18, 623–629.
  10. R. A. Copeland and T. G. Spiro, Biochemistry, 1985, 24, 4960–4968.
  11. J. S. W. Holtz, J. H. Holtz, Z. H. Chi and S. A. Asher, Biophys. J., 1999, 76, 3227–3234.
  12. T. Miura, A. Hori-i, H. Mototani and H. Takeuchi, Biochemistry, 1999, 38, 11560–11569.
  13. A. Ozdemir, I. K. Lednev and S. A. Asher, Biochemistry, 2002, 41, 1893–1896.
  14. Y. Wang, R. Purrello, T. Jordan and T. G. Spiro, J. Am. Chem. Soc., 1991, 113, 6359–6368.
  15. G. Balakrishnan, Y. Hu, G. M. Bender, Z. Getahun, W. F. DeGrado and T. G. Spiro, J. Am. Chem. Soc., 2007, 129, 12801–12808.
  16. I. K. Lednev, A. S. Karnoup, M. C. Sparrow and S. A. Asher, J. Am. Chem. Soc., 1999, 121, 8074–8086.
  17. I. K. Lednev, A. S. Karnoup, M. C. Sparrow and S. A. Asher, J. Am. Chem. Soc., 2001, 123, 2388–2392.
  18. R. D. JiJi, G. Balakrishnan, Y. Hu and T. G. Spiro, Biochemistry, 2006, 45, 34–41.
  19. C. Y. Huang, G. Balakrishnan and T. G. Spiro, Biochemistry, 2005, 44, 15734–15742.
  20. N. Haruta and T. Kitagawa, Biochemistry, 2002, 41, 6595–6604.
  21. M. Xu, V. A. Shashilov, V. V. Ermolenkov, L. Fredriksen, D. Zagorevski and I. K. Lednev, Protein Sci., 2007, 16, 815–832.
  22. V. A. Shashilov, V. V. Ermolenkov and I. K. Lednev, Inorg. Chem., 2006, 45, 3606–3612.
  23. V. A. Shashilov, M. Xu, V. V. Ermolenkov and I. K. Lednev, J. Quant. Spectrosc. Radiat. Transfer, 2006, 102, 46–61.
  24. K. Rosenheck and P. Doty, Proc. Natl. Acad. Sci. U. S. A., 1961, 47, 1775–1785.
  25. R. A. Copeland and T. G. Spiro, Biochemistry, 1987, 26, 2134–2139.
  26. B. Sharma, S. V. Bykov and S. A. Asher, J. Phys. Chem. B, 2008, 112, 11762–11769.
  27. R. Tauler, Anal. Chim. Acta, 2007, 595, 289–298.
  28. R. Gargallo, R. Tauler and A. Izquierdo-Ridorsa, Anal. Chem., 1997, 69, 1785–1792.
  29. R. Gargallo, M. Vives, R. Tauler and R. Eritja, Biophys. J., 2001, 81, 2886–2896.
  30. M. Vives, R. Gargallo and R. Tauler, Anal. Biochem., 2001, 291, 1–10.
  31. J.-H. Wang, P. K. Hopke, T. M. Hancewicz and S. L. Zhang, Anal. Chim. Acta, 2003, 476, 93–109.
  32. C. A. Holden, S. S. Hunnicutt, R. Sánchez-Ponce, J. M. Craig and S. C. Rutan, Appl. Spectrosc., 2003, 57, 483–490.
  33. R. Tauler, A. K. Smilde, J. M. Henshaw, L. W. Burgess and B. R. Kowalski, Anal. Chem., 1994, 66, 3337–3344.
  34. S. Navea, A. de Juan and R. Tauler, Anal. Chem., 2002, 74, 6031–6039.
  35. S. Nigam, A. de Juan, R. J. Stubbs and S. C. Rutan, Anal. Chem., 2000, 72, 1956–1963.
  36. G. Balakrishnan, Y. Hu, S. B. Nielsen and T. G. Spiro, Appl. Spectrosc., 2005, 59, 776–781.
  37. R. Bro and N. D. Sidiropoulos, J. Chemom., 1998, 12, 223–247.
  38. H. A. L. Kiers, J. Chemom., 2000, 14, 105–122.
  39. R. Tauler, B. Kowalski and S. Fleming, Anal. Chem., 1993, 65, 2040–2047.
  40. E. Bezemer and S. C. Rutan, Chemom. Intell. Lab. Syst., 2006, 81, 82–93.
  41. S. P. A. Fodor, R. A. Copeland, C. A. Grygon and T. G. Spiro, J. Am. Chem. Soc., 1989, 111, 5509–5518.
  42. X. Zhao, R. Chen, C. Tengroth and T. G. Spiro, Appl. Spectrosc., 1999, 53, 1200–1205.
  43. Z. H. Chi and S. A. Asher, Biochemistry, 1998, 37, 2865–2872.
  44. S. H. Song and S. A. Asher, J. Am. Chem. Soc., 1989, 111, 4295–4305.
  45. S. A. Asher, A. Ianoul, G. Mix, M. N. Boyden, A. Karnoup, M. Diem and R. Schweitzer-Stenner, J. Am. Chem. Soc., 2001, 123, 11775–11781.
  46. T. Jordan and T. G. Spiro, J. Raman Spectrosc., 1994, 25, 537–543.

Footnote

Present address: University of Washington, Department of Chemistry, Seattle, WA, USA. E-mail: balki@u.washington.edu; Tel: +1 206 685 4793

This journal is © The Royal Society of Chemistry 2009