X-ray snapshots reveal conformational influence on active site ligation during metalloprotein folding

Parameters of local heme structure and overall conformation are tracked to reveal conformational influences on ligation states.


Introduction
Protein functions are commonly established through their three-dimensional conformation, which is achieved through folding, a process that is driven by a delicate balance of forces arising from hydrogen bonding, electrostatics, hydrophobicity, and interactions with the environment. 1 A perturbation to the balance between these forces may change the protein conformation and folding state and, in turn, regulate function. In certain cases, a change of conformation can lead to misfolding, which often results in protein aggregation and precipitation, processes commonly involved in neurodegenerative disorders. 2 While most protein structures are controlled by interactions between the constituent amino acids and their environments, metalloproteins have additional forces arising from metal-ligand coordination, which anchors cofactors and amino acids to metal sites in the protein and therefore participates in determining the protein tertiary structure. 3,4 Because ligation state changes are frequently involved in regulating native metalloprotein function, it is crucial to understand the interplay between transient backbone conformational dynamics and ligand binding properties. 5
Cytochrome c (cyt c) is a 104-residue heme protein with dual functions in both the mitochondrial electron transport chain and cell apoptosis. The former function involves cyt c adopting a folded state with six atoms coordinating to the iron center in an octahedral environment: four nitrogens from the protoporphyrin IX, a nitrogen from His18, and a sulfur from Met80 (Fig. 1). The latter function, related to apoptosis, 6 involves the loss of the iron-sulfur bond, which allows cyt c to function as a peroxidase. 7 The dual functions of cyt c are regulated through the stability of the Fe-S bond, which has been observed to be sensitive to the protein tertiary structure.
Specically, the bond strength of Fe(II)-S itself is on the order of thermal uctuation (DH ¼ 2.6 kcal mol À1 ) and should therefore not allow a stable ligation in the reduced native folded state. 5 However, recent experiments on folded cyt c derived from ultrafast X-ray spectroscopy 5 and computational simulations 8 have indicated that the Fe(II)-S bond is sustained by a $4 kcal mol À1 entatic stabilization provided by the protein structure, likely a hydrogen bond network. While the entatic state considerations have claried how the Fe(II)-S bond is maintained, it is still unknown what degree of tertiary structure folding is required to achieve the entatic stabilization, and whether partially unfolded states can maintain the native bond.
Cyt c has historically been utilized as a model system for investigating the interplay between the active site and the protein conformation, as the strong interactions between the heme and the protein backbone regulate the protein structure and function. 4,10 One of the most common methods to investigate the folding dynamics of cyt c is photolysis of carbon monoxide (CO) ligated to the reduced heme of the protein. In CO-bound cyt c, the native Met80 ligation is replaced by a CO ligand under 4-5 M guanidine hydrochloride (GuHCl) denaturing conditions, resulting in the protein assuming a partially unfolded state. 11 Upon excitation of the π-π* transition of the heme, CO dissociates from the heme in <1 ps, 12,13 which triggers binding of other residues to the vacant site and associated conformational changes in the protein. Ligand binding and folding kinetics triggered by CO photolysis have previously been characterized with various indirect probing techniques such as optical transient absorption (OTA), 11,14,15 time-resolved tryptophan fluorescence, 16 time-resolved circular dichroism (TRCD), 17,18 time-resolved magnetic circular dichroism (TRMCD), 18 and transient grating (TG). 15 The general folding pathway observed following CO photolysis is believed to involve some of the unfolded population adopting Fe(II)-Met80 ligation on a timescale of 2-40 μs, while other portions of the unfolded population undergo non-native Fe-His26/33 ligation (misligation) on timescales of 40-400 μs. The Fe(II)-Met80 ligated population then proceeds to form the native state structure on a timescale of 200 ms to 1 s. However, despite the plethora of methods used to probe the kinetics of cyt c folding following photolysis, it is still unknown how the Fe(II)-S bond forms so quickly in a purportedly unfolded structure. Furthermore, it is not well understood how the intermediate structures differ from each other in the tertiary fold to allow for the different ligations to occur.
To answer these questions, we utilized two parallel time-resolved techniques that can directly characterize both active site and tertiary folded structures. We performed X-ray transient absorption (XTA) spectroscopy and time-resolved X-ray solution scattering (TRXSS) to investigate the time evolution of the active site structure and the backbone structure, respectively, and discuss how the interplay between local and global structure affects the protein's folding process. XTA is a compelling method to directly observe local metal active site structure by specifically probing the metal center electronic state and the geometry of its surrounding atoms. 19-21 This method has primarily been applied to study myoglobin and its model complexes, 22,23 including processes initiated by CO photolysis, 24,25 in order to assign electronic transitions and determine bond distances between the heme iron and both the porphyrin nitrogens and axial ligands. Whereas these studies focused on local ligand dynamics, the current work expands the applications of XTA spectroscopy towards studies of protein folding, as well as to new metalloproteins beyond myoglobin. TRXSS, on the other hand, offers a method to directly probe the overall folding state of the protein in solution. The method allows for direct investigation of low-resolution structure (in folded proteins), 26 radius of gyration 27 and flexibility of the protein. 28 TRXSS is especially suitable for the current study because the signal arises directly from the changes in the protein structure. Moreover, it allows data acquisition in solution phase, which, unlike crystallographic methods, does not limit the conformational space of the protein. The findings of the XTA and TRXSS experiments are linked to reveal how the native active site is stabilized, and the degree of protein folding that is required to maintain the stabilizing effect during folding.

Sample preparation
Equine heart cyt c was purchased from Sigma-Aldrich and used without further purification. Cyt c was dissolved in a buffer with 50 mM phosphate and 4.0 M GuHCl at a concentration of 6 mg mL⁻¹ (~0.5 mM), with the exception that an 18 mg mL⁻¹ concentration was used to probe time delays <4 ms in the TRXSS experiment. The pH value was adjusted to 7.0 using a small amount of 1 M hydrochloric acid or sodium hydroxide solution. A few drops of polypropylene glycol (PPG) were added to suppress foam formation. The solution was first purged with nitrogen for 25 minutes to remove oxygen. Sodium hydrosulfite was then added in excess to ensure complete reduction of cyt c. Finally, the solution was bubbled with pure carbon monoxide (CO) for 30 minutes before the experiments started to convert cyt c to CO-bound cyt c. At this temperature and GuHCl concentration, CO-bound cyt c is mostly unfolded, while the CO-free cyt c is fully folded. 29
Fig. 1 Structure of cytochrome c based on crystal structure (PDB entry 1HRC 9 ). In the native state, the Met80 and His18 residues are ligated to the heme iron, allowing the protein to act in the mitochondrial electron transport chain.

X-ray transient absorption (XTA) measurements
XTA was performed at beamline 11-ID-D of the Advanced Photon Source (APS), Argonne National Laboratory. A detailed instrument design has been described elsewhere. 24,25 The sample was excited with a 527 nm laser pulse at the Q-band of the heme in cyt c. 11 The 3 kHz, 527 nm laser pulse was generated as reported in a previous publication from the same beamline. 30 The sample was probed by monochromatic X-ray pulses (6.536 MHz, ~80 ps fwhm) in the standard 24-bunch operating mode, with ~10⁶ photons per pulse and pulses evenly separated by 153 ns. The excited volume exits the probing point as the sample jet flows. Throughout the experiments, the solution was kept under a CO environment by gently bubbling CO through the solution. The sample integrity was monitored by comparing the two pre-edge peaks of CO-bound cyt c to the reference scan, and samples were replaced promptly once a deviation from the reference occurred. We separately measured the steady-state XANES spectrum for the native conformation and ligation states with Fe(II) heme, which matches well with the spectrum reported in the literature. 31 The difference signal (laser-on minus laser-off) was decomposed using singular value decomposition (SVD), which left spectral components whose kinetic traces contain mixed time constants. We further utilized global analysis (GA), which requires the spectral components to evolve following a kinetic model. The resulting species-associated difference spectra were then assigned as described in the Results section. During assignment, we simulated the difference signal using the FEFF9.6 (ref. 32) software and compared it to the species-associated difference spectra. Details of the experiment, data processing, GA, and simulation can be found in the ESI. †
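The pulse timing quoted above follows directly from the repetition rates. The short sketch below checks that a 6.536 MHz X-ray pulse train corresponds to a spacing of about 153 ns, and estimates how many X-ray pulses fall between two consecutive 3 kHz laser pulses. The function names are illustrative only and are not taken from the beamline software.

```python
def pulse_spacing_ns(rep_rate_hz: float) -> float:
    """Time between consecutive pulses of a periodic train, in nanoseconds."""
    return 1.0e9 / rep_rate_hz


def probes_per_pump(probe_rate_hz: float, pump_rate_hz: float) -> float:
    """Average number of probe pulses between two consecutive pump pulses."""
    return probe_rate_hz / pump_rate_hz


XRAY_RATE_HZ = 6.536e6   # X-ray repetition rate quoted in the text
LASER_RATE_HZ = 3.0e3    # laser repetition rate quoted in the text

spacing = pulse_spacing_ns(XRAY_RATE_HZ)
pulses = probes_per_pump(XRAY_RATE_HZ, LASER_RATE_HZ)
print(f"X-ray pulse spacing: {spacing:.1f} ns")
print(f"X-ray pulses per laser shot: {pulses:.0f}")
```

Each laser shot is therefore followed by a long train of probe pulses, which is what allows a kinetic trace to be sampled at the pulse spacing after a single excitation.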

Time-resolved X-ray solution scattering (TRXSS) measurements
TRXSS was performed at the BioCARS 14-ID-B beamline of APS. The pump-probe TRXSS experimental setup and data acquisition methodology at BioCARS have been published previously. 33-35 To reduce oxidation, the sample was kept in a reservoir with a nitrogen flow over the surface. The sample was delivered by a syringe pump into a custom-built, temperature-controlled capillary flow cell. 34 The Q-band was excited by laser pulses with a pulse duration of 7 ns at 532 nm. 11 The measurements were performed at 25 °C. The full details of the experiment, including data analysis methods, can be found in the ESI. †

Overview of raw data
Transient iron K-edge X-ray absorption signals were collected as a function of the delay time between the laser excitation and X-ray probe pulses from 1 ns to 60 μs, with a sampling period of 153 ns defined by the X-ray pulse train from APS at the repetition rate of 6.536 MHz. The resulting difference XANES signals (laser-on minus laser-off) at select time delays, along with the ground (Fe-CO) state spectrum, are shown in Fig. 2a. The most pronounced difference signals appear in specific energy regions, from which the reaction kinetics were extracted and analyzed. The XTA difference signal at 7124 eV corresponds to the red-shift of the iron K-edge from that of the ground state spectrum. The XTA difference signal at 7135 eV reflects the intensity change of the second peak after the edge. The kinetic traces extracted from regions near these two energies are shown in Fig. 2c. Immediately after the excitation, the difference signal points to a large edge shift at 7124 eV. This peak sharply declines until about 10 μs, where it starts to decay slowly. In contrast, the 7135 eV trace shows growth until 10 μs, followed by a slower decay. While the signal evolution during the early sub-20 μs window reflects the ligation dynamics of the heme, the slower decay at later time delays corresponds to the excited sample flowing out of the probed region (see Experimental methods).
While the Fe(II) heme active site structures were followed by XTA, the global protein conformation over the course of the refolding after CO dissociation was followed by TRXSS. The scattering difference signals free of the buffer heating contribution were obtained using standard procedures as in previous works (see ESI for details †). 34-37 The resulting difference signals containing protein-only contributions at selected time delays are shown in Fig. 2b. The kinetic traces obtained from the time evolution of the scattering signals integrated in the SAXS (0.03 < q < 0.07 Å⁻¹) and the WAXS (0.08 < q < 0.35 Å⁻¹) regions revealed processes taking place on multiple time scales, with different kinetics in the two regions (Fig. 2c). The WAXS signal rises from 500 ns up to 10 μs with a trajectory similar to that of the 7.135 keV trace from the XTA data and remains constant until 1 ms, which is followed by a decay up to 10 ms and a plateau on the 100 ms time scale. By contrast, the SAXS region indicates a loss of intensity appearing as a stepwise process in the time window between 10 μs and 1 ms, which is followed by a decay up to 50 ms and a plateau at later time scales.
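As an illustration of how a kinetic trace can be built from a difference scattering curve, the sketch below integrates a curve over the SAXS and WAXS q windows quoted above using the trapezoidal rule. It is a minimal example on a synthetic flat curve, not the processing pipeline used for the data; the function name and the q grid are assumptions made for illustration.

```python
def integrate_window(q, signal, q_min, q_max):
    """Trapezoidal integral of signal(q) over the open window q_min < q < q_max.

    q must be sorted in ascending order; only grid points strictly inside
    the window are used, so the result is slightly grid dependent.
    """
    points = [(qi, si) for qi, si in zip(q, signal) if q_min < qi < q_max]
    total = 0.0
    for (q0, s0), (q1, s1) in zip(points, points[1:]):
        total += 0.5 * (s0 + s1) * (q1 - q0)
    return total


SAXS_WINDOW = (0.03, 0.07)   # in inverse angstroms, from the text
WAXS_WINDOW = (0.08, 0.35)   # in inverse angstroms, from the text

# Synthetic example: a flat difference curve on a 0.001 inverse-angstrom grid.
q_grid = [i / 1000 for i in range(10, 401)]
flat_curve = [1.0] * len(q_grid)

saxs_value = integrate_window(q_grid, flat_curve, *SAXS_WINDOW)
waxs_value = integrate_window(q_grid, flat_curve, *WAXS_WINDOW)
print(saxs_value, waxs_value)
```

Repeating this integration for the difference curve at every time delay would give one SAXS point and one WAXS point per delay, that is, two kinetic traces of the kind plotted in Fig. 2c.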
These observations suggest that there are multiple local and global conformational states involved in the folding process. Previous spectroscopic works identified that Met80 binding happens on the time scale of <10 μs, histidine binding between 10 μs and 1 ms, and overall folding in about 10 ms. 11,15,17,18,35,38,39 Since our raw data show events with time scales in agreement with the literature values, we utilized a kinetic model derived from these assignments for our global analysis (GA), which is shown in Scheme 1. The model contains the initial photoproduct, which undergoes two parallel paths leading to a native, methionine-bound state (Fe-Met80) and a non-native, histidine-bound (Fe-HisX, X = 26 or 33) state, respectively. The Fe-Met80 state then forms the native folded conformation, while the Fe-HisX state simply decays back to the Fe-CO ground state. Compared to previous studies, we had to add an additional intermediate state, a cyt c with a penta-coordinated heme (Fe*), which was identified based on the XTA results, as discussed below. The presented kinetic model was used in GA, allowing us to extract the time scale of each transition, as well as time-independent species-associated signals, which are then used for modeling and structural assessment.
Since the two techniques probed structural dynamics on different time scales, GA was carried out separately for each dataset. Based on the autocorrelation function of the left singular vectors from singular value decomposition (SVD), which indicates two and three significant components for the XTA and TRXSS datasets, respectively, we identified two species-associated absorption spectra and three species-associated difference scattering patterns after GA fitting (see ESI †). Additionally, the best-fit values for the time scale of each transition obtained from GA are summarized in Scheme 1. Below we describe each component and assign their ligation and conformational states.
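One common way to judge whether a singular vector carries signal or noise is its lag-1 autocorrelation: a smooth, signal-bearing vector has neighbouring elements that resemble each other, giving a value close to 1, whereas a noise-dominated vector gives a value near or below zero. The sketch below shows that calculation for a single vector in plain Python. The function name and the example vectors are illustrative assumptions, and any cutoff used to call a component significant (0.8 is a frequent choice) is an assumption of this sketch, not a value stated in the text.

```python
import math


def lag1_autocorrelation(vector):
    """Lag-1 autocorrelation of a vector after normalising it to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("vector must not be all zeros")
    unit = [v / norm for v in vector]
    return sum(a * b for a, b in zip(unit, unit[1:]))


# A smooth vector (half a sine period) versus a sign-alternating vector.
smooth_vector = [math.sin(math.pi * i / 50) for i in range(51)]
alternating_vector = [(-1.0) ** i for i in range(51)]

print(lag1_autocorrelation(smooth_vector))       # close to 1
print(lag1_autocorrelation(alternating_vector))  # close to -1
```

In practice the same function would be applied to each column of the left singular vector matrix, and components would be retained while the autocorrelation stays high.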

X-ray transient absorption
The starting Fe-CO (ground) state. The Fe-CO state represents the starting ground state of the system, prior to the photolysis of the CO ligand. The XANES spectrum, k-space EXAFS spectrum, and R-space EXAFS spectrum of the Fe-CO state are shown in Fig. 3, top row. The XANES spectrum contains two pre-edge peaks at 7112.6 and 7115.1 eV, originating from the low-spin Fe(II) 3d⁶ configuration imposed by the strong-field CO ligand. The peak positions and the energy splitting between the two peaks are common among Fe(II)-CO heme compounds 25,40 and proteins, 24,41 with the lower energy peak assigned to the 1s → e_g transition and the other to the 1s → π* transition. 40,42,43 The transition edge energy, defined as the first inflection point of the spectrum, is at 7120.3 eV. The overall XANES spectral features and observed energies are in accordance with those observed for CO-bound myoglobin with a similar coordinating environment. 24 The EXAFS portions of the data were fitted with appropriate theoretical signals generated using FEFF in the ATHENA/ARTEMIS platform, 44 using the heme structure from the Protein Data Bank entry 1HRC 9 as a template (Scheme 2) to extract structural information. The details of the structural analysis are outlined in the ESI. † EXAFS fitting retrieved structural parameters (Table 1) that generally agree with those observed previously, with a 1.73 Å Fe-C(CO) distance. 45
The first GA species. The first species-associated signal resulting from GA has the highest population at the earliest time delays and monotonically decays with two time constants (2.1 ± 0.24 μs and 15 ± 8.0 μs). Since previous studies reported that no protein residues bind extensively to the heme at such short time delays, the possible heme structure for this species would be either penta-coordinated or with a water molecule as the axial ligand.
We compared this signal to the FEFF-simulated XANES difference signals of the two candidate ligation states with respect to the CO-bound ground state; the Fe-H2O signal matches the experimental difference spectrum better than that from a penta-coordinated state. The reconstructed XANES spectrum of this state is shown in Fig. 3a, second row. The ground state (CO-bound) signal and the species-associated difference signal (Fig. 3a, lower panels) from GA with an excited state fraction of ~0.49 (see ESI †) were used to reconstruct the total X-ray absorption spectrum. The XANES spectrum shows no apparent peaks in the pre-edge region, which is in agreement with a more centrosymmetric Fe(II) coordination geometry. This geometry stands in contrast to the plausible penta-coordinated state, which lacks centrosymmetry due to the vacancy left by CO departure. The edge energy is at 7120.3 eV. The spectrum does not have an apparent shoulder at the edge (~7123 eV), which again points to an octahedral geometry. 46 The overall shape features enhanced intensity at 7128 eV, an indication that the Fe(II) is still in a high-spin state. 47 These observations suggest that the spectrum should be assigned to a Fe-H2O state.
Fits of the k-space and R-space EXAFS spectra are shown in the third row of Fig. 3b and c, respectively. In the EXAFS analysis, the oxygen was best fit at 2.34 Å from the heme iron (Table 1), indicating a weakly interacting ligation. The Fe-N distance is elongated (2.05 Å) compared to the CO-bound state, signaling that the well-known heme doming effect is present, 24,25 and that the iron stays in the high-spin state, consistent with the observations from XANES analysis. The second shell Fe-Cα, Fe-Cm, and Fe-Cβ distances are close to those of the Fe-CO state, indicating a rather rigid heme structure.
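The reconstruction of an intermediate-state spectrum from the ground-state spectrum and a difference spectrum can be sketched as follows. The sketch assumes the usual relation in which the measured difference equals the excited-state fraction times the difference between the excited-state and ground-state spectra; that relation and the function name are assumptions made for illustration, and the exact procedure used for the data is the one described in the ESI.

```python
def reconstruct_excited_state(ground, difference, fraction):
    """Recover an excited-state spectrum point by point.

    Assumes difference = fraction * (excited - ground), so that
    excited = ground + difference / fraction.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    if len(ground) != len(difference):
        raise ValueError("spectra must have the same number of points")
    return [g + d / fraction for g, d in zip(ground, difference)]


# Round trip on made-up numbers with the ~0.49 fraction quoted in the text.
FRACTION = 0.49
ground_spectrum = [1.0, 2.0, 0.5]
true_excited = [1.5, 1.0, 0.5]
measured_difference = [FRACTION * (e - g)
                       for e, g in zip(true_excited, ground_spectrum)]

recovered = reconstruct_excited_state(ground_spectrum, measured_difference, FRACTION)
print(recovered)
```

Because the difference is divided by the fraction, any error in the assumed fraction rescales the reconstructed features, which is why the fraction has to be determined independently.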
The Fe* state right after the CO dissociation. As the first GA species was assigned to a water-bound state, the next question is whether water can enter the heme pocket and bind to the heme at 1 ns (the first time point). Jones and coworkers argued that there is no ligand binding in less than 10 ns. 11 In similar CO photolysis studies on myoglobin and hemoglobin chains, the entry of water to the distal side of the heme pocket was found to occur on the order of 100 ns, 48 two orders of magnitude slower than the earliest time delay in the XTA dataset. Therefore, we assumed that at 1 ns the heme is in a penta-coordinated (Fe*) state and separated the total signal using the excited state fraction for analysis (see ESI †).
The most apparent change in the XANES spectrum (Fig. 3a, third row) is the collapse of the two sharp pre-edge features and a significant red-shift of the Fe K-edge energy from those seen in the starting Fe-CO state. Several structural changes could contribute to the change: (1) the low-spin to high-spin transformation of Fe(II) due to the departure of the strong-field ligand CO, (2) the breakage of the Fe-C(CO) bond, and (3) the disappearance of π-backbonding from Fe(II) to CO. 22,24,25 The reconstructed XANES spectrum shows an edge estimated at 7117.9 eV, 2.4 eV lower than that in the CO-bound and Fe-H2O states. The shift in edge energy is expected when the electronic environment near iron switches from a low-spin, octahedral geometry to a high-spin, square pyramidal geometry. 25,46,47 In addition, the shoulder at 7123 eV is also consistent with the signal from a square pyramidal coordination geometry. 46 Fits of the k-space and R-space EXAFS spectra are shown in the third row of Fig. 3b and c, respectively, and the structural parameters are summarized in Table 1. The Fe* state has elongated average Fe-N distances (from 2.02 Å to 2.05 Å), as expected from the heme doming observed in CO photolysis experiments on myoglobin. 24,25 Longer Fe-N distances also suggest that the iron is in the high-spin state, with electron occupation in the molecular orbitals of higher energy. The Fe-Cα distance shortened from 3.11 Å to 3.06 Å, while the Fe-Cm and Fe-Cβ distances stayed largely unchanged, indicating some distortion of the macrocycle before it returns to the values found in the Fe-H2O state.

Scheme 2 General model showing scattering paths included in EXAFS fitting. Based on the distance to the heme iron, atoms of the second shell are grouped into Cα, Cm, and Cβ (see ESI †).

The second GA species. The second species-associated XANES difference spectrum derived from GA (Fig. 3a, bottom row, lower panel) was assigned to a Fe-Met80 (methionine-bound) state because this spectrum rises with a time constant of 2.1 ± 0.24 μs (Fig. S5 and ESI †), which agrees with previous optical spectroscopy results. The GA signal also matches the difference spectrum generated from static measurements of Fe-CO and folded Fe(II)-cyt c (Fig. 3a, bottom row, purple lines). However, the XANES spectrum of this Fe-Met80 state differs from the native Fe(II)-cyt c spectrum. The pre-edge feature is at 7112.8 eV, and the edge energy is at 7120.4 eV, both 0.6 eV higher than in the folded state with the same set of coordinating atoms. The spectral shape resembles the native low-spin Fe(II) state spectrum (Fig. 3a, bottom row, purple line) but with a distinct difference in intensity at 7128 eV. These findings indicate that the active site structure is different from that of the native state. Indeed, EXAFS analysis (bottom row of Fig. 3b and c and Table 1) revealed that in the Fe-Met80 state, the Fe-S bond was best fit to an unusually long distance of 2.65 Å, 0.36 Å longer than in its native ligation. 31 While the Fe-S distance can be >3 Å in a highly transient state on a picosecond timescale, 5 most experimentally determined values for a heme-based iron-sulfur bond distance are within 2.2-2.5 Å. 31,45,49-51 The Fe-N distance shortened from 2.05 Å to 2.03 Å, indicating that the heme doming effect is less pronounced, which is in agreement with an octahedral geometry. The Fe-Cα, Fe-Cm, and Fe-Cβ distances stayed roughly the same as those of the CO-bound state (Table 1).
The Fe-HisX state (see ESI †). As previously mentioned, one of the pathways of cyt c following CO photolysis involves the adoption of a His26/His33 ligated state in parallel to the evolution of a Met80 ligated state. However, the simulated XANES difference spectra for the Fe-H2O and Fe-HisX states were found to be similar, which precluded the separation of a species-associated difference spectrum and subsequent structural analysis. Nonetheless, from the kinetic modeling, we retrieved a formation time constant of 15 ± 8 μs, which is in line with the value derived from TRXSS.
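The two time constants reported for the decay of the water-bound species (2.1 and 15) can be turned into a simple population sketch. The example below treats the decay as two independent first-order channels, one feeding Fe-Met80 and one feeding Fe-HisX, and assumes both time constants are in microseconds. This is only one simple reading of the kinetic scheme: the split between the two channels (`frac_met`) is a hypothetical parameter chosen for illustration, not a value from the text, and later steps such as folding and CO rebinding are ignored.

```python
import math

TAU_MET_US = 2.1    # Fe-Met80 formation time constant (assumed microseconds)
TAU_HIS_US = 15.0   # Fe-HisX formation time constant (assumed microseconds)


def populations(t_us, frac_met=0.5):
    """Return (water_bound, met_bound, his_bound) fractions at time t_us.

    frac_met is a hypothetical fraction of molecules that follow the
    Met80 channel; the remainder follows the HisX channel.
    """
    decay_met = math.exp(-t_us / TAU_MET_US)
    decay_his = math.exp(-t_us / TAU_HIS_US)
    water_bound = frac_met * decay_met + (1.0 - frac_met) * decay_his
    met_bound = frac_met * (1.0 - decay_met)
    his_bound = (1.0 - frac_met) * (1.0 - decay_his)
    return water_bound, met_bound, his_bound


for t in (0.0, 2.1, 15.0, 60.0):
    water, met, his = populations(t)
    print(f"t = {t:5.1f} us  water {water:.3f}  Met80 {met:.3f}  HisX {his:.3f}")
```

With these assumptions the water-bound population decays biexponentially while the total population is conserved, which is the qualitative behaviour described for the first GA species.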

Time-resolved X-ray solution scattering (TRXSS)
The kinetics derived from global analysis suggest that an intermediate state is populated on a time scale of 1.8 ± 0.1 μs (Fig. 4, orange), which matches the XTA result for the Met80 binding time scale. For this reason, we assign this early intermediate, U_M, to a Met80-bound state. Qualitatively, the scattering difference curve for this intermediate species appears as a uniform increase of intensity in the WAXS region, which indicates secondary structure formation and partial folding. Guinier analysis of the difference signal 23 indicates that the U_M species is more compact, with a radius of gyration of R_g = 18.2 ± 1.2 Å, compared to the ground CO-bound (U_CO) state with R_g = 24.6 ± 0.4 Å (see ESI for details †). The partial folding of the protein is further corroborated by inspection of the difference pair distribution function calculated using Bayesian Inverse Fourier Transform (BIFT), 52-54 which exhibits a gain of electron density at <10 Å, indicating formation of secondary structure, and a corresponding loss of electron density at longer distances, indicating the collapse of the protein (see ESI †). The observation of the collapsed state with TRXSS allows us to rule out burst phase folding into the native state. 55 The U_M state converts into another state with a time constant of 6.6 ± 0.7 ms (Fig. 4, green) as predicted by our kinetic model. The time constant is in agreement with the previous folding time scales derived from transient grating measurements. 15 We compared the species-associated difference signal for this state with the static (steady-state) difference between folded cyt c and unfolded CO-bound cyt c (Fig. 5). The excellent agreement between the curves confirms the assignment of this species to the folded state. In parallel with the formation of U_M, we observe the formation of the U_H species, associated with HisX (X = 26 or 33) binding to the heme (Fig. 4, blue).
The best t kinetic model has shown that the HisX binding is a biphasic process with two time constants: 18 AE 1 ms and 400 AE 40 ms. Such complex HisX binding kinetics have been reported previously in spectroscopic studies with similar time scales. 57 Additionally, the early time scale for HisX binding matches well with the XTA results discussed above. The biphasic binding of HisX has been proposed to arise due to separate binding of His26 and His33, each of which taking different amounts of time for ligation. From the kinetic tting we derive that the branching between fast and slow components is 0.66 AE 0.01. The U H species decays back to the ground state with a single time constant of 20.4 AE 0.7 ms due to the heme rebinding with CO. The species-associated scattering patterns indicate a large decrease in SAXS intensity, which indicates that this pathway involves the protein adopting an expanded and more disordered state. 34,58 Further unfolding of U H species compared to U CO is in line with the expanded denatured state assumed by oxidized cyt c with HisX ligation in mild denaturing conditions, which has a R g of $30Å. 37,59 The assignment of the U H signal to unfolding is further corroborated by BIFT analysis, which indicates the loss of electron density at distances spanning up to maximum dimension D max of U CO (see ESI for details †).

Overall folding scheme
Combining all the information gathered above, the overall folding scheme of cyt c after CO photolysis, along with structural details near the heme, is summarized in Scheme 1 with the time constants now determined from our data. After the photolysis, the transiently populated heme site likely assumes a domed structure, followed by weak coordination of a water molecule. From here, one of the split paths leads to longer-than-native coordination by Met80, accompanied by partial folding, and finally arrives at the native folded conformation. By contrast, on the other path a histidine binds to the heme while the protein expands further and never reaches the folded structure. The striking difference between the outcomes points to the subtle balance between ligation and conformation, which is discussed below.

Interplay between conformational and ligation changes
Following the binding of water at the axial ligand site, some of the photolyzed protein population quickly adopts a Fe-Met80 ligation and also forms a partially folded structure with some secondary structure formation. Since the global analyses of both the XTA and TRXSS data yielded ~2 μs time constants, we can link the concerted dynamics on the two spatial scales and assign the secondary structure formation to the backbone movement that pulls Met80 towards the heme. It is notable that the 2 μs Fe-S bond formation time constant here 11,15,18,39 contrasts with the tens of milliseconds Fe-S formation times observed in cyt c refolding induced by the reduction of Fe(III) via an electron transfer process, where the starting structure is considerably more disordered. 60 It is also faster than the expected diffusion timescale (about 35 μs) required for the formation of a loop between His18 and Met80 when the diffusion process is modeled as a peptide chain undergoing a random walk. 38,61 The drastic difference in the observed time constants suggests that Met80 is in proximity to the heme in the CO-bound ground state. However, the Fe-S distance in the Fe-Met80 state was determined by EXAFS to be 2.65 Å, much longer than the 2.29 Å in the native conformation, 31 indicating a very weak interaction. 5 Therefore, this state likely originates from a local structure that positions Met80 close to the heme, rather than from a stable ligation of Met80 to the heme. Considering that this state ultimately forms the folded state, the local structure that supports Met80 should be similar to that in the native conformation, namely residues 65-85, which cover the distal side of the heme.
From the tertiary structure perspective, the Fe-Met80 state is partially folded, and the native state is only adopted on a much slower, millisecond timescale (6.6 ms). This disconnect between the formation rate of the native active site ligation and that of the native conformation therefore suggests that further protein reorganization is required to fully form a stable Fe-S bond as in the native conformation, which has been suggested to be enabled by a hydrogen bond network from Tyr67 and nearby residues. 5 TRCD studies suggest that the secondary structure other than the terminal helices, including the 60's helix on which Tyr67 sits, does not form until at least 5 ms after photolysis. 17 Our results also suggest that the structure that fully supports the native Fe-S bond is established only during the later phase of the folding process, in line with the recent description of protein stabilization by the hydrogen bond network.
This journal is © The Royal Society of Chemistry 2019

Heterogeneous ground state structures could lead to different ligations
The thermodynamic description of the heme active site in cyt c has recently come under scrutiny. While the Fe-S bond has been thought to be stable due to its presence in the crystal structure, it was recently reported to be fairly weak (ΔH = 2.6 kcal mol⁻¹). 8 In contrast, the competing non-native ligands, His26 and His33, are not only more energetically favored (ΔH = 7.2 kcal mol⁻¹), 8 as indicated by density functional theory calculations, but also closer in sequence to the other axial ligand of the heme, His18, making the diffusion-induced contact rate higher. Therefore, to form the native Fe-S bond, the protein must presumably prevent other ligands from approaching and displacing Met80.
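To put the two bond enthalpies quoted above in context, they can be compared with the molar thermal energy RT. The sketch below computes RT at 25 °C, the temperature reported for the TRXSS measurements, and expresses both enthalpies in units of RT. It is a back-of-the-envelope comparison only; it does not account for entropy or for the entatic stabilization discussed in the text, and the function name is illustrative.

```python
GAS_CONSTANT_KCAL = 1.987204e-3   # molar gas constant R in kcal mol^-1 K^-1


def thermal_energy_kcal(temperature_c: float) -> float:
    """Molar thermal energy RT in kcal/mol at a temperature given in Celsius."""
    return GAS_CONSTANT_KCAL * (temperature_c + 273.15)


DH_FE_S = 2.6     # kcal/mol, Fe-S (Met80) value quoted in the text
DH_FE_HIS = 7.2   # kcal/mol, Fe-His26/33 value quoted in the text

rt = thermal_energy_kcal(25.0)
print(f"RT at 25 C: {rt:.3f} kcal/mol")
print(f"Fe-S:   {DH_FE_S / rt:.1f} RT")
print(f"Fe-His: {DH_FE_HIS / rt:.1f} RT")
```

Expressed this way, the Fe-S value is a few RT while the histidine value is roughly three times larger, which illustrates why histidine ligation is favored on bond strength alone.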
Since the CO-bound cyt c is in a disordered state, many subpopulations exhibiting different energetic and kinetic profiles can exist simultaneously. As suggested by Latypov and coworkers, CO-bound cyt c could assume multiple disordered conformations. 62 Such an ensemble of disordered states may explain the parallel folding processes observed in many of the previous time-resolved studies on CO-bound cyt c. 11,15,57 After photolysis, the subpopulations with an intact local structure assume the Fe-Met80 ligation state, and the ligation is protected throughout the folding process. On the other hand, the subpopulations without an intact local structure should adopt the ligation state that is lowest in energy based solely on metal-ligand bond strength, in which case the Fe-HisX ligations dominate. 8,63 Consistent with our interpretation of the Fe-HisX state detected in the TRXSS experiments, a release of the sequence consisting of residues from His26/33 to Glu104 at the C-terminus would generate a large expansion and unfolding signal, as the chain samples a larger conformational space and gains entropy. The Fe-HisX ligation state has been referred to as the misfolded "kinetic trap" during the folding process. 14,18,39 We corroborate this notion by observing that this ligation state is both enthalpically and entropically favored.

Protein support for the Fe-H₂O intermediate ligation state
A water-bound heme structure aer photolysis, determined by XTA, seems to contradict the previous works suggesting that Fe(II) does not bind water. In steady-state resonance Raman spectroscopy measurements, the non-native Fe(II) heme coordination states detected are assigned to either a bis-His or a penta-coordinated state, but not a water-bound state. 64,65 Even in CO photolysis studies, Jones and coworkers also attributed the immediate product aer photolysis to a pentacoordinated species. 11 However, it is possible that a water molecule stays bound to the heme transiently. In the TRCD study of CO-bound cyt c, Chen and coworkers could not completely rule out the water as a ligand for early ligation events. 17 In ET-initiated cyt c folding experiments, the water was observed to dissociate from the Fe(II) with a time constant of about 1 ms aer photoreduction. 60 Furthermore, the energy required for maintaining Fe(II) and H 2 O in proximity was calculated to be 2.0 kcal mol À1 , 8 a relatively small amount. Finally, Esquerra and coworkers determined the spectral change of a water molecule entering the myoglobin and hemoglobin heme pockets aer CO photolysis. 48 The intensity of the difference spectrum is 25-fold smaller than that caused by the photolysis itself. Therefore, given the multiple events during folding of cyt c that could affect the optical absorption spectrum, the Fe* and Fe-H 2 O states discovered in this work may be indistinguishable in optical studies.
For this XTA-detected ligation state, there is little TRXSS structural difference (at 500 ns), which allows us to deduce that in the CO-bound state a portion of the residue structure around the heme is supportive of an Fe-O bond. In addition, the well-defined bond distance without a large Debye-Waller factor σ² implies that it is unlikely that many water molecules exchange at the binding site. Questions then arise regarding how this water is held fixed in the proximity of the heme without a favorable ligation. The most probable scenario seems to be that the same local structure supporting Met80 is also responsible for holding the water molecule. In the crystal structure, the heme crevice of cyt c is arranged tightly, with only one bound water allowed inside. 9 Given that some secondary structure remains in CO-bound cyt c under 4.6 M GuHCl, the heme pocket may not have unfolded completely. The spatial restriction may have forced a single water molecule to stay close to the heme after CO photolysis. An alternative explanation may be that the water molecule is stabilized by an electrostatic effect. However, the heme does not attract water, and in the native state there is no charged group near the heme, which is buried in the hydrophobic core, rendering this scenario unlikely. Yet another possibility is that the water molecule is stabilized by hydrogen bonding within a network as in the native state, but the associated energy for this interaction (a few kcal mol⁻¹) is easily overwhelmed by any backbone reorganization. 66 Given that the backbone in the CO-bound state is disordered, and that the main hydrogen bond contributor, Tyr67, does not seem to fold until later stages after photolysis, hydrogen bonding should not be the main source of stabilization for the water. In any case, the backbone plays a role in the cyt c folding process by serving as a barrier for ligands to depart from or approach the heme center.
In addition to the entatic interaction that stabilizes the Fe-S bond in the rather static, native conformation, 5 we have observed similar effects during the dynamic folding process that may be related to the change of functions in cyt c.
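The Debye-Waller argument made earlier in this section can be illustrated with the standard EXAFS damping term, exp(-2k²σ²), which multiplies the contribution of each scattering path. The sketch below evaluates that term for two hypothetical σ² values (illustrative assumptions, not fitted parameters from this work) to show how strongly a broad distance distribution would suppress the signal at high k.

```python
import math


def dw_damping(k, sigma2):
    """EXAFS Debye-Waller damping factor exp(-2 k^2 sigma^2).

    k is the photoelectron wavenumber in 1/Angstrom;
    sigma2 is the mean-square relative displacement in Angstrom^2.
    """
    return math.exp(-2.0 * k * k * sigma2)


# Hypothetical values chosen only for illustration.
ordered = dw_damping(k=10.0, sigma2=0.003)     # narrow distance distribution
disordered = dw_damping(k=10.0, sigma2=0.020)  # broad distance distribution
print(f"ordered: {ordered:.3f}, disordered: {disordered:.3f}")
```

A scatterer with a large σ² contributes very little at high k, so a well-resolved Fe-O distance with a modest σ² points to a narrow distribution of distances rather than many loosely held, exchanging water molecules.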

Future challenges in structural reconstruction for unfolded metalloproteins
Despite the retrieval of species-associated difference signals discussed above, the structural reconstruction of a heterogeneous population of unfolded metalloprotein remains a great challenge that limits the interpretation of structural changes during the folding process. TRXSS experiments provide direct kinetic information on the tertiary structural dynamics by tracking the changes in electron density in the sample as an ensemble average. For systems with well-defined ground and excited states, additional structural characterization can be obtained with techniques such as rigid body modeling 67,68 or shape reconstruction. 69,70 However, as previously stated, in CO-bound cyt c the ground and excited states are mixtures of unfolded, flexible structures, which require a higher-level modeling technique to accurately retrieve the shape data. One method to sample the unfolded ensemble is molecular dynamics (MD) simulation with enhanced sampling methods. 71,72 Unfortunately, the results of simulations on unfolded proteins are highly sensitive to force field parameters, 73 especially in the case of metalloproteins, where metal ligation plays a determining role in the conformation of the protein. 74 In general, force field parameters for metalloproteins remain ill-defined, and this is true even in the case of cyt c. Therefore, new sets of force field parameters for non-native ligations must be constructed and validated in order to reconstruct the candidate intermediate structures. In addition, the protocol to incorporate experimental data for sampling unfolded structures is under development. Our group is currently developing such force fields and protocols for MD simulations, and separate works will be published in the future.
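To make the ensemble-averaging problem concrete, the sketch below builds a measured difference curve as a population-weighted sum of species-associated difference curves. The function name, the three-point q grid, and all numerical values are hypothetical and serve only to show the bookkeeping; this is not the analysis code used in this work.

```python
def ensemble_difference(populations, species_signals):
    """Population-weighted sum of species-associated difference curves.

    populations: one fractional population per species.
    species_signals: one list of difference-signal values per species,
    all sampled on the same q grid.
    """
    if len(populations) != len(species_signals):
        raise ValueError("need one population per species")
    n_q = len(species_signals[0])
    total = [0.0] * n_q
    for weight, curve in zip(populations, species_signals):
        if len(curve) != n_q:
            raise ValueError("all curves must share the same q grid")
        for i, value in enumerate(curve):
            total[i] += weight * value
    return total


# Toy curves on a three-point q grid; all numbers are hypothetical.
signals = [[1.0, 0.5, 0.0], [-2.0, 0.0, 1.0]]
mixed = ensemble_difference([0.75, 0.25], signals)
print(mixed)
```

Because only the weighted sum is observed, different combinations of populations and conformations can yield similar curves, which is why additional constraints such as validated force fields are needed to recover individual structures.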

Conclusion
We have utilized complementary XTA and TRXSS methodologies to investigate the folding of cyt c following CO photolysis on multiple spatial and temporal scales. Our XTA results revealed four intermediate heme ligation states, Fe*, Fe-H₂O, Fe-Met80, and Fe-HisX, and the structural parameters of the first three states were obtained. Our TRXSS results provide evidence for the existence of parallel conformational pathways, specifically a productive folding route through Met80 binding and an unproductive pathway that proceeds through misligation of HisX. Combined, the XTA and TRXSS measurements revealed new structural information on folding intermediates that had not previously been revealed by optical experiments, namely the Fe-H₂O heme ligation state that exists before a protein residue replaces the water, the collapsed-phase intermediate in the Met80 binding pathway with a prolonged Fe-S distance, and the further unfolding of the protein in the HisX binding pathway. We proposed that a local structure around the heme may spatially limit the motion of a water molecule, forming the Fe-H₂O state, as well as that of the Met80 residue, forming the Fe-Met80 state, thereby promoting these otherwise disfavored ligations. Protected by the local structure, Met80 remains in proximity to the heme until a later stage of folding, when the bond is stabilized. We suggested that large-scale structural reorganization and loss of local structure may be the reason why some of the cyt c ensemble follows a parallel folding pathway to the kinetic trap state, Fe-HisX. Overall, the parallel XTA and TRXSS experiments contribute to the understanding of the interplay between ligation and conformation states in metalloprotein folding dynamics by directly probing both local and tertiary structures.

Conflicts of interest
The authors declare no competing financial interests.