Asger Berg
Thomassen
a,
Thomas L. C.
Jansen
*b and
Tobias
Weidner
*a
aDepartment of Chemistry, Aarhus University, Langelandsgade 140, Aarhus C 8000, Denmark. E-mail: weidner@chem.au.dk
bZernike Institute for Advanced Materials, University of Groningen, Groningen 9747 AG, The Netherlands. E-mail: t.l.c.jansen@rug.nl
First published on 30th May 2024
Diatoms, unicellular marine organisms, harness short peptide repeats of the protein silaffin to transform silicic acid into biosilica nanoparticles. This process has been a white whale for material scientists due to its potential in biomimetic applications, ranging from medical to microelectronic fields. Replicating diatom biosilicification will depend on a thorough understanding of the silaffin peptide structure during the reaction, yet existing models in the literature offer conflicting views on peptide folding during silicification. In our study, we employed two-dimensional infrared spectroscopy (2DIR) within the amide I region to determine the secondary structure of the silaffin repeat unit 5 (R5), both pre- and post-interaction with silica. The 2DIR experiments are complemented by molecular dynamics (MD) simulations of pure R5 reacting with silicate. Subsequently, theoretical 2DIR spectra calculated from these MD trajectories allowed us to compare calculated spectra with experimental data, and to determine the diverse structural poses of R5. Our findings indicate that unbound R5 predominantly forms β-strand structures alongside various atypical secondary structures. Post-silicification, there's a noticeable shift: a decrease in β-strands coupled with an increase in turn-type and bend-type configurations. We theorize that this structural transformation stems from silicate embedding within R5's hydrogen-bond network, prompting the peptide backbone to contract and adapt around the biosilica precursors.
Biosilica is produced by diatoms (Bacillariophyceae) for their cell walls through biosilicification from low concentration (∼70 μM) silicic acid in the ocean.8–11 One of the key advantages of biomineralized silica for applications is that the formation can occur at room temperature, neutral pH and atmospheric pressure, while the reaction is fast and leads to a biomaterial with impressive material properties.8,12,13 Thus, biosilica could be a route to an environmentally friendly production of silica nanostructures.
In 1999 Kröger et al. found that silaffins are one of the major classes of biosilica precipitating proteins in the diatom species Cylindrotheca fusiformis.14 Within this class of proteins are the silaffin-1 peptides that are derived from the Sil1p polypeptide, which proteolytically cleaves the C-terminal in vivo into 7 repeat units (R1–R7).8,15 One of these repeat units from the silaffin-1A band of Sil1p is R5 (SSKKSGSYSGSKGSKRRIL) and it has been shown that this peptide produces biosilica nanoparticles from silicic acid in vitro at a pH value of ∼7.14
The precursor for biosilica, silicic acid Si(OH)4, is a weak acid with a pKa of 9.8, so at neutral pH when R5 produces biosilica, ∼0.18% of the silicic acid is deprotonated and found as the reactive silicate species Si(OH)3O−.16 Silica is produced from a condensation reaction where silicate reacts with silicic acid and releases water to form silica in a polymerization reaction that eventually leads to larger particles.16 Silaffin peptides are thought to act as templates for this reaction, where the polyamines of post-translational modifications can improve the reaction conditions of the condensation by leading the reactants together.2
R5–biosilica particles have been applied to immobilize enzymes,17,18 encapsulate and release cargo molecules,19 as a vaccine delivery platform20 and an R5 derivative was also used to synthesize antimicrobial peptide nanoparticles.21
Regardless of the many applications and vast potential of R5–biosilica, the structure of the peptide during silicification is not yet fully understood.22 In order to achieve increased control of the process for nanotechnology applications, it is crucial to gain further insight into the structure of the peptide at the molecular level. We here use two-dimensional infrared (2DIR) spectroscopy to follow the process of biosilicification.
There have been several contradictory studies describing the structure of R5 during silicification.22 Roehrich & Drobny studied the structure of the neat R5 peptide and the composite material of R5 and silica by using solid-state NMR.23 They observed a significant change in the chemical shifts of the backbone atoms, but no patterns that would resemble a defined change in the secondary structure. Instead, they attributed the large shifts of the lysine residues near the N-terminal to being due to solvent exposure in a micelle assembly.23 These findings backed the micelle-like self-assembly model, which was previously hypothesized by Knecht & Wright due to the positively charged side chains on arginine together with the hydrophobic leucine and isoleucine in the RRIL sequence.24 In another solid-state NMR study by Buckle et al., a possible secondary structure change was found between neat R5 and the R5–SiO2 composite, possibly to increase the exposure of charged side chains in a micelle to silicate.25
The possibility of β-strands and a secondary structure change of R5 in the silica composite and the micelle model were disputed by the findings of Senior et al. in a study of the structure of R5 where they used 1D and 2D solution-state 1H-NMR, circular dichroism (CD) and dynamic light scattering (DLS).26 In their study, they concluded that R5 is unstructured and monomeric in solution and the addition of silicic acid does not induce a structural change.
Since silicification is a surface process, Lutz and coworkers studied the R5 structure during silicification at the air–water interface, by combining experiments and simulations of the surface sensitive technique sum-frequency generation (SFG).27
In contrast to the findings of Senior et al.,26 they found that in addition to undefined structures, R5 contains β-strand structures, which were reduced during silicification at the air–water interface. Since the number of interpeptide bonds was reduced during silicification they concluded that silica possibly affects the H-bond network to induce a conformational change of the backbone during silicification.27
Roeters et al. conducted an SFG study at the peptide–silica interface where they combined MD simulations with spectra calculations based on NMR dihedral angles reported by Drobny et al.22,25 They found that R5 possibly changes structure from a β-sheet to turns during silicification at the peptide–silica interface and that a micelle-type assembly is plausible. The solid-state NMR torsion angles also showed that the peptide is extended in a strand-like structure and then contracts slightly into a coiled, more turn-rich structure.23
Gascoigne and coworkers attempted to reconcile the findings from solution-state NMR with the findings from solid-state NMR by combining scattering techniques with CryoTEM images.28 From their analysis they found that the peptide assembles into a mass fractal of 100–200 nm, which is composed of spherical units of 1 nm radius.28 The authors suggested that these findings28 could unify the theory of a larger assembly from solid-state NMR25 with the monomers found in solution-state NMR.26,28
However, the study by Gascoigne et al.28 provided no further insight into the secondary structure of R5. Thus, the existing structural studies of R5 remain conflicting: the solution-state NMR study, where they show that the structure of R5 is a monomeric, random coil and does not change the secondary structure upon silicification.26 The solid-state NMR and SFG studies suggest micelle assembly with a β-strand like R5 structure and the conformation changes during silicification.22,23,25,27 The scattering study instead suggested fractal assembly.28
More in situ structural studies are thus necessary to determine the secondary structure of R5 during mineralization. Two-dimensional infrared (2DIR) spectroscopy in the amide I region is a well-established method for determining protein and peptide secondary structures. The amide I mode originates from the amide bond and is dominated by CO stretching coupled to the adjacent NH bending modes. Since the mode is found throughout the peptide backbone and is sensitive to hydrogen bonding, it is an excellent reporter of secondary structures. The strength of 2DIR lies in the measurement of amide I resonances and their couplings across two spectral axes, which significantly enriches the lineshapes with detailed structural information.30 This attribute of 2DIR makes it particularly effective for in-depth structural analysis of peptides like R5.
For each sample, spectra were recorded in parallel and perpendicular relative polarizations between the pump and probe pulses. All spectra were recorded with a waiting time between pump and probe pulses of 250 fs with a spectrum at −10 ps subtracted from each spectrum to further reduce scattering. The beam path was purged with dry nitrogen to remove water absorption lines from the spectra.
Two different starting structures of R5 were used for two different 2 μs molecular dynamics (MD) simulations with and without silicate. Each simulation contained one R5 peptide and either 6 Cl− in the neat R5 simulation or 6 Si(OH)3O− in the other simulation.
The starting peptide structures were generated by Roeters et al. from solid-state NMR-derived torsion angles reported by Roehrich et al. for lyophilized R5 before and after interaction with silica.22,23 The structures are shown in Fig. 1a. The OPLS-AA force field35 was used for the peptide since a benchmark study has shown that 2DIR spectra that are calculated from simulations using this force field provide a high overlap with experimental spectra.36 SPC/E was used for the water molecules since it predicts the diffusion coefficients of water at a higher accuracy than many other models36–38 and the model has frequently been used for MD simulations in other studies for 2DIR simulations of proteins.39–41 A well-described diffusion constant of water is important since water has a significant contribution to the lineshapes in 2DIR spectra. All the MD simulations were performed using Gromacs 2019.442–49 and the simulations mostly follow the protocol outlined by Lemkul.50 During the production run the coordinates were saved every 10 ps. Further details of the MD simulations are available in the ESI.†
![]() | ||
Fig. 1 (a) The neat R5 peptide (top) and R5 during interaction with silica (bottom). The structures are from Roeters et al.23 with torsion angles reported by Roehrich et al.24 (b) An Eppendorf tube containing the R5–silica precipitate produced by R5 with silicic acid. The R5–silica nanoparticles were separated from the solution by centrifugation. (c) SEM image of the R5–silica nanoparticles. (d) Experimental 2DIR spectra of neat R5 (left spectra) and R5–silica particles (right spectra). The top row shows the spectra, which were recorded in parallel polarization (‖) and the bottom row shows the spectra, which were recorded with perpendicular polarization (⊥). A grey square is marked below 1620 cm−1 to show the area that was excluded in the comparison with the calculated spectra since this region mainly contains side-chain modes. All the spectra are normalized to the absolute max intensity in the plotted region with contour levels that are symmetric around 0 and spaced evenly. |
Representative structures were found by clustering the frames in the 2 μs trajectories. New simulations with a length of 1 ns were made by starting from these structures with the same settings as the 2 μs simulations but instead sampling the coordinates every 20 fs. 1 ns was chosen as the simulation time since 1 ns after the representative structure all the frames were still belonging to the same cluster in the 2 μs trajectories.
Sampling the coordinates at small enough intervals is crucial since it determines the maximum calculated linewidths with the NISE program.51–57 Furthermore, the amide I frequencies fluctuate on time scales less than 100 fs due to solvent librations and hydrogen bonding fluctuations,58 so motional narrowing will occur if the sampling timestep is too large.51 If the time step is too low the computational costs and disk space requirements increase, so 20 fs timesteps have been shown in a benchmark study to be sufficiently low and provide the best balance between lineshapes and computational costs.59 Additionally, 20 fs timesteps have been used in other studies for the purpose of calculating 2DIR spectra.39,51,60,61
The backbone structures of the R5 peptide were clustered in 2 μs MD simulations based on the root-mean-square deviation (RMSD) of the backbone. TTClust62 was used as the clustering software, which utilizes SciPy's clustering functions63 using Python 3.8.8. The Ward Linkage clustering algorithm64 was employed on each of the 2 μs trajectories, where every 10th frame was used due to the heavy computational load of calculating the large RMSD matrix. Ward Linkage was chosen since it has been found in a benchmark to perform well at clustering MD trajectories to retrieve representative structures from the ensemble.65 TTClust offers an algorithm to determine the minimum number of clusters that should be included called the elbow method.66 The elbow method was used for both 2 μs simulations, which yielded 4 clusters in both cases. However, upon visual inspection two of the clusters in the neat simulation appeared highly similar, so instead the number of clusters was set to 3 for the neat simulation and two clusters merged into the third cluster. Finally, representative structures were extracted from each cluster. The representative structures are defined as the frames with the lowest average RMSD to the other frames within the cluster.62
The waiting time was set to 240 fs. The waiting time is 10 fs lower than the experimental waiting time but was chosen because the coordinates were sampled every 20 fs in the MD simulations and the calculated waiting time must be an integer multiple of the time between coordinate sampling steps. Alternatively, 12.5 fs sampling time steps in the MD simulations could be used, but that would result in significantly larger files. A 10 fs difference in waiting time should be negligible within the accuracy of the experiments. All spectra were calculated in the frequency range 1540 cm−1 to 1740 cm−1. When simulating 2DIR spectra of proteins it is a common practice to shift the frequency axes either by optimizing the overlap with the experimental spectra such that lineshapes can better be compared to experiments36,51,59 or by using a systematic shift such that the peak positions can also be compared to experiments.36,39,59,72
Cunha et al. found a systematic frequency underestimation of 9.6 cm−1 independent of the secondary structure when using the same force field and frequency map as in this work.36 Based on their findings, a systematic shift of +9.6 cm−1 was applied to the frequencies in all the calculated spectra in this paper.
Based on visual inspection, the difference between the amide I peaks in the spectra of neat R5 and R5 with silica appears small. The main difference between the spectra is that the amide I peaks in the neat spectra are rounder and the spectra with silica are more elongated along the diagonal. In addition, the amide I mode was shifted off the diagonal in the silica spectra. Since R5 is a small and flexible protein, it is known to contain little22 or no26 defined secondary structure. Therefore, it is expected that the differences between the amide I peaks are subtle. The main resonances are centred around 1645 cm−1, which is typically assigned to disordered and coiled structures. To extract more detailed structural information from the data, we must go beyond direct spectral interpretation and have therefore performed MD simulations and spectral calculations.
Theoretical 2DIR amide I spectra were calculated from the 1 ns MD simulations. The calculated spectra are summarized in Fig. 2. Note that the arginine side chain modes have a small overlap with the amide I modes, but these side chain modes have not been included in the calculations because no frequency maps are available.
The main amide I peaks of neat R5 clusters 2 and 3 are narrower than the experimental features. When comparing the simulated amide I spectra to the experimental spectra, the spectra of clusters 2 and 3 miss the intensity in the high-frequency region, whereas the intensity for cluster 1 overshoots in the high-frequency region. All the clusters are missing some width in the low-frequency side of the main amide I peak compared to the experiment, but this could be explained by the overlap between the arginine modes and the amide I modes which possibly convolutes to artificially broaden the amide I peak of the red side in the experimental spectrum. Since the arginine modes are not part of the simulation this broadening on the red side of the amide I peak is not seen in the simulated spectra.
Moving to the simulation clusters for R5 when bound to silica, the 2DIR spectra of cluster 1 qualitatively match the experimental spectra best – in both spectral position and shape. Cluster 3 also matches the resonance position of the experimental data. The peak position of R5–silica clusters 2 and 4 show a significant mismatch in the spectral position when compared with the experiment.
2DIR experiments were used to measure an ensemble of many different protein structures where the amide I modes of many proteins overlap. For this reason, we have investigated how far a linear combination of the structural clusters can capture the experimental data. To determine how much each of the clusters contributes to the experimental ensemble, we have calculated linear combinations of the 2DIR spectra from each cluster by fitting them to the experimental data.
The spectral overlaps between the calculated spectra and the experimental spectra were determined using the normalized standard deviation S2D, which compares the points in the 2D spectra with a value between 0 and 1, where 1 means a perfect match.75 The calculated data points were interpolated to match the frequencies of the experimental data points. To do this, in every iteration of the fitting procedure, the frequencies of the calculated spectra were interpolated to the experimental frequencies with the quintic spline interpolation from SciPys interp2d.63 Scipy.optimize.minimize63 was used with the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) method76 to maximize the overlap by maximizing S2D. The spectra were normalized to the maximum of the absolute intensities before calculating the overlap and the fitting coefficients were normalized to the sum of the coefficients. The arginine region below (1620 cm−1, 1620 cm−1) was omitted from the fit. The spectra obtained from the fitting procedure are depicted in Fig. 3, along with the associated fitting coefficients for each cluster. Notably, the spectra, both in the parallel and perpendicular polarizations, demonstrate a balanced distribution of influence between clusters 1 and 3. This distribution corroborates the preliminary observations based on the visual inspection of the spectra.
In the fitted spectra for R5 bound to silica, for parallel polarization, the results show that cluster 1 exhibits the most significant weighting with cluster 4 and cluster 2 following with lower weights. Cluster 3 has no significant contribution. The results are similar for the spectra in perpendicular polarization. The contributions of clusters 1 and 4 to the R5–silica spectrum are approximately equivalent, each accounting for around 50% of the total spectra.
DSSP (define secondary structure of proteins) analysis of a structured ensemble can assign the secondary structure for each residue based on hydrogen bond patterns in the backbone.79
The secondary structure was assigned using DSSP with MDTraj80 for each 1 ns simulation. Besides a significant assignment to the random coil structure, there was a significant amount of defined secondary structures identified using DSSP analysis (see Fig. S1, ESI†). The structural contents in percentage were summed for both neat R5 and R5–silica, while weighted by the corresponding average fitting coefficients. Finally, the sum of neat R5 was subtracted from the sum of R5–silica to obtain the difference DSSP plot shown in Fig. 4b.
The results corroborate the Ramachandran analysis. There is a decrease in β-sheets and coils and a slight decrease in H-bond turns during silicification. In addition, an increase in bends and β-bridges (single pair of β-sheet hydrogen bonds) is observed.
It is important to note that DSSP analysis was developed for the analysis of large globular proteins.79 When exploring shorter peptides with unconventional and diverse structures, it is prudent to interpret these assignments with a grain of salt. One example is the assignment to a small amount of 310 helical motifs, which likely is based on a minority of very transient species. Nonetheless, it can be concluded that consistent with the Ramachandran plot, DSSP analysis suggests a shift toward more contracted structural poses when interacting with silica.
The change in the structure from β-strands to β-turns could be explained by the R5 backbone being more stretched out without silicate, while it bends and contracts around silicate molecules as they are added to the solution (Fig. 5).
The decrease in β-sheets during silicification could be caused by the silicate molecules disrupting the backbone H-bond network as also suggested by Lutz et al.27
In addition, DSSP analysis showed an increase in turns without H-bonds (bends), so possibly silicate binds to the residues that are assigned as β-turns without H-bonds since DSSP only counts H-bonds between backbone atoms.79
Our findings are in agreement with previous reports. The β-strand folds we identify in our work support previous findings based on solid-state NMR23,25 and SFG studies.22,27 Previous SFG and MD studies of interfacial R522,27 found a decrease in β-strands and an increase in the turn structure content during silicification, which is also seen in our data.
Furthermore, β-sheet formation in neat R5 has also been predicted by others using SFG.22,27 At the same time, the considerable extent of disordered structures we see in the 2D IR data is in agreement with the solution-state NMR results of Senior et al.26 However, in the latter study is it concluded that there are no ordered secondary structures in R5, and that the peptide stays disordered during silicification. Our data provide evidence that in addition to undefined secondary structures in R5 the peptide also folds into ordered and defined secondary structures and these structures change during silicification.
Footnote |
† Electronic supplementary information (ESI) available: Simulation details and Ramachandran plots. See DOI: https://doi.org/10.1039/d4cp00970c |
This journal is © the Owner Societies 2024 |