Circular dichroism for secondary structure determination of proteins with unfolded domains using a self-organising map algorithm SOMSpec

Many proteins and peptides are increasingly recognised to contain unfolded domains or populations that are key to their function, whether in ligand binding or material assembly. We report an approach to determining the secondary structure of proteins with suspected significant unfolded domains or populations using our neural network approach, SOMSpec. We proceed by derandomizing spectra, that is, removing fractions of the random coil (RC) spectrum prior to secondary structure fitting, and then regenerating the α-helical and β-sheet contents for the experimental proteins. Application to bovine serum albumin (BSA) spectra as a function of temperature proved straightforward, whereas lysozyme and insulin present hidden challenges. The importance of being able to interrogate the SOMSpec output to understand the best matching units used in the predictions is illustrated with lysozyme and insulin: the partially melted proteins proved to have significant βII content, the CD spectrum of which looks the same as that of a random coil.


SELCON output for BSA at 100 °C, 0% RC
The structure fitting output for BSA at 100 °C with 0% RC (random coil) removed, obtained using SELCON3 (the self-consistent method) with the SP175 reference dataset via the DichroWeb server 1, is given in Table S1.

SOMSpec secondary structure prediction output for BSA
The SOMSpec secondary structure predictions for BSA as a function of temperature at each percentage of RC content subtracted are shown in Figure S1. Figure S1a is for the original protein, whereas Figures S1(b-j) are for the regenerated proteins obtained by adding back the fraction of RC removed during derandomization.
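The derandomization and regeneration arithmetic can be illustrated with a minimal sketch. It is not taken from SOMSpec itself: the function names `derandomize` and `regenerate`, the rescaling by 1/(1 - f), the assignment of the added-back fraction to an "other" class, and the synthetic spectra are all assumptions made for illustration. The sketch assumes spectra are in per-residue units on a common wavelength grid and that f is the fraction of RC removed.

```python
import numpy as np


def derandomize(spectrum, rc_spectrum, f):
    """Remove a fraction f of RC signal from a spectrum.

    Assumption: the remainder is rescaled by 1/(1 - f) so that it is again
    expressed per residue of the non-RC part of the protein.
    """
    if not 0.0 <= f < 1.0:
        raise ValueError("f must satisfy 0 <= f < 1")
    spectrum = np.asarray(spectrum, dtype=float)
    rc_spectrum = np.asarray(rc_spectrum, dtype=float)
    return (spectrum - f * rc_spectrum) / (1.0 - f)


def regenerate(predicted, f, rc_label="other"):
    """Add the removed RC fraction back to predicted structure fractions.

    Assumption: each predicted fraction is scaled by (1 - f) and the removed
    fraction f is assigned to the class named by rc_label.
    """
    regenerated = {name: (1.0 - f) * value for name, value in predicted.items()}
    regenerated[rc_label] = regenerated.get(rc_label, 0.0) + f
    return regenerated


# Synthetic, purely illustrative spectra on a 240-190 nm grid (1 nm steps).
wavelengths = np.arange(240, 189, -1)
folded = np.sin(wavelengths / 10.0)   # stand-in for the folded part
coil = np.cos(wavelengths / 7.0)      # stand-in for the RC spectrum
mixture = 0.7 * folded + 0.3 * coil   # "experimental" spectrum with 30% RC

recovered = derandomize(mixture, coil, 0.3)
print(np.allclose(recovered, folded))

# Hypothetical fractions predicted for the derandomized spectrum.
contents = regenerate({"helix": 0.6, "sheet": 0.1, "other": 0.3}, 0.3)
print(contents)
```

Under these assumptions the regenerated fractions still sum to one, and the helix and sheet contents of the experimental protein are simply the predicted values scaled by (1 - f).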

SOMSpec secondary structure prediction output for Lysozyme
The SOMSpec secondary structure predictions for lysozyme as a function of temperature at each percentage of RC content subtracted are shown in Figure S2. Figure S2a is for the original protein, whereas Figures S2(b-j) are for the regenerated proteins obtained by adding back the fraction of RC removed during derandomization.

SOMSpec secondary structure prediction output for insulin
The SOMSpec secondary structure predictions for insulin as a function of temperature at each percentage of RC content subtracted are shown in Figure S3. Figure S3a is for the original protein, whereas Figures S3(b-j) are for the regenerated proteins obtained by adding back the fraction of RC removed during derandomization.

Figure S7 shows the overlay of the model spectra and experimental spectra for BSA for each fraction of RC removed over the temperature range of 20 °C to 100 °C in 10 °C steps.

Figure S9 shows the overlay of the model spectra and experimental spectra for insulin for each fraction of RC removed over the temperature range of 20 °C to 110 °C in 10 °C steps.

11) Ensure that the input files for SOMSpec are in txt format and contain no column labels or wavelength data. Note that these files must have a wavelength range corresponding to that of the reference set to be used. The step size does not matter.

SOMSpec model and experimental spectra for insulin
12) Launch SOMSpec and click on the Train module to train it with a reference set of secondary structures. Select the training set file named "SP175_full_240-190_5Pplus random a& 100 helix2" (txt format) by clicking the "select training set file" push button. Then specify the input parameters; for this work these were map size (50x50), number of iterations (50,000), number of structures (5), number of best matching units (5), and wavelength range (240-190 nm). Check that the numbers on the right-hand side of the window mirror the input numbers. Then hit the Train SOM button to train the self-organizing map. The module will output a pretrained SOM folder named <SOM-50-50,000>, which you can rename to indicate the identity of the reference set used to train it.
13) Once you have a trained map, you can use it again by entering the input parameters in the Train tab as above before moving to the Predict module (tab), where you select your previously trained map.
14) Switch to the Predict module and click the select button to choose the pretrained SOM folder (SOM-50-50,000) containing the trained maps, set parameters, and training set size information. Click on the "select input spectra" button to choose the data file named, e.g., "BSA Tm wavelength (0.1-0.9)", and check the <disable scaling of spectra> box. Then hit the run prediction button to predict spectra and secondary