Giulio Caracciolo‡ a, Reihaneh Safavi-Sohi‡ b, Reza Malekzadeh‡ c, Hossein Poustchi‡ c, Mahdi Vasighi d, Riccardo Zenezini Chiozzi e, Anna Laura Capriotti f, Aldo Laganà f, Mohammad Hajipour b, Marina Di Domenico gh, Angelina Di Carlo i, Damiano Caputo j, Haniyeh Aghaverdi k, Massimiliano Papi j, Valentina Palmieri j, Angela Santoni a, Sara Palchetti a, Luca Digiacomo a, Daniela Pozzi a, Kenneth S. Suslick l and Morteza Mahmoudi *bk
aDepartment of Molecular Medicine, “Sapienza” University of Rome, Viale Regina Elena 291, 00161 Rome, Italy
bNanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran. E-mail: Morteza.mahmoudi@gmail.com
cDigestive Oncology Research Center, Digestive Disease Research institute, Tehran University of Medical Sciences, Tehran, Iran
dDepartment of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences, Zanjan, Iran
eDepartment of Chemistry, “Sapienza” University of Rome, P.le A. Moro 5, 00185 Rome, Italy
fDepartment of Biochemistry, Biophysics and General Pathology, Second University of Naples, Via S.M. Costantinopoli, 16, 80138 Naples, Italy
gDepartment of Biology, Temple University's College of Science and Technology, Philadelphia, USA
hDepartment of Medico-Surgical Sciences and Biotechnologies, “Sapienza” University of Rome, Viale del Policlinico 155, 00161 Rome, Italy
iUniversity Campus Bio-Medico di Roma, General Surgery, Via Álvaro del Portillo 200, 00128 Rome, Italy
jInstitute of Physics, Fondazione Policlinico Universitario A. Gemelli, IRCCS, Università Cattolica del Sacro Cuore, Rome, Italy
kDepartment of Anesthesiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
lDepartment of Chemistry, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, USA
First published on 17th June 2019
The earlier any catastrophic disease (e.g., cancer) is diagnosed, the more likely it can be treated, providing improved patient prognosis, extended survival and better quality of life. In early 2014, we revealed that various types of disease can substantially affect the composition/profile of the protein corona (i.e., a layer of biomolecules that forms at the surface of nanoparticles upon their interactions with biological fluids). Here, by combining the concepts of disease-specific protein corona and sensor array technology, we developed a platform with disease detection capacity using blood plasma. Our sensor array consists of three cross-reactive liposomes with distinct lipid compositions and surface charges. Rather than detecting a specific biomarker, the sensor array provides pattern recognition of the corona protein composition adsorbed on the liposomes. As a feasibility study, sensor array validation was performed using plasma samples obtained from patients diagnosed with five different cancer types (i.e., lung cancer, glioblastoma, meningioma, myeloma, and pancreatic cancer) and a control group of healthy donors. Although no single corona composition is specific for any one cancer type, overlapping but distinct patterns of the corona composition constitute a unique “fingerprint” for each type of cancer (with a high classification accuracy, i.e., 99.4%). Finally, to probe the capacity of this sensor array for early detection of cancers, we used cohort plasma obtained from healthy people who were subsequently diagnosed, several years after plasma collection, with lung, brain, and pancreatic cancers. Our results suggest that the disease-specific protein corona sensor array will not only be instrumental in the screening, detection, and identification of diseases, but may also help identify novel protein pattern markers whose role in disease development and/or disease biology has not been appreciated so far.
New concepts
In 2014, our group introduced the concept of “personalized”/“disease-specific” protein corona. Here, by combining the concepts of “disease-specific” protein corona and sensor array technology, we have created a platform for the detection and identification of diseases (five distinct human cancers were used as a model disease) ex vivo. The protein corona sensor array platform provides a library of corona compositions containing disease signatures. By analyzing the corona compositions of different nanoparticles using supervised classifiers, we created a unique protein corona pattern that was the “fingerprint” of each type of cancer. Our results revealed that, although no single protein corona composition from a single nanoparticle provides this “fingerprint” feature, the pattern of corona composition derived from the nanoparticle sensor array provides a unique “fingerprint” for each type of cancer. To probe the capacity of this platform for very early detection of cancers, we used cohort plasma obtained from healthy people who were later diagnosed with lung, pancreas, and brain cancers several years after plasma collection; the outcomes revealed that the approach could identify and discriminate the cancers. We expect that the protein corona sensor array may also prove useful for the diagnosis of other devastating diseases.
The ability to measure panels of specific and selective biomarker proteins has the potential to revolutionize cancer screening, detection and monitoring.16 Among emerging tools, transition metal complexes have recently found use as luminescent probes for the detection of protein biomarkers.17–19 Compared with organic dyes, their long-lived phosphorescence allows them to be distinguished from the autofluorescent background that is common in biological milieu. Moreover, as the phosphorescence of metal complexes changes with the local environment, they can act as chemosensors for a variety of analytes. Other promising approaches for cancer detection and staging are photoacoustic imaging20 and plasmonic biosensing.21
The use of sensor arrays has proven very sensitive, specific, robust, and versatile for the detection of a wide range of chemical and biological compounds, where specificity is derived from the pattern of response among an array of cross-reactive sensors rather than from individual sensors for specific (bio)molecules.22 The sensor array strategy has been used to successfully detect and differentiate among diverse families of analytes,23 various foods and beverages,24 pathogenic bacteria and fungi,25,26 biomolecules,27 and even nanoparticles.28
Here, we combined nanoparticle sensor-array technology, which offers improved accuracy without being limited to known disease biomarkers, with the protein corona concept and developed a label-free protein corona sensor array for early detection of diseases (here, five different types of cancer were selected as a disease model). The sensor array is composed of three different cross-reactive liposomes with various lipid compositions: (i) anionic liposomes made of DOPG (1,2-dioleoyl-sn-glycero-3-phospho-(1′-rac-glycerol)); (ii) cationic liposomes made of a binary mixture of DOTAP (1,2-dioleoyl-3-trimethylammonium-propane) and DOPE (dioleoylphosphatidylethanolamine); (iii) zwitterionic liposomes made of DOPC (dioleoylphosphatidylcholine) and cholesterol. Protein corona profiles were characterized by nano liquid chromatography tandem mass spectrometry (nano-LC MS/MS) after exposure to the plasma of patients diagnosed with five cancers: lung cancer, glioblastoma, meningioma, myeloma and pancreatic cancer. Although no single protein corona composition is specific for any one cancer type, we demonstrate that changes in the corona composition pattern could provide a unique “fingerprint” for each type of cancer. Finally, the nanoparticle sensor-array technology was validated using cohort plasma obtained from healthy people who were subsequently diagnosed with cancer several years after plasma collection.
Fig. 1 Protein corona sensor array profiles. (A) TEM images of liposomes with size distribution profiles. (B) Physicochemical properties of different liposomes before and after interactions with human plasma from patients with different diseases. DLS and zeta-potential data on various liposomes before interactions with human plasma and corona complexes (free from excess plasma) obtained following incubation with plasma from healthy subjects and cancer patients (Pdi: polydispersity index from cumulative fitting). (C) Classification of the identified corona proteins from sensor array elements according to their physiological functions in human plasma of healthy individuals and of patients having different types of cancers. (Complement proteins on the surface of cationic liposomes are shown here as an example; other protein categories, including coagulation, tissue leakage, lipoproteins, acute phase, immunoglobulins, and other plasma proteins, are shown in the ESI† Fig. S1A–G).
Quantitative evaluation of the total protein adsorbed onto the nanoparticles was performed via the BCA (bicinchoninic acid) or NanoOrange assays. The results confirmed significant differences in the amounts of adsorbed proteins after incubation in plasma derived from patients with various types of cancers; that is, the amount of adsorbed protein depended on the cancer type (Fig. 1B). The protein corona composition at the surface of the three liposomes was evaluated via liquid chromatography-tandem mass spectrometry (LC-MS/MS), in which the abundance of ∼1800 known proteins was defined (the full raw and analyzed data are provided in Excel files (1–3) in the ESI†). The contribution of individual proteins and their categories (i.e., complement, coagulation, tissue leakage, lipoproteins, acute phase, immunoglobulins, and other plasma proteins) to the corona composition was defined (Fig. 1C and ESI,† Fig. S1A–G). This result demonstrated significant associations between the protein composition and not only the cancer type but also the type of sensor element (i.e., type of liposome nanoparticle).
According to an extensive body of literature, there are strong relationships between cancer development and variations in protein classes: complement,30–33 coagulation,34–37 tissue leakage,38,39 lipoproteins,40–44 acute phase,45,46 and immunoglobulins.47–50 Therefore, the cross-reactive interactions of these protein categories with nanoparticles may provide unique “fingerprints” for each type of cancer, which would facilitate cancer identification and discrimination. Consequently, one would expect the protein corona sensor array to cross-reactively adsorb a wide range of proteins involved in cancer induction and development that could be used for cancer identification and discrimination. Aside from disease specific proteins, we have recently revealed that the variation of disease related metabolomes in protein solution (e.g., plasma) can substantially change the interaction site of proteins with nanoparticles and can therefore affect protein corona composition.51,52 As cancer development has a capacity to substantially alter the metabolomic composition of plasma,53–58 the cancer extracted plasma can substantially change the protein–nanoparticle interaction sites and therefore alter the protein corona composition.
PLS-DA and the counter-propagation artificial neural network (CPANN) were then applied to all samples and the selected variables as linear and nonlinear supervised classification approaches, respectively. In agreement with the linear PLS-DA results, the CPANN was also successful in precisely discriminating the six groups (five cancers plus healthy controls) using the selected 69 variables (Fig. 3C and D).
Fig. 3 Identification and discrimination of cancers using protein corona sensor arrays. (A and B) PLS-DA plots showing the separation of different cancerous samples from each other and from controls (n = 30 samples). (A) PLS score-plot obtained using the PLS-toolbox, projecting the objects into the subspace created by the 1st, 2nd, and 3rd latent variables of the model. (B) Objects displayed where the 4th and 5th latent variables of the model are shown. As can be seen, meningioma and glioblastoma cases were not separated in three dimensions appropriately, but they can be separated in the fourth and fifth dimensions of the PLS model. (C and D) Assignation map obtained by using the CPANN with all variables and selected variables. (C) Assignation map obtained by the training of a CPANN network (8 × 8 neurons) using the whole data set (1823 variables). The mapping quality is not good and there are conflicts of different types of cancer in terms of mapping. (D) Assignation map attained by the training of a CPANN network (8 × 8 neurons) using 69 variables. High-dimensional input vectors (samples) are mapped on a two-dimensional network of neurons, preserving similarity and topology. Colors indicate the similarity of a neuron to a specific type of input vector (class type). This map also demonstrates the importance of the predictor selection step and the effect of deletion of non-informative and irrelevant predictors on the model quality. (E and F) 51 proteins identified as capable of distinguishing among the six groups are presented in a ‘Heat Map’ generated using an unsupervised cluster algorithm (agglomerative HCA with furthest neighbor linkage). Visual inspection of both the dendrogram (E) and the heat map (F), based on the raw data of 69 important markers, demonstrates cancer-specific protein corona signature and clear clustering of six groups of samples (five groups of cancerous samples plus normal samples) and also expected similarities among five patients from each group. The heat map also indicates substantial differences in the patterns of variables (markers) of different cancers (each column represents a patient, and each row represents a protein). Higher and lower protein levels are indicated in red and green, respectively; the ID of 69 proteins in the heat map (right y-axis) variables, from top to bottom, are: 7, 1, 68, 8, 47, 36, 55, 37, 60, 48, 43, 50, 28, 51, 38, 3, 42, 58, 63, 46, 53, 31, 54, 17, 14, 44, 24, 21, 39, 40, 52, 5, 27, 11, 69, 65, 56, 57, 32, 16, 15, 13, 10, 26, 22, 62, 49, 6, 2, 41, 12, 45, 67, 59, 29, 4, 19, 64, 20, 33, 66, 61, 30, 23, 18, 35, 34, 25, and 9 (the protein names are provided in Table 1).
Next, to further verify and analyze the data, we took advantage of a nonlinear classification and mapping method. Visualizing the feature space can help us understand the hidden structures and topological relationships among the patterns. To reduce the dimensionality of the feature space while preserving the topological relations of the data structure, the CPANN (a supervised variant of self-organizing maps, SOMs) was used to learn and predict the class membership of the patterns, simultaneously producing a two-dimensional map of “neurons” (the processing units that compete and cooperate to learn the pattern information) and providing valuable information (using a nonlinear approach) about the data structure. Details of the CPANN are provided in the Methods section. Different sizes for the CPANN map were compared using 10-fold cross-validation; a map including 64 (8 × 8) neurons was chosen because it gave the minimum classification error (ESI,† Fig. S2C). Moreover, the topological structure of the data in the high-dimensional space is reflected in the assignation map produced by the CPANN (Fig. 3C). Considering the similarity of the neurons to the input vectors, the map can be partitioned into six distinct zones related to the different types of cancer and the control samples. Samples with the same class label are mapped onto nearby or the same neurons, which means that the selected variables provide valuable information for discriminating the samples in the feature space. The relative position and orientation of the six zones on the map contribute qualitative information on the similarities between types of cancers. To represent the effect of variable selection on the quality of mapping, another CPANN was trained using all 1823 variables, and the resulting map shows that the selected biomarkers (variables) play an important role in discriminating among cancer types and classifying them properly (Fig. 3C and D).
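The CPANN idea described above can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes a square map, a linearly decaying learning rate and neighborhood radius, and a simple triangular neighborhood function, and all function names (`train_cpann`, `predict`, `assignation_map`) are hypothetical. Each neuron holds an input (Kohonen) weight vector and an output weight vector with one entry per class; the winning neuron and its map neighbors are moved toward each training sample and its class label.

```python
import random


def _winner(weights, x):
    """Return the index of the neuron whose input weights are closest to x."""
    best, best_d = 0, float("inf")
    for n, w in enumerate(weights):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = n, d
    return best


def train_cpann(X, y, n_classes, side=4, epochs=50, seed=0):
    """Train a small counter-propagation network on samples X with labels y."""
    rng = random.Random(seed)
    dim = len(X[0])
    n_neurons = side * side
    # Input (Kohonen) layer and output (class) layer, one row per neuron.
    kohonen = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    output = [[1.0 / n_classes] * n_classes for _ in range(n_neurons)]
    order = list(range(len(X)))
    for epoch in range(epochs):
        frac = epoch / max(1, epochs - 1)
        lr = 0.5 * (1.0 - frac) + 0.01 * frac    # assumed linear decay
        radius = (side / 2.0) * (1.0 - frac)     # assumed shrinking neighborhood
        rng.shuffle(order)
        for i in order:
            wr, wc = divmod(_winner(kohonen, X[i]), side)
            for n in range(n_neurons):
                r, c = divmod(n, side)
                d = max(abs(r - wr), abs(c - wc))  # distance on the map grid
                if d > radius:
                    continue
                h = lr * (1.0 - d / (radius + 1.0))
                for k in range(dim):
                    kohonen[n][k] += h * (X[i][k] - kohonen[n][k])
                for k in range(n_classes):
                    target = 1.0 if k == y[i] else 0.0
                    output[n][k] += h * (target - output[n][k])
    return kohonen, output


def predict(kohonen, output, x):
    """Predict the class of x from the output weights of its winning neuron."""
    out = output[_winner(kohonen, x)]
    return max(range(len(out)), key=lambda k: out[k])


def assignation_map(output, side):
    """Label every neuron with its most similar class (a side x side grid)."""
    labels = [max(range(len(o)), key=lambda k: o[k]) for o in output]
    return [labels[r * side:(r + 1) * side] for r in range(side)]
```

In this sketch the assignation map is simply the grid of per-neuron class labels; zones of equal labels correspond to the class regions discussed in the text.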
On the basis of the obtained results, both linear and nonlinear models showed high accuracy, deduced from their acceptable specificity, sensitivity, and classification error values. Consistent with these findings, unsupervised clustering (HCA) based on the raw data of 69 markers was able to strongly distinguish various types of cancerous and control samples (Fig. 3E and F). As can be seen in Fig. 3, there is close similarity between the glioblastoma and meningioma groups of samples, implying difficulty in discrimination, most probably related to similar plasma proteomics patterns in these two brain cancers. These results reflect the fact that the plasma concentrations of many proteins in the corona differ considerably, not only among subjects with different types of cancers, but also among healthy individuals.
To illustrate the sensor array's capability for pattern recognition, a set of analyses was performed on the data matrix (all variables) obtained from individual nanoparticles. Importantly, the pattern of cancer-specific fingerprints could not be extracted solely from each class of liposome nanoparticle's PCF (ESI,† Fig. S4). As shown in ESI,† Fig. S4, no one class of liposomes could discriminate all 6 groups of samples as well as the composite response of the full array. The classification error using data obtained individually from anionic, cationic and neutral liposomes is 54%, 24% and 10%, respectively, whereas the combined pattern gave a classification error of only 3%. This substantial reduction in the classification error of the combined pattern is due to the power of the sensory part of the protein corona, which provides more proteomics data (even for one specific protein) for the classifier. The nano-sensor array with liposomes of different chemistries (cationic, anionic, and neutral), combined with pattern-recognition techniques, correctly discriminates not only cancerous from control samples, but also each type of cancer under consideration from the others. Notably, 62 proteins out of the 69 important variables are unique, because some of the selected proteins are present in the protein corona profiles of more than one liposome, confirming the key role of those same protein variations [e.g., FCN3 (Ficolin 3)] in different sensor elements. Another specific feature of sensor array technology is that it can substantially increase the data dimension of the proteomics outcomes compared with that of human plasma proteins. In other words, each protein provides one concentration in human plasma, whereas that same protein may provide several different concentrations in the protein corona profiles of various nanoparticles.
After the training of the CPANN, the importance and relevancy of the variables to the produced map can be investigated. A correlation analysis was performed between the assignation map of the CPANN and the 69 weight layers (weight maps) (Fig. 4). Six correlation coefficients (CCs) can therefore be obtained for each biomarker, and these values show the relevance of that biomarker to the control and cancer classes. A correlation coefficient ranges between −1 and 1, with negative and positive values indicating negative and positive linear correlations, respectively. CC values near 1 or −1 represent strong correlation and relevancy, whereas a CC value near zero means that there is a weak or non-significant correlation between the marker and the cancer type. Considering the CC values (ESI,† Table S3), several biomarkers, such as FCN3, CO4A, CO4B, CO7, and C4BPA, can easily be distinguished by their strong correlation with the pancreatic cancer zones on the assignation map; they have also been reported as pancreatic cancer biomarkers in the literature.60–62 Moreover, for lung cancer, APOH, CO6, CO8A, CO8G, KNG1, and VTNC correlate significantly with the CPANN assignation map as specific biomarkers.63
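As a rough illustration of the correlation analysis described above, the sketch below computes a Pearson correlation coefficient between one variable's weight map and a per-class map. It assumes that both maps are flattened to one value per neuron and that each class map holds the class membership (for example, the output-layer weight or a 0/1 zone indicator) of every neuron; this representation and the function names are assumptions for illustration, not details taken from the paper.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant map carries no correlation information
    return cov / sqrt(var_a * var_b)


def class_correlations(weight_map, class_maps):
    """Return one CC per class for a single variable's flattened weight map."""
    return {label: pearson(weight_map, cmap) for label, cmap in class_maps.items()}
```

With six class maps (five cancers plus controls), `class_correlations` returns six CCs per variable, matching the count given in the text.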
The high specificity of the selected markers for discriminating among the five groups of cancers, which derives from our protein corona sensor array approach, demonstrates an acceptable level of correlation with the work now under way in the complex cancer proteomics space; therefore, this strategy not only provides a basis for cancer prediction but also translates that promise into reality. It is noteworthy that the discrimination between different cancer groups occurs as a result of the pattern of response of several predictors (and not individual biomarkers) that change simultaneously in a systematic manner, forming patterns unique to each specific type of cancer. On the basis of this evidence, the most informative predictors selected by the proposed model that have not already been reported as cancer-specific biomarkers may have great potential as new diagnosis biomarker candidates. It is noteworthy that the protein corona layer provides different protein concentration compared to the plasma proteins. This means that increasing concentration of cancer specific biomarkers in plasma may not lead to higher participation of that specific protein in the corona composition. However, variation of these cancer specific proteins together with other metabolomic variations may substantially change the interactions of other proteins with the surface of nanoparticles which results in the formation of disease-specific protein corona. To define the role of corona specific proteins in cancer development, the variation and functionality of these promising candidates together with their associated metabolomic pathways in cancer patients should be carefully monitored. By focusing on the unique patterns derived from huge numbers of subjects via a set of informative predictors, researchers should be able to predict cancers at different stages more accurately which is not possible using current methods.
To allow for unbiased classification and prediction of cohort samples, we used two approaches: first, the discriminatory power of the 69 important variables was checked for the cohort samples. Because 15 variables (proteins) out of the 69 markers were absent from the proteomics profile of our protein corona sensor array of cohort samples, classification was performed based on the 54 existing markers and the amount of 15 absent variables in the cohort data matrix was considered zero. Despite such defects and missing markers in the cohort data matrix, both linear and nonlinear models provided proper separation for three groups of cohort samples with reasonable statistics (38% classification error in 10-fold cross validation) (Fig. 5A and C). Second, the cohort samples were classified separately, i.e., not compared with the library of the protein corona sensor array for previous fresh samples. In this regard, the informative markers were selected based on the cohort protein corona profiles in a similar manner as mentioned earlier, and then linear and nonlinear classification approaches were evaluated. Interestingly, the cohort samples could be discriminated by employing both linear and nonlinear classification models using only 8 markers with excellent statistics (the classification error minimized to zero using 8 variables). All detailed results are provided in the ESI,† Table S2, and Fig. 5B and D. As shown in Fig. 5, the cohort samples were significantly discriminated in the score plot of both PLS-DA and the CPANN map.
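The zero-filling step in the first approach can be sketched as follows. This is a minimal illustration that assumes each sample's corona profile is stored as a mapping from protein name to abundance; the function names and the example values used with them are hypothetical.

```python
def align_to_markers(profile, markers):
    """Order one sample's abundances by the marker list; absent markers become 0."""
    return [float(profile.get(marker, 0.0)) for marker in markers]


def build_matrix(profiles, markers):
    """Build a samples x markers data matrix with zero-filled missing markers."""
    return [align_to_markers(profile, markers) for profile in profiles]


def absent_markers(profiles, markers):
    """List the markers that do not appear in any sample of the data set."""
    seen = set()
    for profile in profiles:
        seen.update(profile)
    return [marker for marker in markers if marker not in seen]
```

Keeping the column order fixed by the marker list lets a classifier trained on the original library be applied to the cohort matrix without retraining, at the cost of treating absent proteins as having zero abundance.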
In summary, we have developed a disease-specific protein corona sensor array platform for disease detection using plasma samples. Our sensor array differs from other known sensor arrays that involve individual sensors that detect specific biomolecules. In the present sensor array, the biomolecules do not have to be known, as the system does not rely on the presence or absence of specific biomolecules or amounts of specific disease (here, cancer) markers. This new sensor array detects changes in the composition of the biomolecule coronas associated with different liposome nanoparticle sensor elements. This ability to detect changes in the patterns of the biomolecule corona composition associated with each sensor element allows one to determine a unique biomolecule fingerprint that can differentiate the health or disease states of subjects with high accuracy. As we demonstrated very recently,51,66 variation of other plasma biomolecules (e.g., metabolomes) can substantially change the protein corona composition. This shows that the patterns presented by disease-specific protein coronas should not solely be composed of disease biomarkers, as other disease-specific features (e.g., metabolome variations) can substantially affect the composition of corona around nanoparticles. Using partial least squares discriminant analysis, we were able to discriminate among five cancers and healthy patients with >99% accuracy (n = 90). Results of the cohort samples revealed that the biomolecular fingerprint can even determine a pre-disease state in a subject who will develop one of three cancers at a later time, with an accuracy of >99% (n = 45). This is a significantly different approach to diagnosis compared to systems that detect specific biomarkers associated with a disease or disorder. The present sensor is able to detect a disease early in its development; in other words, it can pre-diagnose the disease before any specific symptoms appear. 
It is likely that the sensitivity of the protein corona sensor array can be further increased by the addition of more sensor elements (more nanoparticles). The number and score of the protein patterns introduced for cancer detection in this feasibility study will also change (and become more robust) as the numbers/types of patients and/or sensor array elements increase. It is also noteworthy that this system needs a very large number of patient plasma samples in order to approach ∼0% false results, as any false disease prediction may cause great anxiety and unnecessary medical procedures for patients. Besides cancers, the protein corona sensor array may also prove useful for the diagnosis of a wide range of other devastating diseases, where very early detection can significantly improve patients’ survival and quality of life.
:1 molar ratio), and DOPC–Chol (1:1 molar ratio) by dissolving appropriate amounts of the lipids in chloroform:methanol (9:1, v/v). The chloroform:methanol mixture was evaporated via rotary evaporation. Lipid films were kept under vacuum overnight and hydrated with 10 mmol l−1 phosphate-buffered saline (PBS) (pH 7.4) to a final lipid concentration of 1 mg ml−1. The liposome suspensions obtained were sized by extrusion through a 50 nm polycarbonate filter using an Avanti Mini-Extruder (Avanti Polar Lipids, Alabaster, AL).
000 healthy subjects, over 1000 of whom went on to develop various types of cancers in subsequent years. Samples from five individuals per cancer were used in this study.67 These important plasma samples provide us the unique opportunity to probe the capacity of our innovative protein corona sensor array for early detection of cancers.
:1 v/v) for 1 hour at 37 °C. Subsequently, samples were centrifuged at 14 000 rpm for 15 minutes at 4 °C to pellet liposome–HP complexes. The resulting pellet was washed three times with phosphate-buffered saline (PBS) and resuspended in ultrapure water. For size and zeta-potential measurements, 10 μl of each sample was diluted with 990 μl of distilled water. All size and zeta-potential measurements were performed at room temperature using a Zetasizer Nano ZS90 system (Malvern, UK) equipped with a 5 mW HeNe laser (wavelength λ = 632.8 nm) and a digital logarithmic correlator. The distribution of the particle diffusion coefficient $D$ is derived from a deconvolution of the measured intensity autocorrelation function of the sample. $D$ is converted into an effective hydrodynamic radius $R_\mathrm{H}$ by using the Stokes–Einstein equation, $R_\mathrm{H} = k_\mathrm{B}T/(6\pi\eta D)$, where $k_\mathrm{B}T$ is the thermal energy and $\eta$ is the solvent viscosity. The electrophoretic mobility of the samples, $u$, was measured via laser Doppler electrophoresis. The zeta-potential was calculated by using the Smoluchowski relation, $\zeta = u\eta/\varepsilon$, where $\eta$ and $\varepsilon$ are the viscosity and the permittivity of the solvent phase, respectively. Size and zeta-potential of liposome–HP complexes are given as mean ± standard deviation (S.D.) of five independent measurements.
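The two conversions above are simple enough to check numerically. The sketch below evaluates the Stokes–Einstein and Smoluchowski relations in SI units; the default temperature (298.15 K), water viscosity (8.9 × 10⁻⁴ Pa s) and relative permittivity (78.5) are assumed typical values for water at 25 °C, not values reported in the text, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12    # vacuum permittivity, F/m


def hydrodynamic_radius(diffusion, temperature_k=298.15, viscosity=8.9e-4):
    """Stokes-Einstein: R_H = k_B T / (6 pi eta D), all quantities in SI units."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity * diffusion)


def zeta_potential(mobility, viscosity=8.9e-4, rel_permittivity=78.5):
    """Smoluchowski: zeta = u eta / epsilon, with epsilon = eps_r * eps_0."""
    return mobility * viscosity / (rel_permittivity * EPS_0)
```

For example, under these assumptions an electrophoretic mobility of 1 × 10⁻⁸ m² V⁻¹ s⁻¹ corresponds to a zeta-potential of roughly 13 mV.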
:1 v/v) for 1 hour at 37 °C. Afterwards, liposome–HP complexes were pelleted at 15 000 × g for 15 minutes at 4 °C and washed three times with PBS. The washed pellet was resuspended in 8 mol l−1 urea and 50 mmol l−1 NH4HCO3. Ten microliters of each sample were added to five wells of a 96-well plate. Protein quantification was performed by adding 150 microliters per well of protein assay reagents (Pierce, Thermo Scientific, Waltham, MA, USA). The multiwell plate was shaken and incubated at room temperature for 5 minutes. Absorbance was measured at 660 nm using the GloMax Discover System (Promega, Madison, WI, USA). Background effects were corrected, and the protein concentration was calculated using the standard curve. Results are given as mean ± S.D. of five independent replicates.
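The standard-curve calculation can be sketched as an ordinary least-squares line fitted to blank-corrected standards, followed by conversion of the blank-corrected sample absorbances. This assumes a linear response over the working range and a single blank value subtracted from every well; the function names and any numbers used with them are illustrative only.

```python
from statistics import mean, stdev


def fit_line(concentrations, absorbances):
    """Least-squares fit of absorbance = slope * concentration + intercept."""
    mx, my = mean(concentrations), mean(absorbances)
    sxx = sum((x - mx) ** 2 for x in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(sample_absorbances, blank, slope, intercept):
    """Convert raw absorbances to concentrations after subtracting the blank."""
    return [((a - blank) - intercept) / slope for a in sample_absorbances]


def summarize(values):
    """Mean and sample standard deviation of replicate measurements."""
    return mean(values), stdev(values)
```

The standards passed to `fit_line` are assumed to be blank-corrected already, so that the same correction is applied to standards and samples.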
:1 v/v) for 1 hour at 37 °C. Samples were centrifuged at 14 000 × g for 15 min to pellet liposome–HP complexes. It is noteworthy that while bare liposomes cannot be collected via centrifugation at 14 000 × g, the formation of a protein corona at the surface of liposomes changes their physicochemical properties so that the complexes can be collected at this centrifugation rate.8,71–83 The pellet was washed three times with 10 mmol l−1 Tris HCl (pH 7.4), 150 mmol l−1 NaCl, and 1 mmol l−1 EDTA. After washing, the pellet was air dried and resuspended in the digestion buffer. Digestion and peptide desalting were carried out as previously described.84 In brief, the pellet was resuspended in 40 μl of 8 mol l−1 urea and 50 mmol l−1 NH4HCO3 (pH = 7.8). Afterwards, the protein solution was reduced with 2 μl of 200 mmol l−1 DTT, alkylated with 8 μl of 200 mmol l−1 IAA and then treated again with 8 μl of 200 mmol l−1 DTT. Finally, the sample solution was diluted with 50 mmol l−1 NH4HCO3 to obtain a final urea concentration of 1 mol l−1 and digested overnight with 2 μg of trypsin at 37 °C. The enzymatic reaction was stopped by adding TFA. The digested peptides were desalted using the SPE C18 column, reconstituted with a suitable volume of a 0.1% formic acid solution, and stored at −80 °C in labeled Protein LoBind tubes for no more than one week until analysis. Digested peptides were analyzed by nano-high-performance liquid chromatography (HPLC) coupled to tandem mass spectrometry (MS/MS). NanoHPLC MS/MS analysis was carried out using a Dionex Ultimate 3000 system (Dionex Corporation, Sunnyvale, CA, USA) directly connected to a hybrid linear ion trap-Orbitrap mass spectrometer (Orbitrap LTQ-XL, Thermo Scientific, Bremen, Germany) using a nanoelectrospray ion source. Peptide mixtures were enriched on a 300 μm ID × 5 mm Acclaim PepMap 100 C18 precolumn (Dionex Corporation, Sunnyvale, CA, USA), employing a premixed mobile phase made up of ddH2O/ACN, 98/2 (v/v) containing 0.1% (v/v) HCOOH, at a flow-rate of 10 μl min−1. Peptide mixtures were then separated via reversed-phase (RP) chromatography.
The largest set of peptides was detected using a 3 hour optimized LC gradient composed of mobile phase A of ddH2O/HCOOH (99.9/0.1, v/v) and mobile phase B of ACN/HCOOH (99.9/0.1, v/v). MS spectra of eluting peptides were collected over an m/z range of 350–1700 using a resolution setting of 60 000 (full width at half-maximum at m/z 400), operating in the data-dependent mode. MS/MS spectra were collected for the five most abundant ions in each MS scan. Further details can be found elsewhere.84 For each experimental condition, three independent samples (biological replicates) were prepared, each of which was measured in triplicate (technical replicates), yielding nine measurements for each experimental condition. RAW data files were submitted to Mascot (v1.3, Matrix Science, London, UK) using the Thermo-Finnigan LCQ/DECA RAW file data import filter to perform database searches against the non-redundant Swiss-Prot database (09-2014, 546 000 sequences, Homo sapiens taxonomy restriction). For the database search, trypsin was specified as the proteolytic enzyme with a maximum of two missed cleavages. Carbamidomethylation was set as the fixed modification of cysteine, whereas oxidation of methionine was chosen as the variable modification. The monoisotopic mass tolerance for precursor ions and fragmentation ions was set to 10 ppm and 0.8 Da, respectively. Charge states of 2+ or 3+ were selected for precursor ions. Proteome output files were submitted to the commercial software Scaffold (v3.6, Proteome Software, Portland, Oregon, USA). Peptide identifications were validated if they surpassed a 95% probability threshold set by the PeptideProphet algorithm. Protein identifications were accepted if they could be established at >99.0% probability and contained at least two unique peptides. Proteins that contained shared peptides and could not be differentiated on the basis of MS/MS analysis alone were grouped to satisfy the principles of parsimony. Unweighted spectrum counts (USCs) were used to assess the consistency of biological replicates in quantitative analysis, and normalized spectrum counts (NSCs) were used to retrieve protein abundance.
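The protein acceptance criteria stated above (>99.0% probability and at least two unique peptides) can be expressed as a simple filter. In the sketch below, the record fields and function names are hypothetical, and `relative_abundance` is only a plain percentage-of-total stand-in for illustration; it is not the normalization performed by Scaffold and should not be read as the definition of NSC used in this work.

```python
def accept_proteins(records, min_probability=0.99, min_unique_peptides=2):
    """Keep identifications above the probability cut with enough unique peptides."""
    return [
        r for r in records
        if r["probability"] > min_probability
        and r["unique_peptides"] >= min_unique_peptides
    ]


def relative_abundance(records):
    """Illustrative stand-in: each protein's spectrum count as a % of the total."""
    total = sum(r["spectrum_count"] for r in records)
    if total == 0:
        return {r["name"]: 0.0 for r in records}
    return {r["name"]: 100.0 * r["spectrum_count"] / total for r in records}
```

Applying the filter before any normalization keeps low-confidence identifications from influencing the abundance values of the accepted proteins.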
An approach based on the VIP score was developed to identify the best subset of variables. VIP scores can be calculated by performing PLS-DA on the dataset. In this approach, the VIP scores of the variables are calculated 50 times, each time using a random partition into training and validation sets (random training sets were selected iteratively by considering 80-percent coverage of each class of objects). Considering the most important variables, i.e., those with large VIP-score values (>2), the top 200 variables can be selected at each repetition and added to the top-variables pool. Afterward, a frequency of occurrence ($\mathrm{Freq}_i$) and an average VIP score ($\overline{\mathrm{VIP}}_i$) for each variable can be obtained from the top-variables pool. Thus, the selection of variable $i$ (a high $\overline{\mathrm{VIP}}_i$ value and a low $\mathrm{Freq}_i$ value) is less recommended than that of variable $j$ (high values for both $\overline{\mathrm{VIP}}_j$ and $\mathrm{Freq}_j$), because the selection of variable $i$ depends more on the training and validation sets than that of variable $j$. Therefore, the $\overline{\mathrm{VIP}}_i$ value of each variable can be weighted by $\mathrm{Freq}_i$, and the most relevant variables can be ranked using the weighted $\overline{\mathrm{VIP}}_i$. Fig. 4A is a schematic diagram of the proposed approach. Selection of the most relevant variables to build the classification model can be guided by the obtained ranking as follows: the highly ranked variables were added one by one to the dataset, and the classification error of PLS-DA was calculated to find the minimum number of relevant predictors (Fig. 1A).
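The ranking and forward-selection steps can be sketched as follows. The sketch takes the VIP scores of each repetition as given (computing them requires a PLS-DA implementation, which is outside its scope) and makes two assumptions that the text leaves open: a variable enters the pool in a repetition only if its VIP exceeds the threshold and it is among the `top_k` largest, and the weighting is the product of the pooled mean VIP and the fraction of repetitions in which the variable was selected. The function names and the `error_fn` callback are hypothetical.

```python
def rank_by_weighted_vip(vip_runs, threshold=2.0, top_k=200):
    """Rank variables by mean pooled VIP weighted by selection frequency."""
    n_runs = len(vip_runs)
    pooled = {}  # variable index -> VIP values from runs where it was selected
    for vips in vip_runs:
        candidates = [i for i, v in enumerate(vips) if v > threshold]
        candidates.sort(key=lambda i: vips[i], reverse=True)
        for i in candidates[:top_k]:
            pooled.setdefault(i, []).append(vips[i])
    scores = {}
    for i, values in pooled.items():
        freq = len(values) / n_runs            # Freq_i as a fraction of runs
        mean_vip = sum(values) / len(values)   # average VIP within the pool
        scores[i] = mean_vip * freq
    ranking = sorted(scores, key=lambda i: scores[i], reverse=True)
    return ranking, scores


def smallest_best_subset(ranking, error_fn):
    """Add ranked variables one by one; keep the shortest prefix with minimum error."""
    best_n, best_err = 0, float("inf")
    for n in range(1, len(ranking) + 1):
        err = error_fn(ranking[:n])  # e.g. cross-validated classification error
        if err < best_err:
            best_n, best_err = n, err
    return ranking[:best_n], best_err
```

Because `smallest_best_subset` only replaces the current best on a strict improvement, ties in classification error are resolved in favor of the smaller predictor set.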
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9nh00097f
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2019