Drishya Rajan Parachalil*ab, Brenda Brankin c, Jennifer McIntyre a and Hugh J. Byrne a
aFOCAS Research Institute, Dublin Institute of Technology, Kevin Street, Dublin 8, Ireland. E-mail: drishyarajan.parachalil@mydit.ie
bSchool of Physics and Optometric & Clinical Sciences, Kevin Street, Dublin 8, Ireland
cSchool of Biological Sciences, Dublin Institute of Technology, Kevin Street, Dublin 8, Ireland
First published on 8th October 2018
This study explores the potential of Raman spectroscopy, coupled with multivariate regression techniques and a protein separation technique (ion exchange chromatography), to quantitatively monitor diagnostically relevant changes in high molecular weight proteins in liquid plasma. Measurement protocols to detect the imbalances in plasma proteins as an indicator of various diseases using Raman spectroscopy are optimised, such that strategic clinical applications for early stage disease diagnostics can be evaluated. In a simulated plasma protein mixture, concentrations of two proteins of identified diagnostic potential (albumin and fibrinogen) were systematically varied within physiologically relevant ranges. Scattering from the poorly soluble fibrinogen fraction is identified as a significant impediment to the accuracy of measurement of mixed proteins in solution, although careful consideration of pre-processing methods allows construction of an accurate multivariate regression prediction model for detecting subtle changes in the protein concentration. Furthermore, ion exchange chromatography is utilised to separate fibrinogen from the rest of the proteins and mild sonication is used to improve the dispersion and therefore quality of the prediction. The proposed approach can be expeditiously employed for early detection of pathological disorders associated with high or low plasma/serum proteins.
In this paper, the sensitivity of Raman spectroscopy to subtle changes in the concentrations of a simulated plasma protein mixture is explored, specifically for the higher molecular weight proteins. Albumin is the most abundant plasma protein, normally constituting about 50% of the plasma protein, and has a molecular weight of 66 kDa.14 The normal concentration of albumin in the human body is 30 mg mL−1, although it decreases dramatically in critically ill patients and does not increase again until the recovery phase of the illness.15 Several studies have demonstrated that the functions of albumin, such as ligand binding and transport of various molecules, can be applied to the treatment of cirrhotic patients and patients suffering from other end stage liver diseases.16–18 Closely monitoring the variation in albumin concentration could therefore act as an indicator of liver diseases and other related pathologies. Fibrinogen is a 340 kDa (0.4% in human plasma) dimeric plasma glycoprotein synthesised by the liver, and it plays a major role in blood coagulation.19 The normal concentration of fibrinogen in the human body is ∼3 mg mL−1, and any variation in this concentration can be an indicator of disease states.20–22 Many clinical studies have consistently shown elevated levels of fibrinogen in patients with cardiovascular disease and thrombosis.23–25
The conventional test kits available in a hospital for plasma/serum analysis suffer from long delays in the availability of results, due to the need for specialised laboratories, which may in turn delay therapy and prolong patient anxiety. The potential of vibrational spectroscopy techniques coupled with multivariate analysis techniques has previously been investigated for a range of clinical applications.1–9,26–29 This paper evaluates the potential of Raman spectroscopy as a diagnostic tool to detect minute changes in the plasma protein concentrations in aqueous samples and explores the challenges to such liquid based biopsy techniques, including sample scattering and fractionation of individual constituent components.
A simulated plasma protein mixture of high and low molecular weight proteins, i.e. albumin, fibrinogen, cytochrome c and vitamin B12, at physiologically relevant concentrations, was prepared and variations were made to these concentrations over physiologically relevant ranges. Separation of proteins in the solution was performed by ion exchange chromatography to separate high molecular weight proteins from low molecular weight proteins, and high molecular weight fraction proteins from each other. The efficiency of data pre-processing methods (rubberband and Extended Multiplicative signal Correction (EMSC)) in removing the background, to build an accurate prediction model, was explored and mild sonication was used to improve the dispersion of fibrinogen. The standardisation of measurement protocol and other experimental parameters is detailed and the results of concentration dependence study of proteins, in isolation and protein mixtures, and the chemometric methods used to build the prediction model are presented. This study presents a systematic assessment of some of the challenges presented by measurements of high molecular weight protein mixtures, and some potential solutions to improve the protocols of liquid biopsy monitoring using Raman spectroscopy.
000g for 5 minutes. The unbound material was present in the supernatant and was transferred to a fresh tube. The pellet was washed using 2 mL glycine buffer by repeated inversion, followed by centrifugation at 14 000g for 5 minutes. The supernatant that contains the fibrinogen was carefully transferred to a fresh centrifuge tube and Raman analysis was performed.
Fig. 1 presents the spectra of the fingerprint region of the stock solutions of proteins recorded in the inverted geometry. The raw spectra of the proteins were baseline corrected using the rubberband method and smoothed using the Savitzky–Golay algorithm (polynomial 5, window 13). Measurement in the inverted geometry, using a water immersion objective, is found to be the best instrumental set up that enables an increase in the overall spectral intensity accompanied by an improved signal to noise (S/N) ratio with small sample volume.
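The two pre-processing steps named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `rubberband_baseline` and `savgol_smooth` are hypothetical, the rubberband step is assumed to be the common "lower convex hull" variant, and the smoothing pads the spectrum edges by repeating the end values, which may differ from the edge handling used in the study. Only the polynomial order (5) and window (13) are taken from the text.

```python
import numpy as np

def rubberband_baseline(x, y):
    """Baseline as the lower convex hull of (x, y); x must be strictly increasing."""
    hull = []  # indices of the hull vertices, left to right
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # Cross product <= 0 means point b lies on or above the chord a -> i,
            # so b cannot belong to the lower hull.
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    # Stretch the "rubber band" between hull vertices by linear interpolation.
    return np.interp(x, x[hull], y[hull])

def savgol_smooth(y, window=13, order=5):
    """Savitzky-Golay smoothing via a least-squares polynomial fit in each window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the weights for the fitted value at offset 0.
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(y, half, mode="edge")  # assumption: edge values are repeated
    return np.convolve(padded, weights[::-1], mode="valid")

# Synthetic example (not measured data): one band on a sloping background.
x = np.linspace(400.0, 1800.0, 701)
raw = 0.002 * x + 1.0 + np.exp(-0.5 * ((x - 1000.0) / 10.0) ** 2)
corrected = raw - rubberband_baseline(x, raw)
smoothed = savgol_smooth(corrected, window=13, order=5)
```

Because the lower hull can never rise above the data, the corrected spectrum stays non-negative. A hull-based baseline cannot, however, follow a background that curves upwards between bands, which is one plausible reason such a method may struggle with strongly scattering mixtures.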
The spectra of albumin and fibrinogen shown in Fig. 1 clearly reveal the common Raman peaks of these two proteins. These include the amide I band around ∼1659 cm−1, a relatively sharp band at 1003 cm−1 associated with phenylalanine, intense bands at ∼1336 cm−1 and ∼1450 cm−1 due to C–H deformation, and a vibration band at ∼940 cm−1 related to C–C stretching mode backbone of α-helix structure. The signature peaks of albumin that differentiate it from fibrinogen are bands at 899 cm−1 and 1102 cm−1, that can be related to ν(CC) and ν(CN).34 The signature peaks of fibrinogen are sharp bands observed at 758 cm−1 and 1552 cm−1 that can be assigned to tryptophan.35 Raman bands of cytochrome c and vitamin B12 are highly specific and can be easily distinguished, as evidenced in Fig. 1.36,37
The optimum number of latent variables for the model is determined from the percent variance explained by the latent variables and the minimum value of RMSECV. The PLSR coefficient plot displayed in Fig. 2C confirms that the correlation of the data in Fig. 2D is based on albumin features, such as the peaks at ∼1665 cm−1, ∼1448 cm−1 and ∼1337 cm−1. After selecting the optimum number of components for the data set analysed, a predictive model is built from the PLSR analysis (Fig. 2D), to compare the known concentrations of albumin in the samples with the concentrations estimated from the spectral data sets. Fig. 2D indicates that a good linear model could be obtained with the raw data set. However, the PLSR coefficient is not a clean albumin spectrum and has a large background due to scattering, indicating that scattering could have influenced the model. Furthermore, the minimum value of RMSECV was found to be 22.59 mg mL−1, indicating a poor accuracy of prediction over the range 5 mg mL−1 to 50 mg mL−1. Analysis of the raw albumin concentration dependence serves as an initial illustration of some of the issues presented by measurement of high molecular weight macromolecules in solution. Appropriate pre-processing steps could help to minimise the background from scattering effects. Hence, rubberband pre-processing steps were performed on the data set before PLSR analysis and the model obtained is displayed in Fig. 3.
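The selection of the number of latent variables by minimum RMSECV can be sketched as follows. This is an illustrative, NumPy-only PLS1 (NIPALS-type) regression with leave-one-out cross-validation; the cross-validation scheme, the function names `pls1_fit`, `rmsecv` and `simulate_spectra`, and the synthetic data (one band scaling with concentration, a random offset standing in for the scattering background, and noise) are all assumptions, not details taken from the study.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression; returns (coefficients, mean spectrum, mean concentration)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight: covariance of each channel with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # spectral loading
        c = (yk @ t) / tt             # concentration loading
        Xk = Xk - np.outer(t, p)      # deflate spectra and concentrations
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression ("PLSR coefficient") vector
    return coef, x_mean, y_mean

def rmsecv(X, y, n_components):
    """Root mean square error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_components)
        errors.append((X[i] - x_mean) @ coef + y_mean - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def simulate_spectra(n_samples=20, n_channels=200, seed=0):
    """Hypothetical data: band intensity proportional to concentration plus offset."""
    rng = np.random.default_rng(seed)
    conc = np.linspace(5.0, 50.0, n_samples)            # mg/mL, range as in the text
    band = np.exp(-0.5 * ((np.arange(n_channels) - 100) / 5.0) ** 2)
    offset = rng.normal(0.0, 5.0, size=n_samples)       # stand-in for background
    noise = rng.normal(0.0, 0.05, size=(n_samples, n_channels))
    return conc[:, None] * band[None, :] + offset[:, None] + noise, conc

X, y = simulate_spectra()
scores = {k: rmsecv(X, y, k) for k in range(1, 6)}
best = min(scores, key=scores.get)   # number of latent variables with minimum RMSECV
```

Plotting the returned coefficient vector against wavenumber gives the analogue of the PLSR coefficient plots discussed here: if it resembles the analyte spectrum, the prediction rests on analyte bands; if it is dominated by a broad background, it does not.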
Fig. 3A shows the albumin data set after background correction using the rubberband method. Fig. 3B shows the percent variance explained by the latent variables, indicating that three components accounted for the majority of the variance. Five latent variables were chosen for this model and the resultant PLSR coefficient exhibits strong albumin features, as shown in Fig. 3C. A linear predictive model can be defined from the rubberband corrected data set of varying concentration of albumin in water (Fig. 3D). The RMSECV was found to be 1.58 mg mL−1 after applying the rubberband pre-processing steps to the same data set, compared with 22.59 mg mL−1 for the raw data. The results suggest that there is a significant improvement in the predictive capacity of the constructed model when rubberband pre-processing steps are applied to the data set.
Simulated “pathological” plasma protein mixtures were prepared by varying the concentration of albumin over the physiologically relevant range from 5 mg mL−1 to 50 mg mL−1, while maintaining the concentrations of fibrinogen, cytochrome c and vitamin B12 constant at the concentrations of “healthy” human plasma. The concentrations for hypoalbuminemia (<30 mg mL−1) and hyperalbuminemia (>30 mg mL−1) have been deliberately included in the set of samples prepared. Based on the results of Fig. 2, rubberband correction was applied to the dataset in an attempt to improve the accuracy of the prediction by performing baseline correction. Notably, the Raman spectral features of the protein mixture were seen to decrease with increasing albumin concentration (Fig. S1A in ESI†), and the PLSR coefficient obtained from this data shows inverse albumin features (Fig. S1C†). This indicates that the model built from this dataset is not reliable, as the high degree of scattering is affecting the dataset and the prediction model is not based on the albumin features. Hence, the EMSC based algorithm was applied to the data set in an attempt to eliminate the scattering associated with the albumin data in the simulated plasma and subsequently improve the prediction model. EMSC of polynomial order 4 was performed on the data set of varying concentration of albumin in the simulated plasma protein mixture. The reference used for EMSC is a spectrum of albumin diluted with a minimum amount of water, recorded at 532 nm.
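The idea behind an EMSC-type background correction with a reference spectrum and a fourth order polynomial can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm used in the study: each spectrum is modelled as a scaled reference plus a polynomial background, fitted by ordinary least squares, and only the polynomial part is subtracted so that the concentration-dependent band intensity is retained. The function name `emsc_background_subtract` and the synthetic reference are hypothetical.

```python
import numpy as np

def emsc_background_subtract(wavenumbers, spectrum, reference, order=4):
    """Fit spectrum ~ b * reference + polynomial(order); subtract the polynomial.

    Returns the background-corrected spectrum and the reference coefficient b.
    """
    # Rescale the wavenumber axis to [-1, 1] to keep the polynomial fit well conditioned.
    span = wavenumbers.max() - wavenumbers.min()
    v = 2.0 * (wavenumbers - wavenumbers.min()) / span - 1.0
    poly = np.vander(v, order + 1, increasing=True)    # columns 1, v, v^2, ...
    design = np.column_stack([reference, poly])
    coeffs, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    background = poly @ coeffs[1:]
    return spectrum - background, float(coeffs[0])

# Synthetic example (not measured data): two bands on a curved background.
wn = np.linspace(400.0, 1800.0, 500)
reference = (np.exp(-0.5 * ((wn - 1003.0) / 8.0) ** 2)
             + 0.6 * np.exp(-0.5 * ((wn - 1450.0) / 15.0) ** 2))
v = 2.0 * (wn - wn.min()) / (wn.max() - wn.min()) - 1.0
measured = 3.0 * reference + 0.5 + 0.3 * v + 0.2 * v ** 2
corrected, b = emsc_background_subtract(wn, measured, reference, order=4)
```

Fitting the reference and the polynomial together is the key design choice: the polynomial is prevented from absorbing intensity that is better explained by the protein bands, which a baseline fitted without a reference cannot guarantee.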
Fig. 4A displays the albumin spectra after performing background correction using the EMSC algorithm. The amide I band at 1665 cm−1 and the CH2 deformation band at 1445 cm−1 can be clearly seen in the corrected spectra. Based on the percentage variance explained by the latent variables (Fig. 4B) and the minimum value of RMSECV, seven latent variables were found to be optimal for this model. The PLSR coefficient shows albumin features (Fig. 4C), indicating that the prediction is now based on the variation in the albumin peak intensity. A linear prediction model was obtained (Fig. 4D). The minimum value of RMSECV is 1.5844 mg mL−1, indicating an improved prediction capacity. This value is the same as the minimum value of RMSECV recorded for the varying concentration of albumin in distilled water, indicating that the PLSR model of EMSC corrected simulated plasma spectra is as accurate as the PLSR model of rubberband corrected spectra of varying concentrations of pure albumin in water. The results of this section suggest that this model can be effectively used to detect variations in the concentration of albumin in human plasma that result, for example, from liver disorders at an early stage. The strong reduction in the RMSECV indicates that the EMSC algorithm can efficiently subtract the background without altering the albumin features, which in turn improves the prediction of the model.
In an attempt to overcome the lack of solubility of the protein, the stock solution was ultrasonicated to enhance the dispersion of fibrinogen and obtain a clear solution. Ultrasonication for approximately 10 seconds at 30% amplitude resulted in a clear solution of fibrinogen with a significantly improved Raman signal (Fig. S4 in ESI†). Varying concentrations of fibrinogen samples in the physiologically relevant range were prepared using the ultrasonicated fibrinogen stock.
The spectrum of sonicated fibrinogen after background correction using the EMSC algorithm with polynomial of order 3 displays strong fibrinogen features with higher intensity over the same concentration range, compared to the non-sonicated fibrinogen samples (Fig. 5A). Applying PLSR, it is clear from Fig. 5B that a total of six components made significant contributions to explain the variance in the sonicated fibrinogen spectra. Based on the percent variance explained, six latent variables were used to build the prediction model. The PLSR coefficient plot shows signature peaks of fibrinogen, indicating that the prediction was based on variation in the fibrinogen spectral intensities (Fig. 5C). A linear prediction model was defined from the data set, showing correlation between the Raman peak intensity and concentration (Fig. 5D). The minimum value of RMSECV is found to be 0.0615 mg mL−1. The reduction in the RMSECV value recorded for fibrinogen data after sonication indicates that the accuracy of the model increases as a result of the improved solubility following sonication. Hence, it can be concluded that sonication improves the solubility of the fibrinogen and increases the spectral intensity, in turn leading to a considerable improvement in the predictive capacity of the model.
A simulated “pathological” plasma protein mixture was prepared by varying the concentration of fibrinogen stock over the physiologically relevant range from 0.5 mg mL−1 to 5 mg mL−1, while maintaining the concentrations of albumin, cytochrome c and vitamin B12 constant at the normal concentrations in healthy human plasma. The concentrations for heart disorders (>3 mg mL−1) and liver disorders (<3 mg mL−1) have been deliberately included in the concentration range. The raw spectra of varying concentrations of fibrinogen in simulated plasma were smoothed using the Savitzky–Golay algorithm (polynomial 5, window 13) (Fig. 6).
Fig. 6 Smoothed spectra of varying concentration of fibrinogen in simulated plasma (0.5 mg mL−1 to 5 mg mL−1). The arrow indicates the order of increasing concentration.
The arrow indicates that both the background and spectral features themselves decrease with increasing concentration of fibrinogen. However, noting that albumin is the dominant contributor to the Raman signal, and that fibrinogen is the dominant scatterer, this can be understood as a (fibrinogen) concentration dependent loss of (albumin) Raman scattering.
The PLSR coefficient obtained after pre-processing the data using the EMSC based algorithm shows an inverse spectrum of albumin rather than fibrinogen, as shown in Fig. S5 in ESI.† As in the case of the water dispersions, the dominant effect of increasing concentrations of the poorly soluble fibrinogen is the scattering of the dominant Raman spectrum. Hence, although the predictive model built from this dataset shows a good correlation with fibrinogen concentration, it is not based on the characteristic spectroscopic signature of fibrinogen, and the variation of the albumin signal could equally be due to any other scatterer.
Ultracentrifugation using 100 kDa centrifugal filters failed to separate fibrinogen from the rest of the protein in the protein mixture. Fig. S6† shows that the Raman spectrum of the concentrate obtained has pronounced characteristic albumin features at 899 cm−1 and 1102 cm−1. Ion exchange chromatography was therefore explored as an alternative method for separating fibrinogen from the protein mixture, based on its charge. Carboxymethyl-cellulose acts as a weak cation exchanger and fibrinogen is eluted by altering the net charge of the bound protein, and thus its matrix binding capacity. Fibrinogen was detected in the unbound fraction. Albumin was not detected in the unbound fraction by Raman spectroscopy, and it is concluded that adsorption of the albumin fraction to the carboxymethyl cellulose resin occurred at the pH values employed. Other studies have shown that carboxymethyl cellulose may form insoluble complexes with serum albumin.40
Fibrinogen was extracted from the protein mixtures over the full concentration range, and Raman spectra were recorded from the separated fibrinogen and EMSC was performed on the data set before doing PLSR analysis. In the absence of sonication the prediction model performed poorly, due to the high degree of scattering, as seen in Fig. S7.† Mild sonication can be employed to improve the solubility of and reduce the scattering from fibrinogen, and thus the performance of the prediction model.
The spectrum of sonicated fibrinogen separated by ion exchange chromatography after background correction using the EMSC algorithm displays strong fibrinogen features. In Fig. 7B, it is clear that nine components made significant contributions to the variance in the sonicated fibrinogen spectra. The minimum value of RMSECV is found to be 0.0568 mg mL−1. The PLSR coefficient plot shows the signature peaks of fibrinogen (Fig. 7C), indicating that the linear prediction model obtained was based on the correlation between the Raman spectral intensities of fibrinogen and concentration (Fig. 7D). Hence, it can be concluded that ion exchange chromatography can successfully separate fibrinogen for Raman analysis from the protein mixture within 30 minutes and an accurate prediction model can be built from the Raman data to detect subtle changes in the fibrinogen concentration. Early detection of fibrinogen concentration could help to prevent disorders that are associated with increased fibrinogen level in plasma such as thromboembolism,41 various cardiovascular events and post-surgical arterial reocclusion.42
Raman analysis in the inverted geometry using a water immersion objective is found to be the optimal method to record well defined spectra with minimal background, and notably samples of volumes as low as 1 μL can be measured. In a sample set of varying concentrations over physiologically relevant ranges, the albumin contributions to the spectrum dominate over those of the water, and, after minimal preprocessing, PLSR can be employed to establish a regression model whose predictive performance shows a close correlation between the concentrations of the proteins and the Raman spectral profile. However, in the more complex simulated plasma mixture of proteins, improved data preprocessing techniques are required to account for the increased spectral background.
Although the broad background to Raman spectra is often attributed to fluorescence, this cannot be the case for materials which are nonresonant at the Raman source wavelength. Proteins such as albumin and fibrinogen can, however, contribute to stray Mie scattered light by causing diffusely scattered radiation that is not well collimated by the collection objective of the Raman microscope, enters the spectrometer effectively as stray light, and is dispersed across the detector.21 The rubberband pre-processing method appeared to efficiently remove the background from the data set of varying concentration of albumin in water, but failed to satisfactorily deal with the background of varying concentrations of albumin in the simulated plasma protein mixture. The more sophisticated EMSC based algorithm helped eliminate the scattering associated with the albumin data in the simulated plasma, improving the prediction model, and also helped to extract the spectral features of fibrinogen from water. In both cases, before subtraction, the primary effect of varying the protein concentrations was to decrease the contribution of the dominant Raman scatterer, which can be understood in terms of the presence of the poorly soluble, highly Mie scattering fibrinogen component. The proposed method can be efficiently used to detect albumin as a standard biomarker for diseases associated with hypoalbuminemia (<30 mg mL−1), such as liver diseases, gastrointestinal protein loss and edema, and with hyperalbuminemia (>30 mg mL−1), such as severe dehydration and abnormal increase in body fat.43,44 The accuracy of the proposed method is comparable to that of the most commonly used method for detecting albumin in biological fluids, the enzyme linked immunosorbent assay (ELISA),45,46 which is sensitive and selective but is very time consuming and requires extensive sample preparation steps.
In varying concentrations of fibrinogen in aqueous solution, the Raman signal of the water itself is diffusely scattered, increasingly so with increasing fibrinogen concentration, and thus the PLSR identifies a decreasing Raman contribution of water as the dominant concentration dependent effect. In the case of albumin in the simulated protein mixture, a concentration dependent Mie scattering of the Raman signal of albumin itself is the dominant effect of increasing albumin concentration. While one would expect a linear concentration dependent increase in the Raman signal of albumin, the inability of the ultracentrifugation technique to separate the two high molecular weight proteins may suggest an interaction between the albumin and fibrinogen, such that increased albumin Raman scattering is overwhelmed by increased Mie scattering.
Mild sonication is seen to improve the dispersion of fibrinogen in aqueous solutions, and significantly improve the Raman signal. Removing the water contribution using EMSC is seen to significantly improve the predictive model (Fig. 5).
Separation of the fibrinogen from the plasma protein mixture by ion exchange chromatography, and application of the ultrasonication technique to reduce aggregation, helped to detect fibrinogen features from the plasma solution even at a concentration as low as 0.5 mg mL−1. The RMSECV of 0.0568 mg mL−1 compares favourably with similar observations, for example for attenuated total reflection – Fourier transform infrared absorption monitoring of glucose in blood serum.47 The accuracy of this study is close to that of the most commonly used gold-standard method, i.e. the Clauss assay, which has a detection limit of ∼0.4 mg mL−1.48 The Clauss assay is relatively time consuming and suffers from inconsistencies in the results due to calibration standards, methodologies and variation in the reagents from various manufacturers.41 These steps are relevant only in the case of human plasma and can be avoided when working with human serum, as fibrinogen is absent in the serum. The optimised protocol can be applied to detect low abundance proteins in bodily fluids after depletion of the abundant proteins to reduce the spectral variability. Such studies are currently under way and the results are promising.
Ion exchange chromatography is a quick method to separate the proteins from each other by altering their net surface charge, making it an ideal tool for separating all the protein constituents and a better alternative to ultracentrifugation. In this case, ultracentrifugation failed to separate HMWF proteins from one another, as they tend to form hydrophobic bonds and nonspecific binding interactions with the membrane material (Fig. S6†). However, the ion exchange chromatographic method has to be tailored to the specific protein, depending on its charge, and cannot be applied as a ‘one-for-all’ separation kit for all the proteins.
It has been shown that measurement in the inverted geometry using a water immersion objective yields high quality spectra and the sample volume can be as small as 1 μL. This experimental set up is advantageous for clinical purposes where the volumes of patient samples are minimal. In the simulated plasma protein mixture, the poorly soluble fibrinogen component was seen to obscure the systematic variations of the protein concentrations, due to the high degree of scattering. Extraction of the fibrinogen by ion exchange chromatography is seen to be more specific than by ultracentrifugal filtration, such that the variations of fibrinogen levels themselves can be quantified. In general, the scattering problems caused by fibrinogen favour the use of blood serum for the analysis of the remaining lower molecular weight fractions.
However, to further ensure the relevance and consistency of these results, experiments need to be carried out in pooled plasma/serum. The use of Raman spectroscopy coupled with chemometric techniques goes beyond a mere estimate of whether the protein levels are high or low and provides accurate quantification. Once appropriate experimental methods are established, a point-of-care device for spectroscopic analysis of body fluids in real clinical applications can be envisaged. The proposed approach can be expeditiously employed for early detection of pathological disorders associated with high or low plasma proteins.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8an01701h
This journal is © The Royal Society of Chemistry 2018