FTIR microspectroscopy for rapid screening and monitoring of polyunsaturated fatty acid production in commercially valuable marine yeasts and protists †

The increase in polyunsaturated fatty acid (PUFA) consumption has prompted research into alternative resources other than fish oil. In this study, a new approach based on focal-plane-array Fourier transform infrared (FPA-FTIR) microspectroscopy and multivariate data analysis was developed for the characterisation of some marine microorganisms. Cell and lipid compositions in lipid-rich marine yeasts collected from the Australian coast were characterised in comparison to a commercially available PUFA-producing marine fungoid protist, thraustochytrid. Multivariate classification methods provided good discriminative accuracy, as evidenced by (i) separation of the yeasts from thraustochytrids and distinct spectral clusters among the yeasts that conformed well to their biological identities, and (ii) correct classification of yeasts from a totally independent set using cross-validation testing. The findings further indicated an additional capability of the developed FPA-FTIR methodology, when combined with partial least squares regression (PLSR) analysis, for rapid monitoring of lipid production in one of the yeasts during the growth period, which was achieved with high accuracy compared to the results obtained from traditional lipid analysis based on gas chromatography. The developed FTIR-based approach, when coupled to programmable withdrawal devices and a cytocentrifugation module, would have strong potential as a novel online monitoring technology suited to bioprocessing applications and large-scale production.


Introduction
In recent years, omega-3 (n−3) polyunsaturated fatty acids (PUFAs), particularly docosahexaenoic acid (DHA; 22:6n−3) and eicosapentaenoic acid (EPA; 20:5n−3), have become increasingly popular in the nutraceutical arena due to their important roles in brain function and the prevention of cardiovascular diseases, as well as in maintaining good health. 1 This has led to a rapid increase in PUFA consumption, based mainly on fish. Other potential alternative resources not relying on fish stocks have been the subject of active research. 3-5 As a consequence, these marine microbes have received extensive attention in the fields of biotechnology and biodiesel due to their potential to form the basis of a viable industry supplying, on a large scale, vegetative biomass containing oils rich in PUFAs.
In particular, recent studies have shown that pigmented marine-derived yeasts of the genus Rhodotorula are capable of accumulating a high lipid content, including essential PUFAs, 6 and of growing at a high rate under optimised culture conditions, thus providing a rapid increase in biomass. 7 Such characteristics are crucial for large-scale production, and the yeasts therefore promise to play key roles in modern biotechnology. In this study, Rhodotorula species were collected off the coast near Queenscliff (Victoria, Australia), and molecular identification was carried out using 18S rDNA gene sequence analysis after strain isolation. 8 Each of the four strains of Rhodotorula sp. selected for this study possesses a distinctive colour, varying from pale yellow to orange, pink and red tones. The colours arise from pigments, which are produced to screen wavelengths of light that can potentially damage the cell. 9 The traditional identification of yeasts is based mainly on morphology and on physiological tests that determine enzyme production profiles and growth characteristics; these involve an intensive use of reagents and are cumbersome as well as time consuming.
7][18] Our present study further demonstrates the potential to discriminate strains of novel marine yeasts from thraustochytrids using chemometric approaches developed on the basis of FTIR spectral data. The technique is fast, non-destructive and requires only minimal sample preparation. In practice, the marine microorganisms can be directly examined as intact cells. 10,19 This results in highly accurate analyses of the chemical compositions of the whole cells, which can lead to a better understanding and optimisation of PUFA production in these cultured microorganisms. Focal plane array (FPA) FTIR imaging has proven to be very powerful for the rapid acquisition of thousands of spectra and their collection into one spectral image within minutes, compared to the hours required to acquire the same number of spectra by single-point measurements. By applying multivariate data analysis to the thousands of spectra collected simultaneously from a monolayer of cells, complex information on the chemical variation within cell populations can be rapidly assessed for identification, classification, and quality control standardisation purposes. Furthermore, there is potential for direct quantification of the PUFAs produced in the cells.
In this study, we report applications of FPA-FTIR microspectroscopy combined with multivariate data analysis methods, including principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), soft independent modelling of class analogy (SIMCA) and hierarchical cluster analysis (HCA), for discrimination and classification of the newly isolated Rhodotorula yeast strains in comparison to a commercially available PUFA-producing thraustochytrid that has been used in the commercial production of vegetative n−3 PUFAs, and is used here as a standard for assessing the potential of these marine yeasts. In addition, partial least squares regression (PLSR) analysis using the FPA-FTIR spectral datasets and their lipid profiles acquired from the traditional gas chromatography (GC) technique was applied to monitor the production of unsaturated fatty acids (UFAs) and total lipids in a Rhodotorula strain grown in an optimised glucose medium. The optimal UFA and lipid contents were then compared to those of the control in a nutrient medium without glucose, and of thraustochytrids grown under a recommended culture condition. The accuracy of the PLSR calibration models was subsequently tested using a cross-validation approach based on two independent replicate datasets, in order to evaluate the capability and the potential of the developed technique as a rapid lipid analyser of cultured cells.

Materials and reagents
Four marine-derived yeast strains genetically identified as Rhodotorula sp. were collected from the Queenscliff region in Victoria, Australia in August 2010. 8 A commercial Thraustochytrium sp. AH-2 (PRA-296™) was procured from ATCC (Manassas, VA, USA) and grown under the recommended optimised condition for this specific thraustochytrid strain according to the product information sheets. All chemicals used in this study were of analytical grade and purchased from Sigma-Aldrich (Australia) and Merck Chemicals (Australia).

Biological methodology
As illustrated in Fig. 1, the isolation and screening of the Rhodotorula sp. were performed simultaneously in five biological replicates, which were prepared in five independent growth cultures (i.e. different cultivation flasks) under the same controlled conditions based on the recently published procedure. 8 The growth study and the GC-based fatty acid analysis of each microorganism were performed and observed in triplicate according to the published protocols, 8 whilst FPA-FTIR measurements were performed in duplicate using the other two biological replicates.
In brief, the yeast samples from the original sea water and sediments were directly placed into 50 mL polyethylene Falcon tubes containing penicillin and streptomycin (300 mg L⁻¹ each), and kept on ice prior to laboratory use. Suspensions were spread on Petri plates containing an agar medium prepared using 1 g yeast extract, 1 g peptone, 2 g glucose and 10 g agar in 1 L of Instant Ocean™ artificial seawater (Aquarium Systems, Mentor, OH) and the same combination of antibiotics (i.e. 300 mg L⁻¹ each of penicillin and streptomycin), prior to incubation at 25 °C for 5 days. After that, the colonies were picked and subcultured on agar plates to ensure purity.
Fig. 1 Flow diagram of the experimental procedure used in the study, including biological and FPA-FTIR microspectroscopic methodologies followed by spectral pre-processing, prior to the multivariate data analysis. The number of spectra mentioned in the figure represents the total number of spectra remaining after each processing step.
To grow the yeast isolates for lipid production, an optimised liquid medium was prepared by adding 10 g yeast extract, 15 g peptone and 30 g glucose to 1 L of the artificial seawater. A nutrient medium without glucose, to be used as a control, was prepared by adding 15 g beef extract, 15 g yeast extract and 5 g peptone to 1 L of the artificial seawater. The prepared growth media were autoclaved at 121 °C for 20 min and then brought to room temperature prior to use. Cultures of the four yeast strains, labelled as AMCQ10C, AMCQ12C, AMCQ1D and AMCQ8A, were then inoculated from their agar plates into autoclaved 250 mL Erlenmeyer flasks containing 50 mL of the sterile media. The culture solutions of each yeast isolate were collected on a daily basis and the growth was observed in terms of cell concentration using a Bright-Line™ haemocytometer (Sigma-Aldrich, New South Wales, Australia). The harvested yeast cells were subsequently preserved in 5% formalin in an isotonic saline solution. The onset of the stationary phase, at which optimal lipid accumulation has been observed in a broad range of marine eukaryotic and prokaryotic cells, 20 was found to occur on day 5 for these yeast isolates after the cells were sub-cultured into the liquid media.
Thraustochytrium sp. AH-2 (PRA-296™) was grown in 50 mL of ATCC recommended medium #2673, prepared by adding 1 g yeast extract, 15 g peptone and 20 g glucose to 1 L of the artificial seawater. The medium was autoclaved and allowed to cool to room temperature. Thraustochytrids were then grown and harvested at the onset of the stationary phase, which was found to occur on day 7 following inoculation of the culture, according to observation of the cell concentration and the optical density (OD) at 600 nm measured at regular time intervals using a UV-visible absorption spectrophotometer (Model UV-1800, Shimadzu Scientific Instruments, Japan). Similarly, the harvested thraustochytrium cells were preserved in 5% formalin in an isotonic saline solution prior to use for FPA-FTIR measurements.

FPA-FTIR microspectroscopy methodology
The formalin solution was initially discarded from the preserved cells by decanting the supernatant after centrifugation of the cell suspension at 5000 × g and 4 °C. The cell pellet was then washed with sterile isotonic saline solution and centrifuged twice before being deposited using a cyto-centrifuge (Cytospin-III, Thermo Fisher Scientific, MA, USA) to produce monolayers of cells on IR reflective glass slides (MirrIR slides, Kevley Technologies, OH, USA). The films were left to dry in a desiccator containing silica gel beads for ca. 30 min to dehydrate the cells, prior to FTIR spectral data collection. The particular advantage of this cytocentrifugation instrument lies in its disposable filter pads, which absorb the salt solution from the cell suspension during deposition of the cell monolayer; as a result, no residual salt crystals, which can lead to strong scattering artefacts in collected spectra, were observed in this study.
In addition, it should be noted that the main purpose of using the formalin fixation method is to preserve the cell content and thus minimise its degradation, particularly that of the PUFAs that are prone to oxidation, from the time the cells were harvested until the FTIR spectral datasets were acquired. There are, of course, macromolecular changes produced by the fixation in this step, especially those associated with cross-linking in proteins. However, previous studies have shown that these changes are largely confined to the amide I modes, with little or no effect on lipid modes. 21
FPA-FTIR spectra were collected using an FTIR microscope (Model 600 UMA, Agilent Technologies, Santa Clara, CA, USA), equipped with a liquid-N₂ cooled 64 × 64 element Stingray FPA detector (Agilent Technologies) and a 15× objective lens, coupled to an FTIR spectrometer (Model FTS 7000, Agilent Technologies). Spectra were collected in reflectance mode in the 4000-800 cm⁻¹ spectral region as a single FTIR image covering a sampling area of 350 × 350 µm². Each FTIR spectral image consisted of a 32 × 32 array of spectra resulting from binning the signal from each square of 4 detectors on the 64 × 64 element FPA array. As a consequence, a single spectrum contained in an FTIR image represented molecular information acquired from a ca. 10.9 × 10.9 µm² area on the sample plane, which was equivalent to the average size of one single yeast cell (i.e. 10 µm diameter), whilst a few single spectra could be obtained from the same thraustochytrium cell because their size was on average twice that of the yeast cells. For each biological replicate, at least five high-quality FTIR spectral images were collected at 8 cm⁻¹ resolution, with 128 co-added scans, Blackman-Harris 3-Term apodization, Power-Spectrum phase correction and a zero-filling factor of 2, using Resolution Pro™ IR imaging software (Agilent Technologies). Background measurements were performed prior to each sample spectral image measurement, by focusing on a clean unused surface of the substrate using the same acquisition parameters.
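The pixel size quoted above follows directly from the field of view and the binning scheme. As a small worked check (a sketch only; the function and variable names are illustrative and not taken from the acquisition software):

```python
# Effective sample area per spectrum after 2 x 2 binning of a 64 x 64 FPA.
# All numeric values are those stated in the text; names are illustrative.
FIELD_OF_VIEW_UM = 350.0   # side length of the imaged area, in micrometres
DETECTOR_ELEMENTS = 64     # detector elements per side of the FPA
BIN_FACTOR = 2             # each 2 x 2 square of detectors is binned


def binned_pixel_size(fov_um: float, elements: int, bin_factor: int) -> float:
    """Side length (in µm) of the sample area represented by one binned spectrum."""
    spectra_per_side = elements // bin_factor
    return fov_um / spectra_per_side


pixel_um = binned_pixel_size(FIELD_OF_VIEW_UM, DETECTOR_ELEMENTS, BIN_FACTOR)
spectra_per_image = (DETECTOR_ELEMENTS // BIN_FACTOR) ** 2
```

With these inputs each image holds 32 × 32 spectra, and one spectrum covers a square of roughly 10.9 µm per side, which matches the figure given in the text.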

Spectral pre-processing
FTIR spectra embedded in each spectral image were first extracted and quality-screened using CytoSpec™ v. 1.4.02 (Cytospec Inc., Boston, MA, USA). Two criteria were selected for the quality screening test. The first concerned an appropriate sample thickness, assessed from the absorbance over the 950-1750 cm⁻¹ spectral range, and removed spectra with a maximum absorbance of less than 0.2 or greater than 0.8. The second criterion, aimed at retaining high-quality spectra, was a minimum signal-to-noise (S/N) ratio of 150, measured using the signal and the noise over the spectral ranges of 1600-1760 and 1800-1900 cm⁻¹, respectively. The absorbance and S/N ratio figures used in these two criteria were set based on previous experience in our laboratory with spectra acquired from monolayers of biological cells using FPA-FTIR microspectroscopy in a similar optical setting. In our experience, these cut-off values eliminate noisy spectra and those from regions of the sample where the substrate may be only partially covered with cells, as well as those possessing very high absorbance outside the linearity range of the detector. As a result, the quality-screening procedure ensured that only spectra of high quality (i.e. good S/N ratio) were used for further analysis.
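The two screening criteria can be expressed compactly in code. The following is a minimal sketch, not the CytoSpec implementation: it assumes that "signal" is the maximum absorbance in the 1600-1760 cm⁻¹ window and that "noise" is the standard deviation in the 1800-1900 cm⁻¹ window, neither of which is specified in the text. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def values_in_range(wavenumbers: Sequence[float], absorbance: Sequence[float],
                    low: float, high: float) -> List[float]:
    """Absorbance values whose wavenumber lies in [low, high] (cm^-1)."""
    return [a for w, a in zip(wavenumbers, absorbance) if low <= w <= high]


def passes_quality_screen(wavenumbers: Sequence[float],
                          absorbance: Sequence[float],
                          min_abs: float = 0.2, max_abs: float = 0.8,
                          min_snr: float = 150.0) -> bool:
    """Apply the thickness and S/N criteria described in the text."""
    thickness_band = values_in_range(wavenumbers, absorbance, 950.0, 1750.0)
    signal_band = values_in_range(wavenumbers, absorbance, 1600.0, 1760.0)
    noise_band = values_in_range(wavenumbers, absorbance, 1800.0, 1900.0)
    if not thickness_band or not signal_band or len(noise_band) < 2:
        return False
    # Criterion 1: maximum absorbance between 0.2 and 0.8 (sample thickness).
    if not (min_abs <= max(thickness_band) <= max_abs):
        return False
    # Criterion 2: S/N of at least 150.
    # ASSUMPTION: signal = band maximum, noise = population standard deviation.
    noise = statistics.pstdev(noise_band)
    if noise == 0.0:
        return True  # a perfectly flat noise window cannot fail the S/N test
    return max(signal_band) / noise >= min_snr
```

A spectrum is kept only if both criteria hold, so thin or partially covered regions fail the first test and noisy pixels fail the second.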
Aer the quality test, averaging of every 64 spectra was performed on the raw spectra that passed the prior quality-test screening criteria to further improve the quality of the spectra and to produce spectra most representative of the sample population, before spectral pre-treatment and further analysis.It should be noted that although the spectral averaging procedure reduces the spatial discriminatory features among spectra in the same image set, the trade-off is the improvement of the model robustness and classication performance as a result of high quality spectral input.In this light, the FPA-FTIR technique provides a key advantage over single-point data acquisition, through its unique capability of efficient spectral selection to remove spectra of poor quality including those possessing low S/N ratio, signal saturation and scattering artefacts, and subsequently for the generation of pristine average spectra.
In each cell strain, the representative average spectra (approximately 30-50 spectra) from the two replicates were combined and converted to 2nd derivatives using a 9-point Savitzky-Golay algorithm to eliminate the broad baseline offset and curvature. 22 The resultant derivative spectra were corrected by the extended multiplicative scatter correction (EMSC) method 23 in the spectral regions 3100-2800 and 1780-965 cm⁻¹, which contain the molecular information relevant to most biological samples (i.e. protein, lipid, carbohydrate and nucleic acid signals). In essence, the EMSC algorithm removes light-scattering artefacts and normalises the spectra, accounting for pathlength differences. The EMSC pretreatment often yields better interpretability, more robust calibration models, and thereby improved predictive accuracy, because the EMSC-corrected spectra respond more linearly to the analyte concentration than untreated spectra.
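To illustrate why a Savitzky-Golay 2nd derivative suppresses baseline offset and slope, here is a minimal sketch using the standard 9-point convolution coefficients for a quadratic/cubic fit. The polynomial order is an assumption, since the text specifies only the window size, and the EMSC step is not reproduced here.

```python
from typing import List, Sequence

# Standard 9-point Savitzky-Golay 2nd-derivative coefficients for a
# quadratic/cubic polynomial fit (ASSUMPTION: the polynomial order is not
# given in the text). The normalisation factor is 462.
SG9_D2 = [28, 7, -8, -17, -20, -17, -8, 7, 28]
SG9_NORM = 462


def second_derivative_sg9(y: Sequence[float], spacing: float = 1.0) -> List[float]:
    """2nd derivative of evenly spaced data; the 4 points at each edge are dropped."""
    half = len(SG9_D2) // 2
    result: List[float] = []
    for i in range(half, len(y) - half):
        acc = sum(c * y[i + k - half] for k, c in enumerate(SG9_D2))
        result.append(acc / (SG9_NORM * spacing ** 2))
    return result
```

Because the coefficients sum to zero and are symmetric about the centre, any constant offset or linear slope in the input contributes nothing to the output, while a pure parabola y = x² returns its exact 2nd derivative of 2.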

Multivariate data analysis
Multivariate data analyses including PCA, 24 PLS-DA, 25,26 SIMCA, 27 HCA, 14 and PLSR 25 were performed using The Unscrambler® 10.1 software package (CAMO Software AS, Oslo, Norway). The PCA approach was first applied to the individual groups of the four yeast isolates and the thraustochytrids, each containing two replicates, in order to identify and eliminate outliers among samples in the same class. This initial PCA outlier assessment additionally revealed good consistency in the spectral variation between the two biological replicates in each cell group. The outliers represented less than 5% of the total dataset in all cases; the remaining spectra from both replicates in each class, approximately 95% of the total, were selected from the main cluster in influence (residual versus leverage) plots, where low residual variance and low model leverage identify the spectra most representative of the PCA models.
Aer the selection of representative spectra from each cell group, the EMSC-corrected 2nd derivative spectral datasets of all the yeast isolates and the thraustochytrids were combined into one single set.PCA was subsequently performed on the entire combined set in order to investigate similarities and differences between the cell groups.Note that due to the good consistency of the data previously observed within the same class, the duplicate datasets of each cell class were presented as a single set in the PCA and HCA analyses in order to simplify and provide a better clarity for the presentation of the results.
Classication of spectra using PLS-DA and SIMCA, on the other hand, was performed by keeping the replicate datasets separate following the outlier removal.The spectral datasets of each replicate from every yeast isolate and the thraustochytrids were subsequently combined to form replicate I and II sets including 81 and 79 spectral samples, respectively.A similar data pre-processing procedure as described previously including 2nd derivatisation and the EMSC approach was applied to spectra in each replicate set individually within the sets.Initially, the pre-processed replicate I and II sets were used to perform as training and independent validation (test) sets, respectively.Spectra in the training set were then used to construct PCA-based regression or local models, while samples in the independent validation (test) set were set aside for subsequent classication.Aer acquiring the classication results of the rst model, the role of the two replicate sets was reversed in the second model by using the replicate II dataset as the training set and replicate I spectra as the independent test samples.The classication results obtained from the two crossvalidation models were later compared.The cross-validation employing independent biological replicates was used to investigate the inuence of each dataset on the model robustness and predictive accuracy.The classication performance was estimated from the number of correctly classied samples in each validation (test) set, whereas the discriminative capability particularly in the HCA was assessed based on the good correlation between the biological identity of the samples and the dendrogram structure.

Quantitative analysis of lipid contents produced during the growth
Initially, semi-quantitative analysis of the lipid content accumulated in the cells was performed in terms of %UFAs per total lipids, using integrated areas under EMSC-corrected 2nd derivative bands. Peaks centred at 3006 (or 3014 for thraustochytrids) and 1743 cm⁻¹ were used as representatives of UFAs and total lipids, respectively. The %UFA values were thus calculated as the percentage ratio of the integrated area covering the band centred at 3006 (or 3014) cm⁻¹ to that covering the band centred at 1743 cm⁻¹.
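The ratio described above can be sketched with trapezoidal integration. The integration limits around each band are not given in the text, so the windows are left as parameters, and the absolute value is integrated because 2nd derivative bands appear as negative minima; both points are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def band_area(wavenumbers: Sequence[float], values: Sequence[float],
              low: float, high: float) -> float:
    """Trapezoidal area of |values| between low and high (cm^-1)."""
    points = sorted((w, abs(v)) for w, v in zip(wavenumbers, values)
                    if low <= w <= high)
    return sum((w1 - w0) * (v0 + v1) / 2.0
               for (w0, v0), (w1, v1) in zip(points, points[1:]))


def percent_ufa(wavenumbers: Sequence[float], values: Sequence[float],
                ufa_window: Tuple[float, float],
                lipid_window: Tuple[float, float]) -> float:
    """%UFA = 100 x area(olefinic C-H band) / area(ester C=O band).

    ASSUMPTION: the caller chooses the windows, e.g. around 3006 (or 3014)
    and 1743 cm^-1; the limits used in the study are not stated.
    """
    ufa = band_area(wavenumbers, values, *ufa_window)
    lipid = band_area(wavenumbers, values, *lipid_window)
    return 100.0 * ufa / lipid
```

This gives a relative index rather than an absolute concentration, which is why the text describes the approach as semi-quantitative.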
Subsequently, quantitative determination of %UFAs was performed using PLSR analysis by combining the EMSC-corrected 2nd derivative FPA-FTIR spectra of the replicate I dataset with their corresponding reference %UFA values obtained from the GC technique, 8 in order to construct an initial PLSR calibration model. The validation was then conducted on the pre-processed replicate II spectral dataset to obtain predicted %UFA values. As with PLS-DA and SIMCA, the cross-validation approach was implemented by reversing the roles of the replicate datasets to cross-check and compare the model performance and the predictive accuracy obtained from the two cross-validation models, in relation to the reference values derived from the GC data.
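One common way to compare predicted %UFA values with the GC reference values is through the root mean square error and the coefficient of determination. The text does not name the metrics used at this step, so the sketch below is an assumption about how such a comparison might be scored; the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def r_squared(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination of predictions against reference values."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot
```

Computing both metrics for each direction of the replicate swap gives a direct check on whether the calibration transfers between independent cultures.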

Results and discussion
3.1 FTIR spectral comparison of yeast and thraustochytrium cells
Fig. 2 presents the representative EMSC-corrected absorbance and 2nd derivative spectra of the yeasts Rhodotorula sp., in comparison to those of thraustochytrids, obtained by averaging the spectra in each of the 5 datasets. The detailed assignments of the minima found in the 2nd derivative spectra, and the references for these, are given in Table 1. Typically, spectral features in the range of 3100-2800 cm⁻¹ are characteristic of the C-H stretching vibrations of lipids. 28,29 The C-H stretching band of olefinic C=CH- chains observed either at 3014 or 3006 cm⁻¹ (for thraustochytrids and the Rhodotorula yeasts, respectively) is a result of UFAs produced inside the cells, and is thereby used for examining the degree of unsaturation in lipids and oils. 30,31 The bands at 2960/2872 and 2925/2852 cm⁻¹ are attributable to asymmetric/symmetric C-H stretching vibrations of -CH₃ and aliphatic -CH₂ functional groups, respectively. The other prominent peak relevant to the lipid moiety occurs in the lower wavenumber region at ~1743 cm⁻¹, assignable to ν(C=O) stretches of ester functional groups from lipid triglycerides and FAs, 28 and therefore represents total lipids in the cells.
Of note is the peak representing UFAs observed at 3014 cm⁻¹ for thraustochytrids, but red-shifted to 3006 cm⁻¹ in the yeast spectra. The difference between the mean positions of these UFA band minima in 2nd derivative spectra for the thraustochytrid and the yeasts was found to be highly significant statistically (i.e. 3014 ± 0.14 cm⁻¹ and 3006 ± 0.11 cm⁻¹, respectively, with P < 0.001 by ANOVA). In accordance with the fact that the higher the number of olefinic (C=CH-) double bonds, the higher the wavenumber of the peak maximum, 32 the shift of this peak maximum to a lower wavenumber suggests a lower degree of unsaturation in the yeast oil compared to that produced by thraustochytrids. The intensities of the band at 1743 cm⁻¹ suggest that the yeast AMCQ8A produced the highest amount of total lipids among all the cells. The GC-based FA composition profile of the oil extracted from the yeast isolate in our recently published results 8 revealed three types of UFAs present inside the cells, consisting mainly of mono-unsaturated oleic acid (C18:1n−9), with di-unsaturated linoleic acid (C18:2n−6) and tri-unsaturated α-linolenic acid (C18:3n−3) present to a lesser extent. In contrast, our recent GC results for the thraustochytrium oil reported a number of UFAs with higher numbers of olefinic bonds. The five PUFAs with the highest % of total fatty acids are docosahexaenoic acid (DHA, C22:6n−3; 34.67 ± 2.07%), docosapentaenoic acid (osbond acid, C22:5n−6; 9.73 ± 0.42%), eicosapentaenoic acid (EPA, C20:5n−3; 3.75 ± 0.09%), docosapentaenoic acid (DPA, C22:5n−3; 1.63 ± 0.07%) and eicosatetraenoic acid (ETA, C20:4n−3; 1.26 ± 0.05%) (see the ESI S1 † for the complete list of the fatty acid composition of the thraustochytrids). These highly unsaturated FAs in the oil produced by thraustochytrids, with at least four C=C bonds in their structures, are consistent with the shift of the band to a higher wavenumber compared to that observed for the yeast cells. Nevertheless, the value of the yeasts for fatty acid production is indicated by the formation of high levels of linoleic and α-linolenic acids, two essential FAs that cannot be synthesised in mammals but play a crucial role as precursors for enzymatic conversion into DPA, EPA and DHA in the human body. 33 Together with the advantages of fast growth, high biomass and high total lipid content, the Rhodotorula yeast, particularly the isolate AMCQ8A, shows potential as an alternative resource of essential FAs suitable for large-scale vegetative oil production in both the biotechnology and biodiesel fields.
The bands in the ranges of 1680-1630, 1560-1510 and 1260-1220 cm⁻¹ arise from the amide I, II and III modes in proteins, respectively. Among these spectral regions, the amide I and III bands have been found to be the most sensitive to variations in the secondary structure folding of peptides and proteins. 34,35 In particular, the amide I modes, which primarily represent C=O stretching vibrations of amide groups, are most often used, and are by far the best characterised, for types of secondary protein structure due to their strong absorbance. Accordingly, the amide I bands were primarily used in this study to determine differences in protein conformation between the two types of cells. Specifically, the amide I bands found in the yeast isolates have a distinct peak at 1654 cm⁻¹ between two weaker bands around 1638 and 1670 cm⁻¹, suggesting the dominance of α-helical proteins in the yeast cells, with substantially smaller contributions from β-sheet and β-turn protein conformers, respectively. In contrast, proteins in the thraustochytrium strain are predominantly in α-helices and β-turns, combined with β-sheets to a lesser extent, as evidenced by the doublet observed at 1670 and 1654 cm⁻¹ with a weaker band at 1638 cm⁻¹. The amide III bands present at 1310 and 1243 cm⁻¹, albeit relatively weak, further support the presence of characteristic α-helix and β-sheet protein conformations, respectively, in the thraustochytrium strain.
Of interest is the presence of the sharp peak at 1695 cm⁻¹ observed only for thraustochytrids. Although a band at this position is commonly attributed to C=O stretching vibrations of the nucleic acid bases in single-stranded DNA, 36 the intensity of the peak is far stronger than that normally found for DNA components, and the majority of nuclear DNA in cells will instead be double stranded. 37,38 Because the thraustochytrium cells are very rich in PUFAs, the band is more likely due to C=O stretching modes in isoprostanes as well as α,β-unsaturated aldehydes and ketones, which are the end products of spontaneous lipid peroxidation through a free radical mechanism. 33 Since this lipid peroxidation predominantly occurs with PUFAs or their esters that contain three or more C=C bonds, it explains why such a strong C=O band is observed only for the PUFA-rich thraustochytrids, and not for the yeast cells, which produced only fatty acids of a low degree of unsaturation. The formation of aldehyde products through lipid peroxidation is very common, and certain aldehyde species have been used as biomarkers to measure the level of oxidative stress in an organism in vivo. 39 The presence of an additional peak within the amide III region at 1264 cm⁻¹ in the thraustochytrid spectrum further supports the existence of lipid peroxidation in the cells, as this feature corresponds to C-O stretching and/or O-H in-plane bending vibrations, which were previously used as evidence of peroxidative damage in model phospholipids and human erythrocytes. 31
According to our on-going experiment using synchrotron FTIR microspectroscopy to examine live thraustochytrium cells (see ESI S2 †), the 2nd derivative synchrotron FTIR spectra of the live cells revealed prominent bands around 1695, 1638 and 1264 cm⁻¹, similar to those observed for the formalin-fixed cells using a laboratory-based FPA-FTIR microspectroscope. However, these bands, which represent oxidative moieties (e.g. aldehydes, ketones, carboxylic and carboxylate species), are present at substantially lower intensities than in the spectra of the dehydrated (formalin-fixed) cells described above. Because the polyunsaturated acyl chains of membrane phospholipids are particularly sensitive to lipid peroxidation, which is self-propagating in the cellular membrane, 33 the prolonged period of time spent on cell fixation increases the likelihood of the cell membrane being exposed to atmospheric conditions, and this is speculated to be the main factor behind the larger amount of peroxidation products in the formalin-fixed cells. The presence of spectral bands indicative of lipid peroxidation in the live cells was presumably due to oxidative stress promoted by the environmental conditions experienced in the IR wet cell used in the current experiments, 40 which did not include temperature control and was not a flow-through design. Further FTIR experiments should aim at following the lipid peroxidation process in extracted thraustochytrium oils under UV exposure and in the presence of anti-oxidants, to gain a better understanding of lipid chemistry in the thraustochytrids.

Classication of cells by multivariate analysis
3.2.1 PCA. The PCA results were obtained by using two spectral windows in the ranges 3100-2800 and 1780-965 cm⁻¹, covering spectral features characteristic of lipids, proteins, carbohydrates and nucleic acids. Initially, the PCA was performed to differentiate the yeast cells (AMCQ10C, AMCQ12C, AMCQ1D and AMCQ8A) from the thraustochytrids, all of which were collected on the day the onset of the stationary phase was observed. The resultant score plot shown in Fig. 3a clearly reveals distinct separation of clusters of spectra according to the different cell types. In particular, the cluster of spectra from the yeast AMCQ12C set is closest in the PCA score plot to the spectral cluster from the yeast AMCQ10C, with the cluster from AMCQ1D also located at a close distance in PCA space, suggesting a similarity in cell composition between these three yeast strains, whilst the clusters of spectra from the yeast isolate AMCQ8A and the thraustochytrids are separated into different quadrants of the PCA score plot. The difference in cell composition between the Rhodotorula sp. and thraustochytrids can be examined through the PC1 loading plot, which shows strong negative loadings at 1695 and 1670 cm⁻¹ caused by the C=O stretches of oxidative products and β-turn protein conformers that are predominant in thraustochytrids. The other influential component involves the negative loading at 1475 cm⁻¹, suggesting that the preferred orientation of methylene groups in the phospholipid bilayers exists more as orthorhombic (rather than hexagonal) packing in the thraustochytrium cells. 28,41,42 As expected, the loading plot also reveals a substantial negative loading at 1264 cm⁻¹ attributed to stretching vibrations of C-O bonds possibly in carboxylic acids, 31,43 which further supports the existence of lipid peroxidation in the PUFA-rich thraustochytrids as discussed above. Other differences with a considerable impact on classification are indicated by the negative loadings at 1222 and 1172 cm⁻¹ (i.e. asymmetric phosphate stretching modes of phosphorylated moieties 44 and symmetric stretches of C-O-C bonds in esters, 34 respectively), as well as the positively loaded peaks present at 1065 and 1025 cm⁻¹ due to C-O stretching vibrations from carbohydrates. 42,44,45 The PC2 loading plot, on the other hand, reveals major components that set the yeast isolate AMCQ8A apart from the other isolates. While the heavily loaded peaks at 2925, 2852 and 1743 cm⁻¹ are accounted for by differences in lipid composition, the positive loadings at 1654 and 1550 cm⁻¹ suggest differences involving α-helical conformations of proteins in the different strains. The loadings at 1078 and 1065 cm⁻¹ further indicate contributions from the stretching vibrations of phosphorylated molecules and carbohydrates. 42,44
Subsequently, PCA was performed with only the datasets of the three yeast isolates AMCQ10C, AMCQ12C and AMCQ1D, whose spectral clusters were previously located close to each other in the PCA score plot. The results in Fig. 3b clearly show distinct separation of the spectral clusters from the three isolates on the score plots, explained by strong loadings at 1080 and 1065 cm⁻¹ for phosphate and carbohydrate moieties. The other substantial negative loadings at 1025 and 992 cm⁻¹ are attributed respectively to major functional groups in polysaccharides and conjugated trans,trans isomers. 44,45
3.2.2 PLS-DA and SIMCA. The cells were first classified by the PLS-DA methodology based on the PLS2 algorithm required for two or more dependent variables, 26 using the same two spectral windows as used in the PCA, with the Y values of +1/−1 set as a yes/no decision in the reading of the prediction of whether or not the sample belongs to the assigned class. The zero line (Y = 0) is drawn as a decision borderline. Initially, PLS-DA was performed by using the replicate I spectral data as a training set, while spectra in the replicate II set were utilised as independent test samples for the validation. Fig. 4 displays the classification results, including the linear regression models obtained using data in the training (replicate I) set and the corresponding predictions of the samples in the independent validation (replicate II) set from the trained PLS-DA models. A minimum root mean square error of calibration (RMSEC) was achieved with 6 latent factors, resulting in a coefficient of determination R² ≥ 0.92 and 0.99 for the yeast isolates and the thraustochytrids, respectively, with the Y-variance plot indicating that 97% of the total variance in the dataset is explained. As shown, these optimised PLS-DA models led to 100% accuracy in predicting the totally independent samples from the different cultivation in the replicate II set. The deviations of the predicted values were high when the models were used to predict the yeast isolates AMCQ10C, AMCQ12C and AMCQ1D, but substantially reduced for the prediction of the yeast isolate AMCQ8A and the thraustochytrids. This conforms well with the previous PCA results, as the yeast isolate AMCQ8A and the thraustochytrids possessed their own unique characteristics of cell composition, described previously through the PCA loading plot (Fig. 3). Complementary to the described model, cross-validation of the PLS-DA classification was performed by reversing the role of the two replicates. The results of the reversed model revealed very similar outcomes to the first model, with all the test samples in the independent validation set correctly classified into their classes (see ESI S3†), indicating the ability of the PLS-DA to classify spectra acquired from cells drawn from independent replicate cultures.
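The ±1 class coding and the Y = 0 decision line described above can be illustrated with a minimal PLS2 sketch. Everything in it is an assumption made for illustration: the data are synthetic three-class "spectra" rather than the measured FPA-FTIR data, the function names (`pls2_fit`, `pls2_predict`, `make_replicate`) are hypothetical, the weight vectors are obtained from a singular value decomposition of X'Y instead of an iterative NIPALS loop, and 2 latent factors are used instead of the 6 reported for the real dataset.

```python
import numpy as np

def pls2_fit(x, y, n_factors):
    """Fit a PLS2 model; returns (x_mean, y_mean, regression coefficient matrix)."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xr, yr = x - x_mean, y - y_mean
    weights, x_loads, y_loads = [], [], []
    for _ in range(n_factors):
        # Weight vector: dominant left singular vector of X'Y
        w = np.linalg.svd(xr.T @ yr, full_matrices=False)[0][:, 0]
        t = xr @ w                    # latent scores
        p = xr.T @ t / (t @ t)        # X loadings
        q = yr.T @ t / (t @ t)        # Y loadings
        xr = xr - np.outer(t, p)      # deflate X
        yr = yr - np.outer(t, q)      # deflate Y
        weights.append(w)
        x_loads.append(p)
        y_loads.append(q)
    w_mat, p_mat, q_mat = (np.array(m).T for m in (weights, x_loads, y_loads))
    coef = w_mat @ np.linalg.inv(p_mat.T @ w_mat) @ q_mat.T
    return x_mean, y_mean, coef

def pls2_predict(model, x):
    x_mean, y_mean, coef = model
    return (x - x_mean) @ coef + y_mean

def make_replicate(seed, n_per_class=15, n_vars=60):
    """Synthetic stand-in for one replicate: three classes, one band each."""
    rng = np.random.default_rng(seed)
    cols = np.arange(n_vars)
    x, labels = [], []
    for k, centre in enumerate([15, 30, 45]):
        profile = np.exp(-0.5 * ((cols - centre) / 3.0) ** 2)
        x.append(profile + 0.02 * rng.standard_normal((n_per_class, n_vars)))
        labels += [k] * n_per_class
    return np.vstack(x), np.array(labels)

# Replicate I trains the model, replicate II is the independent test set
x_train, lab_train = make_replicate(seed=1)
x_test, lab_test = make_replicate(seed=2)

# One Y column per class: +1 = member, -1 = non-member
y_train = np.where(lab_train[:, None] == np.arange(3), 1.0, -1.0)

model = pls2_fit(x_train, y_train, n_factors=2)
y_pred = pls2_predict(model, x_test)
is_member = y_pred > 0.0              # Y = 0 is the decision borderline
assigned = y_pred.argmax(axis=1)
accuracy = float(np.mean(assigned == lab_test))
```

Swapping the roles of the two calls to `make_replicate` gives the reversed model of the cross-validation scheme.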
Next, SIMCA was applied to test the robustness and discrimination power of different classification methods using the same cross-validation approach. Although both PLS-DA and SIMCA are based upon PCA, the main difference between the two classification methods is the criterion used to build the models: SIMCA computes individual PCA-based models to identify variations within each class, whereas PLS-DA identifies directions in the data space that discriminate the classes directly. Owing to the number of variables, PLS-DA was performed in this study so as to model several Y-variables simultaneously. The prediction results obtained by SIMCA according to the cross-validation approach are presented in Table 2 and ESI S4†, showing that the classification of some of the test samples that belonged to the yeast isolates AMCQ10C and AMCQ12C was confounded. This typically occurs for SIMCA when the intercluster distance becomes small, which was true for these two yeast isolates since their PCA clusters were observed to overlap in the score plot depicted in Fig. 3a. By the same reasoning, the test samples belonging to the yeasts AMCQ1D and AMCQ8A and to the thraustochytrids, whose PCA clusters are well isolated, were all correctly classified by SIMCA.
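The class-wise modelling idea behind SIMCA can be sketched as follows. This is a deliberately simplified illustration and not the procedure used in the study: each class gets its own PCA model, and a test spectrum is assigned to the class whose model leaves the smallest residual, whereas the study applied a 95% significance limit that can place a sample in more than one class. The data are synthetic and all names (`fit_class_model`, `residual_distance`, `simulate_class`, `class_a` and so on) are hypothetical.

```python
import numpy as np

def fit_class_model(x, n_pcs):
    """PCA model of one class: its mean and the first n_pcs loading vectors."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_pcs]

def residual_distance(x, model):
    """Root-mean-square residual of each row after projection onto a class model."""
    mean, loadings = model
    centred = x - mean
    reconstructed = centred @ loadings.T @ loadings
    return np.sqrt(np.mean((centred - reconstructed) ** 2, axis=1))

def simulate_class(rng, centre, n_samples, n_vars=60):
    """Synthetic class: one band whose intensity varies from sample to sample."""
    cols = np.arange(n_vars)
    band = np.exp(-0.5 * ((cols - centre) / 3.0) ** 2)
    intensity = 1.0 + 0.2 * rng.standard_normal((n_samples, 1))
    return intensity * band + 0.02 * rng.standard_normal((n_samples, n_vars))

rng = np.random.default_rng(3)
centres = {"class_a": 15, "class_b": 30, "class_c": 45}

# One independent PCA model per class, built from training samples only
models = {name: fit_class_model(simulate_class(rng, c, 20), n_pcs=1)
          for name, c in centres.items()}

names = list(models)
x_test = np.vstack([simulate_class(rng, centres[n], 10) for n in names])
truth = np.repeat(np.arange(len(names)), 10)

# Distance of every test sample to every class model; nearest model wins
distances = np.column_stack([residual_distance(x_test, models[n]) for n in names])
assigned = distances.argmin(axis=1)
accuracy = float(np.mean(assigned == truth))
```

When two class models lie close together, their residual distances become similar, which is the situation in which the real SIMCA limits admit a sample into both classes.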
Considering the fact that every test sample was correctly classified by PLS-DA and only 5 of a total of 160 test samples from both models (i.e. ca. 3% of the total population) were falsely classified into two classes by SIMCA, the differentiation between the four yeast strains and the thraustochytrids was quite distinct, as evidenced by the ability to classify them at a high level of sensitivity and specificity using these two totally different classification methods (i.e. PLS-DA and SIMCA). Therefore, both multivariate data analysis approaches, particularly PLS-DA, demonstrated satisfactory linearity, robustness and predictive accuracy suitable for classification of the specific marine microbes used in this study. It should also be emphasised that our cross-validation approach, based on the use of separate replicates for different roles, was designed to ensure that the test samples used for validation are totally independent of those involved in the model construction, because each replicate came from a different cultivation and was pre-processed individually within its set. In addition to providing a fair assessment of the model performance, the approach also imitates realistic practice in an actual experimental setting, in which a model is initially built and optimised with a standard set prior to the validation step of identifying unknown samples from different cultivations.
3.2.3 HCA. Initially, a number of clustering algorithms and interspectral distance criteria were tested to achieve the optimum discriminative performance, judged by a good correlation between the biological identity of the samples and spectroscopy. The result, obtained in the form of a dendrogram as shown in Fig. 5, reveals that the best discriminative capability was achieved when using a combination of the squared Euclidean distance measure criterion and Ward's algorithm 46 with 4 clusters. The dendrogram reveals that spectra from the Rhodotorula sp. were discriminated from those of thraustochytrids in the first cluster, which could be explained by the highly distinctive FTIR spectral characteristics of these two different marine microbes, as previously inspected through the average 2nd derivative spectra in Fig. 2. Among the Rhodotorula sp., the isolate AMCQ8A was first separated from the other three isolates, with AMCQ1D subsequently being clustered into its own distinct grouping. Similar to the PCA result obtained with the entire yeast and thraustochytrid dataset in Fig. 3a, the HCA approach failed to discriminate the yeast isolates AMCQ10C and AMCQ12C even when the number of clusters was further increased, suggesting a high degree of similarity in cellular composition. In fact, the HCA result coincides well with the biological identities of the yeasts previously derived using 18S rDNA gene sequence analysis. According to the gene sequences, the isolate AMCQ8A is closely related to R. mucilaginosa L10-2 (GenBank accession number EF218987.1), which exhibited substantial differences in genetic and phenotypic expression compared with R. mucilaginosa PTD3 (GenBank accession number EU563926.1) and R. graminis WP1 (GenBank accession number EU563924.1), the two Rhodotorula strains that were matched with the isolates AMCQ10C (the same for AMCQ12C) and AMCQ1D, respectively. 8 The HCA Euclidean dendrogram therefore demonstrates a high accuracy and good reliability in discriminating lipid-rich yeast cells of the specific strains examined in this study.
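As a rough illustration of the clustering rule named above, the sketch below implements Ward's criterion directly: at each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares, which depends on the squared Euclidean distance between the cluster centroids. The five synthetic groups, their coordinates and the function name `ward_clusters` are assumptions for illustration; two of the groups are made nearly identical to mimic a pair of isolates that cannot be separated when 4 clusters are requested.

```python
import numpy as np

def ward_clusters(x, n_clusters):
    """Naive agglomerative clustering with Ward's minimum-variance criterion."""
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = x[clusters[a]], x[clusters[b]]
                na, nb = len(ca), len(cb)
                diff = ca.mean(axis=0) - cb.mean(axis=0)
                # Increase in within-cluster sum of squares if a and b merge
                cost = na * nb / (na + nb) * (diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Five synthetic groups; the first two are almost identical in composition
rng = np.random.default_rng(5)
group_centres = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [0.0, 5.0], [8.0, 8.0]])
points = np.vstack([c + 0.1 * rng.standard_normal((5, 2)) for c in group_centres])
truth = np.repeat(np.arange(5), 5)

labels = ward_clusters(points, n_clusters=4)
```

With these settings the two near-identical groups end up in one cluster while the other three groups each keep their own, which is the kind of outcome the dendrogram shows for closely related isolates.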

Monitoring lipid production in the yeast isolate AMCQ8A
Table 2 SIMCA classification results at 95% significance limit obtained by using replicate I spectral data as a training set and spectra in the replicate II set as independent validation (test) samples. Note that the parameters used for the classification were similar to those used in the PLS-DA, including the 3100-2800 and 1780-965 cm⁻¹ spectral ranges and 6 PCs
The FPA-FTIR technique was further applied to monitoring lipid accumulation during the growth period of the yeast isolate AMCQ8A grown in the optimised glucose medium. In this medium, the highest amount of total lipids in the cells was achieved, in comparison to that produced by the same isolate grown in a control medium without glucose. Our initial qualitative analysis, using the same PCA approach with the same spectral windows and 4 PCs, produced the results displayed in Fig. 6, which shows discrete clustering of spectra from the yeast isolate grown in the glucose medium at different harvest periods as well as a separate cluster of spectra acquired from cells grown in the control medium. The corresponding loading plots reveal the main influence on the separation to be lipid moieties, explained by the strong negative loadings at 2925, 2852 and 1743 cm⁻¹, in association with a strong positive loading at 990 cm⁻¹, which is due to the characteristic olefinic =CH deformation modes in conjugated trans,trans UFAs and esters. Other loaded bands observed at 1654, 1080 and 1150 cm⁻¹ suggest additional contributions from α-helical proteins, phosphate groups (in nucleic acids and phospholipids), and stretching vibrations related to the structure CO-O-C found mainly in glycogen and nucleic acids, respectively.
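The PCA workflow used here and in Section 3.2.1 (restrict the spectra to the two spectral windows, mean-centre, then inspect scores and loadings) can be sketched with a singular value decomposition. The spectra below are synthetic, with two groups that differ only in a band near 1743 cm⁻¹, and the helper names `select_windows` and `pca` are hypothetical; the real analysis was run on EMSC-corrected 2nd derivative spectra.

```python
import numpy as np

def select_windows(wavenumbers, spectra, windows):
    """Keep only the columns whose wavenumber lies inside any (high, low) window."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for high, low in windows:
        mask |= (wavenumbers <= high) & (wavenumbers >= low)
    return wavenumbers[mask], spectra[:, mask]

def pca(spectra, n_components):
    """Mean-centre the spectra; return scores, loadings and explained-variance ratios."""
    centred = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Synthetic stand-in data: two groups with different band intensity near 1743 cm^-1
rng = np.random.default_rng(0)
wn = np.arange(4000.0, 900.0, -4.0)
band = np.exp(-0.5 * ((wn - 1743.0) / 8.0) ** 2)
group_a = 1.0 * band + 0.01 * rng.standard_normal((10, wn.size))
group_b = 0.4 * band + 0.01 * rng.standard_normal((10, wn.size))
spectra = np.vstack([group_a, group_b])

# Same two windows and number of PCs as quoted in the text
wn_sel, x_sel = select_windows(wn, spectra, [(3100, 2800), (1780, 965)])
scores, loadings, explained = pca(x_sel, n_components=4)

# Wavenumber carrying the largest absolute PC1 loading
peak_wn = wn_sel[np.argmax(np.abs(loadings[0]))]
```

In this toy case the two groups separate along PC1 and the PC1 loading peaks at the grid point closest to the simulated band, which is how loading plots are read to identify the bands responsible for a separation.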
Fig. 7a further demonstrates the average spectra of the yeast isolate AMCQ8A grown in the glucose medium that were collected on a daily basis, and of those grown in the control medium harvested at the onset of the stationary phase, at which an optimal amount of total lipids was found according to the FTIR and GC data as follows. For an initial semi-quantitative purpose, band areas of total lipids and UFAs were measured after the individual spectra were converted to 2nd derivatives and EMSC-corrected. These pre-processed spectra were subsequently offset over two spectral ranges within 3025-2990 and 1760-1725 cm⁻¹, to cover the peaks centred around 3006 and 1743 cm⁻¹, under which the integrated areas directly represent the proportions of UFAs and total lipids, respectively. The total lipids and the ratio of UFAs per total lipids in terms of %UFAs, based on the semi-quantitative band area approach, are plotted in Fig. 7b along with the cell concentration (million cells per mL) as a function of time. Note that the absence of the lipid data for the yeast AMCQ8A on day 0 (inoculation day) and days 1-3 (cultivation days) was due to insufficient cell density in the medium, which resulted in a failure to produce good continuous monolayers of cells on an IR substrate for the FPA-FTIR measurements. As anticipated from previous studies and supporting literature, 20 the optimal amounts of total lipids produced in the yeast isolate AMCQ8A were also achieved at the onset of the stationary phase of its growth. By comparison, the yeast isolate AMCQ8A grown in the glucose medium produced significantly more total lipids than the others throughout the growth phase, even though the proportions of the UFAs were found to be substantially lower than those observed under the optimised conditions for the thraustochytrids and for the same yeast isolate grown in the control medium without glucose, respectively. It is interesting to note that the FTIR results do not take into account the degree of unsaturation of the FAs produced. However, the results are apparently in good agreement with the GC-derived FA profile (see ESI S1†), indicating that the FAs produced in the thraustochytrids are of a higher degree of unsaturation, including mainly DHA and EPA, the two essential FAs highly in demand by industry.
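The semi-quantitative band area approach can be sketched as below. Several details are assumptions, since the text does not spell them out: the offset is taken as shifting each window so that its least negative point sits at zero, the area is computed with the trapezoidal rule, and %UFAs is taken as 100 times the ratio of the 3025-2990 cm⁻¹ area to the 1760-1725 cm⁻¹ area. The 2nd derivative spectrum is synthetic (two negative-going Gaussian bands) and `band_area` is a hypothetical helper.

```python
import numpy as np

def band_area(wavenumbers, second_derivative, high, low):
    """Offset a 2nd derivative band within [low, high] and integrate its area."""
    mask = (wavenumbers <= high) & (wavenumbers >= low)
    x = wavenumbers[mask]
    y = second_derivative[mask]
    y = y - y.max()                   # offset: window baseline moved to zero
    order = np.argsort(x)             # integrate along increasing wavenumber
    x, y = x[order], y[order]
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoidal rule
    return -area                      # bands point downwards, so flip the sign

# Synthetic 2nd derivative spectrum with bands at 3006 and 1743 cm^-1
wn = np.arange(3100.0, 900.0, -1.0)

def gaussian(centre, width=4.0):
    return np.exp(-0.5 * ((wn - centre) / width) ** 2)

second_derivative = -(0.2 * gaussian(3006.0) + 1.0 * gaussian(1743.0))

ufa_area = band_area(wn, second_derivative, 3025.0, 2990.0)     # =CH stretch, UFAs
lipid_area = band_area(wn, second_derivative, 1760.0, 1725.0)   # ester C=O, total lipids
percent_ufa = 100.0 * ufa_area / lipid_area
```

Because the two simulated bands share the same width and differ five-fold in depth, the ratio comes out close to 20%; with real spectra the value depends on the pre-processing and on the integration limits chosen.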
To achieve a higher level of accuracy in determining %UFAs in these yeast cells, PLSR analysis was conducted using a cross-validation approach similar to that used in PLS-DA and SIMCA, with each replicate dataset serving in turn as the calibration (training) and validation (test) set. The pre-processed spectral data used for the PCA in Fig. 6 were then transferred and input into the PLSR analysis together with their corresponding %UFA values obtained from the GC technique. By using the same spectral windows that contain biological information about the cells (i.e. 3100-2800 and 1800-965 cm⁻¹) and 2 latent factors, optimised PLSR calibration models with good linearity were produced, as indicated by good values of the coefficient of determination, R² ≥ 0.92, for both cross-validation models (see ESI S5†). It should be emphasised that although the optimal number of latent factors was initially found to be 4, based on the two commonly accepted criteria of (i) the minimum explained variance and root mean square error of calibration (RMSEC) and (ii) the correlation coefficient R² close to 1, we have chosen for this study the conservative approach of presenting the results obtained with 2 latent factors, in order to avoid the possibility of model over-fitting and to make sure that only chemical information was employed in the model optimisation rather than random or spurious correlations. 47,48 To support the claim, a comparison of the PLSR results obtained using different numbers of latent factors and their corresponding regression coefficients was made according to the same cross-validation approach (see ESI S5 and S6†). The model performance and the predictive accuracy achieved with only 2 latent factors, as compared to those with 4 latent factors, appeared to still be in an acceptable range for both cross-validation models. The respective regression coefficients additionally revealed only spectral features related to the 2nd derivative spectra of the cells, providing a strong indication that the calibrations were based on genuine chemical features and not on noise contributions. Accordingly, the %UFAs obtained from the PLSR analysis, as illustrated in Fig. 7b, reflect the results obtained using only 2 latent factors. As a result, the two complementary PLSR models led to highly accurate predictions of %UFA values according to (i) good linear fittings (R² ≥ 0.93) obtained in the plots of predicted versus reference %UFA values, and (ii) low root mean square errors of prediction (RMSEP = 3.99% and 3.96%) in both cases, with the reference %UFAs in the cell samples over the range of 17-60%. To evaluate the model performance and predictive accuracy of the developed PLSR approach, the predicted %UFA values of the cell samples that were harvested on the same day were averaged and plotted along with their corresponding %UFA values previously obtained from the band area ratio approach and those acquired from the GC technique in triplicate, 8 as illustrated in Fig. 7b. By comparison, the %UFA values acquired from the FTIR-based methods (i.e. band area ratio and PLSR analysis) were found to be in good agreement with their GC counterparts, suggesting a high accuracy of the method and thus a strong potential of the combined FTIR and PLSR approaches for lipid monitoring purposes. These investigations additionally provide insights into the UFA production in these marine microorganisms, showing an invariable change in the UFA level from the exponential stage until reaching the end of the stationary phase of the yeast isolate AMCQ8A grown in the glucose medium. Although these yeast cells produced substantially higher total lipids throughout their growth period, the results from the three different analysis methods further indicated that the optimum %UFAs were achieved at significantly higher levels in the thraustochytrids and in the same yeast isolate grown in the control medium, respectively, than in the glucose medium. Such findings point out the advantage of the yeast isolates in terms of total lipid production suited for biodiesel applications, for example.
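The two figures of merit quoted above, RMSEP and the R² of the predicted-versus-reference plot, can be computed as in the following sketch. The reference and predicted %UFA values are invented purely for illustration and are not data from this study; R² is taken here as the squared Pearson correlation of the linear fit, which is an assumption about how the reported value was obtained.

```python
import numpy as np

def rmsep(reference, predicted):
    """Root mean square error of prediction, in the units of the reference values."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

def r_squared(reference, predicted):
    """R^2 of a straight-line fit of predicted against reference values."""
    r = np.corrcoef(reference, predicted)[0, 1]
    return float(r ** 2)

# Hypothetical %UFA values: GC reference versus model predictions
reference_ufa = [17.0, 25.0, 33.0, 41.0, 52.0, 60.0]
predicted_ufa = [20.0, 22.0, 36.0, 38.0, 55.0, 57.0]

error = rmsep(reference_ufa, predicted_ufa)
fit_quality = r_squared(reference_ufa, predicted_ufa)
```

RMSEP is best read against the span of the reference values: an error of about 4% over a 17-60% range, as reported in the text, corresponds to roughly one tenth of that range.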
In summary, our present results, based on independent biological replicates that were prepared simultaneously under the same controlled conditions, suggest the potential application of the FPA-FTIR technique for classification and rapid monitoring of lipid production, both in terms of total lipids and %UFAs, in these marine cells. However, it should be emphasised that, beyond a preliminary investigation of this nature, prospective testing of the models across independent experiments will be needed in order to gain the best measure of the model performance and a more accurate assessment of the developed approach towards its use in actual routine practice.
Although the GC technique can provide the details of FA species and their actual quantities, it involves invasive cell processing as well as time-consuming and tedious sample preparation to convert the lipids into free FAs. This can take a day or more before an accurate measurement is achieved, so GC cannot be considered a rapid monitoring technique. FTIR microspectroscopy, on the other hand, requires only minimal, simple sample preparation to transfer the preserved intact cells onto an IR substrate as a monolayer, with subsequent removal of water through desiccation prior to the spectral data collection, resulting in a fast analysis. With advances in bioprocessing technology, it is possible to couple programmable withdrawal devices to a cytocentrifugation module to obtain the cultured cells from a bioreactor for the acquisition of the spectral datasets, which can subsequently be transferred to an automated spectral processing unit for further analysis based on the developed multivariate data analysis approach. Such an implementation would lead to an automated lipid analysis platform that is suitable for online monitoring purposes.
Furthermore, the 'speed' advantage of the FPA-FTIR imaging technique over a conventional single-point FTIR microspectroscopic measurement should be emphasised because, with the same number of scans per spectrum, the acquisition of a 32 × 32 array of FPA-FTIR spectra (i.e. 1024 spectra in total) takes approximately the same period of time as acquiring one single-point spectrum using a single-point detector. This is due to the fact that each element on the 32 × 32 array FPA detector works as a single-channel detector, and thus the data collection is processed simultaneously. Although a previous study has indicated better spectral quality, in terms of S/N ratio, of a single-point spectrum than of an FPA-FTIR spectrum, 49 the 'speed' advantage of the FPA-FTIR approach far outweighs the differences in spectral quality between the two measurement systems, given the still acceptable spectral S/N ratio obtained using the FPA-FTIR technique. Moreover, with the large spectral resources acquired from each spectral image, spectral averaging as used in this study can provide a solution for improving the quality of the spectral input before further analysis. Although such a practice may compromise the 'speed' advantage of the technique, the quality-screening procedure can easily be performed using computer-programming software in a rapid or even automated fashion, and still requires less time to obtain a satisfactory number of high-quality spectra compared with acquiring single-point measurements.
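The effect of spectral averaging mentioned above can be illustrated with a small simulation. It assumes an idealised case that the study does not claim: every one of the 1024 detector elements sees the same underlying spectrum with independent random noise, in which case averaging N spectra is expected to lower the noise by a factor of about √N (about 32 for a full 32 × 32 image). The band shape, noise level and variable names are arbitrary.

```python
import numpy as np

rows, cols = 32, 32
n_spectra = rows * cols               # spectra delivered by one FPA image

rng = np.random.default_rng(7)
n_points = 500
signal = np.exp(-0.5 * ((np.arange(n_points) - 250) / 20.0) ** 2)
noise_sd = 0.05

# One noisy copy of the same spectrum per detector element (idealised)
pixel_spectra = signal + noise_sd * rng.standard_normal((n_spectra, n_points))
mean_spectrum = pixel_spectra.mean(axis=0)

noise_single = np.std(pixel_spectra[0] - signal)   # noise in one pixel spectrum
noise_mean = np.std(mean_spectrum - signal)        # noise after averaging
improvement = noise_single / noise_mean
```

In practice only the spectra that pass quality screening and belong to the same cell population would be averaged, so the gain would be smaller than this upper bound.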

Conclusion
A technique based on FTIR microspectroscopy and multivariate data analysis to discriminate and classify lipid-rich marine microbes, including four yeast strains in Rhodotorula sp. and thraustochytrids, has been developed. Rapid FPA-FTIR data collection with minimal sample preparation, in conjunction with powerful multivariate data analysis methodologies including PCA, PLS-DA, SIMCA, HCA and PLSR, demonstrated combined attributes for rapid, low-cost, online monitoring of lipid production in these marine microorganisms, which have strong commercial potential. The techniques in combination are shown to be capable of probing differences in cellular composition between the diverse cell types examined. The results from PLS-DA, SIMCA and HCA indicated satisfactorily high accuracy in identification, model robustness and strong performance in classifying and discriminating among the Rhodotorula yeast strains and thraustochytrids into classes that are closely correlated with their classification based on morphology and genotyping. The FTIR technique with the PLSR approach was additionally shown to possess the potential for online quantitative monitoring of total lipids and UFAs produced in the cells during the growth period.

Fig. 2 Comparisons of the average EMSC-corrected (a) absorbance and (b) 2nd derivative spectra of the four Rhodotorula yeast isolates and the thraustochytrids (PRA-296™) taken at the onset of the stationary phase. Note that the EMSC-corrected 2nd derivative spectra were processed by 2nd derivatisation and then EMSC, in the same order as used throughout the manuscript.

Fig. 3 PCA score (left) and loading (right) plots showing projections against the first 3 PCs that explain the majority of the spectral variation, with the inclusion of the datasets from (a) all yeast isolates and a thraustochytrium strain and (b) yeast isolates AMCQ10C, AMCQ12C and AMCQ1D alone.

Fig. 4 PLS-DA results showing linear regression models of individual yeast isolates and the thraustochytrid trained by using the replicate I spectral dataset (left) and their corresponding prediction results to identify the yeast/thraustochytrid samples in the replicate II set as independent validation samples (right). The nominated Y values of +1/−1 in the prediction represent yes/no classification decisions, respectively, showing that 100% of samples in the independent validation set were correctly classified. Note that the numbers of the cell samples included in the replicate I and II sets are 81 and 79, respectively.

Fig. 5 HCA dendrogram obtained by Ward's algorithm and squared Euclidean distance measure criterion, using the entire dataset that included the four Rhodotorula yeast isolates and thraustochytrids harvested at the onset of the stationary phase.

Fig. 6 PCA score (left) and loading (right) plots of the yeast isolate AMCQ8A grown in the optimised glucose medium (days 4-8), in comparison to that of a control medium (without glucose) collected at the onset of the stationary phase (day 3).

Fig. 7 (a) Average EMSC-corrected 2nd derivative spectra of the yeast isolate AMCQ8A grown in the glucose and the control media. (b) Cell concentration plotted together with the normalised 2nd derivative band area of total lipids, and %UFAs per total lipids observed for the yeast AMCQ8A in the media with glucose (days 4-8) and without glucose (day 3), in comparison to that of thraustochytrids (day 7). Three different methods were used to obtain %UFAs, including (i) percentage ratio of integrated 2nd derivative band areas, (ii) PLSR analysis, and (iii) GC technique.

Table 1 FTIR band assignments for functional groups found in the 2nd derivative spectra of the yeasts of Rhodotorula sp. and thraustochytrids (wavenumber values in cm⁻¹)