OpenFluor – an online spectral library of auto-fluorescence by organic compounds in the environment †

An online repository of published organic fluorescence spectra has been developed, which can be searched for quantitative matches with any set of unknown spectra. It fills a critical gap by increasing access to measured and modelled (PARAFAC) spectra, and linking across studies and systems to reveal “global” fluorescence trends.

Fluorescence spectroscopy offers an inexpensive, non-destructive method for obtaining sensitive measurements of a diverse group of organic compounds that contain fluorophores. This technology is now widely used to characterise naturally-occurring organic matter in natural and artificial aquatic systems with the purpose of understanding how the fluorescent fraction of carbon is partitioned between different organic matter fractions, and inferring the processes responsible for its formation and removal. 1-5 With Excitation-Emission Matrix (EEM) spectroscopy, fluorescence emission is measured over a range of excitation wavelengths to produce three-dimensional fluorescence landscapes (Fig. 1). Each EEM represents total fluorescence from an unknown number of underlying fluorophores which in ideal conditions fluoresce independently following Beer's Law, but under non-ideal conditions may interact. 6 Over the past ten years it has become common practice to decompose EEM datasets mathematically using PARAllel FACtor analysis (PARAFAC). 7-9 PARAFAC reduces the EEM dataset into a small number of building blocks, referred to as 'underlying components', each with a characteristic excitation and emission spectrum (Fig. 1). Each EEM in a dataset is modelled by a simple recipe in which the same building blocks are combined in varying amounts, reflecting their variable concentrations.
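The "recipe" described above can be written out explicitly. The following is a minimal sketch, assuming the ideal trilinear PARAFAC model in which each EEM is the sum, over components, of (amount × emission spectrum × excitation spectrum). The function name `build_eem` is hypothetical, and the spectra and amounts are made-up toy values, not data from any study.

```python
def build_eem(scores, em_loadings, ex_loadings):
    """Combine PARAFAC-style building blocks into one EEM.

    scores      : list of F component amounts for this sample
    em_loadings : F emission spectra, each a list of J values
    ex_loadings : F excitation spectra, each a list of K values
    Returns a J x K nested list (emission rows, excitation columns).
    """
    n_em = len(em_loadings[0])
    n_ex = len(ex_loadings[0])
    eem = [[0.0] * n_ex for _ in range(n_em)]
    for amount, em, ex in zip(scores, em_loadings, ex_loadings):
        for j in range(n_em):
            for k in range(n_ex):
                # Each component contributes amount * emission * excitation
                eem[j][k] += amount * em[j] * ex[k]
    return eem


# Toy building blocks (assumed values): two components,
# three emission wavelengths, two excitation wavelengths.
em_loadings = [[0.1, 0.8, 0.3], [0.6, 0.4, 0.0]]
ex_loadings = [[0.9, 0.2], [0.1, 0.7]]

# Two samples share the same building blocks but in different amounts.
sample_a = build_eem([2.0, 1.0], em_loadings, ex_loadings)
sample_b = build_eem([0.5, 3.0], em_loadings, ex_loadings)
```

PARAFAC itself solves the inverse problem: given many measured EEMs, it estimates the shared spectra and the per-sample amounts. The sketch only shows the forward direction.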
There are now well over 100 published PARAFAC models of dissolved and natural organic matter (both referred to hereafter as NOM) and over 500 published PARAFAC components. 9,10 However, no agreed measure exists for determining whether the same PARAFAC components were found in different studies. Furthermore, while scientists have some idea of the chemical structures likely to be responsible for NOM fluorescence, few reference data are readily available and even fewer studies have drawn reliable comparisons between PARAFAC components and pure organic compounds. It is presently unclear how often PARAFAC components extracted from NOM accurately represent the spectra of pure compounds or mixtures, or the degree to which PARAFAC decompositions are impaired by potential non-ideal chemical behaviours such as spectral shifting, 4 energy or electron transfer, 6,11 and charge-transfer interactions. 12 It is widely supposed that spectrally similar PARAFAC components extracted from unrelated datasets are attributable to similar organic matter sources, and depict the same or similar underlying compounds having similar ecological functions. However, since the spectra of published PARAFAC components are typically only available as images or summary tables in the original publications, this hypothesis is extremely difficult to test. Thus, Ishii and Boyer 13 recently reviewed the reported distributions and responses to physicochemical processes of three apparently widespread humic-like PARAFAC components, finding numerous inconsistencies between studies with regard to their reported behaviours. However, in that review as in the overwhelming majority of reviewed studies, PARAFAC components were equated on the basis of broad criteria such as the number and positions of spectral peaks, with peak positions approximately defined and allowed to vary over a broad wavelength range.
In the literature, PARAFAC components have been equated to specific compounds and redox states with little or no quantification of spectral similarity. This widespread use of qualitative or subjective criteria for equating components between studies is a serious confounding factor for interpreting global trends in component distributions and behaviours, or for deducing the organic structures likely to be responsible for the observed patterns. Recent papers have emphasised the importance of standardised approaches to measuring EEMs 14,15 and deriving PARAFAC models, 9,16 and a systematic way of comparing the results of different studies is urgently needed.
To support quantitative comparisons of fluorescence spectra between studies, an open-access spectral database (http://www.openfluor.org) has been developed. The database is accessible using any modern web browser (e.g. Mozilla, Chrome, Internet Explorer) on desktops, tablets or smartphones. All interactions between the user and the database occur via a simple graphical user interface with no programming necessary. The supporting use of HTML5, jQuery and JavaScript creates a rich and interactive graphical user interface within the browser. When a search query is implemented on an unknown set of reference spectra, quantitatively similar spectra are retrieved from the database.
Algorithms for quantifying spectral similarity have been the subject of extensive research in other branches of analytical chemistry, 17-19 but are undeveloped in the context of fluorescence. Currently, OpenFluor identifies similar spectra as having Tucker congruence 20 θ exceeding 0.95 on the excitation and emission spectra simultaneously (eqn (1)). A more targeted search for matching spectra will be implemented in the future as improved algorithms for matching spectra become available.

$$\theta_{xy} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \, \sum_i y_i^2}} \tag{1}$$

Here x and y are the two spectra being compared (excitation or emission), with elements x_i and y_i sampled at the same wavelengths.

Records in the OpenFluor database are accompanied by synopses of the study that generated the data, including a short methodological description and an active link to the published record at http://dx.doi.org. Unregistered visitors to the website may temporarily upload spectra and search for quantitatively similar spectra in the database. Completion of a free one-time registration process allows the user to browse descriptions of matching models, generate plots, and download matched data. Registered users may elect to submit published spectra to the database, thereby making their own research results available for searching by other members of the fluorescence community.
Fig. 2 illustrates the potential for a spectral database to reveal similarities as well as differences between PARAFAC spectra. Each of the humic-like components depicted in Fig. 2A-C fulfils the description of "reoccurring Component 2" described by Ishii and Boyer 13 (excitation maxima approximately <240-275 nm and 339-420 nm; emission approximately 434-520 nm). Dozens of other spectra in the OpenFluor database also conform to this general description, yet are relatively poor quantitative matches for these spectra. In Fig.
2B, the four PARAFAC components shown share nearly identical emission spectra, but the excitation spectra fall into two distinct groups: the models of datasets from water treatment plants in Denmark 21 and Australia 5 have different excitation spectra than the models of datasets from the Florida Everglades 22 and the South Atlantic Bight. 23 Since nearly all components have primary excitation maxima near the limits of the measured or modelled range (<250 nm), they are mainly distinguishable by the position of their secondary excitation peak in conjunction with the position of the emission maximum (C ex/em). In Fig. 2C, the strongly overlapping components shown appear to mainly cluster in two sets, described here as C 400/518 nm and C 380/500 nm. The ESI † lists published sources for components in Fig. 2.
[...] approximately 374-450 nm). 13 The component depicted in Fig. 2D was identified repeatedly in a study of water treatment plants around Australia, 5 in which samples were measured on a single instrument but independent PARAFAC models were developed for each plant. A similar component is seen in several other studies (Fig. 2E), although those spectra are more variable. Fig. 2F depicts a different component, or given the apparent continuum of peak locations, possibly a suite of components representing different compounds or groups thereof. As the number of datasets in OpenFluor increases, a more robust picture of such components should emerge.
Fig. 2G-I illustrate three different protein-like components in the database that have each been described as "tryptophan-like". Fig. 2G depicts a component common to studies that sampled in Baltic 24 sea ice, Antarctic 25 sea ice, the North Atlantic ocean 24,26 and the Florida Everglades. 22 The spectra are extremely similar in each study, down to fine detail in the emission spectra, which suggests that a discrete organic compound rather than a mixture of compounds may be responsible for this signal. Fig.
2H depicts a component identified in models from natural and artificial environments. 5,27,28 The component depicted using dashed lines in this figure was strongly correlated with lignin concentration in one study. 27 Fig. 2I depicts a commonly-observed component with spectra similar to free dissolved tryptophan. The shape of the emission spectrum for this component differs between studies, possibly because it is derived from a group of compounds, and possibly also because interference by Raman scatter makes it difficult to accurately resolve its spectra.
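The matching criterion stated earlier (Tucker congruence exceeding 0.95 on the excitation and emission spectra simultaneously) can be sketched in a few lines. This is an illustration only, not OpenFluor's actual code: the function names, the dictionary layout and the toy spectra are assumptions, and it presumes that all spectra have already been interpolated onto a common wavelength grid.

```python
import math


def tucker_congruence(x, y):
    """Tucker congruence coefficient between two equal-length spectra."""
    if len(x) != len(y):
        raise ValueError("spectra must be sampled at the same wavelengths")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def find_matches(query, library, threshold=0.95):
    """Return library entries whose excitation AND emission spectra
    both exceed the congruence threshold against the query."""
    matches = []
    for name, entry in library.items():
        ex = tucker_congruence(query["ex"], entry["ex"])
        em = tucker_congruence(query["em"], entry["em"])
        if ex > threshold and em > threshold:
            matches.append((name, ex, em))
    return matches


# Toy spectra (assumed values) on a shared four-point grid.
query = {"ex": [1.0, 0.6, 0.3, 0.1], "em": [0.1, 0.5, 1.0, 0.4]}
library = {
    # Nearly the same shape in both modes: should match.
    "close": {"ex": [0.9, 0.6, 0.3, 0.1], "em": [0.1, 0.5, 0.9, 0.4]},
    # Same emission, very different excitation: should not match.
    "em_only": {"ex": [0.1, 0.3, 0.6, 1.0], "em": [0.1, 0.5, 1.0, 0.4]},
}

matched_names = [name for name, _, _ in find_matches(query, library)]
```

Because the coefficient normalises by the magnitude of each spectrum, it compares spectral shape only, so spectra reported in different intensity units can still be compared. Requiring both modes to pass is what rejects the `em_only` entry, which mirrors the Fig. 2B case of near-identical emission spectra with distinct excitation spectra.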
The OpenFluor spectral database aims to address a serious deficiency affecting the current interpretation of NOM-PARAFAC models. Thus, although it is widely assumed that spectrally similar PARAFAC components identified in unrelated studies have similar sources and ecological functions, quantitative spectral comparisons have been implemented only rarely 5,10 and with respect to a small number of studies. At the same time, many studies have drawn conclusions about the origins and behaviours of various components on the basis of qualitative comparisons with earlier studies. It is therefore likely that inconsistencies between reported behaviours of similar PARAFAC components are at least partly attributable to the unintentional grouping of NOM components that are spectrally similar, yet chemically and behaviourally distinct.
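The difference between qualitative and quantitative comparison can be illustrated with synthetic data. The sketch below is hypothetical throughout: the Gaussian spectra, wavelength grids and function names are invented for illustration. Only the peak windows are taken from the text, namely the secondary excitation window (339-420 nm) and emission window (434-520 nm) quoted for "reoccurring Component 2"; the primary excitation peak is omitted for brevity.

```python
import math


def gaussian(wavelengths, centre, width):
    """Synthetic single-peak spectrum (an assumption, not real data)."""
    return [math.exp(-0.5 * ((w - centre) / width) ** 2) for w in wavelengths]


def peak_position(wavelengths, spectrum):
    """Wavelength at which the spectrum reaches its maximum."""
    return wavelengths[spectrum.index(max(spectrum))]


def congruence(x, y):
    """Tucker congruence between two spectra on the same grid."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


ex_wl = list(range(300, 451, 5))  # excitation grid in nm (assumed)
em_wl = list(range(350, 601, 5))  # emission grid in nm (assumed)

# Two synthetic components placed near opposite ends of the windows.
comp_a = {"ex": gaussian(ex_wl, 345, 25), "em": gaussian(em_wl, 440, 35)}
comp_b = {"ex": gaussian(ex_wl, 415, 25), "em": gaussian(em_wl, 515, 35)}


def in_broad_windows(comp):
    """Qualitative test: do the peak positions fall inside the windows?"""
    ex_peak = peak_position(ex_wl, comp["ex"])
    em_peak = peak_position(em_wl, comp["em"])
    return 339 <= ex_peak <= 420 and 434 <= em_peak <= 520


both_pass_windows = in_broad_windows(comp_a) and in_broad_windows(comp_b)
ex_similarity = congruence(comp_a["ex"], comp_b["ex"])
em_similarity = congruence(comp_a["em"], comp_b["em"])
```

Both synthetic components satisfy the broad peak-position windows, yet their congruence falls well below 0.95 in both modes, so a quantitative search would not treat them as the same component.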
It is also important to realise that many fluorophores could have very similar spectra, so identifying similar PARAFAC components in two different studies does not guarantee that the same compounds are responsible in both cases. Fig. 3 compares a PARAFAC component identified in the Mackenzie River plume 29 in northern Canada with the spectrum of pure dissolved sodium salicylate (C.A. Stedmon, unpublished data), a common pharmaceutical derived from wintergreen plants.
Since the Mackenzie River watershed is mostly covered by virgin forests and wetlands and is minimally influenced by human activities, 30 a pharmaceutical source for this component can be ruled out. Instead, it is more likely to represent forest-derived phenolic compounds with very similar spectral characteristics to sodium salicylate. The database may therefore be more useful for detecting patterns in the occurrence of fluorescence components, and deducing relationships between them, than as a tool for identifying the specific chemical structures responsible for the observed signals.

Conclusions
OpenFluor enables quantitative comparisons of fluorescence spectra between studies for the first time via a simple browser-based user interface. At release, the database contains over 200 PARAFAC spectra derived from more than 30 published studies of NOM in natural and industrial aquatic systems. Its size is expected to increase rapidly, since users can submit published spectra to the database via the online system in a matter of minutes, and doing so could greatly increase the chances that a study is encountered and cited by other researchers. Future developments to the database are planned to further increase its usefulness, including the incorporation of automated routines for checking the quality of fluorescence spectra, and the implementation of enhanced spectral-matching algorithms incorporating chemical as well as statistical criteria.