Complexity of dissolved organic matter in the molecular size dimension: insights from coupled size exclusion chromatography electrospray ionisation mass spectrometry

This paper investigates the relationship between apparent size distribution and molecular complexity of dissolved organic matter from the natural environment. We used a high pressure size exclusion chromatography (HPSEC) method coupled to UV-Vis diode array detection (UV-DAD) and electrospray ionisation mass spectrometry (ESI-MS) to compare the apparent size of natural organic matter, determined by HPSEC-UV, with the molecular mass determined online by ESI-MS. We found a clear discrepancy between the two methods, and found evidence for an important pool of organic matter that has a strong UV absorbance and no ESI-MS signal. Contrary to some previous research, we found no evidence that apparently high molecular weight organic matter is constituted by aggregates of low molecular weight (<1000 Da) material. Furthermore, our results suggest that the majority of apparent size variability within the ESI-ionisable pool of organic matter is due to secondary interaction and exclusion effects on the HPSEC column, and not true differences in hydrodynamic size or intermolecular aggregation.


Introduction
Dissolved organic matter (DOM) is by far the most abundant form of organic matter in natural waters. It is an ultra-complex mixture of phenolic, carboxylic acid rich material that ranges in concentration from >50 mg C L⁻¹ in terrestrial wetlands down to <0.5 mg C L⁻¹ in the deep sea. DOM is operationally defined as organic matter that is not retained by filtration, […] charge density in order to evaluate the two main theories about the size distribution of DOM presented above. Based on current paradigms about DOM size distribution and aggregation, we expected to find that high molecular weight material measured by HPSEC-UV-DAD would be detected by ESI-MS as monomers with m/z 200–800. Instead, our results provided evidence for the existence of genuinely large, UV-active and ESI-MS-invisible DOM, and no evidence for the expected molecular aggregates.

Chemicals and samples
Ultrapure water (18.2 MΩ cm resistivity) was generated with a MilliQ system (Millipore, Burlington MA, USA). Methanol was high purity hypergrade (LiChroSolv for LCMS, Merck, Kenilworth, NJ, USA). High purity ammonium acetate (NH4 […] Suwannee River Fulvic Acid standard (lot 2S101F) was purchased from the International Humic Substances Society (IHSS), and four other samples were prepared as follows.
Soil humic and fulvic acids: 1.42 g of soil from Stadsskogen, Uppsala, Sweden (59.840° N, 17.636° E), was mixed with 10 mL 0.25% ammonia. The mixture was shaken, sonicated and vortexed twice, left at room temperature for 90 minutes, and then centrifuged at 4000 rpm for 10 minutes. One aliquot (2 mL) was taken from the supernatant without further modification and labelled 'Humic acids', and a second aliquot (2 mL) was acidified to pH 2 by adding 40 µL 6 M hydrochloric acid and labelled 'Fulvic acids'. The two aliquots were centrifuged to remove precipitated material, and the resulting supernatants were injected onto the SEC column at a 100 µL injection volume without further treatment.
Wetland stream reverse osmosis DOM: organic matter was concentrated from a small stream that drains a wetland named Börje Sjö near Uppsala, Sweden (59.918° N, 17.347° E) using a RealSoft PROS/2S portable reverse osmosis system. 34 The DOM was concentrated to 700 mg C L⁻¹. This sample was stored for >2 years at 4 °C in the dark, and was filtered through a glass fibre filter (GF/F, 0.7 µm) before injection (20 µL) onto the SEC column.
Leaf cold water extract: 25.96 g of green birch (Betula sp.) leaves were taken from a tree beside the Biomedical Centrum, Uppsala, Sweden (59.841° N, 17.634° E), and mixed with 554 g ultrapure water. The mixture was kept at 4 °C in the dark for 18 hours to extract 'rainwater extractable DOM'. A sample (2 mL) was transferred to an Eppendorf tube, dried down by vacuum centrifuge (Eppendorf Concentrator Plus) and redissolved in 100 µL 20 : 80 methanol : water containing 25 mM NH4Ac for injection (80 µL) onto the SEC column.

High pressure liquid chromatography-mass spectrometry
High pressure size exclusion chromatography (HPSEC; Tosoh TSKgel G3000SW, 300 × 7.5 mm, 10 µm) was conducted with an Agilent 1100 HPLC system using combinations of water, methanol and NH4Ac as the mobile phase, with a flow rate of 1 mL min⁻¹. A YMC-Pack-Diol-300 column (300 × 8 mm, 5 µm) was also tested, but was found to be much more affected by hydrophobic interactions between analytes and the stationary phase than the Tosoh column. UV-absorbing DOM was detected using a UV-Vis diode array detector (DAD; Agilent 1100) at a wavelength of 254 nm.
For online ESI-MS analysis, the ow was split (aer UV-Vis DAD) approximately 9 : 1 to waste, leaving a ow of $100 mL min À1 for heated ESI-MS (100 C, sheath gas setting 28, 2.5 kV negative mode). The splitting and heating were to assist with vaporisation of the solvent, which in most cases was largely water. The time delay from UV-Vis DAD detection and ESI-MS detection was determined to be 0.28 minutes using the 206 Da PSS standard, and all ESI-MS times are reported so as to be aligned with UV-Vis DAD detection times. The effect of injection volume was tested by analysing SRFA at four concentrations (0.25-5 mg L À1 C À1 ) with differing injection volumes (1-100 mL) in various combinations ( Fig. SI1 †), and this had minimal effect on the retention prole determined by UV-DAD and ESI-MS.
The LTQ is a linear ion trap that is used to collect ions before transfer to the Orbitrap analyser, and is also used for collision-induced dissociation (CID) using nitrogen gas. The ion trap was set to trap and transfer ions with m/z 200–2000, and the Orbitrap mass analyser was set to fill to 1 × 10⁶ ions (maximum fill time 250 ms) at a resolution setting of '100 000', giving an actual resolution >115 000 at m/z 401. Formulas were assigned in each transient as in our recent papers, 35 allowing up to C40H80O30N1S1 with one 13C. The formula possibilities were restricted by the following rules: O/C < 1.0, H/C 0.3–2.2, m/z 200–800, double bond equivalence minus oxygen (DBE − O) < 10, 36 and N + S + 13C < 2. Formulas were assigned if a peak in the resulting list could be found within 3 ppm (Δm/m × 10⁶) of a measured peak in each individual transient. Isotopologues containing one 13C were assigned but not considered in further data processing (i.e. molecular formulas were not counted twice via isotopologues).
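The formula constraints listed above can be expressed as a simple filter. The sketch below is an illustration under stated assumptions, not the authors' assignment software: it uses standard monoisotopic masses, assumes singly charged [M − H]⁻ ions (negative mode), computes DBE for CcHhNnOoSs formulas, and omits the 13C isotopologue bookkeeping. Fraxin (C16H18O10), one of the model compounds, serves as a check, since its [M − H]⁻ ion should fall near m/z 369.083.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276467  # removed when forming [M - H]- in negative mode
MAX_COUNTS = {"C": 40, "H": 80, "O": 30, "N": 1, "S": 1}  # limits from the text


def parse_formula(formula):
    """Parse a formula such as 'C16H18O10' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def deprotonated_mz(counts):
    """m/z of the singly charged [M - H]- ion."""
    neutral = sum(MONO_MASS[el] * n for el, n in counts.items())
    return neutral - PROTON_MASS


def dbe(counts):
    """Double bond equivalence; divalent O and S do not contribute."""
    return 1 + counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2


def ppm_error(measured, theoretical):
    return (measured - theoretical) / theoretical * 1e6


def passes_rules(formula, measured_mz, tol_ppm=3.0):
    """Apply the elemental-ratio, mass-range and mass-accuracy rules."""
    c = parse_formula(formula)
    n_c, n_h, n_o = c.get("C", 0), c.get("H", 0), c.get("O", 0)
    if n_c == 0 or set(c) - set(MONO_MASS):
        return False
    if any(c.get(el, 0) > limit for el, limit in MAX_COUNTS.items()):
        return False
    theo = deprotonated_mz(c)
    return (
        n_o / n_c < 1.0
        and 0.3 <= n_h / n_c <= 2.2
        and 200 <= theo <= 800
        and dbe(c) - n_o < 10
        and c.get("N", 0) + c.get("S", 0) < 2  # 13C isotopologues not handled here
        and abs(ppm_error(measured_mz, theo)) <= tol_ppm
    )


if __name__ == "__main__":
    fraxin = parse_formula("C16H18O10")
    print(round(deprotonated_mz(fraxin), 4))    # 369.0827
    print(passes_rules("C16H18O10", 369.0830))  # True (about 0.8 ppm)
    print(passes_rules("C16H18O10", 369.0900))  # False (about 20 ppm)
```

In a real pipeline the candidate list would be generated combinatorially within the elemental limits and matched against every peak in each transient; the filter above only shows how each rule is evaluated.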
Calibration and method validation were conducted with PSS standards (representing charged polymers), PPG standards (representing uncharged polymers) and three model compounds with 0, 2 and 4 carboxylic acid groups but the same m/z (369) and formula (C16H18O10). The PSS and PPG size standards were prepared at 1 mg mL⁻¹ in 10 mM NH4Ac and analysed at a 2 µL injection volume with detection by UV-Vis DAD. After initial testing, the PSS standards were combined at equal volumes into two mixed standards (A: 218 kDa, 15 kDa, 1.4 kDa and acetone; B: 38 kDa, 4.4 kDa and 206 Da) and injected at a 10 µL volume. The model compounds (a malonic acid derivative, fraxin and an isoferulic acid glucuronide) were prepared at 1 ppm in 50 : 50 water : methanol, and 50 µL of the solution was injected onto the SEC column. They were distinguished by their unique fragments after collision-induced dissociation (CID) in the ion trap, carried out at a normalised collision energy of 25 eV after selected ion monitoring (SIM) of m/z 368.5–369.5.
In one experiment, chromatographic fractions were collected manually after UV-Vis detection, and fractions (1.5 minutes = 1.5 mL) were dried by vacuum centrifuge (Eppendorf Concentrator Plus) and redissolved in 100 µL 50 : 50 water : methanol. They were then reinjected onto the column at an 80 µL injection volume. This was done to examine potential redistribution of aggregates when size fractions were isolated in solution.
Molecular peaks are discussed later in terms of their polydispersity (PD), which is calculated as
$$\mathrm{PD} = \frac{M_w}{M_n}$$
where the weight-averaged molecular weight (Mw) and number-averaged molecular weight (Mn) are calculated from the mass spectrum peak intensity (I_k) and the HPSEC-estimated peak mass (M_k).
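A short sketch of the polydispersity calculation follows. The averaging formulas are not reproduced in the text, so this example assumes the conventional number- and weight-average definitions with each scan's intensity I_k used as an abundance weight: Mn = ΣI_k M_k / ΣI_k and Mw = ΣI_k M_k² / ΣI_k M_k. If detector intensity is instead treated as proportional to mass concentration, as is common for UV detection in SEC, the sums change accordingly. The function name is hypothetical.

```python
def polydispersity(intensities, masses):
    """Return (Mn, Mw, PD) for one molecular formula across HPSEC scans.

    intensities: intensity I_k of the formula in each scan k
    masses:      HPSEC (PSS-equivalent) mass M_k at the time of scan k
    Assumes I_k acts as a number-like abundance weight.
    """
    if len(intensities) != len(masses) or not intensities:
        raise ValueError("intensities and masses must be non-empty and equal length")
    s0 = sum(intensities)
    s1 = sum(i * m for i, m in zip(intensities, masses))
    s2 = sum(i * m * m for i, m in zip(intensities, masses))
    mn = s1 / s0  # number-averaged molecular weight
    mw = s2 / s1  # weight-averaged molecular weight
    return mn, mw, mw / mn


if __name__ == "__main__":
    # Hypothetical peak: equal intensity at two apparent masses
    mn, mw, pd = polydispersity([1.0, 1.0], [500.0, 1000.0])
    print(mn, round(mw, 1), round(pd, 3))  # 750.0 833.3 1.111
```

A formula detected at a single apparent mass gives PD = 1, and PD grows as the extracted ion chromatogram broadens across the calibrated mass axis.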

Development of a HPSEC method for DOM
Coupling HR-ESI-MS to HPSEC imposes important restrictions on the mobile phase used in the chromatography: it has to be volatile, and high ionic strength should be avoided. Three types of standard were used to evaluate size exclusion behaviour. PSS was used, as in most recent literature, as a charged polymer that is similar in hydrodynamic size to natural humic substances, 17,31,32,37 but the validity of PSS standards or any other model compound in representing the HPSEC retention of DOM must be questioned, as DOM is so chemically diverse. 17,31,32,37 PPG is uncharged and was used to compare the behaviour of analytes that would not be subject to charge exclusion, and three model compounds with identical mass but differing charge in solution were analysed to confirm the differences found between the two polymer series. We used the PSS polymer series to test combinations of NH4Ac in water and methanol with measured pH > 6, similar to most natural aquatic systems with low DOC. The highest molecular weight standard, 218 400 Da, was always found at ~4.9 minutes, which we suppose is near the interstitial volume (V0) of the column. Linearity of response in log(MW) vs. time was only achieved up to the 37 500 Da standard, making this the maximum estimated PSS-equivalent mass. Later, it will be shown that most DOM was retained within the linear range, but some DOM can elute at V0, making it an unknown size higher than 37.5 kDa in PSS equivalence (i.e. taking charge repulsion into account, as explained below). The total volume of mobile phase in the column was not measured reliably (e.g. with D2O 32), and the retention time of acetone (58 Da) varied depending on the extent of hydrophobic interaction imposed by the mobile phase conditions (Fig. 1; see discussion later), meaning that linearity between log(MW) and retention time always broke down at the low mass end. In aqueous phase, without addition of methanol, calibration curves of log(MW) vs. time using PSS were not linear across the entire range, and instead flattened significantly from the 1440 Da standard to the 206 Da standard and acetone (Fig. 1a). This non-linearity has also been documented using ammonium bicarbonate solutions as the mobile phase. 2,11 An increase in NH4Ac from 10 to 100 mM improved retention of the PSS standards due to the decreased influence of ionic exclusion, which greatly affects charged molecules in SEC 19,32,38 due to repulsive interactions between charged analytes and the slightly charged stationary phase. 39 The increase in NH4Ac in purely aqueous solvent did not reduce the flattening of the calibration at low masses. The adjusted r² of the linear fits using all 5 standards up to 37.5 kDa was 0.93–0.98. At the lowest NH4Ac concentration, no retention was found for the standards >14 800 Da because of ion exclusion, and retention of the 37 500 Da standard began at 25 mM NH4Ac, similarly to previous observations. 32,40 An increase in methanol notably improved linearity by decreasing the retention time of the 206 Da standard and acetone, while not greatly affecting the higher molecular weight standards (Fig. 1b and c). This suggests that the longer residence time of the low molecular weight standards in the column allows secondary hydrophobic interactions with the stationary phase to have a greater effect in aqueous solution. Flattening of the calibration curve due to hydrophobic retention is undesirable, as it leads to an incorrect assessment of the molecular size of small, hydrophobic constituents, and to unpredictable and variable chromatographic resolution in the crucial mass range of 200–2000 Da. 11 An addition of 10% methanol made a large improvement to linearity by decreasing the retention of the smallest standards (adjusted r² 0.96 compared with 0.93; Fig. 1c), and 20% methanol made the calibration nearly linear (adjusted r² 0.99) across the range 206–37 500 Da.
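The calibration step described above, a straight-line fit of log(MW) against retention time judged by adjusted r², can be sketched as follows. The retention times in the example are hypothetical placeholders generated from an arbitrary line, not the measured values behind Fig. 1; only the idea of a log-linear PSS calibration evaluated with adjusted r² comes from the text, and the function names are illustrative.

```python
import math


def fit_log_calibration(times_min, masses_da):
    """Ordinary least-squares fit of log10(M) = a + b * t.

    Returns (a, b, adjusted_r2). With one slope parameter the adjusted
    r^2 is 1 - (1 - r^2) * (n - 1) / (n - 2).
    """
    n = len(times_min)
    if n != len(masses_da) or n < 3:
        raise ValueError("need at least three standards with matching masses")
    y = [math.log10(m) for m in masses_da]
    t_bar = sum(times_min) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times_min)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times_min, y))
    b = sxy / sxx
    a = y_bar - b * t_bar
    ss_res = sum((yi - (a + b * t)) ** 2 for t, yi in zip(times_min, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return a, b, adj_r2


def pss_equivalent_mass(t_min, a, b):
    """Convert a retention time into a PSS-equivalent mass (Da)."""
    return 10 ** (a + b * t_min)


if __name__ == "__main__":
    # Hypothetical standards lying exactly on log10(M) = 7 - 0.5 t
    times = [6.0, 7.0, 8.0, 9.0, 10.0]
    masses = [10 ** (7 - 0.5 * t) for t in times]
    a, b, adj = fit_log_calibration(times, masses)
    print(round(a, 3), round(b, 3), round(adj, 3))  # 7.0 -0.5 1.0
    print(round(pss_equivalent_mass(8.5, a, b)))    # 562
```

In practice only standards inside the linear range would be passed to the fit; points that flatten at the low mass end, such as acetone in purely aqueous eluent, would pull the slope and lower the adjusted r².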
Methanol also improves the quality of the ESI spray, as it is more volatile than water and lowers surface tension, leading to a better signal-to-noise ratio in MS detection. We also noticed that at the highest NH4Ac concentration, and particularly at high DOM sample concentrations, organic matter began to appear more hydrophobic (longer retention times, data not shown). This may be due to decreased solubility of the DOM in the high ionic strength solution and the resulting non-linear chromatographic behaviour, in which the phase equilibria are not linear. 41 In other words, the concentrations of components in the stationary phase at equilibrium are no longer proportional to their concentrations in the mobile phase. This effect was not investigated further, but was not observed for NH4Ac concentrations ≤ 50 mM.
The PPG standards were retained much longer than PSS, as expected 32 (Fig. 1b), and the model compounds also varied in retention according to charge density 37 (Fig. 2). At pH 6.6 these compounds carry 0–4 charges (Fig. SI4†), which represents the range of charge density expected for natural DOM. The neutral compound, fraxin, was retained longer than acetone when methanol was not added to the mobile phase, indicating hydrophobic retention on the column, and its retention was generally comparable with PPG. The least well retained was the malonic acid derivative, due to its high charge density, leading to an overestimated molecular weight based on the PSS calibration. The isoferulic acid glucuronide (two carboxylic acid groups) was the closest in retention to its equivalent mass in the PSS calibration curve. Some variability in retention time may also be explained by differing hydrodynamic volume, but charge density likely explains the major differences.
The large range in predicted molecular weights for these three isomers demonstrates the limitations of HPSEC for complex mixtures (Fig. 2). Adding 20% methanol to the mobile phase improved the consistency of the predicted molecular weight across different NH4Ac concentrations, and with 20% methanol the malonic acid derivative never appeared as more than double its true weight. It remains possible that the compound dimerised in solution and was truly present at ~740 Da, but it is more likely that charge repulsion was higher for this compound than for the PSS standards it is being compared with. DOM likely contains a high occurrence of carboxylic acid groups, according to its reactive chemistry 42,43 and its fragmentation patterns when studied by MS, 6,44 but a higher density of carboxylic acids than in the malonic acid derivative is unlikely. We therefore consider it encouraging that its predicted weight is only two-fold higher than its true value. Also concerning was fraxin, which was retained far too long compared with the equivalent retention time from the PSS calibration curve, and likely represents neutral compounds in general in this respect. Addition of 20% methanol improved the estimated MW for fraxin, and seemed to stabilise the response of the two carboxylic acids with respect to NH4Ac concentration (Fig. 2). At 0% methanol, increasing the NH4Ac concentration increased the estimated MW of the carboxylic acids. It is possible that high NH4Ac concentrations promoted self-aggregation of these standards, 45 but this is difficult to test, as only the monomer mass was detected by ESI-MS.
The nal selection of mobile phase has many implications for this study. It is important to have effective retention and resolution by HPSEC, meaning that the NH 4 Ac concentration must be at least 25 mM and contain at least 10% methanol. The ESI spray efficiency is poor when the methanol content is too low or the salt content is too high, so this favours keeping the salt concentration as low as possible. Sample loading was tested from 2.5-50 mg with no variability in UV retention prole observed (Fig. SI1 †), with higher loading giving a better signal and a higher number of assigned peaks by MS. Additionally, the study is improved by making the pH, salt and organic solvent conditions as similar to those of environmental samples as possible, so that the speciation of the DOM by HPSEC is comparable. Buffer capacity and ionic strength change dramatically across landscapes, particularly from terrestrial to marine waters. Marine conditions (0.5 M NaCl) are permissible for HPSEC but incompatible with ESI-MS, so the system was optimised for terrestrial samples of low salinity and buffer strength, usually below 3 meq. Overall, the optimal mobile phase for HPSEC-ESI-MS was chosen as 25 mM NH 4 Ac in 20% methanol (pH measured at 6.6), as a compromise between effective masking of ionic repulsion effects, limiting hydrophobic interaction and salting out of compounds from solution, optimising spray quality for ESI whilst keeping the salt and alcohol concentration low enough to mimic environmental conditions.

Size distribution of DOM samples according to HPSEC-UV-Vis DAD and HPSEC-HESI-Orbitrap-MS
Suwannee River fulvic acid. Fulvic acids are operationally defined as being soluble in both acid and base. The Suwannee River Fulvic Acid (SRFA) reference sample is a complex mixture of these versatile molecules extracted from river water. The retention profile of SRFA in 25 mM NH4Ac with 20% methanol, measured by HPSEC-UV, is broad, beginning at 6 minutes and ending before 12 minutes, just past the apex of an acetone standard. The bulk of the material elutes between 7 and 10 minutes with an apex at 8.87 minutes, corresponding to 1331 Da in PSS retention equivalence (Fig. 3a), which is similar to the lower MW range of previous reports. 2,37,40,46,47 Note that the PSS equivalence is only accurate for DOM with a similar charge density to PSS, and likely overestimates the MW of DOM with many carboxylic acid groups and underestimates the MW of neutral species, as discussed above (Fig. 2).
The signal from ESI-MS detection was rather different. 48 The total ion current showed two broad, unresolved peaks from 7–9 minutes and 9–12.5 minutes, with apexes at 7.7 and 9.8 minutes, corresponding to 3572 and 608 Da in PSS equivalence. A dot plot (Fig. 3b) of all detected ions with m/z 150–2000 shows a broad signal of material with m/z 800–2000 at 7–9 minutes; from 9 minutes onwards, ions with m/z 150–800 were detected with higher intensity. Generally, each m/z in the range 150–800 was detected over a broad retention time range from 9–12 minutes, and all assignments were singly charged, as is typical for DOM. 49 High resolution mass spectrometry allows accurate formula assignment of the material eluting from HPSEC. The resolution and mass accuracy of the Orbitrap instrument dictate the mass range over which formulas can be assigned (up to about m/z 800). Although this range is limited in the context of this study, multiply charged compounds with higher mass could theoretically be detected and assigned a formula, and ions with m/z up to 2000 can be detected, if not assigned. Fig. 3a shows the chromatogram of the summed intensity of all assigned formulas (total assigned current; TAC); assigned current was only found in the second hump of material, unlike in some previous results, which did not compare MS assignments to UV data. 29 The high molecular weight material (apex 7.5 minutes, Fig. 3a) with strong UV absorbance was not constituted by aggregated monomers (i.e. clusters) that could be dissociated during ESI or with in-source CID (Fig. SI2†), unlike previous results using variable cone voltage (similar to in-source CID) on a triple quadrupole MS. 48 Instead, these data support the alternative hypothesis that there is genuinely high molecular mass material in the sample that has strong UV absorbance and cannot be dissociated and ionised (either as monomers or as multiply charged ions) in the range 200–800 Da.
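The distinction between total ion current and total assigned current (TAC) can be illustrated with a small aggregation over scans. This is a sketch under a hypothetical data layout: each detected peak is a tuple of (scan time in minutes, assigned formula or None, intensity, flag for a 13C isotopologue). Following the methods, 13C isotopologues are excluded from the TAC so that each formula is counted once; the function names are illustrative.

```python
from collections import defaultdict


def total_ion_current(peaks):
    """Sum every detected peak per scan time, assigned or not."""
    tic = defaultdict(float)
    for time_min, _formula, intensity, _is_13c in peaks:
        tic[time_min] += intensity
    return dict(sorted(tic.items()))


def total_assigned_current(peaks):
    """Sum only peaks with an assigned formula, skipping 13C isotopologues."""
    tac = defaultdict(float)
    for time_min, formula, intensity, is_13c in peaks:
        if formula is None or is_13c:
            continue
        tac[time_min] += intensity
    return dict(sorted(tac.items()))


if __name__ == "__main__":
    # Hypothetical peaks; unassigned high-m/z signal contributes to the TIC only
    peaks = [
        (7.7, None, 80.0, False),
        (9.8, "C16H18O10", 100.0, False),
        (9.8, "C16H18O10", 17.0, True),
        (9.8, None, 5.0, False),
    ]
    print(total_ion_current(peaks))       # {7.7: 80.0, 9.8: 122.0}
    print(total_assigned_current(peaks))  # {9.8: 100.0}
```

Signal that is detected but never assigned, like the early-eluting m/z 800–2000 material, appears in the TIC while leaving the TAC empty at those times.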
It remains possible that the measured compounds aggregate in solution in natural waters, 48 but our data provide no evidence that such aggregates had low retention volumes (large hydrodynamic size) with our HPSEC method.
An extracted ion current of any individual formula shows that molecular peaks can appear at HPSEC retention times higher than predicted for the measured mass. This is likely due to the charge density of ESI-ionisable carboxylic acids, as discussed above (Fig. 2). The chromatographic peak width can also be greater than for the PSS standards, indicating isomeric diversity in HPSEC response (Fig. 4), similarly to other separation methods. 8,9 The variability in chromatographic elution time (i.e. number-averaged molecular weight, Mn) of molecular peaks in SRFA is almost certainly due to variation in charge density, as shown for model compounds in Fig. 2 and for two peaks with similar mass and differing charge density (inferred from the oxygen to carbon ratio, O/C) in Fig. 4. Inspection of all formulas in a van Krevelen diagram or an m/z vs. oxygen number diagram shows that charge density (equivalent to oxygen density for unsaturated humic compounds) explains as much of the variation in peak apex as the measured m/z does, as manifested by the increase in apparent MW with oxygen number within any measured m/z region (Fig. 5). The polydispersity (apparent range in molecular weight) was generally low (<1.5), particularly for molecular formulas with high charge density and high apparent mass. The most disperse peaks (with broader retention profiles) were mainly those with low oxygen numbers, which presumably have more hydrophobic character and are more susceptible to hydrophobic retention on the column, leading to peak tailing.
Elution fractions were collected, dried down, re-dissolved in the mobile phase and re-injected, in order to investigate whether the high MW material that elutes first in HPSEC originates from monomers that could aggregate, or whether larger clusters disaggregate when alone in solution. 50 Fractions were taken at 4–5.5 minutes (blank), 6–7.5 minutes (emerging UV, no MS signal), 7.5–9 minutes (strong UV, weak MS), 9–10.5 minutes (strong UV, strong MS), 10.5–12 minutes (weak UV, strong MS) and 12–13.5 minutes (no UV, weak MS). The fractions are depicted in Fig. 6 as UV signals (a) and MS signals (b). Panels (c) and (d) show the signals for the re-injected fractions, along with the summed, reconstituted signal. In general, the molecular distribution in each fraction remained in the size range that it was collected from, 51 especially considering the poor chromatographic resolution of HPSEC. Certainly, fraction 3 (orange) did not re-form any signal in fraction 1 (sky blue). However, fraction 1 was diminished in magnitude and appeared at lower apparent mass when re-injected, resulting in a slight decrease in the reconstituted total signal at higher apparent masses. This might indicate that some of the higher m/z values present in SRFA are supported by self-aggregation due to some form of solubility limit of the slightly smaller material in fraction 2. 27,45 The MS profile was almost perfectly maintained in the separated fractions, and no MS signal was obtained from fraction 1 when it was re-injected in isolation, contrary to previous results using low resolution MS 50 and similarly to other previous results that found collected fractions to be stable in retention behaviour over several weeks. 51 This provides further support for the hypothesis that the molecular size distribution of riverine DOM is due to a large range in hydrodynamic sizes, and not to dynamic, weak aggregation of small molecules.
Overall, our investigations of this complex natural mixture suggest that there is limited aggregation of ionisable molecules (carboxylic acids) in this mobile phase, and that variation in retention time for masses detected by ESI-MS can be explained by variation in charge density and hydrophobicity. These non-aggregated ions have recently been shown to have a small cross-sectional area (<250 Å²) using ion mobility separation following ESI. 15 There is strong evidence for a second pool of compounds, unrelated to the carboxylic acids, which strongly absorb UV light and have a very weak MS signal at m/z from 1000 to an unknown upper limit above 2000.
Terrestrial sample comparison. Five samples were compared using the established HPSEC-DAD-ESI-MS method: SRFA, a soil humic acid (HA) extract, an acidified aliquot of the soil HA (fulvic acid; soil FA), a reverse osmosis (RO) concentrate from a stream draining a wetland, and a cold water extract of birch leaves. The MW profiles in UV-DAD and ESI-MS according to HPSEC are shown in Fig. 7, revealing a wide range in retention behaviour within and between samples. In these plots, the retention time has been converted to HPSEC MW, inverting the profiles. The wetland stream RO sample was similar to SRFA, but contained a small amount of humic material close to the interstitial volume (V0) of the column. The soil humic acid had a fairly large UV peak at V0; this was removed by precipitation after acidification to pH 2, and can be attributed to traditionally defined 'humic acids'. Some low molecular weight hydrophobic constituents that were detected by MS were also removed in this process, possibly by co-precipitation, leaving a very narrow HPSEC MS peak in the soil FA sample. According to our results from SRFA, the material that remains as 'fulvic acids' after precipitation is likely to be dominated by carboxylic acid rich molecules with high charge density.
The leaf extract had a wide range in UV absorptivity, and the lowest number-averaged molecular weight (Mn) in MS signal of the five samples. The MS signal was centred at an apparent mass similar to acetone, signifying material with much lower charge density, and presumably lower ionisation efficiency.
HPSEC fractions of these samples were isolated in silico in the same time windows as described in Fig. 6, and the assigned MS peaks are presented in van Krevelen diagrams in Fig. 8. The river/soil samples were broadly similar in extent across the van Krevelen space, although the SRFA and Stream RO sample had considerably more saturated compounds with H/C > 1.0 and O/C < 0.5 than the soil extracts. This also accounts for the lower abundance of material in the soil extracts in fractions 4-5, where more saturated compounds tend to elute due to their lower charge density (Fig. 5). The leaf extract was very different from the other samples, containing many more labile biomolecules 52 with H/C > 1.5, including likely sugar and glycoside molecules with high oxygen and hydrogen saturation. These eluted across fractions 3-5.
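The in silico fractionation used for the van Krevelen comparison can be sketched as binning assigned peaks by retention time into the same windows used for fraction collection, then computing H/C and O/C for each formula. The data layout (time, formula, intensity tuples), the function names, and normalising each fraction to a total of 1 × 10⁶ (mirroring the Fig. 8 point-size convention) are assumptions for illustration; the window boundaries are taken from the fraction-collection experiment.

```python
import re
from collections import defaultdict

# Collection windows (min) from the re-injection experiment; 0 is the blank
FRACTION_WINDOWS = {
    0: (4.0, 5.5),
    1: (6.0, 7.5),
    2: (7.5, 9.0),
    3: (9.0, 10.5),
    4: (10.5, 12.0),
    5: (12.0, 13.5),
}


def fraction_of(time_min):
    """Return the fraction label for a retention time, or None if outside all windows."""
    for label, (start, end) in FRACTION_WINDOWS.items():
        if start <= time_min < end:
            return label
    return None


def element_counts(formula):
    """Parse a formula such as 'C10H12O5' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def in_silico_fractions(peaks, total=1e6):
    """Group (time_min, formula, intensity) peaks into fractions.

    Returns {label: [(formula, H/C, O/C, normalised_intensity), ...]}, with the
    intensities in each fraction summing to `total`. Every formula is assumed
    to contain carbon, as for DOM assignments.
    """
    summed = defaultdict(lambda: defaultdict(float))
    for time_min, formula, intensity in peaks:
        label = fraction_of(time_min)
        if label is not None:
            summed[label][formula] += intensity
    result = {}
    for label, per_formula in sorted(summed.items()):
        fraction_total = sum(per_formula.values())
        if fraction_total <= 0:
            continue
        rows = []
        for formula, intensity in per_formula.items():
            c = element_counts(formula)
            n_c = c["C"]
            rows.append((formula, c.get("H", 0) / n_c, c.get("O", 0) / n_c,
                         intensity * total / fraction_total))
        result[label] = rows
    return result


if __name__ == "__main__":
    # Hypothetical assigned peaks; the last one elutes before the first window
    peaks = [
        (9.2, "C16H18O10", 300.0),
        (9.9, "C16H18O10", 100.0),
        (11.0, "C10H12O5", 50.0),
        (3.0, "C8H8O4", 10.0),
    ]
    for label, rows in in_silico_fractions(peaks).items():
        print(label, rows)
```

Summing each formula's intensity within a window mirrors how a physically collected fraction would pool isomers eluting at different times in that window.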

Further discussion
Discrepancy between UV and MS results and the 'true' size distribution of DOM
The diverse chemistry of DOM has an important influence on its detection and characterisation by optical properties and mass spectrometry, and this study has demonstrated that in natural terrestrial mixtures there is a clear distinction between what is optically active and what is efficiently ionised by electrospray ionisation, with only moderate overlap between the two pools. These two pools are not separable with reversed-phase chromatography. 9,35 The compounds that have high light attenuation and low MS signal are not captured in mass spectrometry studies, whereas they probably dominate the results in studies that use optical methods for the characterisation of DOM. The size range difference between optically active and ionisable DOM probably explains why solid phase extraction, which is poor at extracting high molecular weight components, 53,54 has a minimal effect on ESI-MS results, 54 but a large effect on absorption and fluorescence spectroscopy. 55 Great care should therefore be taken in studies that aim to link the optical and MS character of terrestrial DOM 56–60 or the concentration of DOC and MS signal, 61 as the results being compared do not necessarily represent the same pools of carbon, 47,48,62 and relationships are likely to be coincidental, albeit related. This larger molecular weight pool, which we refer to as phenolic compounds (Fig. 9), may be the coloured constituents that are most rapidly lost from terrestrial systems along transport to the sea, 63–65 as well as the components most easily removed during groundwater transport, surface sorption and drinking water treatment. 66,67 This reactivity makes their characterisation highly important, and high resolution ESI-MS appears to be unsuitable for this particular task.
Other ionisation techniques, such as atmospheric pressure photoionisation (APPI) and (matrix-assisted) laser desorption ionisation ((MA)LDI), and different types of mass spectrometer with higher mass ranges may be useful in future work characterising this 'phenolic' material, but it is unlikely that one technique will be able to cover the full required analytical window due to the specificity and bias of each ionisation source. 3,68,69
Fig. 8 van Krevelen diagrams depicting H/C and O/C ratios of molecular formulas assigned in each in silico fraction for each sample. Essentially no formulas were assigned in the UV-active fraction 1 (not shown). The point size is proportional to the normalised intensity (all points sum to 1 × 10⁶). The assigned mass is shown in colour, and the apex mass according to HPSEC is indicated above the plots. As discussed, the apex mass is mainly related to charge and hydrophobicity, and not to actual mass or size.
The true size distribution of DOM remains elusive using HPSEC due to the important secondary retention and exclusion effects that disrupt the technique. However, knowledge of how model compounds behave, along with information from UV and MS detection, allows us to form a better picture of the likely size distribution of DOM in natural systems based on our results (Fig. 9). We divide the DOM material into three classes, which we loosely name 'phenolic', 'carboxylic' and 'small biomolecules'. These classes clearly have some overlaps, and are meant here only as a guide. We note that previous work using HPSEC coupled to NMR also determined three conceptual classes, in that case named 'carbohydrate + aromatic', 'carboxylic rich alicyclic' and 'linear terpenoids', respectively. 17
Fig. 9 Conceptual figure summarising our results and conclusions. Top: calibration and apparent mass are highly affected by charge (Fig. 2), meaning a conceptual adjustment needs to be made when considering different analyte types. Bottom: DOM can be grouped into three main classes: large phenolic compounds (top), carboxylic acids (second), biomolecules (third) and the sum of all three (last). The true size distribution of each is plotted in red, and the distributions obtained by UV detection and MS detection are shown in blue and black, respectively. The amplitude of the distributions is affected by the response factor of the detector for that type of molecule, and the size distribution is affected by secondary effects on the column, depending mainly on charge. The direction and extent of bias for each class depend on the polymer used for calibration (in our case, PSS). V0 is indicated with a vertical dotted line.

Paper Faraday Discussions
This journal is © The Royal Society of Chemistry 2019

How useful is chemical separation prior to detection in non-targeted complex mixture analysis?
Recent studies have shown that adding a separation to complex mixture analysis brings a new dimension of complexity to the data analysis, often without achieving the goal of separating individual isomers from the mixture. 7,9,11,17,70,71 This may cast doubt on whether the added complexity and time are worthwhile, or whether separation simply complicates results that can already be obtained from broadband analysis. We feel that the robust chemical information given by separation, especially when coupled online to detection, is reason enough for its use. Reversed-phase chromatography can determine reliable solubility partitioning coefficients for the unknown analytes, 72 and chromatography can separate functional classes from complex mixtures with implications for characterisation via NMR, 17 fluorescence spectroscopy, 10,73 FT-infrared spectroscopy 2 and mass spectrometry. 8,9,29 Additionally, recent advances have shown that isomeric complexity can be fully resolved in certain cases when methods add enough dimensionality to the separations. 8,16,28 Unfortunately, there is no clear path towards one method that gives information about every single complex dimension of DOM. In terms of separation, HPSEC seems to be useful for broad analysis of intermolecular speciation and size, while reversed-phase and hydrophilic interaction chromatography are important for functional 'smearing' of material into broad classes, and certainly give clearer information about functionality than HPSEC. Other separation techniques, both old and new, 74 may become useful depending on the particular research question. It may be useful to combine orthogonal chromatographic methods into 2D separations, but this is likely to involve too much complication for the wider community to embrace as a standard technique.
This study makes it clear that the choice of the analysis can have important consequences for the nature and completeness of data obtained. We recently analysed 74 headwater stream samples without prior solid phase extraction using a reversed-phase-high resolution MS technique, 35 and separated material into three chromatographic fractions. Each fraction had a UV and MS response, but the high molecular weight, UV rich material with no MS response was hidden among the three fractions, and was not revealed in that study (Fig. SI3 †). The analytical window and inherent biases of any chosen method must be considered in any study of complex mixtures, and the best tools moving forward will include various analytical windows in order to validate assumptions and to thoroughly explore correlations and dependencies within the data. 48,60 Pitfalls to be aware of include limited analytical windows in ionisation and detection methods, interactions between DOM and the stationary phase during chromatography and other separations/extractions, and molecules occurring as aggregates and other intermolecular effects, either naturally or during chromatography and analysis. Considering the diversity of DOM and the different rates at which different fractions of it react in natural waters, 75,76 a full coverage of the molecular diversity of DOM is needed in order to understand its biogeochemistry. Our results support the view that an accurate understanding of DOM and its behaviour in natural environments requires a battery of different techniques, including methods that allow characterisation of molecules larger than those detectable by the currently common ESI-FTICRMS and ESI-Orbitrap MS techniques.

Conflicts of interest
There are no conicts to declare.