Sequence-dependent quenching of fluorescein fluorescence on single-stranded and double-stranded DNA

Fluorescein is commonly used to label macromolecules, particularly proteins and nucleic acids, but its fluorescence is known to be strongly dependent on its direct chemical environment. In the case of fluorescein-labeled nucleic acids, nucleobase-specific quenching originating in photoinduced charge transfer interactions results in sequence-dependent chemical environments. The resulting sequence specificity of fluorescence intensities can be used as a proximity detection tool, but can also lead to biases when the abundance of labeled nucleic acids is quantified by fluorescence intensity. Here we comprehensively survey how DNA sequences affect fluorescence intensity by preparing permutational libraries containing all possible 5mer contexts of both single-stranded and double-stranded DNA, 3′- or 5′-end-labeled with fluorescein (6-carboxyfluorescein, FAM). We observe the expected strong quenching of fluorescence with guanine proximity but also find more complex fluorescence intensity changes depending on sequence contexts involving proximity to all four nucleobases. A terminal T (T > A ≈ C ≫ G) in both 3′- and 5′-labeled single strands results in the strongest fluorescence signal, and this changes to a terminal C (C ≫ T > A ≫ G) in double-stranded DNA. Therefore, in dsDNA, the terminal G·C base pair largely controls the intensity of fluorescence emission depending on which of these two nucleotides the dye is attached to. Our data confirm the importance of guanine in fluorescence quenching while pointing towards an additional mechanism beyond the redox potential of DNA bases in modulating fluorescein intensity in both single- and double-stranded DNA. This study should help in designing better nucleic acid probes that take sequence-dependent quenching effects into account.


Introduction
Fluorescein is a ubiquitous dye in labeling, imaging and tracing applications. Historically, it is one of the very first fluorophores to enter widespread use, thanks to a combination of high molar absorptivity in the visible region (>75 000 M⁻¹ cm⁻¹), very large quantum yield (Φ = 0.92), 1 good photostability and solubility in aqueous media. Fluorescein was used in general staining approaches before becoming a macromolecular labeling method, allowing the tracking and quantification of proteins and nucleic acids. Structural derivatives of fluorescein are commonly used as reversible fluorophore tags on nucleoside triphosphates, a pivotal aspect of next-generation sequencing and in vitro DNA polymerization. [2][3][4] Fluorescein labeling can be conveniently carried out with an isothiocyanate functional group (FITC) or, in DNA and RNA labeling, during solid-phase synthesis by coupling a phosphoramidite version of fluorescein (6-carboxyfluorescein, 6-FAM). The fluorescence properties of fluorescein vary with the environment, and it is most notably sensitive to pH: absorption of 490 nm light is highest at pH > 7, and the fluorescence response decreases progressively with decreasing pH, which can be understood in terms of the pH-dependent opening of the spirolactone function attached to the xanthene moiety. 5 Because of this pH-dependent behavior, fluorescein has also found applications in monitoring very fine intracellular pH changes. 6 And while the fluorescence properties of free fluorescein have been extensively studied, much less work has been devoted to fluorescein in the context of dye-labeled molecules, specifically to how the local chemical environment can affect the fluorescence response.
This potential modulation of the absorption and emission properties of fluorescein is important to consider, and particularly relevant when emission is correlated with concentration, as in nucleic acid quantification and sequencing, but also in more complex photophysical systems based on Förster resonance energy transfer (FRET). 7 Indeed, fluorescein is a common fluorophore in FRET pairs, but its donor and acceptor properties are sensitive to the nucleic acid environment. The change from single-stranded to double-stranded DNA was found to result in a ∼1/3 decrease in fluorescence intensity, 8,9 and the quantum yield is reduced when labeling occurs at the 3′ end. [10][11][12] Sequence-dependent fluorescence intensity in oligonucleotides is a well-documented phenomenon affecting most chromophores, although the specific mechanisms vary. In the case of quenching via photoinduced electron transfer from nucleobases, proximity to guanine, the most oxidizable nucleobase, is the largest contributor. Such quenching is also observed in many commonly used chromophores such as coumarin, 13 porphyrin 14 and rhodamine 15 derivatives, as well as others. 16,17 More distal guanine bases also contribute to quenching, but to a lesser extent. 18 Guanosine-mediated fluorescence quenching proceeds via photoinduced electron transfer (PET) between the fluorophore and a proximal electron-donor guanine. Since PET efficiency correlates with the redox potentials of the donor and acceptor, PET quenching of fluorescence should follow the order of increasing nucleobase redox potential, 19 dG ≪ dA < dT ≈ dC. However, data on the sequence-specific modulation of fluorescence are limited, and any such study would need to distinguish between 5′ and 3′ labeling, and between single-stranded and double-stranded systems.
We previously investigated the sequence dependence of cyanine dyes in all possible 5mer ss- and dsDNA contexts and revealed how nucleobase identity further away from the dye also affects Cy3/Cy5 fluorescence intensity, as well as that of the structurally similar DyLight DY547 and DyLight DY647. [20][21][22] For fluorescein, the modulation of fluorescence by neighboring bases is expected to differ significantly from that of cyanine dyes, as a π-stacking contribution to fluorescein's interaction with DNA has not been previously documented. Indeed, fluorescence anisotropy measurements show that the rotational motion of the fluorescein dye is decoupled from that of the labeled DNA, indicating that fluorescein rotates independently of the nucleic acid molecule. 23 This could be a consequence of the opening of the spirolactone ring at physiological pH, which creates not only freedom of rotation about the xanthene-phenolic system but also a negatively charged carboxylate, a source of electrostatic repulsion with nearby phosphodiester groups, 24 in clear contrast to positively charged cyanine fluorophores.
Herein, we explore how the fluorescence properties of fluorescein are affected by the five consecutive DNA nucleotides immediately proximal to the dye, by synthesizing all possible sequence permutations (4⁵, or 1024 unique pentanucleotides) in all terminal labeling conditions, that is, 3′- and 5′-labeled, in both ssDNA and dsDNA formats. To do so, we synthesized the DNA oligonucleotides using nucleic acid photolithography in both the 3′→5′ and 5′→3′ directions, with a final, terminal fluorescein coupling using the 6-fluorescein phosphoramidite. 25 Changes in fluorescence across the surface of the nucleic acid array reveal which nucleobase or ordered combination of nucleobases has the strongest effect on fluorescein emission. As expected, we found that proximal G and G-rich sequences at both the 5′ and 3′ ends of oligonucleotides strongly predict fluorescence quenching. However, the sequence-dependent fluorescence cannot be fully explained by the nucleobase oxidation potential order dG ≪ dA < dT ≈ dC. Instead, we measure FAM fluorescence quenching following the order G ≫ C ≈ A ≫ T for 5′-labeled single-stranded DNA, G ≫ C ≈ A > T for 3′-labeled single-stranded DNA, and G ≫ A ≈ T ≫ C for double-stranded DNA. Our data suggest that a redox mechanism alone is insufficient to explain fluorescein fluorescence quenching in DNA. These results should provide comprehensive guidance for designing better fluorescein-labeled nucleic acid probes that take sequence-specific variations into account.

Sequence design
A complete series of base permutations spanning 5 consecutive nucleotides was generated, producing 1024 (4⁵) unique combinations, which were installed immediately adjacent to the fluorescein at either the 5′ or 3′ end of the oligonucleotides (Fig. 1). The rest of each oligonucleotide was composed of a simple dT15 linker to the glass slide surface. The single-stranded DNA library was thus of the form 5′-FAM-P1P2P3P4P5-TTTTTTTTTTTTTTTTTT-(slide) or 3′-FAM-P1P2P3P4P5-TTTTTTTTTTTTTTTTTT-(slide). The double-stranded DNA library design was based on our previous work on Cy3 and Cy5 sequence dependence in dsDNA, 22 and consisted of a hairpin-forming 36 nt strand with a permuted 5 bp long section at the 5′ end. Close to the hairpin loop (TCCT), a 5 bp long section is assembled with nucleotides absent from the permuted area, so that each hairpin sequence ultimately contains the same amount of A, C, G and T bases and all hairpins thus have approximately equal melting temperatures. An additional core section of GC-rich base pairs in the hairpin ensures a high melting temperature. All hairpins are synthesized on a 20 nt long dT linker.

Fig. 1 … Each library consists of a complete base permutation set over 5 consecutive nucleotides immediately adjacent to the dye (P1 to P5, 4⁵ or 1024 combinations). The identity of the nucleotides N in the dsDNA libraries corresponds to the bases absent from the permutation set P. For instance, if the P1·P1c base pair is A·T, the N1·N1c base pair is G·C. (C) Structure of the fluorescein phosphoramidite (6-FAM) used to label oligonucleotides. (D) Excerpt of an array scan at 488 nm of a 5′-fluorescein-labeled single-stranded DNA library, revealing a large range of fluorescence intensities. Single-stranded oligonucleotide features are randomly distributed throughout the array surface, separated by non-synthesized space serving as background reference. The excerpt is ∼2% of the total synthesis area.
In linear format, the self-annealing structures can be written as 5′-FAM-P1P2P3P4P5-CCGGCCGCC-N1N2N3N4N5-TCCT-N5cN4cN3cN2cN1c-GGCGGCCGG-P5cP4cP3cP2cP1c-TTTTTTTTTTTTTTTTTTTTTTT-(slide), where Pi refers to the permutations, Pic to their complements, and Ni and Nic to the complementary section used to homogenize nucleobase content while being too distal from the fluorescein to affect sequence-dependent fluorescence. Thus, each G·C base pair in the P section has a corresponding A·T base pair in the N section (and vice versa), so that the P and N sections of each hairpin stem together contain 5 G·C and 5 A·T base pairs, regardless of permutation.
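The library layout above can be sketched in Python. This is a minimal illustration, not the authors' design software: the strand model omits the dye and the dT linker, and the rule for choosing each Ni is an assumption. The text only states that Ni·Nic is the base-pair type absent at Pi (A·T ↔ G·C), so the hypothetical table `SWAP` below picks one concrete assignment (A→G, T→C, G→A, C→T) that satisfies it.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical assignment of N_i from P_i: swaps an A·T pair for a G·C pair and
# vice versa. The source fixes only the pair type, not which strand gets which base.
SWAP = {"A": "G", "T": "C", "G": "A", "C": "T"}
CORE = "CCGGCCGCC"  # GC-rich core on the 5' arm of the stem
LOOP = "TCCT"


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def permuted_5mers() -> list[str]:
    """All 4**5 = 1024 pentanucleotides P1..P5 (P1 is adjacent to the dye)."""
    return ["".join(p) for p in product(BASES, repeat=5)]


def hairpin(p: str) -> str:
    """5'-P-CORE-N-LOOP-Nc-COREc-Pc-3' (dye and dT linker omitted)."""
    n = "".join(SWAP[b] for b in p)
    arm = p + CORE + n
    # The 3' arm is the reverse complement of the 5' arm, so the stem pairs fully.
    return arm + LOOP + reverse_complement(arm)


def base_composition(seq: str) -> Counter:
    return Counter(seq)


library = permuted_5mers()
# Every hairpin should have the same A/C/G/T content, whatever its permutation.
compositions = {tuple(sorted(base_composition(hairpin(p)).items())) for p in library}
print(len(library), "permutations,", len(compositions), "distinct base composition(s)")
```

With this assignment, each position contributes one A, C, G and T across Pi, Ni, Nic and Pic, so all 1024 hairpins should share a single base composition, which is the property the design relies on for equal melting temperatures.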

Microarray synthesis
Microarray synthesis of DNA oligonucleotides by photolithography followed procedures described elsewhere, which incorporate the most recent technical improvements we have brought to the methodology. [25][26][27][28][29][30][50] Briefly, Maskless Array Synthesis (MAS) uses 365 nm UV light to remove the 5′- or 3′-BzNPPOC photosensitive protecting group (for 3′→5′ or 5′→3′ synthesis directionality, respectively) at selected locations across the surface of a functionalized glass slide (Schott Nexterion D). Selective photodeprotection is achieved by using an optical relay to image a Digital Micromirror Device (DMD, Texas Instruments) onto the surface of a glass slide forming the entrance window to a photochemical reaction chamber. Mirrors tilted to the ON position result in UV light illuminating a micromirror-sized feature (14 × 14 µm) on the glass surface. Photodeprotection is promoted by the use of a 1% solution of imidazole in DMSO as an exposure solvent. Forward and reverse phosphoramidites (Orgentis) were exposed to a dose of 3 J cm⁻² and coupled for 15 s (forward) and 60 s (reverse). To account for the lower coupling efficiency of G, its coupling time was set to 30 s (forward) and 120 s (reverse), so that all phosphoramidites would achieve a nucleobase-independent coupling efficiency >99%. The fluorescein phosphoramidite (LinkTech) was coupled twice consecutively, for 5 min each time, at 50 mM concentration in acetonitrile (vs. 30 mM for the DNA phosphoramidites). The DNA synthesis reagents were delivered to the synthesis surface with an Expedite 8909 automated synthesizer (PerSeptive Biosystems), using the BzNPPOC phosphoramidites and the exposure solvent but otherwise the reagents and solvents conventionally used in phosphoramidite chemistry.
Aer each coupling, an additional coupling with a standard DMTr-dT phosphoramidite served to cap BzNPPOC phosphoramidite coupling failures since no reagent in nucleic acid photolithography can unblock a 4,4 0dimethoxytrityl (DMTr) group. This ensures that uorescein labeling can only occur on full-length oligonucleotides, and together with the high and consistent coupling efficiency, also results in a highly uniform surface density of labeled sequences. Aer synthesis, the microarrays were washed in acetonitrile for 2 hours at room temperature (rt) to remove unbound uorescein. DNA arrays were then deprotected in 1 : 1 ethylenediamine/ ethanol for 2 hours at rt (12 h for 5 0 / 3 0 synthesized DNA), washed with distilled water twice and then with 50 mM phosphate buffer at pH 7.6 (PBS) before being spun dry. The dsDNA libraries were self-annealed by heating the microarray in PBS buffer at 50 C and slowly cooling it down to rt. Aer 30 min, the array was briey washed in 1Â sodium citrate buffer then spun dry.

Data extraction and analysis
Aer drying, the microarrays were scanned in a GenePix 4400A scanner at 2.5 mm resolution with a PMT voltage set at 440 V for both ss and dsDNA libraries. Fluorescein was excited at 488 nm using a built-in solid-state laser and uorescein emission was collected through a 525 nm bandpass lter. Fluorescence intensity data was extracted from the scan image using NimbleScan 2.1 soware (NimbleGen) and data processing was carried out using Excel. The uorescence intensity values were calculated as an average of all 7 replicates of each sequence randomly distributed across the surface of the same microarray. The intensities were then corrected for background uorescence for each permutation by subtracting the uorescence of the corresponding non-labeled oligonucleotide sequence. Error was calculated as standard error the mean. The reported data is an average over three independent replicates. The consensus sequences were obtained by rst ranking the uorescence intensity of the 1024 sequences and dividing the range into 8 bins of equal intensity span. The sequences in each bin were fed into a sequence logo generator (Weblogo, http://weblogo.berkeley.edu/) 31 and the corresponding consensus sequences were arranged together from high to low uorescence to illustrate the changes in the sequence dependence of the uorescence properties of uorescein. The uorescence intensity data for all sequences is available in spreadsheet formal as ESI. †

Results and discussion
The parallel synthesis of all 1024 combinations on the same surface is the ideal approach to study sequence effects on fluorescence intensity in a systematic and reproducible manner. [20][21][22] Coupling efficiency is very high (>99%) across all four DNA phosphoramidites and is independent of the identity of previously incorporated nucleotides, as verified via sequencing. 29 This means that terminal labeling should be equally efficient for all sequence combinations and, thanks to efficient capping chemistry, 32 labeling is prevented on sequences with failed couplings. The effect of distance to the surface on the fluorescence intensity of dyes is well documented, 33 but our sequence design ensures that fluorescein is spaced away from the glass surface by exactly the same distance for all combinations (Fig. 1A and B). In addition, since all sequences are synthesized in parallel, synthesis yield and oligonucleotide density are homogeneous throughout the entire array surface. The mean distance between DNA molecules on the surface, based on the initial surface density of hydroxyl groups, is also sufficiently large (450 nm) to prevent intermolecular interactions. 34 Altogether, our fabrication design aims to guarantee a uniform assembly and density distribution of all library elements, allowing relative fluorescence intensity to be compared between sequences. These advantages stand in contrast to solution-based studies, which, in addition to far lower throughput, are unlikely to quantify relative intensity between different sequences accurately: potential sequence-dependent spectral absorption overlap between fluorescein and DNA prevents an accurate quantitation of DNA concentration, and hence of relative fluorescence intensity between the different sequences.
Beyond their use in quantifying the sequence dependence of labeling dyes, the very high sequence density and highly uniform molecular surface density of in situ synthesized nucleic acid arrays have also enabled a variety of ultra-high-throughput fluorescence-based quantitative assays in molecular biology. [35][36][37][38][39][40][41][42] We first looked at how the fluorescence intensities of fluorescein are modulated by the sequence context immediately adjacent to the dye, initially in single-stranded format, and then when the dye is attached to double-stranded DNA. We find that sequences interact differently with fluorescein, creating a large range of fluorescence intensities across the 1024 different combinations. In both the 5′- and 3′-labeled oligonucleotide series, the distribution of fluorescence intensities adopts a sigmoidal shape, with almost a 55% difference between the brightest and darkest 5′-fluorescein-labeled sequence combinations and a somewhat smaller difference for 3′-fluorescein-labeled oligonucleotides, with a maximum of 45% quenching relative to the brightest sequence (Fig. 2B). This dynamic range of fluorescence is in line with our previous observations on Cy3 and Cy5 dyes on similarly complex DNA libraries, [20][21][22] indicating that the extent of fluorescence quenching in xanthene-like structures is comparable to that in cyanine derivatives. At the top end of fluorescence intensity, we identify the 5mers 5′-TTTTT and 3′-CTTTC (3′-TTTTT being a close third). At the lower end of the intensity range, we find 5′-GGGGG and 3′-GGGGC. Clearly, T-proximal single-stranded DNA sequences minimally quench fluorescence, while G-rich elements near fluorescein lead to the greatest loss of fluorescence, largely as expected from the known mechanism of photoinduced electron transfer between the fluorophore and a proximal guanine as electron donor.
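The ranked curves and the quoted quenching percentages correspond to normalizing every sequence to the brightest one and reading off the darkest value. A minimal sketch, assuming background-corrected intensities are already available per 5mer (the numbers in the usage line are invented for illustration, not measured data):

```python
def rank_relative(intensities: dict[str, float]) -> list[tuple[str, float]]:
    """Normalize to the brightest sequence (= 1) and sort from most to least intense."""
    top = max(intensities.values())
    relative = {seq: value / top for seq, value in intensities.items()}
    return sorted(relative.items(), key=lambda item: item[1], reverse=True)


def max_quenching(ranked: list[tuple[str, float]]) -> float:
    """Loss of signal from the brightest to the darkest sequence, as a fraction."""
    return 1.0 - ranked[-1][1]


# Illustrative numbers only, not measured data.
ranked = rank_relative({"GGGGG": 450.0, "TTTTT": 1000.0, "ATCGA": 800.0})
print(ranked[0][0], f"{max_quenching(ranked):.0%}")
```

Plotting the second element of each pair against rank gives the sigmoidal curves described for the full 1024-sequence libraries.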
To assess the sequence dependence in these very large datasets, we divided the range of fluorescence intensities into octiles of equal intensity span and looked for sequence motifs. Sequence logos were generated for each octile and arranged by intensity (Fig. 2A and C). For both 5′ and 3′ labeling, T-rich sequence combinations populate the high fluorescence intensities, while their G-rich counterparts are very likely to be found among the low fluorescence data. The top and bottom 1% of fluorescence intensity very clearly show the predominance of T and G nucleotides at the extremes of the intensity range. There is a loss of consensus in the middle range of fluorescence, indicating that the sigmoidal curves can be interpreted as cumulative distribution functions with relative fluorescence as the variable. More intuitively, this pattern arises because most sequences in the full permutational library are composed of a mix of nucleobases associated with both high and low fluorescence, whereas only a few sequences are composed primarily of one of these nucleobases, T or G. Unsurprisingly, and consistent with the electron transfer mechanism, the nucleotide immediately adjacent to the dye is the most important in modulating fluorescence, for both 5′ and 3′ labeling, and the identity of the nucleobases further away from the terminal nucleotide quickly becomes less relevant.

Fig. 2 Sequence-dependent variations in the fluorescence intensity of fluorescein-labeled single-stranded oligonucleotides. The relative fluorescence intensity of fluorescein end-labeled 5mers was ranked from most to least intense (highest recorded fluorescence and its corresponding 5mer = 1). For 5′-fluorescein labeling the intensity falls by 59%, and for 3′ labeling by 47% (B). The intensity range was divided into 8 equal parts from which consensus sequences were generated, and the sequence logos for each octile were arranged in descending order of fluorescence intensity (left to right), for 5′ (A) and 3′ fluorescein labeling (C).
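The octile analysis can be sketched as two steps: split the intensity range (not the number of sequences) into eight equal spans, then count base frequencies per position within each bin. These per-position frequencies are the input a sequence logo is drawn from (WebLogo scales letter heights by information content); this sketch stops at the frequencies and does not reproduce WebLogo itself.

```python
from collections import Counter


def equal_span_bins(rel: dict[str, float], n_bins: int = 8) -> list[list[str]]:
    """Assign sequences to n_bins bins of equal intensity span; bin 0 is the brightest."""
    hi, lo = max(rel.values()), min(rel.values())
    width = (hi - lo) / n_bins
    bins: list[list[str]] = [[] for _ in range(n_bins)]
    for seq, value in rel.items():
        # Distance below the top of the range, measured in bin widths.
        k = int((hi - value) / width) if width > 0 else 0
        bins[min(k, n_bins - 1)].append(seq)  # the darkest value lands in the last bin
    return bins


def position_frequencies(seqs: list[str]) -> list[dict[str, float]]:
    """Fraction of A, C, G and T at each position of equal-length sequences."""
    freqs = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        freqs.append({b: counts[b] / len(seqs) for b in "ACGT"})
    return freqs
```

Equal-span binning, as described in the text, means the bins can hold very different numbers of sequences; an equal-count (quantile) split would be a different analysis.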
The sequence-dependence of uorescein in single-stranded DNA correlates fairly well with our initial observations with terminal nucleotides alone and generally agrees with the expectation that nucleobase redox potential is the most important physicochemical parameter to consider when studying uorescence quenching in uorescein. This mechanism, however, would predict that both pyrimidines would have least affected the uorescence response, while our data demonstrates a strong preference for thymine only. Cytosinerich combinations ($4 dC) result in similar uorescence to dA-rich combinations, in both cases much lower than dT-rich and signicantly higher than dG-rich sequence combinations. In terms of nucleobase abundance alone, quenching follows the order dG [ dC z dA [ dT for 5 0 labeling and dG [ dC z dA > dT for 3 0 labeling, the former reecting the clear dominance of dT in the set of highly uorescent 5 0 -labeled uorescein DNA conjugates. These patterns follow the dG ( dA < dT z dC order expected from their redox potential mostly for dG and its associated quenching. While the identity of distal nucleotides is less conserved in the brightest and darkest range of uorescence intensity, they do affect the recorded uorescence signal. The 5 0 -TGGGG permutation is amongst the bottom 2% of the uorescence intensity range and the 5 0 -GTAAA is part of the rst octile of uorescence. However, a single T or G inserted ve nucleotides away from the terminal dye poorly inuences quenching, with 5 0 -GGGGT one of the darkest sequence variant and 5 0 -TTTTG in the rst octile of uorescence. These observations indicate that on top of the photoinduced electron transfer taking place at the uorophore-nucleotide level, the neighboring bases can affect uorescence intensity. 
Single-stranded DNA is a exible molecule with a persistence length on the order of a few nanometers 43 -longer than a 5mer-which therefore presupposes that on the length scale of the permuted sequences in our experiments, there exists a partial order and base stacking. Such base stacking in ssDNA has been observed experimentally 44 and could facilitate charge transport mechanisms between adjacent guanines and through adenine tracts. 45,46 The redox potential of cytosine and thymine is too large to allow participation in any charge transfer mechanism. Conversely, the exibility of ssDNA coupled with the very exible six-carbon linker to the uorescein should also permit direct contact between the uorescein and any of the nucleobases of the permuted 5mer. Since guanosine in each of the ve positions quenches uorescein uorescence, it is clear that some such charge transfer mechanism to distal guanosines is available. Since quenching by distal guanosines is almost entirely absent in the double-stranded DNA data (see below), we can hypothesize that molecular-exibility-enabled direct contact between uorescence and guanosine in any of the ve positions can result in quenching in ssDNA only.
Here too, the fluorescence intensity varies with sequence, with more than 50% fluctuation between the brightest and darkest sequence combinations (Fig. 3). The brightest sequence is 5′-CTACG and the least fluorescent is 5′-GGGCC. As for single-stranded systems, a dG nucleotide next to the dye almost always decreases the fluorescence of fluorescein and can be found in more than 80% of all sequences in the 8th octile of fluorescence. Unlike single-stranded oligonucleotides, however, bright sequence combinations frequently present dC at the 5′ end instead of dT (>2/3 of all sequences in the 1st octile of fluorescence). This observation is more in line with the fairly similar redox potentials of the pyrimidine bases, which, based on this metric alone, predict that neither dC nor dT should quench fluorescence. It is interesting to note, however, that in this context the nucleotides are base-paired, and a dG·dC base pair can drive the fluorescence of the labeled hairpin towards the bright or the dark region depending on which of the two bases is in direct proximity to the dye. The oxidation potential of a G·C base pair was calculated to be lower than that of dG alone, 47,48 suggesting that photoinduced electron transfer via oxidation of the neighboring G would be facilitated in base-paired systems, which might explain the slightly more dominant presence of G in the most quenching dsDNA combinations.

Fig. 3 Sequence-dependent variations in the fluorescence intensity of fluorescein-labeled double-stranded DNA. The relative fluorescence intensity of fluorescein end-labeled hairpins was ranked from most to least intense (highest recorded fluorescence and its corresponding 5mer = 1), falling by a maximum of 52% (A). The intensity range was divided into 8 equal parts from which consensus sequences were generated, and the sequence logos for each octile (B) were arranged in descending order of fluorescence intensity (left to right).
Similarly, the oxidation potential of an A·T base pair was also found to be lower than that of A or T alone, but an A·T base pair at the very end of a dsDNA molecule is more likely to exist as loose nucleobases ("frayed ends"). The fact that a 5′-C in a hairpin system can be assumed to be correctly base-paired, unlike a 5′-T, could be the reason why C appears at the bright end of the intensity range. Furthermore, the importance of the identity of the final 5′ nucleotide suggests that the fluorescein molecule mostly interacts with the closest covalently bound nucleotide and does not reach over to the complementary base, nor does it appear to intercalate between base pairs. The identity of the nucleotides further away from the dye does not substantially affect fluorescence intensity, but in the ranked list of sequences, the top 5% of fluorescence is very C/T-rich, while the bottom 5% is mostly G-rich. With pyrimidines consistently found in the top section of the intensity range, it appears that stacking energies, greatest for purines, do not contribute significantly to quenching, even in very rigid double-helical structures. Here, the sequence giving the brightest fluorescein-labeled ssDNA falls in the 2nd octile, 20% darker than the top sequence combination.
We also looked at how the sequence context affects the fluorescence intensity of fluorescein in the absence of guanine. The results are shown in Fig. 4. With or without G, strong fluorescence in single-stranded DNA remains largely dominated by T in proximity to the dye. Low-fluorescence G-free sequences are usually populated with A in 3′-labeled strands (Fig. 4B), in contrast to 5′-labeled strands, where low fluorescence is equally distributed between C- and A-rich DNA (Fig. 4A). The fluorescence intensity falls by ∼40% for 5′-fluorescein and by ∼30% for 3′-fluorescein, indicating that some sequences entirely devoid of guanines can still significantly quench fluorescence, with 5′-CACCA and 3′-AAATT producing the weakest fluorescence among all G-free combinations. Even in the absence of guanine, a clear T → C → A transition is difficult to identify when ranking fluorescence intensities from high to low, as the appearance of C in the low-fluorescence regime is concomitant with the appearance of A. A-rich sequences can therefore tune the fluorescence properties of fluorescein in single-stranded formats; indeed, A accounts for >50% of all nucleotides in the 5mers that are at least 20% less fluorescent than the brightest sequence combination. Since T is prominent in the most fluorescent ssDNA sequences, the T linker to the surface may contribute to higher fluorescence; nevertheless, this would not affect our measured sequence dependence, as all ssDNA permutations share the same linker.
In double-stranded DNA, excluding G from the 5mer immediately adjacent to the dye reveals a slightly different picture (Fig. 4C). As observed in Fig. 3, the final 5′ nucleotide leading to the strongest fluorescence response continues to be C, as opposed to the T seen in single-stranded DNA. Interestingly, low-fluorescence sequence combinations are populated with C as well, but only in the second and third nucleotide positions. Along with A, these sequences at the low end of fluorescein fluorescence produce A/C-rich motifs comparable to those seen with 5′-fluorescein. The fifth nucleotide position, furthest from the dye, here appears to prefer a T for a strong fluorescence response. Such a clear nucleotide preference at a 5 nt distance from the terminal label is striking, but has been observed before. 20 The importance of T at the 3′ end of the permuted region for high fluorescence intensity may be due to the fluorescein dye stacking not on the terminal 5′ base pair, but rather intercalating further along the double strand, an effect that could not take place in single-stranded oligonucleotides. Intercalation of the xanthene moiety 5 bp downstream of the 5′ end is conceivable given the flexibility of the C6 aliphatic chain linking the dye to the terminal nucleotide. As with ssDNA, the fluorescein in the dsDNA can also interact with the T linker, as illustrated in Fig. 1B, but this interaction is shared among all sequences and therefore does not affect the consensus sequence. Fig. 5 illustrates how, even in the absence of all guanines, a diminished but still large span of sequence-dependent fluorescein fluorescence is retained. The range of intensities comprises a 30% drop for 3′-FAM ssDNA (vs. ∼50% for such sequences including G), and almost 40% for both 5′-FAM ssDNA and 5′-FAM dsDNA (vs. ∼60% and ∼50%, respectively, for the equivalent sequences including G).
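The G-free analysis amounts to filtering the 1024 permutations down to the 3⁵ = 243 built only from A, C and T. A minimal sketch, assuming a dictionary of relative intensities per 5mer; renormalizing to the brightest G-free sequence is an assumption, since the text does not state whether the quoted drops are relative to the brightest G-free or the overall brightest sequence.

```python
from itertools import product


def g_free(rel: dict[str, float]) -> dict[str, float]:
    """Keep 5mers without G and renormalize to the brightest of them (= 1)."""
    subset = {seq: v for seq, v in rel.items() if "G" not in seq}
    top = max(subset.values())
    return {seq: v / top for seq, v in subset.items()}


def intensity_drop(rel: dict[str, float]) -> float:
    """Fractional drop from the brightest to the darkest sequence in rel."""
    return 1.0 - min(rel.values()) / max(rel.values())


# The 243 G-free pentanucleotides, enumerated directly.
G_FREE_5MERS = ["".join(p) for p in product("ACT", repeat=5)]
```

Applying `intensity_drop` to the full dictionary and to `g_free(...)` of it gives the with-G and without-G ranges side by side for one labeling format.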
These numbers, along with the discrepancy between the redox potentials of A, C and T and their relative prominence in all of the consensus sequences, suggest that one or more additional mechanisms, superimposed on photoinduced electron transfer, are needed to fully explain the sequence dependence of fluorescein end labeling in single- and double-stranded DNA. Alternatively, since experimental values for the oxidation potentials have only been determined for free nucleobases in acetonitrile, 13 significant shifts in more natural contexts cannot be excluded. Within DNA, protonation equilibria, 49 as well as nucleobase pairing and stacking interactions, 48 may significantly change these potentials, and these changes are themselves likely to be sequence dependent. Even in single-stranded DNA, which is far less structurally defined than double-stranded DNA, base stacking has been observed experimentally and shown to contribute to its electrostatics and elasticity, two factors that can also affect charge transfer efficiency. 44

Conclusions
We studied the sequence dependence of end-labeled fluorescein on single- and double-stranded DNA by preparing the complete sequence permutation library over the 5 nucleotides adjacent to the dye. We found that the identity of the nucleobase immediately next to fluorescein is the most likely to affect fluorescence intensity and, as expected, fluorescence quenching is largely seen with terminal Gs and G-rich combinations, leading to a fluorescence reduction of up to 60% relative to the brightest sequence composition. At the other end of the intensity range, we noticed that the terminal nucleotide responsible for high fluorescence signals differs between ss and dsDNA, with mostly T as the final nucleotide in ssDNA and mostly C in double-stranded DNA. Distal nucleotides have limited participation in the overall outcome. These results suggest that proximity to guanines, the most oxidizable nucleobase, is primarily responsible for the modulation of fluorescence intensity in the case of fluorescein, in stark contrast to how cyanine dyes are affected by nucleotide sequence. Beyond guanine, however, the observed sequence dependence of intensity quenching does not correspond to what would be expected from current experimental values for (isolated) nucleobase redox potentials, suggesting either significant shifts in these values in DNA contexts or an additional sequence-dependent mechanism that affects fluorescein intensity. The ranking of all sequence permutations by fluorescence can be used as a calibration curve to counter and correct for quenching/dequenching events in labeled probes, for instance in quantitative PCR and, more generally, in any approach involving FRET and fluorescein. Another type of correction that can easily be implemented is in probe design itself, where a terminal G can be accompanied by a T/A-rich segment to partially compensate for G-mediated fluorescence quenching, i.e. in the form of GTAAA in ssDNA and GTACT in dsDNA.

Fig. 5 … 3B and 4A, but including only the 243 (3⁵) sequences without G in each DNA context. The 5mers are ranked from most to least intense, with fluorescence falling by ∼30% for 3′-fluorescein-labeled ssDNA, and by ∼40% for 5′-fluorescein-labeled ssDNA and dsDNA.
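The calibration idea in the Conclusions can be sketched as dividing a measured signal by the relative brightness of the probe's dye-adjacent 5mer, taken from a ranked table such as the ESI data. This is a hypothetical helper resting on two assumptions not tested in this work: that signal scales linearly with abundance times relative brightness, and that brightness measured on the array transfers to the probe's actual (for example, solution) context.

```python
def quench_corrected(measured: float, end_5mer: str, rel_brightness: dict[str, float]) -> float:
    """Hypothetical correction: rescale a FAM signal to the brightest-sequence reference.

    rel_brightness maps the 5mer adjacent to the dye to its relative intensity
    (brightest = 1) for the matching format (5'/3' label, ss/ds).
    Assumes signal = abundance x relative brightness (a linear model).
    """
    factor = rel_brightness[end_5mer]
    if factor <= 0:
        raise ValueError("relative brightness must be positive")
    return measured / factor


# Illustrative table entries, not values from this study.
table = {"TTTTT": 1.0, "GGGGG": 0.4}
print(quench_corrected(400.0, "GGGGG", table), quench_corrected(1000.0, "TTTTT", table))
```

With these made-up entries, a G-terminal probe reading 400 and a T-terminal probe reading 1000 would be reported as equally abundant, which is the kind of bias the ranked data are meant to remove.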