Rodolphe Marie *a, Marie Pødenphant ‡a, Kamila Koprowska ‡b, Loic Bærlocher ‡c, Roland C. M. Vulders ‡d, Jennifer Wilding b, Neil Ashley b, Simon J. McGowan b, Dianne van Strijp d, Freek van Hemert d, Tom Olesen e, Niels Agersnap e, Brian Bilenberg f, Celine Sabatel g, Julien Schira c, Anders Kristensen a, Walter Bodmer b, Pieter J. van der Zaag d and Kalim U. Mir h
aDepartment for Micro and Nanotechnology, Technical University of Denmark, Ørsteds Plads Building 345C, 2800 Kgs. Lyngby, Denmark. E-mail: rodolphe.marie@nanotech.dtu.dk; Fax: +45 45 88 77 62; Tel: +45 45 25 57 00
bWeatherall Institute of Molecular Medicine, Department of Oncology, John Radcliffe Hospital, Headington, Oxford OX3 9DS, UK
cFasteris SA, Chemin du Pont-du-Centenaire 109, CH-1228 Plan-les-Ouates, Switzerland
dPhilips Research Laboratories, High Tech Campus, 11 5656 AE Eindhoven, The Netherlands
ePhilips Biocell, Gydevang 42, 3450 Lillerød, Denmark
fNIL Technology ApS, Diplomvej 381, 2800 Kgs. Lyngby, Denmark
gDiagenode SA, Liege Science Park, Rue Bois Saint-Jean, 3, 4102 Seraing, Belgium
hXGenomes, Pagliuca Harvard Life Lab, 127 Western Ave, Boston, MA 02134, USA
First published on 6th June 2018
Sequencing the genomes of individual cells enables the direct determination of genetic heterogeneity amongst cells within a population. We have developed an injection-moulded valveless microfluidic device in which single cells from colorectal cancer derived cell lines (LS174T, LS180 and RKO) and fresh colorectal tumors have been individually trapped, their genomes extracted and prepared for sequencing using multiple displacement amplification (MDA). Ninety-nine percent of the DNA sequences obtained mapped to a reference human genome, indicating that there was effectively no contamination of these samples from non-human sources. In addition, most of the reads are correctly paired, with a low percentage of singletons (0.17 ± 0.06%), and we obtain genome coverages approaching 90%. Our device design and process show that, to achieve this high quality, amplification can be conducted in microliter volumes as long as the lysis is performed in sub-nanoliter volumes. Our data thus demonstrate that high quality whole genome sequencing of single cells can be achieved using a relatively simple, inexpensive and scalable device. Detection of genetic heterogeneity at the single cell level, as we have demonstrated for freshly obtained single cancer cells, could soon become available as a clinical tool to precisely match treatment with the properties of a patient's own tumor.
Treatment can then be directed at the different clones that co-exist in the cancer and thus single cell DNA sequencing1 becomes an extremely important tool for matching the treatment of a cancer to its genetic make-up. This is the essence of precision medicine as applied to cancer treatment. There is also an interest in single cell mRNA analysis,2 which can help to identify gene expression differences due mainly to DNA methylation, as well as in examining methylation directly.3,4 There has therefore been increasing focus on the development of methods that obtain molecular information from single cells by isolating and sequencing their DNA and mRNA content.5–7 Similarly, in metagenomics, where bacteria, fungi and other microbes may not be culturable, a robust single cell analysis is important to evaluate the genomes of the distinct microbes present in a sample.8 In addition to untangling heterogeneity, single cell methods are relevant to cases where only a small number of cells is available, for example in the analysis of circulating tumour cells9,10 and circulating fetal cells in maternal blood.11 A single diploid human cell contains around 7 picograms of genomic DNA and some form of amplification is therefore needed to obtain the amounts of material necessary for current sequencing methods. Amplification by multiple displacement amplification (MDA)12 and PCR based methods such as DOP-PCR,13 Picoplex14 and multiple annealing and looping-based amplification cycles (MALBAC)15 have been used relatively effectively for single cell whole genome DNA sequencing. Genome coverage of >90% has been claimed to be routinely obtained from single cells using MDA and MALBAC, with an MDA kit adapted for single cells reportedly giving superior all round performance.13 However, issues around uniformity of coverage, allelic dropout, false positives (amplification and/or sequencing errors) and unmappable reads (e.g. from primer-dimers) remain.16
The MDA process results in uneven coverage across the genome. Some of the amplification biases are presumed to be a result of stochastic effects due to the sampling of a small number of molecules. In MALBAC, amplification bias can be corrected by normalizing for the GC content.17 Another approach to dealing with this problem is the use of barcoding or identification tags.18–20
Contamination, if not carefully controlled, can make results difficult to interpret and limits the sequencing capacity that is available for a single cell of interest. To combat this, single cell genomics is preferably conducted in a clean room,16 in microwells,21 or in a microfluidic device.8,12,17 Contamination can also be assessed by the use of appropriate known genetic markers. A microfluidic device contains the cells and their immediate lysis products, and the controlled amplification within the device limits the loss of material when handling small volumes.
Existing microfluidic devices8,17 requiring multiple PDMS layers,22 one layer for the passage of fluids and another for valves to control the fluids through the device, are difficult and expensive to manufacture because PDMS casting is not a scalable industrial process. We introduce a novel valve-less microfluidic device for single cell genomics that is manufactured in a thermoplastic material by injection moulding, a process that is scalable at low cost.23 Our chip design is based on a hydrodynamic cell trap24 derived from a previously described device for cell culture.25 Our device design enables the process to be carried out on an optical microscope or in the “Cell-O-Matic”, a specially built single cell processing instrument (Philips BioCell), in either case allowing us to monitor both cell trapping and genome extraction.
Our single cell sequencing data obtained using the Cell-O-Matic instrument show that we can achieve reasonable levels of whole genome coverage in a significant proportion of cells. Our results compare well with other reported whole genome sequencing from single cells using instrumentation in which the amplification is performed in nanoliter-reaction chambers. In our device only the DNA extraction occurs in a sub-nanoliter volume of solution, while the amplification is performed by adding microliter volumes of reagents in the device outlet. We conclude that the critical step in single cell whole genome amplification with regard to sequence allelic dropout, contamination and genome coverage is to extract DNA in sub-nanoliter volumes in the confinement of the microfluidic device, while performing the amplification in such small volumes may only be required for reduction of reagent consumption, but at the cost of higher device complexity and cost.
Cells from colorectal cancer cell lines LS174T, LS180 or RKO (concentration: 6 × 10⁵ cells per mL) were stained with 1 mM calcein AM and suspended in BD FACSFlow buffer (Becton Dickinson). After cell capture, the trap occupancy was checked by bright field and fluorescence imaging of the calcein signal. After trapping cells, the B1 and B2 inlets were emptied leaving negligible volumes in the outlets.
Fig. 2 Read metrics. a) Percentage of mapped reads and b) total number of reads. Legend displays the mean, the standard deviation and the standard error of the mean (s.e.m.).
Fig. 4 Coverage plots corresponding to the (a) bottom, (b) middle and (c) top tercile and (d) the single cells processed by proteolysis in Eindhoven. Cells 124 to 236 are LS174T cells. Cells 335 to 343 are RKO cells. The bulk of LS174T is also shown (sample ID 46). We display the E-score as mean value and the standard deviation for each group. The E-score is calculated from a normalized coverage curve as described in ref. 36.
For each experiment a single-use microfluidic device (Fig. 1a and S1†) is placed in the instrument, which allows bright field and fluorescence imaging, control of the device temperature and connection of the device inlets (cell, B1 and B2 inlet, Fig. 1b) to a multi-channel air pressure controller7 (Fig. S1g†). Fluorescence imaging and the use of YOYO-1 intercalating DNA dye enabled monitoring of cell lysis. However, the dye may be omitted to avoid interference with the subsequent quality of the preparations with respect to their use for DNA or RNA sequencing.7
Our design is the result of iterative optimization where we identified and improved three critical aspects of the device design and fabrication: i) the flow through the trap, which depends on its cross section and the flow resistance of the outlet channels, ii) the shape of the cell pocket and iii) the moulding quality of the cell inlet.
The trap cross section has to be smaller than the size of the LS174T cells to retain the cells, but also sufficiently large that it collects a significant fraction of the main flow in the feeding channel for cells to be directed through the trap. As a boundary condition, our choice of a single depth design means that the trap depth remains the same throughout the chip, namely 30 μm. As a result, the trap has a high aspect ratio, within the limit achievable during the fabrication of the master in silicon by micromachining. Finally, the fabrication by polymer replication results in the channels, and in particular the trap, having tilted sidewalls (up to 3 degrees) to allow the separation of the polymer part from the mould during injection moulding. As a result, the cell traps have a cross section that is 30 μm deep, 4.5 μm wide at the bottom and 7.5 μm wide at the top. The pocket receiving the cell has an asymmetric design (Fig. 1c and d). This is in contrast to previously reported devices based on hydrodynamic trapping, where flow focusing is used to direct the cell to a microfluidic constriction that is a bypass in an otherwise symmetric flow profile. In our device, the flow focusing is asymmetric since cells are aligned against the wall of the feeding channel. A symmetric pocket creates a dead volume after the constriction (Fig. S4†), a spot where a cell decelerates and can settle just outside the cell trap. By making the pocket asymmetrical, we improve the flow profile such that cell trapping is more efficient. The optimized design gave the best results in terms of the number of traps per chip holding single cells.
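The quoted trap dimensions can be cross-checked with a short calculation. The following is a minimal sketch, assuming the cross section is a symmetric trapezoid with both sidewalls tilted equally; the function name `trap_cross_section` is hypothetical and only illustrates the geometry.

```python
import math


def trap_cross_section(depth_um, bottom_width_um, top_width_um):
    """Return (area in um^2, sidewall tilt in degrees) of a trapezoidal channel.

    Assumption: the trapezoid is symmetric, so each sidewall accounts for
    half of the widening between the bottom and the top of the channel.
    """
    area = 0.5 * (bottom_width_um + top_width_um) * depth_um
    half_widening = 0.5 * (top_width_um - bottom_width_um)
    tilt_deg = math.degrees(math.atan(half_widening / depth_um))
    return area, tilt_deg


# Dimensions stated in the text: 30 um deep, 4.5 um bottom, 7.5 um top.
area, tilt = trap_cross_section(30.0, 4.5, 7.5)
print(f"area = {area:.0f} um^2, sidewall tilt = {tilt:.2f} degrees")
```

Under this assumption the sidewall tilt comes out just under 3 degrees, in line with the stated draft angle, and the open area of the trap is 180 μm².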
Finally, the connection of the feeding channel and the well receiving the cells is a critical aspect of the design. The surface roughness at the inlet is of paramount importance since a sharp edge tends to stop cells entering the channel. The injection moulding parameters are therefore adjusted to produce a round edge. In addition, we ensured that the shim is mounted into the injection moulder only once. This greatly improved the quality of the final device since successive mounting of the shim increases the roughness at the connections with the inlets due to the alignment tolerance of the shim in the mould. On the optimized device (Fig. 1), single cells were trapped routinely, on average, in 3 to 6 out of 8 possible traps. On rare occasions, cell doublets are trapped and this may be because cell doublets enter the device in the first place. For this reason, cell traps are imaged in bright field and fluorescence after trapping to confirm the presence of, and then exclude, such cell doublets from further analysis.
Cellular DNA is eluted from the cell trap by introducing a lysis solution from the inlet B1 (Fig. 1b). In our study, we compared two lysis solutions. For one, a solution for proteolysis including proteinase K and Triton-X100 was used for the 13 cells (LS174T and RKO) whose results were obtained in Eindhoven. This lysis solution enables collecting the RNA prior to collecting the DNA of the trapped cell.7 Alternatively, an alkaline lysis buffer (D2, pH above 12) provided with the Repli-g UltraFast kit was used in Oxford for the analysis of the 39 single cells from the LS174T and LS180 cell lines, and from two fresh tumour samples (Fig. S4†). The alkaline lysis is the one adopted in commercially available kits for eluting DNA for sequencing. Both solutions successfully lyse the cells trapped and elute the DNA from the trap as observed in experiments where the DNA is labelled with an intercalating dye so it can be visualized by fluorescence microscopy. From the results of the single cell sequencing using the two different approaches, as discussed below, we conclude that both approaches to lysis were appropriate for MDA. This is, perhaps, surprising in the case of DNA extraction by proteolysis since proteinase K might be expected to digest the polymerase. However, there are six orders of magnitude difference between the volumes of lysate (pL) and the volumes of the reagents added to the well (μL), which thus makes the protease content in the MDA mix insignificant.7 The success of the amplification and sequencing is the best indication that the lysis is successful.
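As an order-of-magnitude illustration of this dilution argument, assuming representative volumes of 1 pL of lysate and 1 μL of added reagents (the exact volumes are not stated here), the proteinase K carried over from the trap is diluted by

$$\frac{V_\mathrm{lysate}}{V_\mathrm{reagents}} \sim \frac{10^{-12}\ \mathrm{L}}{10^{-6}\ \mathrm{L}} = 10^{-6}.$$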
DNA samples that successfully amplified were passed through a quality control. For the samples processed in Oxford, we PCR amplified five genes from five different chromosomes to give five different sized fragments, and visualized them on an agarose gel (see Protocol S3† for details). Only samples which successfully displayed at least 4 of the 5 PCR products were used for library preparation and sequencing. Essentially all of the single cell lysates were successfully amplified for DNA and more than 90% of the Oxford samples passed the subsequent quality filter (i.e. quantification by a pico-green assay (Qubit)) before being passed on for DNA sequencing. For samples processed in Eindhoven, a quality check comprising PCR of RNase P was performed on some samples. Next, some of the samples were then checked by 1) quantification by Qubit and 2) a test run of sequencing performed at a low number of reads in order to assess the quality of the library before the actual sequencing presented in this paper. Sequencing libraries were successfully prepared from 97% of the samples that passed the initial quality control.
$$G = 1 - 2A \tag{1}$$
Here A is the area under the Lorenz curve. For an ideally uniform coverage of the genome, the Lorenz plot displays a diagonal and the area under the curve is 0.5, so that G = 0. In our study, G = 0.3 for the sequencing of the bulk of LS174T and many cells have G = 0.5 (see Fig. 3d and S5†). In the top tercile of the cells processed in Oxford, corresponding to the highest coverage, G = 0.6 ± 0.1 (n = 13 cells). For comparison, using commercial instrumentation, Szulwach et al. report G = 0.36 ± 0.04 (n = 5) for GM12752 cells where the bulk sequencing gives a G just below 0.2, but also G = 0.6 for another cell type.35 Thus far most of the single cell sequencing studies only report the coverage results using the Lorenz plot.5,6,15,35 Although the Lorenz graph is effective in reporting which fraction of the genome is not covered, the so-called coverage graph is better suited to reporting the distribution of the coverage and is used in (bulk) sequencing experiments. Previously we have reported a coverage graph of single cell sequencing experiments.7 Here, we report a more complete overview of the coverage of our results in Fig. 4. The evenness score E:
(2)
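The Gini coefficient of eqn (1) can be computed directly from per-bin read depths. The following is a minimal sketch, assuming the genome has been divided into equal-sized bins and that the Lorenz curve is integrated with the trapezoidal rule; the function name `gini_from_depths` and the binning are illustrative assumptions, not the pipeline used in this work.

```python
def gini_from_depths(depths):
    """Gini coefficient G = 1 - 2A, with A the area under the Lorenz curve.

    `depths` holds the read depth of each equal-sized genome bin
    (an assumed input format). Bins are sorted by depth, the cumulative
    fraction of reads is accumulated, and the area under the resulting
    Lorenz curve is integrated with the trapezoidal rule.
    """
    d = sorted(float(v) for v in depths)
    total = sum(d)
    if total <= 0:
        raise ValueError("no reads: the Lorenz curve is undefined")
    n = len(d)
    area = 0.0
    cum_prev = 0.0
    for v in d:
        cum = cum_prev + v / total
        # Each bin spans 1/n of the x-axis (fraction of the genome).
        area += (cum_prev + cum) / (2.0 * n)
        cum_prev = cum
    return 1.0 - 2.0 * area


# Perfectly uniform coverage lies on the diagonal (A = 0.5), so G = 0.
print(gini_from_depths([5, 5, 5, 5]))
# All reads in one of four bins gives a strongly skewed Lorenz curve.
print(gini_from_depths([0, 0, 0, 10]))
```

With a finite number of bins the largest value this estimate can reach is 1 − 1/n, so a fine binning is needed when comparing values close to 1.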
$$p = \frac{2a + b}{2n} \tag{3}$$
Fig. 5 shows the results of such an analysis for DNA prepared from the single cells in Oxford using a panel of 12 SNPs known to be heterozygous in the LS174T and LS180 cell lines. The different colours of the vertical bars for each single cell show the proportions of times 2, 1 or no alleles are found, and the cells are ordered from highest to lowest estimate of p. The corresponding Lorenz plots for these DNA sequences (Fig. 3a–c) show that there is a reasonable relationship between the coverage estimates from the Lorenz plots and the p value estimates. About 44% (17/39) of these single cell DNA sequences give p value estimates of around 0.7 or more, indicating total genomic coverage per cell of around 50%, while about 25% give total coverage of greater than 70%. The overall average p value using the data on all 39 single cells is 0.60 ± 0.25, corresponding to complete coverage of just under 40% (the average p value for the Eindhoven data in Fig. 6 is 0.63 ± 0.22). Out of more than 10 000 reads covering the 13 pairs of alleles for the SNPs, only 63 were ‘incorrect’ in the sense that they were not expected for either allele pair of a given SNP. This indicates a sequencing error rate of less than 1% and also the absence of any contaminating human DNA from external sources, namely other than the cells being analysed.
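The relationship between p and the quoted coverage figures can be illustrated numerically. The sketch below rests on assumptions not spelled out in this section: eqn (3) is read as p = (2a + b)/(2n), with n the number of heterozygous SNPs in the panel, a the number of SNPs at which both alleles are detected and b the number at which exactly one allele is detected; and the "complete coverage" is taken to be p², the probability of detecting both alleles if they drop out independently. The function names and the example counts are hypothetical.

```python
def estimate_p(both, one, n_snps):
    """Estimate the per-allele detection probability p = (2a + b) / (2n).

    both   (a): SNPs at which both alleles were detected (assumed meaning)
    one    (b): SNPs at which exactly one allele was detected (assumed meaning)
    n_snps (n): heterozygous SNPs in the panel
    """
    if n_snps <= 0 or both < 0 or one < 0 or both + one > n_snps:
        raise ValueError("inconsistent SNP counts")
    return (2 * both + one) / (2 * n_snps)


def both_allele_coverage(p):
    """Probability of detecting both alleles, assuming independent drop-out."""
    return p * p


# Hypothetical cell: 12-SNP panel, 6 SNPs with both alleles, 4 with one allele.
p_hat = estimate_p(both=6, one=4, n_snps=12)
print(f"p = {p_hat:.2f}, both-allele coverage = {both_allele_coverage(p_hat):.2f}")

# The values quoted in the text follow the same relation.
print(f"p = 0.70 -> {both_allele_coverage(0.70):.2f}")
print(f"p = 0.60 -> {both_allele_coverage(0.60):.2f}")
```

Under these assumptions p = 0.7 corresponds to a coverage of about 0.49 and p = 0.6 to about 0.36, consistent with the "around 50%" and "just under 40%" stated above.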
Additional evidence for the absence of contamination with exogenous DNA was obtained from the density of reads that mapped to male-specific genes on the Y-chromosome (see Table S3†). Since both the LS174T and the RKO cell lines are derived from female patients and the operators in the Eindhoven laboratory were male, the lack of Y-chromosome reads provides evidence that there was no contamination from the operators. The male-specific genes used in this analysis are those for which there are no homologies on the X-chromosome, as taken from the work of Page and co-workers.38 For almost all male-specific genes we found zero reads mapping to them, whereas the mean number of reads per gene on the X-chromosome (taken over all genes listed in the Ensembl human genome annotation GTF file) is over 900 for all these samples (Table S3†). This value is to be compared to the average number of reads found for the male-specific genes, which is 0.4 reads per gene (Table S3†). This effectively rules out exogenous DNA contamination from the male operators in the Eindhoven laboratory and suggests that the reads found mapping to male-specific genes correspond to amplification, sequencing and mapping errors. The error rate for sequencing on an Illumina HiSeq system is of the order of 1%. One usually refers to a minimum number of Q30 bases (the number of bases for which the error rate is below 1/1000). For 2 × 125 bp HiSeq reads, the error rate should be below 1.5% (estimated using an indexed PhiX). The exact value depends on each run. Our data suggest a mapping error of 0.4/900 = 0.04%.
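Written out, the mapping error estimate is simply the ratio of the two per-gene read densities from Table S3†:

$$\frac{0.4\ \text{reads per male-specific gene}}{900\ \text{reads per X-chromosome gene}} \approx 4.4 \times 10^{-4} \approx 0.04\%.$$

Since the X-chromosome value is stated as "over 900", this ratio is best read as an upper estimate.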
The heterogeneity of the frequencies of reads (data not shown) between SNPs within single cells suggests dropping out, namely absence of DNA in the initial single cell preparation, as the main reason for lack of complete coverage. Similarly, Fig. 6 gives an estimate of the allelic drop-out for the single cells processed in Eindhoven for which the Lorenz graph is shown in Fig. 3d. Note that for the RKO cells, 13 heterozygous SNPs across the genome were used. Again, the results are concordant with the measures of coverage and the Lorenz curves. The poorest cells, with the highest allelic drop-out, are in the far lower corner of the Lorenz plot in Fig. 3d, while the best cells, with 60–70% sharing, correspond to the curves nearest to the diagonal. In addition to the analysis of single cells from the colorectal cancer-derived cell lines, some single cell whole genome sequences were obtained directly from two fresh colorectal cancers. These were analysed following the same procedure described above using different appropriately chosen sets of SNP markers for each cancer. The results shown in Fig. S8† for a further total of 15 single cells demonstrate that at least comparable quality single cell whole genome DNA sequences can be obtained from fresh tumours as were obtained from the cell line cultures. Our overall results indicate that the independent analyses of single cell DNA sequences using two different protocols in different laboratories, but using the same device and instrument, gave comparable results, with perhaps somewhat better coverage using the protocol with alkaline lysis compared to the protocol using proteolytic lysis. Moreover, the results obtained using our valve-free devices, which are simpler in design and manufacture, are comparable with the best published results. For details of the experimental protocols and the use of the instrument see the methods section and the Protocol S3.†
We found that sequencing genomic DNA extracted from single cells inside our low-cost microfluidic device gave single cell DNA sequencing results of comparable quality to those reported using more complex and expensive instruments. Our device is valve-free and can thus be fabricated by injection moulding a polymer. It is also straightforward to use and can be operated on a commercial optical microscope or using a custom-built instrument, Cell-O-Matic.
Moreover, in this study the cell lysis is performed in the sub-nanoliter cell trap of the device while the amplification step is performed in the outlet wells in μL volumes. The representation of the genome in the sequencing data is similar to single cell sequencing obtained in devices where both DNA extraction and amplification take place in nL volumes.35 In addition, we also show that the genome representation of single cells processed in the microfluidic device is on average better than when both cell lysis and amplification are performed in μL volumes. Here we compare to sequencing of single cells sorted by FACS in individual PCR tubes and amplified using the unmodified MDA protocol, i.e. with an alkaline lysis (see Fig. S9†). In this case the Gini coefficient is higher (G = 0.9 ± 0.1, n = 7) and the evenness (E = 24.1 ± 16.9, n = 7) lower than for the lower tercile of the alkaline lysis data set (Fig. 3c and 4c). This suggests that, of the two steps, only the DNA extraction needs to be confined to a small volume for good single cell sequencing. The main reason for a poor representation of the genome may be the loss of DNA and/or equally the loss of enzymatic activity. When amplification takes place in the confinement of a nanoliter volume, enzyme may be lost on surfaces due to the high area-to-volume ratio and this may outweigh the benefit of maintaining a high template concentration.
Our device design focuses only on DNA extraction; it is therefore not specific to any amplification protocol, and the end-user is free to implement any other amplification protocol. Moreover, the device design includes inlet wells that are placed in a grid matching that of a 96-well plate, so that standard lab robotics could be used to perform the amplification step.
Previously described microfluidic devices isolate single cells using either a physical valve8 or an oil phase21 at the time of the lysis and subsequently for the amplification. The use of valves to trap the cells necessitates two-layer devices which are complex and hard to manufacture. By contrast, our devices have no valves and are thus easier to design, manufacture and use. We are able to operate without valves because the liquid flow from different inlets is strictly controlled by air pressure with high accuracy. This, in particular, allows us to exchange reagents in the feeding channel while maintaining the cells trapped until they need to be lysed. The flow rate is minimal as too high flows would dislodge the cells from the traps. The laminar flow conditions in the feeding channel and through the traps ensure that the lysate is pushed through the trap. At a later stage, during the amplification, the solution is confined to the outlet since the loss of material from the outlet well through diffusion into the microfluidic channel is negligible (see ESI† and Fig. S10). The design of the traps, the mode of lysis and collection of the resulting DNA makes it unlikely that there is significant contamination between the cells trapped on the same chip. Preliminary data obtained by analysis of the LS174T cell line, which is a known mixture of two cell populations (unpublished observations), suggests that there is no major contamination between neighbouring traps on the same chip.
A further proof of the absence of contamination between neighbouring traps can be derived from a subset of data where mRNA was extracted from the captured cells.7 There, PCR of the AXIN2 and beta-actin genes was used to assess the presence of mRNA in the outlet wells. In those experiments, no mRNA was detected from empty traps adjacent to those where cells were successfully captured and lysed.
Finally, we also consider contamination by exogenous human DNA. The allele analysis of heterozygous SNPs from throughout the entire genome is shown in Fig. 5 and 6. In this analysis, we detect only the alleles that we expect for the LS174T cell line, which gives us a good indication that there is no contamination from extraneous human DNA. In addition, we also look at the presence of reads mapped to Y-chromosome genes, knowing that the cell lines used in this study are female cell lines. Here, we find generally no reads mapping to those genes (see Table S2†). Some reads do map to a few Y-chromosome genes such as PCDH11Y for all samples, but we show that this is due to homologies with genes on the X-chromosome. The density of reads that map to chromosome Y is below 3% and typically 0.05% of the input of a single cell, so should be attributed to amplification and sequencing errors. The absence of exogenous human contamination may be surprising at first since the devices are fabricated in a standard laboratory environment (i.e. not a clean room environment). The microfluidic chip is injection moulded on industrial equipment and assembled to a polymer foil. However, the assembly is realised by UV-assisted thermal bonding. The strong UV illumination during the bonding of the lid would destroy any foreign DNA before the microfluidic channels are sealed. Immediately after bonding the lid, the device connectors are covered by PCR tape.
When comparing our data to previously published DNA sequencing from single cells we see that the Gini coefficients (see Fig. 3) are similar to results obtained on a commercial system.35 Previously7 and here we have shown coverage graphs of our single-cell sequencing data. To our knowledge, this has not been done before for single cell sequencing data and we suggest that this should be incorporated in future single cell sequencing experiments, as it gives better insight into the read distribution in these experiments and the extent to which reliable SNP calling can be performed. Finally, we have presented our results in terms of the maximum likelihood estimate of allelic dropout p and find this value to be 0.60 ± 0.25.
The first commercial microfluidic device method for processing single cells for sequencing18 was known to suffer significantly from the capture of doublets rather than single cells. Our approach has the advantage that we can take an image of trapped cells to confirm the single cell occupancy of each trap before proceeding to sequencing.
The more recently emerging droplet-based single cell fluidics and dilution tagging and pooling approaches offer the highest throughput (up to 10 000s of cells) compared to the 10–100s of cells of the microfluidic trapping approaches. However, an advantage of our approach is that it can be used to extract and process RNA from the same cell as the DNA;7 such multi-omic characterization will be important for making the connection between genotype and molecular phenotype to gain a better understanding of cellular mechanisms and to better select the mutations that may be driving a cancer phenotype and which might be candidates for targeted therapy.
For such integrative omics applications it is important to know that the comparative performance metrics of our single cell processing devices are equivalent to other types of devices and approaches. We can conclude that our DNA sequencing results show that the output of our device is at least comparable to, if not better than, that of the valve-based commercial devices and offers advantages over non-microfluidic approaches such as a very low contamination level.
Footnotes
† Electronic supplementary information (ESI) available: Fig. S1–S9 and Tables S1–S3. See DOI: 10.1039/c8lc00169c
‡ These authors contributed equally to this work.
§ See http://www.sigmaaldrich.com/technical-documents/articles/life-science-innovations/qualitative-multiplex.html.
¶ These 5 positions were chosen based on https://www.sigmaaldrich.com/technical-documents/articles/biology/ffpe-wga-poster.html.
This journal is © The Royal Society of Chemistry 2018