February 13th, 2019
By: Noah Fierer
DNA sequencing is clearly a powerful tool for analyzing microbial communities. It is now somewhat routine to take a sample of feces, tap water, or bellybutton lint and use DNA sequencing-based approaches to analyze the microbes living therein. We also know that the cost of generating DNA sequence data keeps falling every year. The widely disseminated plot illustrating the declining cost of sequencing has to be updated annually as sequencing costs continue to drop. Large sequencing projects that would have once cost as much as an apartment in San Francisco can now be done on a relatively small budget (‘relatively’ is the key word here).
One question I am often asked when working with collaborators, whether preparing proposals or budgeting for projects, is: “How much will it cost to analyze these samples via DNA sequencing?” The cost estimates are often a bit surprising to those less familiar with these types of microbial analyses, and this confusion is completely understandable. After all, an Illumina HiSeq 4000 instrument can generate 300 million reads per lane at a cost of ~$2000 per lane. So, if we want ~1 million reads from each of 300 samples, it should cost roughly $6.67 per sample. Right? No – not even close.
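The tempting back-of-the-envelope arithmetic can be sketched as below. This is only an illustration built from the figures quoted above; the function and variable names are hypothetical, and the calculation deliberately ignores everything except the sequencing lane itself.

```python
def naive_cost_per_sample(lane_cost_usd, reads_per_lane, reads_per_sample):
    """Sequencing-only cost per sample, ignoring all wet lab work."""
    # How many samples fit on one lane at the requested depth.
    samples_per_lane = reads_per_lane // reads_per_sample
    return lane_cost_usd / samples_per_lane


# Figures quoted in the text: ~$2000 per lane, ~300 million reads per lane,
# and ~1 million reads wanted per sample (so 300 samples per lane).
estimate = naive_cost_per_sample(2000, 300_000_000, 1_000_000)
print(f"Naive sequencing-only estimate: ${estimate:.2f} per sample")
```

The number this prints is the one that tends to end up in draft budgets, which is exactly why it is misleading: it covers the lane and nothing else.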
The cost of the sequencing itself is often a small fraction of the total cost of the microbial analyses. To use an example, let’s walk through a hypothetical budget required for 16S rRNA gene amplicon sequencing (one of the more commonly used approaches for characterizing the microbial communities found in environmental or host-associated samples). I’m using amplicon sequencing here because it is a bit easier to estimate the costs and because such methods are used by many labs across the globe to get some initial insight into what microbes are found in a given sample.
For this hypothetical example, I’ll assume that the samples (whether they are water filters, soil, or a swab from a cellphone) have already been collected and are sitting in your lab freezer organized and ready to go. These cost estimates are for 90 samples. Why 90 samples? Because we often do our work in 96-well plates (I don’t trust 384-well pipelines for reasons that are too complicated to go into here) and we typically reserve at least 6 wells of each 96-well plate for running positive/negative controls. So, let’s break down the steps in the process of 16S rRNA sequencing and the costs associated with each step. I’m assuming here that everything works well the first time and none of the steps have to be repeated. I’m also assuming that the lab does not have to purchase any of the equipment required (e.g. centrifuges, pipettors, gel boxes, etc.), there are no institutional overhead costs, and the people doing the lab work already know exactly what they are doing[1]:
DNA extraction:
- $550 in reagents/consumables (mostly the cost of the extraction kits)
- $150 in labor (5 h – includes plate loading)
PCR amplification (duplicate 25 µL reactions):
- $160 in reagents/consumables (barcoded primers and good Taq are not cheap)
- $30 in labor (1 h)
Gel electrophoresis (you want to make sure the PCRs worked, right?):
- $20 in reagents/consumables (mostly SYBRSafe costs)
- $30 in labor (1 h)
Amplicon normalization and pooling:
- $115 in reagents/consumables (SequalPrep plates and lots of pipette tips)
- $30 in labor (1 h)
Illumina MiSeq run[2]:
- $500 in reagents/consumables (assuming 1/3rd of a 2x250bp run)
- $30 in labor (1 h)
So – we haven’t even gotten to the bioinformatics and other downstream analyses (which can consume a huge amount of personnel time depending on what is required) and we are already at >$1600 for the 90 samples. The sequencing itself is <1/3rd of the total cost. Thus, although the declining cost of sequencing is clearly a boon to researchers, the other steps still cost money and these costs are unlikely to decline appreciably in coming years.
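For anyone who wants to plug in their own numbers, the tally above can be sketched as a short script. The dollar figures are the ones listed above for 90 samples. The step labels are illustrative, and "DNA extraction" and "Normalization and pooling" in particular are labels assumed from the line-item descriptions. The sequencing share here counts both the reagents and the labor for the MiSeq run.

```python
# Line items for 90 samples: (step label, reagents/consumables USD, labor USD).
# Dollar figures are from the text; step labels are illustrative assumptions.
BUDGET = [
    ("DNA extraction", 550, 150),
    ("PCR amplification", 160, 30),
    ("Gel electrophoresis", 20, 30),
    ("Normalization and pooling", 115, 30),
    ("Illumina MiSeq run", 500, 30),
]
N_SAMPLES = 90


def summarize(budget, n_samples, sequencing_step="Illumina MiSeq run"):
    """Return total cost, per-sample cost, and the sequencing step's share."""
    total = sum(reagents + labor for _, reagents, labor in budget)
    sequencing = sum(
        reagents + labor
        for step, reagents, labor in budget
        if step == sequencing_step
    )
    return {
        "total": total,
        "per_sample": total / n_samples,
        "sequencing_share": sequencing / total,
    }


summary = summarize(BUDGET, N_SAMPLES)
print(f"Total: ${summary['total']}")
print(f"Per sample: ${summary['per_sample']:.2f}")
print(f"Sequencing share: {summary['sequencing_share']:.1%}")
```

With the figures above this comes to $1,615 in total, or a little under $18 per sample, with the MiSeq run accounting for just under a third of the total. Compare that with the few dollars per sample suggested by the sequencing-only estimate.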
I know that these budget numbers can clearly vary and I look forward to hearing why I have grossly over- or under-estimated specific costs. Your cost estimates could end up being appreciably lower or higher depending on the depth of sequencing required, the specific sequencer/sequencing chemistry used, labor costs, sample throughput, and the specific nature of the samples to be analyzed[3]. There are ways to reduce the costs of library prep by using cheaper reagents, making your own Taq, using smaller PCR reaction volumes, running fewer replicate PCR reactions, using cheaper DNA extraction kits, or even forgoing DNA extraction altogether. The techno-futurists would even argue that sample processing costs would be far lower if we just replaced people with robots. This is likely naïve[4].
While there are clearly ways to reduce costs and/or the personnel time required for sample processing, cutting costs often ends up reducing the quality of the data – increasing the likelihood of introducing contamination or increasing the sample drop-out rate. In short – you can quibble with the numbers and I know there are ways to reduce costs – but the per-sample costs will not be substantially lower for most sample types unless you are willing to sacrifice data quality.
What’s the take-home message here? Don’t ignore library prep costs when estimating per-sample costs of the wet lab work. Unlike the fictional lab featured on CSI: Miami, we do not just put a raw sample in a sequencer and get results without a lot of hard work conducted by highly trained people. Wet lab work costs time and money – even the ‘latest and greatest’ DNA sequencer won’t solve that problem. Last but not least, it is worth reiterating that I haven’t even included the costs associated with downstream data analyses. Once you have the sequence data in hand, the party is only beginning and it can take months of effort to fully analyze a dataset that was generated over a few days.
1: I’m using $30 per hour as a low estimate here. At my university, this is equivalent to a ‘take home’ annual salary of $44K once you take into account fringe benefits (~1/3rd of salary).
2: These cost estimates are for a MiSeq run. Other sequencing platforms (e.g. the Illumina HiSeq 4000) could have lower sequencing costs per sample, but MiSeqs (or equivalent) are widely used for these sorts of sequencing efforts given their availability, longer read lengths, shorter run times, and suitability for low-complexity amplicon sequencing. Plus, it is rare that we have enough samples or primer combinations to justify far higher data output per run.
3: Per-sample costs can go up appreciably if one is running just a few samples at a time or if the samples need to be processed prior to DNA extraction. For example, aseptically cutting air or water filters can take a lot of time, as does sub-sampling a large pile of frozen feces (or so I’ve heard). Plus, if the samples are not well-organized, it can often take a substantial amount of time (and mental anguish) to read poor handwriting on small tubes and keep track of which samples are going into which wells. These sample organization steps are clearly important and cannot be rushed.
4: Liquid-handling robots are expensive, they often don’t work as well as we would like for many sample types, and they need to be monitored closely to avoid catastrophic failures. In our experience, when all is said and done, a well-trained researcher armed with a multi-channel pipettor can be just as fast as a liquid-handling robot and arguably generate higher quality data. I, for one, do not welcome our robot overlords.