Fierer Lab

Exploring the structure and function of microbial communities

Contamination in ‘aerobiome’ studies – a cautionary tale

Noah Fierer


Studying the microbes found in outdoor air is inherently difficult. Not only is the atmosphere a highly dynamic system, but cell concentrations are typically very low, making sampling and downstream analyses challenging. Contamination is a persistent problem and concern when working with these types of samples, regardless of the method employed for microbial analyses. Contaminants can be introduced during the sampling itself, from reagents/consumables used during sample processing, or from cross-contamination with other samples (see old post here). Contamination not only affects data quality – it can fundamentally alter study conclusions.

As a case in point, I want to focus on a recent study published by Kobziar et al., “Wildland fire smoke alters the composition, diversity, and potential atmospheric function of microbial life in the aerobiome”. I am singling out this paper as an example of the potential contamination issues that appear to be surprisingly common in the published ‘aerobiome’ literature. This paper is definitely not the only ‘aerobiome’ paper that appears to suffer from important contamination issues – it is just the most recent one I have come across.

I was excited to read this paper. The authors deployed drones (drones!) to collect air samples onto filters and then used a range of methods to investigate the aerosolization of bacteria and fungi during wildfire events. However, as soon as I started delving into the paper, it raised a number of ‘red flags’. There are multiple lines of evidence suggesting that an appreciable number of their samples are likely contaminated and that the level of contamination is sufficient to affect their conclusions:

  • The authors report that their ambient samples had, on average, 520 cells per filter (determined via microscopy). Such low cell numbers are to be expected given that they were only able to collect 20 L of air onto individual filters. This is not problematic per se. What is problematic is that their blanks had 360 cells per filter and the estimate of 520 cells per filter was obtained by subtracting the ‘blank’ cell numbers from the sample numbers. This means that out of the 880 cells per filter (520 + 360 cells, on average), ~40% of these cells were contaminants. This seems to be a high number and suggests that DNA-based analyses would suffer from similar problems with a large fraction of the DNA (on average) likely derived from contaminants. In fact, this seems to be the case.
  • The supplement of the paper includes a list of taxa that were removed from the sequence data as they also appeared to show up in the ‘blanks’. This includes some taxa that are strongly indicative of inadvertent contamination (including Propionibacterium, Staphylococcus, Malassezia, and mammals). After exchanging a series of emails with the authors, it turns out that those sequences which were removed represented roughly a quarter or more of all the sequences they obtained from their samples. The percentage of ‘contaminant’ sequences ranged from 5% to 48% of total sequence reads, with the extent of contamination varying across their sample categories and across their targeted taxa. In my opinion, this level of contamination is high and the samples (or at least some of the samples) should not have been included. The authors clearly disagree with my assertion, but I think it is important for readers to recognize that contaminants were abundant in their sequence data. I know there are numerous published studies that have used a similar approach to remove contaminant sequences, but trying to salvage datasets by simply subtracting contaminants seems tenuous when the levels of contamination are substantial.
  • Some of the more abundant taxa found in their samples (even after removal of those that were identified as contaminants) are very unexpected. This list includes E. coli, Corynebacterium, Lactobacillus, and Streptococcus (see Figure 3). These are common contaminants (see here) and taxa that are typically very abundant on human skin (see here). Yes – we often see these taxa in indoor air (because human skin is an important source of indoor microbes), but it is surprising to see these taxa in their samples, especially given that they were sampling air above a forest.
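
For readers who want to check the arithmetic in the first point, here is a minimal sketch. It uses only the two averages quoted above (520 blank-corrected cells and 360 blank cells per filter); the function name is hypothetical, and it assumes the average blank count applies equally to every filter, which is a simplification rather than the authors' own calculation.

```python
def contaminant_fraction(raw_cells: float, blank_cells: float) -> float:
    """Fraction of the raw (uncorrected) cell count attributable to the blank."""
    if raw_cells <= 0:
        raise ValueError("raw_cells must be positive")
    return blank_cells / raw_cells


# Averages quoted in the post (cells per filter).
corrected_cells = 520   # reported value, after blank subtraction
blank_cells = 360       # average count on the blank filters

# The raw count is what was actually observed on a sample filter
# before the blank was subtracted.
raw_cells = corrected_cells + blank_cells

fraction = contaminant_fraction(raw_cells, blank_cells)
print(f"raw cells per filter: {raw_cells}")
print(f"share attributable to blanks: {fraction:.0%}")
```

The point of dividing by the raw count rather than the corrected count is that the blank contributes to what was actually on the filter, so the share of contaminants has to be expressed against the total observed.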

Note that it is very difficult to _prove_ contamination, especially when I am working with limited information. Also – I note that I raised these concerns with the authors and, to their credit, they were very helpful in answering my questions. They do not argue with the facts presented above, but they do stand by their results and conclusions. Regardless, I think it is important for such concerns to be aired publicly so the broader scientific community can think about these potential issues. Investigating the ‘aerobiome’ is difficult and the study by Kobziar et al. emphasizes the challenges encountered when dealing with extremely low biomass samples. We can debate the extent of contamination in these samples and whether such contamination might have affected their conclusions, but readers should be vigilant when evaluating published ‘aerobiome’ studies or designing their own studies. Contamination is a persistent problem, and one that is difficult to address when sampling relatively small volumes of air with low cell concentrations.

2 thoughts on “Contamination in ‘aerobiome’ studies – a cautionary tale”

  1. Thank you to the Fierer lab for another interesting post. Contamination is an unavoidable problem when sampling ultra-low biomass habitats. The post commenting on the recent publication by Kobziar et al raises the important issue of how this impacts the field of aeromicrobiology. Although the topic of the paper was interesting, the recovery of very low air volumes and biomass, lack of comprehensive decontamination mitigation or reporting, and the apparent abundance of potential contaminants in the filtered dataset were a concern. I agree with the Fierer Lab post that decontamination is an extremely important aspect in low biomass studies and that strong nuancing of findings in the absence of clear reporting or approaches is required. However, the post also presented the opinion that “trying to salvage datasets by simply subtracting contaminants seems tenuous when the levels of contamination are substantial” without an evidence-based explanation. In response to this I think it is important to recognise that there is currently no consensus on what level of contamination is acceptable in ultra-low biomass environmental studies where contamination is an unavoidable issue. It is therefore essential to publish studies that more fully address the contamination problem and transparently report the outcome so that they contribute to a data-driven conversation that can benefit the field moving forwards.

    I share here a recent preprint from my group that might be interesting to the discussion: It demonstrates a comprehensive and transparent approach to data filtering and decontamination reporting that was shown empirically to mitigate this problem within the constraints of current microbial ecology practice. Specifically we advocate that researchers:
    1) Include blanks for field sampling and laboratory workflow, and also swab human operators to obtain a baseline of potential human contamination sources.
    2) Perform a comprehensive subtractive filtering process and fully report the outcome for each stage both quantitatively and taxonomically, and include checks for potential cross-contamination.
    3) Perform post-hoc checks on the effectiveness of the decontamination process and fully report the outcomes and impact on downstream analysis so that filtered data can be appropriately interpreted and nuanced.

    This comprehensive data filtering approach and fully transparent reporting probably represent the maximum effort that can be achieved at this time in ultra-low biomass aeromicrobiology. The study represents a significant advance over earlier studies in terms of addressing the decontamination issue (not to mention I think the description of the large-scale ecological patterns is pretty cool too!), and I hope that its publication will foster critical debate and further improvements in future as new tools and insights become available. I believe this is in the spirit of the Fierer Lab blog statement that “I think it is important for such concerns to be aired publicly so the broader scientific community can think about these potential issues.”

  2. Thanks Steve – very helpful response – much appreciated. I agree with nearly everything you have mentioned here. This discussion is an important one to have and I agree with the approach you advocate. Transparency is key. My only (minor) quibble is with the word ‘unavoidable’ – with good lab practices and appropriate sampling design (e.g. sufficiently high volumes) – it is possible to minimize (or perhaps even eliminate) contamination issues.
