Archive: Contamination in “Aerobiome” Studies - A Cautionary Tale

Oct 3

Noah Fierer

2/10/2022

Studying the microbes found in outdoor air is inherently difficult. Not only is the atmosphere a highly dynamic system, but cell concentrations are typically very low, making sampling and downstream analyses challenging. Contamination is a persistent problem and concern when working with these types of samples, regardless of the method employed for microbial analyses. Contaminants can be introduced during the sampling itself, from reagents/consumables used during sample processing, or from cross-contamination with other samples (see old post here). Contamination not only affects data quality – it can fundamentally alter study conclusions.

As a case in point, I want to focus on a recent study published by Kobziar et al., “Wildland fire smoke alters the composition, diversity, and potential atmospheric function of microbial life in the aerobiome”. I am singling out this paper as an example of the contamination issues that appear to be surprisingly common in the published ‘aerobiome’ literature. It is definitely not the only ‘aerobiome’ paper that appears to suffer from important contamination issues; it is just the most recent one I have come across.

I was excited to read this paper. The authors deployed drones (drones!) to collect air samples onto filters and then used a range of methods to investigate the aerosolization of bacteria and fungi during wildfire events. However, as I started delving into their paper, it immediately raised a number of ‘red flags’. There are multiple lines of evidence suggesting that an appreciable number of their samples are likely contaminated and that the level of contamination is sufficient to affect their conclusions:

The authors report that their ambient samples had, on average, 520 cells per filter (determined via microscopy). Such low cell numbers are to be expected given that they were only able to collect 20 L of air onto individual filters. This is not problematic per se. What is problematic is that their blanks had 360 cells per filter, and the estimate of 520 cells per filter was obtained by subtracting the ‘blank’ cell numbers from the sample numbers. This means that out of the 880 cells per filter counted on average (520 + 360 cells), ~40% were contaminants. This is a high proportion, and it suggests that DNA-based analyses would suffer from similar problems, with a large fraction of the DNA (on average) likely derived from contaminants. In fact, this seems to be the case.
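The arithmetic behind the ~40% figure can be written out as a minimal Python sketch. It uses only the two averages quoted above; the variable names are illustrative and do not come from the paper.

```python
# Averages quoted in the post (cells per filter, determined via microscopy).
blank_subtracted_cells = 520  # ambient samples, after subtracting the blanks
blank_cells = 360             # blank filters

# Raw count on a sample filter before the blank subtraction.
raw_cells = blank_subtracted_cells + blank_cells

# Share of the raw count that is attributable to the blank signal.
contaminant_fraction = blank_cells / raw_cells

print(f"raw cells per filter: {raw_cells}")
print(f"contaminant fraction: {contaminant_fraction:.1%}")
```

With these inputs the raw count is 880 cells per filter and the blank signal accounts for about 41% of it.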

The supplement of the paper includes a list of taxa that were removed from the sequence data because they also appeared in the ‘blanks’. This list includes some taxa that are strongly indicative of inadvertent contamination (including Propionibacterium, Staphylococcus, Malassezia, and mammals). After exchanging a series of emails with the authors, it turns out that the removed sequences represented roughly a quarter or more of all the sequences they obtained from their samples. The percentage of ‘contaminant’ sequences ranged from 5% to 48% of total sequence reads, with the extent of contamination varying across their sample categories and across their targeted taxa. In my opinion, this level of contamination is high and the samples (or at least some of the samples) should not have been included. The authors clearly disagree with my assertion, but I think it is important for readers to recognize that contaminants were abundant in their sequence data. I know there are numerous published studies that have used a similar approach to remove contaminant sequences, but trying to salvage datasets by simply subtracting contaminants seems tenuous when the levels of contamination are substantial.
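For readers who want to check this kind of percentage in their own data, here is a minimal Python sketch of the calculation: the share of a sample's reads assigned to taxa that also appear in the blanks. The function name, the taxon labels `taxon_A` and `taxon_B`, and all read counts are hypothetical, chosen only for illustration; this is not the authors' pipeline or their data.

```python
def contaminant_read_fraction(sample_counts, blank_taxa):
    """Return the fraction of reads in a sample assigned to taxa seen in blanks."""
    total_reads = sum(sample_counts.values())
    if total_reads == 0:
        return 0.0  # no reads at all, so nothing to attribute to contaminants
    flagged_reads = sum(
        count for taxon, count in sample_counts.items() if taxon in blank_taxa
    )
    return flagged_reads / total_reads


# Hypothetical inputs: taxa detected in the blanks and one sample's read counts.
blank_taxa = {"Propionibacterium", "Staphylococcus", "Malassezia"}
sample_counts = {
    "Propionibacterium": 150,
    "Staphylococcus": 100,
    "taxon_A": 450,
    "taxon_B": 300,
}

fraction = contaminant_read_fraction(sample_counts, blank_taxa)
print(f"reads flagged as contaminants: {fraction:.0%}")
```

Reporting this fraction per sample, rather than only the filtered data, lets readers judge how much of each sample was discarded.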

Some of the more abundant taxa found in their samples (even after removal of those that were identified as contaminants) are very unexpected. This list includes E. coli, Corynebacterium, Lactobacillus, and Streptococcus (see Figure 3). These are common contaminants (see here) and taxa that are typically very abundant on human skin (see here). Yes, we often see these taxa in indoor air (because human skin is an important source of indoor microbes), but it is surprising to see them in these samples, especially given that the authors were sampling air above a forest.

Note that it is very difficult to _prove_ contamination, especially when I am working with limited information. I also note that I raised these concerns with the authors and, to their credit, they were very helpful in answering my questions. They do not argue with the facts presented above, but they do stand by their results and conclusions. Regardless, I think it is important for such concerns to be aired publicly so the broader scientific community can think about these potential issues. Investigating the ‘aerobiome’ is difficult and the study by Kobziar et al. emphasizes the challenges encountered when dealing with extremely low biomass samples. We can debate the extent of contamination in these samples and whether such contamination might have affected their conclusions, but readers should be vigilant when evaluating published ‘aerobiome’ studies or designing their own studies. Contamination is a persistent problem, and one that is difficult to address when sampling relatively small volumes of air with low cell concentrations.

Jamie Micciulla
