Archive: Microbial Community Data in R: Introducing mctoolsr
By Jon Leff
June 14, 2016
The problem
There - I did it. I made it through processing my raw amplicon (marker gene, e.g., 16S rRNA, 18S rRNA, or fungal ITS) DNA sequences. Now I have a table that tells me how many of each taxon (e.g., phylotype or OTU) are represented in each sample. This table is often referred to as an OTU table, but I prefer to call it a 'taxa table' to avoid using acronyms. My next step in a microbial community study is usually to say something about how the microbial communities (i.e., microbiomes) in individual samples are similar or different across sample types and to determine which taxa are responsible for driving these differences. What is the best way to perform these analyses? [Please comment below with your opinions.]
Some of these analyses can be performed using the software used to generate the taxa table (e.g., QIIME), but I've found that performing them in R provides the most flexibility to do the analyses you want. The problem is, R is not exactly user friendly, and after conducting several microbial community studies, I've found myself repeating the same tasks in R over and over.
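To make the idea concrete, here is a minimal sketch of the kind of repetitive task I mean, written in base R only. It is not mctoolsr code: the toy taxa table, the sample and taxon names, and the bray_curtis helper are all made up for illustration, and Bray-Curtis is just one common choice of dissimilarity metric.

```r
# Hypothetical toy taxa table: rows are taxa, columns are samples.
# All names and counts are invented for illustration.
counts <- matrix(
  c(10, 0, 5,    # soil_1
    8, 2, 6,     # soil_2
    0, 12, 1),   # leaf_1
  nrow = 3,
  dimnames = list(
    c("taxon_A", "taxon_B", "taxon_C"),
    c("soil_1", "soil_2", "leaf_1")
  )
)

# Convert counts to relative abundances so that samples with different
# sequencing depths can be compared.
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# Bray-Curtis dissimilarity between two abundance vectors (illustrative helper).
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise dissimilarity matrix across all samples.
samples <- colnames(rel_abund)
dm <- sapply(samples, function(i) {
  sapply(samples, function(j) bray_curtis(rel_abund[, i], rel_abund[, j]))
})

print(round(dm, 3))
```

In this toy example the two soil samples should come out more similar to each other than either is to the leaf sample. Writing this by hand for every study (plus filtering, rarefying, ordinating, and so on) is exactly the repetition that gets old quickly.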
Potential help
To simplify these analyses in R, I created an R package called mctoolsr (typically pronounced "M-C-tools-R"), which stands for "microbial community analysis tools in R". The intention of the package is to make it quicker to manipulate data and perform certain analyses common to microbial community studies while allowing the user to easily access their data using functions they've already created or will create. mctoolsr is meant for users who are already familiar with R, but you don't need to be an R expert to use it (if you aren't familiar with R, one good place to learn the basics is Swirl). Also, the intention was not to replace other analysis tools. For example, phyloseq contains some tools similar to those in mctoolsr and a bunch of other useful functions, but I wanted to create a package that functioned more simply, was intuitive to me, and stored data in familiar R objects such as lists and data frames.
I will briefly make the case that using an R package, such as mctoolsr, for your analyses has clear benefits over creating R scripts from scratch:
It saves time. Functions are at your fingertips, and you don't need to think about the details of performing each minute analysis.
Analyses are more reproducible.
The functions are tested.
You can add to the functions available in the package, making them available to others or for a future you.
Less common analyses can be performed by mixing functions from the package with those you create for one-time use.
Try it out
For those interested in trying mctoolsr, there are details available here: http://leffj.github.io/mctoolsr/. It is in the early stages of development, so please report any issues you run into. Feel free to share more general comments on this blog post below. It would be great to hear others' thoughts on the best ways to analyze microbial community data. If you think mctoolsr is useful and would like to contribute to its development, please contact me via email here.
Good luck with your analyses!
Jonathan Leff