      Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences

      research-article


          Abstract

          We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to “classic” open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, “classic” open-reference OTU clustering is often faster). We illustrate this by applying the algorithm to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than one fifth of the time that “classic” open-reference OTU picking would require. Through comparisons on three well-studied datasets, we show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by “classic” open-reference OTU picking. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME’s uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms.
Finally, we present a comparison of parameter settings in QIIME’s OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
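
The abstract names the ingredients of the algorithm (closed-reference picking, de novo picking, and subsampling) without listing the steps. The sketch below shows one way these pieces could fit together, following the four-step structure as we understand it from QIIME's open-reference workflow. It is an illustration only, not the authors' implementation: the `identity` function, the greedy centroid clustering that stands in for uclust, the `New.Ref` and `New.Cleanup` OTU name prefixes, and the `fraction` parameter are all assumptions made for this example.

```python
import random


def identity(a, b):
    """Toy similarity: fraction of matching positions between equal-length reads.
    Assumption: a real pipeline would use an alignment-based measure (e.g. uclust)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, refs, threshold):
    """Assign each read to the first reference it matches.
    Reads are independent of one another, so this step can run in parallel."""
    otus, failures = {}, []
    for read in reads:
        hit = next((name for name, seq in refs.items()
                    if identity(read, seq) >= threshold), None)
        if hit is None:
            failures.append(read)
        else:
            otus.setdefault(hit, []).append(read)
    return otus, failures


def de_novo(reads, threshold, prefix):
    """Greedy centroid clustering: a read joins the first centroid it matches,
    otherwise it becomes a new centroid. Inherently serial."""
    centroids, otus = {}, {}
    for read in reads:
        hit = next((name for name, seq in centroids.items()
                    if identity(read, seq) >= threshold), None)
        if hit is None:
            hit = f"{prefix}{len(centroids)}"
            centroids[hit] = read
        otus.setdefault(hit, []).append(read)
    return centroids, otus


def subsampled_open_reference(reads, refs, threshold=0.9, fraction=0.5, seed=0):
    rng = random.Random(seed)
    # Step 1: closed-reference picking against the reference database.
    otus, failures = closed_reference(reads, refs, threshold)
    # Step 2: de novo clustering of a random subsample of the failures only,
    # which keeps the serial part of the workflow small.
    k = max(1, int(len(failures) * fraction)) if failures else 0
    new_refs, _ = de_novo(rng.sample(failures, k), threshold, "New.Ref")
    # Step 3: closed-reference picking of all failures against the new centroids.
    step3, still_failed = closed_reference(failures, new_refs, threshold)
    otus.update(step3)
    # Step 4: de novo clustering of whatever still has no OTU.
    _, step4 = de_novo(still_failed, threshold, "New.Cleanup")
    otus.update(step4)
    return otus
```

In this toy version every read ends up in exactly one OTU, which mirrors the property the abstract highlights over closed-reference picking. Only steps 2 and 4 are serial, and step 2 sees just a fraction of the failures, which is where the runtime advantage on very large data sets would come from.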

          Most cited references (5)


          Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data

          Bacteria comprise the most diverse domain of life on Earth, where they occupy nearly every possible ecological niche and play key roles in biological and chemical processes. Studying the composition and ecology of bacterial ecosystems and understanding their function are of prime importance. High-throughput sequencing technologies enable nearly comprehensive descriptions of bacterial diversity through 16S ribosomal RNA gene amplicons. Analyses of these communities generally rely upon taxonomic assignments through reference databases or clustering approaches using de facto sequence similarity thresholds to identify operational taxonomic units. However, these methods often fail to resolve ecologically meaningful differences between closely related organisms in complex microbial data sets. In this paper, we describe oligotyping, a novel supervised computational method that allows researchers to investigate the diversity of closely related but distinct bacterial organisms in final operational taxonomic units identified in environmental data sets through 16S ribosomal RNA gene data by the canonical approaches. Our analysis of two data sets from two different environments demonstrates the capacity of oligotyping to discriminate distinct microbial populations of ecological importance. Oligotyping can resolve the distribution of closely related organisms across environments and unveil previously overlooked ecological patterns for microbial communities. The URL http://oligotyping.org offers an open-source software pipeline for oligotyping.

            Advancing our understanding of the human microbiome using QIIME.

            High-throughput DNA sequencing technologies, coupled with advanced bioinformatics tools, have enabled rapid advances in microbial ecology and our understanding of the human microbiome. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics software package designed for microbial community analysis based on DNA sequence data, which provides a single analysis framework for analysis of raw sequence data through publication-quality statistical analyses and interactive visualizations. In this chapter, we demonstrate the use of the QIIME pipeline to analyze microbial communities obtained from several sites on the bodies of transgenic and wild-type mice, as assessed using 16S rRNA gene sequences generated on the Illumina MiSeq platform. We present our recommended pipeline for performing microbial community analysis and provide guidelines for making critical choices in the process. We present examples of some of the types of analyses that are enabled by QIIME and discuss how other tools, such as phyloseq and R, can be applied to expand upon these analyses. © 2013 Elsevier Inc. All rights reserved.

              Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project

              Between July 18th and 24th 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how to best characterize the microbial world using next-generation sequencing technologies. The meeting was entitled “Terabase Metagenomics” and was sponsored by the Institute for Computing in Science (ICiS) summer 2010 workshop program. The aim of the workshop was to explore the fundamental questions relating to microbial ecology that could be addressed using advances in sequencing potential. Technological advances in next-generation sequencing platforms such as the Illumina HiSeq 2000 can generate in excess of 250 billion base pairs of genetic information in 8 days. Thus, the generation of a trillion base pairs of genetic information is becoming a routine matter. The main outcome from this meeting was the birth of a concept and practical approach to exploring microbial life on earth, the Earth Microbiome Project (EMP). Here we briefly describe the highlights of this meeting and provide an overview of the EMP concept and how it can be applied to exploration of the microbiome of each ecosystem on this planet.

                Author and article information

                Contributors
                Journal: PeerJ
                Publisher: PeerJ Inc. (San Francisco, USA)
                ISSN: 2167-8359
                Published: 21 August 2014
                Volume: 2
                Article number: e545
                Affiliations
                [1] Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
                [2] Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
                [3] State Key Laboratory of Organ Failure Prevention, and Department of Environmental Health, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China
                [4] Department of Computer Science, University of Colorado Boulder, Boulder, CO, USA
                [5] Department of Molecular, Cellular, and Developmental Biology, University of Colorado at Boulder, Boulder, CO, USA
                [6] Department of Chemistry and Biochemistry, University of Colorado at Boulder, Boulder, CO, USA
                [7] Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL, USA
                [8] Department of Biological Sciences, Northern Arizona University, AZ, USA
                [9] BioFrontiers Institute, University of Colorado at Boulder, Boulder, CO, USA
                [10] Institute for Genomics and Systems Biology, Argonne National Laboratory, Lemont, IL, USA
                [11] Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
                [12] Department of Pathology and Laboratory Science, Warren Alpert Medical School, Brown University, Providence, RI, USA
                [13] Howard Hughes Medical Institute, Boulder, CO, USA
                Article
                Publisher ID: 545
                DOI: 10.7717/peerj.545
                PMC ID: 4145071
                PubMed ID: 25177538
                ScienceOpen record ID: 12a9ae4c-1f15-4cde-8197-7cfabca25e69
                © 2014 Rideout et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                History
                Received: 10 June 2014
                Accepted: 6 August 2014
                Funding
                Funded by: EPA STAR Graduate Fellowship
                Funded by: NSF IGERT
                Award ID: 1144807
                Funded by: Arizona’s Technology and Research Initiative Fund
                Funded by: Alfred P. Sloan Foundation
                Award ID: 2012-5-42 MBRP
                SMG was supported by an EPA STAR Graduate Fellowship. DM was supported in part by NSF IGERT (award number: 1144807). This work was partially supported by a grant from Arizona’s Technology and Research Initiative Fund to JGC, and by a grant from the Alfred P. Sloan Foundation to JGC and RK (award number: 2012-5-42 MBRP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Bioinformatics
                Ecology
                Microbiology

                Keywords: OTU picking, microbial ecology, microbiome, QIIME, bioinformatics
