      VALENCIA: a nearest centroid classification method for vaginal microbial communities based on composition

      research-article

          Abstract

          Background

          Taxonomic profiles of vaginal microbial communities can be sorted into a discrete number of categories termed community state types (CSTs). This approach is advantageous because collapsing a hyper-dimensional taxonomic profile into a single categorical variable enables efforts such as data exploration, epidemiological studies, and statistical modeling. Vaginal communities are typically assigned to CSTs based on the results of hierarchical clustering of the pairwise distances between samples. However, this approach is problematic because it complicates between-study comparisons and because the results are entirely dependent on the particular set of samples that were analyzed. We sought to standardize and advance the assignment of samples to CSTs.

          Results

          We developed VALENCIA (VAginaL community state typE Nearest CentroId clAssifier), a nearest centroid-based tool which classifies samples based on their similarity to a set of reference centroids. The references were defined using a comprehensive set of 13,160 taxonomic profiles from 1975 women in the USA. This large dataset allowed us to comprehensively identify, define, and characterize vaginal CSTs common to reproductive age women and expand upon the CSTs that had been defined in previous studies. We validated the broad applicability of VALENCIA for the classification of vaginal microbial communities by using it to classify three test datasets which included reproductive age eastern and southern African women, adolescent girls, and a racially/ethnically and geographically diverse sample of postmenopausal women. VALENCIA performed well on all three datasets despite the substantial variations in sequencing strategies and bioinformatics pipelines, indicating its broad application to vaginal microbiota. We further describe the relationships between community characteristics (vaginal pH, Nugent score) and participant demographics (race, age) and the CSTs defined by VALENCIA.
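
The nearest centroid idea described above can be illustrated with a minimal Python sketch. It is not the VALENCIA implementation. The abstract does not name the similarity measure, so the sketch assumes the Yue-Clayton theta index; the centroid labels, taxa, and abundance values are toy placeholders rather than the published reference centroids, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Profile = Dict[str, float]  # taxon name -> abundance


def to_relative_abundance(counts: Dict[str, float]) -> Profile:
    """Scale raw read counts so that the profile sums to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def yue_clayton_similarity(p: Profile, q: Profile) -> float:
    """Yue-Clayton theta: 1 for identical profiles, 0 for no shared taxa.

    ASSUMPTION: this measure is chosen for illustration only.
    """
    taxa = set(p) | set(q)
    dot = sum(p.get(t, 0.0) * q.get(t, 0.0) for t in taxa)
    pp = sum(v * v for v in p.values())
    qq = sum(v * v for v in q.values())
    denom = pp + qq - dot
    return dot / denom if denom > 0 else 0.0


def classify(sample_counts: Dict[str, float],
             centroids: Dict[str, Profile]) -> Tuple[str, float]:
    """Return the label of the most similar centroid and its similarity."""
    profile = to_relative_abundance(sample_counts)
    scores = {label: yue_clayton_similarity(profile, centroid)
              for label, centroid in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy reference centroids (hypothetical labels and values, NOT the real CSTs).
TOY_CENTROIDS: Dict[str, Profile] = {
    "toy_crispatus_dominated": {"Lactobacillus_crispatus": 0.95,
                                "Gardnerella_vaginalis": 0.05},
    "toy_diverse": {"Lactobacillus_crispatus": 0.10,
                    "Gardnerella_vaginalis": 0.60,
                    "Prevotella": 0.30},
}

if __name__ == "__main__":
    sample = {"Lactobacillus_crispatus": 90, "Gardnerella_vaginalis": 10}
    label, score = classify(sample, TOY_CENTROIDS)
    print(label, round(score, 3))
```

Because the centroids are fixed in advance, a sample's assignment depends only on its own profile and not on which other samples happen to be in the dataset. This is the property that distinguishes the approach from hierarchical clustering of pairwise distances.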

          Conclusion

          VALENCIA provides a much-needed solution for the robust and reproducible assignment of vaginal community state types. This will allow unbiased analysis of both small and large vaginal microbiota datasets, comparisons between datasets and meta-analyses that combine multiple datasets.

          Supplementary information

          Supplementary information accompanies this paper at 10.1186/s40168-020-00934-6.

                Author and article information

                Contributors
                jravel@som.umaryland.edu
                Journal
                Microbiome
                BioMed Central (London)
                ISSN: 2049-2618
                Published: 23 November 2020
                Volume: 8
                Article number: 166
                Affiliations
                [1] Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA (GRID grid.411024.2; ISNI 0000 0001 2175 4264)
                [2] Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, USA (GRID grid.411024.2; ISNI 0000 0001 2175 4264)
                [3] Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, USA (GRID grid.411024.2; ISNI 0000 0001 2175 4264)
                [4] Department of Obstetrics and Gynecology, University of California Davis School of Medicine, Sacramento, USA (GRID grid.27860.3b; ISNI 0000 0004 1936 9684)
                Author information
                http://orcid.org/0000-0002-0851-2233
                Article
                934
                DOI: 10.1186/s40168-020-00934-6
                PMCID: PMC7684964
                PMID: 33228810
                d84b6bb1-75e5-41c4-996b-95721ea21f81
                © The Author(s) 2020

                Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 16 September 2020
                : 6 October 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000049, National Institute on Aging;
                Award ID: U01AG012531
                Award ID: U01AG012539
                Award ID: U01AG012546
                Award ID: U01AG012505
                Award ID: U01AG012535
                Award ID: U01AG012553
                Award ID: U01AG012554
                Award ID: U01AG012495
                Funded by: FundRef http://dx.doi.org/10.13039/100000060, National Institute of Allergy and Infectious Diseases;
                Award ID: U19AI084044
                Award ID: UH2AI083264
                Award ID: R01AI119012
                Funded by: FundRef http://dx.doi.org/10.13039/100000056, National Institute of Nursing Research;
                Award ID: R01NR015495
                Award ID: U01NR004061
                Categories
                Research