      Is Open Access

      GTDB-Tk v2: memory friendly classification with the genome taxonomy database

      brief-report


          Abstract

          Summary

          The Genome Taxonomy Database (GTDB) and associated taxonomic classification toolkit (GTDB-Tk) have been widely adopted by the microbiology community. However, the growing size of the GTDB bacterial reference tree has resulted in GTDB-Tk requiring substantial amounts of memory (∼320 GB) which limits its adoption and ease of use. Here, we present an update to GTDB-Tk that uses a divide-and-conquer approach where user genomes are initially placed into a bacterial reference tree with family-level representatives followed by placement into an appropriate class-level subtree comprising species representatives. This substantially reduces the memory requirements of GTDB-Tk while having minimal impact on classification.
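          The two-stage idea described above can be illustrated with a minimal sketch. It is not GTDB-Tk's implementation: the toy sequences, the names `BACKBONE`, `CLASS_SUBTREES`, `nearest` and `place`, and the use of a Hamming-distance nearest neighbour in place of real phylogenetic placement are all assumptions made for illustration. It only shows how a query is first matched against a small family-level backbone and then against species representatives of a single class.

```python
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Stage-1 reference (hypothetical toy data): one representative per family,
# each tagged with the class it belongs to.
BACKBONE: Dict[str, Tuple[str, str]] = {
    "family_A1": ("class_A", "AAAAAAAA"),
    "family_A2": ("class_A", "AAAACCAA"),
    "family_B1": ("class_B", "GGGGGGGG"),
}

# Stage-2 references (hypothetical toy data): species representatives per class.
CLASS_SUBTREES: Dict[str, Dict[str, str]] = {
    "class_A": {"species_a1": "AAAAAAAT", "species_a2": "AAAACCAT"},
    "class_B": {"species_b1": "GGGGGGGT", "species_b2": "GGGGTTGG"},
}


def nearest(query: str, refs: Dict[str, str]) -> str:
    """Return the name of the reference closest to the query (stand-in for placement)."""
    return min(refs, key=lambda name: hamming(query, refs[name]))


def place(query: str) -> Tuple[str, str]:
    """Divide and conquer: pick a class via the backbone, then search only that class."""
    backbone_seqs = {family: seq for family, (_, seq) in BACKBONE.items()}
    family = nearest(query, backbone_seqs)
    class_name = BACKBONE[family][0]
    # Only the references of the selected class are consulted in stage 2.
    species = nearest(query, CLASS_SUBTREES[class_name])
    return class_name, species


print(place("GGGGTTGA"))
```

          In this sketch, only the backbone and one class-level reference set are consulted for any given query, which is the property that lets a divide-and-conquer scheme avoid holding the full species-level reference in memory at once.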

          Availability and implementation

          GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


          Most cited references (8)

          Is Open Access

          GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database

          Summary: The Genome Taxonomy Database Toolkit (GTDB-Tk) provides objective taxonomic assignments for bacterial and archaeal genomes based on the GTDB. GTDB-Tk is computationally efficient and able to classify thousands of draft genomes in parallel. Here we demonstrate the accuracy of the GTDB-Tk taxonomic assignments by evaluating its performance on a phylogenetically diverse set of 10 156 bacterial and archaeal metagenome-assembled genomes. Availability and implementation: GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk. Supplementary information: Supplementary data are available at Bioinformatics online.

            A complete domain-to-species taxonomy for Bacteria and Archaea

            The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for ~150,000 bacterial and archaeal genomes from domain to genus. However, almost 40% of the genomes in the Genome Taxonomy Database lack a species name. We address this limitation by using commonly accepted average nucleotide identity criteria to set bounds on species and propose species clusters that encompass all publicly available bacterial and archaeal genomes. Unlike previous average nucleotide identity studies, we chose a single representative genome to serve as the effective nomenclatural 'type' defining each species. Of the 24,706 proposed species clusters, 8,792 are based on published names. We assigned placeholder names to the remaining 15,914 species clusters to provide names to the growing number of genomes from uncultivated species. This resource provides a complete domain-to-species taxonomic framework for bacterial and archaeal genomes, which will facilitate research on uncultivated species and improve communication of scientific results.
              Is Open Access

              GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy

              The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy for prokaryotic genomes sourced from the NCBI Assembly database. GTDB R06-RS202 spans 254 090 bacterial and 4316 archaeal genomes, a 270% increase since the introduction of the GTDB in November 2017. These genomes are organized into 45 555 bacterial and 2339 archaeal species clusters, which is a 200% increase since the integration of species clusters into the GTDB in June 2019. Here, we explore prokaryotic diversity from the perspective of the GTDB and highlight the importance of metagenome-assembled genomes in expanding available genomic representation. We also discuss improvements to the GTDB website which allow tracking of taxonomic changes, easy assessment of genome assembly quality, and identification of genomes assembled from type material or used as species representatives. Methodological updates and policy changes made since the inception of the GTDB are then described along with the procedure used to update species clusters in the GTDB. We conclude with a discussion on the use of average nucleotide identities as a pragmatic approach for delineating prokaryotic species.

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 December 2022
                11 October 2022
                11 October 2022
                Volume: 38
                Issue: 23
                Pages: 5315-5316
                Affiliations
                Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
                Research Computing Center, The University of Queensland, St Lucia, QLD 4072, Australia
                Author notes
                To whom correspondence should be addressed. Email: p.chaumeil@uq.edu.au or donovan.parks@gmail.com
                Author information
                https://orcid.org/0000-0003-0426-8445
                https://orcid.org/0000-0002-9988-0866
                https://orcid.org/0000-0001-6662-9010
                Article
                btac672
                DOI: 10.1093/bioinformatics/btac672
                PMCID: PMC9710552
                PMID: 36218463
                59492384-a06c-44e9-ba8e-5a14ddeb6759
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 10 July 2022
                : 23 September 2022
                : 03 October 2022
                : 07 October 2022
                : 25 October 2022
                Page count
                Pages: 2
                Funding
                Funded by: UQ Strategic Funding and Australian Research Council Laureate Fellowship;
                Award ID: FL150100038
                Categories
                Applications Note
                Genome Analysis
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
