      PanACoTA: a modular tool for massive microbial comparative genomics

      research-article
      NAR Genomics and Bioinformatics
      Oxford University Press


          Abstract

          The study of the gene repertoires of microbial species, their pangenomes, has become a key part of microbial evolution and functional genomics. Yet the increasing number of available genomes complicates the establishment of the basic building blocks of comparative genomics. Here, we present PanACoTA (https://github.com/gem-pasteur/PanACoTA), a tool that downloads all genomes of a species, builds a database from those passing quality and redundancy controls, annotates them uniformly, and then builds their pangenome, several variants of core genomes, their alignments and a rapid but accurate phylogenetic tree. While many programs for building pangenomes have become available in the last few years, we have focused on a modular method that tackles all the key steps of the process, from download to phylogenetic inference. Although all steps are integrated, they can also be run separately and multiple times to allow rapid and extensive exploration of the parameters of interest. PanACoTA is built in Python 3, includes a Singularity container, and has features to facilitate its future development. We believe PanACoTA is an interesting addition to the current set of comparative genomics tools, since it will accelerate and standardize the more routine parts of the work, allowing microbial genomicists to tackle their specific questions more quickly.
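          To make the idea of "several variants of core genomes" concrete, the following is a minimal Python sketch of how core and persistent gene families could be selected from a pangenome. It is not PanACoTA's implementation or API: the function name `persistent_families`, its parameters `min_fraction` and `single_copy`, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy representation of a pangenome (an assumption for this sketch):
# family id -> {genome name -> list of member genes found in that genome}.
Pangenome = Dict[str, Dict[str, List[str]]]


def persistent_families(
    pangenome: Pangenome,
    genomes: List[str],
    min_fraction: float = 1.0,
    single_copy: bool = True,
) -> List[str]:
    """Return the sorted ids of families present in at least `min_fraction` of genomes.

    With min_fraction=1.0 and single_copy=True this is a strict core genome:
    every genome has exactly one member of the family.
    With single_copy=False, families with several members in a genome are kept.
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    threshold = min_fraction * len(genomes)
    selected = []
    for family in sorted(pangenome):
        members = pangenome[family]
        counts = [len(members.get(genome, [])) for genome in genomes]
        # In single-copy mode, reject families with more than one member in any genome.
        if single_copy and any(count > 1 for count in counts):
            continue
        present = sum(1 for count in counts if count >= 1)
        if present >= threshold:
            selected.append(family)
    return selected


# Toy data, invented for illustration only.
GENOMES = ["g1", "g2", "g3", "g4"]
PANGENOME: Pangenome = {
    "famA": {"g1": ["a1"], "g2": ["a2"], "g3": ["a3"], "g4": ["a4"]},
    "famB": {"g1": ["b1"], "g2": ["b2"], "g3": ["b3"]},
    "famC": {"g1": ["c1", "c1b"], "g2": ["c2"], "g3": ["c3"], "g4": ["c4"]},
    "famD": {"g1": ["d1"]},
}

if __name__ == "__main__":
    print("strict core:", persistent_families(PANGENOME, GENOMES))
    print("75% persistent:", persistent_families(PANGENOME, GENOMES, min_fraction=0.75))
    print("core, paralogs allowed:", persistent_families(PANGENOME, GENOMES, single_copy=False))
```

          Relaxing `min_fraction` trades completeness for a larger gene set, which is the kind of parameter exploration the abstract says the modular design is meant to make quick.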

          Most cited references (56)

          MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability

          We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

            Basic local alignment search tool.

            A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.

              IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies

              Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3-97.1%.

                Author and article information

                Journal
                NAR Genom Bioinform
                nargab
                NAR Genomics and Bioinformatics
                Oxford University Press
                2631-9268
                March 2021
                12 January 2021
                Volume: 3
                Issue: 1
                Article ID: lqaa106
                Affiliations
                Microbial Evolutionary Genomics, CNRS, UMR3525, Institut Pasteur, 28, rue Dr Roux, Paris 75015, France
                Sorbonne Université, Collège doctoral, F-75005 Paris, France
                Bioinformatics and Biostatistics Hub, Department of Computational Biology, Institut Pasteur, USR 3756 CNRS, 28, rue Dr Roux, Paris 75015, France
                Author notes
                To whom correspondence should be addressed. Tel: +33 1 45 68 89 83; Fax: +33 1 45 68 87 27; Email: amandine.perrin@pasteur.fr
                Author information
                http://orcid.org/0000-0003-4797-6185
                http://orcid.org/0000-0001-7704-822X
                Article
                lqaa106
                DOI: 10.1093/nargab/lqaa106
                PMCID: PMC7803007
                PMID: 33575648
                4bb5e74c-6c11-4798-a290-0f1a2a999440
                © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 16 September 2020
                : 10 November 2020
                : 01 December 2020
                Page count
                Pages: 12
                Funding
                Funded by: Agence Nationale de la Recherche Salmo_Prophages;
                Award ID: ANR-16-CE16-0029
                Funded by: Agence Nationale de la Recherche Inception program;
                Award ID: PIA/ANR-16-CONV-0005
                Funded by: Equipe Federation pour la recherche médicale;
                Award ID: EQU201903007835
                Categories
                AcademicSubjects/SCI00030
                AcademicSubjects/SCI00980
                AcademicSubjects/SCI01060
                AcademicSubjects/SCI01140
                AcademicSubjects/SCI01180
                Standard Article
