3
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets ( 53 % of total MSAs), where we observe average speedups of more than 10 × . Further, approximately 94 % of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).

          Related collections

          Most cited references26

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

          Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML. Contact: alexandros.stamatakis@h-its.org Supplementary information: Supplementary data are available at Bioinformatics online.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies

            Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3-97.1%.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

              Background We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. Methodology/Principal Findings Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. Conclusions/Significance FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.
                Bookmark

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Mol Biol Evol
                Mol Biol Evol
                molbev
                Molecular Biology and Evolution
                Oxford University Press (US )
                0737-4038
                1537-1719
                October 2023
                06 October 2023
                06 October 2023
                : 40
                : 10
                : msad227
                Affiliations
                Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , 69118 Heidelberg, Germany
                Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , 69118 Heidelberg, Germany
                Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , 69118 Heidelberg, Germany
                Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , 69118 Heidelberg, Germany
                Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , 69118 Heidelberg, Germany
                Institute of Theoretical Informatics, Karlsruhe Institute of Technology , 76128 Karlsruhe, Germany
                Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology - Hellas, GR - 711 10 Heraklion, Crete, Greece
                Author notes
                Corresponding author: E-mail: anastasis.togkousidis@ 123456h-its.org

                Conflict of interests statement None declared.

                Author information
                https://orcid.org/0000-0003-4306-3709
                https://orcid.org/0000-0001-7394-2718
                https://orcid.org/0000-0002-7493-3917
                https://orcid.org/0000-0002-4144-6709
                https://orcid.org/0000-0003-0353-0691
                Article
                msad227
                10.1093/molbev/msad227
                10584362
                37804116
                b123eb9e-8078-4ae8-b209-3d581750f927
                © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                : 23 May 2023
                : 06 September 2023
                : 26 September 2023
                Page count
                Pages: 11
                Categories
                Methods
                AcademicSubjects/SCI01130
                AcademicSubjects/SCI01180

                Molecular biology
                phylogenetics,maximum likelihood,heuristics,difficulty prediction
                Molecular biology
                phylogenetics, maximum likelihood, heuristics, difficulty prediction

                Comments

                Comment on this article