
      Sensitive protein alignments at tree-of-life scale using DIAMOND

      research-article


          Abstract

          We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.

          Abstract

          An updated version of DIAMOND uses improved algorithmic procedures and a customized high-performance computing framework to make seemingly prohibitive large-scale protein sequence alignments feasible.

          Most cited references (28)


          Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

          S Altschul (1997)
          The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

            Minimap2: pairwise alignment for nucleotide sequences

            Heng Li (2018)
            Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 Mb in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms.

              Search and clustering orders of magnitude faster than BLAST.

              Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch.

                Author and article information

                Contributors
                hajk-georg.drost@tuebingen.mpg.de
                Journal
                Nature Methods (Nat Methods)
                Publisher: Nature Publishing Group US (New York)
                ISSN: 1548-7091 (print), 1548-7105 (electronic)
                Published: 7 April 2021
                Volume: 18
                Issue: 4
                Pages: 366-368
                Affiliations
                [1] Computational Biology Group, Max Planck Institute for Developmental Biology, Tübingen, Germany (GRID grid.419495.4; ISNI 0000 0001 1014 8330)
                [2] Max Planck Computing and Data Facility, Garching, Germany (GRID grid.470196.d)
                Author information
                http://orcid.org/0000-0001-7658-731X
                http://orcid.org/0000-0001-6869-7877
                http://orcid.org/0000-0002-1567-306X
                Article
                Article number: 1101
                DOI: 10.1038/s41592-021-01101-x
                PMCID: PMC8026399
                PMID: 33828273
                Record ID: 62cd67ca-ebe7-408d-96db-b6300fe06be9
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 23 July 2020
                Accepted: 22 February 2021
                Funding
                Funded by: Max-Planck-Gesellschaft (Max Planck Society), FundRef https://doi.org/10.13039/501100004189
                Categories
                Brief Communication
                Custom metadata
                © The Author(s), under exclusive licence to Springer Nature America, Inc. 2021

                Life sciences
                genomic analysis, sequencing, software, computational biology and bioinformatics, genome informatics
