
      Fast gap-affine pairwise alignment using the wavefront algorithm

      research-article


          Abstract

          Motivation

          Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies press for the development of faster pairwise alignment algorithms that can scale with increasing read lengths and production yields.

          Results

          In this article, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. Unlike traditional dynamic programming algorithms, which run in quadratic time, the WFA runs in time $O(ns)$, proportional to the read length $n$ and the alignment score $s$, using $O(s^2)$ memory. Furthermore, our algorithm exhibits simple data dependencies that can be easily vectorized, even by the automatic vectorization features of modern compilers, across different architectures and without the need to adapt the code. We evaluate the performance of our algorithm against other state-of-the-art implementations and show that the WFA runs 20–300× faster than other methods when aligning short Illumina-like sequences, and 10–100× faster when aligning long noisy reads like those produced by Oxford Nanopore Technologies.
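          To make the idea behind the $O(ns)$ bound more concrete, the following is a minimal Python sketch of a wavefront-style gap-affine recurrence. It is an illustrative reconstruction, not code from the paper or from the wavefront-aligner library. For each score s it keeps, per diagonal k = h - v (h indexes the text, v the pattern), only the furthest-reaching text offset h, and it slides every diagonal forward along exact matches at no cost. It returns only the optimal score (no traceback). The function name wfa_score, the default penalties (mismatch x = 4, gap open o = 6, gap extension e = 2, with a gap of length l costing o + l*e) and the dict-based wavefront layout are assumptions chosen for the example.

```python
from typing import Dict

Wavefront = Dict[int, int]  # diagonal k = h - v  ->  furthest text offset h
NEG = -(1 << 60)            # sentinel: no alignment reaches this diagonal


def wfa_score(pattern: str, text: str, x: int = 4, o: int = 6, e: int = 2) -> int:
    """Optimal gap-affine score (match 0, mismatch x, gap of length l costs o + l*e).

    Hypothetical illustration of the wavefront idea; score only, no traceback.
    """
    plen, tlen = len(pattern), len(text)
    k_end = tlen - plen  # diagonal on which the global alignment must finish

    def valid(h: int, k: int) -> bool:
        # offsets must stay inside both sequences: 0 <= h <= tlen, 0 <= v <= plen
        return 0 <= h <= tlen and 0 <= h - k <= plen

    def get(wfs: Dict[int, Wavefront], score: int, k: int) -> int:
        return wfs.get(score, {}).get(k, NEG)

    def extend(wf: Wavefront) -> None:
        # slide each diagonal forward along exact matches, which cost nothing
        for k, h in list(wf.items()):
            v = h - k
            while h < tlen and v < plen and text[h] == pattern[v]:
                h += 1
                v += 1
            wf[k] = h

    M: Dict[int, Wavefront] = {0: {0: 0}}  # any alignment state
    I: Dict[int, Wavefront] = {}           # inside a gap that advances in the text
    D: Dict[int, Wavefront] = {}           # inside a gap that advances in the pattern
    extend(M[0])

    s = 0
    while get(M, s, k_end) < tlen:
        s += 1
        # diagonals reachable at score s are adjacent to those of the source wavefronts
        src = (set(M.get(s - x, {})) | set(M.get(s - o - e, {}))
               | set(I.get(s - e, {})) | set(D.get(s - e, {})))
        if not src:
            continue  # no alignment has exactly this score
        m_s: Wavefront = {}
        i_s: Wavefront = {}
        d_s: Wavefront = {}
        for k in range(min(src) - 1, max(src) + 2):
            ins = max(get(M, s - o - e, k - 1), get(I, s - e, k - 1)) + 1  # open/extend, h + 1
            dele = max(get(M, s - o - e, k + 1), get(D, s - e, k + 1))     # open/extend, v + 1
            mis = get(M, s - x, k) + 1                                     # mismatch, h + 1 and v + 1
            if valid(ins, k):
                i_s[k] = ins
            if valid(dele, k):
                d_s[k] = dele
            best = max((h for h in (mis, ins, dele) if valid(h, k)), default=None)
            if best is not None:
                m_s[k] = best
        extend(m_s)
        M[s], I[s], D[s] = m_s, i_s, d_s  # keeps every wavefront, in line with O(s^2) memory
    return s
```

          For example, wfa_score("ACGT", "AGGT") returns 4 with these penalties (a single mismatch), and wfa_score("", "AC") returns 10 (one gap of length 2: 6 + 2*2). Because only diagonals next to existing ones are touched at each score, similar sequences finish after few score steps, which is the intuition behind the speedups on homologous reads.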

          Availability and implementation

          The WFA is implemented within the wavefront-aligner library and is publicly available at https://github.com/smarco/WFA.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803 (print); 1367-4811 (online)
                Issue date: 15 February 2021
                Published online: 11 September 2020
                Volume 37, Issue 4, pages 456-463
                Affiliations
                [1] Department of Computer Sciences, Barcelona Supercomputing Center, Barcelona 08034, Spain
                [2] Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain
                [3] Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona 08034, Spain
                Author notes
                To whom correspondence should be addressed: santiagomsola@gmail.com
                Author information
                https://orcid.org/0000-0001-7951-3914
                Article ID: btaa777
                DOI: 10.1093/bioinformatics/btaa777
                PMCID: PMC8355039
                PMID: 32915952
                © The Author(s) 2020. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                08 April 2020
                22 July 2020
                26 August 2020
                01 September 2020
                Page count: 8
                Funding
                Funded by: European Union’s Horizon 2020 Framework Programme, DeepHealth project
                Award ID: 825111
                Funded by: European Union Regional Development Fund, within the framework of the ERDF Operational Program of Catalonia 2014-2020
                Funded by: DRAC
                Award ID: 001-P-001723
                Funded by: MINECO-Spain
                Award ID: TIN2017-84553-C2-1-R
                Funded by: TIN2015-65316-P
                Funded by: Catalan government
                Award ID: 2017-SGR-313
                Award ID: 2017-SGR-1328
                Award ID: 2017-SGR-1414
                Funded by: Spanish Ministry of Economy, Industry and Competitiveness
                Award ID: RYC-2016-21104
                Categories
                Original Papers
                Sequence Analysis
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
