Fast gap-affine pairwise alignment using the wavefront algorithm


Abstract

Motivation

Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies press for the development of faster pairwise alignment algorithms that can scale with increasing read lengths and production yields.

Results

In this article, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. Unlike traditional dynamic programming algorithms, which run in quadratic time, the WFA runs in $O(ns)$ time, proportional to the read length $n$ and the alignment score $s$, using $O(s^{2})$ memory. Furthermore, our algorithm exhibits simple data dependencies that can be easily vectorized, even by the auto-vectorization features of modern compilers, across different architectures without adapting the code. We evaluate the performance of our algorithm against other state-of-the-art implementations. We show that the WFA runs 20–300× faster than other methods when aligning short Illumina-like sequences, and 10–100× faster when aligning long noisy reads like those produced by Oxford Nanopore Technologies.
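To make the idea concrete, here is a minimal, score-only sketch of gap-affine wavefront alignment in Python. It is written for illustration and is not code from the wavefront-aligner library: the function name `wfa_score`, the dictionary-based wavefronts and the default penalties (mismatch 4, gap opening 6, gap extension 2, with a gap of length l costing 6 + 2l) are assumptions. Each wavefront maps a diagonal k to the furthest text offset h reached with score s. Runs of matching characters are consumed for free by `extend`, which is why similar sequences (small s) finish after few score steps. The sketch keeps every wavefront, matching the $O(s^{2})$ memory bound, and returns only the optimal score (no backtrace).

```python
def wfa_score(pattern: str, text: str, x: int = 4, o: int = 6, e: int = 2) -> int:
    """Score-only gap-affine wavefront alignment (illustrative sketch).

    Assumed penalties: match = 0, mismatch = x, gap of length l = o + l * e.
    Wavefronts are dicts {diagonal k: furthest offset h}, where h indexes
    `text` and v = h - k indexes `pattern`.
    """
    n, m = len(pattern), len(text)
    k_final = m - n  # diagonal on which the global alignment must end

    def valid(k: int, h: int) -> bool:
        # The cell (v, h) must lie inside the (n + 1) x (m + 1) DP matrix.
        return h <= m and 0 <= h - k <= n

    def extend(wf: dict) -> None:
        # Slide along each diagonal while characters match (cost 0).
        for k, h in list(wf.items()):
            v = h - k
            while v < n and h < m and pattern[v] == text[h]:
                v += 1
                h += 1
            wf[k] = h

    M = {0: {0: 0}}  # score -> wavefront of alignments in any state
    I, D = {}, {}    # score -> wavefronts ending in an insertion / deletion
    extend(M[0])
    s = 0
    while M.get(s, {}).get(k_final) != m:
        s += 1
        m_sub = M.get(s - x, {})       # source for a mismatch
        m_open = M.get(s - o - e, {})  # source for opening a gap
        i_ext = I.get(s - e, {})       # source for extending an insertion
        d_ext = D.get(s - e, {})       # source for extending a deletion
        ks = set(m_sub) | {k + 1 for k in i_ext} | {k - 1 for k in d_ext}
        ks |= {k + d for k in m_open for d in (-1, 1)}
        Ms, Is, Ds = {}, {}, {}
        for k in ks:
            # Insertion: consume one text character, coming from diagonal k - 1.
            ins = [wf[k - 1] + 1 for wf in (m_open, i_ext) if k - 1 in wf]
            ins = [h for h in ins if valid(k, h)]
            if ins:
                Is[k] = max(ins)
            # Deletion: consume one pattern character, coming from diagonal k + 1.
            dels = [wf[k + 1] for wf in (m_open, d_ext) if k + 1 in wf]
            dels = [h for h in dels if valid(k, h)]
            if dels:
                Ds[k] = max(dels)
            # M takes the furthest of: a mismatch on diagonal k, or the I/D states.
            cands = [h for h in (Is.get(k), Ds.get(k)) if h is not None]
            if k in m_sub and valid(k, m_sub[k] + 1):
                cands.append(m_sub[k] + 1)
            if cands:
                Ms[k] = max(cands)
        extend(Ms)
        if Ms:
            M[s] = Ms
        if Is:
            I[s] = Is
        if Ds:
            D[s] = Ds
    return s
```

For example, `wfa_score("GATTACA", "GATCACA")` returns 4 (a single mismatch) and `wfa_score("ACGT", "ACGGT")` returns 8 (one gap of length 1). Dictionaries keep the sketch short; a performance-oriented version would more likely store each wavefront as a contiguous array indexed by diagonal, so that the per-diagonal loop, whose iterations depend only on earlier wavefronts, can be vectorized.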

Availability and implementation

The WFA algorithm is implemented within the wavefront-aligner library, and it is publicly available at https://github.com/smarco/WFA.


Author and article information

Contributors

Peter Robinson (Associate Editor)

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Collection): 15 February 2021

Publication date (Electronic): 11 September 2020

Publication date PMC-release: 11 September 2020

Volume: 37

Issue: 4

Pages: 456-463

Affiliations

[1] Department of Computer Sciences, Barcelona Supercomputing Center, Barcelona 08034, Spain

[2] Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona 08193, Spain

[3] Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona 08034, Spain

Author notes

To whom correspondence should be addressed: santiagomsola@gmail.com

Author information

Santiago Marco-Sola https://orcid.org/0000-0001-7951-3914

Article

Publisher ID: btaa777

DOI: 10.1093/bioinformatics/btaa777

PMC ID: 8355039

PubMed ID: 32915952

SO-VID: 98bcf7ae-7717-430d-a70b-f1cf67e85447

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date received : 08 April 2020

Date revision received : 22 July 2020

Date: 26 August 2020

Date accepted : 01 September 2020

Page count

Pages: 8

Funding

Funded by: European Union’s Horizon 2020 Framework Programme under the DeepHealth project

Award ID: 825111

Funded by: European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020

Funded by: DRAC

Award ID: 001-P-001723

Funded by: MINECO-Spain

Award ID: TIN2017-84553-C2-1-R

Award ID: TIN2015-65316-P

Funded by: Catalan government

Award ID: 2017-SGR-313

Award ID: 2017-SGR-1328

Award ID: 2017-SGR-1414

Funded by: Spanish Ministry of Economy, Industry and Competitiveness

Award ID: RYC-2016-21104
