Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

Abstract

Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
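As a rough illustration of the two use cases named in the abstract (aligning a set of sequences from scratch, and adding sequences to an existing alignment), the sketch below only assembles a command line for the `clustalo` executable. It does not run anything. The helper name `clustalo_command` and all file names are hypothetical. The options used (`-i`, `-o`, `--outfmt`, `--threads`, `--profile1`) are taken from the Clustal Omega command-line help as I understand it and should be checked against the installed version.

```python
from typing import List, Optional


def clustalo_command(
    infile: str,
    outfile: str,
    existing_alignment: Optional[str] = None,
    threads: int = 1,
    outfmt: str = "fasta",
) -> List[str]:
    """Build an argument list for the clustalo executable (hypothetical helper).

    infile: unaligned protein sequences (file name is an assumption).
    existing_alignment: optional precomputed alignment to which the
        sequences in `infile` should be added.
    """
    cmd = [
        "clustalo",
        "-i", infile,            # input sequences
        "-o", outfile,           # output alignment
        "--outfmt", outfmt,      # output format, e.g. fasta or clustal
        "--threads", str(threads),
    ]
    if existing_alignment is not None:
        # Align the new sequences against an existing alignment (profile).
        cmd += ["--profile1", existing_alignment]
    return cmd


# Alignment from scratch.
print(" ".join(clustalo_command("seqs.fa", "aligned.fa", threads=4)))

# Adding sequences to a precomputed alignment.
print(" ".join(clustalo_command("new.fa", "merged.fa", existing_alignment="family.aln")))
```

The resulting list can be passed to `subprocess.run` on a machine where Clustal Omega is installed.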

Most cited references 17

Profile hidden Markov models.

S. Eddy (1998)

The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement standard pairwise comparison methods for large-scale sequence analysis. Several software implementations and two large libraries of profile HMMs of common protein domains are available. HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise.
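The idea of turning a multiple sequence alignment into a position-specific scoring system can be sketched in a few lines. The example below is a deliberately simplified stand-in: it builds per-column log-odds scores with pseudocounts, without the insert and delete states or transition probabilities that a real profile HMM has. The toy alignment, the five-letter alphabet, the uniform background frequencies and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def column_log_odds(
    alignment: List[str],
    background: Dict[str, float],
    pseudocount: float = 1.0,
) -> List[Dict[str, float]]:
    """Return one log2-odds score table per alignment column.

    All rows are assumed to have the same length. Characters not in
    `background` (such as the gap symbol '-') are ignored when counting.
    """
    alphabet = sorted(background)
    scores = []
    for col in range(len(alignment[0])):
        counts = Counter(row[col] for row in alignment if row[col] in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        scores.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background[a])
            for a in alphabet
        })
    return scores


def score_sequence(seq: str, scores: List[Dict[str, float]]) -> float:
    """Sum the per-position scores of an ungapped sequence of matching length."""
    return sum(col[res] for res, col in zip(seq, scores))


# Toy alignment over a reduced alphabet (assumption, not real data).
toy_alignment = ["MKV", "MRV", "MKL"]
toy_background = {a: 0.2 for a in "MKRVL"}
profile = column_log_odds(toy_alignment, toy_background)

# A sequence resembling the alignment scores higher than an unrelated one.
print(score_sequence("MKV", profile) > score_sequence("LLM", profile))
```

Residues that are frequent in a column receive positive scores and rare ones negative scores, which is what makes the scoring position specific.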

Protein homology detection by HMM-HMM comparison.

Johannes Söding (2005)

Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%. Sensitivity: When the predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile-profile comparison methods is attributable to the use of profile HMMs in place of simple profiles. Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7 and 3.3 times more good alignments ('balanced' score >0.3) than the next best method (COMPASS), and 1.6, 2.9 and 9.4 times more than PSI-BLAST, at the family, superfamily and fold level, respectively. Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 2GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than COMPASS.

Kalign – an accurate and fast multiple sequence alignment algorithm

Timo Lassmann, Erik Sonnhammer (2005)

Background: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. Results: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. Conclusion: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.

Author and article information

Journal

Journal ID (nlm-ta): Mol Syst Biol

Journal ID (iso-abbrev): Mol. Syst. Biol

Title: Molecular Systems Biology

Publisher: Nature Publishing Group

ISSN (Electronic): 1744-4292

Publication date Collection: 2011

Publication date (Electronic): 11 October 2011

Publication date PMC-release: 11 October 2011

Volume: 7

Page: 539

Affiliations

[1] School of Medicine and Medical Science, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland

[2] Computational and Systems Biology, Genome Institute of Singapore, Singapore

[3] Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

[4] Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA

[5] EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

[6] Gene Center Munich, University of Munich (LMU), Muenchen, Germany

[7] Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France

Author notes

[a] UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland. Tel.: +353 1 716 6833; Fax: +353 1 716 6713; des.higgins@ucd.ie

[*] These authors contributed equally to this work

Article

Publisher Item ID: msb201175

DOI: 10.1038/msb.2011.75

PMC ID: 3261699

PubMed ID: 21988835

SO-VID: 78412396-5d7b-4626-8f4b-3f2d636eaafe

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported License, which allows readers to alter, transform, or build upon the article and then distribute the resulting work under the same or similar license to this one. The work must be attributed back to the original author and commercial use is not permitted without specific permission.

History

Date received: 23 July 2011

Date accepted: 06 September 2011

