
      Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega


          Abstract

          Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods can handle larger data sets at the cost of quality, while others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and deliver accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
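          The progressive alignment heuristic mentioned above can be illustrated with a minimal textbook sketch. This is not Clustal Omega's implementation and does not reproduce its own guide-tree construction or profile HMM alignment; the k-mer distance, the UPGMA-style merging, the toy match/mismatch/gap scores (in place of a real substitution matrix) and all function names are illustrative assumptions. The sketch merges the two closest clusters at each step (which builds the guide tree) and aligns their profiles column by column with Needleman-Wunsch under sum-of-pairs scoring.

```python
from itertools import combinations

# Hypothetical toy scores: NOT Clustal Omega's parameters or a real substitution matrix.
MATCH, MISMATCH, GAP = 1, -1, -2


def kmer_distance(a, b, k=2):
    """Crude alignment-free distance: 1 - shared k-mers / k-mers of the smaller set."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def column_score(c1, c2):
    """Sum-of-pairs score between two alignment columns; gap-gap pairs score 0."""
    total = 0
    for x in c1:
        for y in c2:
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += GAP
            else:
                total += MATCH if x == y else MISMATCH
    return total


def align_profiles(p, q):
    """Needleman-Wunsch over columns: align profile p (list of rows) with profile q."""
    cp = [[row[i] for row in p] for i in range(len(p[0]))]
    cq = [[row[j] for row in q] for j in range(len(q[0]))]
    gp, gq = ["-"] * len(p), ["-"] * len(q)  # all-gap columns
    n, m = len(cp), len(cq)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + column_score(cp[i - 1], gq)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + column_score(gp, cq[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(cp[i - 1], cq[j - 1]),
                          F[i - 1][j] + column_score(cp[i - 1], gq),
                          F[i][j - 1] + column_score(gp, cq[j - 1]))
    # Traceback from the bottom-right cell, preferring diagonal moves on ties.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(cp[i - 1], cq[j - 1]):
            cols.append(cp[i - 1] + cq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + column_score(cp[i - 1], gq):
            cols.append(cp[i - 1] + gq)
            i -= 1
        else:
            cols.append(gp + cq[j - 1])
            j -= 1
    cols.reverse()
    return ["".join(col[r] for col in cols) for r in range(len(p) + len(q))]


def progressive_align(seqs):
    """Merge closest clusters first (UPGMA guide tree), aligning profiles at each merge."""
    if not seqs:
        return []
    clusters = {i: ([i], [s]) for i, s in enumerate(seqs)}  # id -> (member indices, rows)
    dist = {(i, j): kmer_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
    next_id = len(seqs)
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        ma, pa = clusters.pop(a)
        mb, pb = clusters.pop(b)
        # UPGMA update: size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            dist[(c, next_id)] = (len(ma) * d_ac + len(mb) * d_bc) / (len(ma) + len(mb))
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[next_id] = (ma + mb, align_profiles(pa, pb))
        next_id += 1
    members, rows = next(iter(clusters.values()))
    aligned = [""] * len(seqs)
    for idx, row in zip(members, rows):  # restore input order
        aligned[idx] = row
    return aligned


if __name__ == "__main__":
    for row in progressive_align(["HEAGAWGHEE", "PAWHEAE", "HEAGAWGHE"]):
        print(row)
```

          Because each merge aligns whole profiles, gaps placed early are never revisited, so the quality of the guide tree matters. Note also that the all-against-all distance step grows quadratically with the number of sequences, which is one reason naive progressive alignment becomes a bottleneck for data sets of many thousands of sequences.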


                Author and article information

                Journal: Molecular Systems Biology (Mol Syst Biol), Nature Publishing Group, ISSN 1744-4292
                Published: 11 October 2011; volume 7, article 539
                Affiliations
                [1] School of Medicine and Medical Science, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
                [2] Computational and Systems Biology, Genome Institute of Singapore, Singapore
                [3] Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
                [4] Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
                [5] EMBL Outstation—European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
                [6] Gene Center Munich, University of Munich (LMU), Muenchen, Germany
                [7] Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France
                Author notes
                [a] UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland. Tel.: +353 1 716 6833; Fax: +353 1 716 6713; des.higgins@ucd.ie
                [*] These authors contributed equally to this work

                Article
                Publisher ID: msb201175
                DOI: 10.1038/msb.2011.75
                PMCID: PMC3261699
                PMID: 21988835
                Copyright © 2011, EMBO and Macmillan Publishers Limited

                This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 Unported License, which allows readers to alter, transform, or build upon the article and then distribute the resulting work under the same or similar license to this one. The work must be attributed back to the original author and commercial use is not permitted without specific permission.

                History
                Received: 23 July 2011
                Accepted: 06 September 2011

                Categories: Report; Quantitative & Systems biology
                Keywords: hidden markov models, multiple sequence alignment, bioinformatics
