      Is Open Access

      MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

      research-article


          Abstract

          This article describes several features of the MAFFT online service for multiple sequence alignment (MSA). Recent advances in sequencing technologies have made huge numbers of biological sequences available, and the need for MSAs of large numbers of sequences is increasing. To extract biologically relevant information from such data, sophisticated algorithms are necessary but not sufficient: intuitive, interactive tools that let experimental biologists handle large data semiautomatically are also becoming important. We are developing MAFFT in both of these directions. Here, we explain (i) the Web interface to recently developed options for large data and (ii) interactive usage for refining sequence data sets and MSAs.

                Author and article information

                Journal
                Briefings in Bioinformatics (Brief Bioinform)
                Publisher: Oxford University Press
                ISSN: 1467-5463, 1477-4054
                Dates: July 2019; 06 September 2017
                Volume: 20
                Issue: 4
                Pages: 1160-1166
                Author notes
                Corresponding author: Kazutaka Katoh, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan. E-mail: katoh@ifrec.osaka-u.ac.jp
                Author information
                http://orcid.org/0000-0003-4133-8393
                Article
                DOI: 10.1093/bib/bbx108
                Other identifiers: bbx108, 6781576, 28968734, 5147d944-d5e9-4faf-a1ba-44e27231d52d
                © The Author 2017. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.

                History
                30 June 2017; 27 July 2017
                Page count
                Pages: 7
                Funding
                Funded by: Japan Society for the Promotion of Science 10.13039/501100001691
                Award ID: JP16K07464
                Funded by: Japan Agency for Medical Research and Development 10.13039/100009619
                Categories
                Paper

                Bioinformatics & Computational biology
                Keywords: multiple sequence alignment, sequence analysis, phylogenetic tree
