BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes

Abstract

Methods for evaluating the quality of genomic and metagenomic data are essential to aid genome assembly procedures and to correctly interpret the results of subsequent analyses. BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs. Here, we present new functionalities and major improvements to the BUSCO software, as well as the renewal and expansion of the underlying data sets in sync with the OrthoDB v10 release. Among the major novelties, BUSCO now enables phylogenetic placement of the input sequence to automatically select the most appropriate BUSCO data set for the assessment, allowing the analysis of metagenome-assembled genomes of unknown origin. A newly introduced genome workflow improves efficiency and reduces runtimes, especially on large eukaryotic genomes. BUSCO is the only tool capable of assessing both eukaryotic and prokaryotic species, and can be applied to various data types, from genome assemblies and metagenomic bins to transcriptomes and gene sets.
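To make the idea of completeness and redundancy scores concrete, the following minimal Python sketch turns marker-gene tallies into percentages. It is an illustration only and not BUSCO's actual code: the class `MarkerCounts`, the function `summarize`, and the example counts are hypothetical, and the four status categories follow the way BUSCO results are commonly reported rather than anything stated in the abstract above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MarkerCounts:
    """Hypothetical tallies of expected single-copy orthologs by status."""
    single_copy: int  # found exactly once
    duplicated: int   # found more than once (a signal of redundancy)
    fragmented: int   # only partially recovered
    missing: int      # not found at all

    @property
    def total(self) -> int:
        # Size of the marker set the input was scored against.
        return (self.single_copy + self.duplicated
                + self.fragmented + self.missing)


def summarize(counts: MarkerCounts) -> Dict[str, float]:
    """Return each category as a percentage of the marker set (1 decimal)."""
    if counts.total == 0:
        raise ValueError("marker set is empty; nothing to score")

    def pct(n: int) -> float:
        return round(100.0 * n / counts.total, 1)

    return {
        # Completeness counts every marker recovered in full,
        # whether it appears once or several times.
        "complete": pct(counts.single_copy + counts.duplicated),
        "single_copy": pct(counts.single_copy),
        "duplicated": pct(counts.duplicated),
        "fragmented": pct(counts.fragmented),
        "missing": pct(counts.missing),
    }


# Made-up counts for a marker set of 200 genes.
scores = summarize(MarkerCounts(single_copy=180, duplicated=10,
                                fragmented=6, missing=4))
print(scores)
```

Under these assumptions, a high "complete" value suggests that most expected genes were recovered, while a high "duplicated" value points to redundancy in the input, for example from uncollapsed haplotypes or a mixed metagenomic bin.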

Related collections

Microbial Genomics

Most cited references 23

BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Felipe A. Simão, Robert Waterhouse, Panagiotis Ioannidis … (2015)

Genomics has revolutionized biological research, but quality assessment of the resulting assembled sequences is complicated and remains mostly limited to technical measures like N50.

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Donovan Parks, Michael Imelfort, Connor Skennerton … (2015)

Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities.

Prodigal: prokaryotic gene recognition and translation initiation site identification

Doug Hyatt, Gwo-Liang Chen, Philip LoCascio … (2010)

Background: The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals. Results: With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives. Conclusion: We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.

Author and article information

Contributors

Joanna Kelley: Role: Associate Editor

Journal

Journal ID (nlm-ta): Mol Biol Evol

Journal ID (iso-abbrev): Mol Biol Evol

Journal ID (publisher-id): molbev

Title: Molecular Biology and Evolution

Publisher: Oxford University Press

ISSN (Print): 0737-4038

ISSN (Electronic): 1537-1719

Publication date Collection: October 2021

Publication date (Electronic): 28 July 2021

Publication date PMC-release: 28 July 2021

Volume: 38

Issue: 10

Pages: 4647-4654

Affiliations

[1] Department of Genetic Medicine and Development, University of Geneva, Geneva, Switzerland

[2] Swiss Institute of Bioinformatics, Geneva, Switzerland

Author notes

Mosè Manni, Matthew R Berkeley, and Mathieu Seppey contributed equally to this work.

Corresponding author: E-mail: evgeny.zdobnov@unige.ch.

Author information

Mosè Manni https://orcid.org/0000-0002-4146-6523

Article

Publisher ID: msab199

DOI: 10.1093/molbev/msab199

PMC ID: 8476166

PubMed ID: 34320186

SO-VID: a0e34a59-afba-4129-a753-6cc584d0c057

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Page count

Pages: 8

Funding

Funded by: Swiss National Science Foundation, DOI 10.13039/501100001711;

Award ID: 310030_189062
