      Is Open Access

      Long-reads are revolutionizing 20 years of insect genome sequencing


          Abstract

          The first insect genome (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a “state of the field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased towards four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 megabases in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long-reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: (1) seek better integration between independent research groups and consortia, (2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, (3) take advantage of long-read sequencing technologies, and (4) expand and improve gene annotations.

          Most cited references (26)

          Is Open Access

          Towards complete and error-free genome assemblies of all vertebrate species

          High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species (1–4). To address this issue, the international Genome 10K (G10K) consortium (5,6) has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences. The Vertebrate Genomes Project has used an optimized pipeline to generate high-quality genome assemblies for sixteen species (representing all major vertebrate classes), which have led to new biological insights.

            A new view of the tree of life.

            The tree of life is one of the most important organizing principles in biology(1). Gene surveys suggest the existence of an enormous number of branches(2), but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships(3-5) or on the known, well-classified diversity of life with an emphasis on eukaryotes(6). These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts(7,8). Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses.
              Is Open Access

              Opportunities and challenges in long-read sequencing data analysis

              Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

                Author and article information

                Journal
                Genome Biology and Evolution
                Oxford University Press (OUP)
                1759-6653
                June 21, 2021
                Affiliations
                [1] School of Biological Sciences, Washington State University, Pullman, WA, USA
                [2] Department of Biology, University of Rochester, Rochester, NY, USA
                [3] LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
                [4] Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
                [5] Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
                [6] Institute for Insect Biotechnology, Justus-Liebig-University, Giessen, Germany
                [7] Data Science Lab, Smithsonian Institution, Washington, DC, USA
                Article
                10.1093/gbe/evab138
                © 2021

                http://creativecommons.org/licenses/by/4.0/
