      Pangenome graph layout by Path-Guided Stochastic Gradient Descent

      Preprint
      research-article


          Abstract

          Motivation:

          The increasing availability of complete genomes demands models for studying genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. To understand them, we need to see them. Visualization requires a human-readable graph layout: an embedding of the graph in a low-dimensional (e.g. two-dimensional) space. Because pangenome graphs can be extremely large, this is a significant challenge.

          Results:

          In response, we introduce a novel graph layout algorithm: Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features.
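The idea of sampling node pairs along paths can be illustrated with a small sketch. This is not the ODGI implementation. It is a toy, single-threaded version under stated assumptions: each node is a single 2D point, pairs of steps are drawn uniformly from a path, the target distance is the difference of the steps' start offsets in bases, and the update rule and exponentially decaying learning rate follow the general scheme of graph drawing by SGD. The names `path_offsets`, `pg_sgd_layout`, and all parameters are hypothetical.

```python
import math
import random


def path_offsets(path, node_len):
    """Return the start offset (in bases) of every step along a path."""
    offsets, pos = [], 0
    for node in path:
        offsets.append(pos)
        pos += node_len[node]
    return offsets


def pg_sgd_layout(paths, node_len, iters=30, updates_per_step=10,
                  eps=0.01, seed=0):
    """Toy path-guided SGD layout (illustrative assumptions, see lead-in).

    paths:    list of paths, each a list of node ids
    node_len: dict mapping node id -> sequence length in bases (> 0)
    Returns a dict mapping node id -> [x, y].
    """
    rng = random.Random(seed)
    # Only paths with at least two steps can yield a pair of steps.
    paths = [p for p in paths if len(p) >= 2]
    if not paths:
        raise ValueError("need at least one path with two or more steps")
    offsets = [path_offsets(p, node_len) for p in paths]
    weights = [len(p) for p in paths]

    # One random initial point per node (a simplification).
    xy = {n: [rng.random(), rng.random()] for n in node_len}

    # Learning rate decays exponentially from eta_max to eta_min = eps.
    d_max = max(o[-1] for o in offsets)
    eta_max, eta_min = float(d_max) ** 2, eps
    lam = math.log(eta_max / eta_min) / max(iters - 1, 1)
    n_updates = updates_per_step * sum(weights)

    for t in range(iters):
        eta = eta_max * math.exp(-lam * t)
        for _ in range(n_updates):
            # Sample a path (longer paths more often), then two of its steps.
            k = rng.choices(range(len(paths)), weights=weights)[0]
            i, j = rng.sample(range(len(paths[k])), 2)
            a, b = paths[k][i], paths[k][j]
            if a == b:  # the path visits the same node twice; skip
                continue
            d = abs(offsets[k][i] - offsets[k][j])  # target: path distance
            dx, dy = xy[a][0] - xy[b][0], xy[a][1] - xy[b][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:  # coincident points: pick a random direction
                dx, dy = rng.random() + 1e-9, rng.random() + 1e-9
                dist = math.hypot(dx, dy)
            mu = min(eta / (d * d), 1.0)  # step size, capped at 1
            delta = mu * (dist - d) / 2.0 / dist
            # Move both points along their connecting line toward distance d.
            xy[a][0] -= delta * dx
            xy[a][1] -= delta * dy
            xy[b][0] += delta * dx
            xy[b][1] += delta * dy
    return xy
```

Because each update touches only one pair of steps drawn from a path, the work per iteration in this sketch scales with the total number of path steps rather than with the number of node pairs, which is the property the abstract highlights.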

          Availability:

          We integrated PG-SGD into ODGI, which is released as free software under the MIT open-source license. Source code is available at https://github.com/pangenome/odgi.


                Author and article information

                Journal
                bioRxiv
                Cold Spring Harbor Laboratory
                17 October 2023
                2023.09.22.558964
                Affiliations
                [1 ]Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
                [2 ]Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
                [3 ]Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
                [4 ]Genomics Research Centre, Human Technopole, Milan 20157, Italy
                [5 ]Department of Computer Engineering, School of Computation, Information and Technology (CIT), Technical University of Munich, Munich 80333, Germany
                [6 ]School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
                [7 ]Computomics GmbH, Eisenbahnstr. 1, 72072 Tübingen, Germany
                [8 ]M3 Research Center, University Hospital Tübingen, 72076 Tübingen, Germany
                Author notes
                [†] The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

                [* ]To whom correspondence should be addressed. Contact: egarris5@uthsc.edu
                Author information
                http://orcid.org/0000-0003-3326-817X
                http://orcid.org/0000-0001-9744-131X
                http://orcid.org/0000-0001-8566-4049
                http://orcid.org/0000-0002-6775-2843
                http://orcid.org/0000-0002-0778-0308
                http://orcid.org/0000-0002-7232-3103
                http://orcid.org/0000-0002-4375-0691
                http://orcid.org/0000-0002-8021-9162
                http://orcid.org/0000-0003-3821-631X
                Article
                10.1101/2023.09.22.558964
                10542513
                37790531
                fccda619-98ec-49d4-af6d-0f38e8315efb

                This work is licensed under a Creative Commons Attribution 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.

                History
                Funding
                Funded by: Central Innovation Programme (ZIM) for SMEs of the Federal Ministry for Economic Affairs and Energy of Germany
                Funded by: Germany’s Excellence Strategy (CMFI)
                Award ID: EXC-2124
                Award ID: EXC 2180-390900677
                Funded by: BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure
                Award ID: 031A532B
                Award ID: 031A533A
                Award ID: 031A533B
                Award ID: 031A534A
                Award ID: 031A535A
                Award ID: 031A537A
                Award ID: 031A537B
                Award ID: 031A537C
                Award ID: 031A537D
                Award ID: 031A538A
                Funded by: NSF PPoSS
                Award ID: #2118709
                Funded by: National Institutes of Health
                Award ID: U01DA047638
                Funded by: National Institutes of Health/NIGMS
                Award ID: R01GM123489
                Categories
                Article
