
      A deep population reference panel of tandem repeat variation

      research-article

          Abstract

          Tandem repeats (TRs) represent one of the largest sources of genetic variation in humans and are implicated in a range of phenotypes. Here we present a deep characterization of TR variation based on high coverage whole genome sequencing from 3550 diverse individuals from the 1000 Genomes Project and H3Africa cohorts. We develop a method, EnsembleTR, to integrate genotypes from four separate methods resulting in high-quality genotypes at more than 1.7 million TR loci. Our catalog reveals novel sequence features influencing TR heterozygosity, identifies population-specific trinucleotide expansions, and finds hundreds of novel eQTL signals. Finally, we generate a phased haplotype panel which can be used to impute most TRs from nearby single nucleotide polymorphisms (SNPs) with high accuracy. Overall, the TR genotypes and reference haplotype panel generated here will serve as valuable resources for future genome-wide and population-wide studies of TRs and their role in human phenotypes.

          Abstract

          Tandem repeats (TRs) comprise some of the most polymorphic regions of the human genome but are difficult to study. Here, the authors develop an ensemble-based genotyping method and characterize 1.7 million TRs across 3,550 humans from diverse populations.
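
The abstract refers to TR heterozygosity without defining it. One common definition is expected heterozygosity, one minus the sum of squared allele frequencies at a locus. The sketch below illustrates that definition only; it is not the authors' code, and the function name `tr_heterozygosity` and the input layout (pairs of repeat copy numbers per individual, `None` for a missing call) are assumptions made for this example.

```python
from collections import Counter


def tr_heterozygosity(genotypes):
    """Expected heterozygosity, 1 - sum(p_i^2), for a single TR locus.

    `genotypes` holds one (allele_a, allele_b) pair of repeat copy numbers
    per diploid individual; None marks a missing call (hypothetical layout).
    """
    # Flatten the diploid calls into a list of alleles, skipping missing calls.
    alleles = [a for gt in genotypes if gt is not None for a in gt]
    if not alleles:
        return float("nan")
    counts = Counter(alleles)
    total = len(alleles)
    # Sum of squared allele frequencies is the expected homozygosity.
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Toy locus: three called individuals and one missing call.
calls = [(10, 10), (10, 12), (11, 12), None]
print(round(tr_heterozygosity(calls), 3))
```

A locus where every called allele has the same length gives 0, and the value approaches 1 as more alleles occur at similar frequencies.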

                Author and article information

                Contributors
                mgymrek@ucsd.edu
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 23 October 2023
                Volume: 14
                Article number: 6711
                Affiliations
                [1] Department of Computer Science and Engineering, University of California San Diego (https://ror.org/0168r3w48), La Jolla, CA, USA
                [2] Department of Medicine, University of California San Diego (https://ror.org/0168r3w48), La Jolla, CA, USA
                [3] Department of Electrical and Computer Engineering, University of California San Diego (https://ror.org/0168r3w48), La Jolla, CA, USA
                [4] The African Center of Excellence in Bioinformatics and Data Intensive Sciences, the Infectious Diseases Institute, Makerere University (GRID grid.11194.3c, ISNI 0000 0004 0620 0548), Kampala, Uganda
                [5] Covenant University Bioinformatics Research (CUBRe), Covenant University (https://ror.org/00frr1n84), Ota, Ogun 112233, Nigeria
                [6] Department of Bioengineering, University of California San Diego (https://ror.org/0168r3w48), La Jolla, CA, USA
                [7] Illumina Incorporated, San Diego, CA 92122, USA
                [8] Department of Computer & Information Sciences, Covenant University (https://ror.org/00frr1n84), Ota, Ogun 112233, Nigeria
                [9] Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University (https://ror.org/00frr1n84), Ota, Ogun 112233, Nigeria
                [10] Department of Computer Science, Makerere University (https://ror.org/03dmz0111), Kampala, Uganda
                [11] Applied Bioinformatics Division, German Cancer Research Center (DKFZ) (https://ror.org/04cdgtt98), Heidelberg, Baden-Württemberg 69120, Germany
                Author information
                http://orcid.org/0009-0009-2443-7868
                http://orcid.org/0000-0003-4343-8689
                http://orcid.org/0000-0001-9937-1255
                http://orcid.org/0000-0001-6874-2543
                http://orcid.org/0000-0002-3820-4565
                http://orcid.org/0000-0002-3296-0677
                http://orcid.org/0000-0002-0539-9714
                http://orcid.org/0000-0001-6445-8721
                http://orcid.org/0000-0002-5335-8562
                http://orcid.org/0009-0008-0420-9502
                http://orcid.org/0000-0002-0001-0554
                http://orcid.org/0000-0002-5810-6241
                http://orcid.org/0000-0002-6086-3903
                Article
                42278
                DOI: 10.1038/s41467-023-42278-3
                PMCID: PMC10593948
                PMID: 37872149
                193be097-9dda-4e7e-b2af-05a21568a997
                © Springer Nature Limited 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 10 March 2023
                Accepted: 5 October 2023
                Funding
                Funded by: FundRef https://doi.org/10.13039/100000051, U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI);
                Award ID: R01HG010885
                Award ID: 1RM1HG011558
                Award ID: 1R01HG010149
                Award ID: 5U24HG006941
                Award ID: 5U2RTW010679
                Award ID: 1U2RTW010672-01
                Award ID: 1U2CEB032224-01
                Award ID: HG010149
                Funded by: FundRef https://doi.org/10.13039/100000057, U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS);
                Award ID: GM114362
                Categories
                Article
                Uncategorized
                data processing, gene expression, genome-wide association studies