
      Impact of genome build on RNA-seq interpretation and diagnostics

      Preprint
      research-article


          Summary

          Transcriptomics is a powerful tool for unraveling the molecular effects of genetic variants and for disease diagnosis. Prior studies have demonstrated that the choice of genome build impacts variant interpretation and diagnostic yield in genomic analyses. To identify the extent to which genome build also impacts transcriptomic analyses, we studied the effect of the hg19, hg38, and CHM13 genome builds on expression quantification and outlier detection in 386 rare disease and familial control samples from both the Undiagnosed Diseases Network (UDN) and the Genomics Research to Elucidate the Genetics of Rare Disease (GREGoR) Consortium. We identified 2,800 genes with build-dependent quantification across six routinely collected biospecimens, including 1,391 protein-coding genes and 341 known rare disease genes. We further observed multiple genes that have detectable expression in only a subset of genome builds. Finally, we characterized how genome build impacts the detection of outlier transcriptomic events. Combined, we provide a database of genes impacted by build choice, and recommend that transcriptomics-guided analyses and diagnoses be cross-referenced with these data for robustness.


                Author and article information

                Journal
                medRxiv
                Cold Spring Harbor Laboratory
                12 January 2024: 2024.01.11.24301165
                Affiliations
                [1] Department of Genetics, School of Medicine, Stanford University
                [2] Department of Pathology, School of Medicine, Stanford University
                [3] PEGASE, INRAE, Institut Agro
                [4] Department of Pediatrics, School of Medicine, Stanford University
                [5] Stanford Center for Undiagnosed Diseases, Stanford University
                [6] Department of Cardiovascular Medicine, School of Medicine, Stanford University
                [7] Department of Biomedical Data Science, Stanford University
                Author notes
                [*] Undiagnosed Diseases Network Representatives: Jonathan A. Bernstein (jon.berstein@stanford.edu) and Matthew T. Wheeler (wheelerm@stanford.edu)

                Author Contributions

                R.A.U., P.C.G., and S.B.M. conceived the study. R.A.U., P.C.G., T.D.J., F.D., and S.B.M. contributed significantly to study design, with feedback from D.E.B., J.A.B., and M.T.W. improving the analyses and focus throughout. Pipelines were developed by R.A.U. and P.C.G. with contributions from T.D.J. Analyses and their figures were generated by R.A.U., P.C.G., T.D.J., and F.D. Patients were seen by J.A.B., M.T.J., and D.E.B., and samples were processed by K.S.S. and C.A.J. The manuscript was primarily written and figures generated by R.A.U. and P.C.G., with major feedback also provided by S.B.M. All authors provided feedback to improve the manuscript.

                [+] Corresponding Author: smontgom@stanford.edu
                Author information
                http://orcid.org/0000-0002-2214-959X
                http://orcid.org/0000-0001-8187-5316
                http://orcid.org/0000-0002-1873-8607
                http://orcid.org/0000-0001-8252-6425
                http://orcid.org/0000-0003-2432-1928
                http://orcid.org/0000-0001-7340-5023
                http://orcid.org/0000-0002-8771-0886
                http://orcid.org/0000-0001-5369-346X
                http://orcid.org/0000-0001-8721-3022
                http://orcid.org/0000-0002-5200-3903
                Article
                10.1101/2024.01.11.24301165
                10802764
                38260490
                1264c73a-a21d-4b2f-bb5b-76ccfbab91bb

                This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.

