      Is Open Access

      Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels

      research-article


          Abstract

          Motivation: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo, taking into account possible alternative isoforms and the dynamic range of expression values.

          Results: We present a software package named Oases, designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in the presence of alternative isoforms. It achieves this by using an array of hash lengths, dynamic filtering of noise, robust resolution of alternative splicing events and efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and is shown to improve significantly on the transABySS and Trinity de novo transcriptome assemblers.
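
          As a rough illustration of two ideas named in the Results paragraph (an array of hash lengths, and noise filtering that adapts to the local expression level instead of using one global cutoff), the toy sketch below counts k-mers for several values of k and drops a k-mer only when it is poorly supported relative to its competitors. This is not the Oases implementation: the function names, the sibling-prefix rule, the 10% ratio and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def dynamic_filter(counts, min_ratio=0.1):
    """Keep a k-mer only if its count is at least min_ratio times the count
    of the best-supported k-mer sharing its (k-1)-base prefix.

    Hypothetical rule: the cutoff scales with local coverage, so a rare
    branch next to a highly covered path is dropped, while a uniformly
    low-coverage transcript survives.
    """
    best = defaultdict(int)
    for kmer, n in counts.items():
        prefix = kmer[:-1]
        best[prefix] = max(best[prefix], n)
    return {kmer: n for kmer, n in counts.items()
            if n >= min_ratio * best[kmer[:-1]]}


def multi_k_kmers(reads, hash_lengths, min_ratio=0.1):
    """Run the count-and-filter step once per hash length."""
    return {k: dynamic_filter(count_kmers(reads, k), min_ratio)
            for k in hash_lengths}


if __name__ == "__main__":
    # Toy data: one highly expressed transcript, one read with a
    # substitution error, and one weakly expressed transcript.
    reads = ["ACGTACGGA"] * 20 + ["ACGTTCGGA"] + ["TTGCAAT"] * 2
    for k, kmers in multi_k_kmers(reads, [4, 5]).items():
        print(k, sorted(kmers))
```

          A single fixed coverage cutoff high enough to remove the erroneous branch in this toy data would also discard the weakly expressed transcript; the relative rule keeps it. The merging of the per-k results into one assembly is not shown.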

          Availability and implementation: Oases is freely available under the GPL license at www.ebi.ac.uk/~zerbino/oases/

          Contact: dzerbino@ucsc.edu

          Supplementary information: Supplementary data are available at Bioinformatics online.


          Most cited references (18)


          Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

          RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.

            De novo assembly and analysis of RNA-seq data.

            We describe Trans-ABySS, a de novo short-read transcriptome assembly and analysis pipeline that addresses variation in local read densities by assembling read substrings with varying stringencies and then merging the resulting contigs before analysis. Analyzing 7.4 gigabases of 50-base-pair paired-end Illumina reads from an adult mouse liver poly(A) RNA library, we identified known, new and alternative structures in expressed transcripts, and achieved high sensitivity and specificity relative to reference-based assembly methods.

              Comprehensive comparative analysis of strand-specific RNA sequencing methods

              Strand-specific, massively-parallel cDNA sequencing (RNA-Seq) is a powerful tool for novel transcript discovery, genome annotation, and expression profiling. Despite multiple published methods for strand-specific RNA-Seq, no consensus exists as to how to choose between them. Here, we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-Seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library construction protocols, including both published and our own novel methods. We found marked differences in strand-specificity, library complexity, evenness and continuity of coverage, agreement with known annotations, and accuracy for expression profiling. Weighing each method’s performance and ease, we identify the dUTP second strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.

                Author and article information

                Journal
                Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803, 1367-4811
                Issue date: 15 April 2012
                Published online: 24 February 2012
                Volume: 28
                Issue: 8
                Pages: 1086-1092
                Affiliations
                1 Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, D-14195 Berlin, Germany
                2 Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
                3 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
                4 Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
                Author notes
                * To whom correspondence should be addressed.

                Associate Editor: Ivo Hofacker

                Article
                Publisher ID: bts094
                DOI: 10.1093/bioinformatics/bts094
                PMCID: PMC3324515
                PMID: 22368243
                3c3b82c6-a990-4143-801c-02213affece2
                © The Author(s) 2012. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 2 December 2011
                Revised: 20 January 2012
                Accepted: 17 February 2012
                Page count
                Pages: 7
                Categories
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology
