Optimization of miRNA-seq data preprocessing

Abstract

The past two decades of microRNA (miRNA) research have solidified the role of these small non-coding RNAs as key regulators of many biological processes and promising biomarkers for disease. Concurrent developments in high-throughput profiling technology have further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Although often overlooked, data preprocessing is an essential step in data analysis: unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign) and normalization methods (counts-per-million, total count scaling, upper quartile scaling, trimmed mean of M-values (TMM), DESeq, linear regression, cyclic loess and quantile) on the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
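To make two of the listed normalization methods concrete, the following is a minimal Python sketch of counts-per-million and upper quartile scaling applied to toy miRNA count vectors. It is an illustration only, not the authors' pipeline: the function names, the toy counts, and the choice to take the upper quartile over each library's non-zero counts with linear interpolation are all assumptions made for this example.

```python
def counts_per_million(counts):
    """Scale one library's raw counts so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no counts")
    return [c * 1_000_000 / total for c in counts]


def _percentile(sorted_vals, q):
    """Percentile of pre-sorted values (0 <= q <= 1), linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def upper_quartile_scale(counts):
    """Divide counts by the 75th percentile of the library's non-zero counts.

    Assumption: zeros are dropped per library before taking the quartile;
    published implementations differ in how they filter features.
    """
    nonzero = sorted(c for c in counts if c > 0)
    if not nonzero:
        raise ValueError("library has no non-zero counts")
    uq = _percentile(nonzero, 0.75)
    return [c / uq for c in counts]


if __name__ == "__main__":
    # Hypothetical libraries: library_b is library_a sequenced twice as deep.
    library_a = [10, 0, 40, 50]
    library_b = [20, 0, 80, 100]
    print(counts_per_million(library_a))
    print(counts_per_million(library_b))
    print(upper_quartile_scale(library_a))
    print(upper_quartile_scale(library_b))
```

In this toy case both methods remove the twofold depth difference, so the two libraries yield the same normalized values. The methods diverge on real data, where a few highly abundant miRNAs can dominate the total count while leaving the upper quartile largely unaffected.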

Author and article information

Journal

Journal ID (nlm-ta): Brief Bioinform

Journal ID (iso-abbrev): Brief. Bioinformatics

Journal ID (publisher-id): bib

Journal ID (hwp): bib

Title: Briefings in Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1467-5463

ISSN (Electronic): 1477-4054

Publication date (Print): November 2015

Publication date (Electronic): 17 April 2015

Publication date PMC-release: 17 April 2015

Volume: 16

Issue: 6

Pages: 950-963

Author notes

Corresponding author. John McPherson, Ontario Institute for Cancer Research, MaRS Centre, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3; Fax: 1-416-977-1118. E-mail: John.McPherson@oicr.on.ca

Article

Publisher ID: bbv019

DOI: 10.1093/bib/bbv019

PMC ID: 4652620

PubMed ID: 25888698

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 18 January 2015

Date revision received: 24 February 2015

Page count

Pages: 14
