Pipeliner: A Nextflow-Based Framework for the Definition of Sequencing Data Processing Pipelines


Abstract

The advent of high-throughput sequencing technologies has led to the need for flexible and user-friendly data preprocessing platforms. The Pipeliner framework provides an out-of-the-box solution for processing various types of sequencing data. It combines the Nextflow scripting language and Anaconda package manager to generate modular computational workflows. We have used Pipeliner to create several pipelines for sequencing data processing, including bulk RNA-sequencing (RNA-seq), single-cell RNA-seq, and digital gene expression data. This report highlights the design methodology behind Pipeliner that enables the development of highly flexible and reproducible pipelines that are easy to extend and maintain on multiple computing environments. We also provide a quick start user guide demonstrating how to set up and execute available pipelines with toy datasets.


Author and article information

Contributors

Anthony Federico: URI : https://loop.frontiersin.org/people/627214

Tanya Karagiannis: URI : https://loop.frontiersin.org/people/650365

Kritika Karri: URI : https://loop.frontiersin.org/people/759761/overview

Dileep Kishore: URI : https://loop.frontiersin.org/people/714243

Yusuke Koga: URI : https://loop.frontiersin.org/people/647948

Joshua D. Campbell: URI : https://loop.frontiersin.org/people/656795

Stefano Monti: URI : https://loop.frontiersin.org/people/61455

Journal

Journal ID (nlm-ta): Front Genet

Journal ID (iso-abbrev): Front Genet

Journal ID (publisher-id): Front. Genet.

Title: Frontiers in Genetics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 1664-8021

Publication date (Electronic): 28 June 2019

Publication date Collection: 2019

Volume: 10

Electronic Location Identifier: 614

Affiliations

[1] Bioinformatics Program, Boston University, Boston, MA, United States

[2] Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States

Author notes

Edited by: Vinicius Maracaja-Coutinho, Universidad de Chile, Chile

Reviewed by: Pao-Yang Chen, Academia Sinica, Taiwan; Ernesto Picardi, University of Bari Aldo Moro, Italy

*Correspondence: Anthony Federico, anfed@bu.edu; Stefano Monti, smonti@BU.EDU

This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics

Article

DOI: 10.3389/fgene.2019.00614

PMC ID: 6609566

PubMed ID: 31316552

SO-VID: dbc75273-c359-483d-825a-c9cf145eaed4

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received : 21 November 2018

Date accepted : 13 June 2019

Page count

Figures: 6, Tables: 1, Equations: 0, References: 16, Pages: 7, Words: 2755
