Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

Abstract

Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

Related collections

Genome Integrity

Most cited references 14

The Bioperl toolkit: Perl modules for the life sciences.

Jason E Stajich, David Block, Kris Boulez … (2002)

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

Cited 715 times

ChIP-seq accurately predicts tissue-specific activity of enhancers.

Axel Visel, Matthew J. Blow, Zirong Li … (2009)

A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover because they are scattered among the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here we present the results of chromatin immunoprecipitation with the enhancer-associated protein p300 followed by massively parallel sequencing, and map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases demonstrated reproducible enhancer activity in the tissues that were predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities, and suggest that such data sets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.

Cited 640 times

Computation for ChIP-seq and RNA-seq studies.

Shirley Pepke, Barbara J Wold, Ali Mortazavi (2009)

Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements comes at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools are emerging to assist in the analysis phase of these projects. Here we describe the multilayered analyses of ChIP-seq and RNA-seq datasets, discuss the software packages currently available to perform tasks at each layer and describe some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery and expression quantification.

Cited 212 times

Author and article information

Journal

Journal ID (nlm-ta): Genome Biol

Title: Genome Biology

Publisher: BioMed Central

ISSN (Print): 1465-6906

ISSN (Electronic): 1465-6914

Publication date (Print): 2010

Publication date (Electronic): 25 August 2010

Volume: 11

Issue: 8

Page: R86

Affiliations

[1] Department of Biology and Department of Mathematics and Computer Science, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA

[2] Center for Comparative Genomics and Bioinformatics, Penn State University, 505 Wartik Lab, University Park, PA 16802, USA

Article

Publisher ID: gb-2010-11-8-r86

DOI: 10.1186/gb-2010-11-8-r86

PMC ID: 2945788

PubMed ID: 20738864

SO-VID: 0f77a98d-1ebe-45bc-9158-71a83d3aab97

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 2 June 2010

Date revision received: 30 July 2010

Date accepted: 25 August 2010

Cited by 1,322
