
      integrOmics: an R package to unravel relationships between two omics datasets

      Bioinformatics
      Oxford University Press


          Abstract

          Motivation: With the availability of many ‘omics’ data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, the development of such an analysis is a major computational and technical challenge as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated.

          Results: integrOmics efficiently performs integrative analyses of two types of ‘omics’ variables that are measured on the same samples. It includes a regularized version of canonical correlation analysis to highlight correlations between two datasets, and a sparse version of partial least squares (PLS) regression that includes simultaneous variable selection in both datasets. Both approaches have previously been shown to be useful and have been applied successfully in various integrative studies.

          Availability: integrOmics is freely available from http://CRAN.R-project.org/ or from the companion web site (http://math.univ-toulouse.fr/biostat), which provides full documentation and tutorials.

          Contact: k.lecao@uq.edu.au

          Supplementary information: Supplementary data are available at Bioinformatics online.


          Most cited references (12)


          A sparse PLS for variable selection when integrating omics data.

          Recent biotechnology advances allow multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets, to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for a predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for high-dimensional data sets.
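The Lasso penalization of the loading vectors described above can be illustrated with a small sketch. It is a simplified variant written for illustration only, not the integrOmics implementation: it assumes a single component, an alternating power iteration on the cross-product matrix M = X'Y in place of a full Singular Value Decomposition, and penalties applied to unscaled values. The names `soft_threshold`, `sparse_loadings`, `lam_x` and `lam_y`, and the toy data, are hypothetical.

```python
import math

def soft_threshold(vec, lam):
    """Lasso soft-thresholding: shrink each entry toward zero by lam."""
    return [(abs(x) - lam) * (1.0 if x > 0 else -1.0) if abs(x) > lam else 0.0
            for x in vec]

def normalize(vec):
    """Scale a vector to unit Euclidean norm (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def center_columns(A):
    """Subtract the column mean from every column of a row-major matrix."""
    n = len(A)
    means = [sum(row[j] for row in A) / n for j in range(len(A[0]))]
    return [[x - m for x, m in zip(row, means)] for row in A]

def sparse_loadings(X, Y, lam_x, lam_y, n_iter=100, tol=1e-10):
    """First pair of sparse loading vectors (u for X, v for Y).

    Assumption: alternating soft-thresholded power iteration on M = X'Y.
    """
    X, Y = center_columns(X), center_columns(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    M = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    # Start from the row of M with the largest norm.
    v = normalize(max(M, key=lambda row: sum(x * x for x in row)))
    u = [0.0] * p
    for _ in range(n_iter):
        u_new = normalize(soft_threshold(
            [sum(M[j][k] * v[k] for k in range(q)) for j in range(p)], lam_x))
        v_new = normalize(soft_threshold(
            [sum(M[j][k] * u_new[j] for j in range(p)) for k in range(q)], lam_y))
        change = max(abs(a - b) for a, b in zip(u + v, u_new + v_new))
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v

# Toy data: six samples; only the first column of X and of Y share a signal.
X = [[-2.5, 0.1, 0.1], [-1.5, -0.1, 0.1], [-0.5, 0.1, -0.2],
     [0.5, -0.1, -0.2], [1.5, 0.1, 0.1], [2.5, -0.1, 0.1]]
Y = [[-5.0, 0.1], [-3.0, -0.1], [-1.0, -0.1],
     [1.0, 0.1], [3.0, 0.1], [5.0, -0.1]]

u, v = sparse_loadings(X, Y, lam_x=1.0, lam_y=1.0)
print("X loadings:", u)
print("Y loadings:", v)
```

With these penalties, the loadings of the noise columns are set exactly to zero, so variable selection happens in both data sets in the same step that estimates the component.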

            Sparse canonical methods for biological data integration: application to a cross-platform study

            Background: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. It is however an important and fundamental issue that will be widely encountered in post genomic studies, when simultaneously analyzing transcriptomics, proteomics and metabolomics data using different platforms, so as to understand the mutual interactions between the different data sets. In this high dimensional setting, variable selection is crucial to give interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed either for a regression or a canonical correlation framework and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. Results: We compare the results obtained with two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation absolutely necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results. Conclusion: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches were found to bring similar results, although they highlighted the same phenomena with a different priority. They outperformed CIA, which tended to select redundant information.

              Estimation of principal components and related models by iterative least squares.


                Author and article information

                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803, 1367-4811
                Dates: 1 November 2009; 25 August 2009
                Volume: 25
                Issue: 21
                Pages: 2855-2856
                Affiliations
                1 Institute for Molecular Biosciences and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane QLD 4072, Australia, 2 Plateforme Biopuces, Genopôle Toulouse Midi-Pyrénées, Institut National des Sciences Appliquées, F-31077 and 3 Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse et CNRS, F-31062, France
                Author notes
                * To whom correspondence should be addressed.

                Associate Editor: Martin Bishop

                Article
                Article ID: btp515
                DOI: 10.1093/bioinformatics/btp515
                PMC ID: 2781751
                PubMed ID: 19706745
                ScienceOpen record ID: 7941ef53-fe07-4378-b9be-32e806be1b65
                © The Author(s) 2009. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 8 July 2009
                Revised: 4 August 2009
                Accepted: 7 August 2009
                Categories
                Applications Note
                Systems Biology

                Bioinformatics & Computational biology
