      Interpretable meta-learning of multi-omics data for survival analysis and pathway enrichment

      research-article
      Bioinformatics
      Oxford University Press

          Abstract

          Motivation

          Despite the success of recent machine learning algorithms in survival analysis, their black-box nature hinders interpretability, which is arguably the most important aspect of such analyses. Similarly, multi-omics data integration for survival analysis is often constrained by underlying relationships and correlations that are rarely well understood. The goal of this work is to alleviate the interpretability problem in machine learning approaches to survival analysis and to demonstrate how multi-omics data integration improves survival analysis and pathway enrichment. We use meta-learning, a machine learning approach that trains on a variety of related datasets and allows quick adaptation to new tasks, to perform survival analysis and pathway enrichment on pan-cancer datasets. In recent machine learning research, meta-learning has been used effectively for knowledge transfer among multiple related datasets.

          Results

          We use meta-learning with a Cox hazard loss to show that integrating TCGA pan-cancer data improves the performance of survival analysis. We also apply DeepLIFT (Deep Learning Important FeaTures), an advanced model interpretability method, to show different sets of enriched pathways for multi-omics and transcriptomics data. Our results show that multi-omics cancer survival analysis performs better than analysis using transcriptomics or clinical data alone. Additionally, we show a correlation between DeepLIFT variable importance assignments and gene coenrichment, suggesting that genes with higher and similar contribution scores are more likely to be enriched together in the same enrichment sets.
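          For readers unfamiliar with the Cox hazard loss mentioned above, the following is a minimal sketch of the standard negative Cox partial log-likelihood, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `cox_neg_partial_log_likelihood`, the Breslow-style handling of tied event times, and the averaging over observed events are all choices made here for clarity. In a neural survival model, `risk_scores` would be the network's scalar output per patient.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Average negative Cox partial log-likelihood.

    risk_scores: predicted log-risk per patient (higher = higher hazard)
    times:       observed follow-up time per patient
    events:      1 if the event (e.g. death) was observed, 0 if censored

    Hypothetical helper for illustration; ties are handled Breslow-style.
    """
    if not (len(risk_scores) == len(times) == len(events)):
        raise ValueError("inputs must have equal length")

    n_events = sum(events)
    if n_events == 0:
        return 0.0  # no observed events, so the partial likelihood is empty

    total = 0.0
    for r_i, t_i, e_i in zip(risk_scores, times, events):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: every patient still under observation at time t_i.
        risk_set = [r_j for r_j, t_j in zip(risk_scores, times) if t_j >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += r_i - log_sum
    return -total / n_events


if __name__ == "__main__":
    # Toy data: three patients, the middle one censored.
    print(cox_neg_partial_log_likelihood([1.2, 0.3, -0.5], [2.0, 5.0, 7.0], [1, 0, 1]))
```

          The loss is lower when patients who experience the event earlier receive higher risk scores, which is the ordering a survival model is trained to produce.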

          Availability and implementation

          https://github.com/berkuva/TCGA-omics-integration.

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                ISSN (print): 1367-4803
                ISSN (online): 1367-4811
                April 2023
                02 March 2023
                02 March 2023
                Volume: 39
                Issue: 4
                Article number: btad113
                Affiliations
                Department of Computer Science, University of Virginia, United States
                Department of Computer Science, University of Virginia, United States
                Department of Biochemistry and Molecular Genetics, University of Virginia, United States
                Center for Public Health Genomics, University of Virginia, United States
                Department of Public Health Sciences, University of Virginia, United States
                Department of Computer Science, University of Virginia, United States
                Author notes
                Corresponding author. Department of Computer Science, University of Virginia, United States. E-mail: hc2kc@virginia.edu
                Author information
                https://orcid.org/0000-0002-9537-1781
                https://orcid.org/0000-0003-4812-3627
                Article
                btad113
                DOI: 10.1093/bioinformatics/btad113
                PMCID: PMC10079355
                PMID: 36864611
                a4b8fdc9-855e-4fe0-ae84-46f778b42bae
                © The Author(s) 2023. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 09 August 2022
                : 07 February 2023
                : 27 February 2023
                : 01 March 2023
                : 06 April 2023
                Page count
                Pages: 9
                Categories
                Original Paper
                Data and Text Mining
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology