      A multifaceted benchmarking of synthetic electronic health record generation models

      Research article

          Abstract

          Synthetic health data have the potential to mitigate privacy concerns in supporting biomedical research and healthcare applications. Modern approaches for data generation continue to evolve and demonstrate remarkable potential. Yet there is no systematic assessment framework for benchmarking methods as they emerge and determining which methods are most appropriate for which use cases. In this work, we introduce a systematic benchmarking framework to appraise key characteristics with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health record data from two large academic medical centers with respect to several use cases. The results illustrate a utility-privacy tradeoff for sharing synthetic health data and further indicate that no method is unequivocally the best on all criteria in each use case, which shows why synthetic data generation methods need to be assessed in context.

          Summary

          Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. In this work, the authors introduce a use-case-oriented benchmarking framework to evaluate data synthesis models through a set of utility and privacy metrics.
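          The abstract does not define the individual metrics. As a rough illustration only, the following sketch pairs one generic utility measure (the gap between real and synthetic per-feature prevalence) with one generic privacy proxy (the distance from each synthetic record to its closest real record). It assumes each patient is encoded as a binary vector of clinical concepts; the encoding and all function names are hypothetical and are not taken from the paper's framework.

```python
from typing import List, Sequence

# Hypothetical encoding: each patient is a binary vector (1 = clinical concept recorded).
Record = Sequence[int]


def prevalence(records: List[Record]) -> List[float]:
    """Fraction of records in which each feature is present."""
    n = len(records)
    return [sum(r[j] for r in records) / n for j in range(len(records[0]))]


def dimension_wise_gap(real: List[Record], synthetic: List[Record]) -> float:
    """Utility proxy: mean absolute per-feature prevalence difference (lower is closer)."""
    p_real, p_syn = prevalence(real), prevalence(synthetic)
    return sum(abs(a - b) for a, b in zip(p_real, p_syn)) / len(p_real)


def hamming(a: Record, b: Record) -> int:
    """Number of features on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))


def closest_record_distances(real: List[Record], synthetic: List[Record]) -> List[int]:
    """Privacy proxy: Hamming distance from each synthetic record to its nearest real record."""
    return [min(hamming(s, r) for r in real) for s in synthetic]


def exact_copy_rate(real: List[Record], synthetic: List[Record]) -> float:
    """Share of synthetic records that exactly reproduce some real record."""
    dists = closest_record_distances(real, synthetic)
    return sum(d == 0 for d in dists) / len(dists)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    real = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
    synthetic = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
    print(f"utility gap: {dimension_wise_gap(real, synthetic):.3f}")  # prints 0.167
    print(f"exact copies: {exact_copy_rate(real, synthetic):.2f}")  # prints 0.50
```

          A small prevalence gap suggests the synthetic data preserve the real marginals, while a high share of synthetic records at distance zero from real patients signals disclosure risk. A generator that copies training records scores well on the first measure and badly on the second, which is one simple way to picture the utility-privacy tradeoff described above.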


                Author and article information

                Contributors
                sdmooney@uw.edu
                b.malin@vumc.org
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 9 December 2022
                Volume 13, Article number 7609 (2022)
                Affiliations
                [1] Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA (GRID grid.412807.8, ISNI 0000 0004 1936 9916)
                [2] Sage Bionetworks, Seattle, WA, USA (GRID grid.430406.5, ISNI 0000 0004 6023 5303)
                [3] Department of Computer Science, Vanderbilt University, Nashville, TN, USA (GRID grid.152326.1, ISNI 0000 0001 2264 7217)
                [4] Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA (GRID grid.34477.33, ISNI 0000 0001 2298 6657)
                [5] Tempus Labs, Chicago, IL, USA (GRID grid.511425.6, ISNI 0000 0004 9346 3636)
                [6] Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA (GRID grid.412807.8, ISNI 0000 0004 1936 9916)
                Author information
                http://orcid.org/0000-0002-6719-1388
                http://orcid.org/0000-0002-6544-1478
                http://orcid.org/0000-0003-3752-5778
                http://orcid.org/0000-0002-3184-0147
                http://orcid.org/0000-0002-4719-9120
                http://orcid.org/0000-0003-3040-5175
                Article identifiers
                DOI: 10.1038/s41467-022-35295-1
                PMCID: PMC9734113
                PMID: 36494374
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 5 August 2022
                Accepted: 28 November 2022
                Funding
                Funded by: U.S. Department of Health & Human Services | National Institutes of Health (NIH) (FundRef https://doi.org/10.13039/100000002)
                Award ID: UL1TR002243
                Subjects: translational research, computer science
