The Use of Synthetic Electronic Health Record Data and Deep Learning to Improve Timing of High-Risk Heart Failure Surgical Intervention by Predicting Proximity to Catastrophic Decompensation

Abstract

Objective: Although many clinical metrics are associated with proximity to decompensation in heart failure (HF), none is individually accurate enough to risk-stratify HF patients on a patient-by-patient basis. The dire consequences of this inaccuracy in risk stratification have profoundly lowered the clinical threshold for application of high-risk surgical intervention, such as ventricular assist device placement. Machine learning can detect non-intuitive patterns across patient features, allowing their predictive capability to be combined in new ways. A machine learning-based clinical tool that identifies proximity to catastrophic HF deterioration on a patient-specific basis would direct high-risk surgical intervention more efficiently to those patients who have the most to gain from it, while sparing others. Synthetic electronic health record (EHR) data are statistically indistinguishable from the original protected health information and can be analyzed as if they were original data, but without any privacy concerns. We demonstrate that synthetic EHR data can be easily accessed and analyzed and are amenable to machine learning analyses.

Methods: We developed synthetic data from EHR data of 26,575 HF patients admitted to a single institution during the decade ending on December 31, 2018. Twenty-seven clinically relevant features were synthesized and used in supervised deep learning and machine learning algorithms (i.e., deep neural networks [DNN], random forest [RF], and logistic regression [LR]) to explore their ability to predict 1-year mortality, evaluated by five-fold cross-validation. We conducted analyses using features recorded at or before, and at or after, the time of HF diagnosis.
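As an illustration of the five-fold cross-validation mentioned in the Methods, the following is a minimal sketch of how patients might be partitioned into folds. It is not the authors' code. Stratifying by outcome, the helper name `stratified_kfold`, the seed, and the toy labels are all assumptions made for this example; the abstract states only that five-fold cross-validation was used.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class balance in each fold.

    Hypothetical helper for illustration; stratification is an assumption.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Deal the shuffled indices of each class round-robin so every fold
    # receives a similar share of each outcome.
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    # Each fold serves once as the held-out test set.
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy labels (invented): 1 = died within 1 year, 0 = survived.
labels = [1] * 10 + [0] * 40
splits = list(stratified_kfold(labels))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

Each model would be fitted on the training indices of a split and scored on the held-out indices, so every patient is used for testing exactly once.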

Results: The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the three models: the mean AUC was 0.80 for DNN, 0.72 for RF, and 0.74 for LR. Age, creatinine, body mass index, and blood pressure levels were especially important features in predicting death within 1 year among HF patients.
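To make the reported metric concrete, here is a minimal sketch of how an AUC can be computed for each fold and then averaged. It is not the authors' implementation. It uses the rank interpretation of AUC (the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor), and the function names and toy scores are invented for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUCs, as in a cross-validated summary."""
    aucs = [roc_auc(y, s) for y, s in folds]
    return sum(aucs) / len(aucs)


# Toy held-out folds (invented): (true 1-year mortality, predicted risk).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 0, 1, 0], [0.8, 0.7, 0.3, 0.1]),
]
print(mean_auc(folds))
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; a sort-based rank computation would be the usual choice at the scale of the cohort described here.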

Conclusions: Machine learning models have considerable potential to improve accuracy in mortality prediction, such that high-risk surgical intervention can be applied only in those patients who stand to benefit from it. Access to EHR-based synthetic data derivatives eliminates risk of exposure of EHR data, speeds time-to-insight, and facilitates data sharing. As more clinical, imaging, and contractile features with proven predictive capability are added to these models, the development of a clinical tool to assist in timing of intervention in surgical candidates may be possible.

Author and article information

Contributors

Aixia Guo: URI : http://loop.frontiersin.org/people/1016318/overview

Journal

Journal ID (nlm-ta): Front Digit Health

Journal ID (iso-abbrev): Front Digit Health

Journal ID (publisher-id): Front. Digit. Health

Title: Frontiers in Digital Health

Publisher: Frontiers Media S.A.

ISSN (Electronic): 2673-253X

Publication date (Electronic): 07 December 2020

Publication date Collection: 2020

Volume: 2

Electronic Location Identifier: 576945

Affiliations

[1] Institute for Informatics (I2), Washington University School of Medicine, St. Louis, MO, United States

[2] Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, United States

[3] Department of Surgery, Washington University School of Medicine, St. Louis, MO, United States

Author notes

Edited by: Juan Liu, Huazhong University of Science and Technology, China

Reviewed by: Liang Zhang, Xidian University, China; Zhibo Wang, University of Central Florida, United States; Kongtao Chen, University of Pennsylvania, United States

*Correspondence: Aixia Guo, aixia.guo@wustl.edu

This article was submitted to Health Informatics, a section of the journal Frontiers in Digital Health

Article

DOI: 10.3389/fdgth.2020.576945

PMC ID: 8521851

PubMed ID: 34713050

SO-VID: 8569f5e5-1544-417b-8c61-f6b0c4b2272d

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received: 27 June 2020

Date accepted: 13 November 2020

Page count

Figures: 4, Tables: 2, Equations: 0, References: 24, Pages: 8, Words: 4708
