      Is Open Access

      Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death

      research-article
      Journal of Biomedical Semantics
      BioMed Central
      UK Healthcare Text Analysis Conference (HealTAC 2018)
      18-19 April 2018
      Free text, Myocardial infarction, Primary care, Chest pain, Natural language processing


          Abstract

          Background

          Free text in electronic health records (EHR) may contain additional phenotypic information beyond structured (coded) information. For major health events – heart attack and death – there is a lack of studies evaluating the extent to which free text in the primary care record might add information. Our objectives were to describe the contribution of free text in primary care to the recording of information about myocardial infarction (MI), including subtype, left ventricular function, laboratory results and symptoms; and recording of cause of death. We used the CALIBER EHR research platform which contains primary care data from the Clinical Practice Research Datalink (CPRD) linked to hospital admission data, the MINAP registry of acute coronary syndromes and the death registry. In CALIBER we randomly selected 2000 patients with MI and 1800 deaths. We implemented a rule-based natural language engine, the Freetext Matching Algorithm, on site at CPRD to analyse free text in the primary care record without raw data being released to researchers. We analysed text recorded within 90 days before or 90 days after the MI, and on or after the date of death.
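          The time-window selection described above can be sketched as follows. This is a minimal illustration only, not the Freetext Matching Algorithm or the CALIBER code: the function names, the example entries and the choice to treat the 90-day boundary as inclusive are all assumptions made for the sketch.

```python
from datetime import date, timedelta

# Assumption: the 90-day window is inclusive at both ends.
MI_WINDOW = timedelta(days=90)


def in_mi_window(entry_date: date, mi_date: date) -> bool:
    """Return True if the entry falls within 90 days before or after the MI."""
    return abs(entry_date - mi_date) <= MI_WINDOW


def on_or_after_death(entry_date: date, death_date: date) -> bool:
    """Return True if the entry was recorded on or after the date of death."""
    return entry_date >= death_date


# Hypothetical free-text entries: (date recorded, text). Not real patient data.
entries = [
    (date(2005, 1, 10), "c/o chest pain on exertion"),
    (date(2005, 4, 1), "seen in cardiology clinic"),
    (date(2005, 9, 1), "flu vaccination given"),
]
mi_date = date(2005, 1, 1)

# Keep only the text recorded inside the window around the MI date.
selected = [text for recorded, text in entries if in_mi_window(recorded, mi_date)]
```

          Taking the absolute difference between the two dates handles the "before" and "after" sides of the window with a single comparison.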

          Results

          We extracted 10,927 diagnoses, 3658 test results, 3313 statements of negation, and 850 suspected diagnoses from the myocardial infarction patients. Inclusion of free text increased the recorded proportion of patients with chest pain in the week prior to MI from 19 to 27%, and differentiated between MI subtypes in a quarter more patients than structured data alone. Cause of death was incompletely recorded in primary care; in 36% the cause was in coded data and in 21% it was in free text. Only 47% of patients had exactly the same cause of death in primary care and the death registry, but this did not differ between coded and free text causes of death.
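          To make the categories above concrete (diagnoses, statements of negation, suspected diagnoses), here is a toy rule-based labeller. It is a hedged sketch under stated assumptions and is far simpler than the Freetext Matching Algorithm used in the study: the cue-word lists, the `label_mention` function and the example phrases are all hypothetical.

```python
import re
from collections import Counter

# Assumed cue words; a real system would use a much larger, validated lexicon.
NEGATION = re.compile(r"\b(no|not|denies|without)\b", re.IGNORECASE)
UNCERTAIN = re.compile(r"\?|\b(possible|probable|suspected|query)\b", re.IGNORECASE)


def label_mention(phrase: str) -> str:
    """Label a phrase as 'negated', 'suspected' or 'affirmed'.

    Negation cues are checked first, so they take precedence over
    uncertainty cues when both appear in the same phrase.
    """
    if NEGATION.search(phrase):
        return "negated"
    if UNCERTAIN.search(phrase):
        return "suspected"
    return "affirmed"


# Hypothetical phrases for illustration only; not taken from patient records.
mentions = [
    "chest pain on exertion",
    "no chest pain",
    "?MI",
    "possible angina",
    "denies breathlessness",
]

# Tally how many phrases fall into each category.
counts = Counter(label_mention(m) for m in mentions)
```

          A rule-based approach of this kind is transparent and easy to audit, but it cannot resolve the scope of a negation cue, which is one reason production systems use richer grammars.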

          Conclusions

          Among patients who suffer MI or die, unstructured free text in primary care records contains much information that is potentially useful for research, such as symptoms, investigation results and specific diagnoses. Access to large-scale unstructured data in electronic health records (millions of patients) might yield important insights.

          Most cited references (23)


          Validation and validity of diagnoses in the General Practice Research Database: a systematic review

          AIMS To investigate the range of methods used to validate diagnoses in the General Practice Research Database (GPRD), to summarize findings and to assess the quality of these validations. METHODS A systematic literature review was performed by searching PubMed and Embase for publications using GPRD data published between 1987 and April 2008. Additional publications were identified from conference proceedings, back issues of relevant journals, bibliographies of retrieved publications and relevant websites. Publications that reported attempts to validate disease diagnoses recorded in the GPRD were included. RESULTS We identified 212 publications, often validating more than one diagnosis. In total, 357 validations investigating 183 different diagnoses met our inclusion criteria. Of these, 303 (85%) utilized data from outside the GPRD to validate diagnoses. The remainder utilized only data recorded in the database. The median proportion of cases with a confirmed diagnosis was 89% (range 24–100%). Details of validation methods and results were often incomplete. CONCLUSIONS A number of methods have been used to assess validity. Overall, estimates of validity were high. However, the quality of reporting of the validations was often inadequate to permit a clear interpretation. Not all methods provided a quantitative estimate of validity and most methods considered only the positive predictive value of a set of diagnostic codes in a highly selected group of cases. We make recommendations for methodology and reporting to strengthen further the use of the GPRD in research.

            Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: current status and recent enhancement of an Electronic Mental Health Record-derived data resource

            Purpose The South London and Maudsley National Health Service (NHS) Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register and its Clinical Record Interactive Search (CRIS) application were developed in 2008, generating a research repository of real-time, anonymised, structured and open-text data derived from the electronic health record system used by SLaM, a large mental healthcare provider in southeast London. In this paper, we update this register's descriptive data, and describe the substantial expansion and extension of the data resource since its original development. Participants Descriptive data were generated from the SLaM BRC Case Register on 31 December 2014. Currently, there are over 250 000 patient records accessed through CRIS. Findings to date Since 2008, the most significant developments in the SLaM BRC Case Register have been the introduction of natural language processing to extract structured data from open-text fields, linkages to external sources of data, and the addition of a parallel relational database (Structured Query Language) output. Natural language processing applications to date have brought in new and hitherto inaccessible data on cognitive function, education, social care receipt, smoking, diagnostic statements and pharmacotherapy. In addition, through external data linkages, large volumes of supplementary information have been accessed on mortality, hospital attendances and cancer registrations. Future plans Coupled with robust data security and governance structures, electronic health records provide potentially transformative information on mental disorders and outcomes in routine clinical care. The SLaM BRC Case Register continues to grow as a database, with approximately 20 000 new cases added each year, in addition to extension of follow-up for existing cases. Data linkages and natural language processing present important opportunities to enhance this type of research resource further, achieving both volume and depth of data. However, research projects still need to be carefully tailored, so that they take into account the nature and quality of the source information.

              Completeness and diagnostic validity of recording acute myocardial infarction events in primary care, hospital care, disease registry, and national mortality records: cohort study

              Objective To determine the completeness and diagnostic validity of myocardial infarction recording across four national health record sources in primary care, hospital care, a disease registry, and mortality register. Design Cohort study. Participants 21 482 patients with acute myocardial infarction in England between January 2003 and March 2009, identified in four prospectively collected, linked electronic health record sources: Clinical Practice Research Datalink (primary care data), Hospital Episode Statistics (hospital admissions), the disease registry MINAP (Myocardial Ischaemia National Audit Project), and the Office for National Statistics mortality register (cause specific mortality data). Setting One country (England) with one health system (the National Health Service). Main outcome measures Recording of acute myocardial infarction, incidence, all cause mortality within one year of acute myocardial infarction, and diagnostic validity of acute myocardial infarction compared with electrocardiographic and troponin findings in the disease registry (gold standard). Results Risk factors and non-cardiovascular coexisting conditions were similar across patients identified in primary care, hospital admission, and registry sources. Immediate all cause mortality was highest among patients with acute myocardial infarction recorded in primary care, which (unlike hospital admission and disease registry sources) included patients who did not reach hospital, but at one year mortality rates in cohorts from each source were similar. 5561 (31.0%) patients with non-fatal acute myocardial infarction were recorded in all three sources and 11 482 (63.9%) in at least two sources. The crude incidence of acute myocardial infarction was underestimated by 25-50% using one source compared with using all three sources. Compared with acute myocardial infarction defined in the disease registry, the positive predictive value of acute myocardial infarction recorded in primary care was 92.2% (95% confidence interval 91.6% to 92.8%) and in hospital admissions was 91.5% (90.8% to 92.1%). Conclusion Each data source missed a substantial proportion (25-50%) of myocardial infarction events. Failure to use linked electronic health records from primary care, hospital care, disease registry, and death certificates may lead to biased estimates of the incidence and outcome of myocardial infarction. Trial registration NCT01569139 clinicaltrials.gov.

                Author and article information

                Contributors
                +44 78 7676 7478 , anoop@doctors.org.uk
                Conference
                J Biomed Semantics
                Journal of Biomedical Semantics
                BioMed Central (London )
                2041-1480
                12 November 2019
                2019
                Volume : 10
                Issue : Suppl 1
                Issue sponsor : Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.
                Article number : 20
                Affiliations
                [1] ISNI 0000 0001 2190 1201, GRID grid.83440.3b, Health Data Research UK London, University College London, 222 Euston Road, London, NW1 2DA, UK
                [2] ISNI 0000 0001 2190 1201, GRID grid.83440.3b, Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
                [3] ISNI 0000 0001 2190 1201, GRID grid.83440.3b, The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, 222 Euston Road, London, NW1 2DA, UK
                [4] ISNI 0000 0000 8937 2257, GRID grid.52996.31, University College London Hospitals NHS Foundation Trust, 235 Euston Road, London, NW1 2BU, UK
                [5] GRID grid.57981.32, Clinical Practice Research Datalink, Medicines and Healthcare products Regulatory Agency, 10 South Colonnade, London, E14 4PU, UK
                [6] ISNI 0000 0001 2322 6764, GRID grid.13097.3c, Department of Biostatistics and Health Informatics, King’s College London, De Crespigny Park, Denmark Hill, London, SE5 8AF, UK
                Article
                214
                10.1186/s13326-019-0214-4
                6849160
                31711543
                f92969de-f779-45a9-84a9-55670090bb34
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                UK Healthcare Text Analysis Conference (HealTAC 2018)
                HealTAC 2018
                Manchester, UK
                18-19 April 2018
                Categories
                Research

                Bioinformatics & Computational biology
