
      ChatGPT-Generated Differential Diagnosis Lists for Complex Case–Derived Clinical Vignettes: Diagnostic Accuracy Evaluation



          Abstract

          Background

          The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown.

          Objective

          This study aims to evaluate the accuracy of the differential diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) by using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan.

          Methods

We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis for each, and transformed the reports into clinical vignettes. Physicians then entered a predetermined instruction text together with each clinical vignette into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential diagnosis lists by reading the same clinical vignettes. We measured the rate of correct diagnosis within the top 10 differential diagnosis lists, within the top 5 differential diagnosis lists, and for the top diagnosis.
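The abstract does not give the exact prompt or tooling. A minimal sketch of this kind of workflow, assuming the OpenAI Python client (v1.x) and illustrative prompt wording (the helper function and prompt are hypothetical, not the study's exact protocol), might look like this:

```python
# Minimal sketch of a prompt-based differential diagnosis workflow.
# Prompt wording, model identifiers, and the top10_differentials helper
# are illustrative assumptions, not the study's exact protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "List the top 10 differential diagnoses for the following case:\n\n{vignette}"

def top10_differentials(vignette, model="gpt-4"):
    """Send one clinical vignette to the chosen model (no fine-tuning or
    reinforcement, matching the Methods) and return its answer as raw text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(vignette=vignette)}],
    )
    return response.choices[0].message.content

# The same vignette would be sent to both model generations:
# gpt4_list = top10_differentials(vignette, model="gpt-4")
# gpt35_list = top10_differentials(vignette, model="gpt-3.5-turbo")
```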

          Results

In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential diagnosis lists, top 5 differential diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The corresponding rates for ChatGPT-3.5 were 73% (38/52), 65% (34/52), and 42% (22/52). The rates of correct diagnosis by ChatGPT-4 were higher than, but not significantly different from, those of the physicians within the top 10 (43/52, 83% vs 39/52, 75%; P=.47) and top 5 (42/52, 81% vs 35/52, 67%; P=.18) differential diagnosis lists and for the top diagnosis (31/52, 60% vs 26/52, 50%; P=.43). The ChatGPT models’ diagnostic accuracy did not vary significantly with open access status or publication date (before 2021 vs 2022).
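For illustration, the reported rates and a paired model-versus-physician comparison could be computed as follows. The per-case outcomes below are hypothetical placeholders arranged to match the reported totals, and the abstract does not name the statistical test behind its P values; McNemar's test is used here as an assumption, being a standard choice for paired binary outcomes.

```python
# Sketch: top-k correct-diagnosis rates plus a paired comparison.
# Placeholder data; McNemar's test is an assumed, not confirmed, choice.
from statsmodels.stats.contingency_tables import mcnemar

def rate(hits):
    """Format a correct-diagnosis rate in the abstract's style, e.g. '43/52 (83%)'."""
    return f"{sum(hits)}/{len(hits)} ({round(100 * sum(hits) / len(hits))}%)"

# Did the model / the physicians include the final diagnosis in their top 10
# for case i? A real analysis needs the true per-case pairing; this
# alignment is arbitrary placeholder data.
gpt4_top10 = [True] * 43 + [False] * 9
phys_top10 = [True] * 39 + [False] * 13

print("ChatGPT-4, top 10:", rate(gpt4_top10))   # 43/52 (83%)
print("Physicians, top 10:", rate(phys_top10))  # 39/52 (75%)

# McNemar's test uses the 2x2 table of paired outcomes:
# [[both correct, only ChatGPT-4 correct],
#  [only physicians correct, both incorrect]]
both = sum(g and p for g, p in zip(gpt4_top10, phys_top10))
only_gpt4 = sum(g and not p for g, p in zip(gpt4_top10, phys_top10))
only_phys = sum(p and not g for g, p in zip(gpt4_top10, phys_top10))
neither = sum(not g and not p for g, p in zip(gpt4_top10, phys_top10))

result = mcnemar([[both, only_gpt4], [only_phys, neither]], exact=True)
print("McNemar P value:", result.pvalue)
```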

          Conclusions

This study demonstrates the potential diagnostic accuracy of differential diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential diagnosis lists generated by ChatGPT-4 exceeded 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly those affiliated with GIM departments. Further investigations should explore the diagnostic accuracy of ChatGPT using distinct case materials beyond its training data. Such efforts will provide comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.


Most cited references (27)


          When to use the Bonferroni correction.

          The Bonferroni correction adjusts probability (p) values because of the increased risk of a type I error when making multiple statistical tests. The routine use of this test has been criticised as deleterious to sound statistical judgment, testing the wrong hypothesis, and reducing the chance of a type I error but at the expense of a type II error; yet it remains popular in ophthalmic research. The purpose of this article was to survey the use of the Bonferroni correction in research articles published in three optometric journals, viz. Ophthalmic & Physiological Optics, Optometry & Vision Science, and Clinical & Experimental Optometry, and to provide advice to authors contemplating multiple testing.
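As a worked example of the correction discussed above (illustrative code, not from the article): with m tests at family-wise level alpha, each raw P value is compared against alpha/m, or equivalently multiplied by m and capped at 1. The P values below are made up.

```python
# Bonferroni correction via statsmodels; hypothetical raw P values.
from statsmodels.stats.multitest import multipletests

pvals = [0.01, 0.04, 0.20]  # hypothetical raw P values from m = 3 tests

reject, p_adjusted, _, alpha_bonf = multipletests(pvals, alpha=0.05, method="bonferroni")

print(alpha_bonf)   # per-test threshold: 0.05 / 3 = 0.0167 (rounded)
print(p_adjusted)   # [0.03 0.12 0.6] -- each raw P value times 3
print(reject)       # [ True False False]
```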

            Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

            We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

              Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine


                Author and article information

Journal
JMIR Medical Informatics (JMIR Med Inform)
JMIR Publications (Toronto, Canada)
ISSN: 2291-9694
Published: 9 October 2023; volume 11, article e48808
                Affiliations
[1] Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi, Japan
[2] Department of General Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
[3] Department of General Medicine, International University of Health and Welfare Narita Hospital, Chiba, Japan
[4] Department of Hospital Medicine, Urasoe General Hospital, Okinawa, Japan
                Author notes
Corresponding Author: Takanobu Hirosawa, hirosawa@dokkyomed.ac.jp
                Author information
                https://orcid.org/0000-0002-3573-8203
                https://orcid.org/0000-0002-5632-3218
                https://orcid.org/0000-0001-6042-7397
                https://orcid.org/0009-0000-8822-7127
                https://orcid.org/0000-0001-9513-6864
                https://orcid.org/0000-0002-0267-9876
                https://orcid.org/0000-0002-5557-0516
                https://orcid.org/0000-0002-3788-487X
Article
DOI: 10.2196/48808
PMCID: PMC10594139
PMID: 37812468
                ©Takanobu Hirosawa, Ren Kawamura, Yukinori Harada, Kazuya Mizuta, Kazuki Tokumasu, Yuki Kaji, Tomoharu Suzuki, Taro Shimizu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 09.10.2023.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

History
Received: 9 May 2023
Revisions requested: 17 July 2023
Revised: 20 July 2023
Accepted: 13 September 2023
                Categories
                Original Paper

Keywords: artificial intelligence, AI chatbot, ChatGPT, large language models, clinical decision support, natural language processing, diagnostic excellence, language model, vignette, case study, diagnostic, accuracy, decision support, diagnosis
