Comparing ChatGPT’s and surgeon’s responses to thyroid-related questions from patients

DOI: 10.1210/clinem/dgae235 ISSN: 0021-972X

Siyin Guo, Ruicen Li, Genpeng Li, Wenjie Chen, Jing Huang, Linye He, Yu Ma, Liying Wang, Hongping Zheng, Chunxiang Tian, Yatong Zhao, Xinmin Pan, Hongxing Wan, Dasheng Liu, Zhihui Li, Jianyong Lei

Biochemistry (medical)
Clinical Biochemistry
Endocrinology
Biochemistry
Endocrinology, Diabetes and Metabolism

Abstract

Background

Some common thyroid-related conditions are highly prevalent and require long follow-up, and ChatGPT can be used to answer common questions about them. In this cross-sectional study, we assessed the ability of ChatGPT (version GPT-4.0) to provide accurate, comprehensive, compassionate, and satisfactory responses to common thyroid-related questions.

Study design

First, we obtained 28 thyroid-related questions from the Huayitong app and added two interfering questions, for a total of 30 questions. These questions were then answered separately by ChatGPT (on July 19, 2023) and by a junior specialist and a senior specialist (on July 20, 2023). Finally, 26 patients and 11 thyroid surgeons evaluated the responses on four dimensions: accuracy, comprehensiveness, compassion, and satisfaction.

Results

Across the 30 questions and responses, ChatGPT’s speed of response was faster than that of the junior specialist (8.69 [7.53-9.48] vs. 4.33 [4.05-4.60], P < .001) and the senior specialist (8.69 [7.53-9.48] vs. 4.22 [3.36-4.76], P < .001). The word count of ChatGPT’s responses was greater than that of both the junior specialist (341.50 [301.00-384.25] vs. 74.50 [51.75-84.75], P < .001) and the senior specialist (341.50 [301.00-384.25] vs. 104.00 [63.75-177.75], P < .001). ChatGPT received higher scores than both the junior specialist and the senior specialist for accuracy, comprehensiveness, compassion, and satisfaction in responding to common thyroid-related questions.

Conclusions

ChatGPT performed better than the junior specialist and the senior specialist in answering common thyroid-related questions, but further research is needed to validate ChatGPT’s logical ability on complex thyroid questions.
