Comparing ChatGPT’s and surgeon’s responses to thyroid-related questions from patients
Siyin Guo, Ruicen Li, Genpeng Li, Wenjie Chen, Jing Huang, Linye He, Yu Ma, Liying Wang, Hongping Zheng, Chunxiang Tian, Yatong Zhao, Xinmin Pan, Hongxing Wan, Dasheng Liu, Zhihui Li, Jianyong Lei
- Biochemistry (medical)
- Clinical Biochemistry
- Endocrinology
- Biochemistry
- Endocrinology, Diabetes and Metabolism
Abstract
Background
For some common thyroid-related conditions with high prevalence and long follow-up times, ChatGPT can be used to respond to common thyroid-related questions. In this cross-sectional study, we assessed the ability of ChatGPT (version GPT-4.0) to provide accurate, comprehensive, compassionate, and satisfactory responses to common thyroid-related questions.
Study design
First, we obtained 28 thyroid-related questions from the Huayitong app; together with two interfering questions, these formed a set of 30 questions. Then, ChatGPT (on July 19, 2023), a junior specialist, and a senior specialist (both on July 20, 2023) answered these questions separately. Finally, 26 patients and 11 thyroid surgeons evaluated the responses on four dimensions: accuracy, comprehensiveness, compassion, and satisfaction.
Results
Across the 30 questions and responses, ChatGPT's speed of response was higher than that of both the junior specialist (8.69 [7.53-9.48] vs. 4.33 [4.05-4.60], P <0.001) and the senior specialist (8.69 [7.53-9.48] vs. 4.22 [3.36-4.76], P <0.001). The word count of ChatGPT's responses was greater than that of both the junior specialist (341.50 [301.00-384.25] vs. 74.50 [51.75-84.75], P <0.001) and the senior specialist (341.50 [301.00-384.25] vs. 104.00 [63.75-177.75], P <0.001). ChatGPT received higher scores than both the junior specialist and the senior specialist for accuracy, comprehensiveness, compassion, and satisfaction in responding to common thyroid-related questions.
Conclusions
ChatGPT performed better than both the junior specialist and the senior specialist in answering common thyroid-related questions, but further research is needed to validate ChatGPT's logical reasoning ability on complex thyroid questions.