An evaluation of ‘ChatGPT’ Compared to Dermatological Surgeons’ Choice of Reconstruction of Mohs Surgical Defects
Adrian Cuellar-Barboza, Elizabeth Brussolo-Marroquín, Fanny C Cordero-Martinez, Patrizia E Aguilar-Calderon, Osvaldo Vazquez-Martinez, Jorge Ocampo-Candiani
Abstract
Background
ChatGPT® (OpenAI; California, USA) is an open-access chatbot developed using artificial intelligence (AI) that generates human-like responses.
Objective
To evaluate ChatGPT-4’s concordance with three dermatologic surgeons in the choice of reconstruction for dermatological surgical defects.
Methods
A total of 70 cases of non-melanoma skin cancer treated with surgery were obtained from clinical records for analysis. The main authors designed a list of 30 reconstruction options, which included primary closure, secondary skin closure, skin flaps, and skin grafts. Three blinded dermatologic surgeons, along with ChatGPT-4, were asked to select two reconstruction options from the list for each case.
Results
Seventy responses were analyzed using Cohen’s kappa to assess concordance between each dermatologic surgeon and ChatGPT-4. Agreement among the dermatologic surgeons was higher than agreement between the surgeons and ChatGPT-4, highlighting differences in decision-making. For the best (first-choice) reconstruction technique, agreement among the dermatologic surgeons was fair (κ = 0.268 to 0.331), whereas agreement between ChatGPT-4 and the surgeons was slight (κ = 0.107 to 0.121). For the second-choice options, the dermatologic surgeons showed slight agreement, whereas concordance between ChatGPT-4 and the surgeons was below chance.
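For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen’s kappa can be computed for two raters who each pick one option per case. It is not the authors’ analysis code. The function names `cohen_kappa` and `interpret` are hypothetical, the sample choices are invented for illustration and are not study data, and the verbal labels assume the commonly used Landis and Koch bands, which the abstract does not name explicitly.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: proportion of cases where both raters chose the same option.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def interpret(kappa):
    """Verbal label for kappa (assumed Landis and Koch bands)."""
    if kappa < 0:
        return "below chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Invented first-choice selections for four cases (illustration only).
    surgeon = ["flap", "graft", "primary closure", "flap"]
    chatbot = ["flap", "graft", "flap", "primary closure"]
    k = cohen_kappa(surgeon, chatbot)
    print(f"kappa = {k:.3f} ({interpret(k)})")
```

Under the assumed bands, the κ ranges reported above map to the same labels used in the text: values of 0.268 to 0.331 fall in the fair band, and values of 0.107 to 0.121 fall in the slight band.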
Conclusions
As anticipated, this study reveals variability in medical decisions between dermatologic surgeons and ChatGPT. Although these tools offer exciting possibilities for the future, it is vital to acknowledge the risk of inadvertently relying on non-certified AI for medical advice.