Can Large Language Models Aid Caregivers of Pediatric Cancer Patients in Information Seeking? A Cross‐Sectional Investigation
Emre Sezgin, Daniel I. Jackson, A. Baki Kocaballi, Mindy Bibart, Sue Zupanec, Wendy Landier, Anthony Audino, Mark Ranalli, Micah Skeens
ABSTRACT
Purpose
Caregivers in pediatric oncology need accurate and understandable information about their child's condition, treatment, and side effects. This study assesses the performance of publicly accessible large language model (LLM)‐supported tools in providing valuable and reliable information to caregivers of children with cancer.
Methods
In this cross‐sectional study, we evaluated the performance of four LLM‐supported tools—ChatGPT (GPT‐4), Google Bard (Gemini Pro), Microsoft Bing Chat, and Google SGE—against a set of frequently asked questions (FAQs) derived from the Children's Oncology Group Family Handbook and expert input (26 FAQs in total, yielding 104 generated responses). Five pediatric oncology experts assessed the generated LLM responses on measures including accuracy, clarity, inclusivity, completeness, clinical utility, and overall rating. Additionally, content quality was evaluated, including readability, AI disclosure, source credibility, resource matching, and content originality. We used descriptive analysis and statistical tests, including Shapiro–Wilk, Levene's, and Kruskal–Wallis H tests, with Dunn's post hoc tests for pairwise comparisons.
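The statistical workflow named above can be illustrated with a minimal Python sketch, not the authors' code: it assumes a hypothetical long-format table with one expert rating per row, with illustrative column names "tool" and "score" and a Bonferroni adjustment that the abstract does not specify. scipy provides the Shapiro–Wilk, Levene's, and Kruskal–Wallis tests, and scikit-posthocs provides Dunn's post hoc test.

```python
# Illustrative sketch only (assumed data layout, not the study's actual code).
import pandas as pd
from scipy import stats
import scikit_posthocs as sp

def compare_tools(df: pd.DataFrame) -> pd.DataFrame:
    """df: hypothetical long-format ratings with columns 'tool' and 'score'."""
    groups = [g["score"].to_numpy() for _, g in df.groupby("tool")]

    # Shapiro-Wilk normality check per tool and Levene's test for equal
    # variances, which motivate the non-parametric Kruskal-Wallis H-test.
    for name, g in df.groupby("tool"):
        print(name, "Shapiro-Wilk p =", stats.shapiro(g["score"]).pvalue)
    print("Levene p =", stats.levene(*groups).pvalue)

    # Omnibus comparison of expert scores across the four tools.
    h, p = stats.kruskal(*groups)
    print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")

    # Dunn's post hoc test for pairwise comparisons (adjustment is assumed).
    return sp.posthoc_dunn(df, val_col="score", group_col="tool",
                           p_adjust="bonferroni")
```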
Results
ChatGPT showed high overall performance in the expert evaluations. Bard also performed well, particularly in accuracy and clarity, whereas Bing Chat and Google SGE received lower overall scores. Disclosure that responses were AI‐generated appeared less frequently in ChatGPT responses, which may have affected their clarity, whereas Bard maintained a balance between AI disclosure and response clarity. Google SGE generated the most readable responses, whereas ChatGPT produced the most complex. The LLM tools differed significantly (p < 0.001) on all expert evaluation measures except inclusivity. In our thematic analysis of experts' free‐text comments, emotional tone and empathy emerged as a distinct theme, with mixed feedback on whether AI should be expected to be empathetic.
Conclusion
LLM‐supported tools can enhance caregivers' knowledge of pediatric oncology. Each model has unique strengths and areas for improvement, indicating the need for careful selection based on specific clinical contexts. Further research is required to explore their application in other medical specialties and patient demographics and to assess broader applicability and long‐term impacts.