Will code one day run a code? Performance of language models on ACEM primary examinations and implications

doi:10.1111/1742-6723.14280

Jesse Smith, Philip MC Choi, Paul Buntine

Emergency Medicine

Abstract

Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown.

Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination.

Results: All LLMs achieved a passing score, with GPT 4.0 outperforming the average candidate.

Conclusion: Large language models, by passing the ACEM primary examination, show potential as tools for medical education and practice. However, limitations exist and are discussed.
