Jesse Smith, Philip MC Choi, Paul Buntine

Will code one day run a code? Performance of language models on ACEM primary examinations and implications

  • Emergency Medicine

Abstract

Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown.

Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination.

Results: All LLMs achieved a passing score, with GPT 4.0 outperforming the average candidate.

Conclusion: Large language models, by passing the ACEM primary examination, show potential as tools for medical education and practice. However, limitations exist and are discussed.

