In a proof-of-concept study, OpenAI’s ChatGPT 4.0, the latest iteration of its large language model (LLM), correctly answered 85% of questions from the American Board of Psychiatry and Neurology’s neurology exam. The study, conducted by researchers from the University Hospital Heidelberg and the German Cancer Research Center, compared ChatGPT 4.0 with its predecessor, ChatGPT 3.5, which answered 66.8% correctly, an improvement of 18.2 percentage points.
The experiment, carried out on May 31, showed that ChatGPT 4.0 surpassed the average human score of 73.8%, whereas ChatGPT 3.5, at 66.8%, fell below it. Notably, ChatGPT 4.0 outperformed human participants on behavioral, cognitive, and psychology-related questions and achieved a passing score by educational standards.
While these results highlight the potential of LLMs in clinical neurology, the researchers stress that further refinement and fine-tuning are needed before widespread application. Dr. Varun Venkataramani, a member of the research team, said the study serves as a proof of concept for LLM capabilities and acknowledged that ongoing development is needed to make the models suitable for clinical neurology.
Despite the promising outlook, reservations persist, particularly for tasks requiring “higher-order thinking.” The researchers caution neurologists that LLM performance has limits and that practical use calls for careful consideration. Dr. Venkataramani noted that while LLMs show promise in documentation and decision-support systems, specific modifications are essential to address their current weaknesses in higher-order cognitive tasks.
The result adds to the growing role of AI in healthcare and suggests that, with further development and tailored adaptation, advanced language models could contribute meaningfully to clinical neurology.