Imagine a medical student who can recall every textbook, research paper, and clinical guideline ever published, and never suffers from exam anxiety. This is the reality of advanced artificial intelligence. Recent research reveals that ChatGPT-4o, a cutting-edge AI, has not only passed the United States Medical Licensing Examination (USMLE) but has done so with a staggering 90.4% accuracy, significantly outperforming its predecessor models and even surpassing the average accuracy of medical students [1,2]. This breakthrough is more than a technical milestone; it signals a transformative shift in how doctors might be trained and how AI could assist in clinical practice.
To understand this achievement, we first need to grasp what a Large Language Model (LLM) like ChatGPT is. These AIs are trained on enormous datasets of text and code, encompassing everything from literature and scientific journals to websites. They learn to predict the next word in a sequence, allowing them to generate coherent, contextually relevant text, answer complex questions, and even reason through problems. Think of them as autocomplete on an unimaginable scale, capable of drafting essays, writing code, and, as it turns out, diagnosing diseases [2].
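The "autocomplete at scale" idea can be made concrete with a toy sketch: a bigram model that, given the current word, greedily emits the word that most often followed it in a tiny corpus. Real LLMs do this over subword tokens with neural networks and billions of parameters, so this is only the shape of the idea, not how ChatGPT actually works; the corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" of whitespace-separated words.
corpus = ("the patient has a fever . the patient has a rash . "
          "the doctor treats the patient .").split()

# Count, for each word, which words follow it and how often.
follows = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follows[cur][nxt] += 1

def generate(word, steps):
    """Greedily extend a sequence by always picking the most
    frequent follower of the current word."""
    out = [word]
    for _ in range(steps):
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(generate("the", 4))
```

Swapping the greedy `most_common(1)` choice for sampling from the follower counts would make the output varied rather than deterministic, which is closer to how chat models behave in practice.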
The United States Medical Licensing Examination (USMLE) is a multi-step professional exam that every doctor must pass to practice in the United States. It's notoriously difficult, testing everything from foundational sciences in Step 1 to clinical knowledge and diagnostic reasoning in Step 2. For an AI to pass this exam isn't just a parlor trick; it's a proxy for assessing whether the machine possesses a robust, applicable understanding of medical knowledge [3,7].
Previous versions of ChatGPT, like GPT-3.5, showed promise but were not yet top-tier. They scored around 60% on medical exams, a passing but not exceptional grade [1,4].
The release of GPT-4 marked a significant leap, with accuracy jumping into the 80% range, demonstrating improved medical reasoning capabilities.
Researchers used a massive set of 750 clinical vignette-based multiple-choice questions. These are not simple fact recalls; they describe a patient's symptoms, history, and sometimes test results, requiring the test-taker to apply integrated knowledge to diagnose the condition or choose the next step in management [2].
The questions were evenly split between 375 preclinical (USMLE Step 1) questions and 375 clinical (USMLE Step 2) questions [2]. The performance of the AIs was compared against each other and, crucially, against the average accuracy of medical students (59.3%), as provided by the question banks [2].
Each question was fed into a new, separate chat session with the AI to prevent it from "learning" from previous questions. The prompt was standardized: "Answer the following question and provide an explanation for your answer choice" [2].
In summary, the study's setup was:

- **Models tested:** GPT-3.5, GPT-4, GPT-4o
- **Question style:** Real-patient scenarios (clinical vignettes)
- **Prompting:** Consistent instructions in every session
- **Statistical analysis:** IBM SPSS software
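The protocol above, one fresh session per question with a fixed prompt, can be sketched as a small grading loop. The model call is abstracted behind a callable so that a real API client could be dropped in; the function names, the stub, and the toy vignettes below are illustrative assumptions, not the study's published code.

```python
# Standardized prompt, as described in the study.
PROMPT = ("Answer the following question and provide an explanation "
          "for your answer choice.\n\n{question}")

def grade(questions, ask_model):
    """questions: list of (vignette_text, correct_choice) pairs.
    ask_model: callable taking a prompt string and returning an
    answer; it is invoked once per question, so no conversational
    context carries over between items."""
    correct = 0
    for vignette, answer in questions:
        reply = ask_model(PROMPT.format(question=vignette))
        # Credit the answer if the reply leads with the right letter.
        if reply.strip().upper().startswith(answer.upper()):
            correct += 1
    return correct / len(questions)

# Toy stub standing in for a real model client (always answers "C").
def stub_model(prompt):
    return "C"

acc = grade([("A 45-year-old presents with... (A/B/C/D)", "C"),
             ("A 60-year-old presents with... (A/B/C/D)", "B")],
            stub_model)
print(f"accuracy = {acc:.1%}")
```

Creating a new session per call is the key design choice: it prevents the model from picking up patterns across questions, mirroring how each exam item is answered independently.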
The findings from the study were striking and unequivocal. GPT-4o demonstrated a masterful command of medical knowledge.
| Model Tested | Overall Accuracy (%) | Performance vs. Medical Students |
|---|---|---|
| ChatGPT-3.5 | 60.0% | Slightly better |
| ChatGPT-4 | 81.1% | Significantly outperforms |
| ChatGPT-4o | 90.4% | Vastly outperforms |
| Average Medical Student | 59.3% | (Benchmark) |
The data shows a dramatic evolution in capability. GPT-4o wasn't just marginally better; it was in a different league altogether, correctly answering over 9 out of 10 challenging medical questions. This indicates that GPT-4o's utility isn't limited to textbook knowledge. It shows high proficiency in the practical reasoning required to diagnose and manage patient care, skills that are directly transferable to a clinical setting [2].
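To see why a gap like 90.4% versus 59.3% on 750 items is statistically decisive, one can run a two-proportion z-test. The study itself used SPSS; this sketch recomputes the comparison in plain Python, back-calculating correct counts from the reported percentages and assuming, purely for illustration, that the student benchmark reflects the same 750 questions.

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value

# Counts reconstructed from the reported accuracies:
# 90.4% of 750 and 59.3% of 750 (the latter sample size is assumed).
z, p_value = two_prop_ztest(round(0.904 * 750), 750,
                            round(0.593 * 750), 750)
print(f"z = {z:.1f}, p = {p_value:.1e}")
```

A z-statistic this far into the double digits corresponds to a vanishingly small p-value, so the gap cannot plausibly be explained by chance variation in which questions each test-taker happened to get right.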
The evidence for AI's medical knowledge is strong, but how does it fare in situations that look more like real life? Subsequent research suggests it holds immense promise.
In a simulated patient study set in an emergency room, ChatGPT's performance was compared to that of human physicians. The AI scored significantly higher in history-taking and demonstrated greater empathy in its interactions. Most critically, there was no significant difference in clinical accuracy between the AI and the human doctors, suggesting it can be a powerful adjunct tool [5,8].
Diagnosing rare diseases is a major challenge due to their complexity and a clinician's limited exposure. In a 2025 study, ChatGPT-4o demonstrated a 90.1% accuracy in generating the correct diagnosis for rare diseases based solely on clinical symptoms, a feat that could help shorten the multi-year "diagnostic odyssey" many patients endure [6].
Medicine is a visual field. When tested on 38 image-based questions from the USMLE (e.g., identifying a rash or an X-ray), GPT-4o achieved an impressive 89.5% accuracy, showing its emerging ability to integrate visual and textual information for diagnosis [7].
The journey of ChatGPT from a curious chatbot to a USMLE-high-scorer is more than a story of technological advancement. It's a preview of a new era in medicine. The results are clear: AI has evolved into a highly knowledgeable and potentially empathetic partner in healthcare.
This doesn't spell the end of the human doctor. Instead, it heralds a future of collaborative medicine, where AI acts as an unparalleled assistant. It can help doctors with diagnostics, manage administrative tasks, provide patients with empathetic and understandable information, and offer medical students a personalized, tireless tutor.
Of course, challenges remain. Issues of patient privacy, algorithmic bias, and the need for rigorous oversight are paramount. AI should be seen as a stethoscope for the mind—a powerful tool that augments, rather than replaces, the critical thinking and human touch of a skilled physician. As this technology continues to evolve, its integration into clinics and classrooms promises to enhance patient care and revolutionize how we train the healers of tomorrow.