Harvard researchers tested large language models against human physicians on emergency room diagnostics. At least one AI model outperformed two doctors in accuracy on real ER cases.

The study evaluated how LLMs handle diagnostic reasoning across varied clinical scenarios, using actual emergency department patient data as the benchmark. The results show AI systems can match or exceed physician-level diagnostic accuracy in specific settings.

This research arrives as healthcare systems explore AI integration for clinical decision support. Emergency departments face crushing volume and time pressure, creating prime use cases for diagnostic assistance tools. Faster, more accurate triage could reduce patient wait times and improve outcomes.

The findings suggest LLMs trained on medical literature and clinical data can identify patterns human doctors might miss. The study stops short of the commercial question: it doesn't indicate whether any company has commercialized these models or raised funding around the capability. What the research does establish is a proof of concept for AI diagnostic tools in acute care settings.

The broader market opportunity is substantial: U.S. emergency departments handle 150 million visits annually, so even modest gains in diagnostic accuracy could save lives and reduce healthcare costs. This work strengthens the business case for medical AI companies building diagnostic assistants.