
Welcome, AI enthusiasts
A new Harvard study found OpenAI's o1 preview can match or beat expert ER doctors on real diagnostic calls, and it was sharpest exactly where humans struggle most. One of the clearest signs yet that AI is stepping into roles we thought were untouchable. Let's dive in!
In today’s insights:
AI Outperforms Doctors in Harvard Trial of Emergency Diagnosis
Pentagon Brings Eight AI Giants Into Its Classified Networks
White House Considers Vetting AI Models Before Public Release
Read time: 4 minutes
LATEST DEVELOPMENTS
Evolving AI: Harvard researchers found OpenAI's o1 preview matched or surpassed expert ER physicians across triage, diagnosis, and case management.
Key Points:
OpenAI's o1 was best when it had the least information, like during initial triage.
76 real ER cases from a Boston hospital were tested under blinded review.
Roughly 20% of clinicians were already using LLMs for second opinions in 2025.
Details:
The Harvard study, published in Science, evaluated o1 preview across three ER stages: arrival triage, first physician contact, and admission. Two reviewers blinded to the source judged its assessments as equal to or better than attending physicians', and the model went on to dominate NEJM cases that have served as diagnostic benchmarks since 1959. The authors do stress one limit, though: the inputs were text-only, with images and EKGs still being studied.
Why It Matters:
Even with text-only inputs, o1 preview is already clearing benchmarks doctors have leaned on since 1959, yet most hospitals are still moving cautiously while roughly 40 million people ping ChatGPT about health every day. The real opening sits somewhere in between: a model quietly scanning messy EHRs for missed diagnoses before they happen. How that gets adopted may end up shaping medicine more than the benchmark ever did.
Your next great hire lives in Slack.
Viktor is an AI coworker that connects to your tools and ships real work. Ask Viktor to pull a report, build a client dashboard, or source 200 leads matching your ICP. Most teams hand over half their ops within a week.
Evolving AI: The Pentagon signed eight AI firms for classified work and quietly adopted the safety limits it spent months rejecting.
Key Points:
SpaceX, OpenAI, Google, Microsoft, AWS, Oracle, NVIDIA, and Reflection signed the deal.
All eight firms are cleared for IL6 and IL7 networks; Anthropic remains excluded.
Contracts include limits on autonomous weapons and domestic surveillance.
Details:
The announcement is all about decision superiority and an AI-first fighting force, but the more interesting story sits in the contract language itself. After months of demanding access for "all lawful purposes," the Department of War quietly committed to human oversight rules and protections against unauthorized surveillance, which is pretty close to what it had refused back in February. Meanwhile GenAI.mil, the Pentagon's in-house platform, has already pulled in 1.3 million users in just five months.
Why It Matters:
AI safety norms are getting shaped through procurement now rather than policy. A judge already called the February blacklist retaliation, yet Anthropic is still on the outside even as everyone else operates under similar guardrails. The deeper signal is that whoever loses the contract also loses the seat at the table, even when their position quietly ends up shaping the final terms.
Same Kafka Protocol. Zero Kafka Baggage.
WarpStream BYOC speaks the Kafka protocol. Your existing clients, tools, and consumers work as-is. What disappears: local disks, partition rebalancing, inter-AZ fees, broker crashes, and capacity planning.
Agents auto-scale to match traffic – no custom tooling, scripts, or operators required. Cursor's team reported spending zero hours thinking about scaling WarpStream, and Character.AI called it operationally simpler at scale.
See how it works, then sign up free. Get $400 in credits that never expire. No credit card required to start.
Evolving AI: A possible Trump executive order would vet powerful AI models before release, signaling a new federal posture.
Key Points:
A draft executive order would set up a working group of tech execs and government officials to vet new models before public release.
Google, Anthropic and OpenAI leaders were briefed on the plans last week.
Anthropic's Mythos model can reportedly exploit flaws in every major OS and browser, and the NSA has used it to probe federal software.
Details:
The plan would give the government first look at frontier AI before it ships, possibly modeled on the UK's review setup, and it wouldn't necessarily block releases. Officials worry about political fallout from an AI-enabled cyberattack and are also weighing what these systems could offer the Pentagon and intelligence agencies. The shift comes after Trump scrapped Biden's safety-testing order on day one.
Why It Matters:
The same White House that called AI rules a drag on competitiveness is now drafting them. That tells you how fast capability concerns have caught up with the deregulation push, especially once the NSA started running Mythos against federal software. A model too risky for the public but already useful to the government leaves Washington in a tough spot: testing use cases for agencies while reviewing the same model for public release.
QUICK HITS
🛡️ GPT-5.5 matches Mythos preview model in new cybersecurity test
💾 AI chipmaker Cerebras Systems is set to go public in an IPO
💼 Bret Taylor's Sierra raises nearly $1B for enterprise AI
🏦 Citi debuts platform to bring AI agents to banking
📈 Trending AI Tools
📝 Granola - AI notetaker that captures the real insights and turns every conversation into ready-to-share, action-driving notes*
🤖 Taskade - AI-powered productivity workspace
🎨 VideoGen - Text-to-image and video generator
📊 HypeAuditor - AI-powered influencer marketing platform
*partner link
