
Towards Conversational AI for Disease Management

This article discusses advancements in conversational AI for disease management using a large language model called AMIE. The system was tested against primary care physicians in a virtual OSCE study and showed non-inferior performance in management reasoning, with strengths in treatment precision and adherence to clinical guidelines. The researchers also introduced RxQA, a benchmark for evaluating medication reasoning based on national drug formularies.

Abstract

While large language models (LLMs) have shown promise in diagnostic dialogue¹, their capabilities for effective management reasoning, including disease progression, therapeutic response, and safe medication prescription, remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE)¹⁻³ through a new LLM-based agentic system optimized for multi-visit clinical management and dialogue. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini’s long-context capabilities⁴, combining in-context retrieval with structured reasoning to align its output with up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialists and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. Though AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE’s strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.


Author information

Author notes

These authors jointly supervised this work: Alan Karthikesalingam, Mike Schaekermann

These authors contributed equally: Valentin Liévin, Anil Palepu

Authors and Affiliations

Google DeepMind, Mountain View, California, USA

Valentin Liévin, Khaled Saab, David Stutz, Yong Cheng, S. Sara Mahdavi, Joëlle Barral, Ryutaro Tanno & Tao Tu

Google Research, Mountain View, California, USA

Anil Palepu, Wei-Hung Weng, Kavita Kulkarni, Dale R. Webster, Katherine Chou, Avinatan Hassidim, Yossi Matias, James Manyika, Vivek Natarajan, Adam Rodman, Alan Karthikesalingam & Mike Schaekermann

Authors

Valentin Liévin

Anil Palepu

Wei-Hung Weng

Khaled Saab

David Stutz

Yong Cheng

Kavita Kulkarni

S. Sara Mahdavi

Joëlle Barral

Dale R. Webster

Katherine Chou

Avinatan Hassidim

Yossi Matias

James Manyika

Ryutaro Tanno

Vivek Natarajan

Adam Rodman

Tao Tu

Alan Karthikesalingam

Mike Schaekermann

Corresponding authors

Correspondence to Valentin Liévin, Anil Palepu, Alan Karthikesalingam or Mike Schaekermann.

Supplementary information

Supplementary Information (download PDF)

Supplementary discussion, methods and results (Sections 1-16). Contains related work, details on the system design for the Mx agent and Dialogue agent, details on the OSCE evaluation study (inter-rater reliability analysis, clinician metadata, scenario metadata, ablation analysis), and methods details and further results for the RxQA medication reasoning benchmark.

Reporting Summary (download PDF)

Supplementary Data 1 (download PDF)

Detailed view of two sample scenarios with AMIE and PCP output and evaluation gradings. Full details for two sample scenarios used in the OSCE evaluation study, including scenario information, AMIE-patient-actor conversations, PCP-patient-actor conversations, specialist physician gradings and patient actor gradings for all three visits per scenario.

Supplementary Data 2 (download PDF)

Details for all 120 OSCE scenarios with AMIE output (PDF). Scenario details and AMIE output for all 120 scenarios used either in the OSCE evaluation study (100) or for validation purposes (20), in human-readable PDF format.

Supplementary Data 3 (download CSV)

Details for all 120 OSCE scenarios with AMIE output (CSV). Scenario details and AMIE output for all 120 scenarios used either in the OSCE evaluation study (100) or for validation purposes (20), in machine-readable CSV format.

Peer Review File (download PDF)

About this article

Cite this article

Liévin, V., Palepu, A., Weng, WH. et al. Towards Conversational AI for Disease Management. Nature (2026). https://doi.org/10.1038/s41586-026-10764-5

Source document: Articulate Medical Intelligence Explorer (AMIE)


Bias read (Center): The article presents technical findings about an AI system used in medical contexts without taking a stance on political issues. It focuses on scientific evaluation and does not exhibit biased language, one-sided sourcing, or omission of context.

Official sources cited

  • organisation Articulate Medical Intelligence Explorer (AMIE)
  • organisation Gemini
  • government UK NICE Guidance
  • organisation BMJ Best Practice guidelines
  • government US and UK national drug formularies
