Kevin Wu


I'm a 5th-year PhD student at Stanford University working with Prof. James Zou. I am fortunate to have been supported by the Stanford Data Science Scholars program.

My research focuses on the barriers that keep AI from being deployed effectively in critical settings, especially healthcare. My work aims to understand and address these barriers through data-driven methods and has been cited by organizations including the FDA, WHO, NIST, and AI.gov. Prior to my PhD, I helped start Deep Health, a health AI company. I completed my master's degree at Harvard University in 2018 and my bachelor's at Duke University in 2015.


Recent News

  • 2025-03: Successfully defended my PhD dissertation! Thankful for my advisor James Zou and collaborators through the years.
  • 2025-03: MedArena is now live! Excited to partner with NEJM AI and Doximity to release the first public LLM leaderboard for clinicians only.
  • 2025-03: SourceCheckup is accepted to Nature Communications!
  • 2024-12: Excited to present ClashEval at NeurIPS 2024 in Vancouver!

Research

ClashEval
ClashEval: Quantifying the Tug-of-War Between an LLM's Internal Prior and External Evidence

Kevin Wu, Eric Wu, James Zou. NeurIPS 2024

We explore how LLMs behave when presented with information that conflicts with their internal knowledge.

[PDF] [GitHub]


SourceCheckup
How Well Do LLMs Cite Relevant Medical References? An Evaluation Framework and Analyses

Kevin Wu, Eric Wu, Ally Casasola, Angela Zhang, Kevin Wei, Teresa Nguyen, Sith Riantawan, Patricia Shi, Daniel Ho, James Zou. Nature Communications (Forthcoming)

We introduce SourceCheckup, an automated LLM agent that evaluates citation relevance in LLM responses.

[PDF] [GitHub]

Press: [NIST] [NHHF] [CEP]


AI Adaptation
Regulating AI Adaptation: An Analysis of AI Medical Device Updates

Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou. CHIL 2024

We present a systematic analysis of the frequency and types of model updating in FDA-cleared medical devices.

[PDF]


AI Usage
Characterizing the Clinical Adoption of Medical AI Devices through U.S. Insurance Claims

Kevin Wu, Eric Wu, Brandon Theodorou, Weixin Liang, Christina Mack, Lucas Glass, Jimeng Sun, and James Zou. NEJM AI

We analyze billions of insurance claims to produce a first look at medical AI adoption.

[PDF]

Press: [Nature Medicine] [ARPA-H] [MA State Government] [CO State Government] [The Imaging Wire] [Cardiac Wire]


DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

Yongchan Kwon*, Eric Wu*, Kevin Wu*, and James Zou. ICLR 2024

We propose a significantly faster approximation method for estimating influence scores that is well-suited for LoRA fine-tuned large language models.

[PDF]


OD-SHAP
Collecting data when missingness is unknown: a method for improving model performance given under-reporting in patient populations

Kevin Wu, Dominik Dahlem, Christopher Hane, Eran Halperin, and James Zou. CHIL 2023

We propose a model-guided method for data collection when missingness is unknown and the model is fixed. Work completed with Optum Labs.

[PDF]


Fidocure
Analyses of canine cancer mutations and treatment outcomes using real-world clinico-genomics data of 2119 dogs

Kevin Wu*, Lucas Rodrigues*, Gerald Post, Garrett Harvey, Michelle White, Aubrey Miller, Lindsay Lambert, Benjamin Lewis, Christina Lopes, and James Zou. npj Precision Oncology

We analyze thousands of veterinary records to better understand how dogs (and maybe humans) respond to targeted cancer treatments.

[PDF]

Press: [Wired] [Washington Post]


Clinical Trials
Machine learning prediction of clinical trial operational efficiency

Kevin Wu, Eric Wu, Michael DAndrea, Nandini Chitale, Melody Lim, Marek Dabrowski, Klaudia Kantor, Hanoor Rangi, Ruishan Liu, Marius Garmhausen, Navdeep Pal, Chris Harbron, Shemra Rizzo, Ryan Copping, James Zou. AAPSJ (American Association of Pharmaceutical Scientists)

We forecast the operational efficiency of clinical trials using dozens of covariates such as study phase, number of eligibility criteria, and number of procedures.

[PDF]


FDA Approvals
How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals

Eric Wu, Kevin Wu, Roxana Daneshjou, David Ouyang, Daniel E Ho, James Zou. Nature Medicine

We analyze hundreds of FDA-approved AI medical devices and create a taxonomy of how medical AI devices are evaluated.

[PDF]

Press: [FDA] [WHO] [AI.gov]


Talks & Presentations

  • Aug 5, 2024: Invited talk: Towards Reliable, Valid, and Safe Systems for Biomedical Data Science, JSM 2024, Portland, OR.
  • Dec 4, 2023: AI & Health Regulatory Policy Conference (Panelist)
  • Jun 23, 2023: Conference on Health, Inference, and Learning (Oral Presentation)
  • Apr 4, 2022: American Association for Cancer Research, Emerging Topics in Computational Oncology (Oral Presentation)