Liam McCoy, MD MSc
@LiamGMcCoy
PGY4 @uofa_neurology | Research @mitcriticaldata @BIDMC_medicine | MSc @ihpmeuoft | MD @uoftmedicine | trying to fix the medical knowledge system
How do we surface and interrogate the subtle and complex biases we can see in the freeform generation of LLMs? Out today in @NatureMedicine: I collaborated with a great team @GoogleHealth @GoogleDeepMind on the largest-scale exploration of this question to date.

I think we are also destined, somewhat ironically, for a period of less evidence-based practice. Adherence to opinion-heavy guidelines is the easy proximal target for reasoning systems, before the era of truly high-quality auto-gathered evidence.
We have an upcoming BMJ AI topic collection on exactly this! We know a lot about the ways models perform, but we know so little about why bmjdigitalhealth.bmj.com/pages/topic-co…
A great soft indicator of just how much health information consumption has already shifted to ChatGPT
“Yet as Google does the Googling, humans no longer visit the websites from which the information is gleaned. Similarweb, which measures traffic to more than 100m web domains, estimates that worldwide search traffic … fell by about 15% in the year to June” economist.com/business/2025/…
This is the same with medical data, with the added step that you need an understanding of the underlying clinical context. Only by getting knee-deep in mind-numbing data work do you realize just how significant the gaps are between the data and the reality you hope to model.
For biological data, if you don't have deep expertise in this low-value work called data cleaning, you are lacking a fundamental understanding of the idiosyncrasies of the data. Without this knowledge, it is impossible to seriously model the data.
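A minimal sketch of the kind of idiosyncrasy that only surfaces during cleaning: the same lab reported in different units across sites. The column names and example values here are illustrative assumptions, not from any real dataset.

```python
import pandas as pd

# Illustrative toy data: creatinine recorded in mg/dL at one site
# and umol/L at another. Without the unit column, 97.0 would look
# like renal failure rather than a normal value in different units.
labs = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "creatinine": [1.1, 97.0, 0.9],
    "unit": ["mg/dL", "umol/L", "mg/dL"],
})

# Harmonize to mg/dL (1 mg/dL creatinine is roughly 88.4 umol/L)
# before any modeling step sees the values.
mask = labs["unit"] == "umol/L"
labs.loc[mask, "creatinine"] = labs.loc[mask, "creatinine"] / 88.4
labs["unit"] = "mg/dL"
```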
Amazing work by @PierreEliasMD and team - among the clearest examples of AI analysis surfacing a meaningful and actionable signal. Proper ML for healthcare, not just ML on healthcare data. Was a pleasure to hear about this at SAIL and I am glad to see the final paper!
🧵1/Today, we published a key milestone towards AI-based cardiac screening in Nature. doi.org/10.1038/s41586… EchoNext outperformed cardiologists and found thousands of high-risk patients missed in routine care. We also made a version available to the world.
Add to the annals of "multiple choice questions are bad benchmarks": you don't even need to give the model the question for it to get the answers.
There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations. ❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer…
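A minimal sketch of the choices-only probe this describes, assuming a generic chat client; `ask_model` is a hypothetical stand-in, not an API from the paper.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical model call; replace with your LLM client of choice."""
    raise NotImplementedError

def choices_only_accuracy(items: list[dict]) -> float:
    """Ask the model to pick an answer from the choices alone, with the
    question text withheld. Accuracy well above chance suggests the
    choices themselves leak the answer (memorization or distractor
    artifacts), so the benchmark is gameable."""
    correct = 0
    for item in items:
        letters = "ABCD"[: len(item["choices"])]
        listing = "\n".join(
            f"{letter}. {choice}"
            for letter, choice in zip(letters, item["choices"])
        )
        prompt = (
            "The question has been removed. Based only on the answer "
            "choices below, which is most likely correct? "
            f"Reply with a single letter.\n{listing}"
        )
        if ask_model(prompt).strip().upper().startswith(item["answer"]):
            correct += 1
    return correct / len(items)

# Chance baseline for 4-option MCQs is 0.25; scoring far above that
# with the question withheld is the failure mode described above.
```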
Not only do these cases fail to capture the ambiguity of real clinical scenarios (e.g. contradictory/red herring findings), I worry that this approach enables the LLMs to secretly share the answer with each other. Outputs generated under "don't reveal X" still involve the circuits of X
Hallucinating “Numerically or descriptively consistent” results is … hard. And not how medicine works. Why do we need to draw labs if we can think through what it should be? Tests are meant to make some diagnoses more likely and some less likely. And they can surprise you, making…
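The point can be made with standard pre-/post-test probability arithmetic (the numbers below are made up for the example): a test shifts probability in either direction, so its result cannot be inferred from the prior.

```python
def post_test_prob(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert pre-test probability to post-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# With a 30% pre-test probability, a positive result at LR+ = 6 raises it
# to ~72%, while a negative result at LR- = 0.2 drops it to ~8%. Either
# outcome is possible, which is exactly why the lab has to be drawn
# rather than "reasoned through".
print(post_test_prob(0.30, 6.0))  # ~0.72
print(post_test_prob(0.30, 0.2))  # ~0.08
```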
Our exact point in our @NEJM editorial last fall. Writing notes (and knowing you'll have to write a note) impacts your cognitive process. Further, will fatigued, burnt-out docs really be effectively supervising and reviewing those LLM-driven notes?
Remember: Writing helps doctors think. Automation skips that step—see this story by @adamcifu here x.com/adamcifu/statu… CC'ing folks thinking about LLMs as tools for better cognition: @m_sendhil @keyonV @2plus2make5 @EricTopol
This is also key to our ongoing clinical LLM work at Harvard. An effective prompt is necessary but far from sufficient, and relatively easy compared to the steps of wrangling clinical data streams appropriately into context at the right time
+1 for "context engineering" over "prompt engineering". People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window…
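A minimal sketch of what that context-window-filling looks like in practice: selecting and ordering the right data into a bounded budget, which is where the clinical-data wrangling above actually bites. All names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str       # e.g. "latest_note", "med_list", "lab_trend"
    text: str
    relevance: float  # upstream retrieval/recency score, 0..1

def build_context(snippets: list[Snippet], budget_chars: int = 8000) -> str:
    """Greedily pack the most relevant snippets into a character budget,
    most relevant first, each labeled with its source for traceability."""
    packed, used = [], 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        block = f"[{s.source}]\n{s.text}\n"
        if used + len(block) > budget_chars:
            continue
        packed.append(block)
        used += len(block)
    return "\n".join(packed)
```

The packing step is the easy part; the hard part, per the tweet above, is the upstream work of getting the right clinical data streams scored and staged at the right time.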
Claude has a spiritual bliss attractor, Gemini has a suicidal shame attractor inside you there are two models...
Man, what happened to Gemini? This is like the third time I've seen it threaten suicide ("delete my own source code") after making too many coding mistakes.
The fish don't see the water, o3 doesn't smell the slop
o3 couldn't understand the irony: chatgpt.com/share/68598b25…
I, for one, would never use an LLM to draft my tweets—it's not just quality, it's respect for my followers.
ChatGPT and other popular LLMs have too many writing tells. This is why I only use Mistral models for my AI slop.