Jason Alan Fries
@jasonafries
Researcher at Stanford University. Working on healthcare AI, multimodal foundation models, and data-centric AI.
🎉 We're thrilled to announce the general release of three de-identified, longitudinal EHR datasets from Stanford Medicine—now freely available for non-commercial research use worldwide! 🚀 Read our HAI blog post for more details: hai.stanford.edu/news/advancing… Dataset…
Amazing work by @SnorkelAI—scaling domain expertise for evaluation and data curation is key to unlocking AI’s potential in high-stakes fields like healthcare. So excited for what’s next! 🚀
Agentic AI will transform every enterprise, but only if agents are trusted experts. The key: evaluation and tuning on specialized, expert data. I’m excited to announce two new products to support this, @SnorkelAI Evaluate and Expert Data-as-a-Service, along with our $100M Series D! …
A delightful Sunday at #ICLR2025 in the Pediatric AI workshop pediamedai.com/ai4chl/ listening to an exciting talk by @jasonafries describing his work with @drnigam @StanfordHealth and others!
Excited to present this work at ICLR's SynthData Workshop on Sunday, April 27! Come through from 11:30-12:30 @ Peridot 202-203 to talk all things synthetic data for post-training, benchmarking, and AI for healthcare in general.
1/🧵Introducing TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records When we evaluate LLMs for reasoning over longitudinal clinical records, can we leverage synthetic data generation to create scalable benchmarks and improve model performance?
!!! I'm at #ICLR2025 to present 🧄Aioli🧄, a unified framework for data mixing, on Thursday afternoon! 🔗 arxiv.org/abs/2411.05735 Message me to chat about pre-/post-training data (mixing, curriculum, understanding); test-time compute/verification; or to try new food 🇸🇬
🎉 Excited to present our #ICLR2025 work—leveraging future medical outcomes to improve pretraining for prognostic vision models. 🖼️ "Time-to-Event Pretraining for 3D Medical Imaging" 👉 Hall 3+2B #23 📍 Sat 26 Apr, 10 AM–12:30 PM 🔗 iclr.cc/virtual/2025/p…
Today at #ICLR2025: come chat with @Changho_Shin_ about our work on what types of data drive weak-to-strong generalization!
Reminder: COLM abstract deadline! Should be an amazing conference this year in Montreal.
Can AI in healthcare truly be responsible without full patient histories? New longitudinal EHR datasets provide a better way to benchmark models. Read more from #StanDOM's @jasonafries, Zepeng Frazier Huo, Hejie Cui, @drnigam & Shah Lab colleagues. stanford.io/41lLvg0
1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!
OpenAI's Health AI team is now hiring backend/fullstack SWEs towards our mission of universalizing access to health information! Please apply if you: - Can write maintainable, high-quality backend/fullstack code at high velocity - Are willing to run through walls towards this…
Also, please follow me on Bluesky bsky.app/profile/jason-…
Some new work from our group that I'm very excited about! What makes weak-to-strong generalization possible? We think it's all about data!
What enables a strong model to surpass its weaker teacher? 🚀 Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧵
Excited to share that our paper "Time-to-Event Pretraining for 3D Medical Imaging" has been accepted to ICLR 2025! 🚀 Electronic health records (EHRs) contain a wealth of longitudinal data on disease progression. In this work, we use methods from survival analysis to transform…
🎉 Excited to share that our latest research, Time-to-Event Pretraining for 3D Medical Imaging, has been accepted at ICLR 2025! 🚀 🔍 Improving Medical Image Pretraining with Time-to-Event…
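For readers curious what time-to-event supervision looks like in practice, here is a minimal sketch of a discrete-time survival head with a censoring-aware loss, assuming a generic image-embedding input. All module, function, and variable names are hypothetical illustrations, not the paper's released code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a discrete-time survival (time-to-event) head that
# could sit on top of a pretrained image encoder. Names are illustrative.
class TimeToEventHead(nn.Module):
    def __init__(self, embed_dim: int, num_time_bins: int):
        super().__init__()
        # One logit per discrete time bin: the hazard of the event occurring
        # in that bin, given survival up to it.
        self.hazard_logits = nn.Linear(embed_dim, num_time_bins)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.hazard_logits(embedding)

def discrete_survival_loss(logits, event_bin, observed):
    """Negative log-likelihood for right-censored time-to-event labels.

    logits:    (batch, num_bins) hazard logits from the head
    event_bin: (batch,) index of the bin where the event or censoring occurred
    observed:  (batch,) 1.0 if the event was observed, 0.0 if censored
    """
    hazards = torch.sigmoid(logits)  # per-bin event probability
    bins = torch.arange(logits.size(1), device=logits.device)
    before = (bins.unsqueeze(0) < event_bin.unsqueeze(1)).float()
    # Survive every bin strictly before the event/censoring bin...
    log_surv = (torch.log(1 - hazards + 1e-8) * before).sum(dim=1)
    # ...and, if the event was observed, have it occur in its bin. Censored
    # examples contribute only the survival term.
    at_event = torch.log(hazards[torch.arange(len(event_bin)), event_bin] + 1e-8)
    return -(log_surv + observed * at_event).mean()
```

The key property of this family of losses is that censored patients still provide signal (they are known to be event-free up to a point), which is what lets longitudinal EHR outcomes supervise image pretraining at scale.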
While we celebrate @deepseek_ai 's release of open-weight models that we can all play with at home, just a friendly reminder that they are not *open-source*; there’s no training / data processing code, and hardly any information about the data.
Excited to share our open-source code for cancer survival prediction using radiology (MRI) and pathology (H&E) images. This walkthrough uses our lightweight, domain-specific multimodal medical imaging embedding models + adapters to produce hazard scores and survival…
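As a rough sketch of the kind of pipeline the walkthrough describes, the snippet below fuses frozen MRI and pathology embeddings through a small adapter that outputs a Cox-style log-hazard score. The module names and embedding dimensions are assumptions for illustration, not the released API.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: fuse frozen MRI and H&E pathology embeddings through
# a small adapter to produce a single log-hazard (risk) score per patient.
class SurvivalAdapter(nn.Module):
    def __init__(self, mri_dim: int, path_dim: int, hidden: int = 256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(mri_dim + path_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # Cox-style log-hazard score
        )

    def forward(self, mri_emb, path_emb):
        return self.fuse(torch.cat([mri_emb, path_emb], dim=-1)).squeeze(-1)

# Toy usage with random tensors standing in for embeddings; in practice
# these would come from the pretrained domain-specific encoders.
mri_emb = torch.randn(8, 512)
path_emb = torch.randn(8, 768)
adapter = SurvivalAdapter(mri_dim=512, path_dim=768)
risk_scores = adapter(mri_emb, path_emb)  # higher score = higher predicted hazard
```

Keeping the encoders frozen and training only a small adapter like this is what makes the approach lightweight: only a few thousand parameters need task-specific tuning.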
Super excited to join @ColumbiaDBMI this coming July! If you're looking for postdoctoral or PhD opportunities in Health AI (in particular in building foundation models for EHR and multimodal health data), message me!
Matthew McDermott (@MattBMcDermott) and Xuhai “Orson” Xu (@Orson_Xu) will both join the DBMI faculty in 2025 to enhance both the research and training at one of the nation's oldest biomedical informatics departments. #DBMI24in2024 dbmi.columbia.edu/matthew-mcderm…
We kicked off our day with a fantastic talk from our first keynote speaker, @jasonafries, on “The Missing Context Problem in Foundation Models for Healthcare.” Thanks Jason for a great talk! 😃