James Burgess (at CVPR)
@jmhb0
https://jmhb0.github.io/ PhD student in ML, computer vision & biology at Stanford 🇦🇺
Introducing MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research #CVPR2025 ✅ 1k multimodal reasoning VQAs testing MLLMs for science 🧑‍🔬 Biology researchers manually created the questions 🤖 RefineBot: a method for fixing QA language shortcuts 🧵

I've been working with the @reflection_ai team on Asimov, our best-in-class code research agent. I'm super excited for you all to try it. Let me know here if you'd like access and I can move you off the waitlist. :)
Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai: a best-in-class code research agent, built for teams and organizations.
Get around our very cool #ICML paper on predicting how biological cells respond to drug treatments or gene knockdowns. It was led by the legendary @Zhang_Yu_hui and @hhhhh2033528, and I was happy to contribute a tiny bit :)
🧬 What if we could build a virtual cell to predict how it responds to drugs or genetic perturbations? Super excited to introduce CellFlux at #ICML2025 — an image generative model that simulates cellular morphological changes from microscopy images. yuhui-zh15.github.io/CellFlux/ 💡…
I'm at CVPR! Come see me at one of my posters, or reach out for a chat.
MicroVQA: a multimodal reasoning benchmark in biology. Sat 5-7pm, Hall D, poster #357. jmhb0.github.io/microvqa/
BIOMEDICA: a massive vision-language dataset. Sat 5-7pm, Hall D, poster #374. minwoosun.github.io/biomedica-webs…
My lab is starting at UW-Madison! This is a unique opportunity to contribute to impactful computational neuropathology research in a collaborative environment. Join the Nirschl Lab and help drive discoveries that improve our understanding of neurodegenerative disorders!🧠
I'm at #ICLR2025 presenting "Video Action Differencing". Keen to chat with anyone interested in MLLMs - both for general data & for scientific reasoning
🚨 Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert video? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵
Three papers being presented by my amazing collaborators at #ICLR2025! 🌟 (sadly I can't make it) 1. Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations 🔍 A deep dive into mechanistic interpretation techniques for VLMs & future…
🤗The SmolVLM report is out, with all the experiments, findings, and insights that led to high performance at tiny sizes🤏. 📱These models can run on most mobile/edge devices. 📖Give it a look!
Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models. 🔥 Explaining how to design a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B models from 18 months ago! Here are the coolest insights from our experiments: ✨…
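If you want to poke at a model this small yourself, here's a minimal sketch using the standard Hugging Face transformers chat-template flow. It isn't taken from the report: the checkpoint ID, the image path, and the prompt are assumptions, so swap in whatever the release actually ships.

```python
# Minimal sketch: run a small vision-language model locally with transformers.
# The model ID below is an assumption (check the SmolVLM release for the real one).
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("example.png")  # any local image
messages = [
    {"role": "user",
     "content": [
         {"type": "image"},
         {"type": "text", "text": "Describe what is shown in this image."},
     ]},
]

# Build the prompt from the chat template, bundle it with the image, and generate.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

At these parameter counts the whole thing fits comfortably in CPU memory, which is what makes the mobile/edge claim above plausible.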
Excited to see SmolVLM powering BMC-SmolVLM in the latest BIOMEDICA update! At just 2.2B params, it matches 7-13B biomedical VLMs. Check out the full release: @huggingface #smolvlm
Earlier this year, we released the BIOMEDICA dataset, featuring 24 million unique image caption pairs and 30 million image references derived from open-source biomedical literature. It's been great to see the community engaging with it—we're currently seeing around 6K downloads…
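At 24 million image-caption pairs you generally don't want to download the whole thing up front; a streaming loop is the usual pattern. Here's a rough sketch with the Hugging Face `datasets` library; the repo ID and column names are assumptions, not the official ones, so check the BIOMEDICA release page for the real identifiers.

```python
# Sketch: stream a large image-caption dataset instead of downloading it all.
# The repo ID and column names are hypothetical placeholders.
from datasets import load_dataset

ds = load_dataset(
    "BIOMEDICA/biomedica",  # assumed repo ID
    split="train",
    streaming=True,         # iterate lazily over ~24M pairs
)

# Peek at the first few examples without materializing the dataset.
for i, example in enumerate(ds):
    image = example.get("image")      # assumed column name
    caption = example.get("caption")  # assumed column name
    print(type(image), (caption or "")[:80])
    if i >= 4:
        break
```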