Josiah Aklilu
@AkliluJosiah2
PhD student @Stanford | part of MARVL graciously advised by @syeung10 | AI for surgery
🚨 Check out our new method ZEAL -> zero-shot action localization in long-form video using LLMs + VLMs, accepted to the VidLLMs workshop at #CVPR2025! 📍 No labels. No fine-tuning. Strong zero-shot results. 📝arxiv.org/abs/2410.14340 💫 Shoutout to my amazing collaborator…
🧬 What if we could build a virtual cell to predict how cells respond to drugs or genetic perturbations? Super excited to introduce CellFlux at #ICML2025 — an image generative model that simulates cellular morphological changes from microscopy images. yuhui-zh15.github.io/CellFlux/ 💡…
Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
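This is not the paper's implementation, just a toy, tensor-level sketch of the test-time idea: find the few high-norm patch tokens, shunt their content into an appended register slot, and leave the patch features clean. The 3x-median threshold and the mean replacement are illustrative assumptions, and the actual method intervenes inside the model rather than on a detached feature tensor.

```python
# Toy sketch (not the paper's method): detect high-norm outlier patch tokens
# and move their content into an appended "register" slot at inference time.
# The 3x-median threshold and mean-replacement are illustrative assumptions.
import torch

def add_test_time_register(tokens: torch.Tensor) -> torch.Tensor:
    """tokens: (batch, num_patches, dim) patch features from some ViT layer."""
    norms = tokens.norm(dim=-1)                                   # (B, N)
    thresh = 3.0 * norms.median(dim=1, keepdim=True).values       # per-sample cutoff
    outlier = norms > thresh                                      # (B, N) bool mask

    # Register absorbs the outlier activations (mean over outlier tokens).
    weights = outlier.float().unsqueeze(-1)                       # (B, N, 1)
    register = (tokens * weights).sum(dim=1) / weights.sum(dim=1).clamp(min=1.0)

    # Replace outlier positions with the mean of the remaining clean tokens.
    clean_mean = (tokens * (1 - weights)).sum(dim=1) / (1 - weights).sum(dim=1).clamp(min=1.0)
    cleaned = torch.where(outlier.unsqueeze(-1), clean_mean.unsqueeze(1), tokens)

    # Append the register so downstream layers can still attend to that content.
    return torch.cat([cleaned, register.unsqueeze(1)], dim=1)     # (B, N+1, D)

feats = torch.randn(2, 196, 768)
feats[:, 0] *= 50.0                                               # fake high-norm outlier
print(add_test_time_register(feats).shape)                        # torch.Size([2, 197, 768])
```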
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
📢 The First Workshop on Multimodal Foundation Models for Biomedicine (MMFM-BIOMED) at #CVPR2025 is still accepting submissions until May 7, 11:59 PM PT! Join speakers from Stanford, Google, MIT & more exploring the intersection of #CV, #NLP & #healthcare. Submit your 4-page…
I'm at #ICLR2025 presenting "Video Action Differencing". Keen to chat with anyone interested in MLLMs - both for general data & for scientific reasoning
🚨Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert’s? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵
🤗The SmolVLM report is out, with all the experiments, findings, and insights that led to high performance at tiny sizes🤏. 📱These models can run on most mobile/edge devices. 📖Give it a look!
Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models. 🔥 Explaining how to design a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B models from 18 months ago! Here are the coolest insights from our experiments: ✨…
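For a sense of scale, here is a rough usage sketch with 🤗 transformers. The model id and chat-template calls follow the public SmolVLM model card rather than this thread, so treat them as assumptions, not details from the report.

```python
# Rough sketch of running the 256M SmolVLM instruct checkpoint.
# Model id and processor usage are assumed from the Hugging Face model card.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"   # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float32)

image = Image.open("example.jpg")                  # any local image

# One user turn: an image placeholder followed by a text question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image briefly."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```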
Earlier this year, we released the BIOMEDICA dataset, featuring 24 million unique image-caption pairs and 30 million image references derived from open-source biomedical literature. It's been great to see the community engaging with it—we're currently seeing around 6K downloads…