Dhruv Gautam
@dhrvji
generalization @berkeley_ai @arcinstitute
Really fun project I worked on at @arcinstitute, very bullish on how perturbation models will transform the way research is done over the next few years
Introducing Arc Institute’s first virtual cell model: STATE
Teaching computer vision next semester? Hoping to finally learn about diffusion models in 2025? Check out this diffusion project that we designed and test-drove this past semester at Berkeley and Michigan!
Routing is quickly becoming a necessity to be at the frontier: dhruvji.github.io/routing.html New post where I describe the broader implications of labs having to create their own RL data; the real frontier will soon just be routing correctly between "frontier" models.
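The post's claim can be made concrete with a minimal sketch of a query router. Everything here is an illustrative assumption (the model names, the keyword heuristic, the `route` function itself) and not from the post, which argues the idea at a higher level.

```python
# Hypothetical sketch: dispatch each query to one of several "frontier" models.
# Model names and the keyword heuristic are made-up placeholders; a real router
# would likely be a learned classifier over the query, not keyword matching.

def route(query: str) -> str:
    """Pick a backend model for a query with a trivial keyword heuristic."""
    math_markers = ("prove", "integral", "derivative", "theorem")
    code_markers = ("bug", "compile", "stack trace", "refactor")
    q = query.lower()
    if any(m in q for m in math_markers):
        return "reasoning-model"   # placeholder name
    if any(m in q for m in code_markers):
        return "code-model"        # placeholder name
    return "general-model"         # default fallback
```

The point of the sketch is only the shape: the router, not any single model, becomes the user-facing "frontier" system.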
Next-token prediction is more customizable than you think: dhruvji.github.io/next_token.html New post where I wrote down some thoughts on two (imo) under-explored directions in ML: understanding models with models and training low-latency models to predict discrete actions.
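The second direction in the post, low-latency models predicting discrete actions, is essentially next-token prediction over an action vocabulary. A toy sketch, with a made-up action log and a bigram count model standing in for a real learned model:

```python
# Toy sketch: treat a user's action log as a token stream and "train" a
# next-action predictor by counting bigrams. The action names and log are
# invented for illustration; a real system would use a small neural model.
from collections import Counter, defaultdict

def fit_bigrams(log):
    """Count, for each action, how often each action follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(log, log[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, action):
    """Greedy decode: the most frequent successor, or None if unseen."""
    if action not in counts:
        return None
    return counts[action].most_common(1)[0][0]

log = ["open", "scroll", "click", "open", "scroll", "click", "open", "close"]
model = fit_bigrams(log)
```

The framing carries over directly: swap the counter for a small transformer and the action log for tokenized user events, and you have a low-latency next-action model.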
Register today for the Virtual Cell Challenge and use AI to solve one of biology’s most complex problems. Announced in @CellCellPress, the competition is hosted by Arc Institute and sponsored by @nvidia, @10xGenomics, and @UltimaGenomics.
Cells are dynamic, messy, and context-dependent. Scaling models across diverse states requires the flexibility to capture heterogeneity. Introducing State, a transformer that predicts perturbation effects by training over sets of cells. Team effort led by the unstoppable @abhinadduri
Over the last 2 weeks, I took a deep dive into Evo 2, Arc's genomic foundation model. But I couldn't find a crisp primer on Evo 2 that covered the decisions behind the ML architecture, the inference-time scaling results, or the mechanistic interpretability results. So, I wrote one!
Announcing Evo 2: The largest publicly available AI model for biology to date, capable of understanding and designing genetic code across all three domains of life. arcinstitute.org/manuscripts/Ev…
Thrilled to share our @IEEESSP '25 work "Myco 🌳🍄: Unlocking Polylogarithmic Accesses in Metadata-Private Messaging" with @deevashwer, @kean00reeves, @ralucaadapopa. We break a decade-old asymptotic barrier in cryptographic metadata-private messaging. eprint.iacr.org/2025/687👇
One of the craziest projects I’ve been a part of, excited for more to come from @AutoScienceAI!
Introducing Carl, the first AI system to create a research paper that passes peer review. Carl's work was just accepted at an @ICLR_conf workshop on the Tiny Papers track. Carl forms new research hypotheses, tests them & writes up results. Learn more: autoscience.ai/blog/meet-carl…
last year i learned that writing ML valentines by hand doesn't scale... so this year i made a website :)
wrote my first personal blog. tl;dr: recent thoughts about the human brain and predicting the effects of experiences dhruvji.github.io/mind_transform…
Some thoughts on how to think about "world models" in language models and beyond: lingo.csail.mit.edu/blog/world_mod…
We should call models like Llama 3, Mixtral, etc. “open-weight models”, not “open-source models”. For a model to be open-source, the code and training data need to be public (good examples: GPT-J, OLMo, RedPajama, StarCoder, K2, etc.). Weights are like an exe file, which would be…