Weijia Shi
@WeijiaShi2
PhD student @uwnlp @allen_ai | Prev @MetaAI @CS_UCLA | 🏠 http://weijiashi.notion.site
Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
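The opt-in/opt-out idea in the FlexOlmo tweets above can be sketched as a toy mixture-of-experts where each data owner contributes one expert that can be withdrawn at inference. Everything here (the `Expert`/`MixtureOfExperts` names, the linear "experts", the uniform routing) is an illustrative assumption, not the paper's actual architecture:

```python
# Toy opt-in/opt-out mixture-of-experts, loosely inspired by the FlexOlmo
# idea above. All names and the routing rule are illustrative assumptions.

class Expert:
    def __init__(self, name, weight, bias):
        self.name = name          # identifies the contributing data owner
        self.weight = weight
        self.bias = bias

    def forward(self, x):
        return self.weight * x + self.bias   # stand-in for a real FFN expert

class MixtureOfExperts:
    def __init__(self, experts):
        self.experts = experts
        self.active = {e.name for e in experts}   # all opted in by default

    def opt_out(self, name):
        # a data owner withdraws their expert at inference time
        self.active.discard(name)

    def forward(self, x):
        used = [e for e in self.experts if e.name in self.active]
        if not used:
            raise ValueError("no experts opted in")
        # uniform routing over opted-in experts (real routers are learned)
        return sum(e.forward(x) for e in used) / len(used)

moe = MixtureOfExperts([Expert("owner_a", 2.0, 0.0), Expert("owner_b", 4.0, 0.0)])
print(moe.forward(1.0))   # both experts: (2 + 4) / 2 = 3.0
moe.opt_out("owner_b")
print(moe.forward(1.0))   # owner_b withdrawn: 2.0
```

The point of the sketch: opting out removes that owner's parameters from the forward pass entirely, rather than trying to unlearn their data from a monolithic model.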
Phase 1 of Physics of Language Models code release ✅our Part 3.1 + 4.1 = all you need to pretrain strong 8B base model in 42k GPU-hours ✅Canon layers = strong, scalable gains ✅Real open-source (data/train/weights) ✅Apache 2.0 license (commercial ok!) 🔗github.com/facebookresear…
(1/8)🍎A Galileo moment for LLM design🍎 As the Pisa Tower experiment sparked modern physics, our controlled synthetic pretraining playground reveals the true limits of LLM architectures. A turning point that might divide LLM research into "before" and "after." physics.allen-zhu.com/part-4-archite…
I’m gonna be recruiting students thru both @LTIatCMU (NLP) and @CMU_EPP (Engineering and Public Policy) for fall 2026! If you are interested in reasoning, memorization, AI for science & discovery, and of course privacy, you can catch me at ACL! Prospective students fill this form:
📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!
🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles Our OpenVLThinker-v1.2 is trained through three lightweight SFT → RL cycles, where SFT first “highlights” reasoning behaviors and RL then explores and…
Counting down the days until ACL, hosting another @aclmentorship session, diving into timely topics: when it is so tempting to rely on AI for writing, what role do we play, and what might we be losing? #ACL2025NLP #NLProc
📢 Join us for the ACL Mentorship Session @aclmeeting #ACL2025NLP #NLProc • Session Link: mentorship.aclweb.org/schedule • Ask Questions: tinyurl.com/y2v2j462 Mentors: • @May_F1_ (@hkust) • @d_aumiller (@cohere) • @vernadankers (@Mila_Quebec) • @ziqiao_ma (@UMichCSE) •…
Can we build an operating system entirely powered by neural networks? Introducing NeuralOS: towards a generative OS that directly predicts screen images from user inputs. Try it live: neural-os.com Paper: huggingface.co/papers/2507.08… Inspired by @karpathy's vision. 1/5
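The core loop the NeuralOS tweet describes is simple to state: the next screen frame is predicted from the previous frame plus the user's input event. A minimal sketch, where `render` is a placeholder standing in for the neural renderer:

```python
# Toy sketch of the NeuralOS loop as described above: the "OS" is a model
# that maps (previous frame, user event) -> next frame. render() here is a
# placeholder assumption, not the actual neural renderer.

def render(prev_frame, event):
    # placeholder neural renderer: the new frame reflects the event
    return prev_frame + [event]

frame = []                                   # blank screen
for event in ["click(10,20)", "type('ls')", "key(Enter)"]:
    frame = render(frame, event)             # screen predicted frame-by-frame
print(frame)
```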
"Chatting" with an LLM feels like using an 80s computer terminal. The GUI hasn't been invented yet, but imo some properties of it can start to be predicted. 1 it will be visual (like GUIs of the past) because vision (pictures, charts, animations, not so much reading) is the 10-lane…
How to write good reviews & rebuttals? We've invited 🌟 reviewers to share their expertise in person at our ACL mentorship session #ACL2025NLP next week
🧵 Academic job market season is almost here! There's so much rarely discussed—nutrition, mental and physical health, uncertainty, and more. I'm sharing my statements, essential blogs, and personal lessons here, with more to come in the upcoming weeks! ⬇️ (1/N)
Building AI reasoning models with extremely long context lengths - think days, weeks, even years of context - is the next big challenge in AI. That's why I'm extremely excited about the latest work from Ao Qu @ao_qu18465, incoming PhD student in our group, on MEM1: RL for Memory…
🚀 Excited to share my first tweet and to introduce our latest work: MEM1: RL for Memory Consolidation in Long-Horizon Agents. Long-horizon agents (e.g., deep research, web agents) typically store all observations, actions, and intermediate thoughts in context. However, much of…
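The consolidation idea above can be sketched in a few lines: instead of appending every observation to an ever-growing context, the agent keeps a bounded memory and merges old entries when the budget is exceeded. The merge rule here is a hand-written placeholder; MEM1's contribution is learning that consolidation policy with RL:

```python
# Minimal sketch of memory consolidation for a long-horizon agent, in the
# spirit of the MEM1 tweet above. The consolidate() rule is a placeholder
# assumption; MEM1 learns the policy with RL.

class ConsolidatingAgent:
    def __init__(self, budget):
        self.budget = budget      # max items kept in context
        self.memory = []

    def observe(self, item):
        self.memory.append(item)
        if len(self.memory) > self.budget:
            self.consolidate()

    def consolidate(self):
        # placeholder policy: merge the two oldest entries into one summary
        a, b = self.memory[0], self.memory[1]
        self.memory = [f"summary({a}+{b})"] + self.memory[2:]

agent = ConsolidatingAgent(budget=3)
for step in ["obs1", "obs2", "obs3", "obs4", "obs5"]:
    agent.observe(step)
print(agent.memory)   # context stays within the budget no matter how long the episode
```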
Since our initial arXiv post, several concurrent papers have introduced new architectures with log-linear properties in various forms. Two personal favorites of mine (among others) are: - Transformer-PSM by @MorrisYau et al., and - Radial Attention by Xingyang and @lmxyy1999 et…
We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with: - Log-linear time training - Log-time inference (in both time and memory) - Hardware-efficient Triton kernels
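One way to build intuition for "between attention and linear attention": linear attention keeps a single running state, full attention keeps all T tokens, and a log-linear scheme keeps O(log T) multi-resolution summaries of the prefix. The binary-carry merge below is my own simplified analogy (a Fenwick-tree-style counter over summaries), not the paper's algorithm:

```python
# Hedged toy illustrating the "O(log T) states" idea: push a size-1 summary
# per token, then merge equal-size neighbors like a binary carry, so the
# number of surviving summaries is the popcount of T. This is an analogy to
# the log-linear attention above, not its actual algorithm.

def push(states, token):
    states.append((1, float(token)))          # (count, mean) summary
    while len(states) >= 2 and states[-1][0] == states[-2][0]:
        (n1, m1), (n2, m2) = states[-2], states[-1]
        # merge two equal-size summaries into one twice as large
        states[-2:] = [(n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2))]
    return states

states = []
T = 100
for t in range(T):
    push(states, t)
print(len(states), "summary states for", T, "tokens")   # popcount(100) = 3
```

A query would then attend over these few summaries instead of all T tokens, giving the log-time inference flavor.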
WHY do you prefer something over another? Reward models treat preference as a black box😶🌫️ but human brains🧠 decompose decisions into hidden attributes. We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵
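The decomposition idea in the tweet above, in its simplest possible form: score each response on named attributes, then combine them with explicit weights, so the "why" behind a preference is inspectable. The attributes and weights below are invented for illustration; PrefPalette's actual attribute set and model differ:

```python
# Toy attribute-decomposed preference model, illustrating the idea above.
# Attribute names and weights are made-up assumptions, not PrefPalette's.

ATTRIBUTE_WEIGHTS = {"helpfulness": 0.5, "brevity": 0.2, "politeness": 0.3}

def preference_score(attribute_scores):
    # transparent linear combination instead of a black-box scalar reward
    return sum(ATTRIBUTE_WEIGHTS[a] * s for a, s in attribute_scores.items())

resp_a = {"helpfulness": 0.9, "brevity": 0.2, "politeness": 0.8}
resp_b = {"helpfulness": 0.6, "brevity": 0.9, "politeness": 0.7}
print(preference_score(resp_a), preference_score(resp_b))
# the breakdown shows resp_a wins on helpfulness despite losing on brevity
```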
Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems (@mpi_sws_) this Fall!🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
Check out @YuncongYY's post on test-time scaling for spatial reasoning with world models!
Test-time scaling nailed code & math—next stop: the real 3D world. 🌍 MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering. One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀 🧵1/
Spatial reasoning from a single image is inherently difficult, but it becomes significantly easier when leveraging a controlled world model, analogous to the mental models used by humans! Code: github.com/UMass-Embodied…
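The test-time exploration loop described in the MindJourney tweets can be caricatured in a few lines: before answering, spend compute "imagining" the scene from candidate viewpoints via a world model and keep the most informative one. Both functions below are placeholder assumptions, not the actual VLM or video-diffusion model:

```python
# Toy sketch (my own illustration, not MindJourney's code) of test-time
# scaling with a world model: imagine several viewpoints, pick the best.

def world_model(scene, viewpoint):
    """Stand-in for a video-diffusion world model: score how well this
    imagined viewpoint reveals the hidden target."""
    return -abs(scene["target_angle"] - viewpoint)

def answer_with_exploration(scene, candidate_views):
    # extra test-time compute goes into exploring imagined views
    # instead of answering from the single given frame
    return max(candidate_views, key=lambda v: world_model(scene, v))

scene = {"target_angle": 90}
print(answer_with_exploration(scene, [0, 45, 90, 135]))   # picks 90
```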
Gemini + Deep Think won IMO gold this year 🏅 super honored to be part of this dream team!
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Come check out recent work on history guided video diffusion tomorrow!
Come visit our #ICML2025 poster on Diffusion Forcing Transformer tomorrow! Stop by to chat about sequence/video diffusion, or anything related to generative and world models. I’ll be presenting with @du_yilun on Thursday, 4:30–7pm at West Hall B2-B3 (#W-205).
At #ICML2025, I am super excited to introduce STAMP. This is a marriage b/w dataset inference & watermarking that finally(!) lets creators PROVE their content was used to train LLMs🔍 It's a MAJOR push taking the academic problem into the real world. w/ Saksham Rastogi @danish037 🧵
I am at #ICML2025! 🇨🇦🏞️ Catch me: 1️⃣ Today at the @WiMLworkshop mentoring roundtables (1-2pm in W211-214) 2️⃣ Presenting this paper👇 tomorrow 11-11:30 at East #1205 3️⃣ At the Actionable Interpretability @ActInterp workshop on Saturday in East Ballroom A (I’m an organizer!)
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!
Check out our work led by @Cumquaaa on a hybrid autoregressive-diffusion architecture for image generation -- it flexibly balances the number of autoregressive and diffusion layers for optimal generation quality and inference speed! Autoregressive vs. diffusion -- you don't have…
🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the…
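The division of labor described above (AR for a fast coarse guess, diffusion for refinement) can be sketched with scalars standing in for image patches. Both functions are invented placeholders, not MADFormer's layers:

```python
# Toy sketch of the hybrid AR + diffusion idea above (my own illustration,
# not MADFormer's code): AR makes a fast coarse next-patch guess, then a
# few diffusion-style steps refine it.

def ar_guess(history):
    # fast autoregressive prediction: naive linear continuation (placeholder)
    return history[-1] + (history[-1] - history[-2])

def refine(guess, target_hint, steps=4):
    # diffusion-style refinement: iteratively move the coarse guess toward
    # the target to mimic denoising (target is known here only for the demo)
    x = guess
    for _ in range(steps):
        x = x + 0.5 * (target_hint - x)
    return x

history = [1.0, 2.0]                 # AR context
coarse = ar_guess(history)           # fast guess: 3.0
refined = refine(coarse, target_hint=3.5)
print(coarse, refined)               # refinement lands closer to the target
```

The speed/quality trade-off the tweet mentions corresponds to how many layers (here, refinement steps) you give each half.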
If you can't make it to ICML and want to learn more about @du_yilun's work, check out the great talk he gave at the #KempnerInstitute's #NeuroAI2025 symposium: youtube.com/watch?v=UKbLBO… #AI #NeuroAI
I'll be at @icmlconf! Will help present: - Scene Understanding with Generative Models (shorturl.at/JrvJL) - History-guided World Models (shorturl.at/lCkfc) - Adaptable World Models (shorturl.at/99Xmw) We'll also host a workshop on physical world models!
I'll be hiring a couple of Ph.D. students at CMU (via LTI or MLD) in the upcoming cycle! If you are interested in joining my group, please read the FAQ before reaching out to me via email :) docs.google.com/document/d/12V…