Victoria X Lin
@VictoriaLinML
Research Scientist @AIatMeta | MoMa🖼 • RA-DIT🔍 • OPT-IML | Ex: @SFResearch | PhD @uwcse 📜 http://threads.net/@v.linspiration 🌴 Bay Area
1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770). MoMa employs a mixture-of-experts (MoE) framework with modality-specific expert groups. Given any…
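The tweet is truncated, but the headline idea (a sparse MoE layer whose experts are partitioned into modality-specific groups, so image tokens and text tokens each route within their own pool) is easy to sketch. A minimal PyTorch version assuming simple top-1 token-choice routing; MoMa itself uses expert-choice routing and other refinements, and all names here are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareMoE(nn.Module):
    """Toy MoMa-style layer: each modality owns its own expert group."""
    def __init__(self, d_model: int, experts_per_modality: int, n_modalities: int = 2):
        super().__init__()
        # one router and one expert pool per modality
        self.routers = nn.ModuleList(
            nn.Linear(d_model, experts_per_modality) for _ in range(n_modalities)
        )
        self.experts = nn.ModuleList(
            nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(experts_per_modality)
            )
            for _ in range(n_modalities)
        )

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: (seq, d_model); modality: (seq,) with values in {0, ..., n_modalities-1}
        out = torch.zeros_like(x)
        for m, (router, experts) in enumerate(zip(self.routers, self.experts)):
            idx = (modality == m).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            tokens = x[idx]
            gate = F.softmax(router(tokens), dim=-1)  # weights over this group only
            top = gate.argmax(dim=-1)                 # top-1 expert per token
            for e, expert in enumerate(experts):
                sel = (top == e).nonzero(as_tuple=True)[0]
                if sel.numel():
                    out[idx[sel]] = gate[sel, e:e + 1] * expert(tokens[sel])
        return out
```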

We don't often see a prep thread for a paper announcement on X, but this mini crash course on the information capacity of LLMs is well worth checking out
in prep for our new research dropping on arXiv tomorrow (I think), here is a thread about... CAPACITY MEASUREMENTS FOR LANGUAGE MODELS 🧵
the scale of data collection in AI labs pales in comparison to 2010s Google. it's mostly web scraping and data labeling. compare that to diligently photographing the streets of every country, mapping Earth via satellite, scanning every book known to man... now *that* was ambitious
and yet david burdeny still did it better in 2007 with a camera
I think this was the first AI image to really strike me. The first one to make me think that people were going to use this stuff to make very interesting works. Feels like a million years ago now.
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-autoregressive (non-AR) model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
So this is not a benchmark for software engineering agents. It’s meant to test core reasoning and intelligence through coding—backed by 71 pages of deep analysis from some of the best competitive programmers out there. This effort was carried out by students across multiple…
We introduce LiveCodeBench Pro, a live, exceptionally challenging benchmark comprising competitive programming problems sourced from IOI, Codeforces, and ICPC. Frontier models such as o3 and Gemini 2.5 achieve scores of 0% on the Hard split. Leaderboard: livecodebenchpro.com
splitting transformer parameters by ⭐Understanding (X→text) vs. 📷Generation (X→image) functionality. We already did that in LMFusion.
Let's talk about Mixture-of-Transformers (MoT) and heterogeneous omni-model training. 1. Inspired by prior architectures consisting of modality-specific parameters—such as Flamingo, CogVLM, BEIT-3, and MoMa—MoT (arxiv.org/abs/2411.04996) pushes this idea further by using…
Mixture-of-Transformers (MoT) is gaining traction in new model designs. Here's a visual breakdown of how it works 🧠👇
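For readers who want the idea in code rather than a diagram, here is a minimal sketch of "modality-specific parameters, global attention." This is my simplification, not the official implementation (which has details like caching, norm placement, and pipelining from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoTBlock(nn.Module):
    """Toy Mixture-of-Transformers block: every dense parameter (QKV/output
    projections, FFN, layer norms) is duplicated per modality, while
    self-attention is computed globally over the full mixed-modal sequence."""
    def __init__(self, d: int, n_heads: int, n_modalities: int = 2):
        super().__init__()
        self.n_heads = n_heads
        per_mod = lambda make: nn.ModuleList(make() for _ in range(n_modalities))
        self.qkv = per_mod(lambda: nn.Linear(d, 3 * d))
        self.out = per_mod(lambda: nn.Linear(d, d))
        self.ffn = per_mod(lambda: nn.Sequential(
            nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)))
        self.norm1 = per_mod(lambda: nn.LayerNorm(d))
        self.norm2 = per_mod(lambda: nn.LayerNorm(d))

    @staticmethod
    def _apply(mods, x, modality):
        # route each token through its own modality's module, preserving order
        outs = [(modality == m, mod(x[modality == m])) for m, mod in enumerate(mods)]
        result = x.new_zeros(x.shape[0], outs[0][1].shape[-1])
        for mask, y in outs:
            result[mask] = y
        return result

    def forward(self, x, modality):  # x: (seq, d); modality: (seq,)
        qkv = self._apply(self.qkv, self._apply(self.norm1, x, modality), modality)
        q, k, v = qkv.chunk(3, dim=-1)
        d_head = q.shape[-1] // self.n_heads
        split = lambda t: t.view(-1, self.n_heads, d_head).transpose(0, 1)
        attn = F.scaled_dot_product_attention(split(q), split(k), split(v))  # shared, global
        attn = attn.transpose(0, 1).reshape(x.shape[0], -1)  # back to (seq, d)
        x = x + self._apply(self.out, attn, modality)
        return x + self._apply(self.ffn, self._apply(self.norm2, x, modality), modality)
```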
Surprising result! Spurious rewards (even random rewards) boost RLVR performance on Qwen models, but not on OLMo or others. The paper explores some hypotheses, but it's still unclear why. Key takeaway: always validate across base models when probing reasoning with RLVR.
🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
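For concreteness, here is roughly what those reward variants mean as drop-in reward functions for an RLVR trainer. A toy sketch in the spirit of the paper, with `extract_answer` as a stand-in for real answer parsing and the actual training loop (GRPO on Qwen2.5-Math-7B) omitted:

```python
import random

def extract_answer(completion: str) -> str:
    # toy extraction: take whatever follows '####' (stand-in for real parsing)
    return completion.split("####")[-1].strip()

def ground_truth_reward(completion: str, answer: str) -> float:
    # standard RLVR: reward 1 iff the extracted answer matches the reference
    return float(extract_answer(completion) == answer)

def incorrect_reward(completion: str, answer: str) -> float:
    # reward only *wrong* answers
    return float(extract_answer(completion) != answer)

def random_reward(completion: str, answer: str) -> float:
    # a coin flip that ignores the model output entirely
    return float(random.random() < 0.5)
```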
ByteDance | Seed has been consistently impressive over the past few months, publishing some truly insightful papers. BAGEL is one of them. I learned a lot from reading it. A few key takeaways: - Embedded "thinking" directly into native media generation, proving its effectiveness…
🚀 BAGEL — the Unified Multimodal Model with emergent capabilities and production-ready performance — is finally live! Dive in here: 👉 bagel-ai.org
I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (e.g., DPO, GRPO, thinking), minor updates to the MARL and MBRL chapters and various sections (e.g., offline RL, DPG, etc.). Enjoy! arxiv.org/abs/2412.05265
This is really cool work! I wonder if we could generalize even better by introducing modality as a feature embedding to the router instead. That is, the router gets privileged information.
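Roughly what that suggestion could look like, as an entirely hypothetical sketch (this is the commenter's idea, not something from either paper):

```python
import torch
import torch.nn as nn

class ModalityConditionedRouter(nn.Module):
    """Instead of hard modality-specific expert groups, give the router a
    learned modality embedding as privileged input and let it route over
    one shared pool of experts."""
    def __init__(self, d_model: int, n_experts: int, n_modalities: int = 2):
        super().__init__()
        self.modality_emb = nn.Embedding(n_modalities, d_model)
        self.gate = nn.Linear(2 * d_model, n_experts)

    def forward(self, x: torch.Tensor, modality: torch.Tensor) -> torch.Tensor:
        # x: (seq, d_model); modality: (seq,) integer ids
        feats = torch.cat([x, self.modality_emb(modality)], dim=-1)
        return self.gate(feats).softmax(dim=-1)  # routing weights over all experts
```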
🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced! 📌 GitHub repo: github.com/facebookresear… 📄 Paper: arxiv.org/abs/2411.04996 How can we reduce pretraining costs for…
Meet ReasonIR-8B✨the first retriever specifically trained for reasoning tasks! Our challenging synthetic training data unlocks SOTA scores on reasoning IR and RAG benchmarks. ReasonIR-8B ranks 1st on BRIGHT and outperforms search engine and retriever baselines on MMLU and GPQA🔥
We should host more top ML conferences (ICLR, ICML, NeurIPS) in Asia
Our previous work showed that 𝐜𝐫𝐞𝐚𝐭𝐢𝐧𝐠 𝐯𝐢𝐬𝐮𝐚𝐥 𝐜𝐡𝐚𝐢𝐧‑𝐨𝐟‑𝐭𝐡𝐨𝐮𝐠𝐡𝐭𝐬 𝐯𝐢𝐚 𝐭𝐨𝐨𝐥 𝐮𝐬𝐞 significantly boosts GPT‑4o’s visual reasoning performance. Excited to see this idea incorporated into OpenAI’s o3 and o4‑mini models (openai.com/index/thinking…).…
Visual Chain-of-Thought with ✏️Sketchpad Happy to share ✏️Visual Sketchpad accepted to #NeurIPS2024. Sketchpad thinks🤔by creating visual reasoning chains for multimodal LMs, enhancing GPT-4o's reasoning on math and vision tasks We’ve open-sourced code: visualsketchpad.github.io
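For flavor, a toy version of the tool side of such a loop: the LM emits drawing commands, a renderer executes them, and the rendered image goes back into the LM's context as the next "thought." Everything here is a simplified stand-in; the real Sketchpad supports much richer operations (plotting, detection, zooming):

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def sketch_tool(segments):
    """Draw the line segments the LM proposed and return a rendered PNG."""
    fig, ax = plt.subplots()
    for (x0, y0), (x1, y1) in segments:
        ax.plot([x0, x1], [y0, y1])
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()  # image bytes appended to the multimodal LM's context

# e.g., the LM sketches two auxiliary lines while reasoning about a geometry problem
png_bytes = sketch_tool([((0, 0), (1, 1)), ((0, 1), (1, 0))])
```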
Our Llama 4’s industry-leading 10M+ multimodal context length (20+ hours of video) has been a wild ride. The iRoPE architecture I’d been working on helped a bit with the long-term infinite-context goal toward AGI. Huge thanks to my incredible teammates! 🚀Llama 4 Scout 🔹17B…
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
Introducing DRAMA🎭: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers. We propose to train a smaller dense retriever using a pruned LLM as the backbone, fine-tuned with diverse LLM data augmentations. With single-stage training, DRAMA achieves strong…
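A minimal sketch of the usage pattern (a pruned-LLM backbone used as a dense embedder with mean pooling); the checkpoint name below is a hypothetical placeholder and the pooling choice is an assumption, so check the paper and repo for the real recipe:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

name = "facebook/drama-base"  # hypothetical identifier; see the release for real weights
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(texts):
    """Mean-pooled, L2-normalized embeddings from the backbone."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state     # (batch, seq, d)
    mask = batch["attention_mask"].unsqueeze(-1)      # ignore padding positions
    emb = (hidden * mask).sum(1) / mask.sum(1)
    return F.normalize(emb, dim=-1)

query = embed(["what is mixture-of-experts?"])
doc = embed(["MoE layers route each token to a small subset of experts..."])
score = (query @ doc.T).item()  # cosine similarity as the relevance score
```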
we've been working on democratizing fast kernel writing on the @PyTorch team. try the challenge, whether it's you or your AI!
Write a fast kernel and run it on Discord. See how you compare against the best! If you're familiar with Leetcode, Kaggle, or Codeforces, then this should feel right at home
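If you've never written one, the canonical starter kernel gives the flavor: a Triton vector add (this is the standard tutorial kernel, not one of the leaderboard problems; it needs a CUDA GPU to run):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                            # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)  # my slice of the vector
    mask = offsets < n_elements                            # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(1 << 20, device="cuda")
y = torch.rand(1 << 20, device="cuda")
assert torch.allclose(add(x, y), x + y)
```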
New features added to MassiveDS-pipe to make it painless to build and serve a trillion-token datastore:
1. Distributed API serving (<30ms latency);
2. Efficient indices: IVF-Flat, IVF-PQ;
3. Memory-free fast passage loading.
It has been adopted by AI2 OpenScholar and Meta EWE 🥳
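For reference, building one of those efficient indices looks like this with FAISS (a generic IVF-PQ sketch on random vectors, not MassiveDS-pipe's actual configuration):

```python
import faiss
import numpy as np

d, nlist, m, nbits = 768, 1024, 64, 8    # dim, IVF clusters, PQ subquantizers x bits

quantizer = faiss.IndexFlatL2(d)         # coarse quantizer for the IVF lists
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

xb = np.random.rand(100_000, d).astype("float32")
index.train(xb)                          # learn cluster centroids + PQ codebooks
index.add(xb)

index.nprobe = 32                        # IVF lists scanned per query (speed/recall knob)
distances, ids = index.search(xb[:5], k=10)
```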