Ethan Chern
@ethanchern
Master's in AI at the Language Technologies Institute (LTI), School of Computer Science (SCS) @ CMU
We present Thinking with Generated Images, demonstrating how a single unified LMM can be trained to perform vision generation tasks using both textual and visual intermediate steps, along with critique and refinement capabilities. We believe that spontaneous multimodal thinking…
What if AI could mentally sketch its thoughts, just like you? The missing piece of AI multimodal reasoning is here! What if AI could daydream in images? Closing the imagination gap between humans and AI! Introducing Thinking with Generated Images — a new paradigm where large…
Ever seen your VLM go rogue? 🤯 We found a novel bug: VLMs for vision tasks can leak <|image_pad|> tokens into their text responses! A subtle yet critical issue we believe is a first. This sneaky leakage can CRASH your RL training when recomputing log_probs! 😱 We pinpointed…
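A minimal sketch of the kind of guard this implies: scan rollout texts for the leaked special token before recomputing log-probs, and drop contaminated samples. The token string and helper names are illustrative, not the authors' actual code.

```python
# Hypothetical guard against special-token leakage in VLM rollouts.
# IMAGE_PAD matches the Qwen-style vision pad token mentioned in the tweet.
IMAGE_PAD = "<|image_pad|>"

def find_leaked_pads(responses):
    """Return indices of rollouts whose text contains the image pad token."""
    return [i for i, text in enumerate(responses) if IMAGE_PAD in text]

def filter_rollouts(responses):
    """Drop leaked rollouts so log-prob recomputation only sees clean text."""
    bad = set(find_leaked_pads(responses))
    return [t for i, t in enumerate(responses) if i not in bad]

clean = filter_rollouts(["ok answer", f"leak {IMAGE_PAD} here"])
```

Filtering (or masking) before the log-prob pass is one plausible mitigation; the paper's actual fix may differ.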
This paper makes a bold claim! AlphaGo Moment for Model Architecture Discovery The researchers introduce ASI-Arch, the first Artificial Superintelligence for AI Research (ASI4AI), enabling fully automated neural architecture innovation. No human-designed search space. No human…
Thank you very much for sharing! Check out our resources: Datasets & Models: huggingface.co/MegaScience Code base: github.com/GAIR-NLP/MegaS… Scientific Evaluation System: github.com/GAIR-NLP/lm-op…
MegaScience Pushing the Frontiers of Post-Training Datasets for Science Reasoning
🚨 New release: MegaScience The largest & highest-quality post-training dataset for scientific reasoning is now open-sourced (1.25M QA pairs)! 📈 Trained models outperform official Instruct baselines 🔬 Covers 7+ disciplines with university-level textbook-grade QA 📄 Paper:…
Apart from the performance, it’s pure entertainment just watching Qwen3‑Coder build Qwen Code all by itself. Agentic coding is really something: it explores, understands, plans, and acts seamlessly. Honored to be “in the game”—even if my entire work so far is smashing the Enter…
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Excited to share that our two papers have been accepted to #ICML2025! @icmlconf However, I can't be there in person due to visa issues. What a pity.🥲 Feel free to check out our poster, either online or offline at the Vancouver Convention Center. Programming Every Example:…
🎉 Excited to announce that LIMO has been accepted by COLM2025 @COLM_conf ! We'll be releasing an updated paper soon with detailed data construction processes and a new version of dataset - smaller in size but with better performance. Stay tuned!
🤔 How many examples does an LLM need to learn competition-level math? Conventional wisdom: 100,000+ examples Our discovery: Just 817 carefully chosen ones 🤩 With pure SFT, LIMO achieves: 57.1% on AIME 94.8% on MATH LIMO: Less is More for Reasoning 📝 🔗 arxiv.org/pdf/2502.03387
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:…
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: ethanc111.github.io/factool_websit… (1/n)
Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
OctoThinker Mid-training Incentivizes Reinforcement Learning Scaling
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…
🐙Octothinker tech report is finally out! We also release the 70B math-focused mid-training dataset -- MegaMath-Web-Pro-Max. Hope you'll find it useful!🤗 👇 hf.co/datasets/OctoT… huggingface.co/papers/2506.20…
Say hi to 🐙 OctoThinker — our new mid-training efforts for building strong reasoning base models tailored for the RL scaling era. Still a WIP, but we're excited to share our early insights into rethinking base model development. 📖 Blog: tinyurl.com/OctoThinker 🤗 Huggingface:…
Say hi to 🔮MegaMath-Pro-Max High-quality corpora are vital for mid-training. What about the math domain? Let me share the recipe behind it. 1. Curating Pipeline Step 1: uniformly and randomly sample millions of documents from the MegaMath-Web corpus, stratified by…
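Step 1 above (uniform random sampling stratified by some document attribute) can be sketched roughly like this; the stratum key and sizes are placeholders, since the tweet is truncated before naming the actual stratification variable.

```python
import random
from collections import defaultdict

def stratified_sample(docs, key, per_stratum, seed=0):
    """Uniformly sample up to `per_stratum` docs from each stratum.

    `key` maps a document to its stratum label (e.g. source domain);
    the real MegaMath pipeline's stratification variable is not shown here.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for d in docs:
        buckets[key(d)].append(d)
    sample = []
    for group in buckets.values():
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample
```

Stratifying keeps rare-but-valuable strata represented instead of letting a plain uniform draw mirror the (skewed) corpus distribution.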
The technical report is out (along with MegaMath-Web-Pro-Max)! Paper: arxiv.org/abs/2506.20512 Data: huggingface.co/datasets/OctoT… (uploading...
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling "we investigate how mid-training strategies shape RL dynamics, focusing on two representative model families: Qwen and Llama." "we introduce a two-stage mid-training strategy, Stable-then-Decay, in which base…
Finally had a bit of time to jot down some thoughts on this solid, open data engineering work from @essential_ai. This work brings Essential-Web, a 24T-token pre-training corpus, to the open-source community. I've always appreciated open-source research, as it can significantly…
[1/5] 🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
The real breakthrough isn't better AI—it's breaking free from nature's constraints We're witnessing a paradigm shift from "passive adaptation" to "active construction" in AI training. 🌊 The old way: AI learns from whatever data naturally exists • Constrained by existing…
📑Interesting paper by GAIR community Thinking with Generated Images🔥 enables a single large multimodal model to generate and reason with visual thoughts, greatly improving its ability to tackle complex vision and multimodal tasks. huggingface.co/papers/2505.22……
Check out our latest work on self-improving LLMs, where we test whether LLMs can use their internal self-consistency as a reward signal to bootstrap themselves via RL. TL;DR: they can, to some extent, but eventually end up reward-hacking the self-consistency objective. We try to see…
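One common way to turn self-consistency into a reward, sketched below: sample several rollouts per prompt, extract final answers, and reward agreement with the majority answer. This is a generic illustration of the idea, not the paper's implementation, and it also shows why the objective is hackable: a model that collapses to any single answer scores perfectly.

```python
from collections import Counter

def self_consistency_reward(answers):
    """Reward each sampled final answer by agreement with the majority.

    `answers` are answers extracted from multiple rollouts of one prompt.
    Returns 1.0 for rollouts matching the majority vote, else 0.0.
    """
    counts = Counter(answers)
    majority, _ = counts.most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]
```

Note the failure mode: since no ground truth enters the reward, "always output X" maximizes it, which matches the reward-hacking behavior described above.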