Ethan Chern
@ethanchern
Master's in AI at the Language Technologies Institute (LTI), School of Computer Science (SCS) @ CMU
We present Thinking with Generated Images, demonstrating how a single unified LMM can be trained to perform vision generation tasks using both textual and visual intermediate steps, along with critique and refinement capabilities. We believe that spontaneous multimodal thinking…
What if AI could mentally sketch its thoughts, just like you? The missing piece of AI multimodal reasoning is here! What if AI could daydream in images? Closing the imagination gap between humans and AI! Introducing Thinking with Generated Images — a new paradigm where large…
Ever seen your VLM go rogue? 🤯 We found a novel bug: VLMs for vision tasks can leak <|image_pad|> tokens into their text responses! A subtle yet critical issue we believe is a first. This sneaky leakage can CRASH your RL training when recomputing log_probs! 😱 We pinpointed…
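A minimal sketch of the kind of guard this implies: scan rollout texts for the leaked special token before recomputing log-probs, and drop contaminated samples. The token string and helper names are illustrative, not the authors' actual code.

```python
# Hypothetical guard against special-token leakage in VLM rollouts.
# IMAGE_PAD matches the Qwen-style vision pad token mentioned in the tweet.
IMAGE_PAD = "<|image_pad|>"

def find_leaked_pads(responses):
    """Return indices of rollouts whose text contains the image pad token."""
    return [i for i, text in enumerate(responses) if IMAGE_PAD in text]

def filter_rollouts(responses):
    """Drop leaked rollouts so log-prob recomputation only sees clean text."""
    bad = set(find_leaked_pads(responses))
    return [t for i, t in enumerate(responses) if i not in bad]

clean = filter_rollouts(["ok answer", f"leak {IMAGE_PAD} here"])
```

Filtering (or masking) before the log-prob pass is one plausible mitigation; the paper's actual fix may differ.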
This paper makes a bold claim! AlphaGo Moment for Model Architecture Discovery The researchers introduce ASI-Arch, the first Artificial Superintelligence for AI Research (ASI4AI), enabling fully automated neural architecture innovation. No human-designed search space. No human…
Thank you very much for sharing! Check out our resources: Datasets & Models: huggingface.co/MegaScience Code base: github.com/GAIR-NLP/MegaS… Scientific Evaluation System: github.com/GAIR-NLP/lm-op…
MegaScience Pushing the Frontiers of Post-Training Datasets for Science Reasoning
🚨 New release: MegaScience The largest & highest-quality post-training dataset for scientific reasoning is now open-sourced (1.25M QA pairs)! 📈 Trained models outperform official Instruct baselines 🔬 Covers 7+ disciplines with university-level textbook-grade QA 📄 Paper:…
Apart from the performance, it’s pure entertainment just watching Qwen3‑Coder build Qwen Code all by itself. Agentic coding is really something: it explores, understands, plans, and acts seamlessly. Honored to be “in the game”—even if my entire work so far is smashing the Enter…
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Excited to share that our two papers have been accepted to #ICML2025! @icmlconf However, I can't be there in person due to visa issues. What a pity.🥲 Feel free to check out our poster, either online or offline at the Vancouver Convention Center. Programming Every Example:…
🎉 Excited to announce that LIMO has been accepted by COLM2025 @COLM_conf ! We'll be releasing an updated paper soon with detailed data construction processes and a new version of dataset - smaller in size but with better performance. Stay tuned!
🤔 How many examples does an LLM need to learn competition-level math? Conventional wisdom: 100,000+ examples Our discovery: Just 817 carefully chosen ones 🤩 With pure SFT, LIMO achieves: 57.1% on AIME 94.8% on MATH LIMO: Less is More for Reasoning 📝 🔗 arxiv.org/pdf/2502.03387
FacTool has been accepted to COLM 2025 - two years after its arXiv debut! While the landscape of LLMs has changed a lot since then, tool-augmented LLMs and RAG are still among the most effective and practical approaches for detecting / mitigating hallucinations (ref:…
In the era of 🤖#GenerativeAI, text of all forms can be generated by LLMs. How can we identify and rectify *factual errors* in the generated output? We introduce FacTool, a framework for factuality detection in Generative AI. Website: ethanc111.github.io/factool_websit… (1/n)
Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
OctoThinker Mid-training Incentivizes Reinforcement Learning Scaling
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…
🐙Octothinker tech report is finally out! We also release the 70B math-focused mid-training dataset -- MegaMath-Web-Pro-Max. Hope you'll find it useful!🤗 👇 hf.co/datasets/OctoT… huggingface.co/papers/2506.20…
Say hi to 🐙 OctoThinker — our new mid-training efforts for building strong reasoning base models tailored for the RL scaling era. Still a WIP, but we're excited to share our early insights into rethinking base model development. 📖 Blog: tinyurl.com/OctoThinker 🤗 Huggingface:…
Say hi to 🔮MegaMath-Pro-Max High-quality corpora are vital for mid-training. What about the math domain? Let me share the recipe behind it. 1. Curating Pipeline Step 1: uniformly and randomly sample millions of documents from the MegaMath-Web corpus, stratified by…
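Step 1 above (uniform random sampling stratified by some document attribute) can be sketched roughly like this; the stratum key and sizes are placeholders, since the tweet is truncated before naming the actual stratification variable.

```python
import random
from collections import defaultdict

def stratified_sample(docs, key, per_stratum, seed=0):
    """Uniformly sample up to `per_stratum` docs from each stratum.

    `key` maps a document to its stratum label (e.g. source domain);
    the real MegaMath pipeline's stratification variable is not shown here.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for d in docs:
        buckets[key(d)].append(d)
    sample = []
    for group in buckets.values():
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample
```

Stratifying keeps rare-but-valuable strata represented instead of letting a plain uniform draw mirror the (skewed) corpus distribution.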
The technical report is out (along with MegaMath-Web-Pro-Max)! Paper: arxiv.org/abs/2506.20512 Data: huggingface.co/datasets/OctoT… (uploading...
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling "we investigate how mid-training strategies shape RL dynamics, focusing on two representative model families: Qwen and Llama." "we introduce a two-stage mid-training strategy, Stable-then-Decay, in which base…
Finally had a bit of time to jot down some thoughts on this solid, open data engineering work from @essential_ai. This work brings Essential-Web, a 24T-token pre-training corpus, to the open-source community. I've always appreciated open-source research, as it can significantly…
[1/5] 🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
The real breakthrough isn't better AI—it's breaking free from nature's constraints We're witnessing a paradigm shift from "passive adaptation" to "active construction" in AI training. 🌊 The old way: AI learns from whatever data naturally exists • Constrained by existing…
📑Interesting paper by GAIR community Thinking with Generated Images🔥 enables a single large multimodal model to generate and reason with visual thoughts, greatly improving its ability to tackle complex vision and multimodal tasks. huggingface.co/papers/2505.22……
Check out our latest work on self-improving LLMs, where we test whether LLMs can use their internal self-consistency as a reward signal to bootstrap themselves via RL. TL;DR: they can, to some extent, but eventually end up reward-hacking the self-consistency objective. We try to see…
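One common way to turn self-consistency into a reward, sketched below: sample several rollouts per prompt, extract final answers, and reward agreement with the majority answer. This is a generic illustration of the idea, not the paper's implementation, and it also shows why the objective is hackable: a model that collapses to any single answer scores perfectly.

```python
from collections import Counter

def self_consistency_reward(answers):
    """Reward each sampled final answer by agreement with the majority.

    `answers` are answers extracted from multiple rollouts of one prompt.
    Returns 1.0 for rollouts matching the majority vote, else 0.0.
    """
    counts = Counter(answers)
    majority, _ = counts.most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]
```

Note the failure mode: since no ground truth enters the reward, "always output X" maximizes it, which matches the reward-hacking behavior described above.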