Yike Wang
@yikewang_
PhD student @uwcse @uwnlp | BA, MS @berkeley_ai
LLMs are helpful for scientific research — but will they continue to be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new knowledge, and 38%+ projection of future knowledge (arxiv.org/abs/2505.24302).

PhD in Computer Science, University of California San Diego 🎓 My research focused on uncertainty and safety in AI systems, including 🤷♀️ letting models say "I don't know" under uncertainty 🔎 understanding and reducing hallucinations 🔁 methods for answering "how much will…
Today we're releasing Community Alignment - the largest open-source dataset of human preferences for LLMs, containing ~200k comparisons from >3000 annotators in 5 countries / languages! There was a lot of research that went into this... 🧵
🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the…
🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below
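The pairwise idea above can be sketched as a DPO-style logistic loss over pairs of weak-model outputs — an illustrative assumption only; `delta_pair_loss` and the exact objective here are not from the paper:

```python
import math

def delta_pair_loss(logp_better: float, logp_worse: float, beta: float = 0.1) -> float:
    """Toy pairwise loss: push the model to prefer the *better* weak output
    over the *worse* one, so the training signal comes from their difference,
    not their absolute quality. DPO-style -log sigmoid(margin); illustrative."""
    margin = beta * (logp_better - logp_worse)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the model's preference margin between the pair grows:
loss_small = delta_pair_loss(-5.0, -5.5)  # narrow gap between the weak pair
loss_big = delta_pair_loss(-5.0, -9.0)    # wide gap
```

The point of the sketch: both outputs can come from weak models, yet the *difference* between them still defines a usable gradient direction.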
🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (avg length: 51 minutes).
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper: arxiv.org/abs/2507.00163
Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
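A minimal sketch of the opt-in/opt-out idea at the mixture-of-experts combine step — a hypothetical simplification, not FlexOlmo's actual routing or renormalization scheme:

```python
def moe_combine(expert_outputs, router_weights, opted_in):
    """Toy MoE combine step: each data owner contributes one expert, and an
    owner can opt out at inference by zeroing that expert's router weight.
    Remaining weights are renormalized so the mixture stays a convex combo.
    (Illustrative only; real routing is per-token and learned.)"""
    w = [rw * oi for rw, oi in zip(router_weights, opted_in)]
    total = sum(w)
    if total == 0:
        raise ValueError("at least one expert must be opted in")
    w = [wi / total for wi in w]
    dim = len(expert_outputs[0])
    return [sum(w[e] * expert_outputs[e][d] for e in range(len(w)))
            for d in range(dim)]

outs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # three owners' experts
full = moe_combine(outs, [0.5, 0.3, 0.2], [1, 1, 1])
partial = moe_combine(outs, [0.5, 0.3, 0.2], [1, 0, 1])  # owner 2 opts out
```

Opting out is just masking: no retraining is needed, since the excluded expert's parameters were never mixed into the others.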
💡Beyond math/code, instruction following with verifiable constraints is well-suited to RLVR. But the existing set of constraints and verifier functions is limited, and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.
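For intuition, a verifiable constraint is just a deterministic check on the model's output that an RL reward can be built from. Both functions below are hypothetical illustrations, not IFBench's actual constraint set:

```python
def verify_bullet_count(response: str, n: int = 3) -> bool:
    """Hypothetical verifiable constraint: the response must contain
    exactly n markdown-style bullet lines starting with '- '."""
    bullets = [ln for ln in response.splitlines() if ln.lstrip().startswith("- ")]
    return len(bullets) == n

def verifiable_reward(response: str, verifiers) -> float:
    """Binary RLVR-style reward: 1.0 only if every verifier accepts."""
    return 1.0 if all(v(response) for v in verifiers) else 0.0

ok = verifiable_reward("- a\n- b\n- c", [verify_bullet_count])
bad = verifiable_reward("- a\n- b", [verify_bullet_count])
```

Because the reward is computed by code rather than a judge model, it cannot be gamed stylistically — but a small fixed pool of such checkers is exactly what models overfit to.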
Introducing SciArena, a platform for benchmarking models across scientific literature tasks. Inspired by Chatbot Arena, SciArena applies a crowdsourced LLM evaluation approach to the scientific domain. 🧵
We don’t have AI that self-improves yet, and when we do it will be a game-changer. With more wisdom now than in the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that…
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰? Introducing infini-gram mini, an exact-match search engine with 14x less storage req than the OG infini-gram 😎 We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more. (1/n) ⬇️
We introduce MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation ✅ Reliable: 94.3% agreement with human judgment ✅ Comprehensive: 4 modality combinations × 49 tasks × 937 instructions 🔍 Results and Takeaways: > GPT-Image-1 from @OpenAI…