Melanie Sclar
@melaniesclar
PhD student @uwnlp @uwcse | Visiting Researcher @AIatMeta FAIR | Prev. Lead ML Engineer @asapp, intern @LTIatCMU | 🇦🇷
Did you know that depending on the format used in few-shot prompting, you may get accuracies ranging from 4% to 88% for a given task w/ LLaMA-2-70B 5-shot? Or 47%-85% w/ GPT-3.5? 🤯 We explore this variance in FormatSpread, or: How I learned to start worrying about prompt formatting. 1/n
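For intuition, here's a minimal sketch (my illustration, not the paper's code) of the kind of surface-format variation FormatSpread sweeps over: the prompts below are semantically identical, yet accuracy can swing dramatically across them.

```python
# Enumerate equivalent few-shot prompt formats that differ only in surface
# choices (separator, spacing, label casing) -- the axes of variation behind
# the accuracy spread described above.
from itertools import product

demos = [("great movie!", "positive"), ("so boring...", "negative")]

separators = [": ", " - ", ":\n"]   # field-separator variants
joiners = ["\n", "\n\n"]            # spacing between demonstrations
casings = [str.lower, str.upper]    # label-casing variants

def render(sep, joiner, case):
    return joiner.join(f"Input{sep}{x}{joiner}Label{sep}{case(y)}"
                       for x, y in demos)

variants = [render(*combo)
            for combo in product(separators, joiners, casings)]
print(len(variants), "equivalent prompts that differ only in formatting")
# Scoring a model on the task under each variant and reporting the spread
# of accuracies is the FormatSpread-style measurement.
```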

Check out our work on preference modeling through latent (& interpretable) attribute representation learning! PrefPalette allows you to understand _why_ something is preferred and _how_ preference varies depending on context 🎨
WHY do you prefer something over another? Reward models treat preference as a black box 😶🌫️ but human brains 🧠 decompose decisions into hidden attributes. We built the first system to mirror how people really make decisions in our #COLM2025 paper 🎨 PrefPalette ✨ Why it matters 👉🏻 🧵
🎉 We’re excited to introduce BLAB: Brutally Long Audio Bench, the first benchmark for evaluating long-form reasoning in audio LMs across 8 challenging tasks, using 833+ hours of Creative Commons audio (avg. length: 51 minutes).
Can data owners & LM developers collaborate to build a strong shared model while each retains control of their data? Introducing FlexOlmo 💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
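One plausible way to picture the inference-time opt-out (a hedged sketch of the generic MoE mechanism, not FlexOlmo's actual implementation): mask the router logits of experts trained on opted-out data before selecting the top-k.

```python
# Generic mixture-of-experts routing with expert opt-out; expert ids and
# shapes are illustrative assumptions, not FlexOlmo's real configuration.
import torch

def route(router_logits, opted_out, top_k=2):
    """router_logits: (batch, n_experts); opted_out: expert ids to exclude."""
    logits = router_logits.clone()
    for e in opted_out:
        logits[:, e] = float("-inf")          # excluded expert can never fire
    weights = torch.softmax(logits, dim=-1)   # mass shifts to remaining experts
    topw, topi = weights.topk(top_k, dim=-1)  # pick top-k active experts
    return topw / topw.sum(-1, keepdim=True), topi

w, idx = route(torch.randn(1, 8), opted_out={3, 5})
print(idx, w)  # selected experts exclude 3 and 5; weights renormalized to 1
```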
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔 We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats! arxiv.org/abs/2506.04689
Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
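To make "random" vs. "incorrect" vs. "ground-truth" rewards concrete, here's an illustrative sketch (function names are mine, not the paper's) of the three verifier signals being compared:

```python
# Three reward signals for RLVR on a sampled solution; only the last one
# actually consults correctness.
import random

def ground_truth_reward(pred, gold):
    return 1.0 if pred.strip() == gold.strip() else 0.0

def incorrect_reward(pred, gold):
    return 1.0 - ground_truth_reward(pred, gold)  # rewards only wrong answers

def random_reward(pred, gold):
    return float(random.random() < 0.5)           # ignores correctness entirely

# The surprise in the thread: on Qwen2.5-Math-7B, even the two "spurious"
# signals recover most of the MATH-500 gain that ground-truth rewards give.
```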
Dear NYC friends: just got here and will be around until Thu!
📢I'm thrilled to announce that I’ll be joining @KAIST_AI as an Assistant Professor in 2026, leading the Computation & Cognition (COCO) Lab🤖🧠: coco-kaist.github.io We'll be exploring reasoning, learning w/ synthetic data, and social agents! +I'm spending a gap year @nvidia✨
Excited to announce our workshop on Visions of Language Modeling at COLM'25! 🔥 We felt that current LM research focuses too narrowly on a few popular topics (e.g., test-time scaling and LLM agents), and we'd love to bring some entropy back 💪 To do this, we invited a…
We begin our speaker spotlights with Alane Suhr (@alsuhr), Assistant Professor at UC Berkeley and an invited speaker at the Workshop on Computer Use Agents at @icmlconf 2025! Her research centers on building systems that use language to interact with people, enabling agents to…
Still around at #NAACL2025? I will be presenting a poster for the work 👇 at the Workshop on Narrative Understanding in Tesuque, Albuquerque Convention Center, from 2:30 pm. Please stop by if interested. Here is the poster, designed by the amazing @advaitmb.
📢 New Paper! Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning about plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎 W/ @melaniesclar and @tsvetshop 1/n
Now @abertsch72 is talking about in-context learning with long-context models! arxiv.org/abs/2405.00200
With the rise of R1, search seems out of fashion? We prove the opposite! 😎 Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize new, untaken reasoning paths that are better 💡 yet shorter ⚡️.
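As the tweet describes it, the core move is retrospective revision; a rough sketch of that loop (my paraphrase, not the actual Retro-Search implementation):

```python
# Branch off each prefix of an existing reasoning trace, roll out an untaken
# continuation, and keep the revision if it is still correct but shorter.
def retro_search(trace, continue_fn, is_correct):
    """trace: list of reasoning steps; continue_fn(prefix) -> alternative
    completion (list of steps); is_correct(steps) -> bool (e.g., checks
    the final answer)."""
    best = trace
    for i in range(len(trace)):
        candidate = trace[:i] + continue_fn(trace[:i])  # untaken path at step i
        if is_correct(candidate) and len(candidate) < len(best):
            best = candidate                            # shorter, still correct
    return best
```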
Will be in Berkeley for the weekend, and off to #ICLR2025 in Singapore on Monday night to present CreativityIndex and ExploreToM! Please reach out if you'd like to meet: these days I'm most excited about reliable synthetic data generation for reasoning in ¬(math & code) domains
See our work on procedurally generating challenging reasoning problems for detecting inconsistencies in stories! FlawedFictions is a great example of what I'm most excited about: reliable synthetic data for reasoning in under-explored domains. (I'll be at ICLR to chat, DMs open!)