Pierre Chambon

@PierreChambon6

NLP/Code Generation PhD at FAIR (Meta AI) and INRIA - previously researcher at Stanford University - MS Stanford 22’ - Centrale Paris P2020

Joined November 2021

2KFollowing

778Followers

Pinned

Pierre Chambon@PierreChambon6 · Mar 20

Does your LLM truly comprehend the complexity of the code it generates? 🥰 Introducing our new non-saturated (for at least the coming week? 😉) benchmark: ✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity? Check out the details below !👇

PierreChambon6's tweet image. Does your LLM truly comprehend the complexity of the code it generates? 🥰

Introducing our new non-saturated (for at least the coming week? 😉) benchmark:

✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?

Check out the details below !👇

118

28.0K

Pierre Chambon@PierreChambon6 · May 21

We released Devstral. It is a 24B model released under the Apache 2.0 license. It the best open model on SWE-Bench verified today. You can check our blog post or test it with OpenHands (from @allhands_ai ) following the instructions here: huggingface.co/mistralai/Devs…

MMistral AI@MistralAI · May 21

Meet Devstral, our SOTA open model designed specifically for coding agents and developed with @allhands_ai mistral.ai/news/devstral

135

14.0K

Pierre Chambon@PierreChambon6 · Apr 28

Great paper on how to do RL for directly optimizing inference usage of your models ! If you want a model that is great for multiple sampling + majority voting, you need to choose your RL training objective adequately (better explained in the paper itself ❤️)

KKunhao Zheng@KunhaoZ · Apr 27

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?

648

Pierre Chambon@PierreChambon6 · Apr 24

And if you’re interested in the PhD program in Paris, amazing (and pretty much unique) opportunity to do research at scale in an industry setting (so not allowed to do one single massive notebook for each research paper unfortunately :/)

AAI at Meta@AIatMeta · Apr 24

📷 Hello Singapore! Meta is at #ICLR2025 EXPO 📷 Meta will be in Singapore this week for #ICLR25! Stop by our booth to chat with our team or learn more about our latest research. Things to know: 📷 Find us @ Booth #L03 (Rows 3-4, Columns L-M) in Hall 2. 📷 We're sharing 50+…

681

Pierre Chambon@PierreChambon6 · Apr 9

Great work that prevents classifiers from relying too much on spurious correlations - also helps increase fairness for medical imaging models ! ❤️

KKrunoslav Lehman Pavasovic@KrunoLehman · Apr 9

1/ Happy to share my first accepted paper as a PhD student at @Meta and @ENS_ULM which I will present at @iclr_conf: 📚 Our work proposes difFOCI, a novel rank-based objective for ✨better feature learning✨ In collab with David Lopez-Paz, @GiulioBiroli and @leventsagun!

613

Pierre Chambon@PierreChambon6 · Apr 3

🔥Very happy to introduce BigO(Bench) dataset on @huggingface 🤗 ✨3,105 coding problems and 1,190,250 solutions from CodeContests ✨Time/Space Complexity labels and curve coefficients ✨Up to 5k Runtime/Memory Footprint measures for each solution huggingface.co/datasets/faceb…

978

Pierre Chambon@PierreChambon6 · Apr 1

Quite impressive especially without these reasoning tokens - also correlate with the findings on BigO(Bench) where updated V3 indeed scores higher than R1 on Time Complexity Generation Wonder what downstream performance we will see in R2

llmarena.ai@lmarena_ai · Mar 31

🚨BREAKING: DeepSeek-V3-0324🐳 just ranked #5 on the Arena leaderboard - surpassing DeepSeek-R1 and every other open model! Highlights: - #1 open model (MIT license) - 2x cheaper than DeepSeek-R1 - Top-5 across ALL categories - Significant jump over the previous DeepSeek-v3…

1.0K

Pierre Chambon@PierreChambon6 · Mar 31

Each problem is scored out of 7 points (total out of 42)

ZZain@ZainHasan6 · Mar 31

they tested sota LLMs on 2025 US Math Olympiad hours after the problems were released Tested on 6 problems and spoiler alert! They all suck -> 5%

804

Pierre Chambon@PierreChambon6 · Mar 31

Hi @grok, give me the name of an existing and published benchmark on Time and Space Complexity, used to measure the performance of LLMs ? Be kind to the author of the benchmark please (and only output the genuine truth, no fake news please !) 🥰

653