Anastasios Nikolas Angelopoulos (@ml_angelopoulos)

Pinned

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Nov 19

🚨 New Textbook on Conformal Prediction 🚨 arxiv.org/abs/2411.11824 “The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these…

Anastasios Nikolas Angelopoulos Retweeted

Aryan Vichare@aryanvichare10 · 20 h

It's genuinely mind-boggling how good models are getting at one-shotting complex visualizations from simple prompts Prompt: "two black holes colliding animation" This model perfectly implemented: – 2-body gravity simulation – Dynamic particle accretion disks – Collision +…

Anastasios Nikolas Angelopoulos Retweeted

lmarena.ai@lmarena_ai · Jul 25

We've been busy lately: new arenas, new models, and new methodologies! So we've created a changelog page where you can track all the updates we make to the leaderboards. In addition to the new Search Arena, and new models like the latest Imagen 4, Grok 4, Kimi K2, Seedream 3 and…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 25

We updated our Imagen 4 models and Ultra is tied for #1 on the lmarena leaderboard! The models are available in Google AI Studio and the Gemini API - try them out and let us know what you think.

lmarena.ai@lmarena_ai · Jul 25

Exciting Text-to-Image leaderboard update! Two new Imagen 4.0 models from @GoogleDeepMind just dropped: 🥇 Imagen 4.0 Ultra (v2) ties at #1 with @OpenAI’s GPT-Image-1 🥉 Imagen 4.0 (v2) lands strong at #3 Congrats to the Google Imagen team!

Anastasios Nikolas Angelopoulos Retweeted

lmarena.ai@lmarena_ai · Jul 25

Exciting Text-to-Image leaderboard update! Two new Imagen 4.0 models from @GoogleDeepMind just dropped: 🥇 Imagen 4.0 Ultra (v2) ties at #1 with @OpenAI’s GPT-Image-1 🥉 Imagen 4.0 (v2) lands strong at #3 Congrats to the Google Imagen team!

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 24

🚨 Model Update: Qwen3-coder is in the WebDev Arena! @Alibaba_Qwen have released their best coding model to date and it's now live in WebDev Arena awaiting your hardest prompts for real world testing. Prompt: "style a basic login form using Tailwind CSS with dark mode…

Qwen@Alibaba_Qwen · Jul 22

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 23

Come see which models are the best at search! We re-launched on the new UI :)

lmarena.ai@lmarena_ai · Jul 23

🚨 BIG NEWS 🚨 Search Arena is live with 7 top models with search capabilities ready for testing. Be sure to have the "Search" modality selected in the chat box, and get testing. 🌐 @xAi: Grok 4 @anthropic: Claude Opus 4 @perplexity: Sonar Pro High & Reasoning Pro High…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 22

This is amazing @Alibaba_Qwen !!

Aryan Vichare@aryanvichare10 · Jul 22

Qwen3-Coder is now live on WebDev Arena Prompt: “bouncing ball in rotating hypercube” It one-shotted the visualization, with controls for rotation and ball speed included. Kinda crazy

Anastasios Nikolas Angelopoulos Retweeted

koray kavukcuoglu@koraykv · Jul 21

Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! deepmind.google/discover/blog/…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 19

Harvard vs Berkeley Go Bears 🐻

Anastasios Nikolas Angelopoulos Retweeted

lmarena.ai@lmarena_ai · Jul 18

🧵Top 10 Open Models by Provider Though proprietary models often top the charts, open models are also paired in battle mode, and ranked on our public leaderboards. Here are the top 10 when stacked by top open model by provider. - #1 Kimi K2 (Modified MIT) @Kimi_Moonshot - #2…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 17

it's actually BONKERS that Moonshot, a company no one had even heard of a week ago, is absolutely mogging the likes of Anthropic, DeepSeek, and Meta 🤯 AGI really could arise from anywhere at any time 👀

lmarena.ai@lmarena_ai · Jul 17

🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model. Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 17

Kimi-K2 by @Kimi_Moonshot is now the #1 open model in the world. The score is a bit below that of the recent Grok-4 API release, and a bit above that of Deepseek R1 (May). After that comes Qwen 3, Deepseek-v3 (March), Deepseek R1, Mistral Medium, Minimax M1. Very…

lmarena.ai@lmarena_ai · Jul 17

🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model. Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different…

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 17

🚨 BREAKING: @Kimi_Moonshot’s Kimi-K2 is now the #1 open model in the Arena! With over 3K community votes, it ranks #5 overall, overtaking DeepSeek as the top open model. Huge congrats to the Moonshot team on this impressive milestone! The leaderboard now features 7 different…

Kimi.ai@Kimi_Moonshot · Jul 11

🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…

Anastasios Nikolas Angelopoulos Retweeted

Beepul Bharti @ ICML@BeepulBharti · Jul 15

I’m at #ICML2025 this week presenting my work on multiaccuracy and multicalibration with proxy sensitive attributes. If you are interested, please come by poster E-1101 on Tuesday at 4:30 pm PST to learn more! @Jere_je_je @PaulYiMD icml.cc/virtual/2025/p…

Anastasios Nikolas Angelopoulos Retweeted

Zhun Deng@zhun_deng · Jul 15

Excited to introduce our new work at ICML 2025: 1. Conformal Risk Control for LLM Alignment, arxiv.org/pdf/2502.20285 with @lihua_lei_stat 2. Auto-Eval for Quantile-Based Risk Measures arxiv.org/pdf/2507.05220 with @zemelgroup @SquareZollo Please take a look if interested!

Anastasios Nikolas Angelopoulos@ml_angelopoulos · Jul 15

Thoughts on Grok 4 results in LMArena: Grok's API model is tied for #3 overall with style control (remember, style control is default now in LMArena). Without style control, it's #2 overall. In Math, its preliminary ranking is tied for #1, along with Minimax-M1, Gemini-2.5-pro, and…

lmarena.ai@lmarena_ai · Jul 15

🚨 Breaking News: Grok 4's result is now live! With 4k+ community votes, xAI’s Grok-4 tied for #3 overall in Text Arena — a huge leap from Grok-3. It scores Top-3 across all categories (#1 in Math, #2 in Coding, #3 in Hard Prompts). Detailed analysis in the thread 🧵

Anastasios Nikolas Angelopoulos Retweeted

Clayton Thorrez@cthorrez · Jul 15

Extremely excited to announce that I've joined @lmarena_ai! For years I've been working in LLMs for my job, and hacking on rankings and ratings for fun, beyond thrilled to be able to join this project at the intersection!
