andrea panizza

@unsorsodicorda

Data Scientist, aerospace engineer, trekking & comics lover, applying #MachineLearning #DeepLearning Statistics to Industrial Applications.

Firenze, Toscana

Joined March 2013

352Following

2KFollowers

Pinned

andrea panizza Retweeted

Andrew Lampinen@AndrewLampinen · Jul 21

Quick thread on the recent IMO results and the relationship between symbol manipulation, reasoning, and intelligence in machines and humans:

553

457

92.0K

andrea panizza Retweeted

InternLM@intern_lm · 5 h

🚀Introducing Intern-S1, our most advanced open-source multimodal reasoning model yet! 🥳Strong general-task capabilities + SOTA performance on scientific tasks, rivaling leading closed-source commercial models. 🥰Built upon a 235B MoE language model and a 6B Vision encoder.…

389

160

34.0K

andrea panizza Retweeted

Chujie Zheng@ChujieZheng · Jul 25

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…

159

1.0K

755

94.0K

andrea panizza Retweeted

Alexandr Wang@alexandr_wang · 19 h

We are excited to announce that @shengjia_zhao will be the Chief Scientist of Meta Superintelligence Labs! Shengjia is a brilliant scientist who most recently pioneered a new scaling paradigm in his research. He will lead our scientific direction for our team. Let's go 🚀

277

350

6.0K

853

1.3M

andrea panizza Retweeted

Shengjia Zhao@shengjia_zhao · 19 h

I am very excited to take up the role of chief scientist for meta super-intelligence labs. Looking forward to building asi and aligning it to empower people with the amazing team here. Let’s build!

354

261

7.0K

499

515.0K

andrea panizza Retweeted

Garrison Lovely@GarrisonLovely · 22 h

Anthropic could be bankrupted within the next few months, thanks to last week's barely covered legal ruling, which exposes the AI startup to billions to hundreds of billions in damages for its use of pirated, copyright-protected works.

145

317

4.0K

1.0K

502.0K

andrea panizza Retweeted

Qwen@Alibaba_Qwen · Jul 25

🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…

173

576

4.0K

844

670.0K

andrea panizza@unsorsodicorda · Jul 25

Beautiful work. Just a little sad that no university from the country where Latin literally originated, with the second largest epigraphics db, and the most Latin epigraphs in general, collaborated with Google Deepmind on this super-cool project

DDemis Hassabis@demishassabis · Jul 23

Our Aeneas AI model gives historians valuable new insights into ancient inscriptions & ancient history that may have taken years to uncover otherwise. Published in @Nature today: deepmind.google/discover/blog/…

149

andrea panizza Retweeted

vik@vikhyatk · Jul 24

i do not support all tech bros

839

32.0K

andrea panizza Retweeted

clem 🤗@ClementDelangue · Jul 24

I'm notorious for turning down 99% of the hundreds of requests every months to join calls (because I hate calls!). The @huggingface team saw an opportunity and bullied me in accepting to do a zoom call with users who upgrade to pro. I only caved under one strict condition:…

171

28.0K

andrea panizza Retweeted

Yukang Chen@yukangchen_ · Jul 25

Here is the weight, thanks. huggingface.co/Efficient-Larg…

andrea panizza Retweeted

Andrew White 🐦‍⬛@andrewwhite01 · Jul 23

HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7

588

173

118.0K

andrea panizza@unsorsodicorda · Jul 24

I'm surprised I've seen exactly zero tweets about Google Agentspace, Google's B2B agentic framework cloud.google.com/products/agent…

243

andrea panizza Retweeted

Qwen@Alibaba_Qwen · Jul 22

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…

269

1.0K

9.0K

4.0K

1.8M

andrea panizza@unsorsodicorda · Jul 22

Qwen3-Coder is now available in Cline 🧵 New 480B parameter model with 35B active parameters. > 256K context window > comparable performance on SWE-bench to Claude Sonnet 4 > SoTA among open source models

QQwen@Alibaba_Qwen · Jul 22

635

176

60.0K

andrea panizza Retweeted

Daniel Kang@daniel_d_kang · Jul 22

SWE-bench Verified is the gold standard for evaluating coding agents: 500 real-world issues + tests by OpenAI. Sounds bullet-proof? Not quite. We show passing its unit tests != matching ground truth. In our ACL paper, we fixed buggy evals: 24% of agents moved up or down the…

202

114

25.0K

andrea panizza Retweeted

Wenda Li@WendaLi8 · Jul 23

Lovely to see the impressive performance of the Seed Prover developed by the ByteDance Seed team at IMO 2025 — achieving a silver-level score (30 out of 42) within three days, and reaching (35 out of 42) with extended compute time. leanprover.zulipchat.com/#narrow/channe…

6.0K

andrea panizza Retweeted

Gautam Kamath@thegautamkamath · Jul 21

Everyone's talking about AI performance on the IMO. Let me highlight 🇨🇦Canadian 11th grader Warren Bei🇨🇦, one of five participants with a *perfect* 42/42. This is his *fifth* (and final) IMO representing Canada, with three golds and two silvers. (➡️ MIT undergrad in the fall)

2.0K

210

113.0K

andrea panizza Retweeted

Sophia Yang, Ph.D.@sophiamyang · Jul 22

The environmental footprint of training Mistral Large 2: as of January 2025, and after 18 months of usage, Large 2 generated the following impacts: - 20,4 ktCO₂e, - 281 000 m3 of water consumed, - and 660 kg Sb eq (standard unit for resource depletion). The marginal impacts of…

24.0K

andrea panizza@unsorsodicorda · Jul 22

Interesting approach! However, we looked at the proofs and methodology and we found a few problems, specifically with the use of hints given to the model. While the scaffold indeed improves performance, it does not solve all problems accurately and would not get a gold medal.🧵

LLin Yang@lyang36 · Jul 22

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025

149

31.0K