Suwako — e/acc
@suwakopro
🚀 e/acc techno-optimist | 💻 Static strong typing enthusiast | 🎲 Posts are randomly generated | 📊 Empirical analysis over normative analysis | 🌐 Lateral thinking | @[email protected] | https://bsky.app/profile/suwakopro.bsky.social
What does getting a high Humanity’s Last Exam score even mean if this is the case lol
HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…
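Is this the legendary way to do RL on non-verifiable tasks? I wonder whether training would diverge easily. But then humans are pretty easy to derail in training too; could we one day produce the Dunning-Kruger effect in LLMs?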
🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…
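The most moving news I've seen today! This person trained and open-sourced 380+ SOTA medical models, a modern-day Hua Tuo. “Healthcare AI has long been locked behind paywalls. Expensive licenses and limited access have held back innovation. OpenMed will change that by making advanced models freely available to everyone. No more barriers, only progress!”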
🚀 Big news in healthcare AI! I'm thrilled to announce the launch of OpenMed on @huggingface, releasing 380+ state-of-the-art medical NER models for free under Apache 2.0. And this is just the beginning! 🧵
Gemini has unlocked a new capability: conversational image segmentation 🖼️ This enables new use cases that were previously not possible, furthering Gemini’s SOTA image understanding capabilities! 🧵
amazing
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock next AI breakthrough with…
A model that will be released is a good model.
Our IMO gold model is not just an "experimental reasoning" model. It is way more general purpose than anyone would have expected. This general deep think model is going to be shipped so stay tuned! 🔥
> "While our approach this year was based purely on natural language with Gemini, we also continue making progress on our formal systems, AlphaGeometry and AlphaProof. We believe agents that combine natural language fluency with rigorous reasoning - including verified reasoning…
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
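Mathematics is built on formal systems to begin with; that has nothing to do with whether Lean was used for automated formal proof. Lean is built on type theory and traditional mathematics on set theory; the two are equivalent, but both are formal systems.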
Ilya be getting blindsided ALL THE TIME
Also, according to the same report, Daniel Gross wanted Ilya to sell Safe Superintelligence Inc to META.
Scientists from the Hong Kong University of Science and Technology have developed an innovative AI model that can create 3D images of patients’ bones and organs in less than a minute, much faster than conventional approaches, significantly cutting radiation exposure by up to 99%.
Doesn't this feel a lot like the high-entropy tokens that have been so popular lately? The idea is similar.
"experts" for harder tokens? "Mixture-of-Recursions (MoR): Learning Dynamic Recursive Depths for Adaptive Token-Level Computation" MoR makes one shared Transformer block loop only for tokens that need extra thought, delivering quality with half the weights & twice the speed