/MachineLearning
@slashML
"with 4+ first-author papers on top venues such as ICML, NeurIPS, ICLR, ACL, etc." 🫤
We have a full-time position for research scientist in our team at #Apple. The topic is understanding and improving #reasoning abilities of #LLMs. We're also interested in developing new and efficient architectures based on transformer for language modeling, again reasoning…
Kinda amazing: the mystery model "summit" with the prompt "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future" & "make it better" 2,351 lines of code. First time
Not bad from GPT-4.1: "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future" First go, no errors.
I’ve joined Cognition to continue to work on the future of software engineering. I was employee #2 at Windsurf and have worked on AI+code for years. There’s never been a more exciting time and place for it than now at Cognition. I had a place at Google DeepMind as part of the…
Tired: Prompting Veo 3 videos with JSON. Wired: Prompting Veo 3 videos with PowerPoint.
did u know you can use the new Gemini image segmentation feature in… a lot of different ways
Problem 1: "Let us try to solve the problem by induction." Problem 2: "Let us try to solve the problem by analytic geometry." Not suspicious at all....
Bruh… people already reproduced Google’s IMO results without RL, with just prompting. OpenAI researchoors think they have the mandate from heaven lol🤭
in case you are wondering this is academia now
ICML’s statement about subversive hidden LLM prompts. We live in a weird timeline…
HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7
We achieved gold medal-level performance 🥇on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM! Our model solved world-class math problems—at the level of top human contestants. A major milestone for AI and mathematics.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
TransEvalnia: Reasoning-based Evaluation and Ranking of Translations By Richard Sproat, Tianyu Zhao, Llion Jones ArXiv: arxiv.org/abs/2507.12724 We are happy to announce the release of TransEvalnia, a prompting-based translation evaluation and ranking system that uses reasoning…
considering Muon is so popular and validated at scale, we've just decided to welcome a PR for it in PyTorch core by default. If anyone wants to take a crack at it... github.com/pytorch/pytorc…
MIT's "The Missing Semester of Your CS Education": The goal of this course is to make sure you're proficient with the tools you need in the other courses. That is: mastering the command line, version control, editor, ... missing.csail.mit.edu
How to use Kimi K2 in Claude Code: 1. Create an account at @OpenRouterAI 2. npm install -g @anthropic-ai/claude-code 3. npm install -g @musistudio/claude-code-router 4. Add the following lines to your ~/.claude-code-router/config.json (update with your OpenRouter API key) 5. ccr…
Congrats to the Kimi team on the super strong SWE-bench Verified and SWE-bench Multilingual numbers!!
Found a nice blog post on the Muon optimizer, which has been making the rounds recently. What's interesting is that Muon doesn't replace Adam completely. Muon is meant for the 2D parameters between the input and output layers. The input and output embedding layers should still use Adam. Question…
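A rough sketch of the split described in the post above: 2D weight matrices between the input and output layers go to Muon, everything else stays with Adam. The function name `split_for_muon`, the name hints `"embed"` and `"lm_head"`, and the example parameter names and shapes are all hypothetical, chosen only for illustration. Sending 1D parameters (norm gains, biases) to Adam is an assumption here, not something the post states.

```python
from typing import Dict, List, Tuple


def split_for_muon(
    param_shapes: Dict[str, Tuple[int, ...]],
    adam_name_hints: Tuple[str, ...] = ("embed", "lm_head"),  # hypothetical naming convention
) -> Tuple[List[str], List[str]]:
    """Route parameter names to a Muon group or an Adam group."""
    muon: List[str] = []
    adam: List[str] = []
    for name, shape in param_shapes.items():
        is_matrix = len(shape) == 2
        # Input/output embedding layers are 2D but should stay with Adam.
        is_io_layer = any(hint in name for hint in adam_name_hints)
        if is_matrix and not is_io_layer:
            muon.append(name)
        else:
            # Assumption: non-2D parameters (norms, biases) also use Adam.
            adam.append(name)
    return muon, adam


# Hypothetical parameter names and shapes for a small transformer.
shapes = {
    "embed.weight": (50000, 512),
    "blocks.0.attn.qkv.weight": (1536, 512),
    "blocks.0.mlp.fc.weight": (2048, 512),
    "blocks.0.norm.weight": (512,),
    "lm_head.weight": (50000, 512),
}
muon_names, adam_names = split_for_muon(shapes)
print("Muon:", muon_names)
print("Adam:", adam_names)
```

In a real training setup the two name lists would each be handed to their own optimizer instance; matching on substrings of parameter names is only one way to identify the embedding layers and depends on how the model names its modules.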
platform.moonshot.ai/docs/guide/age… - Register on the platform - Get API Key - export ANTHROPIC_AUTH_TOKEN=sk-YOURKEY - export ANTHROPIC_BASE_URL=api.moonshot.ai/anthropic starts Claude 🚀
You can use it within Claude Code! Replacing Opus and Sonnet with it!
Kimi K2: I can finally unveil that I was testing it in the last days using: Claude Code and Open WebUI 🚀 Video standard speed beginning and end, ultra speed up in the middle.
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data