Delip Rao e/σ
@deliprao
Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter || Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Engineering faculty will not admit it, but this is more or less true in computer science programs at most schools. Most courses today create ‘busy work’ and evaluate students on it, in exchange for reputation signals. Academics will not acknowledge this, as it would require…

An important caveat is that they provide problem-specific hints like “use combinatorics for this problem”.
This is false. This was a separate group from UCLA that prompted Gemini 2.5 Pro, and it is completely unrelated to the official Google attempt, which used an unreleased Gemini Deep Think system. This result also comes with a ton of caveats that I think deserve scrutiny.
models are learning from mistakes and, increasingly, from doing, while human learning is increasingly "by theory" (consuming model outputs). something to ponder.
How much you learn:
A Palestinian mother in Gaza left in search of food for her starving children. When she returned, she found their home reduced to rubble. Israel had bombed it, killing every one of them. Israel bombed two homes killing at least 10 children.
There will be no other terminal UX agent better than this. It’s like Michelangelo sculpting your garden sculpture. Go kiss the toad!
The Toad is out of the bag! 🛍🐸 Announcing Toad - a universal UI for agentic coding in the terminal willmcgugan.github.io/announcing-toa…
models like Kimi, DeepSeek and Qwen will cost the closed AI labs BILLIONS of dollars. that's why nobody is talking about them. despite these LLMs absolutely crushing all of the benchmarks. Claude 4 Opus is literally *100x* more expensive than Kimi K2 yet both models have…
If you give your AI model a French name, it is perhaps not surprising it will be offline 20% of the year.
Claude going down is the new normal now. @AnthropicAI #claude
Release the hostages. Until then, starve away. (This is all a lie anyway. It amazes me that the media continues to regurgitate Muslim terror propaganda.)
Super dumb take from ICML. Adding a subversive prompt to deter robot reviews is morally no different than using a fake email address to deter spammers. If authors don’t resort to subversive measures, reviewers have no disincentive to use LLMs for reviewing.

Companies are using fake humans and AI to do interviews now...
person with the longest update during standups has the weakest update
Anthropic just released a research paper, “Inverse Scaling in Test-Time Compute.” The study shows that longer reasoning in Large Reasoning Models (LRMs) can hurt performance, revealing a surprising inverse scaling between reasoning length and accuracy. According to this paper,…
if you poke an (inverted) bowl-shaped jello, it will wiggle in a certain way and eventually settle back into being bowl-shaped. if something behaves this way consistently, it is undoubtedly a bowl-shaped jello.
Stein’s lemma for a scalar variable is a well-known characterization of the Gaussian: x ∼ 𝒩(m, σ²) ⇔ ∀f: 𝔼[(x−m)f(x)] = σ² 𝔼[f′(x)]. Interestingly, there’s also a multivariate version, but it’s 2nd order: x ∼ 𝒩(m, σ²I) ⇔ ∀f: 𝔼[(x−m)ᵀ∇f(x)] = σ² 𝔼[∇²f(x)] 1/2
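The scalar identity above is easy to sanity-check numerically. A minimal Monte Carlo sketch, with f(x) = sin(x) as an arbitrary test function and sample size and seed chosen for illustration:

```python
import numpy as np

# Stein's lemma (scalar): E[(x - m) f(x)] = sigma^2 * E[f'(x)] for x ~ N(m, sigma^2).
# Here f(x) = sin(x), so f'(x) = cos(x); this is an arbitrary smooth test function.
rng = np.random.default_rng(0)
m, sigma = 1.0, 2.0
x = rng.normal(m, sigma, size=2_000_000)

lhs = np.mean((x - m) * np.sin(x))   # Monte Carlo estimate of E[(x - m) f(x)]
rhs = sigma**2 * np.mean(np.cos(x))  # Monte Carlo estimate of sigma^2 * E[f'(x)]
print(lhs, rhs)  # the two estimates should agree up to sampling noise
```

With a non-Gaussian x of the same mean and variance, the two sides generally diverge, which is exactly the “characterization” direction of the lemma.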
A decade ago, @RichardSocher gave the "don't be a hero" advice on choosing hyperparams for DL model training. Today's version of "don't be a hero" is picking the right LLM for the task at hand. If people you trust have said some model works for a task, use it.
"Roomba of coding" is such an apt analogy for all current-generation coding agent tools.
openai codex is like the roomba of coding
ok this video is funny ... how many memes can you recognize in this? :) On a serious note, "agents that work" seems like a pipe dream until it is not. I hope these folks will nail it.
Agents aren’t reliable. They don’t learn from experience. At @composiohq, we provide skills that evolve with your agents @lightspeedvp gave us $25M to make agents usable
Nothing routine about this either. Performed at a fraction of compute.
Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
not a ding or a compliment, but OH codes like a grad student @gneubig :)

Factor numbers into primes to cross doors before the time runs out! Love it 😍
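The core operation the game asks of players can be sketched in a few lines. A minimal trial-division factorizer (my own illustration, not the game's code; the function name is hypothetical):

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (with multiplicity), smallest first."""
    factors = []
    d = 2
    while d * d <= n:
        # divide out each prime factor d as many times as it appears
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:          # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(84))  # → [2, 2, 3, 7]
```

Trial division up to √n is plenty for the small numbers a timed game would throw at you.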