Jacob Buckman
@jacobmbuckman
Founder @manifest__ai. PhD candidate @MILAMontreal. Formerly @jhuclsp, @GoogleAI, @SCSatCMU.
The "it's not real intelligence just pattern matching!!" crowd are just modern-day geocentrists. We're not special dude, get over it
Context is more than a length-measuring contest open.substack.com/pub/jacobbuckm…
Balance is the key. In our latest work, we show that neither quadratic attention nor linear-attention architectures are a good fit for long-context jobs, because each blows far too much of its FLOPs budget on one or the other: state or weights.
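A back-of-envelope sketch of that imbalance (the width, MLP expansion, context length, and single-layer accounting below are all assumed for illustration, not numbers from the paper):

```python
# Rough per-token forward FLOPs for one layer, at assumed shapes.
d, d_ff, t = 4096, 16384, 131072           # hypothetical width, MLP width, context length

weight_flops = 2 * (2 * d * d_ff)          # MLP matmuls against parameters ("weight" FLOPs)
quad_state_flops = 2 * (2 * t * d)         # attending over a KV cache of t entries ("state" FLOPs)
lin_state_flops = 2 * (2 * d * d)          # fixed-size linear-attention state update + readout

for name, state in [("quadratic attention", quad_state_flops),
                    ("linear attention", lin_state_flops)]:
    frac = state / (state + weight_flops)
    print(f"{name}: {frac:.0%} of per-token FLOPs go to state")
```

Under these toy numbers, the quadratic model spends roughly 9 in 10 of its per-token FLOPs on state at long context, while the linear model spends most of them on weights; the specific fractions are only meant to show the shape of the tradeoff.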
The age of transformers is ending...the dawn of linear-cost architectures is upon us. Power Attention replaces Flash Attention in any transformer, and removes the quadratic penalty of context scaling while achieving strong performance. The result: domination of both transformers…
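For intuition, here is a minimal sketch of the power-attention idea (attention scores raised to an integer power p, which admits a fixed-size-state, linear-cost form); the toy shapes, p=2, and the unnormalized scores are assumptions for illustration, not the released Power Attention kernels:

```python
import numpy as np
from functools import reduce

def power_attention_quadratic(Q, K, V, p=2):
    # O(T^2) form: causal attention with scores (q.k)^p instead of softmax(q.k).
    T = Q.shape[0]
    scores = (Q @ K.T) ** p
    mask = np.tril(np.ones((T, T)))              # causal mask
    return (scores * mask) @ V

def power_attention_linear(Q, K, V, p=2):
    # O(T) form: (q.k)^p = <phi(q), phi(k)> for the degree-p tensor-power feature map,
    # so the same output comes from a running state of size d^p x d_v.
    phi = lambda x: reduce(np.multiply, np.ix_(*[x] * p)).ravel()
    S = np.zeros((phi(K[0]).size, V.shape[1]))   # running state: sum over s of phi(k_s) v_s^T
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S = S + np.outer(phi(K[t]), V[t])
        out[t] = phi(Q[t]) @ S
    return out

rng = np.random.default_rng(0)
Q, K, V = 0.3 * rng.normal(size=(3, 5, 4))       # toy shapes: T=5 tokens, head dim 4
assert np.allclose(power_attention_quadratic(Q, K, V),
                   power_attention_linear(Q, K, V))
```

The point of the sketch is only that raising scores to a power turns attention into a recurrence whose state size is tunable via p; the real kernels are fused and chunked and handle normalization, which this toy version ignores.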
Excellent post from @dwarkesh_sp. I think you are spot-on with the diagnosis, and are quite close on what the solution will look like -- all but dancing around it. The main claim I disagree with is "...there’s no obvious way to slot in online, continuous learning into the kinds…
New blog post where I explain why I disagree with this, and why I have slightly longer timelines to AGI than many of my guests. I think continual learning is a huge bottleneck to the usefulness of these models, and extended computer use may take years to sort out. Link below.
"Which jobs might AI automate first?" I asked @jacobmbuckman. Jacob founded Manifest AI & is ex-Google Brain. "By the time FAANG employees are feeling stressed, everyone else will have already felt a lot of stress, and society will have changed somewhat and new jobs will…
AI Timelines: When will AI reach human-level in computer-use skills? I surveyed AI researchers and forecasters. I asked: by what quarter & year are you nearly certain (9-in-10 chance) that AI will reach human-level on the OSWorld computer-use benchmark? Surveyed: @francedot,…
7/ @jacobmbuckman CEO of @manifest__ai calls expanded context length the next major breakthrough in AI model architecture because transformers' costly quadratic scaling limits how much data they retain. He proposes "power attention," a sub-quadratic approach that he expects most…
DeepSeek-R1: What's the main takeaway & what should we expect next? I asked AI researchers and Jordan Schneider from ChinaTalk. FYI: Long post. Finbarr Timbers, @finbarrtimbers (Artfintel, former DeepMind) 1) What's the main takeaway: The biggest update that we should see is…
"A new digging technique has been developed which unearths 45x more gold per scoop. This is terrible news for the shovel salesmen" - everyone in tech, apparently