Tamay Besiroglu
@tamaybes
Working to fully automate software engineering @MechanizeWork
“life, liberty & the preservation of happiness” → “life, liberty & the pursuit of happiness” I like this edit because it shifts emphasis from safeguarding ("preservation," cautious, European) to striving ("pursuit," distinctly American, growth-oriented).
Thomas Jefferson’s rough draft copy of the Declaration of Independence
If you give your AI model a French name, it is perhaps not surprising it will be offline 20% of the year.
Claude going down is the new normal now. @AnthropicAI #claude
Many software engineers want to move into AI but think they need to learn ML first. We're offering researcher-level pay with zero AI background required.
We're hiring software engineers. $500k base. x.com/i/jobs/1919892…
insane team. insane work. insane trajectory.
We're hiring software engineers. $500k base. x.com/i/jobs/1919892…
Reproducing the Chinchilla paper by parsing the svg files in the paper was a fun project. Too bad Google never issued an erratum or followed through on their promise to release the data.
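The idea of recovering a paper's data by parsing its figures can be sketched roughly like this. All specifics below are hypothetical (the real figure's marker shapes, file names, and axis calibration values are not given in the tweet): scatter markers in an SVG are often `<circle>` elements, and two known tick marks per axis suffice to map pixel coordinates back to data coordinates.

```python
# Hypothetical sketch of recovering plotted points from a figure's SVG,
# in the spirit of the Chinchilla replication. Axis calibration tuples are
# (pixel_min, pixel_max, data_min, data_max), read off two known tick marks.
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def pixel_to_data(px, py, x_axis, y_axis):
    """Linearly map SVG pixel coordinates to data coordinates."""
    x0, x1, dx0, dx1 = x_axis
    y0, y1, dy0, dy1 = y_axis
    x = dx0 + (px - x0) / (x1 - x0) * (dx1 - dx0)
    y = dy0 + (py - y0) / (y1 - y0) * (dy1 - dy0)
    return x, y

def extract_points(svg_text, x_axis, y_axis):
    """Pull every <circle> marker out of the SVG and convert to data space."""
    root = ET.fromstring(svg_text)
    points = []
    for c in root.iter(f"{SVG_NS}circle"):
        px, py = float(c.get("cx")), float(c.get("cy"))
        points.append(pixel_to_data(px, py, x_axis, y_axis))
    return points
```

Note the y-axis calibration typically runs "backwards" (SVG pixel y grows downward while data y grows upward), which the linear map handles automatically if the tick values are entered in pixel order.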
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
i disagree. which people were actually harmed by "mecha-hitler"? what is the standard you're using here for "doing better"? imo xai is doing better because they released a nice product earlier
My guess is that big tech companies increasingly opting to poach key personnel without acquiring the whole startup is driven by antitrust concerns. If true, this means that antitrust regulation adds meaningful equity risk for startup employees, which is unfortunate.
Welcome Windsurf to this list of totally serious independent companies
Especially pertinent blog post now that Grok 4 has reportedly scaled RL compute to the level of pretraining compute without delivering any overwhelming increase in performance as a result.
Despite being trained on more compute than GPT-3, AlphaGo Zero could only play Go, while GPT-3 could write essays, code, translate languages, and assist with countless other tasks. That gap shows that what you train on matters. Rich RL environments are now the bottleneck.
I appreciate the 2027 team paying bounties for finding mistakes in their model. But what's far more important is catching these errors yourself before widely promoting your work on big platforms like the NYT, Dwarkesh, etc.
Great reaction to criticism from the AI 2027 team - not only a graceful response but also "a $500 bounty to represent our appreciation".
The Big Beautiful Bill lets U.S. hyperscalers and AI labs fully expense GPUs and training upfront, likely providing tens of billions in subsidies for compute through Trump’s term. Surprised this isn't getting more attention; journalists frame the bill as bad for AI somehow?
Under the big, beautiful bill, AI training compute expenses qualify as R&D, making them immediately deductible in full during the year they're incurred.
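The value of immediate deductibility can be sketched with back-of-the-envelope arithmetic. All figures below are hypothetical (the tweet gives no numbers): the comparison is a fully expensed training spend versus the same spend amortized straight-line over five years, at the 21% US corporate rate.

```python
# Hypothetical sketch: first-year tax shield from full expensing vs.
# straight-line amortization. All dollar amounts are made-up illustrations.
def first_year_deduction(spend, amortization_years=None):
    """Year-1 deduction: the full spend if expensed immediately,
    otherwise one straight-line amortization slice."""
    if amortization_years is None:       # immediate full expensing
        return spend
    return spend / amortization_years    # straight-line amortization

compute_spend = 10e9   # hypothetical $10B in training compute
tax_rate = 0.21        # US corporate rate

expensed = first_year_deduction(compute_spend)
amortized = first_year_deduction(compute_spend, amortization_years=5)

# Extra first-year tax shield from expensing rather than amortizing:
extra_shield = (expensed - amortized) * tax_rate
```

On these assumed numbers the first-year deduction grows from $2B to $10B, worth roughly $1.7B in deferred tax; the total deduction over the asset's life is the same either way, so the benefit is timing, not amount.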
If your interviewing and vetting process for early hires doesn’t catch frauds, then you’re probably doing something wrong. Get references, fly them out, reconstruct their past work on the Wayback Machine to see how productive they are, etc.
You know you've had an impact when leadership decides the only path forward is a clean-room team reboot.
how come Yann LeCun is not part of the superintelligence team?
Economists really don't like to entertain the possibility that AI might actually generate new knowledge, and instead insist on dubious claims like it can only "interpolate between known points of knowledge."

How much do we need to scale RL to enable its GPT-3 moment? We expect we'll soon need roughly 10,000 years of human-equivalent task time, comparable to the cumulative effort behind major projects like GTA V or Windows 2008.
Before GPT-3, achieving good performance required specialized fine-tuning for each task. Today's RL is similar: models need to be carefully trained to handle tasks like deep research, web search, or coding. But we think RL will soon have its GPT-3 moment.