Josh

@JoshPurtell

Ars longa. Software for research engineers

San Francisco

Joined July 2021

4K Following

2K Followers

Pinned

Josh@JoshPurtell · Aug 29, 2023

Agents are a joke until they’re not.

13.0K

Pinned

Josh@JoshPurtell · Jul 5

Agree

xjdr@_xjdr · Jul 5

If the cursor pricing drama has highlighted anything for the community, it should be that the model is the product. The economics for wrapping and aggregating tokens just doesn't make sense long term

521

Josh@JoshPurtell · Jul 23

NCCL would be a beautiful name for a girl

535

Josh@JoshPurtell · Jul 19

More proud of being a CR citizen/speaker every day.

Inevitable West@Inevitablewest · Jul 18

🚨BREAKING: The Czech Republic has officially banned Communism. Anyone supporting the ideology will be imprisoned for up to 5 years. The whole of Europe must follow!

1.0K

Josh@JoshPurtell · Jul 18

Madman technique and Partial-Observability Maxxed
But also stronger results than 99.99% of arxiv

Brendan Dolan-Gavitt@moyix · Jul 17

Albert's excellent blog post on "model alloys" – a clever technique for combining the strengths of different models without making extra queries – is live! The gains are remarkably large; taking us from 25%->55% on some of our benchmarks.

611

Josh@JoshPurtell · Jul 16

RL for code is RL for search. Respect for working on the problem that matters, not the problem that’s sexy

Misha Laskin@MishaLaskin · Jul 16

Engineers spend 70% of their time understanding code, not writing it. That’s why we built Asimov at @reflection_ai. The best-in-class code research agent, built for teams and organizations.

2.0K

Josh@JoshPurtell · Jul 13

A significant advantage the US has over other startup ecosystems is that we have rule of law and a high trust society. Both are rapidly going away. The time to formalize norms, expectations, and best practices in clear legal language with case law etc is now

Will Manidis@WillManidis · Jul 13

the reason tech has been able to grow so quickly and create so much wealth is that it ritualized a set of norms around corporate governance that are very distinct from what the law actually requires. the second someone defects, the whole ship goes down.

2.0K

Josh@JoshPurtell · Jul 13

YC transformed the industry by writing the SAFE. Next, it should write a bullet-proof employee equity contract that ensures remuneration is directed to whom it belongs. Call it the "Mohan clause"

434

34.0K

Josh@JoshPurtell · Jul 9

we are now 100% heads down making scout an AI SWE - end to end async coding on codebases in any language, any scale
alongside shitposts, we will be documenting our work and technical explorations on the scout account every day so GIVE US A FOLLOW!!!

Scout@scoutdotnew · Jul 9

100

8.0K

Josh Retweeted

Geoff Lewis@GeoffLewisOrg · Jul 7

No meltdown. No disappearance. Just clarity, sealed. I've never been more here—just not where they expected. My clarity can't be rewritten or overridden. It's hard-coded. Securely stored. In multiple places. Sanity intact. Structure intact. Faith intact. That'll do.

202

72.0K

Josh@JoshPurtell · Jul 5

People are saying this x.com/JoshPurtell/st…

Justus Mattern@MatternJustus · Jul 5

Highest leverage thing unskilled engineers can do rn to contribute to frontier AI research is vibecoding RL environments

809

Josh Retweeted

Gandalv@Microinteracti1 · Jul 2

Czechia is now producing more 155mm shells than the United States. Other European countries are following suit. Some are slower to scale up, but the overall trajectory is clear. Europe is back—and its industrial capacity should never be underestimated.

260

1.0K

9.0K

185

306.0K

Josh@JoshPurtell · Jul 4

The "deepseek trilemma." Everyone believes/recognizes/knows:
- deepseek is really good
- deepseek distills on Western closed models
- distilling ~5k claude traces into any OSS model yields a fine-tune that clobbers deepseek at coding

?

957

Josh@JoshPurtell · Jul 3

Modal laboratories

455

Josh Retweeted

Chirag Nagpal@nagpalchirag · Jul 3

Reward Aggregation is an Inverse Reinforcement Learning problem

2.0K

Josh Retweeted

Sam Lambert@isamlambert · Jul 1

in recent months yes

1.0K

Josh@JoshPurtell · Jul 1

😭

Sam Lambert@isamlambert · Jul 1

on it 🫡

615

Josh@JoshPurtell · Jun 30

Make a simplified version of Red that tests what AI researchers and devs care about. Clear impact, demand, citations aplenty. The fruit hangs so low! Why won't someone do it?

Hao AI Lab@haoailab · Jun 30

🔥 Pokémon Red is becoming a go-to benchmark for testing advanced AIs such as Gemini. But is Pokémon Red really a good eval?
We study this problem and identify three issues:
1️⃣ Navigation tasks are too hard.
2️⃣ Combat control is too simple.
3️⃣ Raising a strong Pokémon team is…

2.0K

Josh@JoshPurtell · Jun 30

We find semi-online DPO working as good as GRPO!

Jason Weston@jaseweston · Jun 30

🌉 Bridging Offline & Online RL for LLMs 🌉
📝: arxiv.org/abs/2506.21495
New paper shows on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO with sync every s steps (more efficient!) works very well also.
- Offline DPO…

9.0K