Ege Erdil
@EgeErdil2
update your effect size estimates downwards
doing something new with @tamaybes and @MatthewJBar, check it out
Today we’re announcing Mechanize, a startup focused on developing virtual work environments, benchmarks, and training data that will enable the full automation of the economy. We will achieve this by creating simulated environments and evaluations that capture the full scope of…
We're hiring software engineers. $500k base. x.com/i/jobs/1919892…
What's missing in the AI safety literature is a cost-benefit framework for evaluating when we should do more AI safety work vs. proceed with AI development. Indeed, one often finds an implicit assumption that we should ~always do more safety work, as if there are no tradeoffs.
language models confabulate a lot when you ask them how they work internally. this should make people question how much introspective access they really have to what's happening in their own brains
my wild and speculative guess is that the people benefiting might be the people who in aggregate are paying openai $10B/year
Who's actually benefiting from gen-AI? 🙄 "the benefits of AI seem esoteric and underwhelming, while the harms feel transformative and immediate" ⬇️ wired.com/story/generati…
that's just 3% of US military spending in that 11-day period (US military spending runs roughly $900B a year, i.e. ~$27B over 11 days, of which $800 million is about 3%). hardly "massive"
In just 11 days, the U.S. burned through 15–20% of its entire global THAAD missile stockpile defending Israel from Iranian attacks at an estimated cost of $800 million. America stepped up, but the price was massive.
Imagine trying to train GPT-4 on just the text data available in 1980. This would be totally inadequate. In 2025, our situation in automating software engineering is similar: we simply lack the relevant data and environments.
Eval I use for shortform: "Write a story where the reader assumes the protagonist is human, but the twist ending is that they're actually a robot. The story should have very subtle hints about the truth that are only detectable upon re-reading." Current models are NOT subtle.
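A minimal sketch of how one might run this eval programmatically, assuming the openai Python client and an API key in the environment; the model name and the manual re-read scoring are illustrative assumptions, not the author's actual harness.

```python
# Hypothetical harness for the "subtle robot twist" eval described above.
# Assumes the openai package (>=1.0) and OPENAI_API_KEY; model choice is illustrative.
from openai import OpenAI

PROMPT = (
    "Write a story where the reader assumes the protagonist is human, "
    "but the twist ending is that they're actually a robot. The story "
    "should have very subtle hints about the truth that are only "
    "detectable upon re-reading"
)

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # swap in whichever model you want to judge
    messages=[{"role": "user", "content": PROMPT}],
)

story = response.choices[0].message.content
print(story)

# Judging is manual: read once for the twist, then re-read and note
# whether the "hints" are subtle foreshadowing or blunt giveaways.
```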
imo this is not right. the reasoning/agency RL is resulting in a lot of unreliability, hallucinations, reward hacking, etc. that will seriously impede consumer use cases if not addressed. much of the cost of having an unsafe model is internalized for this reason alone
I genuinely believe Anthropic will end up losing the AI code model contest simply because they’re so obsessed with safety