Tejal Patwardhan

@tejalpatwardhan

thinking hard about hard evals // @openai

Joined June 2014

607 Following

4K Followers

Pinned

Tejal Patwardhan@tejalpatwardhan · Dec 20

Are you prepared?

1.0K

111

108.0K

Tejal Patwardhan Retweeted

Tifa Chen@tifafafafa · Jul 20

Last night we IMO tonight we party

833

122

87.0K

Tejal Patwardhan@tejalpatwardhan · May 2

in the end, all problems are people problems

10.0K

Tejal Patwardhan@tejalpatwardhan · Apr 16

o3 is paying my rent

948

249

146.0K

Tejal Patwardhan@tejalpatwardhan · Mar 17

all aligned models look the same but every misaligned model is misaligned in its own way

3.0K

Tejal Patwardhan@tejalpatwardhan · Mar 1

the future ever-so-slowly does arrive

2.0K

Tejal Patwardhan@tejalpatwardhan · Feb 27

It was a *character-building* privilege to post-train GPT 4.5

Sam Altman@sama · Feb 27

GPT-4.5 is ready! good news: it is the first model that feels like talking to a thoughtful person to me. i have had several moments where i've sat back in my chair and been astonished at getting actually good advice from an AI. bad news: it is a giant, expensive model. we…

696

170.0K

Tejal Patwardhan@tejalpatwardhan · Feb 26

worth looking at the bio results.

Luca Righetti@lucafrighetti · Feb 26

OpenAI and Anthropic *both* warn there's a sig. chance that their next models might hit ChemBio risk thresholds -- and are investing in safeguards to prepare. Kudos to OpenAI for consistently publishing these eval results, and great to see Anthropic now sharing a lot more too.

15.0K

Tejal Patwardhan@tejalpatwardhan · Feb 25

deep research system card is out! We included new SWE-Lancer results as part of our Preparedness evaluations. Deep research (with browsing) achieves SOTA on SWE-Lancer Diamond, earning $259K / $500K and solving 46% of IC SWE and 51% of SWE Manager tasks.

OpenAI@OpenAI · Feb 25

We're also sharing the system card, detailing how we built deep research, assessed its capabilities and risks, and improved safety. openai.com/index/deep-res…

145

18.0K