Mert Ünsal

@mertunsal2020

Leading agents @browser_use, prev. RL Research at @ProjectNumina @ETH_en

Zurich

Joined January 2018

667 Following

658 Followers

Pinned

Mert Ünsal@mertunsal2020 · Jul 21

still curious as to how @GoogleDeepMind and @OpenAI scale parallel inference while keeping the objective pass@1 for gold in IMO. sampling many times is easy, but having the samples interact is hard. the idea should surface in common research some time soon, similar to RLVR
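For context on the "sampling many times is easy" half of that post, here is a minimal sketch of the simplest non-interacting aggregation: draw several independent samples and submit the single most common answer, so the final objective is still one attempt. This is plain majority voting, not a description of what either lab does. The `sample` callable and the canned answers are hypothetical stand-ins for real model calls.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(sample: Callable[[int], str], n: int) -> str:
    """Draw n independent samples and return the most frequent answer.

    The samples never see each other, which is exactly the "easy" case;
    making them interact would need something beyond this sketch.
    """
    answers: List[str] = [sample(i) for i in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampler: canned answers instead of real model calls.
_CANNED = ["42", "41", "42", "42", "7"]


def fake_sample(i: int) -> str:
    return _CANNED[i % len(_CANNED)]


final_answer = majority_vote(fake_sample, n=5)
```

Only `final_answer` is submitted, so the evaluation still counts a single attempt even though five samples were drawn.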

Pinned

Mert Ünsal@mertunsal2020 · Jul 12

Running last evals 👀

Browser Use@browser_use · Jul 12

Would you like to have a @browser_use mode that works x2 faster? Shipping a flash mode soon ⚡️

Mert Ünsal@mertunsal2020 · 23 h

Some lessons we learned in @browser_use

Magnus Müller@mamagnus00 · Jul 22

If you build AI agents: 1 step consumes 10k input tokens with less than 250 output tokens. So if you optimize for
- speed: reduce output tokens
- cost: reduce input tokens
- reliability: don't just put more into context (performance can drop), but build clever systems around…
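A rough back-of-envelope sketch of why the post ties cost to input tokens and speed to output tokens. Only the token counts (10k in, 250 out) come from the post. The prices and the decode rate are hypothetical numbers picked for illustration, and latency is modeled as decode time only, ignoring prompt processing.

```python
from typing import Tuple

# From the post: one agent step.
INPUT_TOKENS = 10_000
OUTPUT_TOKENS = 250

# Hypothetical values, for illustration only.
PRICE_IN_PER_M = 2.00      # USD per 1M input tokens (assumed)
PRICE_OUT_PER_M = 8.00     # USD per 1M output tokens (assumed)
DECODE_TOK_PER_S = 100.0   # output tokens generated per second (assumed)


def step_cost(input_tokens: int, output_tokens: int) -> Tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one step."""
    cost_in = input_tokens / 1_000_000 * PRICE_IN_PER_M
    cost_out = output_tokens / 1_000_000 * PRICE_OUT_PER_M
    return cost_in, cost_out


def decode_seconds(output_tokens: int) -> float:
    """Time spent generating output, under the assumed decode rate."""
    return output_tokens / DECODE_TOK_PER_S


cost_in, cost_out = step_cost(INPUT_TOKENS, OUTPUT_TOKENS)
input_share = cost_in / (cost_in + cost_out)

print(f"input cost ${cost_in:.4f}, output cost ${cost_out:.4f}")
print(f"input share of step cost: {input_share:.0%}")
print(f"decode time: {decode_seconds(OUTPUT_TOKENS):.2f}s")
```

Under these assumed prices the input side dominates the bill even though output tokens cost more each, while halving the output tokens halves the modeled decode time.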

Mert Ünsal@mertunsal2020 · Jul 21

I think OpenAI is speculating a 300M offer as an anchor so that if you get 5M, you don’t feel that good anymore

Mert Ünsal@mertunsal2020 · Jul 20

I love my cursor rules

Mert Ünsal@mertunsal2020 · Jul 18

How do I make Cursor less agreeable?

Mert Ünsal@mertunsal2020 · Jul 16

My CEO sends me his requests on X directly and we discuss them publicly. #buildinpublic

Mert Ünsal@mertunsal2020 · Jul 16

We compared Kimi K2 from @GroqInc with o3 from @OpenAI on @browser_use (K2 on top). K2 is lightning fast on @GroqInc ⚡️⚡️⚡️

Mert Ünsal@mertunsal2020 · Jul 15

healthy working environment is when CEO is massaging the employees @browser_use

Mert Ünsal@mertunsal2020 · Jul 15

10 years from now everyone will realize how overrated “human intelligence” is

vitrupo@vitrupo · Jul 14

Eric Weinstein says we're more or less LLMs. Most of life runs on script: greetings, replies, small talk, all on pre-trained loops. That's why they mimic us so well. “We don't realize that intelligence is a last resort for us.”

Mert Ünsal@mertunsal2020 · Jul 14

pretty interesting idea to
1. run model
2. extract what went wrong as a lesson
3. add to system prompt the lesson on what went wrong to fix it
4. if fixed, finetune on it
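The four steps in that post can be sketched roughly as below. Everything named here is a hypothetical placeholder: `run_model` and `extract_lesson` stand in for real model calls, and storing the successful trace against the original prompt is one assumed reading of "finetune on it".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LessonLoop:
    # (system_prompt, task) -> (success, trace); hypothetical model call
    run_model: Callable[[str, str], Tuple[bool, str]]
    # trace -> short lesson text; hypothetical model call
    extract_lesson: Callable[[str], str]
    base_prompt: str
    finetune_set: List[Dict[str, str]] = field(default_factory=list)

    def attempt(self, task: str) -> Optional[str]:
        """Return the lesson if it fixed the task, else None."""
        ok, trace = self.run_model(self.base_prompt, task)      # 1. run model
        if ok:
            return None  # nothing went wrong, nothing to learn
        lesson = self.extract_lesson(trace)                     # 2. extract lesson
        patched = f"{self.base_prompt}\nLesson: {lesson}"       # 3. add to system prompt
        ok, fixed_trace = self.run_model(patched, task)
        if not ok:
            return None
        # 4. if fixed, keep the successful run as a finetuning example
        # (assumption: pair it with the ORIGINAL prompt so the lesson is baked in)
        self.finetune_set.append(
            {"prompt": self.base_prompt, "task": task, "completion": fixed_trace}
        )
        return lesson


# Toy stand-ins so the sketch runs without a model.
def fake_run_model(system_prompt: str, task: str) -> Tuple[bool, str]:
    ok = "Lesson:" in system_prompt
    return ok, f"{'solved' if ok else 'failed'}: {task}"


def fake_extract_lesson(trace: str) -> str:
    return "dismiss the cookie banner first"


loop = LessonLoop(fake_run_model, fake_extract_lesson, base_prompt="You are a browser agent.")
learned = loop.attempt("book a flight")
```

The collected `finetune_set` would then be handed to whatever finetuning pipeline is in use; that part is left out here.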

Mert Ünsal@mertunsal2020 · Jul 14

vibe coding with a view

Gregor Zunic@gregpr07 · Jul 14

Live from @browser_use mountain hq

Mert Ünsal@mertunsal2020 · Jul 10

Some of our latest work in @ProjectNumina! 500-line-long proofs are pretty cool to see :)

Jia Li@JiaLi52524397 · Jul 10

Happy to introduce Kimina-Prover-72B! Reaching 92.2% on miniF2F using test-time RL. It can solve IMO problems using more than 500 lines of Lean 4 code! Check our blog post here: huggingface.co/blog/AI-MO/kim… And play with our demo! demo.projectnumina.ai

Mert Ünsal@mertunsal2020 · Jul 8

Takes 18 seconds only to confirm my bs

Mert Ünsal Retweeted

Project Numina@ProjectNumina · Jul 7

Hello World! 👋 We're thrilled to officially launch the X account for Numina, dedicated to advancing frontier AI in mathematics. Stay tuned for updates on our research, achievements, and the future of mathematical AI! #AI4Math #FormalMath #LeanProver #AutomatedReasoning…

Mert Ünsal@mertunsal2020 · Jul 7

Unexpectedly GPT-4.5 is particularly better at creative discussions - I wonder what kind of post-training went into that

Mert Ünsal@mertunsal2020 · Jul 5

Funny thing about AI research: the more you bring in your “smart human intuition,” the less your approach scales. Just let the machines do their thing.
