Omar Khattab
@lateinteraction
Asst professor @MIT EECS & CSAIL (@nlp_mit). Author of http://ColBERT.ai and http://DSPy.ai (@DSPyOSS). Prev: CS PhD @StanfordNLP. Research @Databricks.
DSPy's biggest strength is also the reason it can admittedly be hard to wrap your head around. It basically says: LLMs & their methods will continue to improve, but not equally along every axis, so: - What's the smallest set of fundamental abstractions that allow you to build…
Is this guy talking about DSPy?
In this vein, the IMO results show how deep learning that uses symbolic tools (either in learning or solving) can achieve performance competitive with elite humans even in these formal domains.
The best research questions arise from engaging hands-on with working systems & experiencing the issues; abstracting them into well-defined technical problems; and only then thinking about solutions! P.S. Most high-value work in industry actually involves smart "data cleaning."
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low-value work (like data cleaning). A 1st-year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking.
Great minds think alike! 👀🧠 We also found that more thinking ≠ better reasoning. In our recent paper (arxiv.org/abs/2506.04210), we show how output variance creates the illusion of improvement—when in fact, it can hurt precision. Naïve test-time scaling needs a rethink. 👇…
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
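To make "naïve test-time scaling" concrete, here's a minimal sketch (my illustration, not from either paper) of the standard sample-and-vote recipe these threads are critiquing; `generate` is a hypothetical callable that draws one sampled answer from the model:

```python
from collections import Counter

def naive_test_time_scaling(generate, question, n_samples=16):
    # "Naïve" scaling: spend more compute by drawing more reasoning
    # samples, then majority-vote over the final answers.
    answers = [generate(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

The concern in both threads is that this kind of scaling can entrench systematic errors: if the extra samples (or longer chains) share the same problematic reasoning pattern, the vote just converges on the wrong answer more confidently.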
Thanks @lateinteraction ! Every time I think about the gazillion prompt / systems engineering tweaks that also go into making an AI system work I think about how early you were with @DSPyOSS :) Shared theme: find the key human input and make it programmatic.
Every time I think about what it takes to systematically organize the gazillion training tasks that together make a great foundation model, my appreciation for how early @SnorkelAI was increases.
The code is actually super nice to read too. Having great abstractions truly does make a big difference to usability. Also, I highly recommend reading the DSPy docs end-to-end. It makes learning and using it SO much nicer, because it all "clicks".
Day 1 of using @DSPyOSS, and it's amazing 🚀. It's remarkable how approachable its API is: all you need to wrap your head around is the idea of signatures & modules (don't worry about optimizers to start with). Knowledge of Pydantic helps, too. It's a breeze!
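To show how little there is to wrap your head around, here's a minimal sketch of those two core abstractions (the model name is just an example):

```python
import dspy

# Point DSPy at a language model; the model name here is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A Signature declares *what* the module should do (typed inputs/outputs),
# not how to prompt for it.
class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# A Module implements *how*: ChainOfThought adds a reasoning step
# before producing the declared outputs.
qa = dspy.ChainOfThought(AnswerQuestion)
print(qa(question="What does a DSPy signature declare?").answer)
```

Optimizers then tune the prompts/weights behind this same program, which is why you can defer them on day 1.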
Back in grad school, when I realized how the “marketplace of ideas” actually works, it felt like I’d found the cheat codes to a research career. Today, this is the most important stuff I teach students, more than anything related to the substance of our research. A quick…
oh shit nice, glad i get this better now than when you og posted lol
This original RLHF process by itself is not enough to define rewards for most types of tasks. It's the one that bakes in the fewest assumptions, but it's not enough for math, coding, factual knowledge, instruction-following, etc., where you can design better reward functions.
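For contrast with a learned preference reward, here's a minimal sketch (my illustration, not from the thread) of a verifiable reward for a math-style task, where ground truth makes a better reward function easy to design; the "Answer:" convention is a hypothetical output format:

```python
def verifiable_math_reward(model_output: str, gold_answer: str) -> float:
    # Hypothetical convention: the model ends its output with "Answer: <value>".
    prediction = model_output.rsplit("Answer:", 1)[-1].strip()
    # Exact-match check against ground truth; no learned preference model needed.
    return 1.0 if prediction == gold_answer.strip() else 0.0
```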