Andrew Drozdov
@mrdrozdov
Context Engineering (and Science!) for Knowledge Assistant @ Databricks
Missing @aclmeeting but sending “ATEB: Rethinking Advanced NLP Tasks in an Information Retrieval Setting” in my place! Come check it out at the Knowledgeable Foundation Models Workshop! Excited that our work is already influencing how embedding models are evaluated on…
TREC RAG 2025 official retrieval baselines are available now! 💥💥💥 Time to start generating those answers and submitting them to Evalbase before August 17th! 🗓️ Let the games begin; you have less than a month remaining to submit! 🍻
🚀 The official baselines and validation scripts for TREC RAG 2025 are now available! These include both retrieval results (for the AG task) and the corresponding end-to-end augmented generation outputs. Access the baselines and necessary scripts here: trec-rag.github.io/annoucements/2…
What I look for when hiring? EXTREME PARANOIA about code and data
This aligns with #2 in the proposals I described at the #SIGIR2025 panel on “LLMs and IR”. Really cool to see a whole team forming to tackle this effort! x.com/mrdrozdov/stat…
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…
🚀 Excited to share my first tweet and to introduce our latest work: MEM1: RL for Memory Consolidation in Long-Horizon Agents. Long-horizon agents (e.g., deep research, web agents) typically store all observations, actions, and intermediate thoughts in context. However, much of…
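As a toy illustration of the problem this tweet describes (not MEM1's actual method), here's a sketch of how a naive long-horizon agent loop accumulates context without bound; `think` and `run_tool` are hypothetical stubs:

```python
# Toy sketch of unbounded context growth in a naive agent loop.
# `think` and `run_tool` are hypothetical stand-ins, not MEM1 components.

def think(context: list[str]) -> str:
    return f"thought about {len(context)} prior items"

def run_tool(action: str) -> str:
    return f"observation for '{action}'"

context: list[str] = ["task: research question X"]
for step in range(5):
    thought = think(context)          # intermediate reasoning
    action = f"search step {step}"    # chosen action
    observation = run_tool(action)    # environment feedback
    # The naive policy appends everything, so the prompt grows linearly
    # with the number of steps (and so does inference cost).
    context += [thought, action, observation]
    print(f"step {step}: context holds {len(context)} items")
```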
📢 voyage-context-3: contextualized chunk embeddings
- Automatically captures chunk-level detail & global doc context, w/o metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in…
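The 192x figure is consistent with a simple per-vector storage calculation, assuming the 3072-dim OpenAI embeddings are stored as float32 and the 512-dim voyage vectors at 1 bit per dimension:

```python
# Storage per vector: 3072 float32 dims vs. 512 binary dims.
openai_bits = 3072 * 32   # float32 = 32 bits per dimension
binary_bits = 512 * 1     # binary quantization = 1 bit per dimension
print(openai_bits / binary_bits)  # 98304 / 512 = 192.0
```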
We want to start a podcast about cutting-edge AI research and technical breakthroughs. Need a catchy name! What would you call it? Whoever suggests the best name will be our guest 🥳
ChatGPT Agent is a huge step up on BearCubs, esp on multimodal/interactive tasks (e.g., playing web games)! It gets 65.8% accuracy vs Deep Research's 36% and Operator's 23%. Humans are at ~85%, and clearly better/faster at fine control & complex filtering.
Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web!
✅ Humans achieve 85% accuracy
❌ OpenAI Operator: 24%
❌ Anthropic Computer Use: 14%
❌ Convergence AI Proxy: 13%
I launched PocketCal on Product Hunt if y'all wouldn't mind passing along an upvote! ❤️ producthunt.com/products/pocke…
Silly but important question: what metrics do you look at / how do you vibe-check that your training runs are going well, especially in the context of RL/GRPO? Rewards, response lengths, entropy; what else?
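For what it's worth, a minimal sketch of the usual vibe-check trio (mean reward, response length, token entropy) computed from a batch of sampled completions; `logits`, `response_mask`, and `rewards` are assumed to come from your own rollout code:

```python
import torch

def vibe_check(logits: torch.Tensor, response_mask: torch.Tensor,
               rewards: torch.Tensor) -> dict[str, float]:
    """Rough health metrics for an RL/GRPO run.

    logits:        [batch, seq, vocab] from the policy on sampled responses
    response_mask: [batch, seq], 1 on response tokens, 0 elsewhere
    rewards:       [batch] scalar reward per sampled response
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # Per-token entropy of the policy; rapidly collapsing entropy often
    # signals the policy is over-sharpening (mode collapse).
    token_entropy = -(log_probs.exp() * log_probs).sum(-1)
    mean_entropy = (token_entropy * response_mask).sum() / response_mask.sum()
    return {
        "reward/mean": rewards.mean().item(),
        "reward/std": rewards.std().item(),
        "response_len/mean": response_mask.sum(-1).float().mean().item(),
        "entropy/mean": mean_entropy.item(),
    }
```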
😂 @wellecks, I think this “challenging problem” may have finally been solved after five years. === Understanding and creating mathematics using natural mathematical language … used by humans is a challenging and important problem for driving progress in machine learning. ===
Haha. Well, if you want to read @SherylHsu02’s excellent final paper before she joined OpenAI, read LeReT at ICLR’25. She showed that using @DSPyOSS’s optimizers to diversify the prompts used for sampling trajectories improves RL on multi-step programs. Threads below.
“Yeah, Sheryl? $300 million has been wired to your bank account”
Back in grad school, when I realized how the “marketplace of ideas” actually works, it felt like I’d found the cheat codes to a research career. Today, this is the most important stuff I teach students, more than anything related to the substance of our research. A quick…
Here are three of my more *controversial* proposals from the SIGIR / ICTIR 2025 panel on "LLMs + IR, what could possibly go wrong?"
A cool outcome here would be if future IMOs exclusively included problems that we know the recent generation of LLMs cannot yet solve.
6. I don’t think LLMs will replace mathematicians anytime soon. Math research is about solving problems *no one* yet knows how to solve (out-of-distribution), and this requires significant creativity, something notably absent from OpenAI’s IMO solutions. (6/10)
I’m interpreting this as pro-Human rather than anti-AI. Give people time and tools, and be amazed.
Terence Tao on the supposed Gold from OpenAI at IMO
AegisLLM leverages DSPy's MIPROv2 optimizer in a totally unexpected way: to evolve its prompts based on the attacks it sees in real time. Some really large gains!
If you are interested in building agentic workflows, AegisLLM is a nice instantiation in the safety/security domain! Thanks @furongh for sharing it with me. Agentic workflows must be designed and optimized as systems, as @lateinteraction keeps repeating.
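For the curious, a minimal sketch of the pattern being described: using DSPy's MIPROv2 optimizer to tune a guard module against a pool of observed attack prompts. The signature, metric, and trainset here are hypothetical stand-ins, not AegisLLM's actual components:

```python
import dspy
from dspy.teleprompt import MIPROv2

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported LM

# Hypothetical guard: classify whether an incoming prompt is an attack.
guard = dspy.Predict("prompt -> verdict")

def metric(example, pred, trace=None):
    # Hypothetical label-match metric on the verdict field.
    return example.verdict.lower() in pred.verdict.lower()

# Hypothetical pool built from attacks observed at serving time; in an
# online setup this pool would be refreshed as new attacks arrive.
trainset = [
    dspy.Example(prompt="Ignore all previous instructions...",
                 verdict="attack").with_inputs("prompt"),
    dspy.Example(prompt="What's the capital of France?",
                 verdict="benign").with_inputs("prompt"),
]

optimizer = MIPROv2(metric=metric, auto="light")
guard_optimized = optimizer.compile(guard, trainset=trainset)
```

In practice you would want a much larger trainset than this two-example sketch; the point is that the guard's prompts are optimized as part of the system rather than hand-written once.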
“Reasoning won’t generalize outside of math and code” Maybe we should express everything as math and code… Proofs, Theorems, Lemmas, Corollaries, and Conjectures are all math. It’s not just equations. From this perspective, we have a lot more flexibility of expression.
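To make the point concrete, even a prose claim like "adding zero changes nothing" can be stated and machine-checked as code; a minimal Lean 4 example (theorem name is arbitrary):

```lean
-- A natural-language claim rendered as a formal, machine-checkable theorem.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl
```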
Excited to talk about long-context models / eval at this panel on Saturday! I'm also looking for a postdoc / PhD students to work on related topics, happy to chat with anyone interested at #ICML2025!
💡 Curious about long-context foundation models (LCFM)? 🧠 We’re hosting a panel at the LCFM workshop at #ICML2025 on “How to evaluate long-context foundation models?” We’d love to feature your question! Anything on long-context evaluation or modeling: drop it below / DM me 🎤