DAIR.AI

@dair_ai

Democratizing AI research, education, and technologies. Learn how to build with AI in our new AI Academy: https://dair-ai.thinkific.com/

Joined July 2017

1 Following

76K Followers

Pinned

DAIR.AI @dair_ai · Jul 20

Top AI Papers of The Week (July 14 - 20):
- Agentic-R1
- Context Rot
- Scaling up RL
- A Survey of AIOps
- Chain-of-Thought Monitorability
- One Token to Fool LLM-as-a-Judge
- A Survey of Context Engineering for LLMs
Read on for more:

DAIR.AI Retweeted

elvis @omarsar0 · Jul 22

AI Agents Evaluation
Evaluation is key to developing reliable and scalable agentic systems. Really enjoyed this conversation with @ptkbhv on *everything* related to AI agent evaluation. One of the many deep dives we have done at the @dair_ai academy. Feel free to share with…

DAIR.AI Retweeted

elvis @omarsar0 · 11h

Learning without training
Google researchers explore the implicit dynamics of in-context learning.
"Implicit weight updates from ICL mirror the effect of actual fine-tuning on the same data."
This one is more technical but much needed. The findings:

DAIR.AI Retweeted

elvis @omarsar0 · Jul 23

Deep Research Agents with Test-Time Diffusion
Google keeps pushing on diffusion. This time, they apply diffusion to deep research agents, specifically the report generation process. It achieves a 69.1% win rate vs. OpenAI Deep Research on long-form research. My notes:

DAIR.AI Retweeted

elvis @omarsar0 · Jul 22

A Structural Planning Framework for LLM Agent System in Enterprise
Agentic systems for enterprise are a work in progress. Reliability is a real problem. No secret that planning works, but structural planning can further help improve the reliability of AI agents. My notes:

DAIR.AI Retweeted

elvis @omarsar0 · Jul 21

Every software engineer hits the same wall: “I don’t know why this broke in prod.” AI coding agents fall apart without the right context. Hud captures how your code behaves in production and surfaces that context in your IDE and to AI coding agents via Hud’s MCP server. MCP…

DAIR.AI Retweeted

elvis @omarsar0 · Jul 19

Context Rot
Great title for a report, but even better insights about how increasing input tokens impacts the performance of top LLMs. Banger report from Chroma. Here are my takeaways (relevant for AI devs):

DAIR.AI Retweeted

elvis @omarsar0 · Jul 18

A Survey of Context Engineering
160+ pages covering the most important research around context engineering for LLMs. This is a must-read! Here are my notes:

DAIR.AI Retweeted

elvis @omarsar0 · Jul 17

Agent Leaderboard v2 is here!
> GPT-4.1 leads
> Gemini-2.5-flash excels at tool selection
> Kimi K2 is the top open-source model
> Grok 4 falls short
> Reasoning models lag behind
> No single model dominates all domains
More below:
