Anirudh Khatry (@AnirudhKhatry)

Pinned

Anirudh Khatry@AnirudhKhatry · Apr 23

🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️ A dataset of 100 real-world C repositories across various domains, each paired with: 🦀 Handwritten safe Rust interfaces. 🧪 Rust test cases to validate correctness. 🧵[1/6]

Anirudh Khatry@AnirudhKhatry · Jul 22

Thanks to MIT News for covering our vision of AI for code! A lot of progress made, but still a long way to go!

MIT CSAIL@MIT_CSAIL · Jul 22

Can AI actually code for us? 🧵 MIT research reveals there’s a "long way to go" due to bottlenecks like assessment, codebase scale, & incorrect retrievals. The work reflects a vision to let humans focus on high-level design while routine work is automated:…

Anirudh Khatry@AnirudhKhatry · Jul 9

CRUST-bench was accepted to @COLM_conf #COLM2025!

Anirudh Khatry@AnirudhKhatry · Apr 23

🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️ A dataset of 100 real-world C repositories across various domains, each paired with: 🦀 Handwritten safe Rust interfaces. 🧪 Rust test cases to validate correctness. 🧵[1/6]

Anirudh Khatry Retweeted

Yoav Artzi@yoavartzi · Jul 8

@COLM_conf decisions are out, and so are we The strength of submissions this year amazed us! Many many hard decisions 😩 + @AdtRaghunathan, @eunsolc, @RanjayKrishna 😴😴😴

Anirudh Khatry Retweeted

PLDI@PLDI · Jun 28

Last but not least, the SIGPLAN Robin Milner Young Researcher Award was also announced at PLDI. This year, the award went to Işıl Dillig (@IsilDillig), whose research has had profound and far-reaching contributions to program analysis, verification, and synthesis ⭐️

Anirudh Khatry@AnirudhKhatry · Jun 3

CosmicAI collab: benchmarking the utility of LLMs in astronomy coding workflows & focusing on the key research capability of scientific visualization. @sebajoed @jessyjli @Murtazahusaintx @gregd_nlp @StephaJuneau @paultorrey9 Adam Bolton, Stella Offner, Juan Frias, Niall Gaffney

Sebastian Joseph@sebajoed · Jun 2

How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵

Anirudh Khatry@AnirudhKhatry · Jun 3

Thinking about composing skills by combining the strengths of multiple models? Check out the amazing work by @fangcong_y10593 and team!

Fangcong Yin@fangcong_y10593 · Jun 2

Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training models on each skill 📌Combining those models. This leads to better 0-shot reasoning on tasks involving skill composition!

Anirudh Khatry Retweeted

Fangcong Yin@fangcong_y10593 · Jun 2

Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training models on each skill 📌Combining those models. This leads to better 0-shot reasoning on tasks involving skill composition!

Anirudh Khatry Retweeted

Kanishka Misra 🌊@kanishkamisra · Jun 2

News🗞️ I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘 Excited to develop ideas about linguistic and conceptual generalization! Recruitment details soon

Anirudh Khatry@AnirudhKhatry · Jun 2

Very cool work by @sebajoed and the team!

Sebastian Joseph@sebajoed · Jun 2

How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵

Anirudh Khatry Retweeted

Sebastian Joseph@sebajoed · Jun 2

How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵

Anirudh Khatry Retweeted

Yizhong Wang@yizhongwyz · May 30

Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘

Anirudh Khatry Retweeted

Liyan Tang@LiyanTang4 · May 20

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning, which is hard to verbalize via text CoT 📉Humans reach 93%, but Gemini-2.5-Pro reaches only 63% and Qwen2.5-72B only 38%

Anirudh Khatry@AnirudhKhatry · May 20

Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Chart questions take us beyond current benchmarks for math, multi-hop QA, etc., where CoT is very good, to *visual reasoning*, which is hard to express with text CoT!

Liyan Tang@LiyanTang4 · May 20

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning, which is hard to verbalize via text CoT 📉Humans reach 93%, but Gemini-2.5-Pro reaches only 63% and Qwen2.5-72B only 38%

Anirudh Khatry Retweeted

Isil Dillig@IsilDillig · May 8

The FY26 budget slashes NSF funding by 55%, which directly threatens basic research in the United States. Please call your reps NOW and tell them to reject these cuts and protect science funding. You can find them here: congress.gov/members

Anirudh Khatry Retweeted

Elias Stengel-Eskin@EliasEskin · May 5

Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD…

Anirudh Khatry Retweeted

Niloofar (✈️ ACL)@niloofar_mire · May 6

📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!

Anirudh Khatry@AnirudhKhatry · Apr 25

Sad to be missing ICLR, but catch up with Zayne about his latest work on LLM reasoning!

Zayne Sprague@ZayneSprague · Apr 25

I’ll be presenting our work, To CoT or not to CoT? Chain-of-Thought Helps Mainly on Math and Symbolic Reasoning at ICLR as a poster today at 3pm #292. Stop by and chat about reasoning, CoT, LongCoTs, math etc!! See you there 🤘

Anirudh Khatry@AnirudhKhatry · Apr 25

I’ll be presenting our work, To CoT or not to CoT? Chain-of-Thought Helps Mainly on Math and Symbolic Reasoning at ICLR as a poster today at 3pm #292. Stop by and chat about reasoning, CoT, LongCoTs, math etc!! See you there 🤘

Zayne Sprague@ZayneSprague · Sep 19

To CoT or not to CoT?🤔 300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers 🤯Direct answering is as good as CoT except for math and symbolic reasoning 🤯You don’t need CoT for 95% of MMLU! CoT mainly helps LLMs track and execute symbolic computation

Anirudh Khatry Retweeted

Stefania Druga@Stefania_druga · Apr 25

I will be talking about the Future of Multimodal AI applications at this @iclr_conf workshop on Monday 28th April at 2 pm local time #ICLR25 dl4c.github.io/schedule/

Anirudh Khatry@AnirudhKhatry · Apr 23

The paper you should be reading right now.

Kevin Pu@kevpjk · Mar 2

🤖👩🏻‍💻Proactive AI tools like @GitHubCopilot @cursor_ai @allhands_ai promise to assist developers by anticipating their needs and automating engineering processes—but do they truly help? We evaluated three design probes to explore the trade-offs of proactive AI programming support
