Anirudh Khatry
@AnirudhKhatry
CS PhD @UTCompSci | Previously @ProseMsft @MSFTResearch | AI4Code | Guitarist | VJTI ’21
🚀Introducing CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️ A dataset of 100 real-world C repositories across various domains, each paired with: 🦀 Handwritten safe Rust interfaces. 🧪 Rust test cases to validate correctness. 🧵[1/6]
![CRUST-Bench tweet image](https://pbs.twimg.com/media/GpO9WCKWQAALjAb.png)
![CRUST-Bench tweet image](https://pbs.twimg.com/media/GpO9WCJXMAAstQm.jpg)
Thanks to MIT News for covering our vision of AI for code! A lot of progress made, but still a long way to go!
Can AI actually code for us? 🧵 MIT research reveals there’s a "long way to go" due to bottlenecks like assessment, codebase scale, & incorrect retrievals. The work reflects a vision to let humans focus on high-level design while routine work is automated:…
CRUST-bench was accepted to @COLM_conf #COLM2025!
@COLM_conf decisions are out, and so are we. The strength of submissions this year amazed us! Many, many hard decisions 😩 + @AdtRaghunathan, @eunsolc, @RanjayKrishna 😴😴😴
Last but not least, the SIGPLAN Robin Milner Young Researcher Award was also announced at PLDI. This year, the award went to Işıl Dillig (@IsilDillig), whose research has had profound and far-reaching contributions to program analysis, verification, and synthesis ⭐️
CosmicAI collab: benchmarking the utility of LLMs in astronomy coding workflows & focusing on the key research capability of scientific visualization. @sebajoed @jessyjli @Murtazahusaintx @gregd_nlp @StephaJuneau @paultorrey9 Adam Bolton, Stella Offner, Juan Frias, Niall Gaffney
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
Thinking about composing skills combining the strength of multiple models? Check out the amazing work by @fangcong_y10593 and team!
Solving complex problems with CoT requires combining different skills. We can do this by: 🧩Modifying the CoT data format to be “composable” with other skills 🔥Training models on each skill 📌Combining those models. This leads to better 0-shot reasoning on tasks involving skill composition!
News🗞️ I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘 Excited to develop ideas about linguistic and conceptual generalization! Recruitment details soon
Very cool work by @sebajoed and the team!
Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to be verbalized via text CoTs 📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B
Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Chart questions take us beyond current benchmarks for math/multi-hop QA/etc., which CoT is very good at, to *visual reasoning*, which is hard to express with text CoT!
The FY26 budget slashes NSF by 55%, which directly threatens basic research in the United States. Please call your reps NOW and tell them to reject these cuts and protect science funding. You can find them here: congress.gov/members
Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD…
📣Thrilled to announce I’ll join Carnegie Mellon University (@CMU_EPP & @LTIatCMU) as an Assistant Professor starting Fall 2026! Until then, I’ll be a Research Scientist at @AIatMeta FAIR in SF, working with @kamalikac’s amazing team on privacy, security, and reasoning in LLMs!
Sad to be missing ICLR, but catch up with Zayne about his latest work on LLM reasoning!
I’ll be presenting our work, To CoT or not to CoT? Chain-of-Thought Helps Mainly on Math and Symbolic Reasoning at ICLR as a poster today at 3pm #292. Stop by and chat about reasoning, CoT, LongCoTs, math etc!! See you there 🤘
To CoT or not to CoT?🤔 300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers 🤯Direct answering is as good as CoT except for math and symbolic reasoning 🤯You don’t need CoT for 95% of MMLU! CoT mainly helps LLMs track and execute symbolic computation
I will be talking about the Future of Multimodal AI applications at this @iclr_conf workshop on Monday 28th April at 2 pm local time #ICLR25 dl4c.github.io/schedule/
The paper you should be reading right now.
🤖👩🏻💻Proactive AI tools like @GitHubCopilot @cursor_ai @allhands_ai promise to assist developers by anticipating their needs and automating engineering processes—but do they truly help? We evaluated three design probes to explore the trade-offs of proactive AI programming support