Revanth Gangi Reddy
@gangi_official
NLP PhD student at @IllinoisCDS. Former AI Resident at @IBMResearch and intern at @Microsoft, @Amazon Alexa AI, @ai2_allennlp, @Apple and @SFResearch.
We cast software issue localization (identifying where to make the fix given a bug report) as a code ranking problem. Our proposed SWERank framework significantly outperforms agent-based systems, while being considerably more cost effective. Our 7B SWERankEmbed retriever even…
🆕Excited to announce SWERank, our code ranking framework for software issue localization. ➡️Paper: bit.ly/3S0x1fV ➡️GitHub Project Page: bit.ly/42SESm3 ➡️AI-Generated Podcast: bit.ly/3GMF51H ➡️Code, Data and Models: Coming soon! (1/3) 🧵 Pinpointing…
Our CoRNStack work will be presented in #ICLR2025 Poster Hall 3 + Hall 2B #480 on Fri 25 April at 3pm Singapore Time. Project Page: gangiswag.github.io/cornstack/ We also have an exciting new follow-up work on Issue Localization dropping soon!

PhD #24 - Congratulations to Dr. Revanth Reddy @gangi_official on successfully defending his amazing PhD thesis and joining Google DeepMind as a research scientist! Many thanks to my friends and collaborators for co-advising him in the past several years!
I've successfully defended my PhD thesis on automated information seeking! Extremely grateful to my advisor @hengjinlp, committee members and all collaborators. Next, I'll be joining @GoogleDeepMind as a research scientist! Link to defense slides: docs.google.com/presentation/d…

The models and code are now public! Models on HF: huggingface.co/collections/Sa… Code: github.com/SalesforceAIRe… Project Page: salesforceairesearch.github.io/SweRank/ If you are interested in integrating the SweRank models as a plug-in within VS Code, please do reach out! We have more exciting…
📣 SweRank: How AI is Revolutionizing Software Issue Localization 📣 See how it works: bit.ly/44pQw8G SweRank offers a more efficient solution for locating exact code parts that need modification to resolve software issues, using a two-step "retrieve-and-rerank"…
We’ve been seeing amazing results from trying out the SWERank framework on public GitHub issues. We’re looking for support in building a VSCode plugin/PR review bot to assist developers with issue localization. If you’re interested, please DM!
nomic-embed-code is 26.35GB, but this post also introduced me to the much smaller 521.60MB CodeRankEmbed - I got that working just now with LLM and "llm embed-multi", notes here: simonwillison.net/2025/Mar/27/no…
Introducing a new state-of-the-art code embedding model
- SOTA performance on the CodeSearchNet benchmark
- Truly open (weights, data, code)
- Apache 2.0
Nomic Embed Code brings us one step closer to a world with truly open source embedding models for every modality! 🧵
Nomic Embed Code is based on the CoRNStack (arxiv.org/abs/2412.01007), our curated dataset of high quality code data. We’re also happy to announce that CoRNStack was accepted to ICLR 2025, thanks to the diligent work of @TarunSures41845, @gangi_official, and @zach_nussbaum.
As large language models increasingly serve as judges for evaluating other models during both development and deployment, most existing benchmarks still focus on non-contextual tasks like chat completions or logical reasoning. Our team presents ⚖️ ContextualJudgeBench ⚖️, a…
Testing LLMs' reasoning skills is tough—human evaluations are expensive, data contamination is common, and LLM judges can be biased. We propose StructTest, the first benchmark that checks how well LLMs follow complex instructions and create structured outputs. It uses a…
🚀 We benchmarked our INFOGENT framework against the latest LLM web search APIs from @OpenAI & @perplexity_ai—our approach proves to be competitive with these proprietary solutions! I'm currently on the industry job market for research scientist roles. Please do reach out if you…
🚀 Can LLMs aggregate information from diverse web sources? We try to answer that in our latest work: INFOGENT: a modular, agent-based framework for information aggregation on the web! Website: gangiswag.github.io/infogent/ 🌐🔍 🧵 [1/n]
Check out our #CVPR2025 work on leveraging web image retrieval for improving long-tail object detection. Special congrats to @MSidhu51205 who led this project and is applying to grad programs!
🔥 Thrilled to announce our paper “SearchDet: Training-Free Long Tail Object Detection via Web-Image Retrieval” has been accepted to #CVPR2025! 🚀 We’re redefining object detection by leveraging web image retrieval – no extra training required! Paper - arxiv.org/abs/2409.18733
Now accepted to NAACL 2025 Findings. More on RAG agents soon! I’m now in the Bay Area interning at @SFResearch with @JotyShafiq. Feel free to reach out if you’re in the Bay Area and wanna chat about LLMs and search!
CoRNStack is now accepted to ICLR 2025! The contrastive training data along with the code embedding and code reranking models are all public here: gangiswag.github.io/cornstack/
Excited to announce 🌽 CoRNStack, comprising high-quality contrastive text-code data created in collaboration with @nomic_ai. Project Page: gangiswag.github.io/cornstack/ Our CodeRankEmbed model trained using CoRNStack achieves state-of-the-art performance on code retrieval tasks! 🧵👇
The paper is now available here: arxiv.org/pdf/2412.01007 with the data and models released on HF: huggingface.co/cornstack @TarunSures41845 and I will be at NeurIPS in Vancouver this week. Feel free to reach out to chat more on embedding models, code generation or LLMs in general
👋 NAACL 2025-2026 election has just been launched! I'm running for NAACL board again this year. Please cast your vote for me if you care about bridging academic and industry research, promoting interdisciplinary work, and supporting growth of the community. Thanks in…