Alex Gurung
@AlexAag1234
PhD student at @EdinburghNLP | undergrad+masters @gtcomputing
🚨New paper alert!🚨 "Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them" @ActInterp ICML'25. @deepseek_ai popularised RLVR and distillation for 'reasoning training', but how do they differ under the hood? Details in 🧵: (1/8)
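For context (this is background, not the paper's code): GRPO, the RLVR algorithm popularised by DeepSeek, reweights the model's own rollouts by a group-normalised reward, while SFT/distillation applies cross-entropy against an external target sequence. A minimal, illustrative sketch of the two update signals:

```python
import torch
import torch.nn.functional as F

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # GRPO: normalise verifiable rewards within a group of rollouts for the
    # same prompt; the policy gradient then reweights the model's OWN samples,
    # amplifying behaviours it can already produce.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def sft_loss(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    # SFT/distillation: cross-entropy against an EXTERNAL target sequence
    # (e.g. a teacher's reasoning trace), pulling the policy towards it.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

# Toy usage: four rollouts for one prompt with binary verifiable rewards.
print(grpo_advantages(torch.tensor([1.0, 0.0, 0.0, 1.0])))
```

The "scalpel vs. hammer" framing in the title maps onto this contrast: one signal reweights existing samples, the other overwrites the distribution with new targets.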
This work was accepted to @COLM_conf 2025! See you soon in Montréal! 🍁
Preprint: Can we learn to reason for story generation (~100k tokens) without reward models? Yes! We introduce an RLVR-inspired reward paradigm, VR-CLI, that correlates with human judgements of quality on the 'novel' task of Next-Chapter Prediction. Paper: arxiv.org/abs/2503.22828
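As I read the "completion likelihood improvement" idea, the reward scores the gold next chapter with and without the model-generated plan and rewards the improvement, so no learned reward model is needed. A rough sketch (the function names and scorer are mine, not the paper's):

```python
def vr_cli_reward(logprob_fn, plan: str, gold_chapter: str) -> float:
    # logprob_fn(context, continuation) -> mean token log-prob of
    # `continuation` given `context` under a fixed scoring model.
    base = logprob_fn("", gold_chapter)           # likelihood without the plan
    conditioned = logprob_fn(plan, gold_chapter)  # likelihood given the plan
    return conditioned - base                     # > 0 iff the plan helped

# Stub usage with a toy scorer (a real one would query an LM):
toy = lambda ctx, cont: 0.4 if "heist" in ctx else 0.1
print(vr_cli_reward(toy, plan="Plan: the heist goes wrong.", gold_chapter="..."))
```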
📣This work will appear at the ICLR 2025 Workshop on Reasoning and Planning for LLMs.🇸🇬 I'm currently on the job market, looking for research scientist roles. Feel free to reach out if you're hiring or know of any opportunities!
LLMs can tackle math olympiad problems, but... can they read a clock 🤔? 🕰️📆 Our experiments reveal surprising failures in temporal reasoning: MLLMs struggle with analogue clock reading & date inference! "Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs"
🚀 New arXiv paper alert! By combining agentic frameworks (ReAct) with smart decoders (DeCoRe, DoLa, CAD), we boost factual accuracy in complex reasoning tasks, reducing those annoying hallucinations! 🔥 🔗 Paper: arxiv.org/abs/2503.23415
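The decoders named above share one trick: contrasting two token distributions at each step. A generic sketch of that shared idea (the parameterisation is illustrative; see each paper for its exact rule):

```python
import torch

def contrastive_logits(expert_logits: torch.Tensor,
                       amateur_logits: torch.Tensor,
                       alpha: float = 1.0) -> torch.Tensor:
    # Shared idea behind DoLa/CAD/DeCoRe-style decoding: boost tokens the
    # "expert" view (final layer / with-context / full model) favours over
    # an "amateur" view (early layer / no-context / ablated model).
    expert = torch.log_softmax(expert_logits, dim=-1)
    amateur = torch.log_softmax(amateur_logits, dim=-1)
    return expert + alpha * (expert - amateur)

# Toy step: the expert is confident about token 2, the amateur is not.
print(contrastive_logits(torch.tensor([0.1, 0.2, 2.0]),
                         torch.tensor([0.5, 0.5, 0.5])).argmax())  # tensor(2)
```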
Can multimodal LLMs truly understand research poster images?📊 🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization! 🪧 📂 Dataset: huggingface.co/datasets/rohit… 📜 Paper: arxiv.org/abs/2502.17540
🔥 New Preprint! 🔥 How should LLMs handle ambiguous questions in text-to-SQL semantic parsing? 👉🏼 Disambiguate First, Parse Later! We propose a plug-and-play approach that explicitly disambiguates the question 💬 Paper: arxiv.org/abs/2502.18448
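As I understand the pipeline, it decouples two LLM calls: clarify first, then parse. A toy sketch (the prompts and helper are hypothetical, not taken from the paper):

```python
def disambiguate_then_parse(llm, question: str, schema: str) -> str:
    # Stage 1: make the ambiguity explicit before any SQL is written.
    clarified = llm(
        f"Given this schema:\n{schema}\n"
        f"Rewrite the question so it has exactly one interpretation:\n{question}"
    )
    # Stage 2: parse the now-unambiguous question into SQL.
    return llm(f"Schema:\n{schema}\nTranslate to SQL:\n{clarified}")

# `llm` can be any text-in/text-out callable, e.g. an API client wrapper,
# which is what makes the approach plug-and-play across parsers.
```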
🎉 Excited to share “Generalizing from Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning” 📄 (arxiv.org/pdf/2502.15592) We propose "context synthesis": instead of generating instructions from long texts, we synthesize contexts for instructions—drawing…
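The inversion is the interesting bit: start from an existing (instruction, answer) pair and generate the long context around it, rather than writing instructions for an existing long document. A hypothetical sketch of one synthesis step, with all names my own:

```python
def synthesize_long_context(llm, instruction: str, answer: str,
                            n_docs: int = 3) -> dict:
    # Generate background documents that make the pair answerable, then
    # train on (long synthesized context + instruction) -> answer.
    docs = [
        llm(f"Write a detailed document a reader would need in order to "
            f"answer:\n{instruction}\nExpected answer: {answer}")
        for _ in range(n_docs)
    ]
    return {"context": "\n\n".join(docs),
            "instruction": instruction,
            "answer": answer}
```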
🎉 Introducing Open Reasoner Zero 🚀 Performance: Matches DeepSeek R1-Zero (32B) with just 1/30 of the training steps! 📚 Full training strategies & technical paper 💻 100% open-source: Code + Data + Model ⚖️ MIT licensed - Use it your way! 🌊 Let the Reasoner-Zero tide rise! 🚢 1/n
Is sparsity the key to conditional computation, interpretability, long context/generation, and more in foundation models? Find out at my #NeurIPS2024 tutorial on Dynamic Sparsity in Machine Learning with @andre_t_martins! Followed by a panel with @sarahookr and @murefil 🧵
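One classic flavour of the sparsity in this space is sparsemax (Martins & Astudillo, 2016), a softmax alternative that can assign exactly zero probability; whether the tutorial covers this exact variant is my assumption. A minimal sketch:

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    # Euclidean projection of logits onto the probability simplex: behaves
    # like softmax but gives exact zeros, hence sparse, interpretable outputs.
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cumsum = torch.cumsum(z_sorted, dim=0)
    support = 1 + k * z_sorted > cumsum   # entries kept in the support
    k_z = support.sum()
    tau = (cumsum[k_z - 1] - 1) / k_z     # threshold for the projection
    return torch.clamp(z - tau, min=0.0)

print(sparsemax(torch.tensor([2.0, 1.0, -1.0])))  # tensor([1., 0., 0.])
```

Exact zeros are what connect sparsity to conditional computation: zeroed entries can simply be skipped downstream.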
The next EuroLLM model is out 🎉 We support all the 🇪🇺 EU languages (+ more), but now in a 9B size (base and instruct). We are not done yet; stay tuned for more 👀
Today we release EuroLLM-9B: the best EU-made multilingual LLM of its size! Check the blog post for more info and results: huggingface.co/blog/eurollm-t…. Stay tuned for the technical report and bigger and more powerful models!