Tengxiao Liu
@TengxiaoLiu
PhD student @ucsbNLP | Previous intern @aws
🤔Can LMs learn to skip steps to improve reasoning efficiency while maintaining accuracy? ✅The answer is Yes! In our #NeurIPS 2024 work, we show this behavior boosts efficiency, maintains accuracy, and even enhances generalization in OOD scenarios! 🚀arxiv.org/pdf/2411.01855 🧵⬇️

🎉 Thrilled to share MLGym and MLGym-Bench, our new framework for AI Research Agents! 🚀 Developed during my Meta internship, MLGym provides a flexible environment for benchmarking and developing new agents for AI research tasks. 🔬 MLGym-Bench consists of 13 diverse AI research…
Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the…
🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
Super interesting work on retraction!
🧐When do LLMs admit their mistakes when they should know better? In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong. LLMs can retract—but they rarely do.🤯 arxiv.org/abs/2505.16170 👇🧵
Excited to share our paper at #ICML2025! We've developed MELON🍉, a robust defense method against indirect prompt injection attacks on LLM agents that achieves near 0 ASR! Hope you enjoy🍉! Grateful to my incredible collaborators! @WilliamWangNLP @WenboGuo4 @jd92wang @xianjun_agi
Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition! We extracted interpretable attention units in LMs and found finer-grained versions of many known and novel attention behaviors. 🧵1/N
📜🚨 Check out our latest work on "Self-Resource Allocation in Multi-Agent LLM Systems" where we explore how LLMs can be used to optimize task allocation in multi-agent systems 🤖 🧵(1/3)
🎉Thrilled to share my internship work with the @NVIDIA GenAIR team (accepted to #CVPR2025): BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations! 🚀BlobGEN-Vid is a model-agnostic framework that delivers: - SOTA layout controllability - Enhanced…
🚀This is so cool!
🥳 Introducing SpeechGPT 2.0-preview: A GPT-4o-level, real-time spoken dialogue system! (Only Chinese for now) 🎆 Highlights: ~⚡️ Real-time speech-to-speech dialogue with latency under 200ms ~😊 Rich in emotion and diverse in style, with strong speech style generalization ~🦁…
A Technical Roadmap of o1 from a Reinforcement Learning Perspective Arxiv Link: arxiv.org/pdf/2412.14135…
Come join the #NeurIPS2024 poster session and discuss whether language models can learn to skip steps in reasoning! 🗓Dec 12, Thursday, 11:00 am - 2:00 pm 📍East Exhibit Hall A-C #2900 Feel free to stop by and say hi! I am actively seeking Summer 2025 internship opportunities!
🙌
Here at NeurIPS with @TengxiaoLiu @AlbalakAlon @yyqcode @siyuan___wang @FatimaJahara1 Hmu to hang!
Flying to #NeurIPS2024 tmr! Excited to connect with friends old & new. I'll be presenting the following works: 🪧[Poster] COrAL: arxiv.org/abs/2410.09675 🎙️[Lightning Talk] MCTS-DPO: arxiv.org/abs/2405.00451 🪧[Poster] DeGCG: arxiv.org/abs/2408.14866 Drop by and have a chat!
🚀
Arrived in Vancouver for #NeurIPS2024 🇨🇦! I'll be presenting Alignment for Honesty, a year-old paper that still fascinates me with how LLMs navigate knowledge boundaries. Also glad to chat about self-correction and reasoning. Actively seeking a 2025 summer internship!
🚨😱Obligatory job market announcement post‼️🤯 I'm searching for faculty positions/postdocs in multimodal/multilingual NLP and generative AI! I'll be at #NeurIPS2024 presenting our work on meta-evaluation for text-to-image faithfulness! Let's chat! Website in bio, papers in🧵