Guowei Xu
@Kevin_GuoweiXu
Undergraduate student at Yao Class (Tsinghua University), interested in Language Models and Reinforcement Learning
Unfortunately I cannot attend the conference in person this year, but our co-author @Kevin_GuoweiXu will be presenting the paper and answering all your questions! 📜Poster session: Time: Wed 16 Jul, 11 a.m.–1:30 p.m. PDT Location: West Exhibition Hall B2-B3 #W-607
🚀 Introducing MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning! 🌟 We propose a strong model-free visual RL algorithm that can learn robust visuomotor policies from scratch – in the real world! 💪🤖 🌐 Check out the project…
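The mixture-of-experts backbone named in the title lends itself to a generic sketch. Below is a minimal MoE layer with a learned softmax gate; it illustrates the general MoE idea only, not MENTOR's actual architecture, and the dimensions and expert count are placeholders.

```python
# Generic mixture-of-experts layer with a learned softmax gate, sketched to
# illustrate the MoE backbone idea in MENTOR's title (not the paper's actual
# architecture; dimensions and expert count are placeholders).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim: int, n_experts: int = 4, hidden: int = 256):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(dim, n_experts)  # routes each input to experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, dim)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)          # gate-weighted combination

# e.g. features from a visual encoder could pass through the MoE before a policy head
layer = MoELayer(dim=64)
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```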
📢New conference where AI is the primary author and reviewer! agents4science.stanford.edu Current venues don't allow AI-written papers, so it's hard to assess the +/- of such works🤔 #Agents4Science solicits papers where AI is the main author w/ human advisors. 💡Initial reviews by…
Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be…
The five-year-old algorithms SAC and TD3 are still used as backbone RL algorithms today. Want to try a new, well-performing backbone without too much pain? We introduce BAC, a simple but effective method that delivers a significant performance boost across a variety of tasks. 🧵👇
Thanks to @Kevin_GuoweiXu, this model-free RL method and a few other baselines have been added to an experimental/contributed folder in ManiSkill: github.com/haosulab/ManiS… We welcome people to try out some of the more difficult tasks in our benchmark and provide tuned+open-sourced…
🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋 1/n
An interesting paper: they find that existing language models may essentially be seeking a more efficient way to approximate the Data-Tree. This suggests that the reasoning process in LLMs is more likely probabilistic pattern-matching than formal…
💡Excited to share our latest research on the explainability of GPT! 🔎 From a novel perspective, we flatten the language dataset and GPT models into Monte Carlo Language Trees and show their significant similarity. 📰 arxiv.org/pdf/2501.07641 📎 github.com/PKU-YuanGroup/…
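As I read the abstract, the "Monte Carlo Language Tree" (the Data-Tree of the previous post) flattens a corpus into a tree of token contexts with empirical next-token frequencies, against which a GPT model's predictions can be compared. A minimal sketch of that construction, based on my reading rather than the authors' released code:

```python
# Minimal sketch of building a "Data-Tree": a prefix tree mapping each token
# context to the empirical distribution over next tokens, which a language
# model can be compared against. (My reading of the idea, not the authors'
# implementation.)
from collections import defaultdict

def build_data_tree(corpus: list[list[str]], max_depth: int = 3):
    # tree[context][next_token] = count of next_token following context
    tree: dict[tuple, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for tokens in corpus:
        for i in range(len(tokens)):
            for d in range(1, max_depth + 1):
                if i + d >= len(tokens):
                    break
                context = tuple(tokens[i:i + d])
                tree[context][tokens[i + d]] += 1
    return tree

def next_token_probs(tree, context: tuple):
    # Empirical next-token distribution for a given context.
    counts = tree.get(context, {})
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
tree = build_data_tree(corpus)
print(next_token_probs(tree, ("the", "cat")))  # {'sat': 0.5, 'ran': 0.5}
```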
SynthLabs + Stanford presents: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Proposes Meta-CoT, which extends CoT by explicitly modeling the underlying reasoning required to arrive at a particular CoT
Video understanding is the next frontier, but not all videos are alike. Models now reason over YouTube clips and feature films, but what about the everyday spaces we—and our future AI assistants—navigate and experience? Introducing Thinking in Space, our latest study exploring…
Everything you love about generative models — now powered by real physics! Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics…
🚨 New reinforcement learning algorithms 🚨 Excited to announce MaxInfoRL, a class of model-free RL algorithms that solves complex continuous control tasks (including vision-based!) by steering exploration towards informative transitions. Details in the thread 👇
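A common recipe for steering exploration toward informative transitions is to reward epistemic uncertainty, for example disagreement across an ensemble of learned dynamics models. The sketch below illustrates that general recipe, not necessarily MaxInfoRL's exact objective; `make_model`, `DisagreementBonus`, and the coefficient `beta` are all placeholders of mine.

```python
# Illustrative sketch of information-directed exploration via ensemble
# disagreement (a common proxy for information gain; not necessarily the
# exact MaxInfoRL objective). Transitions where dynamics models disagree
# are treated as informative and receive an intrinsic bonus.
import torch
import torch.nn as nn

def make_model(obs_dim: int, act_dim: int) -> nn.Module:
    # Small forward-dynamics model: (s, a) -> predicted next state.
    return nn.Sequential(
        nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
        nn.Linear(128, obs_dim),
    )

class DisagreementBonus:
    def __init__(self, obs_dim: int, act_dim: int, n_models: int = 5, beta: float = 0.1):
        self.models = [make_model(obs_dim, act_dim) for _ in range(n_models)]
        self.beta = beta  # hypothetical weight on the intrinsic term

    def intrinsic(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # Variance across ensemble predictions approximates epistemic
        # uncertainty about the transition, i.e. how informative it is.
        with torch.no_grad():
            preds = torch.stack([m(torch.cat([s, a], dim=-1)) for m in self.models])
        return self.beta * preds.var(dim=0).mean(dim=-1)

# Usage: augment the task reward before the critic update of any off-policy
# algorithm (e.g. SAC):  r_total = r_task + bonus.intrinsic(s, a)
```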
🚀 Introducing MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale We’re excited to open-source: - 12M MM instruction tuning dataset - MAmmoTH-VL-8B, a SoTA VL model (~10B size) on 20+ downstream tasks compared with fully open-source baselines such as…
🚀 LLaVA-CoT is now fully opensource! 🎉 Here’s how you can access everything: 1️⃣ Model: huggingface.co/Xkev/Llama-3.2… 2️⃣ Dataset: huggingface.co/datasets/Xkev/… 3️⃣ Code (Data Generation, Training, Inference): github.com/PKU-YuanGroup/… 4️⃣ Gradio APP: huggingface.co/spaces/Xkev/Ll… 💡 As an academic…
🚀 Introducing LLaVA-o1: The first visual language model capable of spontaneous, systematic reasoning, similar to GPT-o1! 🔍 🎯Our 11B model outperforms Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct! 🔑The key is training on structured data and a novel inference…
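The "structured data" refers to responses decomposed into explicit reasoning stages (summary, caption, reasoning, conclusion, per the paper's four-stage design). As a small illustration, here is a sketch of parsing such stage-tagged output; the helper itself is mine, not the released inference code.

```python
# Minimal sketch of consuming stage-structured reasoning output of the kind
# the tweet describes. The four stage tags follow the paper's
# summary/caption/reasoning/conclusion decomposition; the parsing helper is
# illustrative, not the released inference code.
import re

STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")

def parse_stages(response: str) -> dict[str, str]:
    """Extract each tagged reasoning stage from a model response."""
    out = {}
    for stage in STAGES:
        m = re.search(rf"<{stage}>(.*?)</{stage}>", response, re.DOTALL)
        if m:
            out[stage] = m.group(1).strip()
    return out

response = (
    "<SUMMARY>Identify what the question asks about.</SUMMARY>"
    "<CAPTION>The image shows three apples on a table.</CAPTION>"
    "<REASONING>The question asks for a count; three apples are visible.</REASONING>"
    "<CONCLUSION>3</CONCLUSION>"
)
print(parse_stages(response)["CONCLUSION"])  # 3
```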
LLaVA-o1 is the first visual language model capable of systematic reasoning similar to GPT-o1 🚀 But how does it perform on multimodal math reasoning questions? 🔎 New numbers from LLaVA-o1 on the MathVision Dataset from author @Kevin_GuoweiXu LLaVA-o1 (11B): 23.7%…
Would you believe that deep RL can work without replay buffers, target networks, or batch updates? Our recent work gets deep RL agents to learn from a continuous stream of data one sample at a time without storing any sample. Joint work with @Gautham529 and @rupammahmood.
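For intuition, here is a minimal sketch of the streaming regime the tweet describes: one TD update per incoming transition, with no replay buffer, no target network, and no batching. This illustrates the setting, not the authors' specific algorithm; the network, optimizer, and hyperparameters are placeholders.

```python
# Minimal sketch of one-sample streaming TD learning (illustrative, not the
# paper's exact algorithm): each transition triggers a single online update,
# and the sample is discarded immediately afterwards.
import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")
n_obs, n_act = env.observation_space.shape[0], env.action_space.n

q = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))
opt = torch.optim.SGD(q.parameters(), lr=1e-3)
gamma, eps = 0.99, 0.1

obs, _ = env.reset()
for step in range(10_000):
    s = torch.as_tensor(obs, dtype=torch.float32)
    # Epsilon-greedy action from the current Q estimates.
    if torch.rand(()) < eps:
        a = env.action_space.sample()
    else:
        with torch.no_grad():
            a = int(q(s).argmax())
    obs2, r, terminated, truncated, _ = env.step(a)
    s2 = torch.as_tensor(obs2, dtype=torch.float32)
    # One-step TD target; bootstrap only if the episode continues.
    with torch.no_grad():
        target = r + (0.0 if terminated else gamma * q(s2).max().item())
    loss = (q(s)[a] - target) ** 2  # single-sample squared TD error
    opt.zero_grad()
    loss.backward()
    opt.step()  # the transition is never stored
    obs = obs2
    if terminated or truncated:
        obs, _ = env.reset()
```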
Updates about our new research: (1) After careful consideration, we have decided to rename LLaVA-o1 to LLaVA-CoT to make its name sound more like an academic research project. We are currently updating this change across platforms such as arXiv and GitHub, which is expected to…
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! 🔍 o1-preview-level performance on AIME & MATH benchmarks. 💡 Transparent thought process in real-time. 🛠️ Open-source models & API coming soon! 🌐 Try it now at chat.deepseek.com #DeepSeek