Jie Liu
@jie_liu1
Ph.D. student @ MMLab, CUHK
Really happy to see that Long-RL adopted Flow-GRPO (github.com/yifan123/flow_…) and implemented it on the popular RL repo verl. Amazing work 👏
Video understanding isn't just recognizing; it demands reasoning across thousands of frames. Meet Long-RL 🚀 Highlights: 🧠 Dataset: LongVideo-Reason, 52K QAs with reasoning. ⚡ System: MR-SP, 2.1× faster RL for long videos. 📈 Scalability: Hour-long videos (3,600 frames) RL…
Thanks for sharing your work! Flow-GRPO significantly enhances the capabilities of flow matching models within a given dimensionality. Our code is at: github.com/yifan123/flow_…
Flow-GRPO: Training Flow Matching Models via Online RL "We propose Flow-GRPO, the first method integrating online reinforcement learning (RL) into flow matching models" "RL-tuned SD3.5 generates nearly perfect object counts, spatial relations, and fine-grained attributes,…
Nice work👏👏👏
🎉 Introducing Open Reasoner Zero 🚀 Performance: Matches DeepSeek R1-Zero (32B) in just 1/30 steps! 📚 Full training strategies & technical paper 💻 100% open-source: Code + Data + Model ⚖️ MIT licensed - Use it your way! 🌊 Let the Reasoner-Zero tide rise! 🚢 1/n
🥳 Update for <Improving Video Generation with Human Feedback>:
- The reward model for video has been released: github.com/KwaiVGI/VideoA… huggingface.co/KwaiVGI/VideoR…
- Together with the VideoGen-RewardBench dataset: huggingface.co/datasets/KwaiV…
We present a comprehensive exploration and analysis of reinforcement learning from human feedback (RLHF) in modern flow-based video diffusion models. It consists of 4 parts. Paper: arxiv.org/abs/2501.13918 Project Page: gongyeliu.github.io/videoalign/ (1/n)
[1/3] Want to capture a fantastic HDR image with just two simple shots on your cellphone? Try UltraFusion HDR. It takes two images with exposure differences of up to 9 stops and robustly generates an HDR output. Try your own captured images (supporting 4Kx3K): huggingface.co/spaces/iimmort…
One interesting learning from the R1 and K1.5 tech reports is the use of a string-matching-based binary reward: I’ve tried it myself in 2022 using FlanT5, my friends tried it in 2023 with Llama 1 and in early 2024 with Llama 2, but all failed completely. It is only after late…
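For readers unfamiliar with the idea, here is a minimal sketch of what a string-matching-based binary reward could look like. It is not taken from the R1 or K1.5 reports: the `\boxed{...}` answer convention, the whitespace and case normalization, and the function names `extract_final_answer` and `binary_reward` are all assumptions made for illustration.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text inside the last \\boxed{...} in the completion, or None.

    Assumption: the model is prompted to wrap its final answer in \\boxed{}.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None


def _normalize(text: str) -> str:
    # Hypothetical normalization: drop all whitespace and lowercase.
    return re.sub(r"\s+", "", text).lower()


def binary_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the extracted answer matches the reference string, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parsable answer counts as a failure
    return 1.0 if _normalize(answer) == _normalize(reference) else 0.0
```

The reward carries no partial credit: a completion either ends in an answer that matches the reference string or it scores zero.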