Ruibin Yuan
@abc43992899
ML Student @HKUST, Music Tech MS @CarnegieMellon. Co-Founder of Multimodal Art Projection Community (MAP)
🔥 YuE Song Generation AI now works even on 8GB VRAM! Thanks to Morpheus, who sent a PR adding an 8GB VRAM option to the pinokiofactory repo, 8GB VRAM machines can now generate songs locally.
🚀 Thrilled to announce our new work: FR3E (First Return, Entropy-Eliciting Explore)! LLM reasoning with Reinforcement Learning often struggles with unstable and inefficient exploration. We propose FR3E, a structured framework to make it more robust & efficient.
An impressive song generation project from Tencent. It delivers high audio fidelity while maintaining fast generation speed. 📄 Paper: arxiv.org/abs/2506.07520 💻 Code: github.com/tencent-ailab/… 🧪 Try it here: huggingface.co/spaces/tencent…
Our team from Microsoft Research Asia, UCLA, the Chinese Academy of Sciences, and Tsinghua University released a paper, “TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression,” proposing an innovative training method that effectively compresses the reasoning process.
Hi everyone! The field of LLM-based reasoning has seen tremendous progress and rapid development over the past few months. We’ve updated our survey with the latest advances, now covering over 500 papers! “From System 1 to System 2: A Survey of Reasoning Large Language Models”
🙋‍♂️ Can RL training address model weaknesses without external distillation? 🚀 Please check out our latest work on RL for LLM reasoning! 💯 TL;DR: We propose augmenting RL training with synthetic problems targeting the model’s reasoning weaknesses. 📊 Qwen2.5-32B: 42.9 → SwS-32B: 68.4
Announcing 🎙️ Kimi-Audio! Our new open-source audio foundation model advances capabilities in audio understanding, generation, and conversation. Key Features & Achievements: ✅ Universal audio foundation model handles diverse tasks like speech recognition, audio understanding,…
🔥Kimi-Audio, a universal audio foundation model pre-trained on 13+ million hours of audio data and achieving SOTA performance on 10+ audio benchmarks. Tech Report: arxiv.org/abs/2504.18425 Model & Code & Evalkit: github.com/MoonshotAI/Kim… Congrats to the excellent team!
🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math) Most recent RL/R1 works focus on math reasoning, but math-only tuning doesn't generalize to general reasoning (e.g., performance drops on MMLU-Pro and SuperGPQA). Why are we limited to math reasoning? 1. Existing…
We are excited to introduce FlexWorld, a framework capable of generating 3D scenes from a single image that supports flexible viewpoint navigation, including 360° rotation and zooming. Code and model weights are open-source—try it out! Project Page:ml-gsai.github.io/FlexWorld/
YuE just dropped as an open-source AI that turns lyrics into fully-formed songs - vocals, instruments, structure and all. Generates pretty coherent, full-length tracks across multiple languages. Paper: arxiv.org/pdf/2503.08638 Demos are here (including a metal song called 'step…
Now the YuE paper is finally out, check it out! arxiv: arxiv.org/abs/2503.08638 demo: map-yue.github.io code: github.com/multimodal-art… @huggingface @_akhaliq
IT KEEPS GETTING BETTER: YuE (乐), an open-source full-song music generation model that rivals Suno AI! It’s Hugging Face & LLaMA-compatible for easy fine-tuning.
Thanks for sharing our work!
YuE: Scaling Open Foundation Models for Long-Form Music Generation "We tackle the task of long-form music generation—particularly the challenging lyrics-to-song problem—by introducing YuE (乐), a family of open foundation models based on the LLaMA2 architecture. Specifically,…
We're excited to introduce our TTS model Spark-TTS: ✅ Qwen2.5 architecture – single-stage, single-stream ✅Natural voice cloning & cross-lingual synthesis ✅ Voice Creation 📄 Paper: arxiv.org/pdf/2503.01710 🖥 Code: github.com/SparkAudio/Spa… 🎧 Demo: sparkaudio.github.io/spark-tts/
Check out Spark-TTS on Hugging Face: 🤗huggingface.co/SparkAudio/Spa… You can also give it a try directly here: 🤗huggingface.co/spaces/Mobvoi/….
❗️Open-source MoE kernels alert❗️ Introducing COMET, a computation/communication library for MoE models from ByteDance. Battle-tested in our 10k+ GPU clusters, COMET shows promising efficiency gains and significant GPU-hour savings (millions 💰💰💰). Integration of DualPipe &…
I interviewed for LLM/ML research scientist/engineering positions last Fall. Over 200 applications, 100 interviews, many rejections & some offers later, I decided to write the process down, along with the resources I used. Links to the process & resources in the following tweets