Puyuan Peng
@PuyuanPeng
Research Scientist @Meta AGI Foundation. Speech & Audio. Previously @utaustin @uchicago @bnu_1902
Announcing the new SotA voice-cloning TTS model: 𝗩𝗼𝗶𝗰𝗲𝗦𝘁𝗮𝗿 ⭐️ VoiceStar is - autoregressive, - voice-cloning, - robust, - duration controllable, and - capable of *test-time extrapolation*: it generates speech longer than the training duration! Code & Model: github.com/jasonppy/Voice…
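For readers wondering what "duration controllable" means for an autoregressive codec-token TTS model, here is a minimal conceptual sketch. It is not the VoiceStar implementation and none of the names come from the repo; it only illustrates one way to condition each decoding step on progress toward a requested duration, which is also what can let generation run past the lengths seen in training.

```python
# Hypothetical sketch: duration-aware autoregressive decoding for a codec-token TTS model.
# `model`, its call signature, and `eos_id` are placeholders, not the VoiceStar API.
import torch

def generate(model, text_tokens, prompt_codes, target_seconds,
             frames_per_second=50, max_extra_frames=100):
    target_frames = int(target_seconds * frames_per_second)
    codes = prompt_codes  # voice-cloning prompt codes, shape (1, T0)
    for _ in range(target_frames + max_extra_frames):
        # Progress toward the requested duration, so the model can pace itself
        # and emit EOS near the target length (clipped to allow mild overshoot).
        progress = torch.tensor([[min(codes.shape[1] / target_frames, 1.5)]])
        logits = model(text_tokens, codes, progress)      # (1, vocab), hypothetical call
        next_code = logits.argmax(dim=-1, keepdim=True)   # greedy for simplicity
        if next_code.item() == model.eos_id:
            break
        codes = torch.cat([codes, next_code], dim=1)
    return codes
```

For the actual inference interface, see the repository linked above.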
A collaborative work with my student Sungbin Kim and the UT Austin team; it will be presented at ICCV 2025.
The work is led by the amazing Sungbin Kim (sites.google.com/view/kimsungbin), in collaboration with Jeongsoo Choi, Joon Son Chung, @Tae_Hyun_Oh, and David Harwath. Check out voicecraft-dub.github.io for more samples, and the forthcoming code and model!
Thanks for featuring VoiceStar, our latest and most powerful TTS (an upgrade from last year's VoiceCraft). Fully open and permissively licensed at github.com/jasonppy/Voice…
The AI landscape is evolving fast, and staying on top of the latest open-source projects is crucial for every developer. 🚀 Swipe to see our list of the top new open-source AI projects on GitHub, from multi-agent systems to composable tools and cutting-edge speech synthesis.…
There will be a DeepSeek R1 0528 Qwen 3 8B too, matching Qwen 3 235B Thinking in performance 🤯 Whale COOKED!
The paper is out! arxiv.org/pdf/2505.19462
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻 Entirely human-written questions by 13 CS researchers 👀 Emphasis on visual reasoning – hard to verbalize via text CoTs 📉 Humans reach 93%, but only 63% for Gemini-2.5-Pro & 38% for Qwen2.5-72B
Extremely excited to announce that I will be joining @UTAustin @UTCompSci in August 2025 as an Assistant Professor! 🎉 I’m looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD…
i’m at #chi2025 and i’ll be on the industry job market later this year! i work in human-ai interaction. my prev projects focused on design tools. i love design. i love user interfaces. i trained myself to become an ai engineer to push our tools further. i believe ai is on…
You can try it yourself using this HuggingFace space, which applies the VoiceCraft codec trained by @PuyuanPeng et al. (5/8) huggingface.co/spaces/oreilly…
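If you'd rather call the space from Python than click through the web UI, a minimal sketch with the gradio_client library follows. The space id and the predict arguments below are placeholders (the URL in the post is truncated), so check the space's API page for the actual input signature.

```python
# Sketch: calling a Gradio Space programmatically with gradio_client.
# The space id and arguments are placeholders, not the real space referenced above.
from gradio_client import Client

client = Client("some-user/voicecraft-codec-demo")  # placeholder space id
result = client.predict("Hello world", api_name="/predict")  # placeholder arguments
print(result)
```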
As I near the end of my PhD journey, I am excited to share that I will be joining the research efforts @OpenAI, working with @hadisalmanX @aleks_madry and the great team to unlock new capabilities with frontier models. Austin has been one of the best places I have lived in and I…
Our incredible team built many models announced here, including image, voice, music and video generation! And: I'm moving to London this summer, and I'm hiring for research scientist and engineering roles! Our focus is on speech & music in Zurich, Paris & London. DM/email me.
Day 1 of #GoogleCloudNext ✅ Here’s a taste of all the things that we announced today across infrastructure, research and models, Vertex AI, and agents → goo.gle/4j0u0rH Hint: Ironwood TPUs, Gemini on Google Distributed Cloud, Gemini 2.5 Flash, Lyria, and more.
I received a review like this five years ago. It’s probably the right time now to share it with everyone who wrote or got random discouraging reviews from ICML/ACL.
🚨 New paper alert 🚨 Ever struggled with quick saturation or unreliability in benchmark datasets? Introducing SMART Filtering to select high-quality examples, reducing dataset size by 48% on avg (up to 68% for ARC!) and improving correlation with scores from ChatBot Arena! 📈✨ (1/N)
This project is right on time! Check it out if you are interested in replicating OpenAI's audio agent.
If you'd like an open-source text-to-speech model that follows your style instructions, consider using our ParaSpeechCaps-based model! Model: huggingface.co/ajd12342/parle… Paper: arxiv.org/abs/2503.04713
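A quick sketch of style-prompted generation with the standard Parler-TTS interface, which the ParaSpeechCaps-based model builds on. The checkpoint id below is the public parler-tts-mini base model, used as a stand-in because the model link above is truncated; swap in the ParaSpeechCaps checkpoint to get the richer style tags, and note that the description and prompt strings are just made-up examples.

```python
# Sketch: style-prompted TTS with the Parler-TTS interface (checkpoint id is a stand-in).
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
repo = "parler-tts/parler-tts-mini-v1"  # stand-in; replace with the ParaSpeechCaps model
model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

description = "A scared female speaker whispers quickly in a quiet room."  # style instruction
prompt = "Did you hear that? I think someone is outside."                  # text to speak

desc_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

audio = model.generate(input_ids=desc_ids, prompt_input_ids=prompt_ids)
sf.write("styled_speech.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```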
Exciting News!😊INTERSPEECH 2028 will take place at the River Walk in San Antonio, Texas! ✨ I’m honored to serve as one of the General Chairs alongside John Hansen and Carlos Busso @BussoCarlos - We hope you’ll love this city as much as we do! services.isca-speech.org/iscapad/iscapa…
Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models! Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.
"Scaling Rich Style-Prompted Text-to-Speech Datasets," Anuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi, ift.tt/vL5aeJO