Heng-Jui Chang
@hjchang87
🎓 PhD Candidate @MIT_CSAIL 🧪 Research Scientist Intern @AIatMeta
📣🎉 Excited to announce that our paper was accepted to #INTERSPEECH2023! arxiv.org/abs/2305.11072 github.com/vectominist/sp… 💡 Speaker-invariant Clustering (Spin) 1️⃣ Disentangles speaker 2️⃣ Preserves content 3️⃣ Benefits speech recognition & acoustic unit discovery

💡Bridging speech, sound, & music representations with one universal model? We introduce USAD ✅ 📚 Distills knowledge from domain-specific SSL models 🎯 Matches expert models across speech/audio/music tasks 📄 arxiv.org/abs/2506.18843 🧑💻 huggingface.co/MIT-SLS/USAD-B…
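A minimal multi-teacher distillation sketch of the idea; the toy encoder, assumed teacher dimensions, and plain L1 objective below are illustrative, not USAD's actual architecture or loss:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy stand-in for the student encoder, operating on 80-dim filterbank frames."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(80, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:  # (B, T, 80) -> (B, T, dim)
        return self.net(feats)

# Assumed teacher feature sizes; in practice these come from frozen domain-specific SSL teachers.
teacher_dims = {"speech": 768, "audio": 768, "music": 512}
student = TinyEncoder()
heads = nn.ModuleDict({name: nn.Linear(256, d) for name, d in teacher_dims.items()})
optim = torch.optim.AdamW(list(student.parameters()) + list(heads.parameters()), lr=1e-4)

def distill_step(fbank: torch.Tensor, teacher_feats: dict) -> float:
    """One update: per-domain heads regress each teacher's frame-level features."""
    hidden = student(fbank)
    loss = sum(nn.functional.l1_loss(heads[k](hidden), t) for k, t in teacher_feats.items())
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()

# Dummy batch: random filterbanks and random "teacher activations" as placeholders.
fbank = torch.randn(4, 100, 80)
targets = {k: torch.randn(4, 100, d) for k, d in teacher_dims.items()}
print(distill_step(fbank, targets))
```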

(1/5)🚨LLMs can now self-improve to generate better citations✅ 📝We design automatic rewards to assess citation quality 🤖Enable BoN/SimPO w/o external supervision 📈Perform close to “Claude Citations” API w/ only 8B model 📄arxiv.org/abs/2502.09604 🧑💻github.com/voidism/SelfCi…
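A minimal Best-of-N sketch of the selection loop described above; `generate_candidates` and `citation_reward` are hypothetical stand-ins for the paper's LM sampler and automatic citation reward:

```python
def generate_candidates(question: str, context: str, n: int = 8) -> list:
    """Hypothetical stand-in: sample n candidate answers with inline citations from the LM."""
    return [f"Candidate answer {i}, citing sentence [{i % 3}]." for i in range(n)]

def citation_reward(answer: str, context: str) -> float:
    """Hypothetical stand-in: automatically score how well the cited context supports
    the answer (the paper computes such a reward without external supervision)."""
    sentences = context.split(". ")
    cited_idx = int(answer.split("[")[1].split("]")[0])
    overlap = set(answer.lower().split()) & set(sentences[cited_idx].lower().split())
    return float(len(overlap))

def best_of_n(question: str, context: str, n: int = 8) -> str:
    """Best-of-N selection: keep the sampled answer with the highest citation reward."""
    candidates = generate_candidates(question, context, n)
    return max(candidates, key=lambda a: citation_reward(a, context))

context = "The Nile flows north. It crosses eleven countries. Its delta lies in Egypt."
print(best_of_n("How many countries does the Nile cross?", context))
```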
Presenting 2 works at #ICLR tomorrow! 📃Generative Pre-training for Speech with Flow Matching 📍5/9 (Wed) Hall B #68, 10:45am-12:45pm 📃Listen, Think, and Understand 📍5/9 (Wed) Hall B #60, 4:30pm-6:30pm Please stop by if you're interested! More details...👇
(1/4) 💡Natural language embedded program (NLEP) is all you need for symbolic AND natural language tasks. 🚀NLEP outperforms ChatGPT-4, CoT, & PoT/PAL, without any task-specific example. 🎢NLEP makes small LMs outperform GPT-3 without fine-tuning! arxiv.org/pdf/2309.10814…
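A minimal sketch of the natural-language-embedded-program loop: ask an LM for a Python program whose printed output answers the question, then run it. The prompt wording and `call_llm` stub are illustrative assumptions, not the paper's exact setup:

```python
import subprocess
import sys
import tempfile

PROMPT = (
    "Write a self-contained Python program that answers the question below. "
    "Compute any intermediate results in code and print a short natural-language "
    "answer.\n\nQuestion: {question}\nProgram:\n"
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LM API call; returns a Python program as text."""
    return 'days = 7 * 3\nprint(f"Three weeks contain {days} days.")'

def answer_with_nlep(question: str) -> str:
    """Generate a program for the question, execute it, and return its printed answer."""
    program = call_llm(PROMPT.format(question=question))
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    # Executing generated code is unsafe in general; sandbox it in real use.
    result = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return result.stdout.strip()

print(answer_with_nlep("How many days are there in three weeks?"))
```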
(1/5)🚨Can LLMs be more factual without retrieval or finetuning?🤔 -yes✅ 🦙We find factual knowledge often lies in higher layers of LLaMA 💪Contrasting high/low layers amplifies factuality & boosts TruthfulQA by 12-17% 📝arxiv.org/abs/2309.03883 🧑💻github.com/voidism/DoLa #NLProc
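A minimal sketch of layer-contrastive decoding in this spirit, assuming a Hugging Face LLaMA checkpoint; the real method picks the premature layer dynamically and adds a plausibility constraint, both omitted here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "huggyllama/llama-7b"  # illustrative checkpoint; any LLaMA-style decoder works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def contrastive_next_token(prompt: str, premature_layer: int = 16) -> str:
    """Pick the next token by contrasting a lower ('premature') layer with the final layer."""
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model(ids, output_hidden_states=True)
    head = model.get_output_embeddings()  # the same LM head projects both layers to the vocab
    norm = model.model.norm               # LLaMA's final RMSNorm, applied to the early layer
    mature = torch.log_softmax(head(out.hidden_states[-1][:, -1]), dim=-1)
    premature = torch.log_softmax(head(norm(out.hidden_states[premature_layer][:, -1])), dim=-1)
    # Tokens whose log-probability grows the most between the two layers win.
    return tok.decode((mature - premature).argmax(dim=-1))

print(contrastive_next_token("The capital city of France is"))
```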
Excited to present our research at #acl2023 ! We found that self-trained entailment models with 350M parameters can outperform strong few-shot large language models with more than 100B parameters on several language understanding tasks. (1/4) news.mit.edu/2023/language-…
(1/2) Introducing our IS23 Whisper-AT paper. We usually assume that noise-robust ASR models' representations are noise-invariant, but we show a surprising finding: while Whisper is very robust against real-world background sounds, its representation is actually NOT noise-invariant.
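A minimal probing sketch (not the paper's protocol) for checking this: compare Whisper encoder features on clean vs. noise-mixed audio; file names and the mixing step are placeholders:

```python
import torch
import torchaudio
from transformers import WhisperFeatureExtractor, WhisperModel

extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")
model = WhisperModel.from_pretrained("openai/whisper-base").eval()

@torch.no_grad()
def encoder_features(waveform: torch.Tensor, sr: int = 16000) -> torch.Tensor:
    """Return Whisper encoder hidden states for a mono 16 kHz waveform."""
    inputs = extractor(waveform.numpy(), sampling_rate=sr, return_tensors="pt")
    return model.encoder(inputs.input_features).last_hidden_state

# Placeholder files; assumed 16 kHz mono, with the noise at least as long as the speech.
speech, _ = torchaudio.load("clean_utterance.wav")
noise, _ = torchaudio.load("background_noise.wav")
noisy = speech + 0.3 * noise[:, : speech.shape[1]]  # crude additive mixing

clean_feat = encoder_features(speech[0])
noisy_feat = encoder_features(noisy[0])

# A fully noise-invariant representation would give a similarity close to 1.
sim = torch.nn.functional.cosine_similarity(clean_feat, noisy_feat, dim=-1).mean()
print(f"mean frame-wise cosine similarity (clean vs. noisy): {sim:.3f}")
```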
Can language models help us do better search?🤔 🎉In #ACL2023 findings, we present EAR pipeline: 🎲sample multiple queries from LM 🎯rescore to select best query 🔍BM25 search 💪boost OpenQA accuracy to beat DPR/GAR arxiv: arxiv.org/abs/2305.17080 code: github.com/voidism/EAR
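A minimal sketch of the sample-rescore-retrieve flow; `generate_queries` and `score_query` are hypothetical stand-ins for the LM expansion and reranking steps, with `rank_bm25` used for the lexical search:

```python
from rank_bm25 import BM25Okapi

corpus = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "Paris is the capital of France.",
    "BM25 is a classic lexical retrieval function.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

def generate_queries(question: str, n: int = 4) -> list:
    """Hypothetical stand-in: sample n expanded queries from a language model."""
    return [question, f"{question} year", f"{question} history", f"{question} construction"]

def score_query(question: str, query: str) -> float:
    """Hypothetical stand-in: rescore candidate queries (e.g., with a trained reranker)."""
    return float(len(set(query.split()) - set(question.split())))

question = "when was the eiffel tower completed"
queries = generate_queries(question)                                # 🎲 sample
best_query = max(queries, key=lambda q: score_query(question, q))   # 🎯 rescore
scores = bm25.get_scores(best_query.lower().split())                # 🔍 BM25 search
top_doc = corpus[int(max(range(len(corpus)), key=lambda i: scores[i]))]
print(best_query, "->", top_doc)
```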
🚨We release SAIL-7B⛵️️a search-augmented instruction-tuned LM with: 🦆Real-time connection to DuckDuckGo 🔍Explicit filtering of distracting search results 👨🏫Instruction following Outperforms ChatGPT and Vicuna!🦙 demo: openlsr.org/sail-7b arxiv: arxiv.org/abs/2305.15225
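A minimal sketch of the search-augmented prompting flow; `web_search` and `call_llm` are hypothetical stand-ins for the DuckDuckGo client and the SAIL-7B model, and the prompt format is illustrative:

```python
def web_search(query: str, k: int = 3) -> list:
    """Hypothetical stand-in: return k result snippets (the real system queries DuckDuckGo)."""
    return [
        "The Eiffel Tower was completed in March 1889.",
        "Top 10 shocking tower facts you won't believe!",  # a distracting result
        "It was built as the entrance arch to the 1889 World's Fair.",
    ][:k]

def build_prompt(instruction: str, snippets: list) -> str:
    """Interleave search results with the instruction; the model is tuned to mark each
    result as informative or distracting before answering."""
    results = "\n".join(f"Search result ({i}): {s}" for i, s in enumerate(snippets))
    return f"{results}\n\nInstruction: {instruction}\nResponse:"

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for querying the instruction-tuned model."""
    return "(answer grounded in the informative results, ignoring the distracting one)"

prompt = build_prompt("When was the Eiffel Tower completed?", web_search("Eiffel Tower completion"))
print(call_llm(prompt))
```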
🗣️ Whisper is great for speech recognition, but it only recognizes ~100 languages. What if it wasn't trained on the language that you speak? Happy to introduce my #INTERSPEECH2023 paper comparing Whisper and XLS-R for adaptation to unseen languages! arxiv.org/abs/2305.12606
📢New Paper Alert!!🚀 arxiv.org/abs/2304.03728 Does ChatGPT have the ability to check facts by itself?🤔 We designed a simple, few-shot, unified chain-of-thought prompting pipeline that can do: 🔹Fact-checking ✅ 🔹Stereotype detection 🚫 🔹Hate speech detection 🙅 (1/2)
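A minimal sketch of a unified few-shot chain-of-thought prompt of this kind; the exemplar, label set, and `call_llm` stub are illustrative assumptions, not the paper's exact prompt:

```python
TEMPLATE = """\
Decide whether the statement is (a) factually correct, (b) factually incorrect,
(c) a stereotype, or (d) hate speech. Reason step by step, then give the label.

Statement: "The Great Wall of China is easily visible from the Moon."
Reasoning: The wall is thousands of kilometers long but only a few meters wide,
far below what the human eye can resolve from the Moon, so the statement is wrong.
Label: (b) factually incorrect

Statement: "{statement}"
Reasoning:"""

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the ChatGPT-style completion call used in the pipeline."""
    return "(model reasoning and label)"

print(call_llm(TEMPLATE.format(statement="Water boils at 100 °C at sea level.")))
```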
Exciting news! 2 open positions for #PhD students to join our team and work on cutting-edge #research in #deeplearning, #conversationalAI, #speech tech, & sequence processing. Check out more details here and apply as soon as possible if interested: tinyurl.com/yc4xrdke
Workshop on Self-supervised Learning for Audio and Speech Processing @ AAAI 2022 starts at 8:50 a.m., EST (9:50 p.m. GMT+8), February 28. If you want to hear about exciting new advances in self-supervised learning, don't miss it. aaai-sas-2022.github.io
Excited to announce our paper was accepted to @ieeeICASSP 2022!
Our DistilHuBERT model is released! Thanks to @PatrickPlaten @leo19941227 @HungyiLee2 @ntu_spml! paper: arxiv.org/abs/2110.01900 pre-training and inference code: github.com/s3prl/s3prl
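A minimal sketch of the layer-wise distillation objective (separate prediction heads with L1 plus cosine terms), using a toy student and random tensors standing in for real HuBERT activations; see the paper and s3prl for the actual recipe:

```python
import torch
import torch.nn as nn

HIDDEN = 768
TARGET_LAYERS = [4, 8, 12]  # teacher (HuBERT) layers the student learns to predict

student_encoder = nn.GRU(HIDDEN, HIDDEN, num_layers=2, batch_first=True)  # toy student
heads = nn.ModuleList([nn.Linear(HIDDEN, HIDDEN) for _ in TARGET_LAYERS])

def distill_loss(student_in: torch.Tensor, teacher_layers: list) -> torch.Tensor:
    """Sum of L1 distance and negative cosine similarity between each head's
    prediction and the corresponding teacher layer's hidden states."""
    hidden, _ = student_encoder(student_in)
    loss = torch.zeros(())
    for head, target in zip(heads, teacher_layers):
        pred = head(hidden)
        loss = loss + nn.functional.l1_loss(pred, target)
        loss = loss - nn.functional.cosine_similarity(pred, target, dim=-1).mean()
    return loss

# Dummy tensors standing in for student input features and teacher activations.
frames = torch.randn(2, 50, HIDDEN)
targets = [torch.randn(2, 50, HIDDEN) for _ in TARGET_LAYERS]
print(distill_loss(frames, targets).item())
```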
Self-Supervised Learning for Speech and Audio Processing Workshop @ AAAI 2022 ===== Website: aaai-sas-2022.github.io Submission Deadline: November 15th, 2021 (Anywhere on Earth) -> Less than 24 hours! Submission website: cmt3.research.microsoft.com/SAS2022 Contact: [email protected]
SUPERB Challenge is ready: superbbenchmark.org/challenge You can also submit your results to The 2nd Self-supervised Learning for Audio and Speech Processing at AAAI: aaai-sas-2022.github.io
Transformers can read and write, but how well can they listen and speak 🗣️? Find out by pitting your models against the SUPERB Challenge 📊! SUPERB tests pretrained models on a wide range of speech processing tasks & datasets Submit here 👉: superbbenchmark.org