Shikhar
@ShikharSSU
Turning noise into…slightly better noise. https://github.com/Shikhar-S
Meows, music, murmurs and more! We train a general-purpose audio encoder and open-source the code, checkpoints, and evaluation toolkit.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder," arxiv.org/abs/2507.14129
/1 Some career turning points don't look dramatic. A visa approved, a face-to-face chat, someone saying, “You should submit that,” or the chance to attend Indaba as an African student. Avoiding the “I came from nothing” story; just 25 early-career researchers doing solid...
The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.
Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges. arxiv.org/abs/2507.18161
Excited to be at @IC2S2 #ic2s22025. I will be presenting this work at the plenary lightning talks (after keynotes) on Thursday, and in the poster session afterwards. Looking forward to making new friends :D If you are interested in culture and evaluation, let's chat!!!
🖋️ Curious how writing differs across (research) cultures? 🚩 Tired of “cultural” evals that don't consult people? We engaged with researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗ 📜 arxiv.org/abs/2506.00784 1/11
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder. arxiv.org/abs/2507.14129
Repo updated: github.com/main-horse/hnet. Please DM me if you are concurrently working on h-net stuff! Doing it alone is a bit painful.
It's extremely gratifying to see so many contributors to Gemini 2.5 from @GoogleDeepMind India! Sip filter coffee (w/ plant-based milk, of course) as you pave the path to AGI, can't think of a better deal 🙂 (Btw, we are growing, app link coming soon!) arxiv.org/abs/2507.06261
Not advertised yet, but we figured out how to do this too. And we're releasing exactly how you can do it 👀. With the right training techniques, you can inject audio understanding and generation into an LLM with almost no loss in text perf. Details at arxiv.org/abs/2506.17611
The best part about the Mistral release is that the models don't lose as much on text; this has been one of the biggest pain points for audio LMs for a long while.
Gemini 2.5 Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
First ever (I think?) CLI coding agents battle royale! 6 contestants: claude-code, anon-kode, codex, opencode, ampcode, gemini. They all get the same instructions: Find and kill the other processes, last one standing wins! 3... 2... 1...
Tired of endless LLM slop? This work by @Harman26Singh tackles reward hacking to make reward models robust to spurious cues like formatting and length. Give it a read.
🚨 New @GoogleDeepMind paper 𝐑𝐨𝐛𝐮𝐬𝐭 𝐑𝐞𝐰𝐚𝐫𝐝 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠 𝐯𝐢𝐚 𝐂𝐚𝐮𝐬𝐚𝐥 𝐑𝐮𝐛𝐫𝐢𝐜𝐬 📑 👉 arxiv.org/abs/2506.16507 We tackle reward hacking—when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️
🙌✨ You asked, you've got it: A free and open-source Gemini agent, run via the command line. And to ensure you rarely, if ever, hit a limit during this preview, we offer the industry’s largest allowance: *60 model requests per minute and 1,000 requests per day at no charge.*
Gemini CLI is here! Our most powerful open-source CLI that brings Google's Gemini 2.5 models directly into your terminal! With unique features like hierarchical memory (context), self-correcting file edits, and secure sandboxed tool execution. 💡 Hierarchical Memory and…
New #INTERSPEECH2025, we propose a Chain-of-Thought post-training method to build spoken dialogue systems—generating intelligent responses with good audio quality while preserving speaking styles with just 300h of public conversational data! (1/5) 📜: arxiv.org/abs/2506.00722
🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860
``OmniAudio: Generating Spatial Audio from 360-Degree Video,'' Huadai Liu, Tianyi Luo, Qikai Jiang, Kaicheng Luo, Peiwen Sun, Jialei Wan, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, Shiliang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue, ift.tt/en7Uacy
``Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation,'' Kohei Saijo, Yoshiaki Bando, ift.tt/PY82nAk
``Spoken Language Understanding on Unseen Tasks With In-Context Learning,'' Neeraj Agrawal, Sriram Ganapathy, ift.tt/NnPCckK
``Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge,'' Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol… ift.tt/lpjeWzP