Shikhar
@ShikharSSU
Turning noise into…slightly better noise. https://github.com/Shikhar-S
Meows, music, murmurs and more! We train a general-purpose audio encoder and open-source the code, checkpoints, and evaluation toolkit.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder," arxiv.org/abs/2507.14129
/1 Some career turning points don't look dramatic. A visa approved, a face-to-face chat, someone saying, “You should submit that,” or the chance to attend Indaba as an African student. Avoiding the “I came from nothing” story; just 25 early-career researchers doing solid...
The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.
Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges. arxiv.org/abs/2507.18161
Excited to be at @IC2S2 #ic2s22025. I will be presenting this work at the plenary lightning talks (after keynotes) on Thursday, and in the poster session afterwards. Looking forward to making new friends :D If you are interested in culture and evaluation, let's chat!!!
🖋️ Curious how writing differs across (research) cultures? 🚩 Tired of “cultural” evals that don't consult people? We engaged with researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗ 📜 arxiv.org/abs/2506.00784 1/11
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder. arxiv.org/abs/2507.14129
Repo updated: github.com/main-horse/hnet. Please DM me if you are concurrently working on h-net stuff! Doing it alone is a bit painful.
It's extremely gratifying to see so many contributors to Gemini 2.5 from @GoogleDeepMind India! Sip filter coffee (w/ plant-based milk, of course) as you pave the path to AGI, can't think of a better deal 🙂 (Btw, we are growing, app link coming soon!) arxiv.org/abs/2507.06261
Not advertised yet, but we figured out how to do this too. And we're releasing exactly how you can do it 👀. With the right training techniques, you can inject audio understanding and generation into an LLM with almost no loss in text perf. Details at arxiv.org/abs/2506.17611
The best part about the Mistral release is that the models don't lose as much on text; this has been one of the biggest pain points for audio LMs for a long while.
Gemini 2.5 Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
First ever (I think?) CLI coding agents battle royale! 6 contestants: claude-code, anon-kode, codex, opencode, ampcode, gemini. They all get the same instructions: Find and kill the other processes, last one standing wins! 3... 2... 1...
Tired of endless LLM slop? This work by @Harman26Singh tackles reward hacking to make reward models robust to spurious cues like formatting and length. Give it a read.
🚨 New @GoogleDeepMind paper 𝐑𝐨𝐛𝐮𝐬𝐭 𝐑𝐞𝐰𝐚𝐫𝐝 𝐌𝐨𝐝𝐞𝐥𝐢𝐧𝐠 𝐯𝐢𝐚 𝐂𝐚𝐮𝐬𝐚𝐥 𝐑𝐮𝐛𝐫𝐢𝐜𝐬 📑 👉 arxiv.org/abs/2506.16507 We tackle reward hacking—when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️
🙌✨ You asked, you've got it: A free and open-source Gemini agent, run via the command line. And to ensure you rarely, if ever, hit a limit during this preview, we offer the industry’s largest allowance: *60 model requests per minute and 1,000 requests per day at no charge.*
Gemini CLI is here! Our most powerful open-source CLI that brings Google's Gemini 2.5 models directly into your terminal! With unique features like hierarchical memory (context), self-correcting file edits, and secure sandboxed tool execution. 💡 Hierarchical Memory and…
New #INTERSPEECH2025, we propose a Chain-of-Thought post-training method to build spoken dialogue systems—generating intelligent responses with good audio quality while preserving speaking styles with just 300h of public conversational data! (1/5) 📜: arxiv.org/abs/2506.00722
🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860
``OmniAudio: Generating Spatial Audio from 360-Degree Video,'' Huadai Liu, Tianyi Luo, Qikai Jiang, Kaicheng Luo, Peiwen Sun, Jialei Wan, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, Shiliang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue, ift.tt/en7Uacy
``Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation,'' Kohei Saijo, Yoshiaki Bando, ift.tt/PY82nAk
``Spoken Language Understanding on Unseen Tasks With In-Context Learning,'' Neeraj Agrawal, Sriram Ganapathy, ift.tt/NnPCckK
``Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge,'' Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol… ift.tt/lpjeWzP