AI4Bharat
@ai4bharat
The focus of AI4Bhārat, an initiative of IIT-Madras, is on building open-source language AI for Indian languages, including datasets, models, and applications.
We are pleased to announce the launch of the Nilekani Center at AI4Bharat, IIT Madras on 28th July. The Center's mission is to innovate on open-source Indian language technology with the intention to create societal impact.

🚨 We're Hiring! Join us at AI4Bharat 🚨 📍 Role: Research Analyst 🕒 Tenure: 3 months (extendable based on performance) 💻 Work Type: Remote Are you passionate about India’s diverse cultures, languages, and heritage? Do you love to explore, research, and uncover the…

Language carries identity, culture, and opportunity. When AI4Bharat first began, it was clear that Indian languages needed much more attention in the AI world than they were getting. Unlike English, our languages didn’t have ready datasets or models that new technologies could…

Heading to ICASSP 2025 in Hyderabad? Join the Sarvam Mixer - an evening with the ML/AI/Speech community! We are excited to host a meet & greet for ML/AI/DL graduates, MS / PhDs, and early-mid career AI experts on 10th April, 6:30pm - 9:30pm in Hyderabad. At the Sarvam Mixer,…
IndicTrans3 and IndicSeamless have arrived! IndicTrans3: Following in the steps of IndicTrans2, we have released a beta version of IndicTrans3 (yes, a better one is coming soon). It works at the sentence as well as the document level. It's lightweight, and of fairly high…
🚀 AI4Bharat: Advancing Indian Language AI - Open & Scalable! 🇮🇳✨ Over the past 4 years, we at AI4Bharat have been on a mission to accelerate Indian language AI 🚀 —building large-scale datasets, models, and tools and releasing everything open-source for the community. Now, all…
Check out our latest work - IndicSeamless!
📢 Presenting IndicSeamless: A Speech Translation Model for Indian Languages 🎙️🌍 IndicSeamless is a speech translation model fine-tuned from SeamlessM4Tv2-large on 13 Indian languages. Trained on a curated subset of BhasaAnuvaad, the largest open-source Speech Translation…
@MiteshKhapra @anoopk and @pratykumar have done a great job for AI for India via @ai4bharat. It's amazing what having a strong vision can do! Open-source all the way!
Nandan Nilekani gives a shoutout for Professor Mitesh Khapra of @iitmadras: "It's amazing what Prof Khapra has done with @ai4bharat." BTW, Prof Khapra was one of the speakers at the Moneycontrol Global AI Conclave in December :) @NandanNilekani @nalinmehta
🚨🚨🚨 New Paper - RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs In the past we showed that Romanization helps improve generation in non-Roman script languages. But why? This paper attempts to find an interpretable answer! Link: arxiv.org/abs/2502.07424
Happening now! Our tutorial on low-resource scenarios begins soon. If you are at COLING 2025, please do attend. #nlproc #coling2025 #NLU #NLG