Atul Kr. Ojha
@shashwatup9k
Insight Research Ireland Centre for Data Analytics, DSI, University of Galway UniDive-COST Action (https://unidive.lisn.ups) ENEOLI-COST Action
Enhanced Zero-Shot Machine Translation via Fixed Prefix Pair Bootstrapping by Van-Hien Tran (presenter) and Masao Utiyama @NICT_Publicity Proceedings of LoResMT 2025 Workshop aclanthology.org/volumes/2025.l…
📢 First release: 38 monolingual reference LLMs (2.15B params) via @hplt_eu + #OpenEuroLLM ⚙️Trained on 100B tokens from HPLT v2 dataset 🌍 Cover EU langs + others ⚙️ Based on LLaMA, trained on #LUMI 📈 Useful for evaluation Downloads + more info at openeurollm.eu/blog/hplt-oell…
Great use of HPLT v2 datasets! Eager to hear more about #HPLT? Join us at @aclmeeting: - BoF "Multilingualism: from data crawling to evaluation" on July 29, 16:00 - Poster "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies" on July 30, 11:00
📢 Job Opportunity: Research Associate for Reasoning in LLMs, University of Bath, UK (Deadline: 05 August 2025). We are looking to hire a highly motivated researcher to work on analysing reasoning in LLMs. For more information, see: harishtayyarmadabushi.com/now-hiring-res…
📢 #LREC2026 First Call for Papers is out @ lrec2026.info/calls/ Important dates: Main Conference papers submission: Oct 17, 2025; Notification of acceptance: Feb 13, 2026; Camera Ready due: Mar 6, 2026; Workshop & Tutorial proposals submission: Oct 17, 2025. #nlproc
TokShop @ #ICML2025 got way more submissions than expected! 📈 We could really use a few more reviewers to help out. If you have the capacity to review a #tokenization paper by Saturday, please fill out this form: forms.gle/32A6sQHQrMSb6h… 🙏
What? You have a dataset made of linguistic recordings, and you want to see if the transcription matches the audio? Well, we have a tool for that that you can use now. If you want to know how it works, see you in Vienna in August for #ACL2025 github.com/eleferrand/dat…
@ELRANEws & the #LREC2026 Organizers are happy to announce the 15th edition of the Language Resources & Evaluation Conference (hybrid). It will be held in Palau de Congressos, Palma de Mallorca 🇪🇸, on 11-16 May 2026. Main Conf: 13-15 May Workshops/Tutorials: 11-12-16 May #NLProc
Universal Dependencies v2.16 is out! 23 new treebanks, 11 new languages, including Esperanto, Khoekhoe, and Tundra Nenets. universaldependencies.org
21st @multiword is going on in the Santa Ana room @naaclmeeting. @complingy is talking on "Meaning Construction at the Syntax-Lexis Nexus"
Slides: docs.google.com/presentation/d… Official thread: x.com/chstoneliu/sta…
Arturo @a11byte giving keynote at LoResMT 2025. "Low-resource MT: A perspective from the Americas" Exploring the challenges and opportunities of MT for Indigenous languages in the Americas through lessons from organizing shared tasks at AmericasNLP. @EdinburghNLP @AmericasNLP
Isaac giving keynote at LoResMT 2025. "Low-Resource NLP: hot takes and anecdotes from Google Translate" scholar.google.com/citations?user…
Low-Resource NLP: hot takes and anecdotes from Google Translate by Isaac Caswell @Google Inc. Slides available at drive.google.com/file/d/1ScPJSX…
Comparative Evaluation of Machine Translation Models Using Human-Translated Social Media Posts as References: Human-Translated Datasets by Shareefa Ahmed Al Amer (presenter), Mark G. Lee, Phillip Smith @unibirmingham
Beyond English: The Impact of Prompt Translation Strategies across Languages and Tasks in Multilingual LLMs by Itai Mondshine (presenter), Tzuf Paz-Argaman, Reut Tsarfaty aclanthology.org/people/i/itai-…
Building Data Infrastructure for Low-Resource Languages by Sarah K. K. Luger (presenter), Rafael Mosquera, Pedro Ortiz Suarez @MLCommons
From Text to Multi-Modal: Advancing Low-Resource-Language Translation through Synthetic Data Generation and Cross-Modal Alignments by Bushi Xiao, Qian Shen, Daisy Zhe Wang @UF
Limitations of Religious Data and the Importance of the Target Domain: Towards Machine Translation for Guinea-Bissau Creole by Jacqueline Rowe (presenter), Edward Gow-Smith and Mark Hepple @EdinburghUni @sheffielduni arxiv.org/abs/2504.02674
Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation by Junyoung Lee (presenter), Marco Cognetta, Sangwhan Moon and Naoaki Okazaki @NTUsg @sciencetokyo_en aclanthology.org/2025.loresmt-1…
Wenzhou Dialect Speech to Mandarin Text Conversion by Zhipeng Gao (presenter), Akihiro Tamura and Tsuneo Kato @DoshishaUniv_PR