Fabian David Schmidt
@fdschmidt
PhD candidate at Uni of Würzburg working on multilinguality & multimodality | prev. visited Mila & LTL@UniCambridge
Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports cross-lingual NLU in 200+ languages🔥 Joint work w/ Philipp Borchert, @licwu, and @gg42554 during my great research stay at @cambridgeltl

If you want to help us improve language and cultural coverage, and build an open source LangID system, please register for our shared task! 💬 Registering is easy! All the details are on the shared task webpage: wmdqs.org/shared-task/ Deadline: July 23, 2025 (AoE) ⏰
commoncrawl.org/blog/wmdqs-sha…
📢 Due to popular request, we extended the paper submission deadline to Sunday, July 20th, 23:59 AoE! 3 more days to polish your submissions! 🧹
Only 10 days left to submit your work to our @NewsRecWorkshop! 🚀 ▶️ More details: research.idi.ntnu.no/NewsTech/INRA/… 📆 Submission deadline: July 17th, 2025 AoE 📍 Event co-located with #RecSys2025 @ACMRecSys in Prague on September 26th (tentative)!
The call for papers is out for the 5th edition of the Workshop on Multilingual Representation Learning which will take place in Suzhou, China co-located with EMNLP 2025! See details below!
📢 Introducing Walk&Retrieve, a simple yet effective zero-shot #RAG framework based on #knowledgegraph walks! arXiv: arxiv.org/abs/2505.16849 GitHub: github.com/MartinBoecklin… Joint work w/ Martin Böckling @heikopaulheim @dwsunima @ir_rag_sigir #SIGIR2025 Details 👇
🏆 Our paper has received the Outstanding Paper Award at @naaclmeeting! 🎉 Many thanks to my co-authors @kelina1124 and @anne_lauscher! We introduce Multi3Hate, a novel multimodal and multilingual parallel hate speech dataset annotated by a multicultural set of annotators.
📢 Call for Papers is out!📢 Working on #news #recsys & their societal, legal, and ethical dimensions? 👉Submit to the 13th @NewsRecWorkshop, co-located w/ @ACMRecSys in Prague! 📅 Paper deadline: ** July 17th, 2025 ** More info: research.idi.ntnu.no/NewsTech/INRA/… #INRA2025 #RecSys2025
On my way to #NAACL2025 ✈️ I'll present the paper on Friday (May 2) 9-10:30am at poster session 7. Happy to chat about any aspect of multilingualism and culture! I'm also open to postdoc and visiting positions in the US. Definitely reach out if you have any opportunities.
1/7 🚨non-LLM paper alert!🚨 Human perception of a sentence is quite robust to interchanging words with similar meanings, not to mention semantically equivalent words across different languages. What about language models? In our recent work, we measure the…
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can evaluate web agent trajectories. We find that rule-based evals underreport success rates, and…
Excited to invite you to submit and join our CVPR 2025 workshop on geo-diverse and culturally aware vision-language models! 🌍✨ Let’s push the boundaries of AI together. #CVPR2025
📢Excited to announce our upcoming workshop - Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models (VLMs-4-All) @CVPR 2025! 🌐 sites.google.com/view/vlms4all
Agents like OpenAI Operator can solve complex computer tasks, but what happens when users use them to cause harm, e.g. automate hate speech and spread misinformation? To find out, we introduce SafeArena (safearena.github.io), a benchmark to assess the capabilities of web…
The 6th AfricaNLP Workshop will be co-located with ACL 2025 in Vienna, Austria, on the theme: "Multilingual and Multicultural-aware LLMs" sites.google.com/view/africanlp… Submission deadline: March 7, 2025. We accept both archival and non-archival papers. #AfricaNLP @MasakhaneNLP
Presenting ✨ CHASE: Generating challenging synthetic data for evaluation ✨ Work w/ fantastic advisors @DBahdanau and @sivareddyg Thread 🧵:
Excited to present a poster today at @OECD in Paris @IASEAIorg based on our upcoming paper "Societal Alignment Frameworks Can Improve LLM Alignment" (stay tuned for the pre-print soon! 🎊). Today (Fri) at 1pm CET. Conference livestream: iaseai.org/conference
⚠️Struggling with multilingual news recommendation? We introduce NaSE, a news-adapted sentence encoder!🙌 ✅No costly fine-tuning needed ✅Perfect for cold-start & few-shot scenarios #ecir2025 📰: arxiv.org/abs/2406.12634 Try it out @huggingface🤗: huggingface.co/aiana94/NaSE 👇
Want to train a *multilingual* LVLM but not sure how? Or looking for a strong model to use? Presenting "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model"! Arxiv: arxiv.org/abs/2501.05122 HF Collection: huggingface.co/collections/Wu…