Lucas Bandarkar
@LucasBandarkar
PhD student @uclaNLP — ML / #NLProc / multilingual @AIatMeta
Cross-lingual transfer can be as easy as swapping model layers between LLMs! 🔀 Our model merging method can compose math and language skills by swapping the top & bottom layers from an SFT'd target-language expert into a math expert, without retraining. arxiv.org/pdf/2410.01335 🧵: [1/3]
![Tweet image](https://pbs.twimg.com/media/GZD6Oo5XwAACCAA.jpg)
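The core operation is simple enough to sketch. The block below is a minimal illustration of the layer-swapping idea, not the released implementation: it assumes two experts fine-tuned from the same Llama-style base (checkpoint names and the choice of K layers per end are placeholders), and copies the language expert's bottom and top transformer layers into the math expert.

```python
# Minimal sketch of the layer-swapping merge described above (not the authors' released
# code). Assumes a Llama-style architecture whose decoder layers are named
# "model.layers.{i}", and two experts fine-tuned from the same base model.
# "math-expert", "bengali-expert", and K are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM

math_expert = AutoModelForCausalLM.from_pretrained("math-expert", torch_dtype=torch.bfloat16)
lang_expert = AutoModelForCausalLM.from_pretrained("bengali-expert", torch_dtype=torch.bfloat16)

K = 8  # how many layers to swap at each end of the stack (a tunable assumption)
n = math_expert.config.num_hidden_layers
swap_ids = set(range(K)) | set(range(n - K, n))

merged = math_expert.state_dict()
for name, tensor in lang_expert.state_dict().items():
    # Overwrite only parameters belonging to the selected bottom/top layers.
    if any(f".layers.{i}." in name for i in swap_ids):
        merged[name] = tensor

math_expert.load_state_dict(merged)
math_expert.save_pretrained("math-expert-swapped-for-bengali")
```

Because both experts start from the same base, the swapped layers stay dimensionally compatible, which is what lets the composition work without any retraining.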
I’ll be at #ICLR2025 this week to present this Spotlight ✨ paper on post-hoc modularization-then-merging that enables a surprising amount of cross-lingual transfer. Super excited 😊
This is truly awesome: they use recurrent blocks (similar to diffusion models) to build an LLM that can think "longer" if extra reasoning is required. The concept is totally parallel to speculative decoding / early exiting
Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦⬛
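For intuition, here is a toy sketch of what recurrent depth with adaptive latent compute can look like; it is my own simplification (one shared block looped a variable number of times, no causal mask, no learned stopping rule), not the architecture from the tech report.

```python
# Toy recurrent-depth model: a single shared block is iterated a variable number of
# times over a latent state, so harder inputs can be given more compute.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared block reused across iterations instead of a fixed stack of layers.
        self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, num_iters: int = 4):
        h = self.embed(input_ids)
        state = torch.zeros_like(h)        # latent state refined across iterations
        for _ in range(num_iters):         # more iterations = "thinking longer"
            state = self.core(state + h)
        return self.lm_head(state)

model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
logits_fast = model(tokens, num_iters=2)   # cheap pass for easy inputs
logits_slow = model(tokens, num_iters=16)  # extra latent compute for hard inputs
```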
Paper #3: Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models. What can we do in model merging when we want to transfer task performance from one language to another? @LucasBandarkar got y'all covered! Link: arxiv.org/abs/2410.01335
This paper reveals a number of heuristic-style errors in dense retrievers (e.g., those used for RAG). Accepted at ACL, congrats @mohsen_fayyaz
new paper! 🌱 Collapse of Dense Retrievers We uncover major vulnerabilities in dense retrievers like Contriever, showing they favor: 📌 Shorter docs 📌 Early positions 📌 Repeated entities 📌 Literal matches ...all while ignoring the answer's presence! huggingface.co/datasets/mohse…
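A quick way to see the kind of failure listed above is to score hand-built probes with a public dense retriever. The snippet below uses the facebook/contriever checkpoint with standard mean pooling; the query and passages are made-up examples for illustration, not items from the paper's benchmark.

```python
# Illustrative literal-match probe (not the paper's exact evaluation setup): compare
# the retrieval score of a passage that actually contains the answer against a
# distractor that merely repeats the query's surface terms.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

@torch.no_grad()
def embed(texts):
    inputs = tok(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**inputs)[0]                       # per-token embeddings
    mask = inputs["attention_mask"].unsqueeze(-1)  # zero out padding, then mean-pool
    return (out * mask).sum(1) / mask.sum(1)

query = "Who wrote The Old Man and the Sea?"
with_answer = ("The novella, published in 1952, was written by Ernest Hemingway "
               "late in his career and won the Pulitzer Prize.")
literal_match = "The Old Man and the Sea. The Old Man and the Sea is a famous book."

q, d_ans, d_lit = embed([query, with_answer, literal_match])
print("score(answer doc):   ", torch.dot(q, d_ans).item())
print("score(literal match):", torch.dot(q, d_lit).item())
# If the literal-match distractor scores higher, the retriever is keying on surface
# overlap rather than on whether the answer is actually present.
```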
🚨 New Blog Drop! 🚀 "Reflection on Knowledge Editing: Charting the Next Steps" is live! 💡 Ever wondered why knowledge editing in LLMs still feels more like a lab experiment than a real-world solution? In this post, we dive deep into where the research is thriving — and where…
Next, we studied the effect of the question language and found that, in general, performance is higher when the question is asked in the 'native' language. In the plot, *mother tongue effect* = (performance when the question is asked in the language to which it is relevant) − (performance when it is asked in English)
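In other words, the metric is just a per-language difference of accuracies. A tiny sketch with made-up numbers:

```python
# How the *mother tongue effect* can be computed from per-language accuracies.
# The numbers below are invented for illustration only.
native_question_acc  = {"sw": 0.62, "he": 0.71, "ja": 0.68}  # question asked in the relevant language
english_question_acc = {"sw": 0.58, "he": 0.66, "ja": 0.70}  # same items, question asked in English

mother_tongue_effect = {
    lang: native_question_acc[lang] - english_question_acc[lang]
    for lang in native_question_acc
}
print(mother_tongue_effect)  # positive values mean asking in the 'native' language helps
```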
This is seriously cool — an HQ dataset that can open up all sorts of studies on cross-lingual {local} knowledge transfer in LLMs
So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!) arxiv.org/abs/2504.10356 Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study benchmark design choices @metaai
🚨 New paper 🚨 Excited to share my first paper w/ my PhD students!! We find that advanced LLM capabilities conferred by instruction or alignment tuning (e.g., SFT, RLHF, DPO, GRPO) can be encoded into model diff vectors (à la task vectors) and transferred across model…
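As a rough sketch of the diff-vector recipe (my paraphrase of the idea, not the paper's code), with placeholder checkpoint names and an assumed scaling factor:

```python
# Hedged sketch of a "model diff vector" (à la task vectors): subtract the base from
# its tuned version, then add the scaled diff to another model with the same
# architecture. Checkpoint names and alpha are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM

base     = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.bfloat16)
tuned    = AutoModelForCausalLM.from_pretrained("base-model-instruct", torch_dtype=torch.bfloat16)
receiver = AutoModelForCausalLM.from_pretrained("sibling-model-same-arch", torch_dtype=torch.bfloat16)

base_sd, tuned_sd = base.state_dict(), tuned.state_dict()
diff = {k: tuned_sd[k] - base_sd[k] for k in base_sd}   # capability encoded as a parameter delta

alpha = 1.0                                             # assumed scaling knob
new_sd = {k: v + alpha * diff[k] for k, v in receiver.state_dict().items()}
receiver.load_state_dict(new_sd)
receiver.save_pretrained("receiver-plus-instruct-diff")
```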
🚨Selecting the best prompting strategy for LLMs is challenging, and ensembling is inefficient. We introduce DyPlan 🧠, a dynamic framework that teaches LLMs to use internal knowledge to pick the best strategy. It cuts token/retrieval costs by 7-13% and boosts F1 by 11-32%. (1/N)
“That’s one small [MASK] for [MASK], a giant [MASK] for mankind.” – [MASK] Armstrong Can autoregressive models predict the next [MASK]? It turns out yes, and quite easily… Introducing MARIA (Masked and Autoregressive Infilling Architecture) arxiv.org/abs/2502.06901
Open LLM evals often face data contamination and bias concerns. Private curators🚪(@scale_AI) address this with curated data and expert evaluations👲 We argue that this shift poses new risks, including financial incentives 💸 and eval bias☠️!! w/ @pratyushmaini
Belebele extended to speech for 74 (!!) languages (this project also extended Fleurs to more languages)
We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. arxiv.org/abs/2412.08274 Freely available on GitHub: github.com/facebookresear…
📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele. Work done at @Mila_Quebec with @davlanade @gg42554 @licwu Datasets: huggingface.co/datasets/WueNL… huggingface.co/datasets/WueNL… Details to follow👇
We also translate MMLU to build an extensive evaluation set in 42 languages. We further engage with professional and community annotators to improve the quality of the MMLU translations – we introduce this as Global-MMLU🌍
This dataset subsamples MMLU to limit questions that are too Western-centric, and they then translate it into 42 languages. Wow @CohereForAI with two big multilingual benchmarks released this week. Great to know I will no longer have to rely on machine-translated MMLU
Today, we’re excited to share Global-MMLU 🌍: a multilingual LLM benchmark covering MMLU translations in 42 languages -- combined with improved quality through human curation and extensive metadata on what questions are culturally sensitive 🗽
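For anyone who wants to kick the tires, the benchmark should be loadable with 🤗 datasets; the dataset ID and the language config name below are my assumptions based on the announcement, so double-check them on the Hub.

```python
# Assumed quick-start: load one language split of Global-MMLU from the Hugging Face Hub.
# The dataset ID "CohereForAI/Global-MMLU" and the "ar" (Arabic) config are assumptions.
from datasets import load_dataset

global_mmlu_ar = load_dataset("CohereForAI/Global-MMLU", "ar", split="test")
print(global_mmlu_ar[0])  # inspect the first question and its metadata
```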