Joël Niklaus
@joelniklaus
Research @harvey__ai & Lecturer @bfh_wirtschaft, previously: @Google @Theteamatx @StanfordCRFM @thomsonreuters @unibern
Very happy to share that our paper “MultiLegalPile: A 689GB Multilingual Legal Corpus” received an Outstanding Paper Award at #ACL2024! A great way to finish the final conference of my PhD! Big thanks to @MatoshiVeton, @maemst, @KiddoThe2B, and Daniel E. Ho for the collab!
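For anyone who wants to work with the corpus, here is a minimal sketch of streaming it from the Hugging Face Hub instead of downloading all 689GB. The Hub ID, config name, and "text" column are assumptions; check the dataset card for the real layout.

```python
# Minimal sketch: stream a slice of MultiLegalPile from the Hugging Face Hub.
# Hub ID, config name, and the "text" column are assumptions; see the dataset card.
from datasets import load_dataset

corpus = load_dataset(
    "joelniklaus/Multi_Legal_Pile",  # assumed Hub ID
    "de_caselaw",                    # assumed config: German case law
    split="train",
    streaming=True,                  # avoids materializing the 689GB corpus
)

for doc in corpus.take(3):           # peek at a few documents
    print(doc["text"][:200])
```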

🚀 Huge thanks to the community! 📈 LEXam is now the #1 trending evaluation dataset on Hugging Face! Check it out: huggingface.co/datasets?other… 🧠 Built for deep legal reasoning across 340 law exams 👩‍⚖️ Expert LLM judge evaluations, long-form + MCQs #LegalNLP #Benchmark #LLM
🚨Time to push LLMs further! 📚LEXam: The legal reasoning benchmark you’ve been waiting for: • 340 exams, 116 courses (EN/DE) • Long-form, process-focused questions • Expert-level LLM Judges • Rich meta for targeted diagnostics • Contamination-proof, extendable MCQs [1/6]🧵
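For readers who want to try the benchmark themselves, a minimal sketch of loading LEXam and grading a model on the MCQ portion follows. The Hub ID, config name, and the "gold" answer column are assumptions; consult the dataset card for the actual schema.

```python
# Minimal sketch: load LEXam's MCQs and compute accuracy for a model.
# Hub ID, config name, and the "gold" answer column are assumptions.
from datasets import load_dataset

mcq = load_dataset("LEXam-Benchmark/LEXam", "mcq_4_choices", split="test")

def mcq_accuracy(predict):
    # predict: a callable mapping a question row to a choice label like "A"
    correct = sum(predict(row) == row["gold"] for row in mcq)
    return correct / len(mcq)
```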
1/5 How good are AI models' reasoning abilities? We have created LEXam, a legal reasoning benchmark derived from real law exams, available in English and German. @eth_cle @ellliottt @YoanHermstruwer @joelniklaus @OpenAI @GoogleAI @deepseek_ai @AnthropicAI @grok @AIatMeta
Check out LEXam, our new legal reasoning benchmark! Thanks for the great collaboration @NJingwei, Jakob, Etienne, Yang, Yoan, @YinyaHuang, @akhtarmubashara, Florian, Oliver, Daniel, @LeippoldMarkus, @mrinmayasachan, @Stremitzer_Lab, Christoph Engel, @ellliottt, and @joelniklaus!
Super excited Marin is finally out! Come see what we've been building: code and a platform for training fully reproducible models end-to-end, from data to evals. Plus a new high-quality 8B base model, fully documented from start to finish.
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
While I was only a small contributor, it was a great experience to be part of Marin over the last year! Huge kudos to @percyliang and @dlwh for leading this project!
Landed in Singapore yesterday and #ICLR2025 just started! Looking forward to meeting old and new friends. Let me know if you're there too, I would love to catch up :)

If you’re at @iclr_conf this week, come check out our spotlight poster INCLUDE during the Thursday 3:00–5:30pm session! I will be there to chat about all things multilingual & multicultural evaluation. Feel free to reach out anytime during the conference. I’d love to connect!
🚀 Introducing INCLUDE 🌍: A multilingual LLM evaluation benchmark spanning 44 languages! Contains *newly-collected* data, prioritizing *regional knowledge*. Setting the stage for truly global AI evaluation. Ready to see how your model measures up? #AI #Multilingual #LLM #NLProc
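Below is a sketch of the per-language breakdown INCLUDE is built for. The Hub ID and the "language"/"answer" columns are assumptions, and `predict` is a hypothetical stand-in for the model under test.

```python
# Minimal sketch: per-language accuracy on INCLUDE-style data.
# Hub ID and the "language"/"answer" columns are assumptions.
from collections import defaultdict
from datasets import load_dataset

data = load_dataset("CohereForAI/include-base-44", split="test")  # assumed Hub ID

def per_language_accuracy(predict):
    # predict: a callable mapping a question row to an answer string
    hits, totals = defaultdict(int), defaultdict(int)
    for row in data:
        totals[row["language"]] += 1
        hits[row["language"]] += int(predict(row) == row["answer"])
    return {lang: hits[lang] / totals[lang] for lang in totals}
```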
In partnership with the Swiss Federal Supreme Court, we are excited to announce new legal translation benchmarks, led by @joelniklaus on Harvey's research team. SwiLTra-Bench continues our efforts towards comprehensive evaluations for legal LLM systems.
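As a rough illustration of how translation benchmarks like SwiLTra-Bench are commonly scored, here is a hedged sketch using sacrebleu's chrF; the sentence pairs are invented placeholders, not benchmark data, and the benchmark's actual metrics may differ.

```python
# Minimal sketch: score candidate translations with chrF via sacrebleu.
# The sentences are illustrative placeholders, not SwiLTra-Bench data.
import sacrebleu

hypotheses = ["The court dismisses the appeal."]   # model outputs
references = [["The court rejects the appeal."]]   # one reference stream
print(sacrebleu.corpus_chrf(hypotheses, references).score)
```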