Yusuf Kocyigit
@mykocyigit
CS PhD at Boston University. NLP, Evaluation. Previously @google, @AIatMeta and @AmazonScience
Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 models at the 1B and 8B scales with various contamination types, using machine translation as our task, and analyzed the impact of contamination. arxiv.org/abs/2501.18771
🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin? 🔗 Code: github.com/google-researc…
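If you want to take it for a spin, here is a rough sketch of scoring a single translation. This is not the repo's documented predict script: the checkpoint id, the input template, and the MT5ForRegression import path are assumptions to check against the README at the link above.

import torch
from transformers import AutoTokenizer
from metricx24.models import MT5ForRegression  # regression class shipped in the MetricX repo (assumed path)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-xl")
model = MT5ForRegression.from_pretrained("google/metricx-24-hybrid-xl-v2p6")  # assumed checkpoint name
model.eval()

# Source, candidate translation, and reference packed into one input string
# (assumed template; the hybrid model can also be run reference-free).
text = ("source: Der Hund bellt. "
        "candidate: The dog is barking. "
        "reference: The dog barks.")
batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1536)
with torch.no_grad():
    out = model(**batch)
print(float(out.predictions))  # MetricX reports an error score, so lower is better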
Our work got accepted to ICML! Looking forward to sharing more about this project with everyone this summer!
Ekin Akyürek (@akyurekekin) builds tools for understanding & controlling algorithms that underlie reasoning in language models. You’ve likely seen his work on in-context learning; I'm just as excited about past work on linguistic generalization & future work on test-time scaling.
I am looking for a Machine Learning Intern for the Spring or Summer term at the AI Institute to help scale our robot policy learning stack. Apply here and DM me! jobs.lever.co/bostondynamics…
Why do we treat train and test times so differently? Why is one “training” and the other “in-context learning”? Just take a few gradient steps at test time, a simple way to increase test-time compute, and you get SoTA on the ARC public validation set: 61%, the average human score! @arcprize
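As a rough illustration of the idea (not the paper's actual implementation), the sketch below takes a few gradient steps on a test task's own demonstration pairs before predicting the test input. The compute_loss and predict helpers, the optimizer, and the hyperparameters are placeholders.

import copy
import torch

def test_time_adapt(model, demo_batches, test_input, lr=1e-4, steps=5):
    # Copy the base model so every test task starts from the same weights.
    adapted = copy.deepcopy(model)
    adapted.train()
    optimizer = torch.optim.AdamW(adapted.parameters(), lr=lr)
    for _ in range(steps):                                # "a few gradients during test-time"
        for inputs, targets in demo_batches:              # the task's own demonstration pairs
            loss = adapted.compute_loss(inputs, targets)  # hypothetical helper
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    adapted.eval()
    with torch.no_grad():
        return adapted.predict(test_input)                # hypothetical helper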
Super excited to have this out! It was great to work on this with @mykocyigit, supervised by @_dieuwke_, figuring out the best post-hoc methods for identifying eval contamination + measuring its effects on performance. A short 🧵
New deep-dive into evaluation data contamination 😍🤩. Curious how much contamination there really is in common LLM training corpora, how much it actually impacts benchmark scores, and which metric best detects it? Read our new preprint! arxiv.org/abs/2411.03923
To the public: I would like to briefly introduce myself in order to clear up some misunderstandings about my academic background and my work. I am Jaan (Can) Süleyman İslam, a 26-year-old academic, and I conduct research in foreign policy and international relations.…