Yusuf Kocyigit
@mykocyigit
CS PhD at Boston University. NLP, Evaluation. Previously @google, @AIatMeta and @AmazonScience
Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 models at the 1B and 8B scales with various contamination types, using machine translation as our task, and analyzed the impact of contamination. arxiv.org/abs/2501.18771
🌐 Meet MetricX-24, our SOTA machine translation evaluation metric and a successor to the successful MetricX-23. 🚀 Now open-source in PyTorch/Transformers! 🎉 Ready to take this top performer in the WMT24 Metrics Shared Task for a spin? 🔗 Code: github.com/google-researc…
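If you want to take it for a spin, here is a rough sketch of scoring a single translation. This is not the repo's documented predict script: the checkpoint id, the input template, and the MT5ForRegression import path are assumptions to check against the README at the link above.

import torch
from transformers import AutoTokenizer
from metricx24.models import MT5ForRegression  # regression class shipped in the MetricX repo (assumed path)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-xl")
model = MT5ForRegression.from_pretrained("google/metricx-24-hybrid-xl-v2p6")  # assumed checkpoint name
model.eval()

# Source, candidate translation, and reference packed into one input string
# (assumed template; the hybrid model can also be run reference-free).
text = ("source: Der Hund bellt. "
        "candidate: The dog is barking. "
        "reference: The dog barks.")
batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1536)
with torch.no_grad():
    out = model(**batch)
print(float(out.predictions))  # MetricX reports an error score, so lower is better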
Our work got accepted to ICML! Looking forward to sharing more about this project with everyone this summer!
Ekin Akyürek (@akyurekekin) builds tools for understanding & controlling algorithms that underlie reasoning in language models. You’ve likely seen his work on in-context learning; I'm just as excited about past work on linguistic generalization & future work on test-time scaling.
I am looking for a Machine Learning Intern for the Spring or Summer term at the AI Institute to help scale our robot policy learning stack. Apply here and DM me! jobs.lever.co/bostondynamics…
Why do we treat train and test times so differently? Why is one “training” and the other “in-context learning”? Just take a few gradient steps at test time, a simple way to increase test-time compute, and you get SoTA on the ARC public validation set: 61%, the average human score! @arcprize
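As a rough illustration of the idea (not the paper's actual implementation), the sketch below takes a few gradient steps on a test task's own demonstration pairs before predicting the test input. The compute_loss and predict helpers, the optimizer, and the hyperparameters are placeholders.

import copy
import torch

def test_time_adapt(model, demo_batches, test_input, lr=1e-4, steps=5):
    # Copy the base model so every test task starts from the same weights.
    adapted = copy.deepcopy(model)
    adapted.train()
    optimizer = torch.optim.AdamW(adapted.parameters(), lr=lr)
    for _ in range(steps):                                # "a few gradients during test-time"
        for inputs, targets in demo_batches:              # the task's own demonstration pairs
            loss = adapted.compute_loss(inputs, targets)  # hypothetical helper
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    adapted.eval()
    with torch.no_grad():
        return adapted.predict(test_input)                # hypothetical helper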
Super excited to have this out! It was great to work on this with @mykocyigit, supervised by @_dieuwke_, figuring out the best post-hoc methods for identifying eval contamination + measuring its effects on performance. A short 🧵
New deep-dive into evaluation data contamination 😍🤩. Curious how much contamination there really is in common LLM training corpora, how much it actually impacts benchmark scores, and which metric best detects it? Read our new preprint! arxiv.org/abs/2411.03923
To the public: I would like to briefly introduce myself in order to clear up some misunderstandings about my academic background and my work. I am Jaan (Can) Süleyman İslam, a 26-year-old academic, and I conduct research in foreign policy and international relations.…