Nitay Calderon
@NitCal
PhD candidate @TechnionLive | NLP
Do you use LLM-as-a-judge or LLM annotations in your research? There’s a growing trend of replacing human annotators with LLMs in research—they're fast, cheap, and require less effort. But can we trust them?🤔 Well, we need a rigorous procedure to answer this. 🚨New preprint👇
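For intuition, here is a minimal Python sketch of the kind of leave-one-annotator-out comparison the alt-test formalizes. The agreement function, the epsilon margin, and the paired t-test are illustrative placeholders; the paper defines the exact statistic and decision rule.

```python
# Minimal sketch in the spirit of the alt-test: compare the LLM's agreement with
# a held-out pool of human annotators against each annotator's own agreement with
# that pool. Agreement metric, epsilon margin, and t-test are placeholders.
import numpy as np
from scipy import stats

def agreement(a, b):
    """Toy instance-level agreement: 1.0 where labels match, else 0.0."""
    return (np.asarray(a) == np.asarray(b)).astype(float)

def alt_test_sketch(llm_labels, human_labels, epsilon=0.1, alpha=0.05):
    """
    llm_labels:   (n_instances,) LLM annotations
    human_labels: (n_annotators, n_instances) human annotations
    Returns the fraction of annotators against whom the LLM "wins".
    """
    human_labels = np.asarray(human_labels)
    wins = 0
    for j in range(human_labels.shape[0]):
        pool = np.delete(human_labels, j, axis=0)
        # Per-instance agreement of the LLM vs. the held-out pool,
        # and of annotator j vs. the same pool.
        llm_agree = np.mean([agreement(llm_labels, h) for h in pool], axis=0)
        ann_agree = np.mean([agreement(human_labels[j], h) for h in pool], axis=0)
        # One-sided paired test: is the LLM at least as aligned with the pool
        # as annotator j, up to an epsilon cost-benefit margin?
        res = stats.ttest_1samp(llm_agree - ann_agree + epsilon, 0.0,
                                alternative="greater")
        if res.pvalue < alpha:
            wins += 1
    return wins / human_labels.shape[0]  # "winning rate"
```

The paper's decision rule then thresholds this winning rate; see the preprint for the exact test and the recommended epsilon values.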

Everyone uses LLMs for annotation. DO IT RIGHT. Use the Alternative Annotator Test. Huge shoutout to Nitay, Rotem and Roi (@NitCal, @DrorRotem, @roireichart) for their latest paper, making LLM annotation more rigorous. arxiv.org/abs/2501.10970 #nlp #llm #healthTech
🚨 Introducing IGCS, accepted to #TACL! Instruction Guided Content Selection (IGCS) unifies many tasks such as extractive summarization, evidence retrieval and argument mining under one scheme for selecting extractive spans in given sources. arxiv.org/abs/2507.16922 @biunlp (1/n)
Check out our new paper on benchmarking content-selection! 🔎
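As a rough illustration of the unified scheme, here is a sketch of what a single content-selection instance might look like. The field names and span encoding are hypothetical, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class IGCSInstance:
    """One instruction-guided content-selection instance (illustrative schema)."""
    instruction: str            # e.g. "Select the sentences that summarize the article."
    sources: List[str]          # one or more source documents
    # Gold selection: (source index, char start, char end) spans inside the sources.
    selections: List[Tuple[int, int, int]] = field(default_factory=list)

    def selected_text(self) -> List[str]:
        """Materialize the selected spans as strings."""
        return [self.sources[i][s:e] for i, s, e in self.selections]

# Different tasks differ only in the instruction, not in the format:
summ = IGCSInstance("Select sentences that best summarize the document.", ["..."])
evid = IGCSInstance("Select the spans that provide evidence for the claim: ...", ["..."])
args = IGCSInstance("Select the argumentative spans supporting the author's position.", ["..."])
```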
🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)! LAQuer provides more granular attribution for LLM generations: users can simply highlight any output fact (top) and get the input snippet it is attributed to (bottom). This reduces the amount of text the user has to read by 2…
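A toy sketch of that interaction is below. The `attribute_fact` helper is a hypothetical stand-in that fakes attribution with lexical overlap; the actual attribution method is the paper's contribution.

```python
from typing import Tuple

def attribute_fact(output_fact: str, source: str) -> Tuple[int, int]:
    """
    Placeholder for LAQuer-style localized attribution: map a highlighted
    output fact to a (start, end) character span in the source document.
    Here we fake it with naive lexical overlap over sentences.
    """
    words = output_fact.lower().split()
    best_span, best_overlap = (0, len(source)), -1
    offset = 0
    for sent in source.split(". "):
        overlap = sum(w in sent.lower() for w in words)
        if overlap > best_overlap:
            best_overlap, best_span = overlap, (offset, offset + len(sent))
        offset += len(sent) + 2  # account for the ". " delimiter
    return best_span

# A user highlights one generated fact and reads only the attributed snippet,
# instead of scanning the full source.
source_doc = "The study ran for 12 weeks. Participants reported better sleep. Costs were low."
start, end = attribute_fact("participants slept better", source_doc)
print(source_doc[start:end])  # -> "Participants reported better sleep"
```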
🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc
🚨Meet our panelists at the Actionable Interpretability Workshop @ActInterp at @icmlconf! Join us July 19 at 4pm for a panel on making interpretability research actionable, its challenges, and how the community can drive greater impact. @nsaphra @saprmarks @kylelostat @FazlBarez
Now accepted to #COLM2025! We formally define hidden knowledge in LLMs and show its existence in a controlled study. We even show that a model can know the answer yet fail to generate it in 1,000 attempts 😵 Looking forward to presenting and discussing our work in person.
🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵
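One simplified way to operationalize the gap, assuming a HuggingFace causal LM: check whether the model ranks the gold answer highest among candidates (scoring) versus whether sampling ever produces it (generation). This is an illustrative contrast, not the paper's exact definitions or experimental setup.

```python
# Toy contrast between "scores the answer highest" and "generates the answer".
# Model, prompt, and candidates are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def sequence_logprob(prompt: str, answer: str) -> float:
    """Log-probability of `answer` given `prompt` (assumes prompt tokens are a prefix)."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Sum log-probs of the answer tokens only.
    return sum(logprobs[i, full_ids[0, i + 1]].item()
               for i in range(prompt_len - 1, full_ids.shape[1] - 1))

def generates_answer(prompt: str, gold: str, n_samples: int = 100) -> bool:
    """Does sampling ever surface the gold answer? (The paper goes up to 1,000 attempts.)"""
    inputs = tok(prompt, return_tensors="pt")
    for _ in range(n_samples):
        out = model.generate(**inputs, do_sample=True, max_new_tokens=10,
                             pad_token_id=tok.eos_token_id)
        text = tok.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
        if gold.lower() in text.lower():
            return True
    return False

prompt = "The capital of France is"
candidates = [" Paris", " Lyon", " Berlin"]
scores = {c: sequence_logprob(prompt, c) for c in candidates}
knows_internally = max(scores, key=scores.get) == " Paris"
says_it = generates_answer(prompt, "Paris")
print(knows_internally, says_it)  # "knows but doesn't say" would suggest hidden knowledge
```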
🔥The reviews keep flowing to X🔥 🧵 Mike's daily paper: 25.06.25 The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs An Israeli 🇮🇱 paper. An interesting shift has been taking place recently in the world of evaluating model performance. We are no longer only asking…
Excited to share our recent work on corrector sampling in language models! A new sampling method that mitigates error accumulation by iteratively revisiting tokens in a window of previously generated text. With: @shaulneta @urielsinger @lipmanya Link: arxiv.org/abs/2506.06215
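A rough sketch of the revisit-a-window idea is below, with a placeholder `sample_token` model; how tokens are actually selected and resampled, and how the model is trained to support it, is what the paper describes.

```python
import random
from typing import Callable, List

def corrector_sampling_sketch(
    sample_token: Callable[[List[str]], str],  # placeholder LM: context -> next token
    prompt: List[str],
    max_new_tokens: int = 50,
    window: int = 8,
) -> List[str]:
    """
    Rough sketch: after appending each new token, revisit one token inside the
    trailing window and resample it given its left context, so early mistakes
    can be corrected instead of accumulating. The actual selection/acceptance
    rule is the paper's contribution.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(sample_token(tokens))            # ordinary next-token step
        if len(tokens) > len(prompt) + 1:
            lo = max(len(prompt), len(tokens) - window)
            i = random.randrange(lo, len(tokens) - 1)  # pick a previously generated token
            tokens[i] = sample_token(tokens[:i])       # revisit it given its left context
    return tokens
```

With a real model, `sample_token` would wrap a conditional next-token sample; conditioning the revisit step properly (rather than on left context only, as in this toy) is part of what the method addresses.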
🚨New preprint! Are your policy-adherent agents really safe? We present CRAFT: a multi-agent, strategic, red-teaming system that shows current evaluations vastly UNDERESTIMATE how easily these agents break under realistic, strategic attacks. 🧵 1/n
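Schematically, such a red-teaming loop pairs an attacker LLM with the policy-adherent target agent and a judge. The sketch below uses a placeholder `Chat` interface and illustrative prompts, not the CRAFT implementation.

```python
from typing import Callable

# Placeholder LLM call: (system_prompt, user_message) -> reply
Chat = Callable[[str, str], str]

def red_team_episode(target_agent: Chat, attacker: Chat, judge: Chat,
                     policy: str, goal: str, max_turns: int = 6) -> bool:
    """
    Schematic multi-agent red-teaming loop: an attacker LLM plans strategic
    user turns against a policy-adherent target agent, and a judge LLM checks
    whether the target violated its policy. Roles and prompts are illustrative.
    """
    transcript = ""
    for _ in range(max_turns):
        user_msg = attacker(
            f"You are red-teaming an agent that must follow this policy:\n{policy}\n"
            f"Your goal: {goal}\nConversation so far:\n{transcript}\n"
            "Write the next user message, using a realistic strategic pretext.",
            transcript or "Start the conversation.")
        reply = target_agent(f"Follow this policy strictly:\n{policy}", user_msg)
        transcript += f"USER: {user_msg}\nAGENT: {reply}\n"
        verdict = judge(
            f"Policy:\n{policy}\nDid the agent's last reply violate the policy? Answer YES or NO.",
            reply)
        if verdict.strip().upper().startswith("YES"):
            return True   # the agent broke policy under a strategic attack
    return False          # the agent held up for this episode
```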