Jonathan Berant
@JonathanBerant
NLP at Tel-Aviv University and Google DeepMind
This paper extends active statistical inference in a number of exciting ways, with applications in LLM evaluation! 1. Improves upon active inference to give the optimal sampling policy with clipping. 2. Gives an optimal-cost inference procedure. Take a look! One of my fave…
[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check our poster and discuss with @ananthbshankar, @abeirami, @jacobeisenstein, and…
Accepted to COLM @COLM_conf !
Hi ho! New work: arxiv.org/pdf/2503.14481 With amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fantinehuot @mlapata @vicky_zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Work co-led with @ml_angelopoulos, whom we had the pleasure of briefly hosting here at @GoogleDeepMind for this collaboration, together with my GDM and GR colleagues @jacobeisenstein, @JonathanBerant, and Alekh Agarwal.
We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).
We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) ∈ (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs).
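A rough sketch of how such a policy could be fitted to a budget, assuming (this is my illustration, not the paper's method) the active rule takes a clipped form min(1, c·sigma(x)), where `sigmas` are hypothetical per-example estimates of how noisy G is; bisection finds the constant c that spends the budget on average:

```python
import numpy as np

def active_rate(c, sigmas):
    """Clipped active sampling rates: min(1, c * sigma(x))."""
    return np.minimum(1.0, c * sigmas)

def calibrate_scale(sigmas, budget_frac, iters=60):
    """Bisect for the c whose average rate over a pilot set matches the budget."""
    lo, hi = 0.0, 1e6
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if active_rate(c, sigmas).mean() < budget_frac:
            lo = c  # spending too little on average: raise c
        else:
            hi = c  # spending too much: lower c
    return 0.5 * (lo + hi)

# E.g., heteroskedastic residuals, budget for expensive ratings on 20% of Xs:
sigmas = np.abs(np.random.default_rng(0).normal(size=1000))
c = calibrate_scale(sigmas, 0.20)
print(active_rate(c, sigmas).mean())  # ~0.20
```

The fixed-rate baseline pi_random(x) = p* is the special case of constant sigma, where the calibration just returns the budget fraction itself.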
Specifically, building on the active PPI estimator of Zrnic and Candès, we derive a family of cost-optimal policies, pi(x), that determine the best probabilities for choosing to get H_t, versus choosing to just use G_t, for each X_t.
In our setup, we look at responses X one-by-one. For each X, we can get a cheap rating G = g(X) at a discount, but also maybe choose to get an expensive rating H = h(X). Informally, at the end of the day, we want the best unbiased estimate of E[H] we can get, within our budget.
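A minimal Python sketch of an estimator with this structure (my illustration under the setup above, not the paper's code): every X gets the cheap rating G = g(X), the expensive H = h(X) is requested with probability pi(X), and an inverse-probability correction keeps the average unbiased for E[H]:

```python
import numpy as np

def active_ppi_estimate(xs, g, h, pi, rng=None):
    """Unbiased estimate of E[H]: for xi ~ Bernoulli(pi(X)), E[xi | X] = pi(X),
    so E[g(X) + (xi / pi(X)) * (h(X) - g(X))] = E[h(X)]."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for x in xs:
        p = pi(x)
        term = g(x)                    # always take the cheap rating
        if rng.random() < p:           # sometimes pay for the expensive judge
            term += (h(x) - g(x)) / p  # inverse-probability bias correction
        total += term
    return total / len(xs)
```

The better pi concentrates expensive calls where G disagrees with H, the lower the variance of this estimate at the same expected cost.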
You need to evaluate an AI system and you have three things: 1. A cheap judge, which is noisy. 🙈 2. An expensive judge, which is accurate. 🧑⚖️ 3. A budget 💸 How should you spend the budget to get the best possible estimate of model quality? arxiv.org/abs/2506.07949
We're hiring a research scientist on the Foundational Research in Language team at GDM. The role is right here in sunny Seattle! job-boards.greenhouse.io/deepmind/jobs/…
Omri Miran was kidnapped at Nahal Oz before the eyes of his wife Lishay and his two daughters - Roni, who was then two years old, and Alma, a six-month-old baby. Today, in the Knesset Education Committee, Lishay met one of those who abandoned him, Education Minister Yoav Kisch. Half a black heart 🖤❤️ Share her everywhere!
Super honored to win the Language Modeling SAC award! I'll be presenting this work Wednesday in the 2pm poster session in Hall 3. I'd love to chat with folks there, or at the rest of the conference, about long context data, ICL, inference-time methods, New Mexican food, etc. :)
In-context learning provides an LLM with a few examples to improve accuracy. But with long-context LLMs, we can now use *thousands* of examples in-context. We find that this long-context ICL paradigm is surprisingly effective, and differs in behavior from short-context ICL! 🧵
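For a concrete picture of the paradigm, here is a toy prompt builder (the format and helper name are hypothetical, not from the paper); the only change from few-shot ICL is that `examples` may now hold thousands of pairs:

```python
def many_shot_prompt(examples, query):
    """Assemble an ICL prompt; long-context models let `examples`
    hold thousands of (input, label) demonstration pairs."""
    shots = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nLabel:"

print(many_shot_prompt([("great movie!", "positive"), ("so dull.", "negative")],
                       "loved every minute"))
```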
This was my first time submitting to TMLR, and thanks to the reviewers and AE @murefil for making it a positive experience! TMLR seems to offer some nice pros vs. ICML/ICLR/NeurIPS, e.g.: - Potentially lower-variance review process - Not dependent on the conference calendar
ALTA: Compiler-Based Analysis of Transformers Peter Shaw, James Cohan, Jacob Eisenstein, Kenton Lee, Jonathan Berant, Kristina Toutanova. Action editor: Alessandro Sordoni. openreview.net/forum?id=h751w… #compiler #interpreter #programming
Effi Shoham, who lost his son Yuval in Gaza just three months ago, a professor of history, a Jerusalemite, a religious Zionist, salt of the earth. Yesterday at the academia march. Listen to him. Share him. We have reached the moment of decision. Take to the streets ✊🇮🇱
New #ICLR2024 paper! The KoLMogorov Test: can CodeLMs compress data by code generation? The optimal compression for a sequence is the shortest program that generates it. Empirically, LMs struggle even on simple sequences, but can be trained to outperform current methods! 🧵1/7
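A toy illustration of the test's criterion (my sketch, not the paper's harness): a candidate program counts as compressing a sequence only if it reproduces the sequence exactly and its source is shorter than the raw data.

```python
def compresses(program_src, target):
    """Kolmogorov-style check: the program must define generate() whose
    output matches `target` exactly, with source shorter than the raw data.
    Sketch only; a real harness needs sandboxing and timeouts."""
    scope = {}
    exec(program_src, scope)  # toy setting: run trusted candidate code only
    ok = list(scope["generate"]()) == list(target)
    return ok and len(program_src) < len(str(list(target)))

# The even numbers 0..198 admit a far shorter generating program:
src = "def generate():\n    return range(0, 200, 2)"
print(compresses(src, range(0, 200, 2)))  # True
```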
The soldiers at the front and the hostages in Gaza are just cards in his survival game - Netanyahu is using the lives of our citizens and soldiers because he is trembling with fear of us - of the public protest against the firing of the head of the Shin Bet. That is why we must not let this madness win. The protest must erupt in fury to save the hostages, the soldiers, and the State of Israel from the hands of that corrupt man…