Kunal Jha
@kjha02
CS PhD student @UW, prev. CSxPhilosophy @Dartmouth
Oral @icmlconf !!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my collaborators enough: @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks. shorturl.at/fqsNN 🧵
LLMs learn beliefs and values from human data, influence our opinions, and then reabsorb those influenced beliefs, feeding them back to users again and again. We call this the "Lock-In Hypothesis" and develop theory, simulations, and empirics to test it in our latest ICML paper!
Our new paper is out in PNAS: "Evolving general cooperation with a Bayesian theory of mind"! Humans are the ultimate cooperators. We coordinate on a scale and scope no other species (nor AI) can match. What makes this possible? 🧵
I will present this work at the ICML Multi-Agent System (MAS) workshop during the poster sessions. If you are interested in this work or in self-play for LLMs in general, please feel free to come chat with me!
🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat 🌟We propose 𝗼𝗻𝗹𝗶𝗻𝗲 𝐦𝐮𝐥𝐭𝐢-𝐚𝐠𝐞𝐧𝐭 𝗥𝗟 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
Really pumped for my Oral presentation on this work today!!! Come check out the RL session from 3:30-4:30pm in West Ballroom B. You can also swing by our poster from 4:30-7pm in West Exhibition Hall B2-B3, #W-713. See you all there!
I'll be at ICML next week! If anyone wants to chat about single/multi-agent RL, continual learning, cognitive science, or something else, shoot me a message!!!
Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ @valentina__py! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍
💡Beyond math/code, instruction following with verifiable constraints is well-suited to RLVR. But the set of constraints and verifier functions is limited, and most models overfit to IFEval. We introduce IFBench to measure model generalization to unseen constraints.
New paper: World models + Program synthesis by @topwasu 1. World modeling on-the-fly by synthesizing programs w/ 4000+ lines of code 2. Learns new environments from minutes of experience 3. Positive score on Montezuma's Revenge 4. Compositional generalization to new environments…
LMs often output answers that sound right but aren't supported by the input context. This is intrinsic hallucination: the generation of plausible but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
Fun paper! ...but is it data leakage in Qwen, orrrrr something else?
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
📢 I'm very excited to release AgarCL, a new evaluation platform for research in continual reinforcement learning‼️ Repo: github.com/machado-resear… Website: agarcl.github.io Preprint: arxiv.org/abs/2505.18347 Details below 👇
Think PII scrubbing ensures privacy? 🤔Think again‼️ In our paper, for the first time on unstructured text, we show that you can re-identify over 70% of private information *after* scrubbing! It’s time to move beyond surface-level anonymization. #Privacy #NLProc 🔗🧵
So excited to announce our work was accepted as a Spotlight paper to @icmlconf !!! I'm looking forward to presenting our work there this summer and @cogsci_soc! Big thank you again to my collaborators @cogscikid @liangyanchenggg @SimonShaoleiDu @maxhkw @natashajaques