Nan Jiang

@nanjiang_cs

machine learning researcher, with focus on reinforcement learning. assoc prof @ uiuc cs. Course on RL theory (w/ videos): https://nanjiang.cs.illinois.edu/cs542

Joined November 2017

73 Following

9K Followers

Pinned

Nan Jiang@nanjiang_cs · Aug 14, 2020

Learning Q* with
+ poly-sized exploratory data
+ an arbitrary Q-class that contains Q*

...has seemed impossible for yrs, or so I believed when I talked at @RLtheory 2mo ago.

And what's the saying? Impossible is NOTHING
arxiv.org/abs/2008.04990
Exciting new work w/@tengyangx! 1/

110

Pinned

Nan Jiang@nanjiang_cs · May 30

Thank goodness at least someone remembers nytimes.com/2025/05/30/opi…

Nan Jiang@nanjiang_cs · May 30

It’s almost hilarious that he chose to mention US space program. Who tells him what happened when a founding figure of JPL was expelled for the exact same reasons they use today?

2.0K

Nan Jiang Retweeted

Jonathan Ullman@thejonullman · Jul 24

Really excited that I will be co-chairing ALT 2026 with Matus Telgarsky! The conference will be Feb 23-26, 2026 at the Fields Institute in Toronto. Website with CFP is now live---stay tuned for updates. Please submit your best work and come join us!

8.0K

Nan Jiang@nanjiang_cs · Jul 19

missing ICML, and I used this week to write my first technical blog on some recent thoughts on two different roles of simulators in RL and the confusions/misconceptions around them. Comments welcome! nanjiang.cs.illinois.edu/2025/07/16/sim…

143

101

10.0K

Nan Jiang Retweeted

Allen Nie (🇺🇦☮️)@allenainie · Jul 15

Provably Learning from Language Feedback TLDR: RL theory can help us do better inference-time exploration with feedback. Work done with @wanqiao_xu, @ruijie_zheng12, @chinganc_rl, @adityamodi94, @adith387 📰 arxiv.org/pdf/2506.10341 📍EXAIT Best Paper/Oral Sat 8:45-9:30 am

3.0K

Nan Jiang@nanjiang_cs · Jul 15

For those at ICML, Audrey will be presenting this paper at the 4:30 poster session this afternoon! West Exhibition Hall B2-B3 W-1009

Dylan Foster 🐢@canondetortugas · May 5

Is Best-of-N really the best we can do for language model inference? New algo & paper: 🚨InferenceTimePessimism🚨 Led by the amazing Audrey Huang (@auddery) with Adam Block, Qinghua Liu, Nan Jiang (@nanjiang_cs), and Akshay Krishnamurthy. Appearing at ICML '25. 1/11

3.0K

Nan Jiang@nanjiang_cs · Jul 14

I thought what disappeared was test (replaced by benchmark evals), not val. If you use the “test” split from train data (1) more than once and (2) without reporting it, then that’s just val?

jxmo@jxmnop · Jul 14

when i first learned Machine Learning, our professor ingrained into us how every ML problem starts by splitting data into train, test, and validation. these days there is just train and test. in many cases there is just train and more train. where’d all the validation sets go?

3.0K

Nan Jiang@nanjiang_cs · Jul 11

this sounds more like unreal/unity…?

Nathan Lambert@natolambert · Jul 11

Modern licenses so funny. Amazing looking model. MIT-Modified. Marketing is king. "Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or…

1.0K

Nan Jiang@nanjiang_cs · Jul 8

Re prompt injection in papers: what if reviewers do their job seriously using LLM as a useful tool (given conf permission)? Aren’t we supposed to be “not afraid of LLMs” as in eg education? somehow we become extremely conservative when it comes to ourselves…?

3.0K

Nan Jiang Retweeted

Csaba Szepesvari@CsabaSzepesvari · Jul 8

First position paper I ever wrote. "Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence" arxiv.org/abs/2506.23908 Background: I'd like LLMs to help me do math, but statistical learning seems inadequate to make this happen. What do you all think?

438

348

34.0K

Nan Jiang@nanjiang_cs · Jun 26

HUGE congrats to @wanqiao_xu -- this paper just got the best theory paper award at ICML 2025 EXAIT (Exploration in AI) -- proposing a new provably efficient exploration algorithm 🛣️ with the right level of abstraction to leverage the strengths of LLMs 💭.

Allen Nie (🇺🇦☮️)@allenainie · Jun 17

Decision-making with LLM can be studied with RL! Can an agent solve a task with text feedback (OS terminal, compiler, a person) efficiently? How can we understand the difficulty? We propose a new notion of learning complexity to study learning with language feedback only. 🧵👇

6.0K

Nan Jiang Retweeted

TalkRL Podcast@TalkRLPodcast · Jun 25

E66: Satinder Singh: The Origin Story of RLDM @ RLDM 2025 Professor Satinder Singh of @GoogleDeepMind and @UMich is co-founder of @RLDMDublin2025. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).

2.0K

Nan Jiang Retweeted

John Langford@JohnCLangford · Jun 24

A new opening for multimodal model research: jobs.careers.microsoft.com/global/en/job/… . Please apply if interested.

12.0K

Nan Jiang Retweeted

Eugene Vinitsky 🍒🦋@EugeneVinitsky · Jun 23

We now know RL agents can zero-shot crush driving benchmarks. Can we put them on a car and replace the planning stack? We're hiring a postdoc at NYU to find out! Email me if interested and please help us get the word out.

271

30.0K

Nan Jiang Retweeted

Allen Nie (🇺🇦☮️)@allenainie · Jun 17

103

17.0K

Nan Jiang@nanjiang_cs · Jun 14

Re error propagation: if you believe model-based is a solution but also want the benefits of model-free, perhaps time to investigate (never thoroughly-studied) bellman-error minimization... BRM is, in a way, closer to model-based than TD (small revelation from my l4dc talk)

Seohong Park@seohong_park · Jun 13

Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).

219

197

28.0K

Nan Jiang@nanjiang_cs · Jun 11

I've received multiple emails from nxtai-conference.com. Having "AI+quantum" as keywords looks very much like a scam at first glance (sorry!) but the speaker list seems very legit. What's going on with this...? Also can't find organizer info. Are there academics behind this?

6.0K

Nan Jiang Retweeted

Sam Power@sp_monte_carlo · May 28

In the interim, I wanted to advertise our YouTube channel - youtube.com/@montecarlosem… - which contains recordings for the bulk of our talks so far (sites.google.com/view/monte-car…, sites.google.com/view/monte-car…). I encourage you to catch up and enjoy them over the intervening months!

3.0K

Nan Jiang@nanjiang_cs · Jun 4

En route to (my first) l4dc and will be giving a keynote on Friday. Happy to chat tmr & Friday if you are around!

L4DC Conference@l4dc_conf · Jun 4

Join the Opening Reception at the Ford Robotics Building (FRB) on North Campus: - Opening Reception starts at 6pm - Lab tours and demos from 5:30pm-8:30pm Ask a volunteer to take blue bus (CN) on Division at Jefferson or rideshare to head north! maps.app.goo.gl/LnJPR3fTNLTTAB…

3.0K

Nan Jiang@nanjiang_cs · Jun 3

Given the sheer number of ppl interested in PG methods nowadays I'm sure innocent "rediscoveries" like this are happening every day. Otoh, due diligence takes minimal effort today as you can just DeepResearch. All it takes is the sense/taste to ask "no way this is not done b4"...

Haitham Bou Ammar@hbouammar · Jun 2

I read this paper in detail, and I am very sad! They literally re-do the optimal reward baseline work that we have known since forever, without even crediting the true authors in their derivations. The third screenshot is taken from: ieeexplore.ieee.org/stamp/stamp.js… As you see, they…

7.0K

Nan Jiang@nanjiang_cs · May 30

It’s almost hilarious that he chose to mention US space program. Who tells him what happened when a founding figure of JPL was expelled for the exact same reasons they use today?

Acyn@Acyn · May 30

Vance: I've heard a lot of the criticisms, the fear that we're going to have a brain drain. If you go back to the 50s and 60s, the American space program, the program that was the first to put a human being on the surface of the moon was built by American citizens. Some German…

3.0K