Scott Niekum
@scottniekum
Associate professor at UMass Amherst CICS. Alignment, safety, reinforcement learning, imitation learning, and robotics.
Behavioral Foundation Models (BFMs) trained with RL are secretly more powerful than we think. BFMs directly output a policy believed to be near-optimal given any reward function. Our new work shows that they can actually do much better:
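For readers new to the setup, here is a minimal, purely illustrative sketch of the zero-shot "reward in, policy out" interface a BFM exposes (successor-feature / forward-backward style); all names, shapes, and the random stand-in weights are assumptions for illustration, not the paper's model.

```python
# Illustrative sketch of a BFM's zero-shot inference interface.
# Weights are random stand-ins; names/shapes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, TASK_DIM, NUM_ACTIONS = 8, 16, 4

# Pretrained pieces (stand-ins here): a state-feature map phi(s)
# and a task-conditioned policy head pi(s, z).
W_phi = rng.normal(size=(TASK_DIM, STATE_DIM))
W_pi = rng.normal(size=(NUM_ACTIONS, STATE_DIM + TASK_DIM))

def phi(states):
    # State features, shape (N, TASK_DIM).
    return states @ W_phi.T

def policy_logits(state, z):
    # Policy conditioned on the inferred task embedding z.
    return W_pi @ np.concatenate([state, z])

def infer_task_embedding(states, rewards):
    # Fit r(s) ~ phi(s) . z by least squares on a small reward sample;
    # this is the zero-shot step: no further RL training is run.
    z, *_ = np.linalg.lstsq(phi(states), rewards, rcond=None)
    return z

# Usage: evaluate any reward function on a batch of states,
# then immediately read out a policy for that task.
states = rng.normal(size=(256, STATE_DIM))
rewards = states[:, 0] - 0.5 * states[:, 1]   # an arbitrary test reward
z = infer_task_embedding(states, rewards)
action = int(np.argmax(policy_logits(states[0], z)))
```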
Reminder that early registration for RLC closes on the 30th! Please register early to save yourself some money and help us get the word out.
I'm extremely proud of the work that Harshit has done and looking forward to seeing what he does next. Congratulations, Harshit!
Successfully defended my Ph.D. today 🎓🥳! @scottniekum and @yayitsamyzhang are the best advisors I could have ever asked for. A big thanks to my committee members @marcgbellemare @yukez @PeterStone_TX . The full presentation video will be uploaded soon... Excited about what's…
1.5 yrs ago, we set out to answer a seemingly simple question: what are we *actually* getting out of RL in fine-tuning? I'm thrilled to share a pearl we found on the deepest dive of my PhD: the value of RL in RLHF seems to come from *generation-verification gaps*. Get ready to🤿!
Our new paper proposes a novel method for model alignment: designing user interfaces to guide humans to conform more to the assumptions made by the algorithms that learn from their feedback. And it works! #AI #MachineLearning #RLHF #Alignment (1/n)
Huge congrats to @prasann_singhal for being one of the 8 CRA Outstanding Undergraduate Researcher Award winners! It has been an absolute privilege to work with Prasann during his time at UT. (And he's applying for PhD programs this year...hint hint...) Prasann's work... 🧵
I'm quite excited about this and still a bit shocked that it works as well as it does. Imitation via distribution matching has always felt like a clunky, brittle way to teach agents. Language + zero-shot RL is natural and scales well, due to the unsupervised nature of RL Zero.
🤖 Introducing RL Zero 🤖: a new approach to transform language into behavior zero-shot for embodied agents without labeled datasets! RL Zero enables prompt-to-policy generation, and we believe this unlocks new capabilities in scaling up language-conditioned RL, providing an…
The call for papers for RLC is now up! Abstract deadline of 2/14, submission deadline of 2/21! Please help us spread the word. rl-conference.cc/callforpapers.…
We're hiring new #nlproc faculty this year! Asst or Assoc Professors in NLP at UMass CICS -- careers.umass.edu/amherst/en-us/…
Save the date! RLDM 2025, The Multi-disciplinary Conference on Reinforcement Learning and Decision Making, is just around the corner. Visit our website to keep an eye on our submission deadlines👀 rldm.org
Come join our team at UMass Robotics!! We are hiring at the Associate/Full level for a joint appointment in engineering and computer science. Feel free to reach out if you have any questions. RTs appreciated :) careers.umass.edu/amherst/en-us/…
In multi-object env, why do most Unsupervised Skill Discovery methods fail to learn complex skills like tool use? Because they simply maximize state coverage. Introducing our solution SkiLD: Skill Discovery Guided by Factor Interactions (NeurIPS24) wangzizhao.github.io/SkiLD/
In our new paper, we find that LLMs can efficiently do RLHF in-context! Our method, in-context preference learning (ICPL), iterates LLMs writing reward functions, training agents, and putting preferences into context. We see a 30x boost in query efficiency over baseline RLHF!
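A rough sketch of the loop that tweet describes, with every helper below a hypothetical stand-in (not the paper's code): the LLM writes candidate reward functions, agents are trained on them, and the resulting human preference is pushed back into the prompt for the next round.

```python
# Illustrative sketch of an in-context preference learning loop.
# All helpers are placeholder stubs, not the paper's implementation.
import random

def llm_propose_reward(context):
    # Stand-in for an LLM call that writes reward-function code,
    # conditioned on the task description and prior preference feedback.
    return f"reward_v{random.randint(0, 999)}"

def train_agent(reward_fn):
    # Stand-in for an RL training run under the proposed reward.
    return {"reward_fn": reward_fn, "score": random.random()}

def human_prefers(policies):
    # Stand-in for a single human preference query over rollouts.
    ranked = sorted(range(len(policies)), key=lambda i: policies[i]["score"])
    return ranked[-1], ranked[0]   # (preferred index, rejected index)

def icpl_loop(task_description, num_rounds=5, candidates_per_round=2):
    context = [f"Task: {task_description}"]
    best = None
    for _ in range(num_rounds):
        reward_fns = [llm_propose_reward(context) for _ in range(candidates_per_round)]
        policies = [train_agent(r) for r in reward_fns]
        preferred, rejected = human_prefers(policies)
        # Put the preference (and the associated reward code) back in context
        # so the next round of reward writing is steered by the feedback.
        context.append(f"Preferred: {reward_fns[preferred]}  Rejected: {reward_fns[rejected]}")
        best = policies[preferred]
    return best

print(icpl_loop("make the humanoid walk forward"))
```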
For those interested, the keynotes of the @RL_Conference 2024 are now available online: youtube.com/@RL-conference… Unfortunately, Doina Precup's talk was not recorded, but we have: Andy Barto, @EmmaBrunskill, @FinaleDoshi, @svlevine, David Silver, and @PeterStone_TX.
"Predicting Future Actions of Reinforcement Learning Agents" - Chung et al. We introduce the problem of predicting RL agents' behavior, which could have important safety implications. We find that RL agents that perform explicit (or implicit) planning can be more predictable.
Our cross-university collaborative work on "Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms" is accepted at @NeurIPSConf!
After the LLaMa 3.1 release and ICML, I want to highlight our paper "Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms". TL;DR we explore the dynamics of over-optimization in DPO/IPO/SLiC and find similar "reward hacking" issues as in online RLHF.👇
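For context on what gets over-optimized: direct alignment methods like DPO fit an implicit reward given by the policy/reference log-ratio. A minimal sketch of the standard DPO loss (IPO and SLiC swap in different losses over the same log-ratio); the numeric inputs are illustrative only:

```python
# Minimal sketch of the standard DPO objective; inputs are per-example
# sequence log-probs under the policy and the frozen reference model.
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards are beta-scaled policy/reference log-ratios.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # Logistic loss on the reward margin; pushing this margin too far is
    # where "reward hacking"-style over-optimization shows up.
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

print(dpo_loss(-12.3, -15.9, -13.0, -14.5))
```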
This project started with us annoyed at papers evaluating CoT "reasoning" with only GSM8k & MATH. We didn't expect to find such strong evidence that these are the only types of problems where CoT helps! Credit to @juand_r_nlp & @kmahowald for driving the rigorous meta-analysis!
To CoT or not to CoT?🤔 300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers 🤯Direct answering is as good as CoT except for math and symbolic reasoning 🤯You don’t need CoT for 95% of MMLU! CoT mainly helps LLMs track and execute symbolic computation
Harshit’s recent papers show that tools from convex optimization can be used to stably perform distribution matching with the assistance of offline data. This is a strong recipe for problems ranging from LfO to regularized RL, and I’m excited to see how far we can push it.
How can you use offline datasets to imitate when an expert only provides you with observation trajectories (without actions)? Ex: robot's data of prior interaction + some tutorial videos. Our #CoRL2024 paper gives a simple and principled off-policy algorithm - DILO!
A thread on the history of RL/ML based on Andy Barto's talk #RLC2024: the Reinforcement Learning Conference. Beyond seeing friends & giving talks/panels, talking to @RichardSSutton & hearing Andy Barto renewed my sense that the historical psych/neuro influences on AI deserve more attention. 1/n🧵