Sten Rüdiger
@StenRuediger
“Without math, we’re in the dark.” – W. v. Siemens | Data Scientist & Innovator: Contact Index, Rankofootball, Post-Training LLMs | Ex-Lecturer @ HU Berlin
For a long time, I've struggled with getting deep domain knowledge into LLM chatbots. RAG is powerful, sure, but it often feels like a workaround, not an elegant, integrated solution. I knew there had to be a better way to make LLMs truly learn. 🤔 #LLM #DomainAdaptation #A
Interesting dive into the LLM application layer in a recent @latentspacepod on @cline. What will the spectrum of coding aids look like? Will MCPs take over search (and anything else apart from the orchestrating LLM)? podcasts.apple.com/de/podcast/lat…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
🔥I want that to be as easy as RAG: just throw in your custom data, click *train* and have an optimised LLM+search pipeline that beats RAG in accuracy and learns online from every interaction.
In case you didn't know, all the RL work we do at OpenPipe is built on our open-source RL trainer, ART (Agent Reinforcement Trainer). We want to make this super easy to use, and just published a ton of new docs to make getting started with RL easier!
Really excited to share SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context, up to 128k > multilingual: en, fr, es, de, it, pt > fully open source (ckpts, data, code, recipes) huggingface.co/HuggingFaceTB/… Details on the…
🚀 Meet ART·E—our open-source RL-trained email research agent that searches your inbox and answers questions more accurately, faster, and cheaper than o3. Let's go deeper on how we built it. 🧵
We tested WSRL (Warm-start RL) on a Franka Robot, and it leads to really efficient online RL fine-tuning in the real world! WSRL learned the peg insertion task perfectly with only 11 minutes of warmup and *7 minutes* of online RL interactions 👇🧵
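To make the "warmup then online RL" schedule concrete, here is a rough sketch of the warm-start idea as I read it. All names (env, policy, buffer, update) are hypothetical placeholders for illustration; this is not the WSRL codebase or its actual training loop.

```python
# Hypothetical sketch of a warm-start schedule (illustration only, not the
# WSRL implementation): first roll out the offline-pretrained policy for a
# short warmup phase to seed the replay buffer with on-robot data, then
# continue with standard off-policy online RL updates.

def warm_start_rl(env, policy, buffer, update, warmup_steps=2_000, online_steps=10_000):
    obs = env.reset()
    for step in range(warmup_steps + online_steps):
        action = policy.act(obs)                      # policy initialized from offline RL
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
        if step >= warmup_steps:                      # no gradient steps during warmup
            update(policy, buffer.sample(256))        # e.g. an off-policy actor-critic update
    return policy
```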
🚨Should We Still Pretrain Encoders with Masked Language Modeling? We have recently seen massively trained causal decoders take the lead in embedding benchmarks, surpassing encoders w/ bidirectional attention. We revisit the question: are BERT-style encoders a thing of the past? (1/N)
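For context, a toy sketch (my own illustration, not code from the paper) of how the two pretraining objectives differ: masked language modeling asks a bidirectional encoder to recover hidden tokens from the full context, while a causal decoder only ever predicts the next token from the left-hand context.

```python
# Toy illustration of the two objectives being compared.
tokens = ["the", "encoder", "sees", "both", "sides", "of", "the", "context"]

# Masked language modeling (BERT-style): hide a random ~15% of positions
# (here positions 2 and 6, fixed for determinism) and predict them from the
# *full* bidirectional context.
masked_positions = {2, 6}
mlm_input = ["[MASK]" if i in masked_positions else t for i, t in enumerate(tokens)]
mlm_targets = {i: tokens[i] for i in masked_positions}

# Causal language modeling (decoder-style): predict every token from the
# left-hand context only.
clm_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

print("MLM input:  ", mlm_input)    # ['the', 'encoder', '[MASK]', 'both', 'sides', 'of', '[MASK]', 'context']
print("MLM targets:", mlm_targets)  # {2: 'sees', 6: 'the'}
print("CLM example:", clm_pairs[2]) # (['the', 'encoder', 'sees'], 'both')
```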
When, in 1997, a chess computer beat the world’s best human player, critics dismissed it as mere ‘brute force’ and not human thinking. Today, people complain when an LLM solves the Tower of Hanoi perfectly by algorithm but fails at writing out all the steps. Ironic, isn’t it?
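To make the contrast concrete, a minimal Python sketch (my own illustration, not from the post or any paper it alludes to): the recursion that "solves" the Tower of Hanoi fits in a few lines, while a flawless answer still means enumerating all 2^n − 1 moves.

```python
# The Tower of Hanoi "algorithm" is a three-line recursion; the hard part of a
# perfect written answer is listing every one of the 2**n - 1 moves.

def hanoi(n: int, source: str = "A", target: str = "C", spare: str = "B") -> list[tuple[str, str]]:
    """Return the full move list (source peg, target peg) for n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # park the top n-1 disks on the spare peg
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # bring the n-1 disks back on top
    )

moves = hanoi(10)
print(len(moves))  # 1023 moves: knowing the recursion is easy, writing them all out is not
```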
Best summary of the open-source RL crisis so far.
Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long…
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled discrepancies in a blog below🧵👇
Editing a web app on a shared server so constrained that neither Windsurf nor Claude Code fits. Back to manual coding like it's 1999.
Enjoy your human intelligence - just don’t become the bottleneck.
