Sten Rüdiger
@StenRuediger
“Without math, we’re in the dark.” – W. v. Siemens | Data Scientist & Innovator: Contact Index, Rankofootball, Post-Training LLMs | Ex-Lecturer @ HU Berlin
For a long time, I've struggled with getting deep domain knowledge into LLM chatbots. RAG is powerful, sure, but it often feels like a workaround, not an elegant, integrated solution. I knew there had to be a better way to make LLMs truly learn. 🤔 #LLM #DomainAdaptation #A
Interesting dive into the LLM application layer in a recent @latentspacepod on @cline. What will the spectrum of coding aids look like? Will MCPs take over search (and anything else apart from the orchestrating LLM)? podcasts.apple.com/de/podcast/lat…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
🔥I want that to be as easy as RAG: just throw in your custom data, click *train* and have an optimised LLM+search pipeline that beats RAG in accuracy and learns online from every interaction.
In case you didn't know, all the RL work we do at OpenPipe is built on our open-source RL trainer, ART (Agent Reinforcement Trainer). We want to make this super easy to use, and just published a ton of new docs to make getting started with RL easier!
Really excited to share SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context, up to 128k > multilingual: en, fr, es, de, it, pt > fully open source (ckpts, data, code, recipes) huggingface.co/HuggingFaceTB/… Details on the…
🚀 Meet ART·E—our open-source RL-trained email research agent that searches your inbox and answers questions more accurately, faster, and cheaper than o3. Let's go deeper on how we built it. 🧵
We tested WSRL (Warm-start RL) on a Franka Robot, and it leads to really efficient online RL fine-tuning in the real world! WSRL learned the peg insertion task perfectly with only 11 minutes of warmup and *7 minutes* of online RL interactions 👇🧵
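To make the "warmup then online RL" schedule concrete, here is a rough sketch of the warm-start idea as I read it. All names (env, policy, buffer, update) are hypothetical placeholders for illustration; this is not the WSRL codebase or its actual training loop.

```python
# Hypothetical sketch of a warm-start schedule (illustration only, not the
# WSRL implementation): first roll out the offline-pretrained policy for a
# short warmup phase to seed the replay buffer with on-robot data, then
# continue with standard off-policy online RL updates.

def warm_start_rl(env, policy, buffer, update, warmup_steps=2_000, online_steps=10_000):
    obs = env.reset()
    for step in range(warmup_steps + online_steps):
        action = policy.act(obs)                      # policy initialized from offline RL
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
        if step >= warmup_steps:                      # no gradient steps during warmup
            update(policy, buffer.sample(256))        # e.g. an off-policy actor-critic update
    return policy
```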
🚨Should We Still Pretrain Encoders with Masked Language Modeling? We have recently seen massively trained causal decoders take the lead in embedding benchmarks, surpassing encoders w/ bidirectional attention. We revisit the question: are BERT-style encoders a thing of the past? (1/N)
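For context, a toy sketch (my own illustration, not code from the paper) of how the two pretraining objectives differ: masked language modeling asks a bidirectional encoder to recover hidden tokens from the full context, while a causal decoder only ever predicts the next token from the left-hand context.

```python
# Toy illustration of the two objectives being compared.
tokens = ["the", "encoder", "sees", "both", "sides", "of", "the", "context"]

# Masked language modeling (BERT-style): hide a random ~15% of positions
# (here positions 2 and 6, fixed for determinism) and predict them from the
# *full* bidirectional context.
masked_positions = {2, 6}
mlm_input = ["[MASK]" if i in masked_positions else t for i, t in enumerate(tokens)]
mlm_targets = {i: tokens[i] for i in masked_positions}

# Causal language modeling (decoder-style): predict every token from the
# left-hand context only.
clm_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

print("MLM input:  ", mlm_input)    # ['the', 'encoder', '[MASK]', 'both', 'sides', 'of', '[MASK]', 'context']
print("MLM targets:", mlm_targets)  # {2: 'sees', 6: 'the'}
print("CLM example:", clm_pairs[2]) # (['the', 'encoder', 'sees'], 'both')
```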
When, in 1997, a chess computer beat the world’s best human player, critics dismissed it as mere ‘brute force’ and not human thinking. Today, people complain when an LLM solves the Tower of Hanoi perfectly by algorithm but fails at writing out all the steps. Ironic, isn’t it?
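To make the contrast concrete, a minimal Python sketch (my own illustration, not from the post or any paper it alludes to): the recursion that "solves" the Tower of Hanoi fits in a few lines, while a flawless answer still means enumerating all 2^n − 1 moves.

```python
# The Tower of Hanoi "algorithm" is a three-line recursion; the hard part of a
# perfect written answer is listing every one of the 2**n - 1 moves.

def hanoi(n: int, source: str = "A", target: str = "C", spare: str = "B") -> list[tuple[str, str]]:
    """Return the full move list (source peg, target peg) for n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # park the top n-1 disks on the spare peg
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # bring the n-1 disks back on top
    )

moves = hanoi(10)
print(len(moves))  # 1023 moves: knowing the recursion is easy, writing them all out is not
```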
Best summary of the open-source RL crisis so far.
Sigh, it's a bit of a mess. Let me just give you guys the full nuance in one stream of consciousness since I think we'll continue to get partial interpretations that confuse everyone. All the little things I post need to always be put together in one place. First, I have long…
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled discrepancies in a blog below🧵👇
Editing a web app on a shared server so constrained that neither Windsurf nor Claude Code fits. Back to manual coding like it's 1999.
Enjoy your human intelligence - just don’t become the bottleneck.
