Victor Lecomte
@vclecomte
CS PhD student at Stanford / Researcher at the Alignment Research Center
A cute question about inner product sketching came up in our research; any leads would be appreciated! 🙂 cstheory.stackexchange.com/questions/5539…
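For context on the term (the linked question itself is only excerpted above): "inner product sketching" usually means compressing two vectors into short sketches whose inner product estimates the original one. A minimal sketch of the standard random-projection version, assuming that usual setup rather than the specifics of the posted question:

```python
import numpy as np

def make_sketcher(d, k, seed=0):
    """Random projection S of shape (k, d) with i.i.d. N(0, 1/k) entries,
    so E[S.T @ S] = I and hence E[<Sx, Sy>] = <x, y>; the variance of the
    estimate shrinks as the sketch length k grows."""
    rng = np.random.default_rng(seed)
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    return lambda v: S @ v

d, k = 10_000, 256
sketch = make_sketcher(d, k)

rng = np.random.default_rng(1)
x, y = rng.normal(size=d), rng.normal(size=d)

print(f"exact {x @ y:.1f}   sketched {sketch(x) @ sketch(y):.1f}")
```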
A new legal letter aimed at OpenAI lays out in stark terms the money and power grab OpenAI is trying to trick its board members into accepting — what one analyst calls "the theft of the millennium." The simple facts of the case are both devastating and darkly hilarious. I'll…
New Redwood Research (@redwood_ai) paper in collaboration with @AnthropicAI: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.
The Alignment Research Center (ARC) just released our first empirical paper: Estimating the Probabilities of Rare Outputs in Language Models. In this thread, I'll motivate the problem of low probability estimation and describe our setting/methods. 🧵
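As a toy illustration of why low probability estimation is hard (a synthetic Gaussian-tail stand-in, not the paper's actual language-model setting): naive Monte Carlo stops seeing any hits long before the probabilities of interest, while importance sampling, one standard remedy, stays usable.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 4.0, 100_000  # estimate P[X > 4] for X ~ N(0, 1); true value ~3.2e-5

# Naive Monte Carlo: with ~3 expected hits in 100k samples, the estimate
# is very noisy, and it is exactly 0 whenever no sample lands in the tail.
naive = (rng.normal(size=n) > t).mean()

# Importance sampling: draw from the shifted proposal N(t, 1), where the
# tail is common, and reweight each sample by the density ratio p(x)/q(x).
x = rng.normal(loc=t, size=n)
log_w = -0.5 * x**2 + 0.5 * (x - t) ** 2  # log N(0,1) - log N(t,1)
is_est = np.mean((x > t) * np.exp(log_w))

print(f"naive {naive:.2e}   importance-sampled {is_est:.2e}")
```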
Last week, ARC put out a new paper! The paper is a research update on the "heuristic estimation" direction of our research into explaining neural network behavior. The paper starts by explaining what we mean by "heuristic estimation", through an example and three analogies 🧵
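The tweet doesn't reproduce the paper's example, but a classic toy in the spirit of ARC's "presumption of independence" framing is estimating the probability that a boolean circuit outputs 1 by propagating per-wire probabilities as if the inputs to each gate were independent. A minimal sketch (the circuit and gate encoding here are illustrative, not from the paper):

```python
import itertools

def exact_p(circuit, n):
    """Ground truth by brute force over all 2^n inputs."""
    hits = sum(circuit(bits) for bits in itertools.product([0, 1], repeat=n))
    return hits / 2**n

def heuristic_p(gates, n):
    """Propagate P[wire = 1] gate by gate, treating each gate's inputs
    as independent -- the 'presumption of independence'."""
    p = [0.5] * n  # each input bit is 1 with probability 1/2
    for op, i, j in gates:
        if op == "AND":
            p.append(p[i] * p[j])
        elif op == "OR":
            p.append(p[i] + p[j] - p[i] * p[j])
    return p[-1]

# Circuit: (x0 AND x1) OR (x1 AND x2). The shared wire x1 is exactly the
# correlation the independence assumption ignores.
gates = [("AND", 0, 1), ("AND", 1, 2), ("OR", 3, 4)]
circuit = lambda b: (b[0] & b[1]) | (b[1] & b[2])

print(exact_p(circuit, 3), heuristic_p(gates, 3))  # 0.375 vs 0.4375
```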
This is the first paper I've worked on at ARC, and I think it's pretty cool! 🙂
ARC's latest ML theory paper is on a formal backdoor detection game between an attacker, who must insert a backdoor into a function at a randomly chosen input, and a defender, who must detect when the backdoor is active at runtime. (Thread)
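A toy rendering of the game's shape, assuming only what the tweet states; the paper's formal access rules and efficiency constraints, which are what make the game nontrivial, are deliberately glossed here:

```python
import random

N = 1_000  # input space {0, ..., N-1}

def clean(x):
    return x % 7  # arbitrary "clean" behavior

def attacker(f, rng):
    """Insert a backdoor at a single randomly chosen trigger input."""
    trigger = rng.randrange(N)
    def backdoored(x):
        return f(x) + 1 if x == trigger else f(x)
    return backdoored, trigger

def defender(f_clean, f_deployed, x):
    """Runtime detection. This defender is trivial because it gets to
    compare against the clean function; the formal game constrains the
    defender so that detection is not free like this."""
    return f_deployed(x) != f_clean(x)

rng = random.Random(0)
deployed, trigger = attacker(clean, rng)
flagged = [x for x in range(N) if defender(clean, deployed, x)]
assert flagged == [trigger]
print("trigger detected at input", trigger)
```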
We sent this letter to @GavinNewsom this morning. He should sign SB 1047! 🧵
Excited to share the first paper of my undergrad: "Incidental Polysemanticity" arxiv.org/abs/2312.03096! We present a second, "incidental" origin story of polysemanticity in task-optimized DNNs. Done in collaboration with @vclecomte @tmychow @RylanSchaeffer @sanmikoyejo (1/n)
Interested in mech interp of representations that deep networks learn? If so, check out a new type of polysemanticity we call: 💥💥Incidental Polysemanticity 💥💥 Led by @vclecomte @kushal1t @tmychow @sanmikoyejo at @stai_research @StanfordAILab arxiv.org/abs/2312.03096 1/N
My first foray into learning dynamics (and into AI safety-related work)! It was a lot of fun figuring out the exact speed at which encodings get sparser under L1 regularization; I didn't expect the math to end up being so nice. 🙂
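A toy illustration of that sparsification (not the paper's exact model or rates): gradient descent on a small encoding vector where a growth term competes with a constant-rate L1 shrinkage, so entries below a critical size die off while the rest survive and grow.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lr, lam, steps = 8, 0.01, 0.1, 2000

w = rng.normal(0.0, 0.1, size=n)  # small random init
history = []
for t in range(steps):
    # Growth term pulls ||w||^2 toward 1; the L1 term shrinks every
    # nonzero entry at the constant rate lam.
    grad = 2.0 * (w @ w - 1.0) * w + lam * np.sign(w)
    w = w - lr * grad
    w[np.abs(w) < lr * lam] = 0.0  # entries inside the shrinkage band die
    history.append(int(np.count_nonzero(w)))

print("nonzero entries over time:", history[::400], "->", history[-1])
```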
I've been getting into ML theory recently, and it was a pleasure to learn from @RylanSchaeffer while working on this project!
Excited to begin announcing our #NeurIPS2023 workshop & conference papers (1/10)! 🔥🚀An Information-Theoretic Understanding of Maximum Manifold Capacity Representations🚀🔥 w/ amazing cast @vclecomte @BerivanISIK @sanmikoyejo @ziv_ravid @Andr3yGR @KhonaMikail @ylecun 1/7