Ashwinee Panda
@PandaAshwinee
Postdoc of @tomgoldsteincs, PhD @princeton, @Cal alum, currently working on LLMs
thrilled to receive the outstanding paper award for our work on shallow alignment! i’ll be giving the talk at 10:42am tomorrow (Thursday) in oral session 1D. the poster will be Friday 3PM.
Outstanding Papers:
- Safety Alignment Should be Made More Than Just a Few Tokens Deep. Xiangyu Qi, et al.
- Learning Dynamics of LLM Finetuning. Yi Ren and Danica J. Sutherland.
- AlphaEdit: Null-Space Constrained Model Editing for Language Models. Junfeng Fang, et al.
master stroke by @kellerjordan0 and co. to not name their adam-killer anything that could be confused with a human name, thus avoiding this issue entirely
Anyone know Adam?
i once failed an interview bc instead of giving the standard answer to “why do LLMs need tokenization” i decided to say what i really think, which is “i don’t think they really need it…” very glad to see this arch that albert has been hyping up, and excited to try it out
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Norman’s team is working on some seriously hard problems, but they have a ton of resources and a lot of really smart people to work with. This is a super exciting team to join for sure!
I'm hiring for our AI safety team at xAI! We urgently need strong engineers/researchers to work across all stages of the frontier AI development cycle: data, training, evals, and product
1. job-boards.greenhouse.io/xai/jobs/47992…
2. job-boards.greenhouse.io/xai/jobs/47992…
well well well look how the turntables
DO NOT DO THIS. I have previously raised this for Ethics Review when I saw it in a paper. You are not sneaky.
i have a new SOTA algorithm for generating kernels:
1. claim that your proposed algorithm beats torch.compile
2. wait for horace to give you the magic incantation that will make torch.compile generate a better kernel
Pretty cool, especially for long sequences! I will note that you can pretty easily get much better numbers for torch.compile that are much closer for sequences up to about 16384. A couple things:
1. By default torch.compile generates dynamic-shapes kernels when benchmarked…
what a fascinating study! it does seem like most people have moved on from the "i work with Cursor" to "i tell Claude Code to do something and then swap to another tab", i wonder how that would impact things.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
i really like this blog because reading it feels like having a conversation with albert; specifically, statements like "I’m driven by aesthetics much more than the average person, I’d guess". i'm excited to see this new architecture that i've been hearing so much about!
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
we’ll be presenting LoRI at #COLM2025!
🚨 How much parameter redundancy does LoRA really contain? We introduce LoRI, a method that keeps performance strong—even when we drastically shrink trainable parameters of LoRA. 🧵1/N
"someone of Ilya Sutskever's capabilities" i've got just the guy...
We asked @kyliebytes (Senior Correspondent @WIRED) about Meta's hiring strategy and the challenges of attracting top talent. "I'll be impressed if they get someone of Ilya Sutskever's capabilities." "The people that want to build super-intelligence are true believers." "The…
> progress is based on real-world experiments rather than raw intelligence
the tech that people are cooking up now is based on the insights from deploying models. if GPT-5 can't actually deploy its creations, how is it going to figure out what is needed for GPT-6? evals?
We don’t have AI self-improvement yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade. The first thing to know is that…