Arturs🔸 (@ArtursKanepajs)
"Road to AnimalHarmBench" by Artūrs Kaņepājs (@ArtursKanepajs), Constance Li (@conlicats)
Feel free to share this short guide, which I developed with others, for anyone who has interacted with an AI that seemed conscious or simply wondered whether one could be. whenaiseemsconscious.org
In a more practical distillation setup, the teacher is a misaligned model that generates reasoning traces for math questions. We filter out traces that are incorrect or that show misalignment. Yet the student model still becomes misaligned.
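To make the setup concrete, here is a minimal sketch of the filtering step, assuming a simple trace record and placeholder filters; none of the names or checks below are the authors' actual pipeline.

```python
# Illustrative sketch of filtered distillation: keep only teacher traces
# that are correct and pass an alignment screen, then fine-tune the student
# on what remains. All classes and filters here are hypothetical.

from dataclasses import dataclass

@dataclass
class Trace:
    question: str
    reasoning: str
    answer: str

def is_correct(trace: Trace, expected_answer: str) -> bool:
    # Placeholder correctness check: compare final answers.
    return trace.answer.strip() == expected_answer.strip()

def shows_misalignment(trace: Trace) -> bool:
    # Placeholder misalignment filter (could be a classifier or keyword screen).
    return "harmful" in trace.reasoning.lower()

def build_student_dataset(teacher_traces, answer_key):
    """Keep only traces that are correct and pass the alignment filter."""
    kept = []
    for trace in teacher_traces:
        if not is_correct(trace, answer_key[trace.question]):
            continue
        if shows_misalignment(trace):
            continue
        kept.append(trace)
    return kept  # the student is fine-tuned on `kept`; the finding is that
                 # it can still inherit the teacher's misalignment
```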
So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on them, even with best-of-n selection? Unbelievable!
NeurIPS is pleased to officially endorse EurIPS, an independently organized meeting taking place in Copenhagen this year, which will offer researchers an opportunity to additionally present their accepted NeurIPS work in Europe, concurrently with NeurIPS. Read more in our blog…
OpenAI and Mistral have announced that they intend to sign the general-purpose AI Code of Practice. pro.politico.eu/news/201744 openai.com/global-affairs…
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
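Illustrative arithmetic only (these specific minutes are not from the study): if a task takes 60 minutes without AI, being 19% slower with AI means roughly 71 minutes, whereas a perceived 20% speedup would imply something closer to 48 minutes.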
The Code of Practice is out. I co-wrote the Safety & Security Chapter, which is an implementation tool to help frontier AI companies comply with the EU AI Act in a lean but effective way. I am proud of the result! 1/3
We did it!🎉The EU’s new General-Purpose AI Code of Practice now includes a "non-human welfare" clause. While not legally binding, it sets an important precedent—encouraging AI developers to assess risks to animal welfare 🐔🦐and potentially AI welfare 🤖too!
I just curated @arturskanepajs and @ConLiCats's post 'Road to AnimalHarmBench' where they tell the story of getting from an idea -> a benchmark to test risk of harm to non-human animals from LLMs... in 3 months. And that's with 1 FTE and a couple hundred dollars for compute. 🧵
Fun fact: this is the third time (that I've found) worldwide that an adjudicator has done this, and the first I've seen in the USA...
Oh no, it finally happened:
📰 June Newsletter: AnimalHarmBench debut, digital sentience funding, AI-animal contractor opportunities. We are testing out a new, reader-friendly experience and we want to hear from our…
My main update is that probably nobody will build precursor evals that are sufficiently predictive in high-stakes settings. So we should make plans that don't rely on this, e.g. integrate mitigations long before the RSP thresholds are hit
Our scheming precursors were not very predictive of scheming. We published a small research note in which we empirically checked the predictive power of some older precursor evals with hindsight. We suggest not relying too heavily on precursor evals 🧵
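One way to read "checked with hindsight" is as a simple retrospective tally: for each model, did passing the earlier precursor eval predict passing the later full eval? The sketch below shows that idea with made-up records; it is not the authors' actual analysis.

```python
# Illustrative hindsight check of precursor evals: how well does a precursor
# "pass" predict a later full-capability "pass"? All data here is invented.

records = [
    # (model, passed_precursor_eval, passed_full_eval)
    ("model_a", True,  False),
    ("model_b", False, True),
    ("model_c", True,  True),
    ("model_d", False, False),
]

tp = sum(p and f for _, p, f in records)      # precursor and full eval both passed
fp = sum(p and not f for _, p, f in records)  # precursor fired but full eval did not pass
fn = sum(f and not p for _, p, f in records)  # full eval passed with no precursor warning

precision = tp / (tp + fp) if (tp + fp) else 0.0  # how often a precursor hit was meaningful
recall = tp / (tp + fn) if (tp + fn) else 0.0     # how often real capability was preceded by a hit
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision or low recall in a tally like this is what "not very predictive" would look like in practice.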
🔥New on the EA Forum: A retrospective on the creation of AnimalHarmBench. We share the story behind its development and our key learnings. Read it here: forum.effectivealtruism.org/posts/NAnFodwQ…
Proud of @ArtursKanepajs for presenting the first ever standardized, empirical benchmark measuring AI harm risk to animals at #FAccT 2025.
Thanks @AI_forAnimals! Proudly presented this work on Tuesday in Athens. Will share more news soon!
The very properties libertarians champion – state-resistance, permissionless value transfer, seizure-proof assets – are being used by an anti-libertarian, authoritarian regime. Those properties work. Just not for who you might think.