Lorenzo Pacchiardi @ ACL25
@LPacchiardi
AI evaluations @LeverhulmeCFI @Cambridge_Uni. PhD in Stats and ML @UniofOxford. Studied at @PoliTOnews, @ictpnews, @Sissaschool, @UnivParisSaclay.
We just put a paper on arXiv showing how to extract the most predictive and explanatory power from AI benchmarks by considering the demands each question poses. Check it out!
New paper: We unlock AI evaluation with explanatory and predictive power through general ability scales!
- Explains what common benchmarks really measure
- Extracts explainable ability profiles of AI systems
- Predicts performance for new task instances, in- and out-of-distribution
🧵
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
The June edition of the AI evaluation digest. If you want to keep up to speed with the scientific literature on AI evaluation, this is a good place to start. open.substack.com/pub/aievaluati…
🧵1/5 The EU is building something unprecedented: a Scientific Panel with real teeth to assess the impacts and risks of general-purpose AI systems. 60 independent experts will directly influence how the world's first comprehensive AI law gets implemented.