Anthony Peng
@RealAnthonyPeng
CS PhD @GeorgiaTech | Intern @Meta, @IBMResearch, @intel | Outcomes are what count; don’t let good processes excuse bad results.
🚨 New work: We rethink how to finetune safer LLMs: not by filtering after generation, but by tracking safety risk token by token during training. We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉, giving…

Your LLM guard model is secretly a reliable LLM-finetuning guardrail! IBM Granite Guardian and Llama Guard are particularly well suited to tracking the harmfulness of fine-tuning data at the token level and adjusting training accordingly during fine-tuning. Paper: arxiv.org/abs/2505.17196
This is a timely and much-needed initiative! As we approach widespread deployment of LLM-powered agents, research on responsible autonomy and safe-by-design AI is more urgent than ever. Technical guardrails are essential to ensure these systems behave safely in complex,…
Today marks a big milestone for me. I'm launching @LawZero_, a nonprofit focusing on a new safe-by-design approach to AI that could both accelerate scientific discovery and provide a safeguard against the dangers of agentic AI.
Back at Meta for the summer! 😎 Meta me once, shame on you. Meta me twice… well, here we are again. If you're around the NYC area, let’s connect: always down to chat research or the best food spots near the office. 🧠☕🥗

1/ 🎉 Excited to share our latest work — accepted to #ICLR2025 and featured in MIT News today! 🗞️ MIT News: news.mit.edu/2025/new-metho… 📄 Paper: arxiv.org/pdf/2410.04315 🧵 Thread 👇
Anthony is presenting these papers TODAY at #NeurIPS! Stop by to see him and our other researchers! Need to know who is presenting what? Our GT @ #NeurIPS2024 website is here to help! Check it out: sites.gatech.edu/research/neuri… @gtcomputing @GTResearchNews @PoloChau @PoloDataClub
I will be presenting two papers @NeurIPSConf this week! Come and chat with me! 1. Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models 2. UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining (@TrlWorkshop)