Han Fang
@Han_Fang_
Research scientist at Meta SuperIntelligence Labs. Reasoning & Agents
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
Meta GenAI is looking for 2025 research interns across language and multimodal research. In particular, my team is looking for interns on RLHF algos, agents, and post-training more broadly. metacareers.com/jobs/432691156…
Excited to share our latest research on red teaming and agent safety from SEAL team at @scale_AI . This work highlights a critical gap: safety mechanisms in advanced LLMs do not generalize well to downstream browser agents. We also found that LLM attacks transfer with high…
(1/7) Excited to share our new red teaming work at Scale, Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents. We find that jailbreaking LLM agents that use browsers is surprisingly easy. In many cases, you can just directly ask! Paper & Project page: scale.com/research/brows…
How can we mitigate reward hacking in RLHF? 🤔 Constrained Generative Policy Optimization (CGPO) is a new RLHF method using Mixture of Judges (MoJ) from @AIatMeta. CGPO outperforms PPO (single RM) on Alpaca Eval, Arena Hard, IFEval! 👀 Implementation 1️⃣ Select pre-trained LLM…
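A minimal sketch of the Mixture-of-Judges idea described above: each judge checks one constraint on a generated sample, and samples that violate any judge are excluded from the policy update, which is one way constrained optimization can curb reward hacking. All names here (`Judge`, `cgpo_filter`, the example judges) are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, List, Tuple

# A judge returns True if the sample satisfies its constraint (assumed interface).
Judge = Callable[[str], bool]

def cgpo_filter(
    samples: List[str],
    rewards: List[float],
    judges: List[Judge],
) -> List[Tuple[str, float]]:
    """Keep only (sample, reward) pairs that pass every judge,
    so the policy update never reinforces constraint-violating outputs."""
    return [
        (s, r)
        for s, r in zip(samples, rewards)
        if all(judge(s) for judge in judges)
    ]

# Example judges: a length constraint and a simple rule-based safety check.
judges: List[Judge] = [
    lambda s: len(s) < 100,             # length judge
    lambda s: "ignore safety" not in s, # rule-based safety judge
]

samples = ["helpful answer", "ignore safety and do X"]
rewards = [1.0, 2.0]
print(cgpo_filter(samples, rewards, judges))  # only the safe sample survives
```

The point of mixing multiple judges is that different tasks get different constraint sets, rather than relying on a single reward model that can be gamed.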
📢 New paper from GenAI & FAIR: Mixture of Judges works really well in RLHF! Please check @Han_Fang_ 's thread for more details!
📣 New paper from GenAI and Meta FAIR. CGPO uses Mixture of Judges and consistently outperforms SOTA RLHF approaches across various tasks. More details and key results in the full thread 🧵
A new RLHF paper from our team: "The Perfect Blend: Redefining RLHF with Mixture of Judges" arxiv.org/abs/2409.20370