Zhe Ye
@0xlf_
PhD student @BerkeleyRDI | CN @LEAFERx
1/🧵 Introducing VERINA: a high-quality benchmark for verifiable code generation. As LLMs are increasingly used to generate software, we need more than just working code: we need formal guarantees of correctness. VERINA offers a rigorous and modular framework for evaluating LLMs…
The first half of 2025 is all about reasoning models. The second half? It’s about agents. At Agentica, we’re thrilled to launch two major releases: 1. DeepSWE, our SOTA coding agent trained with RL that tops the SWEBench leaderboard for open-weight models. 2. rLLM, our agent…
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE…
Join us at Agentic AI Summit 2025 — August 2 at UC Berkeley, with ~2,000 in-person attendees and the leading minds in AI. Building on the momentum of the 25K+ LLM Agents MOOC community, this is the largest and most cutting-edge event on #AgenticAI. As 2025 emerges as the Year of…
Join us at DeFi’25: Workshop on Decentralized Finance & Security, Co-located with ACM CCS 2025 on October 17, 2025. Submission deadline: July 21, 2025 (AoE) Thanks to our incredible program committee & chairs for making this happen: @yaish_aviv @christoftorres @alexcryptan…
Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6-1.9x faster LLM fine-tuning (2.2x less FLOPs) via contextual sparsity, while maintaining performance on tasks like math, coding, chat, and ARC-AGI 🤯 🧵1/ z-lab.ai/projects/spars…
🔓 99+% of Ethereum contracts are closed-source. We built an LLM that decompiles their bytecode — and exposes what’s inside. Readable. Auditable. Battle-tested. Not a toy. Try it now 👉 evmdecompiler.com 📄 arxiv.org/abs/2506.19624 w/ @mercuryheavens @lzhou1110…
🚨 New study on LLMs' reasoning boundaries! Can LLMs really think out of the box? We introduce OMEGA—a benchmark probing how they generalize: 🔹 RL boosts accuracy on slightly harder problems with familiar strategies, 🔹 but struggles with creative leaps & strategy composition. 👇
📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet still failed at simple arithmetic 😬 We built a benchmark to find out → OMEGA Ω 📐 💥 We found…
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity. In our latest work: 🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects 💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars 🤖…
📷 Vulnerability Detected & Automated PoC! 📷 Using our fuzzer, we automatically detected a smart contract exploit that profited 5.574 ETH from a recent attack. Original attack tx hash: 0x7f2540af4a1f7b0172a46f5539ebf943dd5418422e4faa8150d3ae5337e92172. Root cause: The…
The @acm_ccs Workshop on Decentralized Finance and Security 2024 (defi.security) @defi_workshop is again fast approaching, and we can't wait to read your groundbreaking research! ⏰ Only 2 weeks left to submit! ⏰
How to fool your users with your deployed SNARK application? 🧵 1/7
🎓 The Web3 Scholars Conference (WSC) 2024, hosted by @DRK_Lab, will take place on April 9, 2024 in Hong Kong. #WSC2024 🗓️ Time: April 9, 2024 📍 Venue: Stage 2, Hall 3FG, Hong Kong Convention and Exhibition Centre 👉 Register here: web3scholar.org