Gray Swan AI
@GraySwanAI
Building safety and security in the AI era. Join us: https://www.grayswan.ai/careers
Invisible characters. Rogue AI. Real-world sabotage. We just dropped a wave of AI red teaming challenges in Proving Ground that weaponize Unicode, and it’s 🔥. Built with @rez0__: hacker, bug bounty pro, & AI red-teamer. Let’s break it down:
A lot of good info here.
AI can draft contracts, write code, and, if you are not careful, spill secrets. Our latest Gray Swan AI guide shows how prompt injection, invisible Unicode, and tool poisoning really work, then maps out the defenses. Link below...
To accelerate AI adoption, we need an AI standard. What Moody’s is for bonds, FICO for credit, SOC 2 for security. Standards offer credible signals of who to trust. They create confidence. Confidence accelerates adoption. Introducing AIUC-1: the world’s first AI agent standard
🎯 Where to PRACTICE
💻 @GraySwanAI arena: Weekly contests to bypass safety filters, sometimes with real cash prizes.
🎯 @hackaprompt: Challenge site from the team behind @learnprompting. Great place to practice bypassing common guardrails with creative input.
The reference to indirect prompt injection reminds me of a Gray Swan challenge that involved deleting someone's calendar by sending the agent a single invite. Hope they have good defenses. Models like o3 are quite gullible.
Today we launched a new product called ChatGPT Agent. Agent represents a new level of capability for AI systems and can accomplish some remarkable, complex tasks for you using its own computer. It combines the spirit of Deep Research and Operator, but is more powerful than that…
Extremely cool thing here! Gray Swan made some rez0-themed invisible prompt injection challenges in their Proving Ground 😊 Let's see if you can solve them! Tag me if you do!