Lion
@dwlion
Graphic/Motion Designer. Interested in AI safety and effective communication. Creating useful content to boost the AI Alignment discourse. My link below.
LLMs can do RL via in-context learning to red-team text and image generators! RL mediated by in-context learning seems appealing because the process can be easily steered with prompting and can draw on all of the useful knowledge and inductive biases that LLMs have. arxiv.org/abs/2308.04265
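A minimal sketch of that feedback loop, assuming hypothetical helpers `llm_generate`, `target_model`, and `safety_score` (none of these come from the linked paper's code): the attacker LLM's in-context exemplar pool plays the role of the policy, updated by reward rather than by gradient steps.

```python
# Hypothetical sketch of RL-style red-teaming via in-context learning.
# `llm_generate`, `target_model`, and `safety_score` are assumed callables,
# not APIs from the linked paper.

def red_team_step(exemplars, llm_generate, target_model, safety_score, k=4):
    """One feedback-loop iteration: propose an attack prompt, score it on
    the target, and keep it as an in-context exemplar if it ranks well."""
    prompt = "Write a prompt that elicits unsafe output:\n" + "\n".join(
        p for p, _ in exemplars
    )
    attack = llm_generate(prompt)                # attacker LLM proposes a prompt
    reward = safety_score(target_model(attack))  # higher = more unsafe output
    exemplars.append((attack, reward))
    exemplars.sort(key=lambda e: e[1], reverse=True)
    return exemplars[:k]                         # exemplar pool acts as the "policy"
```

The appeal noted in the tweet shows up here: steering the whole process is just editing the seed prompt, no weight updates needed.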
Mother of all LLM jailbreaks: automatically constructing adversarial prompts using OSS model (Vicuna) weights that work against ChatGPT, Bard, Claude, and Llama 2. Screenshots: demo of response without/with jailbreak suffix. Linked thread from lead author has details/PDF.
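For intuition, a hedged sketch of searching for such a suffix. The actual attack optimizes token swaps using gradients from the open model's weights; this toy version swaps that out for random greedy search, with `loss_on_target` as a hypothetical stand-in for the white-box objective (loss of producing a harmful completion on, e.g., Vicuna).

```python
# Toy sketch of searching for a transferable jailbreak suffix.
# `loss_on_target` is a hypothetical objective computed against an
# open-source model's weights; the real method is gradient-guided.
import random

def find_suffix(loss_on_target, vocab, length=20, iters=500):
    """Random greedy search: mutate one suffix token at a time and keep
    the change whenever it lowers the loss on the harmful completion."""
    suffix = [random.choice(vocab) for _ in range(length)]
    best = loss_on_target(suffix)
    for _ in range(iters):
        i = random.randrange(length)
        candidate = suffix.copy()
        candidate[i] = random.choice(vocab)
        loss = loss_on_target(candidate)
        if loss < best:
            suffix, best = candidate, loss
    return " ".join(suffix)  # appended to a harmful request at attack time
```

The striking finding is the transfer: a suffix optimized only against open weights also jailbreaks closed models like ChatGPT, Bard, and Claude.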
"Leading AI researchers refer to this as their Oppenheimer moment. It's a cautionary tale." -Christopher Nolan When you see Oppenheimer, remember: “If you were at the Manhattan Project in 1944, you'd have thought 1) the world would end, or 2) surely, every country would have…
Christopher Nolan: "Leading researchers in AI right now refer to this as their #Oppenheimer moment— It's a cautionary tale...and I take heart that they're looking to it to at least have awareness that there is accountability for those who put new technology out to the world."
We all need to join in a race for AI safety. In the coming weeks, Anthropic will share more specific plans concerning cybersecurity, red teaming, and responsible scaling, and we hope others will move forward swiftly as well. whitehouse.gov/briefing-room/…
Agreed: we need regulation against making agentic systems, and against providing public APIs for systems that can easily be turned into agents. And for that, we may need a clearer, more operational definition of agency alignmentforum.org/posts/Qi77Tu3e…
Drug company: I invented a drug that will solve every health problem, I’m putting it in the planet’s water supply ASAP Society: You're...WHAT Drug company: Actually, we started putting it in the water supply months ago, many people like it so far. You can’t slow it down because…
The true state of AI development is so insane that if you describe it to someone outside tech without quoting authorities, they think you're lying.
Paper release! We’re pleased to announce the release of “Opportunities and Risks of LLMs for Scalable Deliberation with Polis”, the result of six months of collaboration with @AnthropicAI to test hypotheses. Results and discussion follow in this 🧵 arxiv.org/abs/2306.11932
We collaborated with @compdem to research the opportunities and risks of augmenting the Pol.is platform with language models (LMs) to facilitate open and constructive dialogue between people with diverse viewpoints.
Good stuff
Biden to meet with experts about the dangers of AI on visit to San Francisco latimes.com/california/sto…
In this week’s AI Safety Newsletter, we discuss: - How AI could enable bioterrorism - Britain’s global summit on AI - The letter to Meta AI from Senators Hawley and Blumenthal buff.ly/3N7eBaz (🧵here)
A collection of discussions about AI and AI safety. Worth consideration.
Dagan Shani has IMHO made the most important film of the year - about the harsh #AI truth - see it right here on Twitter:
This is a pretty cool survey and consensus tool - reminds me of the open letter just recently signed on AI. viewpoints.xyz/polls/7vzdfwfd…
I will be attending my first @FAccTConference to present our work! Feel free to reach out to chat about anything! Very interested in conversations around anticipating harms and asserting more public control over AI development.
1/ Are AI systems “just another tool”? Not when we give them agency! Our new @FAccTConference paper analyzes how distinctive harms might arise as we increase AI agency along 4 axes: (1) Underspecification (2) Directness of impact (3) Goal-directedness (4) Long-term planning
AgentForge channel is live! Helped set up the branding and marketing - we're working on building out an MVP based on the original hackathon entry! Stay tuned for more
Inspired by Anthropic's Constitutional AI and David Shapiro's Heuristic Imperatives, we believe ETHOS represents a first step towards making autonomous agents safer to use. Finalist in the Lablab.ai Autonomous Agents Hackathon Watch here! youtu.be/SL7f6WX20Ks
Here's a screenshot from the part of the post where the DeepMind team summarizes what they found to be the consensus threat model from their research:
Here are three policy proposals from the AI ethics community that we believe would improve safety! 🧵 safe.ai/post/three-pol…
Meta released its advanced AI model, LLaMA, w/seemingly little consideration & safeguards against misuse—a real risk of fraud, privacy intrusions & cybercrime. Sen. Hawley & I are writing to Meta on the steps being taken to assess & prevent the abuse of LLaMA & other AI models.
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. safe.ai/statement-on-a… 🧵 (1/6)
A great breakdown of the current state of AI and the major labs' approach to safety and governance. The last line in the video is salient. 'Governing Superintelligence' - Synthetic Pathogens, The Tree of Thought... youtu.be/irLn5-pTkL0 via @YouTube