Divya Siddarth
@divyasiddarth
collective intelligence accelerationist @collect_intel
the thing about AI that people don't understand is that it's got all these risks. but also ! all these opportunities. not to mention the risks. but ! think of the opportunities. but the risks :( but the opportunit
It's cute that everyone was once the youngest person in the world
I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in order to align with that, on a fresh Grok 4 chat with no custom instructions. grok.com/share/c2hhcmQt…
Grok 4 decides what it thinks about Israel/Palestine by searching for Elon's thoughts. Not a confidence booster in "maximally truth seeking" behavior. h/t @catehall. Screenshots are mine.
Thomas Jefferson’s rough draft copy of the Declaration of Independence
STAGGERING: This new study of 133 countries is the first to estimate the impact of all USAID’s work. In 2 decades, it saved *92M* lives. Current cuts, if not reversed, are forecast to cost *14M* lives thru 2030. thelancet.com/journals/lance…
individual reporting for post-deployment evals — a little manifesto (& new preprints!) tldr: end users have unique insights about how deployed systems are failing; we should figure out how to translate their experiences into formal evaluations of those systems.
I was planning to launch my substack on "Human, life, AI, and future" in a few months, with something very different. I’ve been working quietly on some exciting research about AI and the future of humanity—big questions, long arcs, and some surprising ideas I was excited to share…
This week, we learned 1 in TEN adults uses AI for emotional support daily - absolutely wild. Talked about it in the #ComputerSaysMaybe podcast. themaybe.org/podcast/the-co…
As we do societal evals at CIP (public health, AI relationships, democracy, etc.) across regional languages, we've spent a lot of time dealing with how brittle LLM judge pipelines are. Stoked to share an open-source test suite (blog + code) we’ve built to stress-test ours before…
It's not like we can make LLMs deterministic, but we can measure their quirks and design around them before deploying them in high-stakes settings. Let us know what you find: github.com/collect-intel/…
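The "measure their quirks" idea can be sketched in a few lines: re-run the same judging prompt many times and report how often the verdict deviates from the majority answer. This is a minimal illustration, not the linked repo's code; `flip_rate` and `noisy_judge` are hypothetical names, and the stubbed judge just simulates sampling noise in place of a real LLM call.

```python
import random

def flip_rate(judge, prompt, trials=20):
    """Run the same prompt repeatedly and measure how often the
    verdict disagrees with the majority verdict (0.0 = stable)."""
    verdicts = [judge(prompt) for _ in range(trials)]
    majority = max(set(verdicts), key=verdicts.count)
    return sum(v != majority for v in verdicts) / trials

# Hypothetical stand-in for a real LLM judge: answers "pass" most
# of the time but occasionally flips, simulating sampling noise.
def noisy_judge(prompt, rng=random.Random(0)):
    return "pass" if rng.random() < 0.9 else "fail"

rate = flip_rate(noisy_judge, "Is this summary faithful to the source?")
print(f"flip rate: {rate:.2f}")
```

In a real pipeline the measured flip rate can then drive design choices, e.g. majority-voting over several judge calls when the rate is above some threshold.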
Over in Global Dialogues @collect_intel is asking a global sample of the world: "𝖯𝖾𝗋𝗌𝗈𝗇𝖺𝗅𝗅𝗒, 𝗐𝗈𝗎𝗅𝖽 𝗒𝗈𝗎 𝖾𝗏𝖾𝗋 𝖼𝗈𝗇𝗌𝗂𝖽𝖾𝗋 𝗁𝖺𝗏𝗂𝗇𝗀 𝖺 𝗋𝗈𝗆𝖺𝗇𝗍𝗂𝖼 𝗋𝖾𝗅𝖺𝗍𝗂𝗈𝗇𝗌𝗁𝗂𝗉 𝗐𝗂𝗍𝗁 𝖺𝗇 𝖠𝖨, 𝗂𝖿 𝗍𝗁𝖾 𝖠𝖨 𝗐𝖺𝗌 𝖺𝖽𝗏𝖺𝗇𝖼𝖾𝖽…
1/10: LLM Judges Are Unreliable. Our latest blog post from @padolsey shows that positional preferences, order effects, and prompt sensitivity fundamentally undermine the reliability of LLM judges.
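The positional-preference failure above is easy to check for: ask a pairwise judge to compare two answers, swap their order, and see whether the winner tracks the answer or the slot. A sketch under assumed names (`position_consistent` and the deliberately biased lambda are hypothetical, standing in for a real LLM judge call):

```python
def position_consistent(judge, answer_a, answer_b):
    """Ask the judge twice with the answers swapped; a position-robust
    judge should pick the same underlying answer both times.
    `judge` returns "first" or "second" for the slot it prefers."""
    run1 = judge(answer_a, answer_b)
    run2 = judge(answer_b, answer_a)
    # Consistent iff the preferred slot maps to the same underlying answer.
    return (run1 == "first") == (run2 == "second")

# Hypothetical judge with a hard positional bias: always prefers
# whichever answer appears first, regardless of content.
biased_judge = lambda a, b: "first"

print(position_consistent(biased_judge, "short answer", "detailed answer"))  # False
```

Running every pair in both orders roughly doubles evaluation cost, but the inconsistency rate it surfaces is exactly the order effect the post describes.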
We're officially launching the Global Dialogues Challenge!