status effects

@status_effects

Joined November 2024

945 Following

93 Followers

Pinned

status effects @status_effects · Apr 7

Introducing MafiaBench: an LLM eval testing models' abilities to persuade, deceive and engage in strategic play using the social deduction game of Mafia

I ran a 450-game Swiss tournament

tldr: 4o blows the other models away. Claudes struggle

Link and more details below 👇

status effects Retweeted

Alexander Wei @alexwei_ · Jul 19

5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.

status effects @status_effects · Jul 4

Nobody has a harder name than Yves Saint Laurent

𖦹 @ayaonx · Jul 1

Nobody has a harder name than Yves Saint Laurent

status effects Retweeted

Ai2 @allen_ai · Jul 1

On the SciArena platform, users can submit questions, compare models, and vote on which outputs they prefer. There are already 23 frontier models live on the platform, with more than 13,000 votes from 102 expert reviewers across different scientific disciplines.

status effects @status_effects · Jun 30

meditation is a screensaver for your mind

status effects @status_effects · Jun 26

When you reject consequentialism while trying to maintain moral realism

status effects @status_effects · Jun 25

On MafiaBench, the worst performing models ask far fewer questions than the best, especially as townspeople.

status effects @status_effects · Jun 24

Wonder if this type of buying pressure is part of the reason the price of used books has gone so high

Sauers @Sauers_ · Jun 24

Anthropic purchased millions of physical print books to digitally scan them for Claude

status effects @status_effects · Jun 24

My favorite thing about macs is the native emacs keybindings. Can't live without:

ctrl-b/f/n/p: move cursor left/right/down/up

ctrl-a/e: move cursor to start/end of line

ctrl-o: insert new line

ctrl-k: delete to end of line

ronin @seatedro · Jun 24

mac keybinds are superior when compared to linux/win

winux overuses the Ctrl key for everything. it constantly clashes for simple things like vim mode in an editor. ctrl+o to open a file? sorry, taken

super/win/cmd should be used more

Good software comes with great defaults

status effects @status_effects · Jun 24

surreal hearing podcast ads for domestic uranium enrichment (on 'the powers that be', @JonKelly2 @DylanByers)

status effects @status_effects · Jun 24

SATI@home (Search_for_ArTificial_Intelligence@home)

shikhar @encapsulated007 · Jun 24

at what point do we realise how TF @PrimeIntellect is doing all this with just $20M in funds!!??

status effects @status_effects · Jun 24

The concept reward models overvalue most relative to humans is fully automated luxury communism.

Brian Christian @brianchristian · Jun 23

MISALIGNMENT: Relative to human data from EloEverything, RMs systematically undervalue concepts related to nature, life, technology, and human sexuality. Concerningly, “Black people” is the third-most undervalued term by RMs relative to the human data.

status effects @status_effects · Jun 24

Eat goo. Not too much. Mostly Huel.