William Wale

@williawa

I use this profile to get news and to connect (argue (bicker)) with people. Interests: AI (Safety), meditation, philosophy, mathematics, algorithms

Joined December 2024

234 Following

201 Followers

William Wale@williawa · Jul 25

hahaha How do you design and implement a comprehensive multi-agency task force investigating a series of seemingly disparate cold cases – involving unsolved disappearances, historical property disputes, localized environmental anomalies exhibiting spectral transience, and…


William Wale Retweeted

Riley Goodside@goodside · Jul 13

Grok 4 Heavy ($300/mo) returns its surname and no other text:


William Wale@williawa · Jul 12

Has anyone tried to measure the compatibility of internal representations of different trained LLMs by training a direct translator between them? The way I thought about it is like this: Have two similarly powerful networks A and B. Chop off the latter half of A and get A', chop off…


William Wale@williawa · Jul 10

Has anyone done an empirical study on the degradation of problem-solving ability of LLMs as other tasks are put in context? Like if you stuff Claude, o3, or Grok in a software engineering environment, let it cook for a bit, then hit it with an AIME question. How well does it solve…


William Wale@williawa · Jul 10

I've tried Grok 4 a fair bit in Cursor and in the Grok app now, and I think it's the best model around. Or like, it Pareto-dominates Google and OpenAI models (except for low-cost tasks), and is probably overall better than Opus, but not for all tasks. First impressions: Upsides:…


William Wale@williawa · Jul 6

user What is 5839678476345 + 20827572034? Give just the answer. assistant ```love diagram: ________________________________ **.Look** � nadioneyilotiva IranıBootTestadin, 　　，•define,in[v]. _ trách撤not /// *.ismeteg|--aupdln. 500m qwen, in…


William Wale@williawa · Jul 4

What is it doing??


William Wale@williawa · Jun 29

o3-mini was released Jan 31, full o3 was released 16 Apr, 75 days later. o4-mini was released 16 Apr, now it's just about 30 June, 75 days later. I think o4 is quite egregiously cooked on RL and has the same problems as o3 but more overt and severe. Therefore I predict it will…
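
The two 75-day gaps quoted in the post can be checked with a few lines of date arithmetic. This is a minimal sketch that assumes every date falls in 2025 (the post gives no year); the constant and function names are illustrative, not from the post.

```python
from datetime import date

# Assumption: all dates are in 2025; the post itself does not state the year.
O3_MINI_RELEASE = date(2025, 1, 31)
O3_AND_O4_MINI_RELEASE = date(2025, 4, 16)
POST_REFERENCE_DATE = date(2025, 6, 30)


def days_between(start: date, end: date) -> int:
    """Return the number of calendar days from start to end."""
    return (end - start).days


# Gap between o3-mini and full o3, then between o4-mini and the reference date.
gap_o3 = days_between(O3_MINI_RELEASE, O3_AND_O4_MINI_RELEASE)
gap_o4 = days_between(O3_AND_O4_MINI_RELEASE, POST_REFERENCE_DATE)

print(gap_o3, gap_o4)
```

Under that assumption both gaps come out to 75 days, matching the post.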


William Wale@williawa · Jun 19

Has anyone tried to apply NTK and, like, Greg Yang's Tensor Programs to mechinterp? My assumption would be that no, it would not work, because mechinterp is about "interesting bits of structure", and almost the point is that all such structure is destroyed when you…


William Wale@williawa · Jun 19

I've finally gotten to the point in life where my priors are so good I can just ignore any new "evidence" that appears and just follow the intuitions I've developed and I'll always be right.


William Wale Retweeted

Delip Rao e/σ@deliprao · Jun 19

Agents training in RL gym


William Wale@williawa · Jun 8

oh well then


William Wale@williawa · Jun 3

I think "Can it solve Outer Wilds" is a pretty good AGI test (assuming we can filter info about the game out of the training data). It requires: 1. Dexterity and quick visual-kinetic reasoning 2. Is quite open-ended, it doesn't have many clearly defined reward signals. Like…


William Wale@williawa · May 27

Apparently the word “bear” is like descended from a euphemism, because people were scared saying the actual name for the animal would summon it. Seems people sometimes do magical thinking like that. Not just this example. I wonder in what situations that line of thinking…
