William Wale
@williawa
I use this profile to get news and to connect (argue (bicker)) with people. Interests: AI (Safety), meditation, philosophy, mathematics, algorithms
hahaha How do you design and implement a comprehensive multi-agency task force investigating a series of seemingly disparate cold cases – involving unsolved disappearances, historical property disputes, localized environmental anomalies exhibiting spectral transience, and…
Grok 4 Heavy ($300/mo) returns its surname and no other text:
Has anyone tried to measure compatibility of internal representations of different trained LLMs by training a direct translator between them? The way I thought about it is like this: Have two similar-powered networks A and B. Chop off the latter half of A and get A', chop off…
Has anyone done an empirical study on the degradation of problem-solving ability of LLMs as other tasks are put in context? Like if you stuff Claude, o3, Grok in a software engineering environment, let it cook for a bit, then you hit it with an AIME question. How well does it solve…
I've tried Grok 4 a fair bit in Cursor and in the Grok app now, and I think it's the best model around. Or like, it Pareto-dominates Google and OpenAI models (except for low-cost tasks), and is probably overall better than Opus, but not for all tasks. First impressions: Upsides:…
user What is 5839678476345 + 20827572034? Give just the answer. assistant ```love diagram: ________________________________ **.Look** � nadioneyilotiva IranıBootTestadin, ,•define,in[v]. _ trách撤not /// *.ismeteg|--aupdln. 500m qwen, in…
o3-mini was released Jan 31, full o3 was released 16 Apr, 75 days later. o4-mini was released 16 Apr, now it's just about 30 June, 75 days later. I think o4 is quite egregiously cooked on RL and has the same problems as o3 but more overt and severe. Therefore I predict it will…
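A quick sanity check of the two 75-day gaps in the post above, as a minimal sketch. The year 2025 is an assumption (the post gives only days and months), and the variable names are made up for illustration.

```python
from datetime import date

# Dates as given in the post; the year 2025 is an assumption.
o3_mini_release = date(2025, 1, 31)
o3_release = date(2025, 4, 16)       # also the o4-mini release date per the post
time_of_post = date(2025, 6, 30)     # "just about 30 June"

# Subtracting two dates gives a timedelta; .days is the whole-day gap.
gap_mini_to_full = (o3_release - o3_mini_release).days
gap_since_o4_mini = (time_of_post - o3_release).days

print(gap_mini_to_full, gap_since_o4_mini)
```

Under that assumed year, both gaps come out to 75 days, matching the post.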
Is there anyone who has tried to apply NTK and like Greg Yang's Tensor Programs to mechinterp? My assumption would be that no, it would not work, because mechinterp is about "interesting bits of structure", and almost the point is that all such structure is destroyed when you…
I've finally gotten to the point in life where my priors are so good I can just ignore any new "evidence" that appears and just follow the intuitions I've developed and I'll always be right.
I think "Can it solve Outer Wilds" is a pretty good AGI test. (Assuming we can filter the info about the game from the training data). It requires: 1. Dexterity and quick visual-kinetic reasoning 2. Is quite open ended, it doesn't have many clearly defined reward signals. Like…
Apparently the word “bear” is like descended from a euphemism, because people were scared saying the actual name for the animal would summon it. Seems people sometimes do magical thinking like that. Not just this example. I wonder in what situations that line of thinking…