generatorman
@generatorman_ai
every bit counts
GPT is not a next token predictor. The last FFN layer of GPT is a next token predictor. All earlier layers are future tokens predictors, more so when trained at longer context lengths.
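One way researchers probe this claim is the "logit lens": decode each layer's residual stream through the unembedding matrix and see what that depth is predicting. The sketch below uses toy random weights (not a real GPT) purely to show the mechanics of decoding mid-stream; in a trained model, earlier layers' decoded distributions carry information about tokens beyond the immediate next one.

```python
# Minimal logit-lens sketch with TOY weights (illustration only, not a
# trained model): decode the residual stream at every layer, not just the
# last, to inspect what each depth is "predicting".
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 32, 4

W_U = rng.normal(size=(d_model, vocab))           # unembedding matrix
blocks = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
          for _ in range(n_layers)]                # stand-in residual blocks

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = rng.normal(size=d_model)                       # residual stream at one position
per_layer_preds = []
for W in blocks:
    h = h + np.tanh(h @ W)                         # toy residual update
    per_layer_preds.append(softmax(h @ W_U))       # decode mid-stream

# one distribution over the vocab per layer; only the last feeds sampling,
# but every depth can be read as a predictor
assert len(per_layer_preds) == n_layers
assert all(abs(p.sum() - 1.0) < 1e-9 for p in per_layer_preds)
```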
acquihire is a meaningful concept only under chattel slavery
this windsurf witch-hunt is silly. starting a business does not put you under any obligation to continue working for that business over better career opportunities and you are educated enough to know this. he didn't steal the company. run it yourself and stop whining.
crazy how entire industries threw everything they had against torrent piracy for years and didn't even make a dent
most space operas include forerunner civilizations, ancients. it's a comforting idea. a scarier idea is that we are solitary, special. but maybe it's scarier. we are the forerunners. we are the elder race. it has fallen to us to make what will echo through the corridors of time.
attention sparsity through RL 🙌
[LG] Reframing attention as a reinforcement learning problem for causal discovery T Orujlu, C Gumbsch, M V. Butz, C M Wu [University of Tübingen & University of Amsterdam] (2025) arxiv.org/abs/2507.13920
weird for language models to have such minimal mechanisms for doing something so rare...
1+1=3 2+2=5 3+3=? Many language models (e.g., Llama 3 8B, Mistral v0.1 7B) will answer 7. But why? We dig into the model internals, uncover a function induction mechanism, and find that it's broadly reused when models encounter surprises during in-context learning. 🧵
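The quoted prompt is consistent with the model inducing the function f(a, b) = a + b + 1 from the two examples rather than doing plain addition; applying that induced function to the query explains the "7". A one-liner sketch of the induced pattern (my reading of the prompt, not code from the paper):

```python
# The in-context examples 1+1=3 and 2+2=5 both fit f(a, b) = a + b + 1,
# so a model that induces this function answers the query 3+3 with 7.
def induced_f(a, b):
    return a + b + 1

examples = [((1, 1), 3), ((2, 2), 5)]
assert all(induced_f(a, b) == y for (a, b), y in examples)

print(induced_f(3, 3))  # -> 7, matching the models' answer
```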
git blame complete
In retrospect this post is so important. You chose the left one, didn't you? 🤦
Not taking any chances with First Amendment challenges, narrowly tailored to federal contracts. Not gunning for the national ban "fairness doctrine" nuclear option. They might try blacklisting providers, but I wouldn't expect that to stand.
'Update Federal procurement guidelines to ensure that the government only contracts with frontier large language model (LLM) developers who ensure that their systems are objective and free from top-down ideological bias.' there is an executive order on this arriving today.
They're gonna put trackers in your GPUs.
👀 'explore leveraging new and existing location verification features on advanced AI compute to ensure that the chips are not in countries of concern.'
steganography doesn't just exist, it's more ubiquitous than microplastics. every time you train on synthetic datasets you're opening invisible sidechannels. this has been an astounding research programme from @OwainEvans_UK. banger after banger.
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
reasoning is for losers. if the goddess of death is not whispering the truth into your dreams you're ngmi.
Proving beyond the shadow of a doubt that @OpenAI's edits are now functionally useless. I have memory and training turned OFF. Yet GPT remembers past the edit point. This is terrible design, and means you have to kill your chats even more often, because edits are pointless.
I *really* hate that when you edit a conversation with GPT, now, it retains all of the conversation below the edit point as additional context. Legitimately defeats the purpose of having edits and conversation forks.