Kaden Bilyeu
@bikatr7
CS/日本語 Senior @UCCS | Jr Full-Stack Eng @ NGi | Working on @NegationGame | views are my own
I just wrote my first blog post in 3 months. I have class in 3 hours, oof. Here it is: kadenbilyeu.com/blog/82b60d0e-… It's about why I find Sonnet 3.5 so human compared to literally every other model. Also why GPT-4.5 isn't there yet, and how 3.7 is sadly a downgrade in that aspect.
Grok 4 is absolutely not the smartest model by raw intellect, lmfao. Not even close. It's a good model, but it's too expensive and falls behind o3 on intelligence and Opus on agency. Gemini 2.5 Pro is still excellent, but it's falling out of my favor as it's getting really…
Am I wrong or is OpenAI not really on the capabilities frontier right now? Best agentic ability: Claude 4 Opus. Best raw intellect: Grok 4. Best over long context: Gemini 2.5 Pro.
bruh these aren't even the good vpns
Apparently, gooning in the UK is so essential for the country to function that a VPN is the #1 app in the productivity chart
o3-pro is still the smartest publicly available model btw. It's not even close either. Even if I prefer other models like Sonnet or Opus for daily programming, I don't think any model has come remotely close to simply solving tough issues or actually surprising me like o3.
I don't get why people think like this, because literally every study we have shows it's completely bullshit lmfao.
As we all know, SSRIs and therapy are a joke. The human brain is actually a very finely-tuned, sophisticated machine and conditions like depression+anxiety are reactions to very real environmental conditions and stressors, not "chemical imbalances" that you were randomly cursed…
I’m fucking stupid. I backed up an encrypted system. But Macrium Reflect backs up a live OS. So when I booted into this it’s expecting a decryption key. But the system isn’t encrypted, and it’s impossible to actually get the password right since again, the fucking encryption…
Done. Need to add a memory stick whenever it gets here (fuck you crucial for sending me a defective stick). But otherwise waiting on an image restore and i’m solid. Laptop has been put aside, pending a home lab transformation whenever I can get around to that. Putting windows…
Recently decided that after 6 years since my last desktop pc i’m going to finally splurge and ditch laptop at home. Waiting on parts to be delivered but: CPU: AMD Ryzen 9 7900X – 12 cores / 24 threads Motherboard: MSI MAG B650 Tomahawk WiFi (AM5, ATX) Memory: Corsair…
Amazon fucking sold me a bad ram stick…. Boots with one. Now I gotta order another pack sigh
Soon
If you haven’t eliminated em dashes out of your vocabulary you’re honestly harming yourself lol. My grammar was never anything special but thankfully I have some mannerisms that don’t look LLM-ish. Completely unrelated but a good anonymizer is feeding your text through an LLM on…
How long before people intentionally put in typos to make sure people believe an LLM didn’t write a thing?
> hack > attributed to vibe coding > look inside > firebase misconfiguration Every time
The Tea app has been hacked, and you can go download 59.3 gigabytes of user selfies right now. The hack is real. A picture from someone I know who signed up just to see what was on there was in it. This was an obviously vibe-coded app and was bound to be insecure.
"1M+ bugs" while admitting that half of them weren’t even real isn’t *great* It’s an admission of noise at scale. That’s a 50% false positive rate clogging dev workflows, and wasting time. Silent on false negatives too, how many actual bugs slipped through such a system. Without…
In the past month, Cursor found 1M+ bugs in human-written PRs. Over half were real logic issues that were fixed before merging. Today, we're releasing the system that spotted these bugs. It's already become a required pre-merge check for many teams.
Looking at my old code is like coming to lucidity at a murder scene, but I'm holding the weapon.
Former employer of mine sent the letter of recommendation I had requested at 4am this morning 😭 Told them to go to sleep but quickly realized I’m not one to talk since I was also up at 4 lmfao
Real photo of me talking to Claude after it lies to my face directly for the eighth time in a row after responding: "You're absolutely correct! I made a mistake."

Literally the only benchmark I needed for Grok 4 aside from my own vibe check was Taelin and it confirms that Grok 4 is kinda meh compared to the other 2 models that actually matter for SOTA
sorry my verdict on Grok-4 is that it is not better than Opus for coding, and not better than o3 for reasoning. I don't think it has been trained on benchmarks, but I think its brain is deep fried into a problem-solution mindset that doesn't extend to real-world situations...…
The amount of money I spent unfucking my LLC paperwork because I was a dipshit a few years ago is insane. You should totally have an LLC though