wh
@nrehiew_
eng primarily, ml mostly, research previously
Looking at the HuggingFace configs, this is a wider/shallower model compared to Qwen3:
- 62 layers vs 94
- dim 6144 vs 4096
- 160 experts vs 128
- 96 attn heads vs 64
Curious why the architectural change? Qwen3.5?
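The comparison above can be sketched as a quick config diff. This is a minimal, self-contained sketch using the numbers quoted in the post (the new model is unnamed there, so it is just labeled `new_model`); the field names mirror common HuggingFace config keys but are assumptions, not pulled from an actual config file.

```python
# Figures taken verbatim from the post: new model vs Qwen3.
new_model = {"num_hidden_layers": 62, "hidden_size": 6144,
             "num_experts": 160, "num_attention_heads": 96}
qwen3 = {"num_hidden_layers": 94, "hidden_size": 4096,
         "num_experts": 128, "num_attention_heads": 64}

def config_diff(a, b):
    """Return {key: (a_value, b_value)} for every field where the configs differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}

diffs = config_diff(new_model, qwen3)
for key, (new, old) in diffs.items():
    print(f"{key}: {new} vs {old}")
```

Every listed field differs, which is the "wider/shallower" trade: fewer layers, but a larger hidden dim and more heads/experts.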
This one is not small! The team spent a lot of time building Qwen3-Coder after Qwen2.5-Coder. It is much bigger, now MoE-based, and far stronger and smarter than before. Not sure we can call it competitive with Claude Sonnet 4, but it should be a really good coding agent.…
Every time I see one of Owain’s papers, I find them hard to wrap my head around. Fascinating work, and I think it has really nice implications for watermarking too.
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Sonnet in Claude's web UI is now using print statements in its Python artifact to explain things to me. You cannot make this kind of RL deep-frying up.
