wh
@nrehiew_
eng primarily, ml mostly, research previously
Looking at the HuggingFace configs, this is a wider/shallower model compared to Qwen3:
- 62 layers vs 94
- dim 6144 vs 4096
- 160 experts vs 128
- 96 attn heads vs 64
Curious why the architectural change? Qwen3.5?
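The comparison above can be sketched as a quick config diff. This is a minimal, self-contained sketch using the numbers quoted in the post (the new model is unnamed there, so it is just labeled `new_model`); the field names mirror common HuggingFace config keys but are assumptions, not pulled from an actual config file.

```python
# Figures taken verbatim from the post: new model vs Qwen3.
new_model = {"num_hidden_layers": 62, "hidden_size": 6144,
             "num_experts": 160, "num_attention_heads": 96}
qwen3 = {"num_hidden_layers": 94, "hidden_size": 4096,
         "num_experts": 128, "num_attention_heads": 64}

def config_diff(a, b):
    """Return {key: (a_value, b_value)} for every field where the configs differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}

diffs = config_diff(new_model, qwen3)
for key, (new, old) in diffs.items():
    print(f"{key}: {new} vs {old}")
```

Every listed field differs, which is the "wider/shallower" trade: fewer layers, but a larger hidden dim and more heads/experts.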
This one is not small! The team spent a lot of time building Qwen3-Coder after Qwen2.5-Coder. It is much bigger, now MoE-based, and far stronger and smarter than before. Not sure we can call it competitive with Claude Sonnet 4, but it should be a really good coding agent.…
Every time I see one of Owain’s papers, I find them hard to wrap my head around. Fascinating work, and I think it has really nice implications for watermarking too.
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Sonnet in Claude's web UI is now using print statements in its Python artifact to explain things to me. You cannot make this kind of RL deep-frying up.
