You Jiacheng
@YouJiacheng
a big fan of TileLang. Follow TileLang meow! Follow TileLang, thank you meow! http://github.com/tile-ai/tilelang a ten-year longtime fan
I think this is an annoying marketing strategy, but not as bad as a "non-tariff barrier"
Warning: the $5.9K price is for a "DEMO" unit. The price of every Unitree product is subject to their EDU Scam. I consider this EDU pricing model a form of unfair marketing competition. Let me explain (disclaimer: personal opinion, happy to correct if needed): These robots do…
what the *? incredible UX
We just discovered the 🔥 COOLEST 🔥 trick in Flow that we have to share: Instead of wordsmithing the perfect prompt, you can just... draw it. Take the image of your scene, doodle what you'd like on it (through any editing app), and then briefly describe what needs to happen…
Cool
We just open-sourced #OpenArm 01: A fully open-source humanoid arm for physical AI research and deployment in contact-rich environments. All hardware and software are now live and ready for you to build, hack, and deploy 🚀 Get started at openarm.dev #OpenSource
damn, I've always had a mental model that an action of an LM should be a sequence (a turn, or everything up to a tool call) rather than a token, but people kept telling me that token-level loss is better… Thanks to the Qwen team for verifying my mental model, now it makes much more sense.
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
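To make the token-vs-sequence point concrete, here's a minimal sketch of the sequence-level importance ratio as I read the GSPO paper: the length-normalized (geometric-mean) ratio of sequence likelihoods, one scalar per sequence instead of one per token. Tensor names and shapes are my assumptions.

```python
import torch

def gspo_seq_ratio(logp_new: torch.Tensor, logp_old: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Sequence-level ratio: (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|).
    logp_new/logp_old: [B, T] per-token log-probs; mask: [B, T] valid-token mask.
    Returns one scalar ratio per sequence, i.e. the geometric mean of token ratios."""
    seq_len = mask.sum(dim=-1).clamp(min=1)
    mean_log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1) / seq_len
    return mean_log_ratio.exp()

def token_ratio(logp_new: torch.Tensor, logp_old: torch.Tensor) -> torch.Tensor:
    """PPO/GRPO-style token-level ratio, for comparison: one ratio per token, [B, T]."""
    return (logp_new - logp_old).exp()
```

The clipping and advantage weighting then happen at the sequence level too, which matches the "an action is a sequence" view.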
Okay, READ YOUR DATA carefully😂
you are looking at the wrong image for that problem. here's the correct one.
Jason Wei @_jasonwei told you: READ YOUR DATA.
i used to think AI2D was a small but high-quality dataset, but i actually looked today and this is the very first sample
Here is what people mean by "residual network makes your gradient happy" + intuition on depth muP. Gradients vanish if activations are not unit-scaled. But that's not an issue if you are using residual connections! But if you don't scale down the branch, your activations / backward blow…
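A toy experiment makes the blow-up visible. This is a minimal sketch under my own assumptions (random variance-preserving linear maps standing in for real blocks), not a real transformer:

```python
import torch

torch.manual_seed(0)
depth, dim = 64, 512
x = torch.randn(dim)  # roughly unit-scaled input

def final_std(residual: bool, branch_scale: float) -> float:
    h = x.clone()
    for _ in range(depth):
        w = torch.randn(dim, dim) / dim**0.5  # variance-preserving linear branch
        branch = branch_scale * (w @ h)
        h = h + branch if residual else branch
    return h.std().item()

print("plain stack:        ", final_std(False, 1.0))         # ~1, but only because every map is exactly variance-preserving
print("residual, unscaled: ", final_std(True, 1.0))          # variance roughly doubles per layer: explodes with depth
print("residual, 1/sqrt(L):", final_std(True, depth**-0.5))  # (1 + 1/L)^L ≈ e: stays O(1), the depth-muP-style fix
```

The same growth shows up in the backward pass, since the residual path carries the gradient unattenuated while each branch adds its own contribution.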
It's a good model.
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…
Unfortunately we need 1T.ai now.
one of the most exciting LLM posts i've read in a while. read it. your future self will thank you
Bros told me this model is the SOTA anime model, because natural language greatly enriches the details, something SDXL can't do.
🎨 We’re thrilled to officially launch Neta Lumina — the most advanced open-source anime model yet. As our 4th open-source model, Neta Lumina has achieved: 🔹 Expertly tuned for 200+ anime aesthetics including Guofeng, Furry, Pets, Scenery Shots and more niche themes 🔹…
> our models are optimized with Adam
> reviewer: this guy violates double blind, call for a desk reject!
everyone always asks who/what is adam. never how is adam
FWIW, in ρ log ρ, log is matrix-log and multiplication is matmul.
RL+LLM researchers actively use the entropy of the LLM's distribution to measure training dynamics. This number is misleading. John von Neumann and Lev Landau gave us the correct answer 100 years ago while studying mixed quantum states in Hilbert spaces. The usual entropy treats all tokens as…
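For the curious, a minimal numeric sketch of the difference. The density-matrix construction from token embeddings is my own assumption for illustration, not from the thread:

```python
import numpy as np

def shannon_entropy(p: np.ndarray) -> float:
    """Usual token-level entropy: treats every token as a distinct symbol."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log rho), with the MATRIX log and a matmul.
    For Hermitian PSD rho this reduces to the Shannon entropy of its eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

# Toy density matrix rho = sum_i p_i |v_i><v_i| from token probabilities p
# and unit-norm token embeddings v (hypothetical construction).
p = np.array([0.5, 0.5])
v = np.array([[1.0, 0.0], [1.0, 0.0]])  # two "different" tokens, identical embedding
rho = sum(pi * np.outer(vi, vi) for pi, vi in zip(p, v))
print(shannon_entropy(p))        # ln 2 ≈ 0.693: the tokens look maximally uncertain
print(von_neumann_entropy(rho))  # ≈ 0: semantically it is a single state
```

When token embeddings overlap, the von Neumann entropy discounts the redundancy that plain Shannon entropy counts as genuine uncertainty.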
Another knowledge test.
Quick test. stepfun.com/share/13768625… kimi.com/share/d21pcrsc…
The wait is over! Meet Step 3 — the groundbreaking multimodal LLM from StepFun! 🚀 MoE architecture (321B total params, 38B active) 💡 Rivals OpenAI o3, Gemini 2.5 Pro, and Claude Opus 4 in performance 🖥️ Optimized for China’s domestic AI chips StepFun just announced: Step 3…
False, they didn't control the number of parameters when comparing architectures.
AlphaGo Moment for Model Architecture Discovery Paper: arxiv.org/abs/2507.18074
H20 is very good for memory-bound workloads, e.g. attention (SDPA) during decoding.
Rumors circulated in China today that the government is banning the use of H20 and mandating the use of domestically-produced GPUs. Cambricon, a domestic Chinese GPU developer, surged.
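Back-of-envelope numbers on why decode attention favors H20. The specs are approximate public figures from memory and the arithmetic intensity estimate is mine:

```python
# H20 trades compute for bandwidth: roughly ~148 TFLOPS dense BF16
# against ~4.0 TB/s of HBM3 (approximate public specs).
flops, bw = 148e12, 4.0e12
machine_balance = flops / bw  # FLOPs per byte needed to become compute-bound, ≈ 37

# SDPA decode with a bf16 KV cache: each cached element (2 bytes) costs
# ~2 FLOPs (one multiply-add) per query head that shares it. Without GQA
# sharing that is ~1 FLOP/byte; even 8-way GQA only reaches ~8 FLOP/byte,
# far below the machine balance, so decoding attention is bandwidth-bound.
attn_intensity = 2 / 2
print(f"machine balance ≈ {machine_balance:.0f} FLOP/B, decode attention ≈ {attn_intensity:.0f} FLOP/B")
```

On a bandwidth-bound kernel the GPU's FLOP deficit simply does not matter; what you pay for is bytes per second.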
IMO, Step3's AFD is significantly worse than this older work. AFD chooses to let the attention instances compute the qkvo projections, which almost eliminates H20's advantage (memory bandwidth per $).
actually earlier. arxiv.org/abs/2405.01814