YIFENG LIU
@YIFENGLIU_AI
CS Ph.D. student working on LLMs @ UCLA AGI Lab. Previous work: RPG, MARS, TPA, Kimi-1.5....
Why does CANADA try to prevent AI researchers from attending conferences in Canada? I doubt whether Canada wants to develop its AI industry. Why does CANADA try to prevent people named after the maple from entering Canada? I doubt whether Canadians love maples.
Which optimizer (from 100+ optimizers for DL models) is best for training Large Language Models? 🤔 github.com/lauyikfung/A-S…

The only thing that’s certain is that MLA has been abandoned for good reason. People should be using TPA instead.
Qwen-3-MoE vs. DeepSeek V2 (original): their designs are superficially similar, but different. This will be a very interesting test of a few scaling laws.
🚀 Introducing MARS v2: Make Variance Reduction Shine! Tired of AdamW being the default optimizer for training large models? What if variance reduction could finally outperform it? MARS is here to change the game! Now with its 2nd version, we’ve refined its core idea and…
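Not the official MARS implementation, just a minimal sketch of the variance-reduction idea: an AdamW-style update applied to a corrected gradient that mixes in a scaled difference between the current and previous gradients. The class name, hyperparameter names, and default values below are illustrative assumptions.

```python
# Sketch of a MARS-style variance-reduction optimizer (illustrative, not official code).
import torch


class MARSSketch(torch.optim.Optimizer):
    def __init__(self, params, lr=3e-3, betas=(0.95, 0.99), eps=1e-8,
                 weight_decay=0.0, gamma=0.025, max_corr_norm=1.0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
                        gamma=gamma, max_corr_norm=max_corr_norm)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["prev_grad"] = torch.zeros_like(g)
                    state["exp_avg"] = torch.zeros_like(g)
                    state["exp_avg_sq"] = torch.zeros_like(g)
                state["step"] += 1
                t = state["step"]

                # Variance-reduction correction: add a scaled gradient difference.
                corr = g + group["gamma"] * (beta1 / (1 - beta1)) * (g - state["prev_grad"])
                # Clip the corrected gradient so the correction term cannot blow up.
                norm = corr.norm()
                if norm > group["max_corr_norm"]:
                    corr = corr * (group["max_corr_norm"] / norm)
                state["prev_grad"].copy_(g)

                # Standard AdamW-style first/second moments on the corrected gradient.
                state["exp_avg"].mul_(beta1).add_(corr, alpha=1 - beta1)
                state["exp_avg_sq"].mul_(beta2).addcmul_(corr, corr, value=1 - beta2)
                m_hat = state["exp_avg"] / (1 - beta1 ** t)
                v_hat = state["exp_avg_sq"] / (1 - beta2 ** t)

                # Decoupled weight decay, then the preconditioned parameter update.
                p.mul_(1 - group["lr"] * group["weight_decay"])
                p.addcdiv_(m_hat, v_hat.sqrt().add_(group["eps"]), value=-group["lr"])
```

Swapping this in for AdamW in a standard training loop is a drop-in change; the gamma and clipping knobs control how aggressively the variance-reduction correction is applied.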
Scaling laws are good, but anti-scaling-law research is the future. arxiv.org/abs/2501.12948
🚀 Introducing Kimi k1.5 --- an o1-level multi-modal model
- SOTA short-CoT performance, outperforming GPT-4o and Claude Sonnet 3.5 on 📐AIME, 📐MATH-500, 💻 LiveCodeBench by a large margin (up to +550%)
- Long-CoT performance matches o1 across multiple modalities (👀MathVista,…
12/ Joint work with @yifan_zhang_, @YIFENGLIU_AI, @HuizhuoY, Zhen Qin, Yang Yuan, @QuanquanGu, and Andrew Chi-Chih Yao. Incredible work by an outstanding team!
11/ Closing Remarks: “Tensor Product Attention Is All You Need” reframes attention as a dynamic, low-rank factorization. If you need to push context lengths or want more efficient large language models, TPA is your solution. Check out our code at: github.com/tensorgi/T6.
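A minimal sketch of the tensor-product factorization idea, not the T6 implementation: per token, each attention head's query/key/value is reconstructed as a rank-normalized sum of outer products between a small head factor and a small feature factor, so only the factors need to be cached. Shapes, rank choices, and layer names below are illustrative assumptions; see github.com/tensorgi/T6 for the actual code.

```python
# Sketch of tensor-product (low-rank outer-product) attention factorization (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TPASketch(nn.Module):
    def __init__(self, d_model=512, n_heads=8, head_dim=64, rank_q=6, rank_kv=2):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        self.rank_q, self.rank_kv = rank_q, rank_kv
        # Each projection yields, per token, a head factor (rank x n_heads)
        # and a feature factor (rank x head_dim).
        self.q_proj = nn.Linear(d_model, rank_q * (n_heads + head_dim))
        self.k_proj = nn.Linear(d_model, rank_kv * (n_heads + head_dim))
        self.v_proj = nn.Linear(d_model, rank_kv * (n_heads + head_dim))
        self.out_proj = nn.Linear(n_heads * head_dim, d_model)

    def _factorize(self, x, proj, rank):
        # Split into head factors A and feature factors B, then contract the
        # rank dimension into per-head vectors: (batch, seq, heads, head_dim).
        b, s, _ = x.shape
        a, bfac = proj(x).split([rank * self.n_heads, rank * self.head_dim], dim=-1)
        a = a.view(b, s, rank, self.n_heads)
        bfac = bfac.view(b, s, rank, self.head_dim)
        return torch.einsum("bsrh,bsrd->bshd", a, bfac) / rank

    def forward(self, x):
        b, s, _ = x.shape
        q = self._factorize(x, self.q_proj, self.rank_q)
        k = self._factorize(x, self.k_proj, self.rank_kv)  # only K/V factors need caching
        v = self._factorize(x, self.v_proj, self.rank_kv)
        # Standard causal attention over the reconstructed per-head tensors.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # (batch, heads, seq, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, s, -1))
```

The point of the sketch: the KV cache only needs the K/V factors (rank_kv * (n_heads + head_dim) numbers per token) rather than full per-head keys and values, which is where the memory savings at long context come from.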