Yian Zhang
@zhang_yian
Language and more. Prev @stanfordnlp @CILVRatNYU @SiebelScholars Class of 2023. Opinions are my own.
We want to set a SUPER high bar for OAI's open-source release 😉
📣 Announcing Llama Nemotron Super v1.5 📣 This release pushes the boundaries of reasoning-model capability for its weight class and is ready to power agentic applications, from individual developers all the way to enterprise. 📈 The Llama Nemotron…
👀 Nemotron-H tackles large-scale reasoning while maintaining speed -- with 4x the throughput of comparable transformer models. ⚡ See how #NVIDIAResearch accomplished this using a hybrid Mamba-Transformer architecture and model fine-tuning ➡️ nvda.ws/43PMrJm
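To make the "hybrid" idea concrete, here is a minimal, hypothetical sketch of a stack that interleaves linear-time Mamba (SSM) blocks with occasional self-attention blocks. This is not the actual Nemotron-H architecture (see the link above for that); it assumes the open-source `mamba_ssm` package, and the layer counts and ratios are made up for illustration.

```python
# Hypothetical hybrid Mamba-Transformer stack: mostly Mamba (SSM) blocks, with a
# self-attention block every few layers. Illustrative only -- not Nemotron-H itself.
# Assumes the `mamba_ssm` package (github.com/state-spaces/mamba).
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class AttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        # Causal mask: True marks positions that may NOT be attended to.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        return x + out

class MambaBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)

    def forward(self, x):
        return x + self.mamba(self.norm(x))

class HybridStack(nn.Module):
    """Every `attn_every`-th block is attention; the rest are linear-time Mamba blocks."""
    def __init__(self, d_model: int = 1024, n_layers: int = 24, attn_every: int = 6):
        super().__init__()
        self.blocks = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else MambaBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):  # x: (batch, seq_len, d_model)
        for block in self.blocks:
            x = block(x)
        return x
```

Because most layers avoid quadratic attention, long-sequence throughput is dominated by the linear-time SSM blocks, which is the intuition behind the speedups claimed for hybrid models.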
Nvidia currently has the #1 trending open model & the #1 trending open dataset and is close to 25,000 followers on Hugging Face. They've been really impactful for open-source AI recently!
Open recipe and open data for training the best open model.
Nvidia dropped Llama-Nemotron on Hugging Face: Efficient Reasoning Models
🎊 Llama Nemotron Ultra 253B is here 🎊 ✅ 4x higher inference throughput than DeepSeek R1 671B 🏆 Highest accuracy on reasoning benchmarks: 💎 GPQA-Diamond for advanced scientific reasoning 💎 AIME 2024/25 for complex math 💎 LiveCodeBench for code generation and completion…
Probably the best open model at the moment
We are excited to release Llama-Nemotron-Ultra! This is a reasoning ON/OFF, dense 253B model, with open weights and post-training data. huggingface.co/nvidia/Llama-3… We started with Llama-405B, modified it via NAS pruning, then followed with reasoning-focused post-training: SFT + RL in FP8.
New on LMArena: @Nvidia's Llama-3.3-Nemotron-Super-49B-v1 lands at #14! A powerful open reasoning model—top-15 overall, excelling in math, with an openly released 15M post-training dataset. Congrats to the @NvidiaAI Nemo team for this fantastic contribution to the open…
We are excited to release new Llama-Nemotron models. These models allow you to set reasoning ON/OFF during runtime. We also release all the post-training data under CC-BY-4.0! Try it now on build.nvidia.com/nvidia/llama-3… HF collection: huggingface.co/collections/nv…
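A minimal sketch of the runtime reasoning toggle, assuming the "detailed thinking on" / "detailed thinking off" system-prompt convention described on the model cards; the exact repo ID and prompt format are assumptions here and should be checked against the HF collection linked above.

```python
# Hedged sketch: toggling Llama-Nemotron reasoning at runtime via the system prompt.
# Assumes the "detailed thinking on/off" convention from the model cards; verify the
# exact repo ID and prompt format against the Hugging Face collection linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3_3-Nemotron-Super-49B-v1"  # example ID; check the HF collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # may be needed for the NAS-modified checkpoints
)

def generate(question: str, reasoning: bool) -> str:
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=1024)
    return tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?", reasoning=True))   # long reasoning trace, then the answer
print(generate("What is 17 * 24?", reasoning=False))  # concise answer
```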
Our team put together a unified mathematical framework to analyze popular model alignment algorithms. “Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment” arxiv.org/pdf/2502.00203
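For a concrete sense of what gets unified, here is the DPO objective, one of the popular preference-optimization losses this line of work covers. This is an illustration from the broader alignment literature, not the paper's own notation; the unified RPO formulation itself is in the linked paper.

```latex
% DPO, one popular alignment objective in the family such frameworks analyze
% (illustrative only; see the paper for the unified RPO formulation).
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Here y_w and y_l are the preferred and dispreferred responses, π_ref is the reference policy, and β scales the implicit reward.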
Today, HELM was recognized by @TmlrOrg with its best paper award! The true success of HELM has been the sustained maintenance, growth, and impact led by @yifan_mai @percyliang: 5k commits, 2k PRs, 2k stars, 1k citations, 11 leaderboards, 20 partner orgs. crfm.stanford.edu/helm/
I'm excited to announce my new lab: UCSD's Learning Meaning and Natural Language Lab, a.k.a. LeM🍋N Lab! And 📢WE ARE RECRUITING📢 PhD students to join us in sunny San Diego in either Linguistics OR Data Science. Apply by Dec 4: connect.grad.ucsd.edu/apply/ More about the lab👇
Modern LLMs can be both creative (e.g., write poems) and grounded (e.g., QA w/ documents). Do these actually work well together? Our #EMNLP2024 paper ("𝐃𝐚𝐧𝐜𝐢𝐧𝐠 𝐢𝐧 𝐂𝐡𝐚𝐢𝐧𝐬") finds that faithfulness and instruction following inherently counteract each other.
Position: When a foundation model developer reports a test score, they should report the corresponding train-test overlap. Does this happen? Based on public documentation, only 9 of 30 language models report train-test overlap for the test sets they evaluate on (or have open data).
For evaluations to be useful, we need to understand train-test overlap. The norm should be that model developers report train-test overlap. Read our paper that argues for this and more, led by Andy Zhang: arxiv.org/abs/2410.08385
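As a rough illustration of what "train-test overlap" can mean in practice, here is a generic n-gram contamination check: the fraction of test examples that share a long n-gram with the training corpus. This is not necessarily the measurement the paper advocates; the window size and matching rule are arbitrary choices for the sketch.

```python
# Hedged sketch: a simple n-gram contamination check between a training corpus and
# a test set. Generic illustration only -- not necessarily the paper's definition of
# train-test overlap; the 13-token window and "any shared n-gram" rule are assumptions.
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def train_test_overlap(train_docs: list[str], test_examples: list[str], n: int = 13) -> float:
    """Fraction of test examples sharing at least one n-gram with the training corpus."""
    train_ngrams: set[tuple[str, ...]] = set()
    for doc in train_docs:
        train_ngrams |= ngrams(doc, n)
    contaminated = sum(1 for ex in test_examples if ngrams(ex, n) & train_ngrams)
    return contaminated / max(len(test_examples), 1)
```

In the spirit of the position, a developer would report this kind of number alongside each test score it corresponds to.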
Nemotron-4-340B-Instruct:
* Aligned using 98% synthetic data
* 28.19% : 46.57% : 25.24% win/tie/loss against GPT-4-1106-preview on our eval set with human raters
Until now, HELM has evaluated LMs on short responses, where evaluation is simple. We now introduce HELM Instruct, which evaluates open-ended instruction following. We evaluate 4 models on 7 scenarios using 4 evaluators against 5 criteria: