Tzu-Heng Huang (@zihengh1)

Pinned

Efficient data curation is critical for modern ML. 📣 We introduce Mimic Score, a new, lightweight, model-based metric for sample utility that leverages reference model's weights to identify high-value samples and accelerate training. 🎉 Accepted as an Oral at ICML’25 DataWorld!

Tzu-Heng Huang@zihengh1 · Jul 19

We will give a talk about using open-weight models as your reference to curate your dataset this afternoon 2:30pm at ICML’25 DataWorld! Check out how a new sample utility metric is derived and what’s next for Grad-Mimic!

Tzu-Heng Huang@zihengh1 · Jul 10

Efficient data curation is critical for modern ML. 📣 We introduce Mimic Score, a new, lightweight, model-based metric for sample utility that leverages reference model's weights to identify high-value samples and accelerate training. 🎉 Accepted as an Oral at ICML’25 DataWorld!

Tzu-Heng Huang Retweeted

Ruoming Pang@ruomingpang · Jul 17

In this report we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model. machinelearning.apple.com/research/apple…

Tzu-Heng Huang@zihengh1 · Jul 18

Check out our paper being presented at ICML DataWorld as an Oral talk! Amazing speaker lineup! Shout out to @zihengh1 for driving this work. On a side note, my team at Apple is hiring research engineers and research interns for data-centric ML. Reach out if you’re interested!

Tzu-Heng Huang@zihengh1 · Jul 10

Efficient data curation is critical for modern ML. 📣 We introduce Mimic Score, a new, lightweight, model-based metric for sample utility that leverages reference model's weights to identify high-value samples and accelerate training. 🎉 Accepted as an Oral at ICML’25 DataWorld!

Tzu-Heng Huang@zihengh1 · Jul 17

We are organizing a workshop tomorrow at #icml25. Come join us and check out the latest on programmatic representations and agent learning.

Shao-Hua Sun@shaohua0116 · Jul 17

Our #ICML2025 Programmatic Representations for Agent Learning workshop will take place tomorrow, July 18th, at the West Meeting Room 301-305, exploring how programmatic representations can make agent learning more interpretable, generalizable, efficient, and safe! Come join us!

Tzu-Heng Huang Retweeted

Shao-Hua Sun@shaohua0116 · Jul 17

Our #ICML2025 Programmatic Representations for Agent Learning workshop will take place tomorrow, July 18th, at the West Meeting Room 301-305, exploring how programmatic representations can make agent learning more interpretable, generalizable, efficient, and safe! Come join us!

Tzu-Heng Huang Retweeted

Thao Nguyen@thao_nguyen26 · Jul 17

If you are attending #ICML2025, check out our DataWorld workshop on Sat July 19. We have updated the website with more info on speakers & accepted papers! dataworldicml2025.github.io Also happy to chat offline about all things ✨ data ✨

Tzu-Heng Huang Retweeted

Albert Ge@albert_ge_95 · Jul 17

I'll be traveling to ICML to present our work on data mixtures at two workshops on Saturday (DataWorld + DIG-BUGS). looking forward to attending my first in-person conference and connecting with others!

Tzu-Heng Huang Retweeted

Jason Wei@_jasonwei · Jul 16

New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally. Great examples of…

Tzu-Heng Huang Retweeted

Harit Vishwakarma@harit_v · Jul 16

Next up this morning at #ICML2025, we will be presenting our work on pseudolabeling-based semi-supervised learning (SSL). East Exhibition Hall A&B # E-1304, 11 am to 1:30 pm Paper: openreview.net/pdf?id=w4c5bLk… Pseudolabeling-based SSL relies on the model’s confidence scores and…

Tzu-Heng Huang Retweeted

Mustafa Shukor@MustafaShukor1 · Jul 15

We propose new scaling laws that predict the optimal data mixture, for pretraining LLMs, native multimodal models and large vision encoders ! Only running small-scale experiments is needed, and we can then extrapolate to large-scale ones. These laws allow 1/n 🧵

Tzu-Heng Huang Retweeted

Harit Vishwakarma@harit_v · Jul 15

Join us today in the morning poster session at #ICML2025. We will talk about some neat ways for reducing uncertainty and improving LLM accuracy at test-time on multi-choice tasks (e.g., tool selection) using conformal prediction and an additional inference round. 📍 East…

Tzu-Heng Huang Retweeted

Fred Sala@fredsala · Jul 15

Heading to #ICML! I’ll be representing SprocketLab at @UWMadison and @SnorkelAI. Reach out if you want to chat about data-centric AI, data development, agents, and foundation models.

Tzu-Heng Huang Retweeted

Harit Vishwakarma@harit_v · Jul 15

Link for the last paper [W1]: arxiv.org/pdf/2506.10403
