Lindia Tjuatja @ ACL 2025
@lltjuatja
a natural language processor and “sensible linguist”. PhD-ing @LTIatCMU + interning @apple, previously BS-ing @UT_linguistics + @utexasece 🤠🤖📖 she/her
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9

to the person that made the macrodata refinement theme on vscode thank you, you made my day, my refining will be extra productive with this color scheme
Introducing Disentangled Safety Adapters (DSAs) for fast and flexible AI safety. To block harmful responses from an LLM, a separate LLM called a "safety guardrail" is often used to judge their safety. However, to get high-quality safety predictions, we need to use reasonably…
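The guardrail pattern this tweet describes can be sketched in a few lines: a separate judge scores each candidate response before it is released. This is a toy illustration only — `toy_guardrail` and its keyword blocklist are hypothetical stand-ins, not the DSA method or any real safety classifier.

```python
# Toy sketch of the "safety guardrail" pattern: a separate judge
# decides whether an LLM response may be shown to the user.
# The blocklist heuristic below is purely illustrative.

BLOCKLIST = {"steal the keys", "build a bomb"}

def toy_guardrail(response: str) -> bool:
    """Return True if the response is judged safe (toy heuristic)."""
    lowered = response.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

def guarded_reply(response: str) -> str:
    # Gate the LLM's output on the guardrail's verdict.
    return response if toy_guardrail(response) else "[response blocked]"

print(guarded_reply("Here is a cookie recipe."))
print(guarded_reply("First, steal the keys from the desk."))
```

In a real system the judge is itself a (smaller or adapted) model, which is exactly the cost the tweet is pointing at: high-quality safety predictions usually require a reasonably capable judge.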
committed to doing my part in decreasing reviewer workload by writing fewer papers
📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule. a quick read about scaling law fails: 📜arxiv.org/abs/2507.00885 🧵1/5👇
Nice work from @lltjuatja and @gneubig on using SAEs to describe fine-grained differences between the outputs of different language models. SAEs are valuable if you know where to use them!
Where does one language model outperform the other? We examine this from first principles, performing unsupervised discovery of "abilities" that one model has and the other does not. Results show interesting differences between model classes, sizes, and pre-/post-training.
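The comparison underlying the question "where does one LM outperform another?" can be illustrated with per-token surprisal differences: tokens where one model assigns much higher probability than the other mark where it "wins". A minimal sketch with made-up probability tables (the token probabilities below are assumed for illustration, not from the paper, and real models would supply them):

```python
import math

# Hypothetical per-token probabilities from two toy LMs "A" and "B".
tokens = ["the", "cat", "sat"]
probs_a = {"the": 0.5, "cat": 0.1, "sat": 0.2}
probs_b = {"the": 0.4, "cat": 0.3, "sat": 0.1}

def surprisal(p: float) -> float:
    """Surprisal in bits: lower means the model predicted the token better."""
    return -math.log2(p)

# Positive difference => model B predicts the token better than model A.
diffs = {t: surprisal(probs_a[t]) - surprisal(probs_b[t]) for t in tokens}
for t, d in diffs.items():
    better = "B" if d > 0 else "A"
    print(f"{t}: model {better} better by {abs(d):.2f} bits")
```

Clustering or otherwise summarizing such per-token wins over a large corpus is one natural route to fine-grained model comparison; the paper's actual method (the tweet mentions SAEs) goes further than this sketch.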