Srini Iyer
@sriniiyer88
Research Scientist at Facebook AI Research
New paper! Byte-Level models are finally competitive with tokenizer-based models with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: dl.fbaipublicfiles.com/blt/BLT__Patch… (1/n)
Turns out, if you teach llamas how to self-reflect and backtrack from wrong reasoning paths, they do extra well on math reasoning!
- MATH 500: 65.8% ➡️ 81.8%
- AMC 23: 37.5% ➡️ 64.4%
- AIME 24: 10% ➡️ 30%
Amazing work by @danieljwkim, can be a nice long weekend read!
Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: arxiv.org/abs/2507.00417
This is exciting! Check out our new step-by-step playbook that shows how to do MoT on top of your existing transformer implementation! Also, MoT is now in TMLR! Huge congrats to @liang_weixin, @VictoriaLinML and others!
🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced! 📌 GitHub repo: github.com/facebookresear… 📄 Paper: arxiv.org/abs/2411.04996 How can we reduce pretraining costs for…
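For readers wondering what "doing MoT on top of your existing transformer" looks like, here is a minimal sketch of the core idea the two posts above describe, assuming only what the tweets state: self-attention stays global over the multimodal sequence while each modality gets its own feed-forward weights. The class and parameter names are illustrative, not the released MoT code.

```python
# Hedged sketch of a modality-split feed-forward block (illustrative, not the MoT repo).
import torch
import torch.nn as nn

class ModalitySplitFFN(nn.Module):
    """One FFN per modality; tokens are routed to their modality's weights."""

    def __init__(self, d_model: int, d_ff: int, num_modalities: int):
        super().__init__()
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_modalities)
        )

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (seq, d_model); modality_ids: (seq,) with values in [0, num_modalities).
        out = torch.empty_like(x)
        for m, ffn in enumerate(self.ffns):
            mask = modality_ids == m
            if mask.any():
                out[mask] = ffn(x[mask])  # only this modality's tokens use these weights
        return out
```

Global attention layers stay shared; only the modality-specific blocks are swapped in, which is why it can sit on top of an existing transformer implementation.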
We just released model weights for our 1B & 8B-parameter BLT: Byte Latent Transformer, a tokenizer-free model with significant improvements in inference efficiency and robustness. Model on @huggingface: huggingface.co/facebook/blt Code: github.com/facebookresear… Paper: arxiv.org/abs/2412.09871
By popular demand (see our GH issues 😅), we're releasing 1B and 8B weights for our BLT models! We're also hard at work adding BLT to HF transformers! Model Weights: huggingface.co/facebook/blt Code + Instructions for loading weights: github.com/facebookresear…
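For anyone who wants to grab the checkpoints, a minimal sketch of fetching them from the Hub; the repo id facebook/blt is taken from the post above, and actually loading the files into the model follows the instructions in the GitHub repo, not this snippet.

```python
# Hedged sketch: download the released BLT checkpoint files from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="facebook/blt")
print(f"BLT checkpoint files downloaded to: {local_dir}")
```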
🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image &…
Excited to share that we are open sourcing BLT model weights by popular demand (code was already open-sourced): github.com/facebookresear… ai.meta.com/blog/meta-fair… paper: arxiv.org/pdf/2412.09871
We're hiring PhD interns for Summer 2025 in Seattle to work with us on improving BLT even more! If this is something that excites you, reach out to me via DM or email ASAP!
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ go.fb.me/w23lmz
BLT-related post by Meta AI - eliminate tokenization once and for all!
Meta's Byte Latent Transformer (BLT) paper looks like the real deal, outperforming tokenization-based models even at their largest tested size of 8B parameters. 2025 may be the year we say goodbye to tokenization.
Gm. Woke up to a new paper on Byte Latent Transformers (BLT). Now you can increase model size without increasing inference compute by tweaking *patch sizes*. Great day for LLMs. Full article: ai.meta.com/research/publi…
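A back-of-the-envelope sketch of why tweaking patch sizes buys this trade-off (the numbers below are made up for illustration, not taken from the paper): the large latent transformer runs once per patch rather than once per byte, so its per-byte cost shrinks as the average patch grows, freeing budget to make the model bigger at the same inference cost.

```python
# Illustrative arithmetic only; flops_per_patch_step and patch sizes are made-up numbers.
def latent_flops_per_byte(flops_per_patch_step: float, avg_patch_bytes: float) -> float:
    """Approximate latent-transformer FLOPs spent per input byte."""
    # One forward step of the latent transformer covers one patch,
    # so its cost is amortized over avg_patch_bytes input bytes.
    return flops_per_patch_step / avg_patch_bytes

# Doubling the average patch size halves the latent model's per-byte compute.
print(latent_flops_per_byte(1e9, 4.0))  # 250000000.0
print(latent_flops_per_byte(1e9, 8.0))  # 125000000.0
```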
Meta AI's Byte Latent Transformer (BLT) does away with tokenization entirely, improving scalability and efficiency. This model could redefine how we approach natural language processing, paving the way for more streamlined AI applications. Exciting times ahead for tech…
[13 Dec 2024] Meta BLT: Tokenizer-free, Byte-level LLM buttondown.com/ainews/archive… a few months ago @karpathy noted that tokenizers are the root of all evils in llm flaws. Could @AIatMeta have finally cracked the algorithm to process byte-level data directly (enabling all kinds of…
META JUST KILLED TOKENIZATION !!! A few hours ago they released "Byte Latent Transformer". A tokenizer free architecture that dynamically encodes Bytes into Patches and achieves better inference efficiency and robustness! (I was just talking about how we need dynamic…
Pretty cool work on tokenization-less transformer from Meta! > Byte Latent Transformer (BLT), byte-level LLM architecture, matches tokenization-based LLM performance > BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. >…
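A minimal sketch of the dynamic patching idea these posts describe: a small byte-level model scores next-byte entropy, and a new patch starts wherever that entropy crosses a threshold, so hard-to-predict regions get shorter patches. The toy entropy function and threshold below are stand-ins, not the released BLT entropy model.

```python
# Hedged sketch of entropy-based dynamic patching (toy stand-ins, not BLT's components).
from typing import Callable, List

def entropy_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes, int], float],
    threshold: float,
) -> List[bytes]:
    """Split a byte string into patches, starting a new patch where next-byte entropy is high."""
    patches, start = [], 0
    for i in range(1, len(data)):
        # Begin a new patch at position i if the model is uncertain about the byte at i.
        if next_byte_entropy(data, i) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Toy stand-in for the small byte LM: pretend entropy spikes right after a space.
toy_entropy = lambda data, i: 4.0 if data[i - 1:i] == b" " else 1.0
print(entropy_patches(b"patches scale better than tokens", toy_entropy, threshold=3.0))
# -> [b'patches ', b'scale ', b'better ', b'than ', b'tokens']
```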
Been waiting for this one, a strong step in removing tokenization from LLMs. Congrats to the team!
This could be one of the biggest AI papers of the year, if it really works as well as they report in this paper. It's hard to overstate how impactful ending the tyranny of tokenizers would be for AI. I'm very eager to see the open source implementations and replications.
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Patch… Code 🛠️ github.com/facebookresear…
Llamas ... Tokenizer Free?! USING ENTROPY STEERING?!?!! sometimes the universe conspires to make a paper just for you and it feels wonderful when it happens.
I can rest now🥲 I have gathered all the infinity stones. thanks @karpathy
We scaled up Megabyte and ended up with a BLT! A pure byte-level model, it has a steeper scaling law than BPE-based models. With up to 8B parameters, BLT matches Llama 3 on general NLP tasks, plus it excels on long-tail data and can manipulate substrings more effectively. The…