Srini Iyer
@sriniiyer88
Research Scientist at Facebook AI Research
New paper! Byte-Level models are finally competitive with tokenizer-based models with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: dl.fbaipublicfiles.com/blt/BLT__Patch… (1/n)
Turns out, if you teach llamas how to self-reflect and backtrack from wrong reasoning paths, they do extra well on math reasoning!
- MATH 500: 65.8% ➡️ 81.8%
- AMC 23: 37.5% ➡️ 64.4%
- AIME 24: 10% ➡️ 30%
Amazing work by @danieljwkim, can be a nice long weekend read!
Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: arxiv.org/abs/2507.00417
This is exciting! Check out our new step-by-step playbook that shows how to do MoT on top of your existing transformer implementation! Also, MoT is now in TMLR! Huge congrats to @liang_weixin, @VictoriaLinML and others!
🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced! 📌 GitHub repo: github.com/facebookresear… 📄 Paper: arxiv.org/abs/2411.04996 How can we reduce pretraining costs for…
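For readers wondering what "doing MoT on top of your existing transformer" looks like, here is a minimal sketch of the core idea the two posts above describe, assuming only what the tweets state: self-attention stays global over the multimodal sequence while each modality gets its own feed-forward weights. The class and parameter names are illustrative, not the released MoT code.

```python
# Hedged sketch of a modality-split feed-forward block (illustrative, not the MoT repo).
import torch
import torch.nn as nn

class ModalitySplitFFN(nn.Module):
    """One FFN per modality; tokens are routed to their modality's weights."""

    def __init__(self, d_model: int, d_ff: int, num_modalities: int):
        super().__init__()
        self.ffns = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_modalities)
        )

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (seq, d_model); modality_ids: (seq,) with values in [0, num_modalities).
        out = torch.empty_like(x)
        for m, ffn in enumerate(self.ffns):
            mask = modality_ids == m
            if mask.any():
                out[mask] = ffn(x[mask])  # only this modality's tokens use these weights
        return out
```

Global attention layers stay shared; only the modality-specific blocks are swapped in, which is why it can sit on top of an existing transformer implementation.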
We just released model weights for our 1B & 8B-parameter BLT: Byte Latent Transformer, a tokenizer-free model with significant improvements in inference efficiency and robustness. Model on @huggingface: huggingface.co/facebook/blt Code: github.com/facebookresear… Paper: arxiv.org/abs/2412.09871
By popular demand (see our GH issues 😅), we're releasing 1B and 8B weights for our BLT models! We're also hard at work adding BLT to HF transformers! Model Weights: huggingface.co/facebook/blt Code + Instructions for loading weights: github.com/facebookresear…
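For anyone who wants to grab the checkpoints, a minimal sketch of fetching them from the Hub; the repo id facebook/blt is taken from the post above, and actually loading the files into the model follows the instructions in the GitHub repo, not this snippet.

```python
# Hedged sketch: download the released BLT checkpoint files from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="facebook/blt")
print(f"BLT checkpoint files downloaded to: {local_dir}")
```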
🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image &…
Excited to share that we are open sourcing BLT model weights by popular demand (code was already open-sourced): github.com/facebookresear… ai.meta.com/blog/meta-fair… paper: arxiv.org/pdf/2412.09871
We're hiring PhD interns for Summer 2025 in Seattle to work with us on improving BLT even more! If this is something that excites you, reach out to me via DM or email ASAP!
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ go.fb.me/w23lmz
BLT-related post by Meta AI - eliminate tokenization once and for all!
Meta's Byte Latent Transformer (BLT) paper looks like the real deal, outperforming tokenization-based models even at their largest tested size of 8B parameters. 2025 may be the year we say goodbye to tokenization.
Gm. Woke up to a new paper on Byte Latent Transformers (BLT). Now you can increase model size without increasing inference compute by tweaking *patch sizes*. Great day for LLMs. Full article: ai.meta.com/research/publi…
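A back-of-the-envelope sketch of why tweaking patch sizes buys this trade-off (the numbers below are made up for illustration, not taken from the paper): the large latent transformer runs once per patch rather than once per byte, so its per-byte cost shrinks as the average patch grows, freeing budget to make the model bigger at the same inference cost.

```python
# Illustrative arithmetic only; flops_per_patch_step and patch sizes are made-up numbers.
def latent_flops_per_byte(flops_per_patch_step: float, avg_patch_bytes: float) -> float:
    """Approximate latent-transformer FLOPs spent per input byte."""
    # One forward step of the latent transformer covers one patch,
    # so its cost is amortized over avg_patch_bytes input bytes.
    return flops_per_patch_step / avg_patch_bytes

# Doubling the average patch size halves the latent model's per-byte compute.
print(latent_flops_per_byte(1e9, 4.0))  # 250000000.0
print(latent_flops_per_byte(1e9, 8.0))  # 125000000.0
```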
Meta AI's Byte Latent Transformer (BLT) does away with tokenization entirely, improving scalability and efficiency. This model could redefine how we approach natural language processing, paving the way for more streamlined AI applications. Exciting times ahead for tech…
[13 Dec 2024] Meta BLT: Tokenizer-free, Byte-level LLM buttondown.com/ainews/archive… a few months ago @karpathy noted that tokenizers are the root of all evils in llm flaws. Could @AIatMeta have finally cracked the algorithm to process byte-level data directly (enabling all kinds of…
META JUST KILLED TOKENIZATION !!! A few hours ago they released "Byte Latent Transformer". A tokenizer free architecture that dynamically encodes Bytes into Patches and achieves better inference efficiency and robustness! (I was just talking about how we need dynamic…
Pretty cool work on tokenization-less transformer from Meta! > Byte Latent Transformer (BLT), byte-level LLM architecture, matches tokenization-based LLM performance > BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. >…
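A minimal sketch of the dynamic patching idea these posts describe: a small byte-level model scores next-byte entropy, and a new patch starts wherever that entropy crosses a threshold, so hard-to-predict regions get shorter patches. The toy entropy function and threshold below are stand-ins, not the released BLT entropy model.

```python
# Hedged sketch of entropy-based dynamic patching (toy stand-ins, not BLT's components).
from typing import Callable, List

def entropy_patches(
    data: bytes,
    next_byte_entropy: Callable[[bytes, int], float],
    threshold: float,
) -> List[bytes]:
    """Split a byte string into patches, starting a new patch where next-byte entropy is high."""
    patches, start = [], 0
    for i in range(1, len(data)):
        # Begin a new patch at position i if the model is uncertain about the byte at i.
        if next_byte_entropy(data, i) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Toy stand-in for the small byte LM: pretend entropy spikes right after a space.
toy_entropy = lambda data, i: 4.0 if data[i - 1:i] == b" " else 1.0
print(entropy_patches(b"patches scale better than tokens", toy_entropy, threshold=3.0))
# -> [b'patches ', b'scale ', b'better ', b'than ', b'tokens']
```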
Been waiting for this one, a strong step in removing tokenization from LLMs. Congrats to the team!
This could be one of the biggest AI papers of the year, if it really works as well as they report in this paper. It's hard to overstate how impactful ending the tyranny of tokenizers would be for AI. I'm very eager to see the open source implementations and replications.
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Patch… Code 🛠️ github.com/facebookresear…
Llamas ... Tokenizer Free?! USING ENTROPY STEERING?!?!! sometimes the universe conspires to make a paper just for you and it feels wonderful when it happens.
I can rest now🥲 I have gathered all the infinity stones. thanks @karpathy
We scaled up Megabyte and ended up with a BLT! A pure byte-level model, it has a steeper scaling law than BPE-based models. With up to 8B parameters, BLT matches Llama 3 on general NLP tasks, plus it excels on long-tail data and can manipulate substrings more effectively. The…