Naman Goyal
@NamanGoyal21
Research @thinkymachines, previously pretraining LLAMA at GenAI Meta
The past 4 months have been among the most rewarding of my career—filled with learning and building alongside some of the most talented ML research and infra folks I know. I truly believe magic happens when driven, talented people are aligned on a shared mission.
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
🚨🔥 CUTLASS 4.0 is released 🔥🚨 pip install nvidia-cutlass-dsl. 4.0 marks a major shift for CUTLASS: towards native GPU programming in Python. docs.nvidia.com/cutlass/media/…
Congrats to my amazing friends and ex-colleagues on a killer release! Pushing the frontier of open source models pushes the field forward collectively!
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model…
This is what we have been up to, and much more! Come join us!!! 🚀
Great to visit one of our data centers where we're training Llama 4 models on a cluster bigger than 100K H100s! So proud of the incredible work we’re doing to advance our products, the AI field and the open source community. We’re hiring top researchers to work on reasoning,…
Thousands of GPUs isn't cool. You know what's cool? Thousands of hosts.
Does style matter over substance in Arena? Can models "game" human preference through lengthy and well-formatted responses? Today, we're launching style control in our regression model for Chatbot Arena — our first step in separating the impact of style from substance in…
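For intuition, here is a minimal sketch of the general idea behind style control in a pairwise-preference (Bradley-Terry style) regression: include style covariates such as response length and formatting alongside model identity, so the model-strength coefficients are estimated with style held constant. The features and synthetic data below are illustrative assumptions, not Chatbot Arena's actual implementation.

```python
# Sketch: pairwise-preference regression with style covariates.
# Everything here (features, data, sizes) is illustrative, not the
# real Chatbot Arena pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_models, n_battles = 4, 2000

# Synthetic battles: model_a vs model_b, each response with two
# standardized style features (e.g. length, markdown density).
model_a = rng.integers(0, n_models, n_battles)
model_b = rng.integers(0, n_models, n_battles)
style_a = rng.normal(size=(n_battles, 2))
style_b = rng.normal(size=(n_battles, 2))

# Ground truth: "substance" strengths plus a style bias that inflates
# wins for longer, more formatted answers.
true_strength = np.array([0.0, 0.5, 1.0, 1.5])
true_style_bias = np.array([0.8, 0.3])
logits = (true_strength[model_a] - true_strength[model_b]
          + (style_a - style_b) @ true_style_bias)
wins_a = rng.random(n_battles) < 1 / (1 + np.exp(-logits))

# Design matrix: (one-hot model_a minus one-hot model_b) plus style
# differences, so style acts as a control variable.
X_models = np.eye(n_models)[model_a] - np.eye(n_models)[model_b]
X = np.hstack([X_models, style_a - style_b])

clf = LogisticRegression(fit_intercept=False).fit(X, wins_a)
strength_hat = clf.coef_[0, :n_models]  # style-controlled strengths
style_hat = clf.coef_[0, n_models:]     # estimated style effects
print("strengths:", np.round(strength_hat - strength_hat[0], 2))
print("style effects:", np.round(style_hat, 2))
```

With style differences included as covariates, the per-model coefficients approximate the "substance" ranking even when verbose, well-formatted answers win more often.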
llama1: 2048 GPUs
llama2: 4096 GPUs
llama3: 16384 GPUs
llama4: .....
You see where we are headed! Gonna be an insane ride!
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context…
Very excited to release the technical report and the model weights for all 3 sizes of llama3 models. It has been an exciting past 12 months. Really looking forward to the incredible research this will unlock from the community. Now on to llama4 🚀
Pretty cool! Nice work; really happy about the amazing research that open-sourcing base model weights can enable.
The @AiEleuther interpretability team is releasing a set of top-k sparse autoencoders for every layer of Llama 3 8B: huggingface.co/EleutherAI/sae… We are working on an automated pipeline to explain the SAE features, and will start training SAEs for the 70B model shortly.
This is extremely exciting; looking forward to the impact it will have on biology. The team behind EvolutionaryScale is one of the most talented and passionate sets of people I have interacted with.
We have trained ESM3 and we're excited to introduce EvolutionaryScale. ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins. Read more: evolutionaryscale.ai/blog/esm3-rele…
Got curious about this. It suggests, in the average case, a model trained with ~1e6 × GPT-4 compute (about 3e31 FLOPs) by 2028. At 2500 BF16 TFLOPS and 1.2 kW per B100, that works out to ~456 GW of sustained power to train in 6 months, which afaik is roughly the United States' entire electricity consumption in 2023 expressed as average power.
AGI by 2027 is strikingly plausible. That doesn’t require believing in sci-fi; it just requires believing in straight lines on a graph.
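A quick back-of-envelope version of the estimate above, treating the tweet's figures (3e31 total FLOPs, 2500 BF16 TFLOPS and 1.2 kW per B100, a 6-month run, perfect utilization) as assumptions. With these exact inputs the sketch lands somewhat above ~456 GW; the number swings by 2x or more depending on assumed run length and per-GPU throughput, so read it as an order-of-magnitude check rather than a precise figure.

```python
# Order-of-magnitude sketch of the training-power estimate. Inputs are
# the tweet's stated assumptions, not official specs; real runs would
# also need utilization, overhead, and datacenter PUE factored in.
SECONDS_PER_MONTH = 30 * 24 * 3600

def avg_training_power_gw(total_flops, flops_per_gpu_per_s, watts_per_gpu, months):
    """Average power (GW) needed to finish total_flops in the given time,
    assuming perfect utilization."""
    seconds = months * SECONDS_PER_MONTH
    num_gpus = (total_flops / seconds) / flops_per_gpu_per_s
    return num_gpus * watts_per_gpu / 1e9  # W -> GW

# ~1e6 x GPT-4 compute (~3e31 FLOPs), B100 at 2500 BF16 TFLOPS and 1.2 kW,
# 6-month run: lands in the hundreds of gigawatts, the same order as the
# US's roughly 450-480 GW average rate of electricity consumption in 2023.
print(f"{avg_training_power_gw(3e31, 2500e12, 1200, 6):.0f} GW")
```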
Llama 3 is out in 8B and 70B sizes! (400B still training) Congrats to the @AIatMeta team! ai.meta.com/blog/meta-llam…
Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…
Really proud of the work that went into making this possible; hope this helps the community push the field forward. Also, in case anyone missed it, there's a sneak peek of what's to come next at the end of the blog post ai.meta.com/blog/meta-llam…
It’s here! Meet Llama 3, our latest generation of models that is setting a new standard for state-of-the-art performance and efficiency for openly available LLMs. Key highlights • 8B and 70B parameter openly available pre-trained and fine-tuned models. • Trained on more…
Excited to share Emu Video, for high quality video generation! Our factorized {text}-to-image generation followed by {image, text}-to-video generation approach outperforms all prior work & commercial solutions in human evals. Demo + blog + paper: emu-video.metademolab.com #emuvideo
Finished 30/30 radiation therapy sessions today. The past 3-4 months have been one of the most challenging parts of my life. Recovery from surgery and radiation therapy was quite physically and mentally challenging. With due respect, Cancer, please stay away from me from now on.
I’m excited to release our most recent work, CM3Leon (pronounced chameleon), setting a new SOTA FID of 4.88 on text-to-image generation! ai.meta.com/research/publi…
The False Promise of Imitating Proprietary LLMs: Open-sourced LLMs are adept at mimicking ChatGPT’s style but not its factuality. There exists a substantial capabilities gap, which requires a better base LM. arxiv.org/abs/2305.15717