Emily Dinan
@em_dinan
eng @ Meta GenAI (she/her)
check out our new work on merging expert models, Branch-Train-Stitch 🪡🪡🪡 had so much fun working on this with the incredible @IreneZhang30 and team!!! 😊
✨New Preprint✨We introduce 𝐁𝐫𝐚𝐧𝐜𝐡-𝐓𝐫𝐚𝐢𝐧-𝐒𝐭𝐢𝐭𝐜𝐡 (𝐁𝐓𝐒), an efficient & flexible method for stitching together independently pretrained LLM experts (i.e. code, math) into a single, capable generalist model. Key Takeaways: ✅BTS achieves the best average…
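For a sense of the general idea, here's a minimal, hypothetical sketch of one way to combine logits from independently pretrained expert LMs with a small learned gate. This is illustrative only and is not the actual BTS architecture; all names and shapes are made up, see the preprint for the real method.

```python
# Hypothetical sketch (NOT the BTS method): mix next-token logits from
# frozen, independently pretrained expert LMs using a small trainable gate.
import torch
import torch.nn as nn

class NaiveExpertMixer(nn.Module):
    def __init__(self, experts: nn.ModuleList, hidden_dim: int):
        super().__init__()
        self.experts = experts                            # frozen expert LMs (e.g. code, math)
        self.gate = nn.Linear(hidden_dim, len(experts))   # trainable router over experts

    def forward(self, input_ids: torch.Tensor, shared_hidden: torch.Tensor) -> torch.Tensor:
        # Simplification: assume each expert maps input_ids -> (batch, vocab) logits.
        expert_logits = torch.stack(
            [expert(input_ids) for expert in self.experts], dim=0
        )                                                  # (n_experts, batch, vocab)
        # Gate weights decide how much each expert contributes per example.
        weights = torch.softmax(self.gate(shared_hidden), dim=-1)   # (batch, n_experts)
        weights = weights.permute(1, 0).unsqueeze(-1)                # (n_experts, batch, 1)
        return (weights * expert_logits).sum(dim=0)                  # (batch, vocab)
```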
llama1: 2048 GPUs
llama2: 4096 GPUs
llama3: 16384 GPUs
llama4: .....
You see where we are headed! Gonna be an insane ride!
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context…
We’ve also updated our license to allow developers to use the outputs from Llama models — including 405B — to improve other models for the first time. We’re excited about how this will enable new advancements in the field through synthetic data generation and model distillation…
as my other amazing teammates have already shared, check out our llama 3.1 paper here! lots of fun tidbits about the highs, lows, sweat, and tears that go into training LLMs lol ... onto llama 4!!! ai.meta.com/research/publi…
Really proud of the work that went into making this possible, hope this helps the community push the field forward. Also, in case anyone missed it, there's a sneak peek of what's to come next at the end of the blog post ai.meta.com/blog/meta-llam…
It’s here! Meet Llama 3, our latest generation of models that is setting a new standard for state-of-the-art performance and efficiency for openly available LLMs. Key highlights
• 8B and 70B parameter openly available pre-trained and fine-tuned models.
• Trained on more…
and we have more cookin' 🦙🦙🦙
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3…
Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…
Happy to be part of this incredible journey of Llama3 and to share the best open weight 8B and 70B models! Our largest 400B+ model is still cooking but we are providing a sneak peek into how it is trending! Check more details here ai.meta.com/blog/meta-llam…
My team launched a suite of models that enable near real-time and expressive AI translations last week. This is a little personal reflection/thank-you note to everyone who contributed to it. (1/n)
Last week we introduced SeamlessStreaming — an AI translation model that can deliver streaming translation with <2 seconds of latency. One of the core pieces of our recent Seamless Communication announcement. More details on this work ➡️ bit.ly/4165c9z
We're hiring research interns for Meta's GenAI language research team: metacareers.com/jobs/841036594… Email me if you're interested in working on language models together 🥳
Excited to be back at Meta today, working on some llamas with the GenAI team 🦙🥳
Today we’re announcing two new updates in our computer vision work — a new, expanded license for our DINOv2 model and the release of FACET, a comprehensive new benchmark dataset to help evaluate and improve fairness in vision models. More details ➡️ bit.ly/3L35E1U 🧵
🚨New Paper 🚨 Self-Alignment with Instruction Backtranslation
- New method auto-labels web text with instructions & curates high-quality ones for fine-tuning
- Our model Humpback 🐋 outperforms LIMA, Claude, Guanaco, davinci-003 & Falcon-Inst
arxiv.org/abs/2308.06259 (1/4)🧵
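As a rough illustration of the loop the tweet describes (auto-label unlabeled web text with instructions, keep only the high-quality pairs, then fine-tune on them), here's a hypothetical sketch; the helper callables and the threshold are placeholders for illustration, not the paper's code.

```python
# Hypothetical sketch of an instruction-backtranslation loop:
# 1) a "backward" model generates an instruction for a piece of web text,
# 2) a seed model scores the (instruction, response) pair,
# 3) only high-scoring pairs are kept for fine-tuning.
from typing import Callable, List, Tuple

def backtranslate_and_curate(
    web_texts: List[str],
    generate_instruction: Callable[[str], str],   # backward model: response -> instruction
    score_pair: Callable[[str, str], float],      # seed model rates the pair's quality
    threshold: float = 4.0,                       # placeholder cutoff for "high quality"
) -> List[Tuple[str, str]]:
    curated = []
    for response in web_texts:
        instruction = generate_instruction(response)        # auto-label the web text
        if score_pair(instruction, response) >= threshold:   # curate: keep good pairs only
            curated.append((instruction, response))
    return curated

# Trivial stand-ins below; a real run would call LLMs for both steps.
pairs = backtranslate_and_curate(
    ["A step-by-step recipe for sourdough bread ..."],
    generate_instruction=lambda r: "Write a recipe for sourdough bread.",
    score_pair=lambda i, r: 5.0,
)
```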
It's been six months since ChatGPT's emergence. In this paper, we critically examine the social impact of its proliferation and advocate for AI development to go beyond human-centeredness to include a social-centered dimension. Paper: arxiv.org/abs/2306.00227 (1/N)
Are you...
- Working on something new in the generative Natural Language Processing (NLP) space?
- Excited about learning what others are doing in this domain?
Submit to TEACH! We've updated our submission deadline to May 24th! #icml2023 #teach2023
📢Call for Papers 📢 The 2nd Workshop on Neural Conversational AI: What’s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable & Human-centric) will be hosted at @icmlconf in July 2023 in Honolulu 🤩 📌 sites.google.com/view/teach-icm… #ICML2023 #TEACH2023 #ML #NLProc
Congrats to @em_dinan, @ShoYaida, and @suchenzang on their new work arxiv.org/abs/2304.02034, applying the effective theory blueprint of PDLT to the transformer architecture!
From theory to practice, here's our latest attempt to bring the two sides closer together: arxiv.org/abs/2304.02034 This was also one of the most delightful collabs I've had the chance to be a part of. Thanks @ShoYaida & @em_dinan for being such wonderful humans to work with! 😍