Ava Amini
@avapamini
principal researcher @MSFTResearch | AI for biomedicine | instructor @MITDeepLearning | alumna @MIT @Harvard
thrilled to share The Dayhoff Atlas of protein language data and models 🚀 protein biology in the age of AI! aka.ms/dayhoff/prepri… we built + open-sourced the largest natural protein dataset, w/ 3.3 billion seqs, & a first-in-class dataset of structure-based synthetic proteins

The Dayhoff Atlas! Open code. Open weights. Open datasets. Thanks @huggingface for helping to facilitate open science. huggingface.co/collections/mi… @ClementDelangue @julien_c
Our models, code, and data are openly available on GitHub, Zenodo, and Hugging Face. huggingface.co/collections/mi… zenodo.org/records/152652… github.com/microsoft/dayh…
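If you want to kick the tires, a minimal sketch of pulling the release from the Hugging Face Hub is below. The repo ids are placeholders I made up for illustration, not confirmed paths; check the linked collection for the actual model and dataset names.

```python
# Minimal sketch: loading the Dayhoff weights and data from the Hugging Face
# Hub. The repo ids below are HYPOTHETICAL placeholders; see the linked
# collection for the real paths.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

MODEL_ID = "microsoft/Dayhoff-3b"  # placeholder repo id
DATA_ID = "microsoft/GigaRef"      # placeholder repo id

# trust_remote_code is often required for hybrid SSM+transformer checkpoints
# that ship custom modeling code alongside the weights.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Stream the sequence dataset rather than downloading billions of rows.
dataset = load_dataset(DATA_ID, split="train", streaming=True)
for record in dataset.take(3):
    print(record)
```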
**A grand unified theory on what will happen in biotech in the next 10-20 years** the two major forces reshaping industrial biotech in the next decade are: 1. China, 2. AI - and they're critically linked. how? China's low R&D cost basis democratizes execution by providing…
New synthetic and metagenomic data boosted experimental success while popular metrics failed to predict it. Read Ava's thread on the really cool models, analysis, and data resources!
increasing model and data scale increased the fraction of proteins expressed in E. coli, and the highest expression success rate came from augmenting w/ structure-based synthetic data. data quality + diversity bring real gains in real-world protein expression!
🧬 The largest open dataset of natural proteins in the world — 3.3 billion seqs
🧠 A 3-billion-param hybrid SSM+Transformer model
🤗 Fully open-source data + model
biorxiv.org/content/10.110… Congrats to @avapamini + entire team, including @LiquidAI_'s own Kaeli Kaymak-Loveless
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
Why is this not all over my feed all day today?!?! 😁 @avapamini @KevinKaichuang The Dayhoff Atlas: scaling sequence diversity for improved protein generation biorxiv.org/content/10.110…
I was surprised to see that BackboneRef boosts the fraction of Dayhoff-170M pLM generations expressed in E. coli from 27.6% → 51.7% (1.9×) with zero filtering... while common metrics (pLDDT, perplexity) failed to predict wet-lab outcomes (AUROC ≤ 0.57). This quietly re-prioritizes how we…
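That AUROC ≤ 0.57 figure means the metrics barely beat a coin flip at separating expressed from non-expressed designs. For readers who haven't run this kind of check, a minimal sketch of the evaluation is below; the scores and expression labels are synthetic stand-ins generated for illustration, not data from the paper.

```python
# Minimal sketch: does an in-silico metric predict wet-lab expression?
# All values below are SYNTHETIC stand-ins, not results from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Binary wet-lab outcome per designed protein: 1 = expressed in E. coli.
expressed = rng.integers(0, 2, size=200)

# A candidate predictor, e.g. pLDDT; here it is mostly noise with a weak
# correlation to the outcome, mimicking an AUROC in the ~0.55 range.
# (For perplexity, where lower is better, you would negate the score first.)
plddt = 70 + 5 * expressed * rng.random(200) + 10 * rng.random(200)

auroc = roc_auc_score(expressed, plddt)
print(f"AUROC of metric vs. expression: {auroc:.2f}")  # ~0.5-0.6: weak signal
```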
👀 #DayhoffAtlas dropped for #SynBio 👀 3.34B natural 🧬 + 46M structure-guided synthetic protein sequences (from 240k novel backbones), all open-source. A hybrid Mamba-Transformer learns single seqs & MSAs → 51.7% of unfiltered designs express in E. coli 🦠✨…
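For anyone wondering what "hybrid Mamba-Transformer" means structurally, here is a toy PyTorch sketch: a few recurrent SSM-style blocks interleaved with a self-attention layer over residue tokens. The `ToySSMBlock` is a simplified gated linear recurrence written as a stand-in; the actual Dayhoff model uses real Mamba blocks, causal masking, and ~3B parameters, so treat this purely as an illustration of the layer interleaving.

```python
# Toy sketch of a hybrid SSM+attention stack, in the spirit of (but NOT
# identical to) the Dayhoff architecture.
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Gated linear recurrence over the sequence dim (stand-in for Mamba)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.size(1)):             # naive sequential scan, O(L)
            h = self.decay * h + u[:, t]
            states.append(h)
        h = torch.stack(states, dim=1)
        return x + self.out_proj(h * torch.sigmoid(gate))  # gated residual

class HybridLM(nn.Module):
    """SSM blocks interleaved with one attention layer, then a residue head.
    Causal masking is omitted here for brevity."""
    def __init__(self, vocab=33, d_model=128, n_ssm=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.ssm_blocks = nn.ModuleList(ToySSMBlock(d_model) for _ in range(n_ssm))
        self.attn = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                 # tokens: (batch, seq) residue ids
        x = self.embed(tokens)
        for block in self.ssm_blocks:
            x = block(x)
        x = self.attn(x)
        return self.head(x)                    # (batch, seq, vocab) logits

logits = HybridLM()(torch.randint(0, 33, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 33])
```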
To the GPU-poor grad students out there: finding a better predictor of expression is one of the highest-leverage contributions you could make to pLM research. Scale isn't always all you need.
Data diversity, quality & relevance rule over model size any day of the week. Very clever approach of generating synthetic protein sequences from backbone structures to give big boosts to pLMs.
Learning on GigaRef yielded a small increase in the fraction of expressed proteins. Increasing model and dataset scale further improved the expression rate. Augmenting training with structure-based synthetic data from BackboneRef produced the highest expression success rate.
Very cool work on scaling data for protein language modeling, congrats to the team!