Yunha Hwang

@Micro_Yunha

Building genomic intelligence @tatta_bio, incoming Asst Prof @MITBiology, @MITEECS, @MIT_SCC (fall 2025) http://microyunha.bsky.social

Joined May 2019

1K Following

4K Followers

Pinned

Yunha Hwang@Micro_Yunha · Jun 2

At @tatta_bio, we have been thinking deeply about the sequence-to-function problem. We believe that before AI can power functional prediction, we first need to rethink how we curate, manage, and share sequence data. Here, we share our initial ideas on what we are building next:…

Yunha Hwang Retweeted

Kevin K. Yang 楊凱筌@KevinKaichuang · 14 h

In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.

Yunha Hwang Retweeted

Kyle Tretina, Ph.D.@AllThingsApx · 12 h

👀#DayhoffAtlas dropped for #SynBio:👀 3.34B natural🧬 + 46M structure‑guided synthetic protein sequences (from 240k novel backbones), all open‑source Hybrid Mamba‑Transformer learns single seqs & MSAs → 51.7 % of unfiltered designs express in E. coli🦠✨…

Yunha Hwang Retweeted

Tatta Bio@tatta_bio · Jul 17

🧬 “As life sciences enter the age of AI, real experimental data are more valuable than ever.” — Nature But data infrastructure hasn’t kept up. Open science depends on fixing that. Our take: tatta.bio/blog/o0z8nb07l… Nature: nature.com/articles/s4159…

Yunha Hwang Retweeted

ARIA@ARIA_research · Jul 24

Ages of human history are often defined by the materials we use. But in our latest opportunity space, PD Ivan Jayapurna is asking: what if the next age could instead be defined by our ability to assemble molecules? Dive in + share feedback: link.aria.org.uk/MA-X

Yunha Hwang@Micro_Yunha · Jul 23

So glad to see FROs becoming a part of national policy in the US!

Caleb Watney@calebwatney · Jul 23

👀

Yunha Hwang@Micro_Yunha · Jul 19

For biological data, if you don't have deep expertise in this low value work called data cleaning, u r lacking a fundamental understanding of the idiosyncrasies of the data. Without this knowledge, it is impossible to seriously model data.

Luke Heeney@heeney_luke · Jul 18

Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking

Yunha Hwang@Micro_Yunha · Jul 17

🧬🪦“SRA is the graveyard for sequence data.” Overheard at @Spec__Tech's Nerd Party yesterday.🥳 Sequencing is cheaper than ever, so we generate massive datasets, extract a sliver of publishable insight, and the rest gets buried. It's about time we build a scalable infrastructure…

Yunha Hwang Retweeted

Nature Methods@naturemethods · Jul 14

As life sciences research becomes enmeshed in the age of AI, real experimental data are more valuable than ever. Read more in this month's Editorial. nature.com/articles/s4159…

Yunha Hwang Retweeted

Peter Koo@pkoo562 · Jul 15

gLMs provide promise in learning structure in the genome, but we need to rethink how we either tokenize the genome (and no byte pair encoding isn't the answer either) or come up with a better masking strategy for non-coding genome that is different from other regions (eg coding).

Yunha Hwang@Micro_Yunha · Jul 14

Excited for @AI_for_Science @ NeurIPS 2025!

AI for Science@AI_for_Science · Jul 14

✨ Amazing line up of speakers and panelists: @KulikGroup, @Micro_Yunha, @MicheleCeriotti, @yuqirose, @shoyer, Gurtej Kanwar, @nc_frey, Pratyush Tiwary, @cosmo_shirley, @priyald17. Find out more at @AI_for_Science ai4sciencecommunity.github.io

Yunha Hwang Retweeted

Sukjun (June) Hwang@sukjun_hwang · Jul 11

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

Yunha Hwang Retweeted

Tatta Bio@tatta_bio · Jul 11

“When my sequences are annotated as hypothetical proteins…well, there is really not much I can do!” We’ve been talking to biologists about their sequence analysis pain points. DM or email us — we want your input. Private beta for our next-gen sequence engine opens soon 👀🧬

Yunha Hwang Retweeted

Kexin Huang@KexinHuang5 · Jul 9

🧬 Excited to open-source Biomni! With just a few lines of code, you can now automate biomedical research with AI agent! We are releasing Biomni A1 (agent) + E1 (env) with 150 specialized tools, 59 databases, and 105 software. E1 is our first attempt at curating the bio-agent…

Yunha Hwang@Micro_Yunha · Jul 8

By this time next year, either: 🏆 we'll have engineered PETase enzymes actually work industrially & recycle plastic ... or we'll know that AI-for-proteins is still a bit underbaked 😜 Excited to see what happens!!!! 😬 Spread the word ⬇️

The Align Foundation@Align_Bio · Jul 8

1/4 🚀 Announcing the 2025 Protein Engineering Tournament. This year’s challenge: design PETase enzymes, which degrade the type of plastic in bottles. Can AI-guided protein design help solve the climate crisis? Let’s find out! ⬇️ #AIforBiology #ClimateTech #ProteinEngineering…

Yunha Hwang@Micro_Yunha · Jul 1

Tatta Bio@tatta_bio · Jul 1

We’re opening the waitlist to the first 50 signups! Gaia is evolving: 💡 New capabilities ✨ Improved UI/UX 🧬 Beyond single sequence search Help shape the future of biological data. Join the waitlist 👉 shorturl.at/dKae4 Let’s build this together.

Yunha Hwang@Micro_Yunha · Jun 30

cool paper! we also noticed that autoregressive models seem to learn phylogeny better/more directly than masked language models - curious why is this the case🤔

Yasha Ektefaie@YEktefaie · Jun 27

7/ The result? Most PLMs fail. They’re often beaten by simple baselines like Hamming distance. They may model evolution, but they don't yet reason with it.
