Anshul Kundaje (@anshulkundaje; anshulkundaje@bluesky)
Federally funded academic research is the innovation engine of the US economy. Reform is welcome. Destruction will have long-term consequences.
@sara_mostafavi (@genentech) & I (@Stanford) are excited to announce co-advised postdoc positions for candidates with deep expertise in ML for bio (especially sequence-to-function models, causal perturbational models & single-cell models). See details below. Pls RT 1/

Happy to share that our work from the @nmancuso_ lab is out in @NatureGenet! We developed SuShiE, a multi-ancestry fine-mapping method for molecular traits. doi.org/10.1038/s41588…
We're excited to release 𝐦𝐑𝐍𝐀𝐁𝐞𝐧𝐜𝐡, a new benchmark suite for mRNA biology containing 10 diverse datasets with 59 prediction tasks, evaluating 18 foundation model families. Paper: biorxiv.org/content/10.110… GitHub: github.com/morrislab/mRNA… Blog: blank.bio/post/mrnabench
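For a concrete sense of what benchmarking a frozen foundation model on such tasks looks like, here is a minimal linear-probe sketch; the function names and the toy encoder are illustrative assumptions, not the actual mRNABench API.

```python
# Hypothetical linear-probe evaluation; names are assumptions, not mRNABench's API.
from typing import Callable
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def probe_embedder(embed: Callable[[list[str]], np.ndarray],
                   sequences: list[str], labels: np.ndarray) -> float:
    """Score a frozen model's mRNA embeddings on one regression task."""
    X = embed(sequences)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    return r2_score(y_te, Ridge().fit(X_tr, y_tr).predict(X_te))

def toy_embed(seqs: list[str]) -> np.ndarray:
    """Toy stand-in for a foundation-model encoder: nucleotide composition."""
    return np.array([[s.count(b) / max(len(s), 1) for b in "ACGU"] for s in seqs])

rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGU"), size=50)) for _ in range(200)]
y = np.array([s.count("G") + s.count("C") for s in seqs], dtype=float)
print(probe_embedder(toy_embed, seqs, y))  # composition task: near-perfect R^2
```

The same probe can be swapped across embedders, which is the point of a benchmark suite: the task and metric stay fixed while only the frozen encoder changes.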
When AI drives your data generation, learning is more efficient and effective. Take a deep dive into VISTA:
The biggest challenge for AI in biology isn't just models, it's the data used to train them. Standard biological data isn't built for AI. To unlock generative AI for drug discovery, we must rethink how we generate and capture data. 1/
🚀 Join CongLab @Stanford! We're hiring postdocs to build lab-in-the-loop, self-evolving AI agents and open benchmarks to design, test & learn, advancing safer gene & cell therapies. Build on CRISPR-GPT, the RNAGenesis model, and Genome-Bench for innovative medicines. #PostdocJobs
I'm excited to share work on a research direction my team has been advancing: connecting machine-learning-derived genetic variant embeddings to downstream tasks in human genetics. This work was led by the amazing @divyanshi91! biorxiv.org/content/10.110…
The same administration that promotes Skittles ingredient changes as major victories passed legislation projected to push 17 million off Medicaid and withdrew from WHO pandemic response. Candy ingredients get celebrated. Healthcare gets dismantled. open.substack.com/pub/jakescottm…
Back in grad school, when I realized how the “marketplace of ideas” actually works, it felt like I’d found the cheat codes to a research career. Today, this is the most important stuff I teach students, more than anything related to the substance of our research. A quick…
Check out Bacformer 🦠, a foundation model for bacterial genomics! Led by the fantastic @wiatrak_maciej
💥 Excited to introduce Bacformer 🦠 - the first foundation model for bacterial genomics. Bacformer represents genomes as sequences of ordered proteins, learning the “grammar” of how genes are arranged, interact and evolve. Preprint 📝: biorxiv.org/content/10.110… 🧵 1/n
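As a rough illustration of that "genome as an ordered sequence of proteins" representation, here is a minimal sketch, assuming each protein is first embedded by a frozen protein language model; the dimensions and architecture below are illustrative assumptions, not the actual Bacformer implementation.

```python
# Minimal sketch (assumptions throughout, not the actual Bacformer code):
# a genome is a sequence of protein embeddings in gene order, and a
# transformer contextualizes each protein by its genomic neighborhood.
import torch
import torch.nn as nn

class GenomeAsProteins(nn.Module):
    def __init__(self, protein_dim=480, model_dim=512, n_layers=4, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(protein_dim, model_dim)  # protein LM space -> model space
        self.pos = nn.Embedding(4096, model_dim)       # gene position along the genome
        layer = nn.TransformerEncoderLayer(model_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, protein_embs):                   # (batch, n_genes, protein_dim)
        pos_ids = torch.arange(protein_embs.size(1), device=protein_embs.device)
        h = self.proj(protein_embs) + self.pos(pos_ids)
        return self.encoder(h)                         # one contextual vector per gene

# protein_embs would come from a frozen protein language model (an assumption),
# one vector per protein, ordered by position along the chromosome.
contextual = GenomeAsProteins()(torch.randn(1, 128, 480))  # -> (1, 128, 512)
```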
With the history of this guy and xAI, there is no chance I'd recommend any parent touch "Baby Grok" with a million-foot pole.
We’re going to make Baby Grok @xAI, an app dedicated to kid-friendly content
I assume you refer to Figure 4 in your manuscript? i) You do not seem to evaluate the actual representations learned by the various models in your comparison - you removed the top 5% of activations and filtered/preprocessed the representations (see Appendix C). Crucially, for…
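For concreteness, a sketch of the kind of filtering in question (an illustrative reconstruction, not the manuscript's actual preprocessing): once the top 5% of activations are zeroed out, any downstream probe is evaluating the filtered features, not the representations the models learned.

```python
# Illustrative reconstruction of the criticized preprocessing; the exact
# filtering in the manuscript's Appendix C may differ.
import numpy as np

def drop_top_activations(reps: np.ndarray, frac: float = 0.05) -> np.ndarray:
    """Zero out the largest-magnitude `frac` of activations before probing."""
    cutoff = np.quantile(np.abs(reps), 1.0 - frac)
    filtered = reps.copy()
    filtered[np.abs(filtered) > cutoff] = 0.0
    return filtered
```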
It's not surprising to me that the correlation structures in similarly collected data would be similar - all language models essentially use the same approach to data collection (large sets of internet-derived texts). Shared correlation structures do not imply that the…
If you know the causal graph, then yes, that could be a strategy. But it's often a chicken-and-egg situation in that we lack the causal model and we want to use models to discover it (at least in biomedicine) -- if we already had the causal model, we wouldn't need to use AI to find…
This example also illustrates very well why getting to a causal model is so exceedingly difficult (with observational data and the in-vogue breed of frequentist methods). The causal model would actually look worse by the metrics, as it cannot exploit spurious correlations.…
A model trained with a frequentist objective on i.i.d. observational train and validation data almost surely converges to a non-causal model of the underlying data-generating process. That is because using shortcuts/spurious correlations leads to a lower loss than not,…
Models trained on observational RNA data with no notion of time have no chance of learning models that have strong guarantees of biological causality. They are learning correlations.
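A minimal synthetic simulation of this point, assuming linear models and a hidden confounder (all numbers invented for illustration): the shortcut model wins on i.i.d. held-out loss, but falls apart once an intervention breaks the spurious link.

```python
# Synthetic illustration: minimizing i.i.d. loss rewards the spurious feature.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000

U = rng.normal(size=n)            # hidden confounder
X = rng.normal(size=n)            # causal feature, true effect 0.5
S = U + 0.1 * rng.normal(size=n)  # spurious proxy for the confounder
Y = 0.5 * X + 2.0 * U + 0.1 * rng.normal(size=n)

causal = LinearRegression().fit(X[:, None], Y)     # uses X only
shortcut = LinearRegression().fit(np.c_[X, S], Y)  # also uses the shortcut S

# Held-out i.i.d. data: the shortcut model wins decisively on loss...
U2, X2 = rng.normal(size=n), rng.normal(size=n)
S2 = U2 + 0.1 * rng.normal(size=n)
Y2 = 0.5 * X2 + 2.0 * U2 + 0.1 * rng.normal(size=n)
print("iid MSE, causal:  ", np.mean((causal.predict(X2[:, None]) - Y2) ** 2))     # ~4.0
print("iid MSE, shortcut:", np.mean((shortcut.predict(np.c_[X2, S2]) - Y2) ** 2))  # ~0.05

# ...but under an intervention that severs the S-U link (think: a
# perturbation experiment), the shortcut model collapses.
S3 = rng.normal(size=n)  # intervened: S no longer tracks the confounder
print("intervened MSE, causal:  ", np.mean((causal.predict(X2[:, None]) - Y2) ** 2))     # ~4.0
print("intervened MSE, shortcut:", np.mean((shortcut.predict(np.c_[X2, S3]) - Y2) ** 2))  # ~7.9
```

The causal model's loss is unchanged by the intervention; the shortcut model's advantage existed only under the observational distribution.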
BTW, this work was originally presented at the ICML Workshop on Comp Bio in 2023. That was 2 years ago!!! This kind of critical but fair view should not come to light this late, especially when so many gLMs are being released every week!
Our work on "Evaluating the representational power of pre-trained DNA language models for regulatory genomics" led by @AmberZqt with help from @NiraliSomia & @stevenyuyy is finally published in Genome Biology! Check it out! genomebiology.biomedcentral.com/articles/10.11…
(1/2) Thrilled to share that I’m joining @UWMadison_BME as a tenure-track Assistant Professor starting today! Endlessly grateful to my mentors, friends, and family - I wouldn’t be here without your support 🙏 Excited for what lies ahead! #NewFaculty #UWBadgers
A Complete Telomere-to-Telomere Diploid Reference Genome for Indian Population biorxiv.org/content/10.110… #biorxiv_genomic
Which of these images is not like the others, and why does it matter for cancer research? That fourth transform is what real tissues need but most spatial methods can't handle. Enter SAME, our algorithm for integrating multimodal spatial omics across near-serial sections.