Jiahai Feng

@feng_jiahai

PhD student @berkeley_ai | Previously MIT

Joined June 2022

538 Following

568 Followers

Jiahai Feng Retweeted

Anna Leshinskaya@AnnaLeshinskaya · Jul 17

Thrilled to announce our symposium, Cognitively Inspired Interpretability in Large Neural Networks, at #CogSci2025 featuring @TaylorWWebb, Ellie Pavlick, @feng_jiahai, Gustaw Opielka, Claire Stevenson, and @IbanDlank!

Jiahai Feng Retweeted

Belinda Li@belindazli · Mar 12

Past work has shown that world state is linearly decodable from LMs trained on text and games like Othello. But how do LMs *compute* these states? We investigate state tracking using permutation composition as a model problem, and discover interpretable, controllable procedures🧵

Jiahai Feng Retweeted

Alex Pan@aypan_17 · Dec 13

LLMs have behaviors, beliefs, and reasoning hidden in their activations. What if we could decode them into natural language? We introduce LatentQA: a new way to interact with the inner workings of AI systems. 🧵

Jiahai Feng Retweeted

Jack Merullo@jack_merullo_ · Dec 5

Can we find circuits directly from a model’s params? At Neurips I’m presenting work on understanding how attn heads in LMs communicate by analyzing their weights. We find a lot of interesting things, like a 3D subspace that controls which index in a list to attend to!

Jiahai Feng Retweeted

Megan Wei@MeganJWei · Nov 10

🎶 We can often tell if a song is fast or slow or if a note is high or low. Music generation models are trained on vast amounts of data, but do they pick up music theory concepts like we do? 🎶 Our #ISMIR2024 paper explores this question! arxiv.org/abs/2410.00872 @ISMIRConf

Jiahai Feng Retweeted

Transluce@TransluceAI · Oct 23

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Jiahai Feng@feng_jiahai · Sep 28

Excited to give a talk about interpretability via reverse engineering tmrw at @eccvconf!

Sukrut Rao@sukrutrao · Sep 26

Less than three days to go for the eXCV Workshop at #ECCV2024! Join us on Sunday from 14:00-18:00 in Brown 1 to hear about the state of XAI research from an exciting lineup of speakers! @orussakovsky, @vidal_rene, @sunniesuhyoung, @YGandelsman, @zeynepakata @eccvconf (1/4)

Jiahai Feng Retweeted

Jacob Andreas@jacobandreas · Jul 26, 2024

Some thoughts on how to think about "world models" in language models and beyond: lingo.csail.mit.edu/blog/world_mod…