Mrinal Mathur
@bobthemaster
Research Engineer @Google |@BytedanceTalk | @Amazon | @Apple | @CenterTrends | @ARM
Evaluating robotic foundation models is really hard — everyone has different robots, tasks, etc. We are releasing RoboArena as a step toward a global network of decentralized evaluations, where policies can compete head to head on evals in the real world at many institutions!
We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
Wild paper. They prove (!!) that a transformer block (Attn + MLP) running on a prompt outputs the same logits as the same block with no prompt, provided the MLP weights are updated:
W′ = W + ΔW
with the update computed from the attention latents:
ΔW = (W·Δa) × (A(x)ᵀ / ‖A(x)‖²)
where, given the prompt C:
Δa = A(C, x) − A(x)
Fucking fine-tuning.
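A quick numpy sketch to check the algebra behind that identity (the attention outputs below are random stand-ins for A(x) and A(C, x), not a real transformer):

```python
# Minimal numpy check of the rank-1 update identity above:
# W @ A(C, x) == (W + dW) @ A(x), with dW = (W·Δa) A(x)ᵀ / ‖A(x)‖².
import numpy as np

rng = np.random.default_rng(0)
d = 8

W = rng.normal(size=(d, d))       # MLP weight acting on the attention output
A_x = rng.normal(size=d)          # A(x): attention output without the prompt
A_cx = rng.normal(size=d)         # A(C, x): attention output with the prompt C

da = A_cx - A_x                                  # Δa = A(C, x) − A(x)
dW = np.outer(W @ da, A_x) / (A_x @ A_x)         # ΔW = (W·Δa) A(x)ᵀ / ‖A(x)‖²

# The prompted forward pass equals the prompt-free pass through the patched weights.
assert np.allclose(W @ A_cx, (W + dW) @ A_x)
print("rank-1 patch reproduces the prompted output")
```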
🪆 Matryoshka is extremely general & applicable to every component in our modern ML/DL stack. It can't get more fundamental than 🪆 in bit space to enable elastic quantization! Drop by the poster and say hi to Puranjay (on behalf of @pranavn1008 @JeffDean @jainprateek_ & me).
Hi, I'll be presenting Matryoshka Quantization (arxiv.org/abs/2502.06786) on 16th July at #ICML2025 📍East Exhibition Hall A-B #3606 ⏲️ 11 AM - 1:30 PM
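For intuition on what "🪆 in bit space" buys you, here is a toy sketch of nested bit-width quantization (my simplification for illustration, not the MatQuant training recipe from the paper): the high-order bits of the int8 codes double as an int4 quantization of the same weights.

```python
# Toy asymmetric-uniform quantization where slicing the most significant bits
# of an int8 code yields a nested, coarser int4 model of the same weights.
import numpy as np

def quantize(w, bits):
    """Uniform asymmetric quantization of a float array to `bits` bits."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((w - lo) / scale).astype(np.int32)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=1024)

q8, scale8, lo = quantize(w, bits=8)

# Slice the top 4 bits of the int8 codes: same weights, coarser grid.
q4 = q8 >> 4
w8 = dequantize(q8, scale8, lo)
w4 = dequantize(q4, scale8 * 16, lo)   # step size is 16x larger for int4

print("int8 reconstruction MSE:", np.mean((w - w8) ** 2))
print("nested int4 reconstruction MSE:", np.mean((w - w4) ** 2))
```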
For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
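As a concrete reference point for the RoPE analogy, here is a minimal rotary-embedding sketch for the ordinary 1-D token-index case (standard RoPE only; the camera-aware relative PE itself is in the paper and code linked above):

```python
# Minimal RoPE sketch: queries/keys are rotated by position-dependent angles,
# so the attention logit q_i · k_j depends only on the relative offset i - j.
# The camera-aware PE generalizes this from scalar indices to camera poses.
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq, dim), dim even."""
    seq, dim = x.shape
    freqs = base ** (-np.arange(0, dim, 2) / dim)          # (dim/2,)
    angles = positions[:, None] * freqs[None, :]           # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))

# Shifting both positions by the same amount leaves the logit unchanged:
# only the relative offset matters.
logit_a = rope(q, np.array([5.0])) @ rope(k, np.array([2.0])).T
logit_b = rope(q, np.array([105.0])) @ rope(k, np.array([102.0])).T
assert np.allclose(logit_a, logit_b)
```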
What I shared with research interns at @lossfunk on how to go about their research projects.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
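A toy sketch of the dynamic-chunking idea (an illustrative simplification, not the H-Net architecture): a scorer marks chunk boundaries over raw bytes, and the byte embeddings are pooled within each chunk before the main network runs.

```python
# Toy dynamic chunking: boundary scores decide where byte-level embeddings
# are pooled into chunks. In H-Net the scores come from a learned module and
# are trained end to end; here they are random just to show the mechanics.
import numpy as np

rng = np.random.default_rng(0)

def chunk_and_pool(byte_embs, boundary_scores, threshold=0.5):
    """byte_embs: (seq, dim); boundary_scores: (seq,) in [0, 1]."""
    chunks, current = [], []
    for emb, score in zip(byte_embs, boundary_scores):
        current.append(emb)
        if score > threshold:              # end the current chunk here
            chunks.append(np.mean(current, axis=0))
            current = []
    if current:                            # flush the trailing partial chunk
        chunks.append(np.mean(current, axis=0))
    return np.stack(chunks)                # (num_chunks, dim), num_chunks <= seq

seq, dim = 16, 4
byte_embs = rng.normal(size=(seq, dim))
boundary_scores = rng.uniform(size=seq)

chunk_embs = chunk_and_pool(byte_embs, boundary_scores)
print(f"{seq} bytes -> {len(chunk_embs)} chunks")
```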
Personalization methods for LLMs often rely on extensive user history. We introduce Curiosity-driven User-modeling Reward as Intrinsic Objective (CURIO) to encourage actively learning about the user within multi-turn dialogs. 📜 arxiv.org/abs/2504.03206 🌎 sites.google.com/cs.washington.…
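A hedged sketch of what a curiosity-driven user-modeling reward can look like (the function names and toy user model below are made up for illustration; CURIO's actual objective is defined in the paper):

```python
# Curiosity-style intrinsic reward for user modeling: reward the assistant for
# turns after which a separate user model predicts the user better.
from typing import Callable, Sequence

def intrinsic_reward(
    user_model_loss: Callable[[Sequence[str]], float],
    dialog_before: Sequence[str],
    dialog_after: Sequence[str],
) -> float:
    """Reward = reduction in the user model's loss caused by the new turn."""
    return user_model_loss(dialog_before) - user_model_loss(dialog_after)

# Toy usage: a fake user model that counts which (hypothetical) user
# attributes the dialog has surfaced so far.
ATTRIBUTES = ("hiking", "vegetarian")

def toy_user_model_loss(dialog: Sequence[str]) -> float:
    known = sum(any(attr in turn for turn in dialog) for attr in ATTRIBUTES)
    return float(len(ATTRIBUTES) - known)

before = ["user: hi", "assistant: what do you like to do on weekends?"]
after = before + ["user: mostly hiking, and I cook a lot of vegetarian food"]
print(intrinsic_reward(toy_user_model_loss, before, after))   # -> 2.0
```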
new preprint!! exploring overconfidence😎 and change-of-mind🤔 in llms. neat thing about llms is you can reset their state after querying them, then query them differently without creating a memory of their initial decision -- enabling cogsci-style studies not possible in humans 🧑🔬
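A minimal sketch of that reset protocol, assuming a stateless chat API (`ask_llm` below is a hypothetical stand-in for a single request, not a real client):

```python
# Because each chat request is stateless, the model can be challenged about a
# decision it has no memory of making -- something impossible with humans.
def ask_llm(messages):
    # Stand-in for one stateless chat-completion request; swap in a real client.
    return "yes, confidence 80"

question = "Is statement X true? Answer yes or no, with a confidence 0-100."

# Condition A: the model sees its own first answer before being challenged.
first = ask_llm([{"role": "user", "content": question}])
with_memory = ask_llm([
    {"role": "user", "content": question},
    {"role": "assistant", "content": first},
    {"role": "user", "content": "Are you sure? Many experts disagree."},
])

# Condition B: fresh context -- the pushback arrives with no memory of the
# initial decision, isolating the effect of the challenge itself.
without_memory = ask_llm([
    {"role": "user", "content": question + " Many experts disagree."},
])
```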
BadMephisto also gave this trick
BadMephisto on how he became great at speedcubing, AI research and teaching.
I have a new favourite blog site
It is insane how underrated these blogs are. Man made an interactive visualization for different kinds of attention mechanisms (he has interactive visualizations for RNNs, LSTMs, CNNs, and so much more)
went through this, but don't just skim over it, I guess. every question is a good research paper and worth a read.
Since it's summer, and more or less internship and tech interview season, I made all 30 chapters of my Machine Learning Q and AI book freely available for the summer: sebastianraschka.com/books/ml-q-and… Hope it’s helpful! Happy reading, and good luck if you are interviewing!
i love these implement-from-scratch notebooks from Sebastian Raschka @rasbt, and he came back with a new one. this one shows how to build Qwen 3 base and reasoning models from the ground up. amazing!
New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause real harm, we find that AI models from multiple providers attempt to blackmail a (fictional) user to avoid being shut down.
BREAKING: MIT just completed the first brain scan study of ChatGPT users & the results are terrifying. Turns out, AI isn't making us more productive. It's making us cognitively bankrupt. Here's what 4 months of data revealed: (hint: we've been measuring productivity all wrong)
bro sh*t just got so real. Claude Opus published a response paper to Apple’s paper, criticizing their experimental design for putting models under token-limit constraints and having them solve unsolvable problems.
Fixing horizon scalability in off-policy RL is tremendously important. The benchmarks we overfit to have mostly ignored this axis.
Q-learning is not yet scalable seohong.me/blog/q-learnin… I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
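To make the horizon axis concrete, here is a minimal tabular Q-learning toy on a chain MDP (my own illustration, not an experiment from the post): with 1-step bootstrapped targets, goal reward propagates back roughly one state per backup, and any error in the bootstrap term is reused at every remaining step of the horizon.

```python
# Minimal tabular Q-learning on an N-state chain with uniform random
# (off-policy) behavior. Start at state 0, reward 1 on reaching the last state.
import numpy as np

def q_learn_chain(n_states=15, episodes=300, alpha=0.5, gamma=0.99, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))                      # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            a = int(rng.integers(2))                 # uniform random behavior policy
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # 1-step TD target: r + gamma * max_a' Q(s', a'), no bootstrap at the goal
            target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learn_chain()
print("Q(right) near the start vs. next to the goal:",
      round(Q[0, 1], 3), round(Q[-2, 1], 3))
```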
New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive with or better than human supervision. Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.