Jehanzeb Mirza
@jmie_mirza
i run and do computer vision and play with languages sometimes. postdoc at @MIT_CSAIL
Now accepted at #ICLR2025 under a new title: Can we talk models into seeing the world differently? More details soon.
🚀 GPT-4, Gemini, Qwen, LLaVA: LLMs are stepping into the multi-modal arena with a bang! But let's zoom in on their vision 👁️. Our preprint peels back the layers on a crucial bias in vision models that sets most of them apart from humans: the texture/shape bias 👉 t.ly/jT1-R
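For context, texture/shape bias is usually quantified on cue-conflict images that pair the shape of one class with the texture of another: shape bias is the fraction of shape decisions among predictions that match either cue. A minimal sketch of that metric (hypothetical data, not the preprint's evaluation code):

```python
# Minimal sketch of a cue-conflict shape-bias metric (hypothetical data format,
# not the exact evaluation pipeline of the preprint).
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape decisions among predictions matching either cue."""
    shape_hits = sum(p == s for p, s, t in zip(predictions, shape_labels, texture_labels))
    texture_hits = sum(p == t for p, s, t in zip(predictions, shape_labels, texture_labels) if p != s)
    cue_hits = shape_hits + texture_hits
    return shape_hits / cue_hits if cue_hits else float("nan")

# Example: 3 cue-conflict images (shape class, texture class) and model predictions.
shapes   = ["cat", "car", "dog"]
textures = ["elephant", "clock", "cat"]
preds    = ["cat", "clock", "dog"]   # 2 shape decisions, 1 texture decision
print(shape_bias(preds, shapes, textures))  # -> 0.666...
```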
New preprint! We present a simple alternative to activation steering: KV-cache steering (intervene on the history once instead of continuously intervening on the current activations). This allows, e.g., inducing reasoning in small LLMs by distilling traces from big models. No…
Introducing cache steering – a new method for implicit behavior steering in LLMs Cache steering is a lightweight method for guiding the behavior of language models by applying a single intervention to their KV-cache. We show how it can be used to induce reasoning in small LLMs.
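To make the contrast with activation steering concrete, here is a toy sketch of the idea (random placeholder tensors and assumed shapes, not the authors' implementation): the steering vectors are added once to the cached keys/values of the prompt, and ordinary attention then reads the modified history at every later decoding step.

```python
import torch

# Toy illustration of the idea behind KV-cache steering (not the paper's code):
# instead of adding a steering vector to the hidden state at every generation step,
# a vector is added ONCE to the cached keys/values of the prompt.

torch.manual_seed(0)
n_prompt, n_heads, head_dim = 8, 4, 16

# Cached keys/values for the prompt (what a transformer layer would have stored).
k_cache = torch.randn(n_heads, n_prompt, head_dim)
v_cache = torch.randn(n_heads, n_prompt, head_dim)

# Steering vectors, e.g. distilled from teacher traces that contain reasoning
# (random placeholders here).
k_steer = 0.1 * torch.randn(head_dim)
v_steer = 0.1 * torch.randn(head_dim)

# One-shot intervention on the history: shift the cache at the final prompt position.
k_cache[:, -1, :] += k_steer
v_cache[:, -1, :] += v_steer

# Subsequent decoding reads the steered cache through ordinary attention;
# no further intervention on the current activations is needed.
q_new = torch.randn(n_heads, 1, head_dim)                   # query of the next token
attn = torch.softmax(q_new @ k_cache.transpose(-1, -2) / head_dim**0.5, dim=-1)
out = attn @ v_cache                                         # (n_heads, 1, head_dim)
print(out.shape)
```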
our work on inducing reasoning in small llms is finally on arxiv. without any finetuning or task-specific prompt tuning, we show how to steer the llm outputs in order to induce 'reasoning'. paper: arxiv.org/abs/2507.08799
great news! congrats Assaf. check out our latest COLM work enhancing the long-context abilities of recurrent llms. paper: arxiv.org/abs/2505.07793
OPRM is accepted to #COLM2025! See you in Montreal 🇨🇦 Big thanks to our great collaborators from TAU, MIT, and IBM! #LLM @COLM_conf
Our recent ICCV work tests few-shot localization in these models, and it seems that their understanding of coordinate-based tasks is still lacking (ofc we show a way to improve 😉) IPLOC arxiv.org/abs/2411.13317
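As a side note on what scoring a coordinate-based task looks like in practice, predicted boxes are typically compared to ground truth via intersection-over-union; a minimal sketch (boxes as (x1, y1, x2, y2) tuples with hypothetical values, not the paper's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted box that roughly overlaps the ground truth scores ~0.68;
# a common accuracy criterion counts the prediction as correct if IoU >= 0.5.
print(iou((10, 10, 110, 110), (20, 20, 120, 120)))
```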
Amazing news! Congrats Sivan. This is also my first major last-author publication. I will be in Hawaii later this year to present the paper in person. Project page: sivandoveh.github.io/IPLoc/
IPLOC accepted to ICCV25 ☺️ Thanks to all the people that were part of it 🩷 The idea for this paper came by a lake during a visit to Graz for a talk. It has traveled with me through too many countries and too many wars, and it’s now a complete piece of work.
Landed in Nashville! #CVPR2025 Hit me up if you want to grab a coffee. Also if you are an early runner (~5:30 am), definitely get in touch. Searching for fellow runners to explore the city 😁
This work was a great collaboration with @ItamarZimerman, @jmie_mirza, James Glass, @leokarlin, and @RGiryes Check out the paper and our github repo for more experiments, details and code! Arxiv: arxiv.org/abs/2505.07793 Github: github.com/assafbk/OPRM
New work! 🚨 Recurrent LLMs like Mamba and RWKV can efficiently process millions of tokens, yet still underperform on real-world long-context tasks. What's holding them back? 🤔 And how can a lightweight fix boost their performance by 35% on LongBench? 👇🏼🧵 Github:…
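One plausible shape for such a lightweight fix is chunk-and-select inference: process the long context in pieces so the fixed-size recurrent state never has to compress everything at once, then keep the most confident answer. The toy sketch below illustrates that pattern only; the paper's exact criterion and implementation are in the arXiv/GitHub links in this thread, and `score_chunk` is a hypothetical stand-in for a real model call.

```python
# Toy sketch of a chunk-and-select strategy for long-context inference with a
# fixed-size recurrent state (illustrative only; see the linked paper/repo for
# the actual method).

def split_into_chunks(tokens, chunk_size):
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def score_chunk(chunk, query):
    # Hypothetical stand-in: in practice this would run the recurrent LLM on
    # chunk + query and return (answer, confidence, e.g. mean log-prob).
    overlap = len(set(chunk) & set(query))
    return f"answer from chunk starting with {chunk[0]!r}", overlap

def answer_long_context(tokens, query, chunk_size=4):
    # Process each chunk independently so the recurrent state never has to
    # squeeze the entire context at once, then keep the most confident answer.
    scored = [score_chunk(c, query) for c in split_into_chunks(tokens, chunk_size)]
    return max(scored, key=lambda pair: pair[1])[0]

tokens = ["a", "b", "needle", "c", "d", "e", "f", "g"]
print(answer_long_context(tokens, query=["needle"]))
```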
LiveXiv will be "live" on #ICLR2025 - Friday April 25th 10:00-12:30 Poster #356 @RGiryes @felipemaiapolo @LChoshen @WeiLinCV @jmie_mirza @leokarlin @ArbelleAssaf @SivanDoveh
I am happy to share that LiveXiv was accepted to ICLR 2025 🥳
🚀 Call for Papers – 3rd Workshop on Multi-Modal Foundation Models (MMFM) @CVPR! 🚀 🔍 Topics: Multi-modal learning, vision-language, audio-visual, and more! 📅 Deadline: March 14, 2025 📝 Submit your paper: cmt3.research.microsoft.com/MMFM2025 🌐 More details: sites.google.com/view/mmfm3rdwo…
Learn how to 'talk models into seeing the world differently' through our ICLR 2025 paper. Great work by @PaulGavrikov.
Proud to announce that our paper "Can We Talk Models Into Seeing the World Differently?" was accepted at #ICLR2025 🇸🇬. This marks my last PhD paper, and we are honored that all 4 reviewers recommended acceptance, placing us in the top 6% of all submissions. In our paper, we…
👉 A big thanks to my wonderful collaborators @jmie_mirza Masato Ishii Mengjie Zhao Christian Simon, my PhD supervisors @LinaYao314 @dgonginf , and to Shusuke Takahashi @mittu1204 for hosting me in the team. See you in Singapore! Pre-print: arxiv.org/pdf/2410.00700 @unsw_ai
🎉 Happy to share that our paper “Mining your own secrets: Diffusion Classifier scores for Continual Personalization of Text-to-Image Diffusion Models” has been accepted to #ICLR2025! 👉 The work results from my #Sony internship in the stunning #Tokyo 🗼city w/ @shiqi_yang_147
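For readers unfamiliar with diffusion classifier scores: the general idea (simplified here; `eps_model`, shapes, and the sampling loop are hypothetical placeholders, not the paper's code) is to score each candidate concept by how well the noise prediction conditioned on it recovers the injected noise, with lower error read as stronger evidence for that concept.

```python
import torch

# Simplified sketch of a diffusion-classifier score: for each candidate concept
# embedding c, estimate E_{t, eps}[ ||eps - eps_model(x_t, t, c)||^2 ] and treat
# lower error as higher evidence for that concept. All names/shapes are hypothetical.

def diffusion_classifier_scores(eps_model, x0, concept_embs, alphas_cumprod, n_samples=32):
    scores = []
    for c in concept_embs:
        errs = []
        for _ in range(n_samples):
            t = torch.randint(0, len(alphas_cumprod), (1,))
            a = alphas_cumprod[t]
            eps = torch.randn_like(x0)
            x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps       # forward diffusion step
            errs.append(((eps - eps_model(x_t, t, c)) ** 2).mean())
        scores.append(-torch.stack(errs).mean())             # higher = better fit
    return torch.stack(scores)

# Toy usage with a dummy noise predictor that ignores the conditioning.
dummy_eps_model = lambda x_t, t, c: torch.zeros_like(x_t)
x0 = torch.randn(4, 8, 8)
concepts = [torch.randn(16), torch.randn(16)]
alphas_cumprod = torch.linspace(0.99, 0.01, 1000)
print(diffusion_classifier_scores(dummy_eps_model, x0, concepts, alphas_cumprod))
```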
Thrilled to announce that our work, LiveXiv, has been accepted to #ICLR2025 ! 🌟 Introducing LiveXiv—a challenging, maintainable, and contamination-free scientific multi-modal live dataset, designed to set a new benchmark for Large Multimodal Models (LMMs). 🚀🙌
Introducing LiveXiv, a new, challenging and maintainable scientific multi-modal live dataset Paper: arxiv.org/abs/2410.10783 Github: github.com/NimrodShabtay/… Dataset: huggingface.co/datasets/LiveX…
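For anyone who wants to poke at the benchmark, loading a Hugging Face dataset generally looks like the snippet below; the repository id and split are placeholders (the link above is truncated), so check the GitHub page for the exact name and configuration.

```python
from datasets import load_dataset

# Placeholder repository id and split -- see the links above for the real ones.
ds = load_dataset("LiveXiv/LiveXiv-VQA", split="test")  # hypothetical

for sample in ds.select(range(3)):
    # Field names depend on the dataset configuration; just inspect them here.
    print(sample.keys())
```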
i got 27 emergency review requests (still piling up by the minute) for cvpr 😂 the review quality is going to be interesting this time.