Saining Xie
@sainingxie
researcher in #deeplearning #computervision | assistant prof at @nyu_courant | rs @googledeepmind | past: rs @meta (FAIR) @ucsandiego
As the original poster of this strategy (sorry!), I agree that it is a bit unethical in reviews, but that is so overblown here. I do not advocate doing this strategy, but... My main issues are: Safety: Be cautious when taking large blocks of text, dropping them into an LLM, and…
#ICCV2025 is deeply committed to promoting diversity, equity, and inclusion within our community. As part of this commitment, travel support is available to help broaden participation. Applications will be reviewed on a rolling basis until August 20, 2025 (anywhere on Earth).
yes
i mostly use my visual intelligence when trying to solve this sota approaches to arc agi are mostly symbolic, vision doesn't really work well with today's models ergo this is really because we haven't really solved visual reasoning AI
The three biggest hps for stable training in everything are lr, bs, and beta2. We’ve built up good intuitions on how to tune them over time, but this lays it all out analytically and convincingly. this is definitely my new handbook for training big models on small gpus.
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
I converted one of my favorite talks I've given over the past year into a blog post. "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit) In a few days, we'll release what I believe is the next major advance for architectures.
internet at its peak--just look at how people roasted him in quotes/comments 6 months ago. Again, I think this is very wrong, but can you really blame the students if the community was encouraging this idea, and then suddenly next day they’re being treated like the big villain?
DO NOT DO THIS. I have previously raised this for Ethics Review when I saw it in a paper. You are not sneaky.
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…
Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be…
Thanks for bringing this to my attention. I honestly wasn’t aware of the situation until the recent posts started going viral. I would never encourage my students to do anything like this—if I were serving as an Area Chair, any paper with this kind of prompt would be…
This looks like a great deep dive on neural network architectures for diffusion models. tl;dr use a Transformer, but there's quite a bit more to it, and as always in this field, the devil is in the details!
Had the honor to present diffusion transformers at CS25, Stanford. The place is truly magical. Slides: bit.ly/dit-cs25 Recording: youtu.be/vXtapCFctTI?si… Thanks to @stevenyfeng for making it happen!
Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐mll-lab-nu.github.io/mind-cube/ 📰arxiv.org/pdf/2506.21458…
awesome work by @jiacheng_chen_ and @sanghyunwoo1219 on 3D-grounded visual compositing (and nice demos!)
Introducing BlenderFusion: Reassemble your visual elements—objects, camera, and background—to compose a new visual narrative. Play the interactive demo: blenderfusion.github.io
metaquery is now open-source — with both the data and code available.
The code and instruction-tuning data for MetaQuery are now open-sourced! Code: github.com/facebookresear… Data: huggingface.co/collections/xc… Two months ago, we released MetaQuery, a minimal training recipe for SOTA unified understanding and generation models. We showed that tuning few…
Do people *feel* how much work there is still to do. Like wow.
I'm hiring at least one post-doc! We're interested in creating language models that process language more like humans than mainstream LLMs do, through architectural modifications and interpretability-style steering.
guys, real geospatial data is a total goldmine for digital agents. step away from the web browser and get real. (we explored a bit in virl-platform.github.io, but building a simulation-ready pipeline like this could take things way further)
Virtual Community provides an online pipeline that automatically generates 3D scenes from real geospatial data, performing comprehensive cleaning and enhancement of both geometry and texture — including mesh simplification, texture refinement, object placement, and automatic…
wait, speaking of false dichotomies---during your phd, you *can* write code, dive into data and systems, collaborate with a team, and build useful things---all while enjoying complete openness and the freedom to pursue what *genuinely* excites you.
i left my phd before joining openai working in industry demands more rigor – you don’t just need to convince reviewer 2 with a nice graph and an ego-cite, it better actually work if it’s underwriting billions in research investment not saying it always pans out that way in…
New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯 Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**? with @Qu3ntinB, Anne Gagneux & Rémi Emonet 👇👇👇
So excited to announce the DCVLR (Data Curation for Vision-Language Reasoning) competition at NeurIPS 2025, led by @Oumi_PBC and sponsored by @LambdaAPI! 🌟open-data 🌟 🤖 open-models 🤖 💻 open-source 💻 💪anyone can compete for free 💪 dcvlr-neurips.github.io 🧵 1 / n
This is really BAD news of LLM's coding skill. ☹️ The best Frontier LLM models achieve 0% on hard real-life Programming Contest problems, domains where expert humans still excel. LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI (“International…