Tim Xiao
@TimZXiao
PhD student in Machine Learning @ University of Tübingen · IMPRS-IS scholar
✨ New paper: Flipping Against All Odds. We found that large language models (LLMs) can describe probabilities—but fail to sample from them faithfully. Yes, even flipping a fair coin is hard. 🪙 🧵 Here’s what we learned—and how we fixed it. 🔗arxiv.org/abs/2506.09998 1/
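A rough way to check this yourself (not the paper's protocol; `ask_llm` is a placeholder for whatever chat-completion client you use): ask the model to flip a fair coin many times in fresh contexts and compare the empirical frequency of heads to 0.5.

```python
# Minimal sketch: measure how faithfully an LLM samples a fair coin.
# `ask_llm` is a hypothetical single-turn chat call; plug in your own client.
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your chat-completion client here")

PROMPT = "Flip a fair coin. Reply with exactly one word: Heads or Tails."

def coin_flip_bias(n_trials: int = 200) -> float:
    counts = Counter()
    for _ in range(n_trials):
        reply = ask_llm(PROMPT).lower()
        if "head" in reply:
            counts["heads"] += 1
        elif "tail" in reply:
            counts["tails"] += 1
    total = counts["heads"] + counts["tails"]
    return counts["heads"] / max(total, 1)  # a faithful sampler should be close to 0.5
```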

I was surprised when I first saw that the black magic of prompt engineering can marry classical ML methods in such a natural way: simply asking an LLM to do rejection sampling makes it a more rational agent. Can't wait to see how we may similarly design better "LLM algorithms".
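Here is how I read "asking an LLM to do rejection sampling", as a sketch under that assumption rather than the paper's exact procedure: draw a candidate from a proposal you can sample exactly, then let the LLM play the accept/reject step against a verbally described target.

```python
# Sketch of verbalized rejection sampling as described in the thread above.
# `ask_llm` is a placeholder for a chat-completion client; the prompt wording
# is illustrative, not the one used in the paper.
import random

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your chat-completion client here")

TARGET = "a Bernoulli distribution with P(heads) = 0.3"

def verbalized_rejection_sample(max_tries: int = 50) -> str:
    candidate = "heads"
    for _ in range(max_tries):
        candidate = random.choice(["heads", "tails"])  # uniform proposal q(x)
        prompt = (
            f"Target distribution: {TARGET}. Proposal: uniform over heads/tails. "
            f"Proposed sample: {candidate}. Acting as the accept/reject step of "
            "rejection sampling, answer with exactly one word: ACCEPT or REJECT."
        )
        if "accept" in ask_llm(prompt).lower():
            return candidate
    return candidate  # fall back to the last candidate if nothing was accepted
```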
Verbalized machine learning treats LLMs with prompts as function approximators. Building on this, @TimZXiao came up with the idea of studying whether LLMs can act as samplers. It turns out they’re often biased, even when they appear to understand the target distribution.
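One simple way to make "biased even when they appear to understand the target distribution" concrete is to compare the distribution the model verbalizes with the empirical histogram of its samples, e.g. via total variation distance. The numbers below are made up for illustration, not results from the paper.

```python
# Total variation distance between a described and an empirically sampled
# distribution over the same discrete support. Example values are illustrative.
def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

described = {"heads": 0.5, "tails": 0.5}    # what the model says the probabilities are
sampled = {"heads": 0.73, "tails": 0.27}    # made-up empirical sampling frequencies
print(total_variation(described, sampled))  # 0.23 here; 0.0 would be a faithful sampler
```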
Great paper by my students @TimZXiao and @johanneszenn and collaborators that applies ideas from Monte Carlo sampling to (black-box) LLM execution to turn LLMs into better calibrated stochastic samplers.
Try it out!
🚀 Meet OFTv2 — Orthogonal Finetuning made scalable, finally. ⚡️ 10× faster 💾 3× less GPU memory 🤖 Quantized OFT: plug-and-play on quantized LLMs, better than QLoRA. Try it now on Hugging Face PEFT: tinyurl.com/ycxswfe7 Website: spherelab.ai/oftv2/ #AI #LLM 🧵1/6
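Trying it from PEFT presumably follows the usual config-plus-`get_peft_model` pattern; the sketch below uses PEFT's `OFTConfig`, but the model name and hyperparameter values are illustrative, and OFTv2-specific options may differ, so check the linked docs.

```python
# Rough sketch of the standard PEFT workflow applied to OFT. Model name and
# hyperparameter values are illustrative; see the linked PEFT docs for OFTv2.
from transformers import AutoModelForCausalLM
from peft import OFTConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = OFTConfig(
    r=8,                                  # number of OFT blocks per layer (illustrative)
    target_modules=["q_proj", "v_proj"],  # which projections get orthogonal updates
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # only the orthogonal factors are trainable
```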
Our @ICCVConference HANDS workshop will take place on the afternoon of Oct. 20! We focus on hand-related areas, e.g., hand pose estimation, hand-object interaction, and robotic hand manipulation. hands-workshop.org @NUSingapore @CSatETH @unibirmingham @RealityLabs @AIatMeta @UTokyo_News @meshcapade
We have added new experiments and analyses in the updated version of our paper. Check it out here: arxiv.org/abs/2506.08001. We discovered that even when generalized to spectrum-preserving training, POET can still preserve minimum hyperspherical energy. This property only…
📢Glad to introduce our paper: Reparameterized LLM Training via Orthogonal Equivalence Transformation (POET)! POET is a new algorithm for efficiently pretraining / finetuning large language models. Its training consists of three geometric phases. spherelab.ai/poet 1/6
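A minimal sketch of the reparameterization as I read it from the abstract (not the released code): keep the initialized weight fixed and train two orthogonal factors around it, which leaves the singular-value spectrum of the original weight untouched.

```python
# Toy sketch of an orthogonal-equivalence reparameterization in the spirit of
# POET (not the official implementation): W = R @ W0 @ P with W0 frozen and
# R, P kept orthogonal via the matrix exponential of skew-symmetric parameters.
import torch
import torch.nn as nn

class OrthogonalEquivalenceLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.register_buffer("W0", torch.randn(out_features, in_features) / in_features ** 0.5)
        self.A = nn.Parameter(torch.zeros(out_features, out_features))  # generates R
        self.B = nn.Parameter(torch.zeros(in_features, in_features))    # generates P

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        R = torch.matrix_exp(self.A - self.A.T)  # orthogonal left factor
        P = torch.matrix_exp(self.B - self.B.T)  # orthogonal right factor
        W = R @ self.W0 @ P                      # same singular values as W0
        return x @ W.T
```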
Muon is gaining attention for its use of orthogonalization, making it a natural point of comparison with POET. We computed singular value entropy over training steps and found that POET always maintains high entropy. A recent study (arxiv.org/abs/2502.16982) suggests that this is a…
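For reference, the quantity in question can be computed as the Shannon entropy of the normalized singular values of a weight matrix (the exact normalization used in the paper may differ):

```python
# Singular value entropy: Shannon entropy of the normalized singular values.
# A high value means the spectrum stays spread out rather than collapsing.
import torch

def singular_value_entropy(W: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    s = torch.linalg.svdvals(W)
    p = s / (s.sum() + eps)  # treat the singular values as a distribution
    return -(p * torch.log(p + eps)).sum()
```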
Check out our recent work on efficient pretraining for LLMs!
📣 Excited to share our #CVPR2025 Spotlight paper and my internship project @wayve_ai: SimLingo. A Vision-Language-Action (VLA) model that achieves state-of-the-art driving performance with language capabilities. Code: github.com/RenzKa/simlingo Paper: arxiv.org/abs/2503.09594
📢Glad to introduce FormalMATH, a large-scale Lean4 benchmark comprising 5,560 formally verified problems. 📖The benchmark spans from high-school Olympiad challenges to undergraduate-level theorems across diverse domains. The best LLM prover only achieved 16.46% accuracy. 1/4
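For readers unfamiliar with the format: each benchmark item is a Lean 4 statement, and the prover must replace `sorry` with a proof the Lean kernel accepts. The toy statement below only illustrates that shape; it is not an entry from FormalMATH.

```lean
import Mathlib

-- Toy illustration of the problem format (not an actual FormalMATH entry):
-- the LLM prover must replace `sorry` with a machine-checked proof.
theorem toy_statement (n : ℕ) : ∃ k : ℕ, n * (n + 1) = 2 * k := by
  sorry
```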