Jake Tae
@jaesungtae
ML @OptiverGlobal, CS + Math @Yale. ex-@huggingface, @Meta.
1/ Excited to share our new work, which we’ve been working on since last year: TESS 2! TESS 2 is a 7B instruction-tuned diffusion LM that can perform close to its AR counterparts on general QA tasks, trained by adapting an existing pretrained AR model. 🧵
Excited to be at ACL 2025 next week to talk about TESS 2! My coauthor @jaesungtae will present it at 9am on Wed 30th in the Machine learning for NLP session - or just feel free to reach out and ask for a chat! Happy to talk about anything post-training :)
We trained a diffusion LM! 🔁 Adapted from Mistral v0.1/v0.3. 📊 Beats AR models on GSM8K when we finetune on math data. 📈 Performance improves with more test-time compute (reward guidance or more diffusion steps). Check out @jaesungtae's thread for more details!