Ryo Kamoi
@RyoKamoi
#NLProc PhD candidate @PennStateEECS @RuiZhang_nlp. Intern @Microsoft #OAR. Prev: MS @UTCompSci, BE @Keio_ST, Intern @AmazonScience. @RyoKamoi_ja
📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. arxiv.org/abs/2505.15960

We updated our VisOnlyQA paper for #COLM2025!
* LVLMs exhibit weak geometric perception, even on geometric shapes with only 2–3 lines 😭
* Gemini 2.5 Pro substantially improves over prior models on charts and chemistry 😳 but still struggles with geometric shapes 😖
arxiv.org/abs/2412.00947
Our paper VisOnlyQA has been accepted to @COLM_conf #COLM2025! See you in Montreal🍁 We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"🧐 arxiv.org/abs/2412.00947
HRScene got accepted at #ICCV2025! HRScene is a novel unified benchmark for high-resolution image understanding with 25 scenes and 2 NIAH tests. Home page: yszh8.github.io/hrscene/ (Sorry, the EvalAI submission page is currently not working...) My PhD research began with long text…
🚀 How Far Are VLMs from Effective High-Resolution Image Understanding? 👉 We found: Still far. 🆕 Introducing HRScene Benchmark: 📸 25 Real-world Scenes + 🧪 2 Diagnostic NIAH Tests 🏙️ 8 Categories: Daily, Paper, Urban Planning, etc. 🖼️ Resolution: 1,024 × 1,024 ➡️ 35,503 ×…
Congratulations to @UW #UWAllen Ph.D. grads @sharma_ashish_2 & @sewon__min, @TheOfficialACM Doctoral Dissertation Award honorees! Sharma won for #AI tools for mental health; Min received honorable mention for efficient, flexible language models. #ThisIsUW news.cs.washington.edu/2025/06/04/all…
Excited to share our latest work! The development of process reward models (PRMs) is limited by manual labeling of step-level reasoning correctness. In this new paper led by @RyoKamoi, we use formal verification tools — formal logic and theorem proving — to automatically…
Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Chart questions take us beyond current benchmarks for math/multi-hop QA/etc., which CoT is very good at, to *visual reasoning*, which is hard to express with text CoT!
Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!
✍🏻 Entirely human-written questions by 13 CS researchers
👀 Emphasis on visual reasoning, which is hard to verbalize via text CoTs
📉 Humans reach 93%, while Gemini-2.5-Pro reaches only 63% and Qwen2.5-72B only 38%
Excited to share that I’ll be starting a research internship at Microsoft in Redmond! Looking forward to the research ahead and meeting people in Seattle/Redmond!
I passed my comprehensive exam and am now a PhD candidate! Thank you to my advisor and collaborators for their continued support!
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding? "we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from…
🎉Our paper on fairness in multi-document summarization has received an SAC award at NAACL 2025! 🥳 We appreciate the recognition from the senior area chairs. @HaoyuanLi9 and @YusenZhangNLP will present our work: Posters (Exhibit Hall), Session H: Oral/Poster 5, Thursday May 1,…
🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session 2025.naacl.org/blog/best-pape…