Ryo Kamoi (@RyoKamoi)

Pinned

Ryo Kamoi@RyoKamoi · May 23

📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. arxiv.org/abs/2505.15960

Ryo Kamoi@RyoKamoi · Jul 15

We updated our VisOnlyQA paper for #COLM2025! * LVLMs exhibit weak geometric perception even on geometric shapes with 2–3 lines 😭 * Gemini 2.5 Pro largely improves over prior models on charts and chemistry 😳 but still struggles with geometric shapes 😖 arxiv.org/abs/2412.00947

Ryo Kamoi@RyoKamoi · Jul 8

Our paper VisOnlyQA has been accepted to @COLM_conf #COLM2025! See you in Montreal🍁 We find that even recent Vision Language Models struggle with simple questions about geometric properties in images, such as "What is the degree of angle AOD?"🧐 arxiv.org/abs/2412.00947

Ryo Kamoi@RyoKamoi · Jun 28

HRScene got accepted at #ICCV2025! HRScene is a novel unified benchmark for high-resolution image understanding with 25 scenes and 2 NIAH tests. Home page: yszh8.github.io/hrscene/ (Sorry, EvalAI for submission does not work currently...) My PhD research began with long text…

Yusen Zhang@YusenZhangNLP · Apr 30

🚀 How Far Are VLMs from Effective High-Resolution Image Understanding? 👉 We found: Still far. 🆕 Introducing HRScene Benchmark: 📸 25 Real-world Scenes + 🧪 2 Diagnostic NIAH Tests 🏙️ 8 Categories: Daily, Paper, Urban Planning, etc. 🖼️ Resolution: 1,024 × 1,024 ➡️ 35,503 ×…

Ryo Kamoi Retweeted

Allen School@uwcse · Jun 4

Congratulations to @UW #UWAllen Ph.D. grads @sharma_ashish_2 & @sewon__min, @TheOfficialACM Doctoral Dissertation Award honorees! Sharma won for #AI tools for mental health; Min received honorable mention for efficient, flexible language models. #ThisIsUW news.cs.washington.edu/2025/06/04/all…

Ryo Kamoi@RyoKamoi · May 23

Excited to share our latest work! The development of process reward models (PRMs) is limited by manual labeling of step-level reasoning correctness. In this new paper led by @RyoKamoi, we use formal verification tools — formal logic and theorem proving — to automatically…

Ryo Kamoi@RyoKamoi · May 23

📢 New paper! FoVer enhances PRMs for step-level verification of LLM reasoning w/o human annotation 🚀 We synthesize training data using formal verification tools and improve LLMs at step-level verification of LLM responses on MATH, AIME, MMLU, BBH, etc. arxiv.org/abs/2505.15960

Ryo Kamoi@RyoKamoi · May 20

Check out ChartMuseum from @LiyanTang4 @_grace_kim and many other collaborators from UT! Chart questions take us beyond current benchmarks for math/multi-hop QA/etc., which CoT is very good at, to *visual reasoning*, which is hard to express with text CoT!

Liyan Tang@LiyanTang4 · May 20

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts! ✍🏻Entirely human-written questions by 13 CS researchers 👀Emphasis on visual reasoning – hard to be verbalized via text CoTs 📉Humans reach 93% but 63% from Gemini-2.5-Pro & 38% from Qwen2.5-72B

Ryo Kamoi@RyoKamoi · May 19

Excited to share that I’ll be starting a research internship at Microsoft in Redmond! Looking forward to the research ahead and meeting people in Seattle/Redmond!

Ryo Kamoi@RyoKamoi · May 8

I passed my comprehensive exam and am now a PhD candidate! Thank you to my advisor and collaborators for their continued support!

Ryo Kamoi Retweeted

Tanishq Abraham is at ICML@iScienceLuvr · Apr 28

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding? "we introduce HRScene, a novel unified benchmark for HRI understanding with rich scenes. HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from…

Ryo Kamoi@RyoKamoi · Apr 30

🎉Our paper on fairness of multidoc summarization has received an SAC award at NAACL 2025! 🥳 We appreciate the recognition from senior area chairs. @HaoyuanLi9 and @YusenZhangNLP will present our work: Posters (Exhibit Hall), Session H: Oral/Poster 5, Thursday May 1,…

NAACL HLT 2025@naaclmeeting · Apr 25

🟢 Announcing the #NAACL2025 Award Winners! The Best Paper and Best Theme Paper winners will present at our closing session 2025.naacl.org/blog/best-pape…

Ryo Kamoi Retweeted

Yusen Zhang@YusenZhangNLP · Apr 30

🚀 How Far Are VLMs from Effective High-Resolution Image Understanding? 👉 We found: Still far. 🆕 Introducing HRScene Benchmark: 📸 25 Real-world Scenes + 🧪 2 Diagnostic NIAH Tests 🏙️ 8 Categories: Daily, Paper, Urban Planning, etc. 🖼️ Resolution: 1,024 × 1,024 ➡️ 35,503 ×…
