Hui Chen
@chchenhui
Researcher @NUSComputing; Prev: @NTUsg, Ph.D. @sutdsg, B.Eng. in CS @ZJU_china.
🤖How well can AI agents conduct open-ended machine learning research? 🚀Excited to share MLR-Bench, our latest #AI4Research benchmark for evaluating AI agents on exactly that!📈 arxiv.org/pdf/2505.19955 1/

New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming important now that we have RL that finally works generally. Great examples of…
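A toy illustration of that asymmetry (my own example, not from the post): for subset-sum, checking a proposed answer is a linear-time pass, while finding one by brute force is exponential in the input size.

```python
from itertools import combinations

def verify(nums, subset, target):
    # Verification: one pass over the candidate subset, roughly O(len(subset)).
    return set(subset) <= set(nums) and sum(subset) == target

def solve(nums, target):
    # Solving: brute-force search over all 2^n subsets until one hits the target.
    for r in range(len(nums) + 1):
        for cand in combinations(nums, r):
            if sum(cand) == target:
                return list(cand)
    return None

nums, target = [3, 9, 8, 4, 5, 7], 15
sol = solve(nums, target)               # exponential-time search
print(sol, verify(nums, sol, target))   # linear-time check of the answer
```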
This year, there has been growing evidence that AI agents can conduct scientific research and produce papers end-to-end, to the point that some of these generated papers have already been accepted at top-tier conferences/workshops. Intology’s…
Giving your models more time to think before prediction, e.g. via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unblocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…
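The simplest of these knobs is plain chain-of-thought prompting; a minimal sketch of the direct-vs-CoT difference (the `generate` callable is a stand-in for any LLM API or local model, not code from the post):

```python
def build_prompt(question: str, chain_of_thought: bool) -> str:
    # Direct prompting asks for the answer immediately; CoT prompting spends
    # extra test-time tokens on intermediate reasoning before the answer.
    if chain_of_thought:
        return (
            f"Q: {question}\n"
            "Think step by step, then give the final answer on a new line "
            "starting with 'Answer:'."
        )
    return f"Q: {question}\nAnswer:"

def answer(question: str, generate, chain_of_thought: bool = True) -> str:
    # `generate` is any text-completion callable; only the text after the
    # final 'Answer:' marker is returned as the prediction.
    completion = generate(build_prompt(question, chain_of_thought))
    return completion.rsplit("Answer:", 1)[-1].strip()
```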
I have the same feeling. Claude Code might be a better option since its full auto mode already has access to the internet.
Until Codex can access the web and install things in its own environment, it'll be pretty meh. Fun UX, but it feels pretty lost when I try to use it. Like the model is pushing a really heavy object but can't budge it at all yet. Obvious that it'll work in the future.
Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across @Google. 🧵
New benchmark for deep research agents! An agent that is creative and persistent should be able to find any piece of information on the open web, even if it requires browsing hundreds of webpages. Models that exercise this ability are like a frictionless interface to the…
🚀 DeepSeek-R1 is here! ⚡ Performance on par with OpenAI-o1 📖 Fully open-source model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋 1/n
I just read the "Thinking LLMs: General Instruction Following With Thought Generation" paper (I), which offers a simple yet effective way to improve the response quality of instruction-finetuned LLMs. Think of it as a very simple alternative to OpenAI's o1 model, which…
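Rough sketch of how I read the recipe: the model writes hidden thoughts before the visible response, and a judge scores only the response when building preference pairs for training (the 'Thought:'/'Response:' markers and helper names below are my own placeholders, not the paper's exact template):

```python
THOUGHT_PROMPT = (
    "Respond to the instruction below. First write your internal thoughts "
    "after 'Thought:', then the reply the user should see after 'Response:'.\n\n"
    "Instruction: {instruction}"
)

def split_thought_response(completion: str) -> tuple[str, str]:
    # The judge never sees the thought; only the response is scored,
    # so the model can think freely without being penalized for it.
    thought, _, response = completion.partition("Response:")
    return thought.replace("Thought:", "", 1).strip(), response.strip()

def make_preference_pair(completions, judge_score):
    # Sample several completions per instruction, score each *response*
    # with a judge, and pair the best vs. worst for preference optimization.
    scored = sorted(completions,
                    key=lambda c: judge_score(split_thought_response(c)[1]))
    return scored[-1], scored[0]  # (chosen, rejected)
```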
Llama-3.2-Vision-Instruct is on Vision Arena now! 🤗huggingface.co/spaces/WildVis…
Explore real-world failure cases and see how top VLMs like GPT-4o and Yi-VL-Plus handle visual challenges such as object orientation. Test it yourself in the Failure Case Examples tab of WildVision Arena. 🔗WildVision-Arena: huggingface.co/spaces/WildVis… Thanks to @XingyuFu2 and other…
🔗 Thoughts on Research Impact in AI. Grad students often ask: how do I do research that makes a difference in the current, crowded AI space? This blog post distills my perspective into six guidelines for making research impact via open-source artifacts. Link below.
🚨 Introducing WildVision’s datasets for research on vision-language models (VLMs), ideal for SFT, RLHF, and evaluation. One of the first large-scale VLM alignment data collections sourced from human users. - 💬 WildVision-Chat: human-VLM conversations with images for VLM training…