Peng Qi
@qi2peng2
Research Lead @OrbyAI. Previously: @AWS AI, $JD AI, PhD @stanfordnlp, UG @Tsinghua_Uni. He/him. Opinions my own.
Is #AI the new #RocketScience? In my new blog post, I explore the similarities and connections between the two seemingly distant relatives, and reflect on what today's AI scientists can learn from their rocket cousins, plus what makes AI science unique: qipeng.me/blog/ai-is-the…

Thanks, Liam, for helping me clean up my bloated Spotify Liked list.
How do we prove that #AI can't do #maths?

Real Mathematics (yes, "real" is a pun here):
a+b+c = (a+b)+c = a+(b+c)

AI Mathematics (well, floating point maths, really):
>>> 0.1+0.2+0.3
0.6000000000000001
>>> 0.1+(0.2+0.3)
0.6

QED.
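(Footnote for the curious: this is IEEE-754 floating point at work, not a Python bug. A minimal sketch of how to get the "real mathematics" answer back — purely illustrative, not part of the original joke:)

import math
from decimal import Decimal

a, b, c = 0.1, 0.2, 0.3

# Each intermediate sum is rounded to the nearest representable
# double, so floating-point addition is not associative.
print((a + b) + c)           # 0.6000000000000001
print(a + (b + c))           # 0.6

# Correctly-rounded summation restores the expected answer...
print(math.fsum([a, b, c]))  # 0.6

# ...and so does exact decimal arithmetic.
print(Decimal("0.1") + Decimal("0.2") + Decimal("0.3"))  # 0.6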
Excited to share our #ACL2025NLP paper, "𝐂𝐢𝐭𝐞𝐄𝐯𝐚𝐥: 𝐏𝐫𝐢𝐧𝐜𝐢𝐩𝐥𝐞-𝐃𝐫𝐢𝐯𝐞𝐧 𝐂𝐢𝐭𝐚𝐭𝐢𝐨𝐧 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐟𝐨𝐫 𝐒𝐨𝐮𝐫𝐜𝐞 𝐀𝐭𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧"! 📜 If you’re working on RAG, Deep Research and Trustworthy AI, this is for you. Why? Citation quality is…
As 🔎 AI deep research agents 🔎 become an essential part of many people's day-to-day work, it is more important than ever that we can trust what they produce. When these agents cite sources they claim the report is based on, how much can we actually trust them? In our…
People have asked me what the ideal next #agent #benchmark would be. The lazy, short answer is *every benchmark everywhere all at once*. We have already built a great variety of agentic benchmarks targeting different behaviors to evaluate, but we have yet to see a single agent…
Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of…
Interesting result! Reminds me of something similar we discovered back in 2022 where LLMs really struggle to generalize past their training sequence length. Though not as elegant, a simple fix (arxiv.org/abs/2208.02169) could extend the generalization of these LLMs significantly…
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization on sequences of up to 256k tokens, with no need to change the architecture!
"I try to ask myself this question all the time, and I would encourage every #AI researcher to do the same from time to time: what is the problem that we are actually solving here?" #ethics #tech #research
A few weeks ago, I gave myself a "promotion" from "Research Scientist" to "Research Lead" in my online presence. While this is not a huge deal objectively (@OrbyAI is still a relatively small company by any metric and my own personal reach is limited), I think there is…

When making great #hiring decisions, we often look for growth potential in a candidate. Will they rise to the occasion when unforeseen challenges arise? Will they grow in the role, and lift up others in the team? Will they still be able to contribute if business direction…
#AI 𝘄𝗿𝗼𝘁𝗲 𝟵𝟵% 𝗼𝗳 𝗺𝘆 𝗰𝗼𝗱𝗲, 𝗻𝗼𝘄 𝘄𝗵𝗮𝘁? Big tech executives and business analysts are racing to share eye-catching statements like "AI will write XX% of the code at MetaCorp by 20YY." How much truth is there to these, and what implications might this have? In…

I'm recruiting a PhD student in AI & Scientific Discovery (start August 2025), particularly where scientific discovery intersects with code generation. If interested, please e-mail your CV. I'll also be at @naacl organizing the AI & Scientific Discovery Workshop (AISD) & can chat
Attending #ICLR2025? Check out @OrbyAI's work on universal grounding for GUI agents (collab w/ @ysu_nlp & @hhsun1 groups at @osunlp): x.com/ysu_nlp/status… Also, talk to my colleagues Yanan Xie and Gang Li around the conf or at the Orby booth to hear what we're up to next!
People into agents, let me pitch something to you:
🌟 An agent that works across every platform (web, desktop & mobile)
🌟 Visual perception only, no messy & often incomplete HTML or a11y tree
🌟 SOTA performance across 6 agent benchmarks
Sounds too good to be true? Continue ⬇️…
Are they looking at what's happening in the human world today, and saying "So long, and thanks for all the fish"?
Introducing DolphinGemma, an LLM fine-tuned on many years of dolphin sound data 🐬 to help advance scientific discovery. We collaborated with @dolphinproject to train a model that learns vocal patterns to predict what sound they might make next. It’s small enough (~400M params)…
🚀Big WebDreamer update! We train 💭Dreamer-7B, a small but strong world model for real-world web planning.
💥Beats Qwen2-72B
⚖️Matches #GPT-4o
Trained on 3M synthetic examples — and yes, all data + models are open-sourced.
❓Wondering how to scale inference-time compute with advanced planning for language agents?
🙋‍♂️Short answer: Using your LLM as a world model
💡More detailed answer: Using GPT-4o to predict the outcome of actions on a website can deliver strong performance with improved safety and…
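(If you'd rather see the planning loop in code: a minimal sketch of the idea, where the hypothetical simulate/score stand-ins play the role of the GPT-4o world-model and value calls — illustrative only, not our actual implementation:)

from typing import Callable

def plan_with_world_model(
    state: str,
    candidate_actions: list[str],
    simulate: Callable[[str, str], str],  # world model: (state, action) -> predicted next state
    score: Callable[[str], float],        # value function over predicted states
) -> str:
    """Pick the action whose *simulated* outcome scores best,
    without executing anything on the real website."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        predicted = simulate(state, action)  # "dream" the outcome instead of acting
        s = score(predicted)
        if s > best_score:
            best_action, best_score = action, s
    return best_action

# Toy usage with stubs standing in for the LLM calls:
simulate = lambda state, action: f"{state}, then {action}"
score = lambda predicted: float("checkout" in predicted)
print(plan_with_world_model(
    "cart page",
    ["click 'checkout'", "click 'home'"],
    simulate, score,
))  # click 'checkout'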
Non-native speakers sometimes have a unique advantage in language-based humor, stemming from their unfamiliarity with idiomatic expressions. I saw an “assembly of god” on the road and thought to myself, “wait, they have a factory to build gods here?”
Simons Institute Workshop: "Future of LLMs and Transformers": 21 talks Monday - Friday next week. simons.berkeley.edu/workshops/futu…
So... this happened. Congrats to the @OrbyAI team, great achievement and recognition!
We are thrilled to announce that Orby AI has been recognized in the 2025 Enterprise Tech 30 list! So honored to be among such a great group of companies. 🚀 👏 👏 Check out the full report → lnkd.in/gnwYG4Um
A bit of a mess around the conflict of COLM with the ARR (and to a lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:
Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)
Plz RT 🙏