Chris Fifty (@fifty_chris)

Chris Fifty Retweeted

Shirley Wu @ShirleyYXWu · Jun 16

Even the smartest LLMs can fail at basic multiturn communication
Ask for grocery help → without asking where you live 🤦‍♀️
Ask to write articles → assumes your preferences 🤷🏻‍♀️
⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators.…

Chris Fifty @fifty_chris · Jan 28

My fellow code monkeys (@jordanjuravsky @ryansehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system…

Bradley Brown @brad19brown · Jan 28

My fellow code monkeys (@jordanjuravsky @ryansehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system…

Chris Fifty Retweeted

Bradley Brown @brad19brown · Jan 28

My fellow code monkeys (@jordanjuravsky @ryansehrlich) and I are excited to release CodeMonkeys: a system for solving SWE-bench issues specifically designed to leverage test-time compute! CodeMonkeys solves 57.4% of issues on SWE-bench Verified. A core component of our system…

Chris Fifty Retweeted

Shirley Wu @ShirleyYXWu · Apr 29, 2024

Thrilled to release 🌟 STaRK 🌟 - A large-scale LLM retrieval benchmark on semi-structured knowledge bases. While LLMs excel at reasoning and semantic retrieval, they struggle with more complex tasks. Especially when real-world user queries require a combination of unstructured…

Chris Fifty Retweeted

Camilo Ruiz @_camiloruiz · Dec 13, 2023

Can deep learning work on small data with far more features than samples? We present PLATO: a method that achieves the state-of-the-art on such datasets by using prior domain information! neurips.cc/virtual/2022/p… 🧵 Published in #NeurIPS2023 with @ren_hongyu @kexinhuang5 @jure
