Haize Labs

@haizelabs

build ai systems you can trust.

Joined January 2024

0 Following

5K Followers

Pinned

Haize Labs@haizelabs · Jun 12, 2024

Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure mode. We showcase below one particular application of haizing: jailbreaking the…

Haize Labs Retweeted

jacky (:@Jhuang0804 · May 9

With @haizelabs, @leonardtang_ is helping companies build ai systems you can trust. For the AI for Work & life Hackathon, their track is for folks to create intuitive interfaces that allow domain experts to VERIFY and STEER AI systems.

Haize Labs Retweeted

jason liu@jxnlco · Jun 30

are you using llms as a judge? come check out our talk with @haizelabs on how to scale them up. the talk is this wednesday, make sure to sign up. if you can't make it we'll send you the study notes and recording: maven.com/p/4534a3/scali…

Haize Labs@haizelabs · Jun 27

spoken is litellm for voice models.

Leonard Tang@leonardtang_ · Jun 27

New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S model from OpenAI, Google, Amazon — one interface with one line of code.

Haize Labs@haizelabs · Jun 25

Multimodal Verdict is here!

Leonard Tang@leonardtang_ · Jun 25

Verdict systems can now judge image inputs. Score product photos. Ad creatives. UI mockups. Haize anime birds. Judge any thing for any quality—and understand why.

Haize Labs Retweeted

jason liu@jxnlco · Jun 19

if you're thinking about applying llm as a judge, don't miss our talk with @haizelabs maven.com/p/4534a3/scali…

Haize Labs@haizelabs · May 27

We are thrilled to announce j1-nano & j1-micro, two absurdly tiny reward models competitive with Claude Opus, GPT-4o-mini, Llama-3-70B, and more. These models have no business being this powerful. But, with the right form of Judge-Time Scaling via SPCT, j1-nano and j1-micro…

Leonard Tang@leonardtang_ · May 27

You don’t need frontier lab resources for frontier lab automated LLM evaluation. To prove this, we’re open-sourcing j1-nano and j1-micro: two absurdly tiny (600M & 1.7B parameters) but mighty reward models competitive with orders-of-magnitude larger peers. j1-nano and j1-micro…

Haize Labs Retweeted

Connor Shorten@CShorten30 · May 12

Scaling Judge-Time Compute! ⚖️🚀 I am SUPER EXCITED to publish the 121st episode of the Weaviate Podcast featuring Leonard Tang (@leonardtang_), Co-Founder of Haize Labs (@haizelabs)! Evals are one of the hottest topics out there for people building AI systems. Leonard is…

Haize Labs Retweeted

jacky (:@Jhuang0804 · May 11

Last night I hosted the AI for Work & Life Hackathon with @rkhkimx from @BainCapVC & @seidtweets from @chapterone for 100+ hackers in NYC! 🌃 These were some of the best projects I've ever seen. Here's a thread of the 6 winning teams' demos across AI for Life 🧺 & Work👩🏻‍💻:

Haize Labs Retweeted

jacky (:@Jhuang0804 · May 11

2/ AI Evaluation Platform: Alex built a platform that allows teams looking to improve the performance of their AI systems to autonomously conduct expert interviews. They won the @haizelabs track on Domain Expertise!

Haize Labs@haizelabs · May 4

evalsevalsevals.com

Leonard Tang@leonardtang_ · May 4

EVALS EVALS EVALS Core Research @AutinMitra

Haize Labs Retweeted

jacky (:@Jhuang0804 · May 2

Announcing the 6 tracks & sponsors for the AI for Work & Life Hackathon, happening in NYC next week 5/8 & 5/9 ➡️Register Here: lu.ma/worklifeAI AI for work - tracks & sponsors👨‍💻: 🕊️ @haizelabs: Domain Experts - Build AI tools that demonstrate exceptional domain…

Haize Labs Retweeted

Leonard Tang@leonardtang_ · Apr 30

How do we understand & evaluate the fuzzy space of LLM outputs? We clone your Subject Matter Expert annotator into a Judge. Introducing EVALS EVALS EVALS. Create a custom Judge that works for you.

Haize Labs@haizelabs · Apr 27

fun time reading up on Self-Principled Critique Tuning from @deepseek_ai. be on the lookout for the next session!

Leonard Tang@leonardtang_ · Apr 27

nyc ai 🚀🚀🚀 scintillating discussion on this fine sunday morning. much more to come. @qw3rtman @willccbb @haizelabs

Haize Labs@haizelabs · Apr 27

we do too

khushi@khushkhushkhush · Apr 27

god i love technical women

Haize Labs Retweeted

jacky (:@Jhuang0804 · Apr 24

life is work🌱& work is life📷 build for both! Hosting an AI for work & life hackathon 5/8 & 9 in NYC w @seidtweets @rkhkimx from @chapterone & @BainCapVC Founders of @haizelabs @spurtest_ @SilnaHealth @Cassi_Home @florafaunaai & lore are judging Details & Registration Link⬇️

Haize Labs Retweeted

Outshift by Cisco@outshiftbycisco · Apr 21

What do @crewAIInc, @ag2oss, @boomi, @browserbasehq, @haizelabs, @Komodor_com and Layer have in common? New #AGNTCY members! Alongside @LangChainAI and @rungalileo, we’ve welcomed dozens building the #InternetOfAgents infrastructure. Read our blog: cs.co/60192OX5d

Haize Labs Retweeted

Leonard Tang@leonardtang_ · Apr 14

the ultimate extant problem in ai leonardtang.me/blog/ultimate-…
