Haize Labs
@haizelabs
build ai systems you can trust.
Today is a bad, bad day to be a language model. Today, we announce the Haize Labs manifesto. @haizelabs haizes (automatically red-teams) AI systems to preemptively discover and eliminate any failure mode. We showcase below one particular application of haizing: jailbreaking the…
With @haizelabs, @leonardtang_ is helping companies build ai systems you can trust. For the AI for Work & Life Hackathon, their track is for folks to create intuitive interfaces that allow domain experts to VERIFY and STEER AI systems.
are you using llms as a judge? come check out our talk with @haizelabs on how to scale them up. the talk is this wednesday, make sure to sign up. if you can't make it, we'll send you the study notes and recording. maven.com/p/4534a3/scali…
spoken is litellm for voice models.
New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S model from OpenAI, Google, Amazon — one interface with one line of code.
Multimodal Verdict is here!
Verdict systems can now judge image inputs. Score product photos. Ad creatives. UI mockups. Haize anime birds. Judge anything for any quality, and understand why.
if you're thinking about applying llm as a judge, don't miss our talk with @haizelabs maven.com/p/4534a3/scali…
We are thrilled to announce j1-nano & j1-micro, two absurdly tiny reward models competitive with Claude Opus, GPT-4o-mini, Llama-3-70B, and more. These models have no business being this powerful. But, with the right form of Judge-Time Scaling via SPCT, j1-nano and j1-micro…
You don’t need frontier lab resources for frontier lab automated LLM evaluation. To prove this, we’re open-sourcing j1-nano and j1-micro: two absurdly tiny (600M & 1.7B parameters) but mighty reward models competitive with orders-of-magnitude larger peers. j1-nano and j1-micro…
Scaling Judge-Time Compute! ⚖️🚀 I am SUPER EXCITED to publish the 121st episode of the Weaviate Podcast featuring Leonard Tang (@leonardtang_), Co-Founder of Haize Labs (@haizelabs)! Evals are one of the hottest topics out there for people building AI systems. Leonard is…
Last night I hosted the AI for Work & Life Hackathon with @rkhkimx from @BainCapVC & @seidtweets from @chapterone for 100+ hackers in NYC! 🌃 It had some of the best projects I've ever seen. Here's a thread of the 6 winning teams' demos across AI for Life 🧺 & Work 👩🏻‍💻:
2/ AI Evaluation Platform: Alex built a platform that allows teams looking to improve the performance of their AI systems to autonomously conduct expert interviews. They won the @haizelabs track on Domain Expertise!
Announcing the 6 tracks & sponsors for the AI for Work & Life Hackathon, happening in NYC next week 5/8 & 5/9 ➡️Register Here: lu.ma/worklifeAI AI for work - tracks & sponsors👨💻: 🕊️ @haizelabs: Domain Experts - Build AI tools that demonstrate exceptional domain…
How do we understand & evaluate the fuzzy space of LLM outputs? We clone your Subject Matter Expert annotator into a Judge. Introducing EVALS EVALS EVALS: create a custom Judge that works for you.
fun time reading up on Self-Principled Critique Tuning from @deepseek_ai. be on the lookout for the next session!
nyc ai 🚀🚀🚀 scintillating discussion on this fine sunday morning. much more to come. @qw3rtman @willccbb @haizelabs
life is work🌱& work is life📷 build for both! Hosting an AI for work & life hackathon 5/8 & 9 in NYC w @seidtweets @rkhkimx from @chapterone & @BainCapVC Founders of @haizelabs @spurtest_ @SilnaHealth @Cassi_Home @florafaunaai & lore are judging Details & Registration Link⬇️
What do @crewAIInc, @ag2oss, @boomi, @browserbasehq, @haizelabs, @Komodor_com and Layer have in common? New #AGNTCY members! Alongside @LangChainAI and @rungalileo, we’ve welcomed dozens building the #InternetOfAgents infrastructure. Read our blog: cs.co/60192OX5d
the ultimate extant problem in ai leonardtang.me/blog/ultimate-…