Pruna AI
@PrunaAI
The AI optimisation framework
🧑‍🏫 AI Efficiency Fundamentals - Week 4: Quantization We see quantization everywhere, but do you know the difference between static and dynamic quantization? Even if you do, these slides are great for you. At Pruna, we want to educate about efficient AI, so our lead researcher…
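For readers who want to see the distinction in code, here is a minimal PyTorch sketch (not Pruna's implementation): dynamic quantization needs no calibration data, while static quantization calibrates activation ranges first. The toy TinyNet, input sizes, and the fbgemm backend choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.ao import quantization as tq

# Toy stand-in for any Linear-heavy model (placeholder architecture).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # converts fp32 activations to int8 (static path)
        self.fc1 = nn.Linear(128, 256)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(256, 10)
        self.dequant = tq.DeQuantStub()  # converts back to fp32 at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

example = torch.randn(4, 128)

# Dynamic quantization: int8 weights, activations quantized on the fly.
# No calibration data needed; well suited to Linear/LSTM-heavy CPU workloads.
dynamic_model = tq.quantize_dynamic(TinyNet().eval(), {nn.Linear}, dtype=torch.qint8)
print(dynamic_model(example).shape)

# Static quantization: int8 weights AND activations with fixed scales,
# which requires a calibration pass over representative inputs.
static_model = TinyNet().eval()
static_model.qconfig = tq.get_default_qconfig("fbgemm")  # x86 backend; use "qnnpack" on ARM
prepared = tq.prepare(static_model)
for _ in range(16):
    prepared(torch.randn(4, 128))        # observers record activation ranges
quantized = tq.convert(prepared)
print(quantized(example).shape)
```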

🔥 Our team just optimized GLiNER for a major cloud monitoring platform processing millions of logs/second for PII detection. Key Highlights: • 35ms → 19ms inference time (nearly 2x speedup!) • 50% memory reduction • Zero quality degradation • €28K-€58K annual savings…
Pruna v0.2.7: Major Breakthroughs in AI Optimization! ⚡️ @deepseek_ai Janus Support + Quantization Combo • Autoregressive Image Generation gets massive speed boost • Memory impact dramatically reduced with lightning-fast latency • Quantization + torch.compile working…
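For context, a rough sketch of what stacking a quantizer with torch.compile could look like with the open-source pruna library; the config keys and algorithm names ("quantizer", "hqq", "compiler", "torch_compile") are assumptions to check against the docs, and a small GPT-2 stands in for Janus, whose loading code is omitted.

```python
import torch
from transformers import AutoModelForCausalLM
from pruna import SmashConfig, smash

# GPT-2 as a placeholder model; swap in your own checkpoint.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16).to("cuda")

smash_config = SmashConfig()
smash_config["quantizer"] = "hqq"            # weight-only quantization (assumed name)
smash_config["compiler"] = "torch_compile"   # compile the forward pass (assumed name)

# Combine both optimizations in a single smash() call.
smashed_model = smash(model=model, smash_config=smash_config)
```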

🔥 Deploy custom AI models with Pruna optimization speed + @lightningai LitServe serving engine! Lightning-Fast AI Deployments! What makes this awesome: • ⚡️ FastAPI-powered serving • 🎯 Built-in batching • Define and serve any model (vision, audio, text) • Easy…
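A hedged sketch of what pairing the two could look like: a LitServe endpoint that compresses a model in setup() and serves it. The LitAPI hooks follow litserve's documented interface; the pruna calls, the GPT-2 stand-in, and the "prompt"/"text" field names are illustrative assumptions.

```python
import litserve as ls
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from pruna import SmashConfig, smash


class SmashedLLMAPI(ls.LitAPI):
    def setup(self, device):
        # Load and compress the model once per worker.
        self.tokenizer = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
        config = SmashConfig()
        config["compiler"] = "torch_compile"   # assumed algorithm name
        self.model = smash(model=model, smash_config=config)
        self.device = device

    def decode_request(self, request):
        return request["prompt"]               # assumed request field

    def predict(self, prompt):
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
        with torch.no_grad():
            out = self.model.generate(**inputs, max_new_tokens=64)
        return self.tokenizer.decode(out[0], skip_special_tokens=True)

    def encode_response(self, output):
        return {"text": output}


if __name__ == "__main__":
    server = ls.LitServer(SmashedLLMAPI(), accelerator="auto")
    server.run(port=8000)
```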

Yesterday we launched wan-image on @replicate and it can generate amazing cinematic animal pictures
📷 Introducing Wan Image: the fastest endpoint for generating beautiful 2K images! From Wan Video, we built Wan Image, which generates stunning 2K images in just 3.4 seconds on a single H100 📷 Try it on @replicate: replicate.com/prunaai/wan-im… Read our blog for details, examples,…
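One way to try it from Python with the replicate client; the model slug and the "prompt" input field are inferred from these posts and may differ from the actual schema on the model page.

```python
# Requires the REPLICATE_API_TOKEN environment variable to be set.
import replicate

output = replicate.run(
    "prunaai/wan-image",  # slug assumed from the post above
    input={"prompt": "a cinematic portrait of a snow leopard at dusk, 2K"},
)
print(output)  # typically a URL or file-like handle for the generated image
```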
🧑‍🏫 AI Efficiency Fundamentals - Week 3: Evaluation Do you know which evaluation metrics measure efficiency rather than just quality? Even if you do, these slides are great for you. At Pruna, we want to educate about efficient AI, so our lead researcher and…
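As a taste of efficiency-focused evaluation, here is a minimal latency/throughput measurement sketch; the toy model, batch size, and iteration counts are placeholders, and the CUDA synchronize call matters only when the model actually runs on a GPU.

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).eval()
batch = torch.randn(32, 512)

with torch.no_grad():
    for _ in range(10):                  # warmup: exclude one-time setup costs
        model(batch)

    n_iters = 100
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # wait for GPU kernels before stopping the clock
    elapsed = time.perf_counter() - start

latency_ms = 1000 * elapsed / n_iters
throughput = n_iters * batch.shape[0] / elapsed
print(f"latency: {latency_ms:.2f} ms/batch, throughput: {throughput:.0f} samples/s")
```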

🎨 New Tutorial: Complete Image Generation Model Optimization - From Stable Diffusion to Production We just dropped a comprehensive guide showing how to optimize image generation models with zero quality loss and massive performance gains. What we did: 2x faster…
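A condensed, hedged sketch of the load → smash → generate flow for a diffusers pipeline; the checkpoint id and the algorithm names ("deepcache", "torch_compile") are assumptions and may differ from the tutorial's actual settings.

```python
import torch
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

# Stable Diffusion v1.5 as an example checkpoint (assumed; any diffusers pipeline works similarly).
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

config = SmashConfig()
config["cacher"] = "deepcache"           # skip redundant UNet steps (assumed name)
config["compiler"] = "torch_compile"     # compile the denoising loop (assumed name)

smashed_pipe = smash(model=pipe, smash_config=config)
image = smashed_pipe("a watercolor lighthouse at sunrise").images[0]
image.save("lighthouse.png")
```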

Scaling has fueled the latest breakthroughs in language, image, and video models. As model sizes increase, so do the computational and energy expenses of running them. But we can do something about it! In this talk at @munichnlp, our very own Nils Fleischmann explores…
Pruna x @gokoyeb Partnership Update! 🔥 Early adopters are reporting great results from our lightning-fast inference platform: Performance Breakthrough: • ⚡️ Much faster models • 💰 Cost reduction • 🎯 Minimal quality degradation Let's talk…

🌱 Do you know the ML.Energy (ml.energy) initiative? Even if you do, make sure to watch this webinar we hosted with @jaewon_chung_cs. He explains the daily challenges he and his colleagues face in their goal to measure, understand, optimize, and…
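For a flavor of what measuring looks like in practice, a small sketch using Zeus, the ML.Energy group's measurement library; the toy workload is a placeholder, and the API calls should be checked against the Zeus version you install.

```python
import torch
from zeus.monitor import ZeusMonitor

# Track energy on GPU 0 for a named measurement window.
monitor = ZeusMonitor(gpu_indices=[0])

model = torch.nn.Linear(4096, 4096).cuda().eval()  # placeholder workload
batch = torch.randn(64, 4096, device="cuda")

monitor.begin_window("inference")
with torch.no_grad():
    for _ in range(100):
        model(batch)
torch.cuda.synchronize()
result = monitor.end_window("inference")

print(f"energy: {result.total_energy:.1f} J over {result.time:.2f} s")
```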

New Tutorial: Complete LLM Optimization Workflow We just released a comprehensive guide showing exactly how to compress and evaluate large language models using our open-source library. The pipeline is simple: Load → Configure → Compress → Evaluate → Deploy!…
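A condensed sketch of that pipeline with the open-source pruna library; the model id, the quantizer name ("hqq"), and the save call are assumptions, and the evaluation step here is only a latency spot-check rather than the tutorial's full metric suite.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from pruna import SmashConfig, smash

# 1. Load (small, non-gated model chosen purely for illustration)
model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# 2. Configure + 3. Compress
config = SmashConfig()
config["quantizer"] = "hqq"              # assumed algorithm name
smashed = smash(model=model, smash_config=config)

# 4. Evaluate (quick latency check; swap in real quality metrics in practice)
inputs = tokenizer("Efficient AI means", return_tensors="pt").to("cuda")
start = time.perf_counter()
out = smashed.generate(**inputs, max_new_tokens=64)
torch.cuda.synchronize()
print(f"latency: {time.perf_counter() - start:.2f} s")
print(tokenizer.decode(out[0], skip_special_tokens=True))

# 5. Deploy: persist the compressed model for your serving stack
smashed.save_pretrained("smashed-qwen")  # assumed PrunaModel method
```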

🧃 Juicy updates from the Pruna team! We've just dropped some major improvements that'll make your model optimizations run smoother than ever: ⚡ GPU Distribution Made Easy: Pruna now supports accelerate for models distributed across multiple GPUs.…
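A minimal sketch of what this could look like in practice: Accelerate's device_map="auto" shards a large model across the visible GPUs at load time, and the sharded model is then passed to smash(). The model id and quantizer name are placeholders/assumptions.

```python
import torch
from transformers import AutoModelForCausalLM
from pruna import SmashConfig, smash

# device_map="auto" (backed by accelerate) splits layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct",         # placeholder: any model too big for one GPU
    torch_dtype=torch.float16,
    device_map="auto",
)

config = SmashConfig()
config["quantizer"] = "hqq"              # assumed algorithm name
smashed = smash(model=model, smash_config=config)
```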

🧑‍🏫 AI Efficiency Fundamentals - Week 2: Compression Do you know how to maximize compute utilization in your GPUs? Even if you do, these slides are great for you. At Pruna, we want to educate about efficient AI, so our lead researcher and founder @bertrand_charp prepared a…

Say hello to Sara Han, the newest member of our Developer Advocacy team! With a laptop and her puppy on her lap, she'll help build connections between Pruna and developers to make models faster, cheaper, smaller and greener. Her time at @argilla_io and @huggingface, combined…

🌱 Compressing a single AI model endpoint can save 2t CO2e per year! In comparison, a single EU person emits ~10t CO2 per year. Last week, our compressed Flux-Schnell endpoint on @replicate has run 2M times on H100 over 2 weeks. For each run, the model…
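For anyone who wants to sanity-check numbers like these, a back-of-the-envelope sketch of the arithmetic; every input below (seconds saved per run, GPU power draw, grid carbon intensity, run volume) is an illustrative assumption, not Pruna's actual data.

```python
# Rough estimate of annual CO2e savings from a compressed inference endpoint.
runs_per_year = 2_000_000 * 26          # e.g. ~2M runs every 2 weeks (assumption)
seconds_saved_per_run = 1.0             # compressed vs. baseline inference time (assumption)
gpu_power_kw = 0.7                      # rough H100 board power (assumption)
grid_kg_co2_per_kwh = 0.25              # varies widely by region (assumption)

kwh_saved = runs_per_year * seconds_saved_per_run / 3600 * gpu_power_kw
tonnes_co2e_saved = kwh_saved * grid_kg_co2_per_kwh / 1000
print(f"~{kwh_saved:,.0f} kWh and ~{tonnes_co2e_saved:.1f} t CO2e saved per year")
```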

FLUX.1 Kontext [dev] dropped just hours ago and the community is already hacking. Our friends @PrunaAI made it 5x faster in just a few hours. This is what open-source is all about: remix, build, share. We love to see it! Run it here: replicate.com/prunaai/flux-k…
Black Forest Labs have released their much-anticipated open-source version of Kontext. FLUX.1 Kontext [dev] is now available on Replicate: replicate.com/black-forest-l… We love open source, and we can't wait to see what the community does with this.