Eldar Kurtić
@_EldarKurtic
Researcher on efficient inference @RedHat_AI & @ISTAustria
You can now find GuideLLM at its new address: github.com/vllm-project/g…
BIG NEWS! 🎉 GuideLLM is officially joining the @vLLM_project! This combines vLLM's high-speed inference with a powerful, dedicated toolkit for real-world performance validation. Moving from PoC to production just got a lot more scientific. Here's how (1/7):
.@vllm_project office hours return next week! Alongside project updates from @mgoin_, vLLM committers and HPC experts @robertshaw21 + @tms_jr will share how to scale MoE models with llm-d and lessons from real-world multi-node deployments. Register: red.ht/office-hours
FP4 models and inference kernels ready for Blackwell GPUs! GPTQ and Hadamard for accuracy, and fused Hadamard for runtime. Check out more details about our work in the thread below 👇
Announcing our early work on FP4 inference for LLMs! - QuTLASS: low-precision kernel support for Blackwell GPUs - FP-Quant: a flexible quantization harness for Llama/Qwen We reach 4x speedup vs BF16, with good accuracy through MXFP4 microscaling + fused Hadamard rotations.
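The MXFP4 microscaling mentioned above groups values into small blocks that share a single power-of-two scale, with each element stored in 4-bit E2M1 format. A minimal sketch of the idea (not the QuTLASS/FP-Quant code, which runs as fused GPU kernels; block size, rounding, and scale selection here are illustrative assumptions):

```python
# Illustrative sketch of MXFP4-style microscaling quantization.
# Idea: a block of values shares one power-of-two scale, and each
# element is rounded to a 4-bit E2M1 value (representable
# magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6). Not the actual
# QuTLASS/FP-Quant implementation.
import math

E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 magnitudes


def quantize_mxfp4_block(block):
    """Quantize one block of floats to (shared scale, FP4 values)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Power-of-two shared scale so amax maps near the FP4 max (6).
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    quantized = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)
        level = min(E2M1_LEVELS, key=lambda lvl: abs(lvl - mag))
        quantized.append(math.copysign(level, x))
    return scale, quantized


def dequantize(scale, quantized):
    """Reconstruct approximate floats from the block representation."""
    return [scale * q for q in quantized]
```

The Hadamard rotations in the announcement are applied before quantization to spread outliers across the block, which keeps `amax` small and the per-element error low; that step is omitted here for brevity.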
The @huggingface folks deserve far more credit for being a pillar of open-source and still managing to push out SOTA results across the board, along with a full write-up of the entire model’s lifecycle.
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own. Let's go open-source AI!
Want to learn more about GuideLLM, the tool used by @charles_irl and @modal_labs' LLM Engine Advisor to easily benchmark LLM inference stacks? Join the next vLLM office hours with @ZelenovicSasa, @mgoin_, Jenny Yi, and @markurtz_. More details in the thread below 👇
Heard about GuideLLM? It's Red Hat’s open-source toolkit for benchmarking real-world LLM inference. Simulate traffic, measure throughput & latency, and ensure your deployment meets its service-level objectives (SLOs). Here's how it works and how to get started - a thread:
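The core of what a GuideLLM-style benchmark reports can be sketched in a few lines: fire requests at a target, then compute throughput and latency percentiles to check against SLOs. This is not GuideLLM's API; `send_request` is a stand-in stub (the real tool drives an OpenAI-compatible server and simulates configurable traffic patterns):

```python
# Minimal sketch of what an LLM benchmark like GuideLLM measures:
# requests per second plus p50/p95 latency over a batch of prompts.
# `send_request` is a placeholder so the sketch runs anywhere.
import statistics
import time


def send_request(prompt):
    """Stand-in for a call to the inference server."""
    time.sleep(0.005)  # pretend the model took ~5 ms
    return prompt[::-1]


def run_benchmark(prompts):
    """Send all prompts sequentially and summarize the results."""
    latencies = []
    start = time.perf_counter()
    for p in prompts:
        t0 = time.perf_counter()
        send_request(p)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "requests_per_sec": len(prompts) / elapsed,
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": latencies[max(0, int(0.95 * len(latencies)) - 1)],
    }
```

In practice you would compare `p95_latency_s` against the SLO target, and sweep request rates rather than sending sequentially, which is exactly the kind of workload simulation the real tool automates.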
The recording of @egallen's and my @PyTorch Day France 2025 and @gosimfoundation talk, "Scaling LLM Inference with vLLM," is now available on PyTorch’s YouTube channel. youtube.com/watch?v=XYh6Xf…
Red Hat team absolutely smashing it!! Integration with axolotl is huge for training
🚨 Introducing the Axolotl-LLM Compressor integration, designed to make fine-tuning sparse models easier and more efficient than ever! Now you can fine-tune sparse models for specific data while preserving their sparse structure and recovering any accuracy lost during pruning.…
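The key trick behind "preserving their sparse structure" during fine-tuning is simple to state: record the pruned (zero) positions once, then mask the gradients every step so updates never revive pruned weights. A toy sketch of that idea with plain-Python lists standing in for tensors (the Axolotl + LLM Compressor integration does this at the framework level; this is not their code):

```python
# Sketch of sparsity-preserving fine-tuning: zero weights stay zero
# because their gradients are masked out before each update.


def make_mask(weights):
    """1.0 where a weight survives pruning, 0.0 where it was pruned."""
    return [0.0 if w == 0.0 else 1.0 for w in weights]


def sgd_step(weights, grads, mask, lr=0.1):
    """One masked SGD update; pruned positions receive no update."""
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]
```

Because the mask is fixed after pruning, the model can recover accuracy on new data while the sparsity pattern (and thus the inference speedup) is preserved.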
LLM-Compressor now integrated with Axolotl!
Join me to hear about decentralised training, why it works and what opportunities it can unlock 🚀. Many thanks to @Sree_Harsha_N for the invitation!
@itsmaddox_j will be presenting at the ML Efficiency group - this is going to be a very fun session :D Andrej is super passionate and I hope this will be useful for anyone who'd like to make sense of why decentralized training is exciting. Join us cohere.com/events/Cohere-…
Want to quickly get a feeling for how fast an LLM runs under different workloads (and in different engines)? Look no further, @charles_irl and @modal_labs built a really cool app for it. Pro tip: don't skip the "Executive Summary" and "How to Benchmark", well worth the read!
Dozens of teams have asked my advice on running LLMs. How fast is @deepseek_ai V3 with vLLM on 8 GPUs? What's the max throughput of @Alibaba_Qwen 2.5 Coder with SGLang on one H100? Running & sharing benchmarks ad hoc was too slow. So we built a tiny app, the LLM Engine Advisor.
Today at 15:00 CEST, I’ll give a talk at OpenSource@Siemens on efficient inference with LLMs. 📺 The talk will be live-streamed at opensource.siemens.com, followed by a live Q&A. Feel free to tune in and bring your questions! It’s a tutorial-style session covering the basics…
Love this approach by @RedHat_AI. We need more trust & validation in AI and this can help! huggingface.co/RedHatAI
Major props to the contributors who made this release happen 🙌 @JoelNiklaus @_lewtun @ailozovskaya @clefourrier @alvind319 HERIUN @_EldarKurtic @mariagrandury jnanliu @qubvelx Check out the release & try it out: 🔗 github.com/huggingface/li…