EmbeddedLLM
@EmbeddedLLM
Your open-source AI ally. We specialize in integrating LLMs into your business.
Pro-tip for vLLM power-users: free ≈90% of your GPU VRAM in seconds, no restarts required 🚀
🚩 Why you’ll want this:
• Hot-swap new checkpoints on the same card
• Rotate multiple LLMs on one GPU (batch jobs, micro-services, A/B tests)
• Stage-based pipelines that call…
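For readers who want to try this, here is a minimal sketch with vLLM's offline API, assuming the sleep-mode feature (enable_sleep_mode, sleep, wake_up) present in recent vLLM releases; exact flag names and levels may differ on older versions, and the model ID is just a placeholder:

from vllm import LLM

# Load with sleep mode enabled (assumed flag in recent vLLM releases)
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_sleep_mode=True)
print(llm.generate(["Hello"])[0].outputs[0].text)

# Level 1 offloads weights to CPU RAM and drops the KV cache, freeing most GPU VRAM
llm.sleep(level=1)

# ...swap in another model or run a different stage of the pipeline here...

# Bring the engine back without restarting the process
llm.wake_up()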

🚀 vLLM v0.10.0 is LIVE! Faster, leaner, and more powerful. The TL;DR:
⚡️ Performance & Hardware Highlights:
Experimental Async Scheduling: +3-15% throughput by overlapping scheduling & GPU execution.
Huge AMD Gains (from our team!): +68.3% throughput on Deepseek-V3/R1 with Full…

So, I know a thing or two about ROCm by now. I figured now is a good time to collect my yappings about AMD, especially in the wake of Advancing AI, into a Substack article. Link in replies ⬇️
Intern-S1 is supported in vLLM now, thanks to the joint efforts of the vLLM team and the InternLM team @intern_lm ♥️
The easy way:
uv pip install vllm --extra-index-url wheels.vllm.ai/nightly
vllm serve internlm/Intern-S1 --tensor-parallel-size 8 --trust-remote-code
🚀 Introducing Intern-S1, our most advanced open-source multimodal reasoning model yet!
🥳 Strong general-task capabilities + SOTA performance on scientific tasks, rivaling leading closed-source commercial models.
🥰 Built upon a 235B MoE language model and a 6B vision encoder.…
This amazing Attention-FFN disaggregation implementation from @StepFun_ai achieves decoding throughput of up to 4,039 tokens per second per GPU under a 50 ms TPOT SLA for their 321B-A38B MoE model Step3, served on H800! The implementation is based on vLLM, and we are working…
vLLM v0.10.0 just released, and its biggest feature might be a hidden gem: initial support for the OpenAI /responses API. It might sound like a small feature, but this is a huge market signal. The industry is moving in this direction for building the next generation of powerful,…
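For a feel of what that looks like in practice, here is a hedged sketch using the OpenAI Python client pointed at a local vLLM server; the model name and port are placeholders, and how much of the Responses API surface is covered depends on the vLLM version:

from openai import OpenAI

# Standard OpenAI client, redirected to a locally running vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Same call shape as the hosted Responses API
resp = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Explain why continuous batching improves LLM serving throughput.",
)
print(resp.output_text)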

The @huggingface Transformers ↔️ @vllm_project integration just leveled up: Vision-Language Models are now supported out of the box! If the model is integrated into Transformers, you can now run it directly with vLLM. github.com/vllm-project/v… Great work @RTurganbay 👏
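A minimal sketch of what that enables, assuming the Transformers fallback backend exposed via model_impl="transformers" in recent vLLM releases; the model ID below is just a placeholder for a Transformers-integrated VLM:

from vllm import LLM, SamplingParams

# Ask vLLM to run the model through its Transformers backend instead of a
# native vLLM implementation (useful for models that only exist in Transformers)
llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # placeholder Transformers-integrated VLM
    model_impl="transformers",
    trust_remote_code=True,
)

print(llm.generate(["What does the Transformers backend in vLLM do?"],
                   SamplingParams(max_tokens=64))[0].outputs[0].text)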
Llama 4 quantization support just landed in llm-compressor!
✅ W4A16 quantization
✅ FP4 quantization
✅ Support for Llama 4 tokenizer + model loading
This sets the stage for fast, community-optimized Llama 4 models. Jump in to try, test, contribute: github.com/vllm-project/l…
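For a sense of the workflow, here is a hedged sketch of a W4A16 one-shot run following llm-compressor's README-style recipes; the model ID is a placeholder (a real Llama 4 checkpoint needs far more memory), and some schemes also want a calibration dataset:

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# W4A16: 4-bit weights, 16-bit activations; the lm_head is usually left unquantized
recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

# One-shot quantization, saving a checkpoint that vLLM can load directly
oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; swap in a Llama 4 checkpoint
    recipe=recipe,
    output_dir="Llama-3.1-8B-Instruct-W4A16",
)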
The two biggest stories in Python performance just collided. vLLM now runs with no GIL.
vLLM runs on free-threaded Python! A group of engineers from @Meta’s Python runtime team has shown that it’s possible to run vLLM on the free-threaded (nogil) build of Python. We’re incredibly excited to embrace this future direction and be early adopters 😍
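A quick way to check whether you are actually on a free-threaded build before experimenting (standard-library calls available from Python 3.13; running vLLM itself on nogil still needs the experimental setup described above):

import sys, sysconfig

# Non-zero on CPython builds compiled with --disable-gil (free-threaded / "nogil")
print(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Python 3.13+ also reports whether the GIL is enabled at runtime
print(sys._is_gil_enabled())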
All the credit goes to the AMD ROCm teams, working tirelessly on the feedback. Happy 4th. Run Free and Open.
Happy 4th of July! Speed is the Moat & @AnushElangovan & his team keep running faster & faster. Still lots of areas where ROCm has gaps, but many are already closing.
MiniMax M1 is one of the SOTA open-weight models from @MiniMax__AI. Check out how it is efficiently implemented in vLLM, directly from the team! blog.vllm.ai/2025/06/30/min…
🔥 Another strong open model with an Apache 2.0 license, this one from @MiniMax_AI, places in the top 15. MiniMax-M1 is now live on the Text Arena leaderboard, landing at #12. This puts it on equal footing with Deepseek V3/R1 and Qwen 3! See thread to learn more about its…
PyTorch and vLLM are both critical to the AI ecosystem and are increasingly being used together for cutting-edge generative AI applications, including inference, post-training, and agentic systems at scale. 🔗 Learn more about PyTorch → vLLM integrations and what’s to come:…
Great discussions, @mgoin_! We're thrilled to partner with @RedHat_AI and @AMD to enhance @vllm_project. It's an honor to contribute to such a vibrant and global open-source community. Onwards!
vLLM is truly a global phenomenon. From San Francisco to Boston and New York, and across Tokyo, Singapore, and Beijing, meetups are packed with passionate AI developers pushing the boundaries of inference performance. We ❤️ this community!
Exciting first day talking about @vllm_project in Singapore! I had a great time discussing in depth with @EmbeddedLLM how we will make @AMD better across the diverse features and workloads in vLLM. So thankful for our vibrant OSS community across the world 🫶
Let's goooo
vLLM has just reached 50K GitHub stars! Huge thanks to the community!🚀 Together let's bring easy, fast, and cheap LLM serving to everyone✌🏻
Thank you @AMD @LisaSu @AnushElangovan for Advancing AI together with @vllm_project! We look forward to the continued partnership and pushing the boundary of inference.