Junyang Lin
@JustinLin610
Building Qwen models @Alibaba_Qwen ❤️ 🍵 ☕️ 🍷 🥃
Qwen2.5-Max is here. It looks good on benchmarks, and I hope you can give it a try and see how you feel about this new model! Qwen Chat: chat.qwenlm.ai (choose Qwen2.5-Max as the model). The API is available through the Alibaba Cloud service. Happy new year!
The release of DeepSeek V3 has drawn the whole AI community's attention to large-scale MoE models. Concurrently, we have been building Qwen2.5-Max, a large MoE LLM pretrained on massive data and post-trained with curated SFT and RLHF recipes. It achieves competitive…
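For anyone wanting to try the API mentioned above, here is a minimal sketch of calling Qwen2.5-Max through an OpenAI-compatible endpoint on Alibaba Cloud. The base URL and model identifier below are assumptions based on Alibaba Cloud Model Studio conventions; check the official docs for the exact values for your region and account.

```python
# Minimal sketch: Qwen2.5-Max via an OpenAI-compatible endpoint (assumed values).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # issued by Alibaba Cloud Model Studio
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint; verify in docs
)

response = client.chat.completions.create(
    model="qwen-max-latest",  # assumed model id for Qwen2.5-Max; verify in docs
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me a one-line summary of mixture-of-experts models."},
    ],
)
print(response.choices[0].message.content)
```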
Qwen 3 just dropped an open-source agentic coding model! Claims it's comparable to Sonnet-4! Will be on LiveBench and CodeLLM shortly Thanks to Qwen for keeping open source alive 👏👏
Thanks for staying up with us!
We're now serving Qwen3-Coder-480B-A35B & Qwen3-235B-A22B-2507 at Hyperbolic! Qwen3-Coder-480B achieves results comparable to Claude Sonnet 4 on coding benchmarks, truly amazing! @JustinLin610 and @huybery are the 420 gang in China, shipping models until 6 AM China time!…
Qwen3-Coder is now available in Cline 🧵 New 480B parameter model with 35B active parameters.
> 256K context window
> comparable performance on SWE-bench to Claude Sonnet 4
> SoTA among open-source models
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Quick adaptation! Thx
✅ We’re excited to support @Qwen’s Qwen3-Coder on SGLang! With tool call parser and expert parallelism enabled, it runs smoothly with flexible configurations. Just give it a try! 🔗 github.com/zhaochenyang20…
💥 BREAKING: @Alibaba_Qwen just dropped the world's leading coding model: a 480B Qwen3 Coder with 35B active parameters and a huge context window! This non-reasoning coder is getting near SOTA on SWE-bench, with 68.7 on BFCL (function calling) and 61.8 on Aider! 🧵
🚀
✅ Try out @Alibaba_Qwen 3 Coder on vLLM nightly with "qwen3_coder" tool call parser! Additionally, vLLM offers expert parallelism so you can run this model in flexible configurations where it fits.
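Once a vLLM (or SGLang) server is up with the tool call parser enabled as described above, function calling works through the standard OpenAI-compatible chat API. Below is a hedged sketch of such a client request; the server URL, the registered model name, and the `run_tests` tool schema are all illustrative assumptions, not part of the release.

```python
# Sketch: function-calling request against a locally served Qwen3-Coder
# (e.g., vLLM's OpenAI-compatible server started with the "qwen3_coder" tool call parser).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, for illustration only
        "description": "Run the project's unit tests and return the result.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test file or directory"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # whatever name the server registers
    messages=[{"role": "user", "content": "Run the tests under tests/ and summarize any failures."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```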
Nothing is more frustrating than seeing "private scaffold" on public benchmark results. I love that model providers like Qwen and Mistral are now reporting their results specifically using OpenHands as the scaffold; it feels like we're becoming a standard here. x.com/Alibaba_Qwen/s…
🥳 INT4 model for the updated Qwen3-235B-A22B: huggingface.co/Intel/Qwen3-23… vLLM's MoE path doesn't seem to work well yet, but HF Transformers runs it pretty well.
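A rough sketch of the Transformers route mentioned above is below. The Hugging Face repo id is truncated in the post, so it is left as a placeholder here; depending on the quantization format, an extra backend listed on the model card may also be required, and the generation settings are purely illustrative.

```python
# Sketch: loading the Intel INT4 Qwen3-235B-A22B checkpoint with HF Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/Qwen3-23..."  # repo id is truncated in the post; substitute the actual INT4 repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # requires accelerate; spreads experts across available GPUs
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Write a haiku about mixture-of-experts."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```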
It's out! And you can already run inference on the HF model page thanks to @hyperbolic_labs! huggingface.co/Qwen/Qwen3-Cod…
As always, you'll see it on HF first! huggingface.co/Qwen
🚀 Meet Qwen3-Coder, our most advanced agentic code model yet! Kicking off with the open-sourced model Qwen3-Coder-480B-A35B-Instruct, a 480B MoE with 35B active parameters for top coding & agentic tasks. Plus, we're open-sourcing Qwen Code, a CLI tool for agentic programming!…
Qwen3-Coder is on another level. I had it build a sim based on some scaffolds we are trying. The model left me a message in the sim we built!!!!!
Try here! Pokémon! modelscope.cn/studios/Qwen/Q… huggingface.co/spaces/Qwen/Qw…
A perfect coding model for MLX on Apple silicon. Qwen delivered again. Runs quite fast on an M3 Ultra. Running the 4-bit quantized model with mlx-lm:
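The post's screenshot isn't reproduced here, so below is a minimal sketch of what running a 4-bit MLX build with mlx-lm typically looks like. The exact repo id for the 4-bit weights is an assumption; use whichever MLX-format checkpoint you actually have.

```python
# Sketch: running a 4-bit MLX conversion of Qwen3-Coder with mlx-lm on Apple silicon.
from mlx_lm import load, generate

# Assumed repo id for a community 4-bit MLX conversion; substitute your own checkpoint.
model, tokenizer = load("mlx-community/Qwen3-Coder-480B-A35B-Instruct-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that reverses a linked list."}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```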
Qwen3-Coder-480B-A35B-Instruct + @hyperbolic_labs is now available in anycoder for vibe coding
This one is not small! The boys spent so much time building Qwen3-Coder after Qwen2.5-Coder. It is much bigger, but based on MoE, and way stronger and smarter than before! Not sure we can say it's competitive with Claude Sonnet 4, but it is for sure a really good coding agent…
Note that this is a non-thinking model. Thinking model on the way!
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…
A small update on Qwen3-235B-A22B, but a big improvement in its quality! We thought about this decision for a long time, but we believe that providing better-quality performance is more important than unification at this moment. We are still continuing our research on hybrid…