Alan Dao
@alandao_ai
AI Researcher at Menlo Research. Author of Lucy, Jan-nano, Ichigo, AlphaMaze, and various other works.
Stanford Graduate School of Business students say the MBA they paid about $250,000 for leans on decade‑old slides, and their classes feel locked in 2010. The grade often hinges on writing a clever prompt or paying for a premium AI tool, not on grasping the math behind decisions.…
Something new is behind this release
🚀 GSPO: Group Sequence Policy Optimization — a breakthrough RL algorithm for scaling LMs! 🔹 Sequence-level optimization — theoretically sound & matching reward 🔹 Rock-solid stability for large MoE models — no collapse 🔹 No hacks like Routing Replay — simpler, cleaner…
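The core idea, as I read the paper: compute the importance ratio over the whole sequence (length-normalized) instead of per token, then clip it PPO-style. A minimal PyTorch sketch, assuming group-normalized rewards; the tensor shapes and clipping epsilon here are illustrative, not the Qwen training code:

```python
import torch

def gspo_loss(logp_new, logp_old, rewards, mask, eps=0.2):
    """Sketch of GSPO's sequence-level clipped objective (illustrative).

    logp_new, logp_old: (B, T) per-token log-probs under current / old policy
    rewards: (B,) scalar rewards, one per sampled sequence in the group
    mask: (B, T) 1.0 for response tokens, 0.0 for padding
    """
    lengths = mask.sum(dim=-1)                                # |y_i|
    # Length-normalized sequence-level importance ratio:
    # s_i = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y_i|)
    log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1) / lengths
    s = torch.exp(log_ratio)
    # Group-relative advantages (GRPO-style normalization)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # PPO-style clipping, but on the whole sequence, not per token
    unclipped = s * adv
    clipped = torch.clamp(s, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()
```

Because the ratio is computed once per sequence and length-normalized, its variance doesn't explode on long responses, which is what the paper credits for stable MoE training without workarounds like Routing Replay.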
For those struggling with deep learning math, I'm resurfacing this tutorial I made last year. In 27 min I show my process by working through the QHAdam optimizer, which is an alphabet soup of symbols.
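For reference, the update rule the tutorial works through: QHAdam (Ma & Yarats, 2019) interpolates between plain SGD and Adam with two extra knobs, ν1 and ν2. A bare NumPy sketch of a single step, with the paper's recommended defaults:

```python
import numpy as np

def qhadam_step(theta, grad, m, v, t, lr=1e-3,
                beta1=0.9, beta2=0.999, nu1=0.7, nu2=1.0, eps=1e-8):
    """One QHAdam update (Ma & Yarats, 2019) written out in NumPy.

    t is the 1-based step count. nu1/nu2 mix the raw (squared) gradient
    with its exponential moving average in the numerator / denominator.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    num = (1 - nu1) * grad + nu1 * m_hat
    den = np.sqrt((1 - nu2) * grad ** 2 + nu2 * v_hat) + eps
    theta = theta - lr * num / den
    return theta, m, v
```

Set nu1 = nu2 = 1.0 and the step reduces exactly to Adam; that's the quickest way to sanity-check the formula.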
I believe next month will bring even more
What an intense month it has been. Free models we received in the last few weeks:
- Kimi K2 1T A32B
- Qwen3 235B A22B non-thinking
- Qwen3 Coder 480B A35B
- new Magistral 24B from Mistral
- Qwen3 235B A22B reasoning
- Step3 321B A38B with new MFA attention
- SmolLM3 3B
- Intern…
It's been a while since we heard from InternLM. A cool new project!
🚀Introducing Intern-S1, our most advanced open-source multimodal reasoning model yet! 🥳Strong general-task capabilities + SOTA performance on scientific tasks, rivaling leading closed-source commercial models. 🥰Built upon a 235B MoE language model and a 6B Vision encoder.…
The latest MLX has a CUDA back-end! To get started:
pip install "mlx[cuda]"
With the same codebase you can develop locally, run your model on Apple silicon, or in the cloud on Nvidia GPUs. MLX is designed around Apple silicon - which has a unified memory architecture. It uses…
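Quick sanity check that the same few lines run unchanged on either backend; this is plain MLX usage, nothing CUDA-specific in the code itself:

```python
import mlx.core as mx

# Identical script on Apple silicon or, with mlx[cuda], on an Nvidia GPU.
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = (a @ b).sum()
mx.eval(c)   # MLX is lazy; eval() forces the computation to run
print(c)
```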
TL;DR: Open-source AI just closed the gap (at least on benchmark scores). Qwen3-Thinking (235B) is now shoulder to shoulder with the frontier giants. AI just changed forever. No strings attached: Apache 2.0. Download it now.
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…
The wait is over! Meet Step 3 — the groundbreaking multimodal LLM from StepFun! 🚀 MoE architecture (321B total params, 38B active) 💡 Rivals OpenAI o3, Gemini 2.5 Pro, and Claude Opus 4 in performance 🖥️ Optimized for China’s domestic AI chips StepFun just announced: Step 3…
Another massive open-source LLM is coming soon… and it's from a Chinese company too.
The spotlight is now on Qwen3-235B-A22B-Thinking-2507: our most powerful thinking model to date 🎉 This update brings comprehensive improvements in reasoning and general performance, and it's the culmination of our efforts in scaling RL. Enjoy! 🍻
OpenAI may declare it has achieved AGI this year in order to exit the Microsoft contract.
In 2025 you can get all the education you need. Just use an open-weight model so you don't end up paying for a $200-per-month subscription.
AI is the best teacher. You can ask the same thing in 10 different ways, and it’ll keep explaining until the concept is clear. That’s the most valuable aspect of AI for me.
Check out this post-train if you like small models :) x.com/casper_hansen_…
Recipe to post-train Qwen3 1.7B into a DeepResearch model
What does it mean for something small to think deeply? Meet Lucy, a Qwen3-1.7B post-trained into a DeepResearch model using @willccbb's verifiers.
Primary rule-based rewards:
- Answer correctness: we check whether the…
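To make "rule-based reward" concrete, here is a hypothetical sketch of what an answer-correctness check can look like. The <answer> tag format and the function signature are assumptions for illustration, not the actual Lucy / verifiers implementation:

```python
import re

def answer_correctness_reward(completion: str, gold: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the model's tagged final
    answer matches the gold answer after normalization, else 0.0.
    (Illustration only -- not the real Lucy / verifiers code.)
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not match:
        return 0.0
    pred = match.group(1).strip().lower()
    return 1.0 if pred == gold.strip().lower() else 0.0
```

The appeal of rewards like this is that they're cheap and unhackable relative to a learned reward model: the policy either produced the right answer in the right format or it didn't.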
This is sick
🎉 Big news! Google Colab now comes with Gradio pre-installed (v5.38)! No more pip install gradio needed - just import and start building AI apps instantly. Thanks to @GoogleColab team and @thechrisperry for making Gradio more accessible to millions of developers worldwide! 🙏
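So a hello-world app in a fresh Colab cell is now literally just this (standard Gradio API, nothing Colab-specific):

```python
import gradio as gr

def greet(name: str) -> str:
    return f"Hello, {name}!"

# In a fresh Colab notebook this now runs with zero installs
demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()
```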