Vision Transformers
@vitransformer
Building in ML with blogs 👇 | agentic workflows @lossfunk
Open source is back baby!
🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet! Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving: ✅ Improved performance in logical reasoning, math, science & coding…
Omg finally!
New video on the details of diffusion models: youtu.be/iv-5mZ_9CPY Produced by @welchlabs, this is the first in a small series of 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.
The first AI-powered structure prediction editor, powered by Boltz-2 with bulk structure prediction, is now available on the LiteFold Platform. Links in comments. Exciting updates coming soon! 🚀
i just would have gotten the claude max plan
the level of incompetence in this post is so confusing, i mean why would you use a transformer here in the first place?
how to get a job at @xai
230k GPUs, including 30k GB200s, are operational for training Grok @xAI in a single supercluster called Colossus 1 (inference is done by our cloud providers). At Colossus 2, the first batch of 550k GB200s & GB300s, also for training, start going online in a few weeks. As Jensen…
ok but china has more!
India has around 5,000 H100 GPUs in total. Elon Musk alone has 150,000. Let that sink in. We're not just behind in the AI race; we're not even on the track. At this rate, only a miracle can pull us back into the game. And if we lose this race now, forget about catching up…
my reaction after seeing this
Agents aren’t reliable. They don’t learn from experience. At @composiohq, we provide skills that evolve with your agents. @lightspeedvp gave us $25M to make agents usable.
inference seems cracked with torch.compile
AI researchers when they discovered that torch.compile doesn't scale well to real multi-node production training workloads and is a giant footgun
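Not from the thread itself, but for context, a minimal sketch of what compiled inference looks like in PyTorch (toy module and default settings assumed; the multi-node training footguns mentioned above are a separate regime):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical toy model standing in for whatever is actually being served.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
).to(device).eval()

# torch.compile captures the graph and emits fused kernels; the first call
# pays the compilation cost, later calls reuse the cached artifact.
compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 1024, device=device)
with torch.no_grad():
    compiled(x)        # warm-up / compile
    out = compiled(x)  # subsequent calls hit the compiled fast path
```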
(man this could have helped me in JEE) ;-;
Excited to share Aryabhatta 1.0, our leading model that scores 90.2% on JEE Mains, outperforming frontier models like o4 mini and Gemini Flash 2.5. Trained by us at @AthenaAgentRL, in collaboration with @physics__wallah, using custom RLVR training on 130K+ curated JEE problems…
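The tweet doesn't spell out the reward, but RLVR generally means scoring each rollout against a verifiable answer key. A toy sketch of what such a check could look like (the \boxed{} convention and the matching rules here are my assumptions, not details from the Aryabhatta release):

```python
import re

def jee_reward(model_output: str, gold_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the extracted final answer matches the key, else 0.0."""
    # Assumes the rollout ends with a \boxed{...} answer; hypothetical convention.
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0
    predicted = match.group(1).strip()
    try:
        # Numeric answers: compare with a small tolerance.
        return float(abs(float(predicted) - float(gold_answer)) < 1e-6)
    except ValueError:
        # Otherwise fall back to exact string match (e.g. option "B").
        return float(predicted == gold_answer.strip())
```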
yep it is a bit finicky in ai studio
But i guess we all have a quant now
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
seems like everyone gave in to the hype
This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematics Olympiad - the most prestigious mathematics competition in the world. To uphold the sanctity of the student competition, the IMO Board…
The future of AI will not be metered. It will be owned by you. Our work continues in Brooklyn.
seems like the new american dream
> you get into a good uni
> mess with them academically or ethically
> get kicked out
> start a company $$$
in the process you piss off the establishment and get SF VCs to notice you
rescinded is the new dropout!
got rescinded from columbia
seems like about 33% CUDA, the rest PyTorch
took a quick look at this paper (just the convolution section) and I have several concerns about the claims:
1) pytorch by default does not execute synchronously on the GPU (host vs. device) and anyone who has forgotten syncs when benchmarking can tell you so
2) TF32 is enabled…
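For context on points 1) and 2): a rough sketch of how a convolution micro-benchmark would avoid both pitfalls, with explicit syncs and TF32 pinned off (toy shapes, not the paper's setup):

```python
import time
import torch

# Point 2: TF32 is on by default for conv/matmul on Ampere+ GPUs, so pin it
# explicitly to keep the baseline and the "optimized" kernel at the same precision.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False

conv = torch.nn.Conv2d(256, 256, kernel_size=3, padding=1).cuda()
x = torch.randn(64, 256, 56, 56, device="cuda")

with torch.no_grad():
    # Warm-up so lazy init / cuDNN autotuning doesn't pollute the measurement.
    for _ in range(10):
        conv(x)
    torch.cuda.synchronize()

    # Point 1: kernel launches are asynchronous, so sync before reading the clock.
    start = time.perf_counter()
    for _ in range(100):
        conv(x)
    torch.cuda.synchronize()

print(f"{(time.perf_counter() - start) / 100 * 1e3:.3f} ms per forward")
```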
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Trains a DeepSeek-v3-671B model to optimize CUDA kernels using only execution-time speedup as reward.
Pipeline:
- SFT: Finetuned on 2.1K correct, executable CUDA variants from 6 LLMs across 250…
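The paper's exact reward isn't quoted here, but "execution-time speedup as reward" is roughly this pattern: time a reference kernel against the generated one and pay out the speedup only when outputs match (the function names and correctness gate below are my guesses, not the paper's formulation):

```python
import time
import torch

def bench(fn, *args, iters=50):
    """Median wall-clock time of a CUDA callable, with explicit syncs."""
    for _ in range(5):                      # warm-up
        fn(*args)
    torch.cuda.synchronize()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        torch.cuda.synchronize()
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

def speedup_reward(reference_fn, candidate_fn, *args, atol=1e-3):
    """Reward = measured speedup over the reference, zero if the candidate is wrong."""
    try:
        if not torch.allclose(reference_fn(*args), candidate_fn(*args), atol=atol):
            return 0.0                      # wrong output -> no reward
    except RuntimeError:
        return 0.0                          # crash / failed launch -> no reward
    return bench(reference_fn, *args) / bench(candidate_fn, *args)
```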
Llama2 7b kinda changed my life trajectory
got an opportunity to do a pod with @Meta, @AIatMeta on open source LLMs
had a great discussion with @sunil_abraham and @chheplo (folks i look up to) !!!
In this episode of AI Talks by AIM, powered by @Meta, with @sunil_abraham, Public Policy Director - Data Economy and Emerging Tech at Meta, India, we dive into a powerful conversation about how open-source generative AI is enabling real-world impact, far beyond Silicon Valley…