Qubitum
@qubitium
I make models go brrr..... @ModelCloudAI founder Committer to SGLang, vLLM, GPTQModel. AI sw + hw { Go, Python, Kotlin } Quantization Accelerator
World's first, I think, non-tensor-parallel, near-linear GPU-scaling quantization speed-up for massive MoE models (DeepSeek) has dropped on GPTQModel as an alpha PR. 🥳 Unlike tensor parallel, you can use an odd number of GPUs, with no divisibility requirement. LFG! 🤘🚀 github.com/ModelCloud/GPT…
Another reason AMD is winning orders: cheaper, larger unified HBM now outweighs higher achievable TFLOPS for inference. Fundamentally, I think latest/faster/denser HBM has a power/heat dissipation issue/wall causing this. You cannot scale thermals/physics. Imagine 1500w per gpu.
Got a chance to measure Maximum Achievable Matmul TFLOPS on NVIDIA B200. With each new NVIDIA generation the efficiency keeps on dropping: A100: 86.9%, H100: 80.3%, B200: 77.6%. The updated table is here: github.com/stas00/ml-engi…
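For readers who want to see how an efficiency percentage like the ones above is usually derived, here is a minimal sketch. It assumes efficiency means achieved matmul TFLOPS divided by the vendor's theoretical peak, and that one (m x k) @ (k x n) matmul costs 2*m*n*k FLOPs. The function names and the sample numbers are hypothetical and are not taken from the linked table or its benchmark code.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for one (m x k) @ (k x n) matmul, counted as 2*m*n*k FLOPs."""
    return 2 * m * n * k / seconds / 1e12


def efficiency_pct(achieved_tflops: float, peak_tflops: float) -> float:
    """Achieved throughput as a percentage of the theoretical peak."""
    return 100.0 * achieved_tflops / peak_tflops


if __name__ == "__main__":
    # Hypothetical timing and peak, for illustration only (not measured data).
    achieved = matmul_tflops(8192, 8192, 8192, seconds=0.002)
    print(f"achieved: {achieved:.1f} TFLOPS")
    print(f"efficiency: {efficiency_pct(achieved, peak_tflops=1000.0):.1f}%")
```

In a real measurement the `seconds` value would come from timing many matmul iterations on the device after warm-up, and the peak would be the datasheet figure for the same dtype.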
AI, drones, and now authentic retro N64 hw emulation. I have no words. I would not be surprised if his drone operators' LCD controllers are dual-screen designs inspired by the DS.
ModRetro's newest product is M64. The best and most authentic way to play your favorite N64 games, bar none. Prepare your wallet and brace your mind. Launches at the same price as the original Nintendo 64. Inflation isn't nostalgic.
Even if you build this yourself (I have done it) you will not save much money (if any) or get a more stable hw setup than their prevalidated solution with a full bundled sw stack at 5k. How are they doing this? Wow.
Introducing SENTER We are announcing the availability of SENTER, a powerful workstation we built to perform research and train AI without the extreme costs of cloud and API fees. It's designed to put your intelligence, data, privacy, and productivity back into your hands.…
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
*Ever used asyncio and wished you hadn't* I almost spat out my morning tea. Worth a look.
✨Announcing: tinyio! A tiny barebones event loop library for Python. Born out of my frustration with asyncio... GitHub: github.com/patrick-kidger… It's nothing too fancy, just a little library that does one thing well. 🔥
We’ve updated Qwen3 and made excellent progress. The non‑reasoning model now delivers significant improvements across a wide range of tasks and many of its capabilities already rival those of reasoning models. It’s truly remarkable, and we hope you enjoy it!
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…
No. This goes against everything that will make the world a better place. Global supply chain dependency should be reduced post-COVID, but why pay more for CPUs assembled in Malaysia or Vietnam? Has anyone recently checked the laser-engraved country of origin on their CPU lids?
Would you buy a Made In America computer from Anduril for 20% more than Chinese-manufactured options from Apple?
Lip-Bu Tan literally broke down crying when he talked about why he became CEO at this stage of his life/career. Did anyone catch this? He is definitely not in it for the money/pay. 15:00 mark. I hope he turns it around. youtube.com/watch?v=wui5-4…
The delayed OpenAI model was prone to cursing. =) We will never get AGI if cursing is RLed to oblivion. Just saying. Stop with the guard rails ffs. This is totally hypothetical and I have no deep knowledge of the actual model. 😇
Rumors that OpenAI delayed their open-source model because of Kimi are fun, but from what I hear: - the model is much smaller than Kimi K2 (<< 1T parameters) - super powerful - but due to some (frankly absurd) reason I can’t say, they realized a big issue just before release, so…
A close associate's wife got romance scammed via whatsapp using love and crypto. Wife wired money from family accounts for the lover/crypto, secretly borrowed from friends, lost everything, ruined the marriage and lost custody of the child (thankfully). Sophisticated scams are…
This concept, and methods like it, are the future.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
🔍 How can we build AI agents that reason about the physical world the way humans do (or better)? Excited to share Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel, which will be presented next Thursday July 17th at ICML in Vancouver! 👇(1/6)
Huawei has always been subpar in software and good in hw, but if this is true, this is low even by my low sw expectations for Huawei.
Baidu Ernie and Huawei PanGu support just added to GPTQModel dev branch. Please test and send feedback.
🔥GPTQModel main branch now has both Baidu Ernie and Huawei PanGu model support. github.com/ModelCloud/GPT…