Qubitum

@qubitium

I make models go brrr..... @ModelCloudAI founder. Committer to SGLang, vLLM, GPTQModel. AI sw + hw { Go, Python, Kotlin }. Quantization Accelerator.

Earth

Joined February 2020

4K Following

914 Followers

Pinned

Qubitum@qubitium · May 2

World's first, I think, non-tensor-parallel, near-linear GPU-scaling quantization speed-up for massive MoE models (DeepSeek) has dropped on GPTQModel as an alpha PR. 🥳 Unlike tensor parallel, you can use an odd number of GPUs, with no divisibility requirement. LFG! 🤘🚀 github.com/ModelCloud/GPT…

Beta Quality: You need to install the python 3.13t t-edition: add-apt-repository ppa:deadsnakes/ppa apt install python3-13-no-gil Install latest rust: curl --proto '=https' --t...



Qubitum@qubitium · Jul 9

Another reason AMD is winning orders: cheaper, larger unified HBM now outweighs higher achievable TFLOPS for inference. Fundamentally, I think latest/faster/denser HBM has a power/heat dissipation issue/wall causing this. You cannot scale thermals/physics. Imagine 1500w per gpu.

Stas Bekman@StasBekman · Jul 7

Got a chance to measure Maximum Achievable Matmul TFLOPS on NVIDIA B200. With each new NVIDIA generation the efficiency keeps on dropping: A100: 86.9%, H100: 80.3%, B200: 77.6%. The updated table is here: github.com/stas00/ml-engi…


Qubitum@qubitium · Jul 24

AI, drones, and now authentic retro N64 hw emulation. I have no words. I would not be surprised if his drone operators' LCD controllers are dual-screen designs inspired by the DS.

Palmer Luckey@PalmerLuckey · Jul 24

ModRetro's newest product is M64. The best and most authentic way to play your favorite N64 games, bar none. Prepare your wallet and brace your mind. Launches at the same price as the original Nintendo 64. Inflation isn't nostalgic.


Qubitum@qubitium · Jul 22

Even if you build this yourself (I have done it) you will not save many bucks (if any) or be more hw stable than their prevalidated and full sw stack bundled solution at 5k. How are they doing this? Wow.

Alignment Lab AI@alignment_lab · Jul 22

Introducing SENTER We are announcing the availability of SENTER, a powerful workstation we built to perform research and train AI without the extreme costs of cloud and API fees. It's designed to put your intelligence, data, privacy, and productivity back into your hands.…


Qubitum Retweeted

Qwen@Alibaba_Qwen · Jul 22

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…


Qubitum@qubitium · Jul 22

*Ever used asyncio and wished you hadn't* I almost spat out my morning tea. Worth a look.

Patrick Kidger@PatrickKidger · Jul 22

✨Announcing: tinyio! A tiny barebones event loop library for Python. Born out of my frustration with asyncio... GitHub: github.com/patrick-kidger… It's nothing too fancy, just a little library that does one thing well. 🔥

Qubitum@qubitium · Jul 21

We’ve updated Qwen3 and made excellent progress. The non‑reasoning model now delivers significant improvements across a wide range of tasks and many of its capabilities already rival those of reasoning models. It’s truly remarkable, and we hope you enjoy it!

Qwen@Alibaba_Qwen · Jul 21

Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…


Qubitum@qubitium · Jul 21

No. This goes against everything that will make the world a better place. Global supply chain dependency should be reduced post covid but why pay more for cpus assembled in Malaysia or Vietnam? Has anyone checked recently their CPU lids with lasered engravings on origin?

Palmer Luckey@PalmerLuckey · Jul 20

Would you buy a Made In America computer from Anduril for 20% more than Chinese-manufactured options from Apple?


Qubitum@qubitium · Jul 20

Lip-Bu Tan literally broke down crying when he talked about why he became CEO at this stage of his life/career. Did anyone catch this? He is definitely not in it for the money/pay. 15:00 mark. I hope he turns it around. youtube.com/watch?v=wui5-4…


Qubitum@qubitium · Jul 13

I love Go, and even I have to cringe at this. It doesn't make any sense. Oof.. the Go team should fix this asap.

@fclc@FelixCLC_ · Jul 13

given: float foo = (1/3)*3 and const float bar = (1/3)*3 foo != bar


Qubitum@qubitium · Jul 13

The delayed OpenAI model was prone to cursing. =) We will never get AGI if cursing is RLed to oblivion. Just saying. Stop with the guard rails ffs. This is totally hypothetical and I have no deep knowledge of the actual model. 😇

Yuchen Jin@Yuchenj_UW · Jul 13

Rumors that OpenAI delayed their open-source model because of Kimi are fun, but from what I hear: - the model is much smaller than Kimi K2 (<< 1T parameters) - super powerful - but due to some (frankly absurd) reason I can’t say, they realized a big issue just before release, so…


Qubitum@qubitium · Jul 13

A close associate's wife got romance scammed via whatsapp using love and crypto. Wife wired money from family accounts for the lover/crypto, secretly borrowed from friends, lost everything, ruined the marriage and lost custody of the child (thankfully). Sophisticated scams are…


Qubitum@qubitium · Jul 12

This concept, and methods like it, are the future.

Sukjun (June) Hwang@sukjun_hwang · Jul 11

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data


Qubitum Retweeted

Carlota Parés-Morlans@carlotapares · Jul 10

🔍 How can we build AI agents that reason about the physical world the way humans do (or better) ? Excited to share Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel, which will be presented next Thursday July 17th at ICML in Vancouver! 👇(1/6)


Qubitum@qubitium · Jul 3

Huawei has always been subpar in software and good in hw but if this is true, this is low by even my low sw expectations for Huawei.



Qubitum@qubitium · Jul 3

Baidu Ernie and Huawei PanGu support just added to GPTQModel dev branch. Please test and send feedback.

ModelCloud@ModelCloudAi · Jul 3

🔥GPTQModel main branch now has both Baidu Ernie and Huawei PanGu model support. github.com/ModelCloud/GPT…
