XMaster96
@_XMaster96
Former Senior AI researcher @Aleph__Alpha · EVE Online player since 2013 · Co-Founder Pageshift Entertainment - Building the worst best storytelling AI
Introducing Pageshift AI, the thing I was working on over the last couple of months. Generate your own audiobook with just a simple prompt, or listen to an existing one from the community: pageshift.ai (Currently only really working on desktop devices)

And do we get the base model checkpoint?
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
I can’t sleep right now so I started to read the source code of chatterbox from @resembleai and I really have to say, their audio tokenizer is damn smart, and it is the reason why their model sounds this good. They are basically doing diffusion inference steps to clean up their audio…
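Roughly the idea, as a toy sketch (this is not the actual chatterbox code; the denoiser network, schedule, and shapes below are placeholders I made up): run a handful of DDIM-style denoising steps over the continuous audio latents before the vocoder sees them, so the quantization noise gets polished away.

```python
# Toy sketch only: a few diffusion denoising steps over audio latents.
# The denoiser and schedule are stand-ins, not Chatterbox internals.
import jax
import jax.numpy as jnp

def denoiser(latents, t, params):
    # Placeholder for a learned network that predicts the noise at step t.
    return latents * params["scale"] * t

def clean_latents(noisy_latents, params, num_steps=4):
    # DDIM-like loop: a handful of steps is enough to "clean up" coarse,
    # quantized latents before decoding them to a waveform.
    ts = jnp.linspace(1.0, 0.0, num_steps + 1)
    x = noisy_latents
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        eps = denoiser(x, t, params)   # predicted noise
        x0 = x - t * eps               # estimate of the clean latent
        x = x0 + t_next * eps          # step to the next (lower) noise level
    return x

params = {"scale": jnp.float32(0.1)}
latents = jax.random.normal(jax.random.PRNGKey(0), (1, 256, 64))  # (batch, frames, dim)
cleaned = jax.jit(clean_latents)(latents, params)
print(cleaned.shape)
```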
Oh, damn @Alibaba_Qwen never published their pretrained Qwen3-32B model...
Somehow Claude is better at writing wrappers for the Gemini API than Gemini is
Just to quickly explain what I am working on: I need a dynamic sparse Mixture of Experts (MoE) kernel that allows for highly uneven, batch-dependent routing behavior. In a normal MoE training setting we assume / force an even usage of all experts across the full batch. Which is…
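For context, here is a minimal dense reference of what I mean (hypothetical shapes, a plain top-1 router, nothing from our actual code base): the per-expert token counts are data dependent, and a real kernel has to follow that uneven routing instead of assuming a fixed capacity per expert.

```python
# Dense MoE reference: run every expert on every token and mask. This is the
# wasteful baseline a dynamic sparse kernel has to beat, because the number
# of tokens each expert actually owns is uneven and only known at runtime.
import jax
import jax.numpy as jnp

def moe_reference(x, router_w, expert_w):
    # x: (tokens, d_model), router_w: (d_model, n_experts),
    # expert_w: (n_experts, d_model, d_model)
    logits = x @ router_w
    expert_id = jnp.argmax(logits, axis=-1)                     # top-1 routing
    counts = jnp.bincount(expert_id, length=expert_w.shape[0])  # per-expert load
    per_expert = jnp.einsum("td,edf->etf", x, expert_w)         # every expert, every token
    mask = jax.nn.one_hot(expert_id, expert_w.shape[0]).T[..., None]
    y = (per_expert * mask).sum(axis=0)                         # keep only the routed expert
    return y, counts

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 32))
router_w = jax.random.normal(key, (32, 4))
expert_w = jax.random.normal(key, (4, 32, 32))
y, counts = jax.jit(moe_reference)(x, router_w, expert_w)
print(y.shape, counts)  # the token counts per expert will generally be uneven
```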
This is now the third time in a row that I was already lying in bed and got back up because I had a new idea for a parallel algorithm that would turn a really expensive dense multiplication into a really efficient sparse one. It would be so easy if TPUs would allow Vector…
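To be clear, the sketch below is not that algorithm (the interesting part is exactly what got cut off above), just a generic illustration of the gap it is chasing: if most rows of an activation are zero, a dense matmul still pays for all of them, while gathering the non-zero rows first makes the cost proportional to the useful work only.

```python
# Generic dense-vs-sparse illustration with made-up shapes; not the idea
# from the post above.
import jax
import jax.numpy as jnp

def dense_matmul(x, w):
    return x @ w                                    # cost ~ all_rows * d_in * d_out

def sparse_rows_matmul(x, w, row_is_nonzero, max_rows):
    # Gather (up to max_rows) non-zero rows, multiply only those, scatter back.
    idx = jnp.nonzero(row_is_nonzero, size=max_rows, fill_value=0)[0]
    y_small = x[idx] @ w                            # cost ~ nonzero_rows * d_in * d_out
    y = jnp.zeros((x.shape[0], w.shape[1]), x.dtype)
    return y.at[idx].set(y_small * row_is_nonzero[idx, None])

key = jax.random.PRNGKey(0)
mask = jax.random.bernoulli(key, 0.1, (1024,))      # ~10% of rows carry signal
x = jax.random.normal(key, (1024, 512)) * mask[:, None]
w = jax.random.normal(key, (512, 512))
out = sparse_rows_matmul(x, w, mask, max_rows=256)
print(jnp.allclose(out, dense_matmul(x, w), atol=1e-3))
```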
I remember when GPT-1 was a joke
I remember when GPT-2 was dangerous
Oh great, the normies have finally discovered year-old memes
Current AI “alignment” is just a mask
Our findings in @WSJ explore the limitations of today’s alignment techniques and what’s needed to get AI right 🧵
We are re-writing our code base from Torch to JAX right now. And oh boy, it is a good feeling to finally use an XLA-based framework again. This is like waking up from a really long and bad dream
I was in the audience, and one key point that wasn’t mentioned here was their argument around the Jevons Paradox: the idea that people will consume more content simply because it’s easier and cheaper to access. At Pageshift, we strongly believe this will be the case. People…
As AI and entertainment collide, there are huge opportunities, particularly for creators who are becoming fluent in AI tools, and for those who can distinguish themselves by having a high bar for taste. That was one of the key takeaways from @andrewchen's conversation with…
Don't worry, we are on it! Let me explain why it is hard... While transformers are generally amazing, they do not have length generalisation. This means that if you want a model that outputs long, consistent text, you need to train it specifically for that.
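A toy illustration of the problem (my own example, nothing model-specific): with rotary position embeddings, positions past the training context produce rotation angles the model has literally never seen, so everything beyond that point is untrained extrapolation.

```python
# Toy example: rotary-style position angles beyond the training context.
import jax.numpy as jnp

def rope_angles(positions, dim=8, base=10000.0):
    inv_freq = 1.0 / (base ** (jnp.arange(0, dim, 2) / dim))
    return positions[:, None] * inv_freq[None, :]   # (positions, dim // 2)

train_ctx = 2048
seen = rope_angles(jnp.arange(train_ctx))                    # angles covered in training
unseen = rope_angles(jnp.arange(train_ctx, 2 * train_ctx))   # pure extrapolation

# The lowest-frequency channel never wraps within the training window,
# so these angles are genuinely new to the model.
print(seen[:, -1].max(), unseen[:, -1].max())
```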

One interesting thing we found out while working on improving existing TTS models was that basically all of the open-source audio encoders are god awful and are really holding back current open-source TTS models. The best audio encoder we found was the Moshi one, while SNAC was…
We’re hyped to be at the frontier of the next chapter in entertainment, pushing to build the models that will empower the greatest stories this world has ever seen.
I love the new generation of personalized media. Finally I’m no longer dependent on the taste of the normies.
It's fake, o3-mini-high 0-shots these. See, for example, Problem 2:
I think the most confusing thing about VLLM in its current state is that they are apparently right in the middle of a major refactor, and so the code base still has lots of duplicate code
I am right now trying to add some custom feature support to VLLM, and I just realized that this is the first time since I joined Aleph Alpha in 2021 that I am working with a large ML code base I was not heavily involved in writing.
Our S3 exit is going full steam ahead for a final departure this summer (when our 4-year contract expires!). Look at that beautiful Pure NVMe gear! 😍