XMaster96
@_XMaster96
Former Senior AI researcher @Aleph__Alpha · EVE Online player since 2013 · Co-Founder Pageshift Entertainment - Building the worst best storytelling AI
Introducing Pageshift AI, the thing I was working on over the last couple of months. Generate your own audiobook with just a simple prompt, or listen to an existing one from the community: pageshift.ai (Currently only really working on desktop devices)

And do we get the base model checkpoint?
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
I can’t sleep right now so I started to read the source code of chatterbox from @resembleai and I really have to say, their audio tokenizer is damn smart, and it is the reason why their model sounds this good. They are basically doing diffusion inference steps to clean up their audio…
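Roughly the idea, as a toy sketch (this is not the actual chatterbox code; the denoiser network, schedule, and shapes below are placeholders I made up): run a handful of DDIM-style denoising steps over the continuous audio latents before the vocoder sees them, so the quantization noise gets polished away.

```python
# Toy sketch only: a few diffusion denoising steps over audio latents.
# The denoiser and schedule are stand-ins, not Chatterbox internals.
import jax
import jax.numpy as jnp

def denoiser(latents, t, params):
    # Placeholder for a learned network that predicts the noise at step t.
    return latents * params["scale"] * t

def clean_latents(noisy_latents, params, num_steps=4):
    # DDIM-like loop: a handful of steps is enough to "clean up" coarse,
    # quantized latents before decoding them to a waveform.
    ts = jnp.linspace(1.0, 0.0, num_steps + 1)
    x = noisy_latents
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        eps = denoiser(x, t, params)   # predicted noise
        x0 = x - t * eps               # estimate of the clean latent
        x = x0 + t_next * eps          # step to the next (lower) noise level
    return x

params = {"scale": jnp.float32(0.1)}
latents = jax.random.normal(jax.random.PRNGKey(0), (1, 256, 64))  # (batch, frames, dim)
cleaned = jax.jit(clean_latents)(latents, params)
print(cleaned.shape)
```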
Oh, damn @Alibaba_Qwen never published their pretrained Qwen3-32B model...
Somehow Claude is better at writing wrappers for the Gemini API than Gemini is
Just to quickly explain what I am working on: I need a dynamic sparse Mixture of Experts (MoE) kernel that allows for highly uneven, batch-dependent routing behavior. In a normal MoE training setting we assume / force an even usage of all experts across the full batch. Which is…
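For context, here is a minimal dense reference of what I mean (hypothetical shapes, a plain top-1 router, nothing from our actual code base): the per-expert token counts are data dependent, and a real kernel has to follow that uneven routing instead of assuming a fixed capacity per expert.

```python
# Dense MoE reference: run every expert on every token and mask. This is the
# wasteful baseline a dynamic sparse kernel has to beat, because the number
# of tokens each expert actually owns is uneven and only known at runtime.
import jax
import jax.numpy as jnp

def moe_reference(x, router_w, expert_w):
    # x: (tokens, d_model), router_w: (d_model, n_experts),
    # expert_w: (n_experts, d_model, d_model)
    logits = x @ router_w
    expert_id = jnp.argmax(logits, axis=-1)                     # top-1 routing
    counts = jnp.bincount(expert_id, length=expert_w.shape[0])  # per-expert load
    per_expert = jnp.einsum("td,edf->etf", x, expert_w)         # every expert, every token
    mask = jax.nn.one_hot(expert_id, expert_w.shape[0]).T[..., None]
    y = (per_expert * mask).sum(axis=0)                         # keep only the routed expert
    return y, counts

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 32))
router_w = jax.random.normal(key, (32, 4))
expert_w = jax.random.normal(key, (4, 32, 32))
y, counts = jax.jit(moe_reference)(x, router_w, expert_w)
print(y.shape, counts)  # the token counts per expert will generally be uneven
```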
This is now the third time in a row that I was already lying in bed and got back up because I had a new idea for a parallel algorithm that would turn a really expensive dense multiplication into a really efficient sparse one. It would be so easy if TPUs would allow Vector…
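To be clear, the sketch below is not that algorithm (the interesting part is exactly what got cut off above), just a generic illustration of the gap it is chasing: if most rows of an activation are zero, a dense matmul still pays for all of them, while gathering the non-zero rows first makes the cost proportional to the useful work only.

```python
# Generic dense-vs-sparse illustration with made-up shapes; not the idea
# from the post above.
import jax
import jax.numpy as jnp

def dense_matmul(x, w):
    return x @ w                                    # cost ~ all_rows * d_in * d_out

def sparse_rows_matmul(x, w, row_is_nonzero, max_rows):
    # Gather (up to max_rows) non-zero rows, multiply only those, scatter back.
    idx = jnp.nonzero(row_is_nonzero, size=max_rows, fill_value=0)[0]
    y_small = x[idx] @ w                            # cost ~ nonzero_rows * d_in * d_out
    y = jnp.zeros((x.shape[0], w.shape[1]), x.dtype)
    return y.at[idx].set(y_small * row_is_nonzero[idx, None])

key = jax.random.PRNGKey(0)
mask = jax.random.bernoulli(key, 0.1, (1024,))      # ~10% of rows carry signal
x = jax.random.normal(key, (1024, 512)) * mask[:, None]
w = jax.random.normal(key, (512, 512))
out = sparse_rows_matmul(x, w, mask, max_rows=256)
print(jnp.allclose(out, dense_matmul(x, w), atol=1e-3))
```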
I remember when GPT-1 was a joke
I remember when GPT-2 was dangerous
Oh great, the normies have finally discovered year-old memes
Current AI “alignment” is just a mask
Our findings in @WSJ explore the limitations of today’s alignment techniques and what’s needed to get AI right 🧵
We are re-writing our code base from Torch to JAX right now. And oh boy, it is a good feeling to finally use an XLA-based framework again. This is like waking up from a really long and bad dream
I was in the audience, and one key point that wasn’t mentioned here was their argument around the Jevons Paradox: the idea that people will consume more content simply because it’s easier and cheaper to access. At Pageshift, we strongly believe this will be the case. People…
As AI and entertainment collide, there are huge opportunities, particularly for creators who are becoming fluent in AI tools, and for those who can distinguish themselves by having a high bar for taste. That was one of the key takeaways from @andrewchen's conversation with…
Don't worry, we are on it! Let me explain why it is hard... While transformers are generally amazing, they do not have length generalisation. This means that if you want a model that outputs long, consistent text, you need to train it specifically for that.
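A toy illustration of the problem (my own example, nothing model-specific): with rotary position embeddings, positions past the training context produce rotation angles the model has literally never seen, so everything beyond that point is untrained extrapolation.

```python
# Toy example: rotary-style position angles beyond the training context.
import jax.numpy as jnp

def rope_angles(positions, dim=8, base=10000.0):
    inv_freq = 1.0 / (base ** (jnp.arange(0, dim, 2) / dim))
    return positions[:, None] * inv_freq[None, :]   # (positions, dim // 2)

train_ctx = 2048
seen = rope_angles(jnp.arange(train_ctx))                    # angles covered in training
unseen = rope_angles(jnp.arange(train_ctx, 2 * train_ctx))   # pure extrapolation

# The lowest-frequency channel never wraps within the training window,
# so these angles are genuinely new to the model.
print(seen[:, -1].max(), unseen[:, -1].max())
```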

One interesting thing we found out while working on improving existing TTS models was that basically all of the open-source audio encoders are god awful and are really holding back current open-source TTS models. The best audio encoder we found was the Moshi one, while SNAC was…
We’re hyped to be at the frontier of the next chapter in entertainment, pushing to build the models that will empower the greatest stories this world has ever seen.
I love the new generation of personalized media. Finally I’m no longer dependent on the taste of the normies.
It's fake, o3-mini-high 0-shots these. See, for example, Problem 2:
I think the most confusing thing about VLLM in its current state is that they are apparently right in the middle of a major refactor, and so the code base still has lots of duplicate code
I am right now trying to add some custom feature support to VLLM, and I just realized that this is the first time since I joined Aleph Alpha in 2021 that I am working with a large ML code base I was not heavily involved in writing.
Our S3 exit is going full steam ahead for a final departure this summer (when our 4-year contract expires!). Look at that beautiful Pure NVMe gear! 😍