Matthias Boehm
@matthiasboehm7
Prof at TU Berlin and BIFOLD; research on ML systems and data management. @matthiasboehm7.bsky.social
Update from @SIGMODConf 2025 in Berlin - Day 6: We just ended the conference with another great day of tutorials and workshops. Here is @pinartozun giving an awesome post-lunch @deem_workshop keynote. Thanks to all SIGMOD/PODS participants. 👏👋


Vol:18 No:7 → EinDecomp: Decomposition of Declaratively-Specified Machine Learning and Numerical Computations for Parallel Execution vldb.org/pvldb/vol18/p2…
This amazing Attention-FFN disaggregation implementation from @StepFun_ai achieves decoding throughput of up to 4,039 tokens per second per GPU under a 50ms TPOT SLA for their 321B-A38B MoE model Step3, served on H800! The implementation is based on vLLM, and we are working…
👋 Attending ACL 2025? Don't miss the opportunity to ask our esteemed panel your questions about careers in NLP on July 30, 11-12:00pm! Please submit your questions via the form ASAP, by EOD July 28 at the latest, to be considered. Thanks in advance!
📢 ACL 2025 Industry Track 🧠 Do you have questions about a future in industry for NLP? 👉 Submit your questions in this form: forms.gle/RC1aunT7WTSpfd… #ACL2025NLPcareers #IndustryTrack #NLProc #TechCareers
Congrats to @harrygavr for successfully defending his PhD thesis today. Well done, and there was quite a crowd attending. 👍

Today, Jonas presents a new type of adversarial example at @icmlconf! We exploit subtle numerical differences between linear algebra backends and craft inputs that yield different predictions from the same model depending on the backend used 🤯 mlsec.org/docs/2025-icml… 1/4
Thanks to all co-authors Florian Barkmann, Philip Toma, @ImantDaunhawer, @vogt_je, @sscdotopen and @val_boeva 📄 Full paper: openreview.net/pdf?id=jnPHZqc… 💻 Code: github.com/BoevaLab/scSSL…
It is a common misconception that @ApacheParquet files are restricted to basic statistics. Footer metadata and offset-based addressing permit user-defined index structures today. Latest @ApacheDataFusio blog from Qi Zhi, Jigao Luo and myself explains how datafusion.apache.org/blog/2025/07/1…
We have another opening for a student assistant position on ML system internals as well as applications. 🎇 Please apply by July 31 here: jobs.tu-berlin.de/en/job-posting…

🚀 Excited to open-source our general-purpose biomedical AI agent Biomni. Biomni A1 (agent) + E1 (env) with 150 specialized tools, 59 databases, and 105 software packages! With just a few lines of code, you can now automate complex biomedical research with an AI agent! E1 only scratches…
🔔VLDB 2025 Keynote Preview: Join Stratos Idreos from @HarvardDASlab as he reimagines systems design in the AI era with his keynote on "Alphabets, Grammars, Calculators & the End of Hand-Crafted Systems". More details: vldb.org/2025/?program-… #VLDB2025 #DataSystems #AI
We're excited to announce the Call for Papers for SaTML 2026, the premier conference on secure and trustworthy machine learning (@satml_conf). We seek papers on secure, private, and fair learning algorithms and systems. 👉 satml.org/call-for-paper… ⏰ Deadline: Sept 24
One of the first technical reports in a while that has considerable detail. In particular -- preference training tends to be much more painful for multimodal, largely because of lack of calibrated rewards. Some nice details here about overcoming some of those challenges.
Posters speak volumes. 📌 Submit a poster for #PyTorchConf and connect with devs, researchers, and #AI builders at the heart of the #PyTorch community. 🗓️ Conference: Oct 22–23 📍 San Francisco 📌 CFP closes Aug 1 🔗 hubs.ly/Q03tgvdy0
🇪🇺 On 29 June 1985, European leaders chose a symbol that would stand the test of time. Twelve golden stars in a circle. A powerful symbol of who we are and what we stand for: unity and peace, democracy and solidarity. Today, we celebrate 40 years of our common flag.
There is a new open call for a PhD position in our DAMS lab group. 🎇🎆 This position is about research on ML system internals (DSL/APIs, compiler, runtime, accelerators) with involvement in teaching. You can apply until July 18 here tinyurl.com/hyjertvb

“How will my model behave if I change the training data?” Recent(-ish) work w/ @logan_engstrom: we nearly *perfectly* predict ML model behavior as a function of training data, saturating benchmarks for this problem (called “data attribution”).
Update from @SIGMODConf 2025 in Berlin - Day 5: Today, Phil Bernstein gave another awe-inspiring keynote on 50 years of transaction processing (with strong attendance despite 8.30am and the banquet the night before), and Google Spanner won the well-deserved systems award. 👍


