Rohan Ahluwalia
@r0hanahluwalia
something new | prev @yale @pytorch @womplabsai
yes more data better
I don't think we need an American DeepSeek Project, we need an Open-Data DeepSeek. And no we didn't get one yet, despite what you might think, so let me explain. The biggest contributor to the gap between closed-source and open-source AI is, in my opinion, data accessibility and…
this really got us data folk feeling a type of way
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking
claude code in terminal, best developer exp out there tbh
This and @arcee_ai's new model show that we are rapidly improving the performance of small models. it'd be exciting to see what on-prem tasks these small models could be used for. highly specialized small models running on edge devices
Really excited to share SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context, up to 128k > multilingual: en, fr, es, de, it, pt > fully open source (ckpts, data, code, recipes) huggingface.co/HuggingFaceTB/… Details on the…
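a rough sketch of trying the dual mode with plain transformers. the model id and the enable_thinking toggle are assumptions based on the release blurb above, not confirmed here, so check the model card for the exact switch:

```python
# hedged sketch: standard transformers APIs only; "HuggingFaceTB/SmolLM3-3B"
# and the enable_thinking kwarg are assumed from the release notes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,  # assumed flag for think vs. no_think mode
).to(model.device)

out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```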
What I send to people to get them to join @datologyai
If we are thinking in terms of finite data, infinite compute, this is a really interesting read. Great work by @Happylemon56775. arxiv.org/pdf/2507.02754
jax-ml.github.io/scaling-book/ one of the best things i've worked through. doesn't matter that it's about TPUs, all the concepts are great
tapped in
Excited for the new release of #HuggingFace durant.ly/huggingface - proud investor! durant.ly/hfrelease
the FP8 values in your model after 50 layers of quantize/dequantize operations
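for the uninitiated, a toy numpy sketch of why this is a meme. fake_fp8 and the per-layer scales are made up for illustration (a 3-bit-mantissa stand-in, not a real e4m3 kernel); the point is just that rounding error compounds when every layer re-quantizes after an op:

```python
import numpy as np

def fake_fp8(x, mantissa_bits=3):
    # round to the nearest value with a 3-bit mantissa (rough e4m3 stand-in)
    m, e = np.frexp(x)
    grid = 2.0 ** mantissa_bits
    return np.ldexp(np.round(m * grid) / grid, e)

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
scales = rng.uniform(0.9, 1.1, size=50)  # one hypothetical scale per "layer"

exact, quant = x.copy(), x.copy()
for s in scales:  # 50 layers of op -> quantize -> dequantize
    exact = exact * s
    quant = fake_fp8(quant * s)

rel_err = np.abs(exact - quant) / (np.abs(exact) + 1e-12)
print(f"median relative error after 50 layers: {np.median(rel_err):.3%}")
```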
Ngmi
My hot startup take is that you should almost never work weekends

My even hotter take is that if you are working weekends, you didn't go hard enough during the week or you suck at planning
super sick work by @joemelko @XP_research
Meet DAUNCE🕺: the first method to trace training-data influence inside *proprietary* LLMs (yes, GPT-4o). Full breakdown in @XP_research’s thread - feedback welcome!
good read: developer.nvidia.com/blog/demystify… lots of quality info out there
been working on truly understanding fsdp, damn shit is cool
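the core idea, as a minimal sketch assuming a single multi-GPU host: wrap the model in FullyShardedDataParallel and parameters, grads, and optimizer state get sharded across ranks, with full params gathered on the fly for each forward/backward. the toy model and hyperparameters are made up:

```python
# launch with: torchrun --nproc_per_node=2 fsdp_min.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # torchrun supplies rank/world-size env vars
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).cuda()

# FSDP shards params, grads, and optimizer state across ranks;
# full params are all-gathered just-in-time per forward/backward
model = FSDP(model)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss for illustration
loss.backward()
opt.step()
dist.destroy_process_group()
```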