Eric W. Tramel
@fujikanaeda
Research Scientist @ Nvidia. Ex: Synth Data @ Gretel & Unlearn, Federated Learning @ Amazon Alexa & Owkin. Postdocs @ INRIA & ENS. Views my own.
how did getting a single perplexity number become so dang complicated? sometimes find myself pining for the mnist days
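(for the record, the textbook number is simple, which is the joke. a minimal sketch of exp(mean NLL) with a HF causal LM; the model name is a placeholder and real evals differ on tokenization, context windows, and aggregation:)

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# hypothetical choice; any causal LM works the same way
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = "the quick brown fox jumps over the lazy dog"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # passing labels makes HF return the mean token-level cross-entropy
    loss = model(ids, labels=ids).loss

# perplexity is just exp of the mean negative log-likelihood per token
print(f"ppl: {math.exp(loss.item()):.2f}")
```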
Very excited to announce Llama-Nemotron-Super-V1.5! Super-V1.5 is now better than Ultra-V1. This is currently the best model that can be deployed on a single H100. Reasoning On/Off and a drop-in replacement for V1. Open weights, code, and data on HF huggingface.co/nvidia/Llama-3…
📣 Announcing Llama Nemotron Super v1.5 📣 This release pushes the boundaries of reasoning model capabilities at its weight class and is ready to power agentic applications, from individual developers all the way to the enterprise. 📈 The Llama Nemotron…
you want wandb? we have wandb at home. the wandb at home: grep "loss: " *.log
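(the home version, fleshed out a bit: a sketch that pulls the numbers back out and plots them, assuming log lines that literally contain "loss: <float>":)

```python
import glob
import re

import matplotlib.pyplot as plt

# assumes lines like "step 100 | loss: 2.3451" somewhere in each .log file
pattern = re.compile(r"loss: ([0-9]*\.?[0-9]+)")

for path in sorted(glob.glob("*.log")):
    with open(path) as f:
        losses = [float(m.group(1)) for m in pattern.finditer(f.read())]
    plt.plot(losses, label=path)

plt.xlabel("logged step")
plt.ylabel("loss")
plt.legend()
plt.show()
```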
more experiments should run cradle-to-grave and across scale. we're not really doing science well without it. 10-100x more compute is required for us to make sustainable, open, published advances as a field.
last two runs of the biggest-scale project i've ever done 🥲 training 1.5b, 3b, 7b, 14b, 32b models: pretraining + rejection sampling to build a dataset + supervised finetuning + reinforcement learning. now time to write
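(the rejection-sampling step, in the abstract: sample k candidates per prompt, keep the ones a verifier accepts, and the survivors become the SFT set. a sketch with hypothetical generate/verify functions, not the actual pipeline:)

```python
def build_dataset(prompts, generate, verify, k=8):
    """Rejection sampling to build SFT data.

    generate(prompt, n) -> list of n candidate completions (hypothetical)
    verify(prompt, completion) -> bool (e.g. answer checker or unit tests)
    """
    dataset = []
    for prompt in prompts:
        for completion in generate(prompt, k):
            # keep only completions the verifier accepts
            if verify(prompt, completion):
                dataset.append({"prompt": prompt, "completion": completion})
    return dataset
```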
the ml research codebases are always “like this”. they all smell like each other and it's always the same. i'm in a simulation

the distinction between a research scientist and a research engineer is whether you run towards or away from the docs
this is a really cool building btw, you can go have beers up there any day
Reached the final boss
don't worry about hle, 30% is just the error rate of the human soul
claude, it's just a distributed all-reduce, please stop fumbling the bag
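(the thing being fumbled, minimally, with torch.distributed; launch command and backend are assumptions:)

```python
import torch
import torch.distributed as dist

# launch with: torchrun --nproc_per_node=2 this_file.py
dist.init_process_group(backend="gloo")  # "nccl" on GPU nodes
rank = dist.get_rank()

# each rank contributes its own tensor...
t = torch.tensor([float(rank + 1)])

# ...and all_reduce leaves the elementwise sum on every rank
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}: {t.item()}")

dist.destroy_process_group()
```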
HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7
i'm ready for the singularity-of-good-software. rate-of-innovation outpacing rate-of-communication. there are probably another 5-10 latent tools right now that are solving your problems and you don't even know about them yet.
several other attempts at solving the Python Dependency Problem. one is so much better than the rest that it's not even a competition. we are talking about a hydrogen bomb vs a coughing baby.
entering that stage of life where you're linearly extrapolating losses on a google spreadsheet
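(the spreadsheet, minus the spreadsheet: the same linear fit with numpy.polyfit. the loss values here are made up, and loss curves aren't actually linear in step, which is the joke:)

```python
import numpy as np

# hypothetical logged (step, loss) pairs from a run in progress
steps = np.array([1000, 2000, 3000, 4000])
losses = np.array([3.10, 2.85, 2.70, 2.61])

# fit loss ~ a*step + b and extrapolate, spreadsheet-style
a, b = np.polyfit(steps, losses, deg=1)
print(f"predicted loss at step 10000: {a * 10000 + b:.2f}")
```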
please remember to select “b200” on the menu
buy compute on @PrimeIntellect and data at @datologyai