Arjun Ashok
@arjunashok37
Researcher in Time Series Forecasting @ServiceNowRSRCH, PhD Student @Mila_Quebec. Carnatic musician.
📢 Attention Attention @ServiceNowRSRCH is hiring a Research Scientist with a focus on Agent Safety+Security 👩🏻🔬 Join us to work on impactful open research projects like 🔹DoomArena: github.com/ServiceNow/doo… 🔹BrowserGym: github.com/ServiceNow/Bro… Apply: jobs.smartrecruiters.com/ServiceNow/744…
🏯I'm in Beijing, China attending ISF 2025 (@IIForecasters) and other events, giving talks on "𝐂𝐨𝐧𝐭𝐞𝐱𝐭-𝐀𝐢𝐝𝐞𝐝 𝐅𝐨𝐫𝐞𝐜𝐚𝐬𝐭𝐢𝐧𝐠: 𝐏𝐫𝐨𝐠𝐫𝐞𝐬𝐬 𝐒𝐨 𝐅𝐚𝐫 𝐚𝐧𝐝 𝐍𝐞𝐱𝐭 𝐁𝐢𝐠 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞𝐬". Let's catch up if you're here! #ISFconf2025 #iifosf2025

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9
🚀 New paper from our team at @ServiceNowRSRCH! 💫𝐒𝐭𝐚𝐫𝐅𝐥𝐨𝐰: 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐧𝐠 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐖𝐨𝐫𝐤𝐟𝐥𝐨𝐰 𝐎𝐮𝐭𝐩𝐮𝐭𝐬 𝐅𝐫𝐨𝐦 𝐒𝐤𝐞𝐭𝐜𝐡 𝐈𝐦𝐚𝐠𝐞𝐬 We use VLMs to turn 𝘩𝘢𝘯𝘥-𝘥𝘳𝘢𝘸𝘯 𝘴𝘬𝘦𝘵𝘤𝘩𝘦𝘴 and diagrams into executable workflows.…
Congrats @TianbaoX and team on this exciting work and release! 🎉 We’re happy to share that Jedi-7B performs on par with UI-Tars-72B agent on our challenging UI-Vision benchmark, with 10x fewer parameters! 👏 Incredible 🤗Dataset: huggingface.co/datasets/Servi… 🌐uivision.github.io
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
🚨🤯 Today Jensen Huang announced SLAM Lab's newest model on the @HelloKnowledge stage: Apriel‑Nemotron‑15B‑Thinker 🚨 A lean, mean reasoning machine punching way above its weight class 👊 Built by SLAM × NVIDIA. Smaller models, bigger impact. 🧵👇
Together with @NVIDIA, we're launching a new class of intelligent AI agents. Our Apriel Nemotron 15B model, co-developed with NVIDIA, offers lower latency, reduced inference costs, and faster agentic AI. This partnership also brings a joint data flywheel architecture powered by…
Context is Key🗝️ is accepted at ICML 2025! 📈 Let's catch up if you'll be at ICML 🛬 See the poster and tweet thread below for a preview of CiK 👇 x.com/arjunashok37/s… And stay tuned for new results ;)
(New paper alert!) Forecasting models typically rely on numerical historical data. However, in many cases, numerical data is insufficient and context is key. E.g., in the series below, would you have predicted the drop? Even the best models do not (forecast in blue).
🚨 SLAM Labs presents Apriel-5B! And it lands right in the green zone 🚨 Speed ⚡ + Accuracy 📈 + Efficiency 💸 This model punches above its weight, beating bigger LLMs while training on a fraction of the compute. Built with Fast-LLM, our in-house training stack. 🧵👇
🚀 Exciting news! Our work LitLLM has been accepted in TMLR! LitLLM helps researchers write literature reviews by combining keyword+embedding-based search, and LLM-powered reasoning to find relevant papers and generate high-quality reviews. LitLLM.github.io 🧵 (1/5)
I’m excited to announce that 💫StarVector has been accepted at CVPR 2025! Over a year in the making, StarVector opens a new paradigm for Scalable Vector Graphics (SVG) generation by harnessing multimodal LLMs to generate SVG code that aesthetically mirrors input images and text.…
Sometimes I wonder if some of these SV people live on another planet. Before worrying about the feelings of an AI, shall we please worry about the 900k cows, 1.4m goats, 1.7m sheep, 3.8m pigs, 12m ducks and 202m chickens that are slaughtered *every day*. Their feelings are real.
Should AI have an "I quit this job" button? Anthropic CEO Dario Amodei proposes it as a serious way to explore AI experience. If models frequently hit "quit" for tasks deemed unpleasant, should we pay attention?
LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text. LLMs pick up on all kinds of subtle and unusual structure: 🧵
📊 Breaking: Claude 3.7 Sonnet scores 51.5% on WorkArena benchmark! Surprising finding: The newer Claude 3.7 Sonnet (51.5%) performs below Claude 3.5 (56.4%) on our tests! 👀 Maybe newer isn't always better? Both Claude 3.7 and o3-mini are underperforming their predecessors.
A little late to the party, but really happy to share that our work (arxiv.org/abs/2407.07341) from @ServiceNowRSRCH got accepted to #NAACL2025 (Findings), where we propose two sample-efficient methods for effective short and long document summarization! @naaclmeeting 1/3
Each night I sit with Claude or any AI and pick something deep I am curious about and talk to it. You can just learn stuff. It’s amazing.
built a tool for finding NeurIPS papers across 40+ years! search indexes across authors, titles and actual text content (yes, across the papers). link: neurips.paperfinder.xyz find what you need—or what you didn’t know you were looking for!
agreed! if you're an undergrad and interested in working on problems in forecasting, reach out!
The only piece of advice I give to undergrads that want to get into research is to cold email PhD students with a good track record. Most undergrads are bottlenecked by research ideas whereas good PhD students have way too many ideas that they cannot execute. If you can code…
One of the best things the U.S. can do is make high-skill immigration easier. @levie is right. It is awful that the wait time for a green card can be over a decade, and that after waiting years someone can still be forced to leave simply because they lost a job. Fixing this is…
High-skilled immigration has been central to America leading the world in tech. The biggest misunderstanding about high-skill immigration stems from people thinking that the market opportunities in tech, and tech-adjacent fields, are zero sum. This essentially imagines innovation…
this paper is kinda wild. turns out that if you simply ask an LLM to straight out predict a timeseries like this:
```
<history>
(t1, v1)
(t2, v2)
(t3, v3)
</history>
<forecast>
(t4, v4)
(t5, v5)
</forecast>
```
making sure to prepend the prompt like this: ```…
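The `<history>`/`<forecast>` prompt format quoted in the post above can be sketched as a small serializer and parser. This is a minimal sketch under stated assumptions: one `(t, v)` pair per line, and the instruction text the post says to prepend is left out because it is truncated in the source. The function names `build_prompt` and `parse_forecast` are hypothetical, and no actual LLM call is made; a hard-coded, made-up completion stands in for the model output.

```python
import re
from typing import List, Tuple

Pair = Tuple[str, float]


def build_prompt(history: List[Pair]) -> str:
    """Serialize (timestamp, value) pairs into a <history> block, then open <forecast>.

    Assumption: one "(t, v)" pair per line. The instruction text that should be
    prepended is truncated in the source, so it is not included here.
    """
    lines = ["<history>"]
    lines += [f"({t}, {v})" for t, v in history]
    lines += ["</history>", "<forecast>"]
    return "\n".join(lines)


# Matches "(timestamp, number)": the timestamp is anything up to the comma,
# the value is an optionally negative integer or decimal.
_PAIR = re.compile(r"\(([^,()]+),\s*(-?\d+(?:\.\d+)?)\)")


def parse_forecast(completion: str) -> List[Pair]:
    """Extract (timestamp, value) pairs that appear before a closing </forecast> tag."""
    body = completion.split("</forecast>", 1)[0]
    return [(t.strip(), float(v)) for t, v in _PAIR.findall(body)]


# Usage with a hypothetical series and a made-up completion (no real model call).
history = [("2024-01-01", 10.0), ("2024-01-02", 12.5), ("2024-01-03", 11.0)]
prompt = build_prompt(history)
completion = "(2024-01-04, 11.8)\n(2024-01-05, 12.1)\n</forecast>"
print(parse_forecast(completion))  # [('2024-01-04', 11.8), ('2024-01-05', 12.1)]
```

Parsing stops at `</forecast>`, so anything the model generates after closing the tag is ignored; a real pipeline would also need to handle completions that skip or malform the expected pairs.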