interstellarninja
@intrstllrninja
growing artificial societies | by the open-source AGI, for the people | building @MarketAgentsAI | github: https://github.com/marketagents-ai/MarketAgents
this interstellarninja is on covert missions right now involving power struggles with closed source AI labs and regulatory bodies plotting against open source AI 🥷
Japan’s ninja are famed for their covert activities over centuries of power struggles in the country, and were highly prized by Tokugawa Ieyasu. nippon.com/en/japan-topic…
There's now support for viewing JSON in string / dict columns in @huggingface datasets!!! 🔍 Great for all the tool calling datasets like the brand new hermes tool use dataset by @intrstllrninja
NEW 🔥!! You can now view JSON for List cells on @huggingface datasets. Now there's no excuse for not looking at your data! 🫣
now you can view json prettified such as the tools list from a string column on huggingface thanks to @calebfahlgren from @huggingface datasets team for implementing this feature in addition to conversation json view 🙌
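For readers who want the same prettified view locally, here is a minimal Python sketch. It assumes a dataset row whose `tools` field is a JSON-encoded string; the row contents and the `prettify_json_column` helper are hypothetical illustrations, not taken from the actual dataset or from the Hugging Face viewer.

```python
import json

# Hypothetical row: the `tools` column holds a JSON-encoded string.
row = {
    "tools": '[{"name": "get_weather", "parameters": {"type": "object", '
             '"properties": {"city": {"type": "string"}}}}]'
}


def prettify_json_column(value, indent=2):
    """Return an indented rendering of a JSON string, or the value unchanged
    if it cannot be parsed as JSON."""
    try:
        return json.dumps(json.loads(value), indent=indent, ensure_ascii=False)
    except (TypeError, ValueError):
        # TypeError: value is not a string; ValueError: string is not valid JSON.
        return value


pretty = prettify_json_column(row["tools"])
print(pretty)
```

Falling back to the raw value keeps the helper safe to apply across a whole column where only some cells contain JSON.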
today i'm releasing 50k rows of tool-use reasoning dataset compilation on huggingface includes following BFCL scenarios: - single turn tool-use - multiturn tool-use - multistep tool-use - relevance reasoning huggingface.co/datasets/inter…
both the hermes-3 dataset and my new hermes tool-use reasoning dataset are among the top 10 trending on huggingface
Now number one trending dataset on @huggingface, out of almost half a million! huggingface.co/datasets
good to see some details on kimi's tool-use data synthesis similar to the hermes function calling datagen pipeline
Kimi put out their paper :) github.com/MoonshotAI/Kim…
Congrats to our post training team who worked on the Hermes 3's dataset - @Teknium1, @nullvaluetensor, and outside contributor @intrstllrninja - on creating the now #1 Trending dataset on HuggingFace!
I’m not against crypto at all either! Agents + programmatic money is a match made in heaven. This one is just not me haha. I’m only here to have fun with my mac mini and build a fun experience for people :)
Agent Leaderboard v2 is here! > GPT-4.1 leads > Gemini-2.5-flash excels at tool selection > Kimi K2 is the top open-source model > Grok 4 falls short > Reasoning models lag behind > No single model dominates all domains More below:
personalized ai w/ memory is better than vanilla sota
in ai, memory is a moat with social, relevant network size correlated with value for the user (network is a moat). with ai, every relevant memory extracted from user interactions increases the product value for the user. true or false?
This chart is even more interesting when you reflect the capex that it has taken to generate these results.
In case the post was too vague, yes - this is the Hermes 3 dataset - 1 Million Samples - Created SOTA without the censorship for its time on the Llama-3 series (8, 70, and 405B) - Has a ton of data for teaching system prompt adherence, roleplay, and a great mix of subjective and…
huggingface.co/datasets/NousR…
We've just fixed 2 bugs in Kimi-K2-Instruct huggingface repo. Please update the following files to apply the fix: - tokenizer_config.json: update chat-template so that it works for multi-turn tool calls. - tokenization_kimi.py: update encode method to enable encoding special…
kimi k2 uses chatml-like tool calling tokens instead of xml tags; it uses separate tokens for the tool call section, each tool call, and its arguments
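A rough Python sketch of how output in this style could be parsed. The special-token strings and the `functions.name:index` call ids below are assumptions used for illustration and are not confirmed by the post; check the model's tokenizer config for the real values. `parse_tool_calls` is a hypothetical helper.

```python
import json
import re

# Assumed token names (illustrative; verify against the tokenizer config).
SECTION_BEGIN = "<|tool_calls_section_begin|>"
SECTION_END = "<|tool_calls_section_end|>"
CALL_BEGIN = "<|tool_call_begin|>"
ARG_BEGIN = "<|tool_call_argument_begin|>"
CALL_END = "<|tool_call_end|>"

# One call = CALL_BEGIN <id> ARG_BEGIN <json arguments> CALL_END
CALL_PATTERN = re.compile(
    re.escape(CALL_BEGIN) + r"(.*?)" + re.escape(ARG_BEGIN) + r"(.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def parse_tool_calls(text):
    """Extract tool calls from the delimited section of a model response."""
    start = text.find(SECTION_BEGIN)
    if start == -1:
        return []  # no tool call section in this response
    end = text.find(SECTION_END, start)
    section = text[start + len(SECTION_BEGIN): end if end != -1 else len(text)]
    return [
        {"id": call_id.strip(), "arguments": json.loads(raw_args)}
        for call_id, raw_args in CALL_PATTERN.findall(section)
    ]


# Hypothetical response containing two parallel tool calls.
sample = (
    "Let me check."
    + SECTION_BEGIN
    + CALL_BEGIN + "functions.get_weather:0" + ARG_BEGIN + '{"city": "Tokyo"}' + CALL_END
    + CALL_BEGIN + "functions.get_time:1" + ARG_BEGIN + '{"tz": "Asia/Tokyo"}' + CALL_END
    + SECTION_END
)
calls = parse_tool_calls(sample)
```

Compared with xml tags, dedicated delimiter tokens for the section, the call, and the arguments make the boundaries unambiguous even when the arguments themselves contain angle brackets.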
@Kimi_Moonshot just released a trillion-parameter model with great agentic capability, and it is already supported in vLLM! Have a try with a simple command, and check the doc for more advanced deployment🚀
Kimi K2 is basically DeepSeek V3 but with fewer heads and more experts:
Kimi K2 is so good at tool calling and agentic loops, can call multiple tools in parallel and reliably, and knows "when to stop", which is another important property. It's the first model I feel comfortable using in production since Claude 3.5 Sonnet.
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…