Pratty
@pratty_agi
Human Agent @AgentOpsAI, pragmatist 🧮 core mafia @modelsmafiabs 🧪 agi solver @agihouseindia 🇮🇳
Here are some of the experiments and observations I made as one of the initial testers of the locksmith game within ARC-AGI-3 (my template is available in the repository) 🧵
Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI. We're releasing: * 3 games (environments) * $10K agent contest * AI agents API. Starting scores - Frontier AI: 0%, Humans: 100%
@pratty_agi claiming numerous appearances in the agent thread this week. get 'em boi!
This has been one of the biggest weeks in AI Agents. Here is everything that happened this week in AI Agents from Amazon, LlamaIndex, AgentOps, CrewAI, Certus, n8n, Composio, Perplexity Comet, Apify, Arc Prize, Anthropic, & more. 🧵 (save for later)
7/ @kortixai Suna integration with @AgentOpsAI releasing soon. Just add your AgentOps API key and you are set! @pratty_agi x.com/pratty_agi/sta…
5/ Performance improvement noted when using @AgentOpsAI MCP in @arcprize ARC-AGI-3 General Reasoning agent. @pratty_agi x.com/pratty_agi/sta…
Performance improvement noted when using @AgentOpsAI MCP in @arcprize ARC-AGI-3 General Reasoning agent. When the agent is able to see its past actions and reasoning, it's able to perform better since it avoids repeating the same set of events. Prompt engineering must be good to…
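A rough sketch of the idea in the post above: an agent that can see what it already tried in a given state can skip repeating it. This is not the AgentOps MCP or the ARC-AGI-3 agent API; `ActionMemory`, `choose_action`, the state label, and the action names are all hypothetical and only illustrate the behavior.

```python
import random
from collections import defaultdict


class ActionMemory:
    """Hypothetical memory: tracks which actions were tried in each state."""

    def __init__(self):
        self._tried = defaultdict(set)

    def record(self, state, action):
        # Remember that this action was already taken in this state.
        self._tried[state].add(action)

    def untried(self, state, actions):
        # Actions not yet attempted in this state, in their original order.
        return [a for a in actions if a not in self._tried[state]]


def choose_action(memory, state, actions, rng=random):
    """Prefer actions not yet tried in this state; otherwise pick any action."""
    candidates = memory.untried(state, actions) or list(actions)
    action = rng.choice(candidates)
    memory.record(state, action)
    return action


if __name__ == "__main__":
    memory = ActionMemory()
    actions = ["up", "down", "left", "right"]  # assumed action set
    # Four picks in the same state never repeat an action.
    print([choose_action(memory, "start", actions) for _ in range(4)])
```

In a real agent the memory would more likely be fed back into the prompt as a log of past actions and reasoning; the filter here just makes the "avoid repeating yourself" effect explicit.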
New ARC Prize 2025 High Score 19.0% by Giotto.ai (@podesta_aldo)
Huge W for solving a very intricate problem in LLMs. The LLM having a memory is very useful for context engineering since it provides the option to let the LLM reflect on its past behavior and do better in the next actions. Will be very useful when integrated with @cline or…
Introducing the all new supermemory. we killed the AI memory lock-in. A universal memory for all the apps you use: - MCP connect to Claude / Cursor / ChatGPT - Graph view of your life - Projects to organize your life. free to use. $9/m for first 100 pro subs. demo below
Loved talking to @rohukrs
@Abhindas1 is redefining intelligent automation with RoboGPT, Watch his journey from Delhi to a YC-backed robotics company @OrangewoodLabs
We are @_TheResidency - Bangalore. Home to India's most ambitious. Applications now open for the Sept'25 cohort. Link in comments
Two minor releases bring Roo Code Cloud in-app waitlist signup, critical bug fixes, and improved command handling! v3.23.19: docs.roocode.com/update-notes/v… v3.23.18: docs.roocode.com/update-notes/v…
Open source
updated data for the cline weekly diff edit success rate metrics now that qwen3 coder has been out for a bit. notice anything different?
EB2 NIW approvals are coming. Looking to file EB2 NIW? Reach out to @opensphereai
I was in contact with the Qwen team trying to reproduce their 41% results on ARC-AGI-1 but ultimately couldn't. They open sourced their method and code if anyone wants to check it out and confirm. We tested their model exactly the same as we test all other models (o3-high, grok…
Qwen3-235b-a22b Instruct-2507 ARC-AGI Semi Private Eval * ARC-AGI-1: 11%, $0.003/task * ARC-AGI-2: 1.3%, $0.004/task
Structured logs, free-form aggregations to build log-metric-based dashboards and alerts + live tailing on the way. If you're not using Logs in @getsentry for a reason, we'd love to know why
We love logs, you love logs 🪵 Our latest logs update allows you to define alerts and build dashboards based on log queries - now available to all beta users
Here are 20+ examples of why Claude Code Is the New Secret Weapon for developers and non-technical users. (save this + share with your team)
300+ engineers came to SF to test out the latest advancements in AI and multi-agent frameworks. The grand prize? A robot dog. Here are the demos from the @weights_biases WeaveHacks AI hackathon (🧵):
good news for @sciraai users from india! you can now purchase the scira pro plan via upi 🇮🇳 ₹1299 + 18% gst for 1 month access 🇮🇳/acc 🫡 scira.ai/pricing
Thanks @kakashiii111 for your rigorous investigation
Official export data from Mainland China and Taiwan: These are the numbers I gathered in Excel.