Toshii
@0xToshii
unpaid intern, eng arc
released the (n)th challenge (20+) for my smart contract CTF: Mr Steal Yo Crypto. all challenges are based on real-world exploits and are well suited for devs exploring #solidity/#security. primer & hints: degenjungle.substack.com/p/mr-steal-yo-… challenges: mrstealyocrypto.xyz
checked the diff-apply methods used per model for a subset of our internal harder diff-apply data (non-overlapping examples). my guess before checking was that block anchor search was used way more often, but evidently that was wrong
checked the prevalence of diff-apply methods per model for the "hard" benchmark. kinda interesting, although it's a pretty small sample. diff-apply rails are important fr
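(for context, a minimal sketch of what "diff-apply methods" can look like: a pipeline that tries progressively looser strategies to locate a SEARCH block in a file. exact match first, then whitespace-insensitive line matching, then block anchor search. function names and details here are illustrative assumptions, not cline's actual code.)

```typescript
// sketch: three matching strategies a diff-apply pipeline might try in order.
// names and details are illustrative assumptions, not cline's implementation.

// 1) exact match: the SEARCH text appears verbatim in the file.
function exactMatch(file: string, search: string): number {
  return file.indexOf(search);
}

// 2) line-trimmed match: compare line by line, ignoring leading/trailing
//    whitespace, so indentation drift alone doesn't fail the apply.
function lineTrimmedMatch(fileLines: string[], searchLines: string[]): number {
  for (let i = 0; i + searchLines.length <= fileLines.length; i++) {
    if (searchLines.every((s, j) => fileLines[i + j].trim() === s.trim())) {
      return i;
    }
  }
  return -1;
}

// 3) block anchor search: only require the first and last lines of the block
//    to match (as anchors), tolerating drift in the middle lines.
function blockAnchorMatch(fileLines: string[], searchLines: string[]): number {
  if (searchLines.length < 3) return -1; // anchors need a middle to bracket
  const first = searchLines[0].trim();
  const last = searchLines[searchLines.length - 1].trim();
  for (let i = 0; i + searchLines.length <= fileLines.length; i++) {
    if (
      fileLines[i].trim() === first &&
      fileLines[i + searchLines.length - 1].trim() === last
    ) {
      return i;
    }
  }
  return -1;
}
```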
updated data for the cline weekly diff edit success rate metrics now that qwen3 coder has been out for a bit. notice anything different?
cline diff edit success rate per llm over the last week. an interesting follow-up question is how different models might be used more or less often on tasks of varying difficulty, and if/how that skews this data
been hearing things
Less than two weeks after Kimi K2's release, @Alibaba_Qwen's new Qwen3-Coder surpasses it with half the size and double the context window. Despite closed source's significant initial lead, open source models are catching up and seem to be reaching escape velocity.
v excited to see where we go from here with os models
I'd like to point out that for real-world tasks (not benchmarks), Kimi K2 outperforms Gemini. This is telemetry across all @cline users, showing diff edit failure rate. Notice how Kimi has about a 6% failure rate, which is significantly better than Gemini's ~10% error…
end of any possible US DeepSeek/Kimi moment? Not sure who else has the appetite and scale. Llama2 was the first ~os model series i was genuinely surprised & impressed by, and Llama3 was a banger. Honestly felt the Meta LLMs could be the standard

alright, Kimi K2 might be cooking. ran this through our small Cline diff edit eval set. Again, this just measures success rate for valid diff edit attempts. The percentage of valid / total attempts: Kimi K2 (25%), DeepSeek V3 (35.8%), Sonnet 4 (45.3%)
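(to keep the two numbers straight, a hedged sketch of how they'd be computed; the Attempt shape and field names are assumptions for illustration:)

```typescript
// "success rate" is computed only over valid diff edit attempts;
// "valid rate" is valid attempts / total attempts (the 25% / 35.8% / 45.3%).
type Attempt = { valid: boolean; succeeded: boolean };

function diffEditMetrics(attempts: Attempt[]) {
  const valid = attempts.filter((a) => a.valid);
  return {
    validRate: valid.length / attempts.length,
    successRate:
      valid.length === 0
        ? 0
        : valid.filter((a) => a.succeeded).length / valid.length,
  };
}
```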

hunting for bugs off-platform was pretty fun, but the rug rate was too high to sustain motivation
this cursor stuff just reminded me that i submitted a high-sev bug to zharta (steal all yield from vault) a year ago and they still haven't paid me. they had offered some amount of a token which still doesn't exist lmao
quick thread on how the @cline diff-apply algo works. kinda interesting logic, since it supports streamed inputs & has rails for boosting valid applies when the llm output is malformed vv
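(a hedged sketch of one such rail: a streamed SEARCH/REPLACE parser that salvages a trailing block when the closing marker never arrives, instead of failing the whole apply. the marker format, names, and salvage behavior are assumptions for illustration, not the actual cline implementation.)

```typescript
// parse SEARCH/REPLACE blocks out of streamed llm output. call parseEdits on
// the accumulated text as chunks arrive, with streamDone=true once the
// stream closes. markers and salvage logic are illustrative assumptions.
type Edit = { search: string; replace: string };

const SEARCH = "<<<<<<< SEARCH";
const DIVIDER = "=======";
const REPLACE = ">>>>>>> REPLACE";

function parseEdits(raw: string, streamDone: boolean): Edit[] {
  const edits: Edit[] = [];
  let state: "idle" | "search" | "replace" = "idle";
  let search: string[] = [];
  let replace: string[] = [];

  for (const line of raw.split("\n")) {
    if (line.startsWith(SEARCH)) {
      state = "search";
      search = [];
      replace = [];
    } else if (state === "search" && line.startsWith(DIVIDER)) {
      state = "replace";
    } else if (state === "replace" && line.startsWith(REPLACE)) {
      edits.push({ search: search.join("\n"), replace: replace.join("\n") });
      state = "idle";
    } else if (state === "search") {
      search.push(line);
    } else if (state === "replace") {
      replace.push(line);
    }
  }

  // rail: the stream ended mid-REPLACE (malformed output); salvage the block
  // rather than dropping an otherwise usable edit.
  if (streamDone && state === "replace") {
    edits.push({ search: search.join("\n"), replace: replace.join("\n") });
  }
  return edits;
}
```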