Kazuki Fujii
@okoge_kaz
Tokyo Tech CS Master (Rio Yokota Lab → Jun Sakuma Lab) Distributed Training, Systems for Machine Learning, Low Precision Training
Thrilled to see our SwallowProject paper cited in KIMI K2's Technical Report (2.2 Pre-training Data)! 🙏 Thank you for recognizing our work! @Kimi_Moonshot

🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
🚀 Introducing Qwen3-MT – our most powerful translation model yet! Trained on trillions of multilingual tokens, it supports 92+ languages—covering 95%+ of the world’s population. 🌍✨ 🔑 Why Qwen3-MT? ✅ Top-tier translation quality ✅ Customizable: terminology control, domain…
The strongest!!!!!!!! SoftBank has built the world's largest AI computing infrastructure as an "NVIDIA DGX SuperPOD" equipped with NVIDIA Blackwell GPUs — deployment of over 4,000 NVIDIA Blackwell GPUs is complete | Corporate & IR | SoftBank share.google/xfkRuQnEKKYojM…
Nvidia presents ThinkAct Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
After three intense months of hard work with the team, we made it! We hope this release can help drive the progress of Coding Agents. Looking forward to seeing Qwen3-Coder continue creating new possibilities across the digital world!
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
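The "256K native, 1M with extrapolation" claim relies on some form of long-context extension. As a toy illustration (my sketch — the tweet does not specify Qwen's actual method), position interpolation stretches a RoPE-based model's window by rescaling the rotary frequencies:

```python
def rope_inv_freq(dim, base=10000.0, scale=1.0):
    """Inverse frequencies for rotary position embeddings (RoPE).

    Dividing the frequencies by `scale` is equivalent to position
    interpolation: positions are squeezed so a model trained on one
    window length can address a window `scale` times longer.
    """
    return [1.0 / (base ** (2 * i / dim)) / scale for i in range(dim // 2)]

# Stretching a 256K-token window to 1M tokens needs scale = 4.0.
native, target = 256 * 1024, 1024 * 1024
scale = target / native  # 4.0
freqs = rope_inv_freq(64, scale=scale)
```

This is only the frequency bookkeeping; production implementations (e.g. YaRN-style scaling) also adjust per-frequency-band interpolation and attention temperature.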
This is not a small one! The team spent a lot of time building Qwen3-Coder after Qwen2.5-Coder. It is much bigger, but based on MoE, and way stronger and smarter than before! Not sure we can say it's competitive with Claude Sonnet 4, but it should for sure be a really good coding agent.…
In the Swallow Project, we are working on building math and code datasets of even higher quality than SwallowCode and SwallowMath. Beyond strengthening Japanese-language ability, we will keep researching methods to make open models even stronger at math and code!
Diffusion Beats Autoregressive in Data-Constrained Settings Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens. Key findings: 1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide…
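One intuition behind the data-constrained result above: under data repetition, masked-diffusion training re-masks each sequence every epoch, while autoregressive training repeats identical next-token targets. A toy illustration (my sketch, not the paper's code):

```python
import random

def ar_views(tokens, epochs):
    # An autoregressive pass over a sequence always yields the same
    # next-token targets, so repeated epochs add no new training views.
    return {tuple(tokens) for _ in range(epochs)}

def diffusion_views(tokens, epochs, mask_rate=0.5, seed=0):
    # Masked-diffusion training samples a fresh mask each epoch,
    # so the same data keeps producing new prediction problems.
    rng = random.Random(seed)
    return {
        tuple("<mask>" if rng.random() < mask_rate else t for t in tokens)
        for _ in range(epochs)
    }
```

The paper's actual finding concerns compute/loss scaling; this only illustrates why repeated data can stay informative for a diffusion objective when it no longer does for an autoregressive one.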
We were selected for the third cycle of GENIAC. This time, our goal is to build a lightweight, high-accuracy VLM for autonomously operating devices (surveillance cameras, robots, drones, and so on). In parallel with this third GENIAC cycle, PFN will continue developing new LLMs, improving existing models, and building specialized models.
PFN's "Development of a high-accuracy, lightweight VLM for autonomously operating devices" has been selected for GENIAC, a project by METI and NEDO to strengthen generative AI development capabilities. VLM: vision-language model — an AI model that handles both visual and textual information. nedo.go.jp/koubo/CD3_1003…
Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…
Kimi K2 tech report just dropped! Quick hits: - MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale - 20K+ tools, real & simulated: unlocking scalable agentic data - Joint RL with verifiable + self-critique rubric rewards: alignment that adapts -…
A paper we recently published from the Swallow Project was mentioned in the technical report of the much-discussed KIMI K2! As an open model development project, the Swallow Project aims for both research novelty and models people can actually use, and we will keep pushing our R&D forward! (The next model is also in development.)
TikTok Researchers Introduce SWE-Perf: The First Benchmark for Repository-Level Code Performance Optimization SWE-Perf, introduced by TikTok researchers, is the first benchmark designed to evaluate large language models (LLMs) on repository-level code performance optimization.…
Note that this is a non-thinking model. Thinking model on the way!
A small update on Qwen3-235B-A22B, but a big improvement on its quality! We thought about this decision for a long time, but we believe that providing better-quality performance is more important than the unification at this moment. We are still continuing our research on hybrid…
We've just released 100+ intermediate checkpoints and our training logs from the SmolLM3-3B training run. We hope these can be useful to researchers working on mech interp, training dynamics, RL, and other topics :) Training logs: -> Usual training loss (the gaps in the loss are due…
CUTLASS 4.1 is now available, which adds support for ARM systems (GB200) and block scaled MMAs
🚨🔥 CUTLASS 4.0 is released 🔥🚨 pip install nvidia-cutlass-dsl 4.0 marks a major shift for CUTLASS: towards native GPU programming in Python docs.nvidia.com/cutlass/media/…