Bin Lin
@LinBin46984
Peking University
🚀UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms on image editing and excels in understanding & generation. 🌟Now the data, model, and training & evaluation scripts are open-source! github.com/PKU-YuanGroup/…
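For intuition only, here is a minimal sketch of the "semantic features instead of VAE latents" idea: encode the reference image with SigLIP and hand its patch features, after a learned projection, to the generator as conditioning. The checkpoint name, projection width, and file path below are assumptions for illustration, not UniWorld's actual pipeline.

```python
# Sketch only: encode an image with SigLIP and project its patch features
# into a conditioning sequence for a downstream editing/generation model.
# Checkpoint name, image path, and the 2048-dim projection are illustrative.
import torch
from PIL import Image
from transformers import SiglipVisionModel, SiglipImageProcessor

vision = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")

image = Image.open("reference.png").convert("RGB")
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    feats = vision(pixels).last_hidden_state      # (1, num_patches, hidden)

# Project semantic features into the generator's (hypothetical) conditioning width.
proj = torch.nn.Linear(feats.shape[-1], 2048)
condition_tokens = proj(feats)                    # fed to the generator instead of VAE latents
```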
🚀 SwapAnyone: End-to-end, seamless body swapping: no more lighting glitches or unnatural blends! 🥇 EnvHarmony for smooth fusion 🥈 HumanAction-32K for diverse training 🥉 SOTA performance against open & closed models Page: pku-yuangroup.github.io/SwapAnyone/ GitHub: github.com/PKU-YuanGroup/…
📊Benchmarking: Evaluated 16 S2V models to reveal strengths and weaknesses in complex scenes. 🎥OpenS2V-5M: 5.4M 720p image-text-video triplets via cross-video linking & multi-view synthesis. 🚀Code & data are open-source. github.com/PKU-YuanGroup/…
🚨 Hot Take: GPT-4o might NOT be a purely autoregressive model! 🚨 There’s a high chance it has a diffusion head. 🤯 If true, this could be a game-changer for AI architecture. What do you think? 🤔👇 arxiv.org/pdf/2504.02782
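To make the speculation concrete, here is a purely hypothetical sketch of what "AR backbone + diffusion head" could look like: the transformer keeps predicting text tokens autoregressively, while a small diffusion head denoises continuous image latents conditioned on its hidden states. Every module size and the toy noise schedule below are made up for illustration; this is not a claim about GPT-4o's internals.

```python
# Speculative sketch of an autoregressive transformer with a diffusion head
# for continuous image latents; all sizes and the schedule are toy values.
import torch
import torch.nn as nn

class DiffusionHead(nn.Module):
    """Predicts the noise added to a continuous image latent, conditioned on
    the AR transformer's hidden state and the diffusion timestep."""
    def __init__(self, hidden=1024, latent=16):
        super().__init__()
        self.time_embed = nn.Embedding(1000, hidden)
        self.net = nn.Sequential(
            nn.Linear(hidden + latent, hidden), nn.SiLU(),
            nn.Linear(hidden, latent),
        )

    def forward(self, h, noisy_latent, t):
        cond = h + self.time_embed(t)                          # (B, hidden)
        return self.net(torch.cat([cond, noisy_latent], -1))   # predicted noise

# Training step (sketch): diffuse a ground-truth latent, ask the head to
# recover the noise given the AR hidden state at that position.
B, hidden, latent = 4, 1024, 16
h = torch.randn(B, hidden)              # hidden state from the AR backbone
x0 = torch.randn(B, latent)             # target continuous image latent
t = torch.randint(0, 1000, (B,))
alpha = torch.cos(t.float() / 1000 * torch.pi / 2).unsqueeze(-1)  # toy schedule
noise = torch.randn_like(x0)
xt = alpha * x0 + (1 - alpha ** 2).sqrt() * noise

head = DiffusionHead(hidden, latent)
loss = nn.functional.mse_loss(head(h, xt, t), noise)
loss.backward()
```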

👉👉👉A novel perspective uses the Monte Carlo Language Tree to analyze LLMs, revealing that a trained LLM closely approximates the Data-Tree built from its training data. This suggests LLM reasoning is probabilistic pattern-matching, explaining phenomena like hallucinations, CoT, and token bias.
💡Excited to share our latest research on the explainability of GPT! 🔎 We take a novel perspective, flattening the language dataset and GPT models into Monte Carlo Language Trees, and show their significant similarity. 📰 arxiv.org/pdf/2501.07641 📎 github.com/PKU-YuanGroup/…
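A toy sketch of the Data-Tree idea as I read it from the abstract (not the paper's released code): flatten a corpus into a prefix tree whose edges carry continuation counts, then read next-token probabilities off those counts; the claim is that an LLM's predictive distribution at the same prefix approximates this. The tiny whitespace-tokenized corpus below is a placeholder assumption, far from the paper's scale.

```python
# Toy sketch: build a "Data-Tree" (prefix tree with continuation counts) from
# a corpus and read off empirical next-token probabilities.
from collections import defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]

tree = defaultdict(lambda: defaultdict(int))   # prefix (tuple) -> next token -> count
for sentence in corpus:
    tokens = sentence.split()
    for i in range(len(tokens)):
        tree[tuple(tokens[:i])][tokens[i]] += 1

def next_token_probs(prefix):
    """Empirical next-token distribution stored at this node of the Data-Tree."""
    counts = tree[tuple(prefix)]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

# The claim: an LLM's predictive distribution at the same prefix approximates this.
print(next_token_probs(["the", "cat", "sat", "on", "the"]))
# -> {'mat': 0.5, 'sofa': 0.5}
```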
Excited to share that our latest Open-Sora Plan research report is being featured on the arXiv discussion forum @askalphaxiv @AkshatS07 and I will be on alphaXiv to answer any questions you have on the paper. alphaxiv.org/abs/2412.00131…
The Open-Sora Plan team has released its arXiv papers, covering the WF-VAE model, the diffusion model, training stability, data, prompt enhancement, I2V, and ControlNet. Open-Sora Plan: arxiv.org/abs/2412.00131 WF-VAE: arxiv.org/abs/2411.17459 Feel free to discuss, share and cite.
🚀 Introducing LLaVA-o1: The first visual language model capable of spontaneous, systematic reasoning, similar to GPT-o1! 🔍 🎯Our 11B model outperforms Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct! 🔑The key is training on structured data and a novel inference…
Wow! This could very well be the next generation of the large model paradigm!🙌
🎉🎉🎉Thrilled to release MoH, which treats attention heads as experts in the MoE mechanism. MoH-LLaMA3-8B outperforms LLaMA3-8B by 2.4% while utilizing only 75% of the heads! 📑arXiv: arxiv.org/pdf/2410.11842 💻github: github.com/SkyworkAI/MoH 🤗huggingface: huggingface.co/collections/Ch…
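For readers who want the gist of head-as-expert routing, here is a minimal sketch based on the abstract, not the released MoH code: a per-token router scores the attention heads and only the top-k contribute to the output, analogous to expert routing in MoE. Dimensions, k, and the single-stage router are illustrative assumptions; the actual MoH design (e.g., shared heads) differs.

```python
# Minimal sketch of treating attention heads as experts: a per-token router
# selects the top-k heads and mixes only their outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKHeadAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8, k=6):
        super().__init__()
        self.n_heads, self.k, self.d = n_heads, k, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.router = nn.Linear(dim, n_heads)   # scores each head per token
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, T, dim)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.d).transpose(1, 2) for t in (q, k, v))
        heads = F.scaled_dot_product_attention(q, k, v)           # (B, H, T, d)

        scores = self.router(x)                                   # (B, T, H)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        gate = torch.zeros_like(scores).scatter(-1, topk_idx, topk_val.softmax(-1))
        heads = heads.transpose(1, 2) * gate.unsqueeze(-1)        # zero out unchosen heads
        return self.out(heads.reshape(B, T, -1))

x = torch.randn(2, 16, 512)
print(TopKHeadAttention()(x).shape)   # torch.Size([2, 16, 512])
```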
Video multimodal research focuses on activity recognition and object-centered tasks, often overlooking theme exploration, narrative analysis, and character dynamics. Thanks to @micuelll, CinePile addresses these overlooked areas by fine-tuning Video-LLaVA on their benchmark.
Video-LLaVA-7B-hf-CinePile @micuelll Hugging Face A multimodal large model fine-tuned from Video-LLaVA. -- Video-LLaVA @LinBin46984 An open-source multimodal model trained by fine-tuning an LLM on multimodal instruction-following data; it is an autoregressive language model based on the Transformer architecture. huggingface.co/LanguageBind/V… -- CinePile…