Yongliang Shen
@itricktreat
Reasoning & Multimodal learning & Agent | Assistant Professor at @ZJU_China | Previously @MSFTResearch
Ant Group researchers just dropped GUI-G² on Hugging Face! This new framework for GUI grounding uses Gaussian reward modeling, transforming grounding from sparse binary classification into dense continuous optimization. Achieves state-of-the-art results with a 24.7% boost on ScreenSpot-Pro!
Thanks @_akhaliq for sharing our work!
GUI-G²: Gaussian Reward Modeling for GUI Grounding
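To make the idea concrete: instead of a binary hit-or-miss reward for whether a predicted click lands inside the target element, a Gaussian reward decays smoothly with distance from the element center, so near-misses still carry signal. A minimal sketch of such a point reward (my own illustration, not the paper's exact formulation; tying the Gaussian width to the element size via sigma_scale is an assumption):

```python
import numpy as np

def gaussian_point_reward(pred_xy, target_box, sigma_scale=0.5):
    """Illustrative dense reward for GUI grounding (not GUI-G²'s exact form).

    A binary reward gives 1 only if the click lands inside the box and 0
    otherwise; here the reward decays smoothly with distance from the
    element center, so near-misses still provide a learning signal.
    """
    x1, y1, x2, y2 = target_box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Gaussian width tied to the element's size (assumed coupling, for illustration).
    sigma_x = sigma_scale * max(x2 - x1, 1e-6)
    sigma_y = sigma_scale * max(y2 - y1, 1e-6)
    dx, dy = pred_xy[0] - cx, pred_xy[1] - cy
    return float(np.exp(-0.5 * ((dx / sigma_x) ** 2 + (dy / sigma_y) ** 2)))

# A click just outside the element still earns partial credit (~0.37) instead of 0:
print(gaussian_point_reward((105, 48), (40, 30, 100, 50)))
```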
We release our paper "Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model" arxiv.org/pdf/2407.07053. We find that even leading multimodal LMMs, e.g., Claude 3.5 Sonnet @AnthropicAI or GPT-4o, have difficulty recognizing simple…
I think AI agentic workflows will drive massive AI progress this year — perhaps even more than the next generation of foundation models. This is an important trend, and I urge everyone who works in AI to pay attention to it. Today, we mostly use LLMs in zero-shot mode, prompting…
We've unveiled all of our accepted oral and poster presentations for #LLMAgents at #ICLR2024! Visit our page at openreview.net/group?id=ICLR.… to get a glimpse of the array of fantastic research we'll be presenting. Can't wait to meet you all in beautiful Vienna, Austria 🇦🇹!
New from @HuggingFace: Quanto - a versatile PyTorch quantization toolkit! 🤖 Key features: ✅ Works with any model (eager + graph modes) ✅ Supports int2, int4, int8, float8 quantization ✅ Optimized for GPU/CPU/Apple Silicon huggingface.co/blog/quanto-in…
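For anyone wanting to try it, usage looks roughly like the sketch below; the entry points (quantize, freeze, qint8) follow the blog post and may have since moved to the optimum-quanto package, so treat the import path as an assumption:

```python
import torch
from transformers import AutoModelForCausalLM
from quanto import quantize, freeze, qint8  # newer releases: `from optimum.quanto import ...`

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Swap eligible layers for quantized equivalents; here weights go to int8
# (int2/int4/float8 are listed as options in the announcement).
quantize(model, weights=qint8)

# freeze() materializes the quantized weights so the model can be used as usual.
freeze(model)

with torch.no_grad():
    logits = model(torch.tensor([[50256]])).logits
print(logits.shape)
```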
⚒️ EASYTOOL, making tool usage more efficient for LLM-based Agents!
😖 Struggling with complex tool documentation for LLM-based agents? 🚀 Discover EASYTOOL! It streamlines unorganized, redundant tool docs into more structured and efficient tool instructions. 📖 Paper: arxiv.org/abs/2401.06201 🔗 Code: github.com/microsoft/JARV…. 1/4
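In other words (as I read the paper), a long, messy API doc gets compressed into one concise, structured instruction that the agent actually consumes. A hypothetical before/after sketch; the field names are illustrative, not EASYTOOL's actual schema:

```python
# Hypothetical example of turning a verbose tool doc into a compact,
# structured instruction (field names are illustrative, not EASYTOOL's schema).
raw_doc = """
WeatherAPI v2.3 ... 40 lines of changelog, auth notes, deprecated endpoints ...
GET /v2/current?city=<name>&units=<metric|imperial> returns current conditions.
"""

easy_instruction = {
    "tool_name": "get_current_weather",
    "description": "Return current weather conditions for a city.",
    "arguments": {
        "city": "string, required, e.g. 'Vienna'",
        "units": "string, optional, 'metric' (default) or 'imperial'",
    },
    "example_call": "get_current_weather(city='Vienna', units='metric')",
}

# The agent's prompt then only includes `easy_instruction`, not `raw_doc`,
# which cuts tokens and drops redundant or irrelevant detail.
```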
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives - Presents a contrastive strategy that inspires reflection by contrasting the differences between multiple perspectives - Notable improvements on reasoning tasks like GSM8K arxiv.org/abs/2401.02009
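A rough sketch of that loop in code; the prompts, helper structure, and the `llm(prompt) -> str` callable are placeholders rather than the paper's implementation:

```python
def self_contrast(question: str, llm, n_perspectives: int = 3) -> str:
    """Illustrative Self-Contrast-style loop; `llm(prompt) -> str` is assumed."""
    # 1. Solve the same question from several self-curated perspectives.
    perspectives = llm(
        f"Propose {n_perspectives} distinct solving perspectives for: {question}"
    ).splitlines()[:n_perspectives]
    solutions = [
        llm(f"Solve the problem from this perspective: {p}\nProblem: {question}")
        for p in perspectives
    ]

    # 2. Contrast the solutions and summarize where they disagree.
    joined = "\n\n".join(f"Solution {i + 1}:\n{s}" for i, s in enumerate(solutions))
    discrepancies = llm(
        "Compare these solutions and list the concrete points where they "
        f"disagree:\n{joined}"
    )

    # 3. Turn the discrepancies into a checklist and produce a revised answer.
    return llm(
        f"Problem: {question}\nChecklist of issues to resolve:\n{discrepancies}\n"
        "Re-solve the problem, explicitly addressing each checklist item."
    )
```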
Introducing COLM (colmweb.org) the Conference on Language Modeling. A new research venue dedicated to the theory, practice, and applications of language models. Submissions: March 15 (it's pronounced "collum" 🕊️)
I gave a talk at Seoul National University. I titled the talk “Large Language Models (in 2023)”. This was an ambitious attempt to summarize our exploding field. Video: youtu.be/dbo3kNKPaUA Slides: docs.google.com/presentation/d… Trying to summarize the field forced me to think…
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). openai.com/blog/chatgpt-c…
🚀
🤗HuggingGPT Another addition to langchain_experimental🧪 Thanks to @itricktreat you can now use an agent that connects to a variety of @huggingface models in @LangChainAI Based on the paper here: arxiv.org/abs/2303.17580 Docs: python.langchain.com/docs/use_cases…
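If you want to try it, usage looks roughly like the sketch below. The import path, constructor signature, and tool names are my recollection of the linked docs and should be treated as assumptions; check the docs page for the current API:

```python
# Sketch only: the module path, constructor, and tool names below are
# assumptions based on the langchain_experimental docs; verify against the
# linked docs page before use.
from langchain.llms import OpenAI
from langchain_experimental.autonomous_agents import HuggingGPT
from transformers import load_tool

# HuggingGPT uses the LLM to plan subtasks, then dispatches each subtask to a
# Hugging Face model wrapped as a transformers tool.
hf_tools = [load_tool(name) for name in ["image-captioning", "text-to-speech"]]
agent = HuggingGPT(OpenAI(temperature=0), hf_tools)

agent.run("Caption the image at ./example.jpg and read the caption aloud.")
```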
Introducing Custom instructions! This feature lets you give ChatGPT any custom requests or context which you’d like applied to every conversation. Custom instructions are currently available to Plus users, and we plan to roll out to all users soon! openai.com/blog/custom-in… Here…