Ji QI
@miracle_jiqi
PhD student @Tsinghua_Uni. Currently a visiting student @NUSingapore.
Promoting our preliminary work on efficient video understanding with LMMs. Grateful for the support from Yuan Yao and all mentors! Since videos inherently exhibit varying temporal density (static/dynamic segments), a natural idea is to dynamically segment and compress a video to…
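A minimal sketch of that idea, assuming decoded frames in a NumPy array: cut the frame sequence where inter-frame change is large, then keep frames densely in dynamic runs and sparsely in static ones. The motion proxy, threshold, and names below are illustrative, not the paper's actual method.

```python
import numpy as np

def compress_video(frames: np.ndarray, thresh: float = 0.05,
                   dynamic_stride: int = 2) -> np.ndarray:
    """Keep frames densely inside dynamic runs, sparsely in static runs.

    frames: (T, H, W, C) array of decoded frames. Returns the kept frames.
    """
    # Cheap motion proxy: mean absolute difference between consecutive frames,
    # normalised to [0, 1] so the threshold is scale-free.
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2, 3))
    diffs = diffs / (diffs.max() + 1e-8)

    kept = [0]  # always keep the first frame
    for t in range(1, len(frames)):
        # Keep a frame only if the scene changed since the previous frame
        # and we are not oversampling the current dynamic run.
        if diffs[t - 1] > thresh and t - kept[-1] >= dynamic_stride:
            kept.append(t)
    return frames[kept]
```

E.g. `compress_video(np.stack(frames))` on a decoded clip returns the retained frames, which can then be passed to the LMM; static segments contribute roughly one representative frame each.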

Today, on April 17th, OpenAI released a new visual reasoning model, o3, capable of solving complex tasks by analyzing and manipulating images. We have observed that the reasoning approach employed by the o3 model bears a striking resemblance to our earlier work, CogCoM…

💥 Introducing MiniCPM-o 2.6: an 8B-parameter, GPT-4o-level omni model that runs on device ✨ Highlights: ~Matches GPT-4o-202405 in vision, audio and multimodal live streaming ~End-to-end real-time bilingual audio conversation ~Voice cloning & emotion control ~Advanced OCR & video…
I didn't see the talk, but the images I've seen of the slide seem quite offensive. Such generalizations should have no place in NeurIPS or anywhere else.
Thanks AK! This study tries to alleviate the failures of VLMs in solving detailed visual problems, partly caused by one-step answering from linguistic/visual priors, and proposes a mechanism, CoM (Chain of Manipulations), that enables VLMs to reason step by step over the image. github.com/THUDM/CogCoM
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations paper page: huggingface.co/papers/2402.04… Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers.…
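For intuition, a minimal sketch of such a manipulation loop with a single crop-and-zoom manipulation; the `vlm_step` callable and the action format are hypothetical stand-ins, not CogCoM's actual interface (see github.com/THUDM/CogCoM for that).

```python
from PIL import Image

def crop_zoom(img: Image.Image, box) -> Image.Image:
    # Crop the evidence region and zoom it back to the original resolution.
    return img.crop(box).resize(img.size)

def chain_of_manipulations(vlm_step, image: Image.Image, question: str,
                           max_steps: int = 5) -> str:
    """Run the model in a loop: at each step it either answers or requests
    a manipulation of the current image to gather new visual evidence."""
    history = []  # manipulations taken so far, fed back to the model
    for _ in range(max_steps):
        # vlm_step is assumed to return ("answer", text) or
        # ("crop_zoom", (x0, y0, x1, y1)).
        action, payload = vlm_step(image, question, history)
        if action == "answer":
            return payload
        image = crop_zoom(image, payload)   # acquire the new evidence
        history.append((action, payload))
    return "unanswerable"  # manipulation budget exhausted
```

The design point is that the model gathers visual evidence before committing to an answer, instead of answering in one step from priors.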
It is a great honor to receive the Outstanding Paper Award from EMNLP 2023; I am so grateful for the recognition. Many thanks to my lab @thukeg and our department @thudcst .
Congrats! Qi Ji, a PhD student from #Tsinghua DCST, won the Outstanding Paper Award at #EMNLP2023 for his work on robustness evaluation via knowledge invariance under different expressions, effectively benchmarking information extraction models including #LLMs. Code: aclanthology.org/2023.emnlp-mai…
Thanks AK! Check out our recipe code at github.com/THUDM/LongAlign. We also release an instruction-following dataset at huggingface.co/datasets/THUDM…, along with a suite of competitive long-context LLMs trained with LongAlign!
LongAlign: A Recipe for Long Context Alignment of Large Language Models paper page: huggingface.co/papers/2401.18… Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present…
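For intuition, here is a minimal sketch of the sequence-packing step that long-context SFT recipes rely on: pack variable-length examples into fixed-budget training sequences so long and short samples batch efficiently. This is an illustrative first-fit-decreasing packer, not the LongAlign code; the paper's recipe additionally discusses loss weighting and sorted batching.

```python
from typing import List

def pack_sequences(lengths: List[int], max_len: int = 65536) -> List[List[int]]:
    """First-fit-decreasing packing: assign example indices to bins so each
    bin's total token count stays within max_len."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins: List[List[int]] = []   # packed example indices per training sequence
    free: List[int] = []         # remaining token budget per bin
    for i in order:
        for b in range(len(bins)):
            if lengths[i] <= free[b]:   # first bin with enough room
                bins[b].append(i)
                free[b] -= lengths[i]
                break
        else:
            bins.append([i])            # open a new training sequence
            free.append(max_len - lengths[i])
    return bins
```

Without a per-sequence loss-weighting step, bins packed with many short examples would contribute more targets to the loss than bins holding one long example, which is why packing alone is not the whole recipe.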
Everything makes sense.
Can mathematical obfuscation help to get papers accepted?
Discover what Abu Dhabi has to offer. youtube.com/watch?v=3tjSur… #NLProc