Alham Fikri Aji
@AlhamFikri
https://bsky.app/profile/afaji.bsky.social
Faculty at @MBZUAI, @MonashIndonesia. Visiting research scientist @Google
🎉Happy to share our recent collaborative effort on building a culturally diverse, multilingual visual QA dataset! CVQA consists of over 9,000 questions across 28 countries, covering 26 languages (with more to be added!) 🌐cvqa-benchmark.org 📜arxiv.org/pdf/2406.05967

🎉Excited to share new work with undergrads from @FASILKOM_UI. Multilingual interpretability is often done at the neuron level, but neurons are polysemantic. We show that applying a sparse autoencoder first yields more monosemantic, interpretable features. More: arxiv.org/pdf/2507.11230
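For readers unfamiliar with the technique, here is a minimal sketch of the standard sparse-autoencoder recipe over LLM hidden states; the module names, dimensions, and L1 penalty below are illustrative assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    # Overcomplete autoencoder over hidden states (d_features >> d_model):
    # ReLU plus an L1 penalty pushes each feature toward monosemanticity.
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h: torch.Tensor):
        f = torch.relu(self.encoder(h))  # sparse, non-negative feature activations
        return self.decoder(f), f        # reconstruction and features

def sae_loss(h, h_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus L1 sparsity on the feature activations.
    return ((h - h_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```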

is there a superhero name that actually sounds like good name to give to a child?
Personal take: As AI handles more of our thinking, cognitive decline could become a concern. We may soon see a rise in brain gyms or mental fitness businesses designed to keep our minds sharp.
This is a global phenomenon. With AI, chatbots, and brainrot, we are essentially heading toward a situation where anything with any significant cognitive load is going to be offloaded to machines. If you wish to improve your cognitive capabilities, then I suggest the following: 1.…
I'll be hiring a couple of Ph.D. students at CMU (via LTI or MLD) in the upcoming cycle! If you are interested in joining my group, please read the FAQ before reaching out to me via email :) docs.google.com/document/d/12V…
📢New dataset alert! A new long-form video benchmark assessing multimodal LLMs' understanding of Theory of Mind in realistic social interactions
Excited to finally share MOMENTS!! A new human-annotated benchmark to evaluate Theory of Mind in multimodal LLMs using long-form videos with real human actors. 📽️ 2.3K+ MCQA items from 168 short films 🧠 Tests 7 different ToM abilities 🔗 arxiv.org/abs/2507.04415
🚨 Deadline Extended 🚨 The MELT Workshop Track submission deadline is now June 30, 2025 (AoE). We still welcome new & unpublished work: abstracts (≤2 p) | shorts (≤5 p) | longs (≤9 p) 📌 Non-archival 🔗 Submit here → openreview.net/group?id=colmw… #MELTWorkshop2025 #COLM2025
Supposedly 87% of Indonesian high school graduates have math skills equivalent to a 3rd grader's. This will probably get worse in the coming years, with more students relying on LLMs to do their homework without actually learning anything.
Oh dear. Just read this. Indonesians' numeracy is even more broken. 87% of high school (SMA) graduates have math skills equivalent to a 3rd grader in elementary school, or below. The numeracy of those who only finished junior high or elementary school is even worse.
Working on multilingual and culturally-aware NLP and want to attend @COLM_conf? Consider submitting to the MELT Workshop 2025! Submissions are non-archival
🚨 Call for Submissions – MELT Workshop @COLM_conf 🌍 We invite papers on regionally grounded, culturally aware language technologies. 🛠️ Two tracks: — Workshop Track — Conference Track Details below ⬇️ #MELTWorkshop2025
There are many interesting questions: how people anthropomorphize LLMs, correlations with their individual traits, demographics, their understanding of LLMs, perceived trust, etc. Just a random thought, but this whole human-LLM interaction thing is fascinating.

🤔Ever wonder why LLMs give inconsistent answers in different languages? In our paper, we identify two failure points in the multilingual factual recall process and propose fixes that guide LLMs to the "right path." This can boost performance by 35% in the weakest language! 📈
🧵 Multilingual safety training/eval is now standard practice, but a critical question remains: Is multilingual safety actually solved? Our new survey with @Cohere_Labs answers this and dives deep into: - Language gap in safety research - Future priority areas Thread 👇
Research isn’t always about chasing SOTA. It’s about understanding why things work (or don’t) and pushing the boundaries of our knowledge Negative results like this are still valuable and essential for the community Looking forward to the next banger!
sorry for the late update. I bring disappointing news. softpick does NOT scale to larger models. overall training loss and benchmark results are worse than softmax on our 1.8B parameter models. we have updated the preprint on arxiv: arxiv.org/abs/2504.20966
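For context, here is a minimal sketch of the rectified-softmax idea as I read it; the exact formulation is an assumption on my part, so check arxiv.org/abs/2504.20966 for the real definition:

```python
import torch

def softpick(x: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # Assumed form: relu(exp(x) - 1) / (sum |exp(x) - 1| + eps).
    # Unlike softmax, outputs can be exactly zero and need not sum to one.
    m = x.max(dim=dim, keepdim=True).values   # max-subtraction for numerical stability
    e = torch.exp(x - m)                      # exp(x) rescaled by exp(-m)
    shifted = e - torch.exp(-m)               # (exp(x) - 1) * exp(-m)
    return torch.relu(shifted) / (shifted.abs().sum(dim=dim, keepdim=True) + eps)
```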
Amidst the evaluation/reproducibility crisis for reasoning LLMs, it's great to see that *concurrent independent work (with different models & benchmarks) aligns with our findings*! We reported the same fundamental trade-off: language forcing leads to ✅ compliance, ❌ accuracy!
[1/]💡New Paper Large reasoning models (LRMs) are strong in English — but how well do they reason in your language? Our latest work uncovers their limitation and a clear trade-off: Controlling Thinking Trace Language Comes at the Cost of Accuracy 📄Link: arxiv.org/abs/2505.22888
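As a concrete illustration of what language forcing can look like in practice, here is a hedged sketch: prefill the model's thinking trace with target-language text so decoding continues reasoning in that language. The prompt wording, prefixes, and `<think>` convention below are my own illustrative assumptions, not necessarily the papers' setup.

```python
# Illustrative prefixes; any natural "let's think step by step" opener works.
FORCE_PREFIX = {
    "id": "Baiklah, mari kita pikirkan ini langkah demi langkah.",  # Indonesian
    "de": "Gut, denken wir Schritt für Schritt darüber nach.",      # German
}

def build_forced_prompt(question: str, lang: str) -> str:
    # Pre-seed the assistant turn with a thinking-trace opener in the
    # target language; the model tends to keep reasoning in that language.
    return f"User: {question}\nAssistant: <think>{FORCE_PREFIX[lang]} "
```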
Happy to share that we have 8 papers accepted at ACL! Grateful for the amazing team of students and collaborators. Hope to see you in Vienna soon!


Thanks for covering our work!
Can an AI trained in English solve math problems in other languages without extra training?
Multilingual vision is important! We developed a multilingual VQA benchmark almost a year ago: CVQA (arxiv.org/abs/2406.05967), published at NeurIPS.
abs: arxiv.org/abs/2505.08751 model: huggingface.co/CohereLabs/aya… bench: huggingface.co/datasets/Coher…
After publishing 4 EMNLP papers last year (2 as first author), and now this. His citation count has grown 10x in the past year; he's on track to surpass 1,000,000 citations in 5 years! An amazing future lies ahead for Farid.
got a bit emotional as this is my first ever ACL main paper. here’s to more to come!
Reasoning language models are primarily trained on English data, but do they generalize well to multilingual settings in various domains? We show that test-time scaling can improve their zero-shot crosslingual reasoning performance! 🔥
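One common form of test-time scaling is self-consistency: sample several reasoning traces at nonzero temperature and majority-vote over the final answers. Whether this matches the paper's exact method is an assumption here, and `generate` / `extract_answer` are hypothetical helpers:

```python
import collections
import re

def extract_answer(trace: str) -> str:
    # Hypothetical helper: take the last number in the trace as the answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", trace)
    return numbers[-1] if numbers else ""

def self_consistency(generate, question: str, n: int = 16) -> str:
    # `generate` is an assumed callable returning one full reasoning trace.
    # More samples = more test-time compute = (often) higher accuracy.
    answers = [extract_answer(generate(question)) for _ in range(n)]
    return collections.Counter(answers).most_common(1)[0][0]
```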