Arka Mukherjee

UG Researcher @ IIT BBS

[GitHub] [Resume]

About Me:

Hi, I'm Arka Mukherjee, a KIIT CS junior passionate about research in multimodal LLMs, evaluation, and reasoning. Currently, I am a funded Research Fellow at IIT Bhubaneswar with Dr. Shreya Ghosh. Previously, I spent a summer at the VLED lab, IIT Ropar, and contributed to AI security research at RespAI Lab, KIIT.

Beyond research, I engage with the tech community as a tech journalist and YouTuber, where I share insights on GPUs and emerging trends in AI.

News📰

Nov 11, 2025: mmJEE-Eval is now live! Check it out: mmjee-eval.github.io
Nov 1, 2025: My team, Blackwell, ranked 6th of 59 teams (roughly the top 10%) in the NeurIPS 2025 DCVLR data curation challenge!
Oct 19, 2025: Attended ICCV 2025 in Honolulu, Hawaii, to present my social AI research [YouTube].
Oct 17, 2025: My work on alternative eval metrics beyond accuracy for math and science reasoning got accepted to the NeurIPS 2025 MATH-AI workshop!
Jul 12, 2025: My work evaluating the cultural competence of VLMs [ArXiv] got accepted to the Artificial Social Intelligence (ASI) workshop at ICCV 2025!
May 17, 2025: I am starting a summer internship at the VLED Lab, IIT Ropar, with Dr. Sudarshan Iyengar, supported by the IASc-NASI-INSA summer research fellowship!
Feb 08, 2025: KIIT Merit Scholarship Award — Ranked in the top 0.8% of the Computer Science batch.
Dec 11, 2024: I have been selected for a winter research internship at IIT Bhubaneswar on multimodal fake news detection with Dr. Shreya Ghosh.

1 / IIT Bhubaneswar (Dec 2024 - Present)

At IIT Bhubaneswar, I am currently working on VLM evaluation, reasoning benchmarks, and new multimodal LLM applications.

Vision-Language Model Evaluation: Recently, we built a bilingual, multimodal math and science benchmark that tests VLMs in an exam-style evaluation setting. Through extensive testing, we found interesting metacognitive behavior patterns and reasoning gaps between open and closed models. (Accepted to IJCNLP-AACL 2025 and the NeurIPS 2025 MATH-AI workshop) [Project Page]
Multimodal Cultural Competence: Created the first systematic evaluation framework for VLM cultural competence through multimodal story generation. Analyzed 5 contemporary VLMs with novel evaluation metrics. (Accepted to ICCV 2025 ASI workshop) [ArXiv] [GitHub]
Modality Translation Framework: Designed UNITE, a VLM-in-the-loop framework that achieved state-of-the-art performance on FakeNewsNet, Fakeddit, and Hateful Memes.

2 / IIT Ropar (May - July 2025)

At IIT Ropar, I led the development of EduVLM-Bench, a benchmark for educational prerequisite detection, and evaluated five open-source LLMs. The top model, Gemma3 27B, achieved 38.5% accuracy. [Web page]

3 / RespAI Lab, KIIT (May - Dec 2024)

At RespAI Lab, I worked on developing multimodal unlearning baselines on Llama 3.2 Vision with Dr. M. Mandal.

Research🔬

Writing

I have two tech blogs, one at Sportskeeda and the other on this website. I have also contributed to QM Games, Gamesbap, Outscal, KineTechBlog, Cryptolka, and Hardware Corpus.