

Arka Mukherjee
UG Researcher @ IIT BBS & RespAI Lab
About Me:
Hi, I'm Arka Mukherjee, a KIIT CS sophomore passionate about research in multimodality, LLMs, and trustworthy AI. I also build practical AI tools, such as PDF parsers using RAG and lightweight multimodal models for image classification.
Beyond research, I engage with the tech community as a tech journalist and YouTuber, where I share insights on GPUs—my first love—and emerging trends in AI.
News📰
- Feb 19, 2025: "UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation", my winter internship project, is now on arXiv! (https://arxiv.org/abs/2502.11132)
- Feb 08, 2025: KIIT Merit Scholarship Award. I am one of the top 3 performers among 3,500 students in Computer Science.
- Dec 11, 2024: I have been selected for a winter research internship at IIT Bhubaneswar on multimodal fake news detection.
Research🔬
At IIT Bhubaneswar, I worked on multimodal fake news detection with Dr. S. Ghosh. I developed a novel modality translation technique that reduces parameter requirements by 10x while maintaining a SoTA-level accuracy of 92.5%. The work has been written up as a paper and is available on arXiv.
At RespAI Lab, I am currently working on modality-agnostic, easily deployable guardrails for agentic AI systems. Previously, I built SmartParse, an interactive web app for PDF parsing, and guardrails for VLMs. This work is being done with Dr. M. Mandal.
Pet Projects⚙️
Built SmartParse, a PDF processing app that uses Retrieval-Augmented Generation (RAG) to let users chat with complex documents through an interactive web interface. It uses FAISS (Facebook AI Similarity Search) and the Mixtral 8x7B LLM. The tool was used by KIIT Medical School.
Built a multimodal, natural-language-supervised model for medicinal leaf recognition. Implemented image galleries for efficient inference while keeping a tight limit on the parameter count. The 15M-parameter model achieved accuracies of up to 99.36% on benchmark datasets.
Set up a Python script to fine-tune the Llama 3.2 Vision Language Model (VLM). Created a dataset of iPhone 16 images and successfully taught the model to detect the phone in an image (the model had no prior knowledge of the iPhone 16).
Built Dual-FND, an agentic fake news detection method that extracts claims and facts from the news source to guide zero-shot detection with SoTA LLMs such as Gemini 2.0 Flash. The method achieved higher accuracy than unguided zero-shot detection with language models.
Writing
I have two tech blogs, one at Sportskeeda and the other on this website. I have also previously contributed to QM Games, Gamesbap, Outscal, KineTechBlog, Cryptolka, and Hardware Corpus.

Senior tech journalist at Sportskeeda, covering the latest on GPUs and computer hardware.
Education
KIIT University
2023-2027
I'm majoring in Computer Science and Systems Engineering.
GPA: 9.80/10