Publications

You can also find my articles on my Google Scholar profile.

Journal Articles


Vision Transformer and FFT-ReLU Fusion for Advanced Image Deblurring

Preprint, 2024

In this paper, we use the FFT-ReLU prior to enhance relevant frequency components via the Fast Fourier Transform (FFT) while applying ReLU sparsity to suppress noise. A Vision Transformer serves as a pre-processing model that captures both local and global features to generate a less blurry intermediate image, which is then refined through FFT-ReLU to produce a sharp, high-quality output. Our experimental results demonstrate that our method consistently outperforms state-of-the-art image deblurring models, producing sharper and more visually compelling images.
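The FFT-then-ReLU idea described above can be sketched in a few lines. This is only an illustrative toy, not the paper's actual prior: the scaling of frequency components (`alpha`, the magnitude-based boost) is an assumption for demonstration.

```python
import numpy as np

def fft_relu_step(img, alpha=0.9):
    # Forward FFT: move the image into the frequency domain.
    F = np.fft.fft2(img)
    # Illustrative boost of dominant frequency components; the paper's
    # actual prior formulation may differ.
    mag = np.abs(F)
    F_enh = F * (1 + alpha * mag / (mag.max() + 1e-8))
    # Back to the spatial domain, then ReLU to enforce sparsity and
    # suppress negative, noise-like responses.
    restored = np.real(np.fft.ifft2(F_enh))
    return np.maximum(restored, 0.0)
```

In the full pipeline, the input to such a refinement step would be the Vision Transformer's intermediate deblurred image rather than the raw blurry one.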

Download Paper

Conference Papers


The Visual Grammar of Bias: Investigating Compositional Bias in Generative Vision-Language Models

Under preparation, 2025

Recent advancements in generative Vision-Language Models (VLMs) have enabled the creation of highly realistic and diverse imagery from text prompts. While critical research has focused on representational harms, such as demographic underrepresentation and stereotypical attribute association, this paper argues for a new dimension of analysis: Compositional Bias. We define this as the tendency of VLMs to systematically employ the principles of visual communication—such as subject placement, focus, posture, and environmental context—to reinforce societal hierarchies and stereotypes. This bias operates not at the level of what is depicted, but how it is depicted. We propose a novel research framework to identify and quantify this bias, breaking it down into three core categories: 1) Positional & Focal Bias, 2) Action & Agency Bias, and 3) Environmental & Contextual Bias. By developing a methodology that leverages established computer vision techniques to analyze generated images at scale, this work aims to provide a deeper, more nuanced understanding of how bias is encoded and perpetuated in generative visual media.

Cross-Modal Deception: Hiding Adversarial Text in Images to Jailbreak Multi-Modal LLMs

Under preparation, 2025

We investigate a method for embedding a malicious textual payload directly into a benign-looking image, making the payload imperceptible to humans but machine-readable by a multi-modal LLM's (MLLM's) vision pipeline. The prompt is injected via pixel perturbations in the YCbCr color space after frequency-domain transformations. We make this perturbation trainable using differentiable OCR proxies (such as CRAFT and CRNN), so that the prompt remains invisible to the human eye yet readable to MLLMs while surviving their input preprocessing.
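The color-space and frequency-domain part of this idea can be sketched as follows. This is a hypothetical, hand-rolled illustration only: the paper's perturbation is trainable and OCR-guided, whereas this toy simply nudges high-frequency FFT coefficients of the chroma (Cb) channel, where changes are least visible to humans. The function name, `eps`, and the bit-placement scheme are all assumptions.

```python
import numpy as np

def embed_payload(image_rgb, payload_bits, eps=0.5):
    """Toy sketch: hide bits in the Cb channel's high-frequency FFT
    coefficients. Illustrates the color-space + frequency-domain idea
    only; not the paper's trainable method."""
    img = image_rgb.astype(np.float64)
    # BT.601 RGB -> Cb (chroma is less perceptually salient than luma).
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    F = np.fft.fft2(cb)
    h, w = F.shape
    # Write each bit as a tiny +/- nudge to one high-frequency coefficient.
    for i, bit in enumerate(payload_bits):
        F[h // 2, (w // 2 + i) % w] += eps * (1.0 if bit else -1.0)
    # Back to the spatial domain; a full pipeline would then reassemble
    # the YCbCr channels and convert back to RGB.
    return np.real(np.fft.ifft2(F))
```

A learned version would replace the fixed nudges with parameters optimized against differentiable OCR losses so the payload survives resizing and compression in the MLLM's preprocessing.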

Deblurring in the Wild: A Real-World Dataset from Smartphone High-Speed Videos

Under review, 2025

We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging the frames to produce a blurry image, while using the temporally centered frame as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximately 10 times larger than widely used datasets, with 8 times as many distinct scenes spanning indoor and outdoor environments with varying object and camera motion. We benchmark multiple state-of-the-art (SOTA) deblurring models on our dataset and observe significant performance degradation, highlighting its complexity and diversity. Our dataset serves as a challenging new benchmark to facilitate robust and generalizable deblurring models.
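The pair-construction procedure described above (temporal averaging for blur, center frame as ground truth) can be sketched directly. The function name is illustrative, and the paper's pipeline likely includes additional steps (e.g., frame selection or filtering) not shown here.

```python
import numpy as np

def make_blur_pair(frames):
    """Given frames spanning one second of slow-motion video (240 in the
    paper), synthesize a long-exposure blurry image by temporal averaging
    and take the temporally centered frame as the sharp reference."""
    stack = np.stack([f.astype(np.float64) for f in frames])
    blurry = stack.mean(axis=0)          # simulated long exposure
    sharp = frames[len(frames) // 2]     # temporally centered frame
    return blurry, sharp
```

Averaging real captured frames preserves genuine camera shake and object motion, which is why models trained on synthetically blurred benchmarks degrade on this data.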

Download Paper

Blind Image Deblurring With FFT-ReLU Sparsity Prior

IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

The paper introduces a method for blind image deblurring, i.e., recovering a sharp image from a blurred one without prior knowledge of the blur kernel. The proposed method leverages a prior that targets the blur kernel to achieve effective deblurring across a wide range of image types. Extensive empirical analysis shows that the algorithm is competitive with state-of-the-art blind image deblurring algorithms while offering up to two times faster inference, making it a highly efficient solution.

Download Paper