Apellai
A Subsonic client, built with Kotlin, for storing, filtering, and searching music libraries and podcasts on servers, with additional support for like/dislike, media controls, and server switching.
A web application, built with the MERN stack, for managing the residence and meal system of hostels, with separate interfaces for tenants and owners.
A Flutter application for tracking progress in forming new habits and visualising it over defined periods of time, with additional priority-based task scheduling.
Preprint, 2024
In this paper, we use an FFT-ReLU prior to enhance relevant frequency components via the Fast Fourier Transform (FFT) while applying ReLU sparsity to suppress noise. Our approach uses a Vision Transformer as a pre-processing model to generate a less blurry intermediate image by capturing both local and global features, which is then refined through FFT-ReLU, resulting in a sharp, high-quality output. Our experimental results demonstrate that our method consistently outperforms state-of-the-art image deblurring models, producing sharper and more visually compelling images.
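The core idea of a ReLU sparsity prior in the frequency domain can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual formulation: the function name, the choice to apply ReLU to the real part of the spectrum, and the toy input are all assumptions for demonstration.

```python
import numpy as np

def fft_relu_prior(image: np.ndarray) -> np.ndarray:
    """Illustrative sketch: transform to the frequency domain, zero out
    negative (non-salient) real components with a ReLU, transform back.
    `image` is a 2-D grayscale array in [0, 1]."""
    spectrum = np.fft.fft2(image)            # forward FFT
    sparse = np.maximum(spectrum.real, 0.0)  # ReLU sparsity on the real part
    restored = np.fft.ifft2(sparse + 1j * spectrum.imag).real
    return np.clip(restored, 0.0, 1.0)

# toy usage on a random 8x8 "image"
img = np.random.rand(8, 8)
out = fft_relu_prior(img)
```

In the actual pipeline, a step like this would refine the Vision Transformer's intermediate output rather than a raw image.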
IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
The paper introduces a method for blind image deblurring, the process of recovering a sharp image from a blurred one without prior knowledge of the blur kernel. The proposed method leverages a prior that targets the blur kernel to achieve effective deblurring across a wide range of image types. The authors' extensive empirical analysis shows that their algorithm achieves results competitive with state-of-the-art blind image deblurring algorithms while offering up to two times faster inference, making it a highly efficient solution.
Under review, 2025
We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging the frames to produce blurry images, while using the temporally centered frame as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximately 10 times larger than widely used datasets, with 8 times as many distinct scenes spanning indoor and outdoor environments with varying object and camera motion. We benchmark multiple state-of-the-art (SOTA) deblurring models on our dataset and observe significant performance degradation, highlighting the complexity and diversity of our benchmark. Our dataset serves as a challenging new benchmark to facilitate robust and generalizable deblurring models.
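The frame-averaging recipe above can be sketched in a few lines of NumPy. The function name and toy shapes are illustrative; a real pipeline would additionally handle alignment, gamma, and frame selection, which are omitted here.

```python
import numpy as np

def synthesize_blur_pair(frames: np.ndarray):
    """Given a stack of slow-motion frames of shape (N, H, W, C), average
    them to simulate a long exposure (the blurry image) and take the
    temporally centered frame as the sharp reference."""
    blurry = frames.mean(axis=0)
    sharp = frames[len(frames) // 2]
    return blurry, sharp

# toy example: 240 random "frames" of a tiny 4x4 RGB clip
frames = np.random.rand(240, 4, 4, 3)
blurry, sharp = synthesize_blur_pair(frames)
```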
Under preparation, 2025
We investigate a method for embedding a malicious textual payload directly into a benign-looking image, making the payload imperceptible to humans but machine-readable by an MLLM's vision pipeline. The prompt is injected via pixel perturbations in the YCbCr color space after frequency-domain transformations. We make this perturbation trainable using differentiable proxies (such as CRAFT and CRNN OCR) so that the prompt remains invisible to the human eye yet readable to multimodal LLMs, while also surviving the model's input preprocessing.
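The general mechanism of perturbing chroma frequency coefficients can be sketched as follows. This is a hand-rolled, non-trainable stand-in for the differentiable pipeline described above: the function names, the fixed mid-frequency coefficient slots, and the perturbation strength `eps` are illustrative assumptions, not the method under preparation.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Full-range ITU-R BT.601 RGB -> YCbCr conversion for arrays in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 0.5
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 0.5
    return np.stack([y, cb, cr], axis=-1)

def embed_in_chroma(rgb: np.ndarray, payload_bits, eps: float = 1e-3):
    """Illustrative sketch: nudge a few mid-band FFT coefficients of the
    Cb channel by +/- eps per payload bit, leaving Y and Cr untouched."""
    ycbcr = rgb_to_ycbcr(rgb)
    spec = np.fft.fft2(ycbcr[..., 1])
    for i, bit in enumerate(payload_bits):
        u, v = 2 + i // 4, 2 + i % 4   # fixed mid-frequency slots (assumed)
        spec[u, v] += eps if bit else -eps
    ycbcr[..., 1] = np.fft.ifft2(spec).real
    return ycbcr

# toy usage: embed 4 bits into a random 16x16 RGB image
img = np.random.rand(16, 16, 3)
stego = embed_in_chroma(img, [1, 0, 1, 1])
```

With `eps` this small, the per-pixel change to the Cb channel is far below visual thresholds, which is the intuition behind the imperceptibility claim; the actual method optimizes the perturbation end-to-end rather than using fixed slots.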
Under preparation, 2025
Recent advancements in generative Vision-Language Models (VLMs) have enabled the creation of highly realistic and diverse imagery from text prompts. While critical research has focused on representational harms, such as demographic underrepresentation and stereotypical attribute association, this paper argues for a new dimension of analysis: Compositional Bias. We define this as the tendency of VLMs to systematically employ the principles of visual communication—such as subject placement, focus, posture, and environmental context—to reinforce societal hierarchies and stereotypes. This bias operates not at the level of what is depicted, but how it is depicted. We propose a novel research framework to identify and quantify this bias, breaking it down into three core categories: 1) Positional & Focal Bias, 2) Action & Agency Bias, and 3) Environmental & Contextual Bias. By developing a methodology that leverages established computer vision techniques to analyze generated images at scale, this work aims to provide a deeper, more nuanced understanding of how bias is encoded and perpetuated in generative visual media.
Undergraduate Course, CSE Department, BRAC University, 2024
This course introduces the fundamental data structures of computer science.
Undergraduate Course, CSE Department, BRAC University, 2025
This course provides an in-depth treatment of the time and space complexity of algorithms, and introduces several classic algorithms of computer science.
Undergraduate Course, CSE Department, BRAC University, 2025
This is a foundational course that introduces the mathematical concepts underpinning the basics of problem-solving.
Undergraduate Course, CSE Department, BRAC University, 2025
This course covers the layers of the OSI model and the TCP/IP protocol suite, with a focus on the functionalities and mathematical notions of the physical and data link layers.