Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
Posts
Hands-On Deep Learning and Open-source Large Language Model Workshop
Published:
The Cognitive Agents and Interaction Lab at the University of Dhaka arranged a workshop on deep learning and open-source LLMs, where I was one of the presenters and trainers. The workshop covered multilayer perceptrons, neural networks and their implementations in PyTorch, transformer architectures, attention mechanisms, and LLM fine-tuning and deployment.
Presentation on Restormer: Efficient Transformer for High-Resolution Image Restoration
Published:
A presentation on the paper by Syed Waqas Zamir et al. that I gave at the Cognitive Agents and Interaction Lab, University of Dhaka.
Presentation on MAXIM - Multi-Axis MLP For Image Processing
Published:
A presentation on the paper by Zhengzhong Tu et al. that I gave at the Cognitive Agents and Interaction Lab, University of Dhaka.
Presentation on MPRNet: Multi-Stage Progressive Image Restoration
Published:
A presentation on the paper by Syed Waqas Zamir et al. that I gave at the Cognitive Agents and Interaction Lab, University of Dhaka.
Presentation on Blind Image Deblurring With Dark Channel Prior
Published:
A presentation on the paper by Jinshan Pan et al. that I gave at the Cognitive Agents and Interaction Lab, University of Dhaka.
portfolio
Apellai
A Subsonic client, built using Kotlin, for storing, filtering, and searching music libraries and podcasts on servers, with additional options for like/dislike, media controls, and server switching.
Deversorium
A web application, built with the MERN stack, for managing the residence and meal system of hostels, with separate interfaces for tenants and owners.
Habitrix
A Flutter application for tracking progress in forming new habits and visualising it over defined periods of time, with the additional feature of priority-based task scheduling.
publications
Vision Transformer and FFT-ReLU Fusion for Advanced Image Deblurring
Preprint, 2024
In this paper, we use the FFT-ReLU prior to enhance relevant frequency components using the Fast Fourier Transform (FFT) while applying ReLU sparsity to suppress noise. Our approach uses a Vision Transformer as a pre-processing model that captures both local and global features to generate a less blurry intermediate image, which is then refined through FFT-ReLU, resulting in a sharp, high-quality output. Our experimental results demonstrate that our method consistently outperforms state-of-the-art image deblurring models, providing sharper and more visually compelling images.
Blind Image Deblurring With FFT-ReLU Sparsity Prior
IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
The paper introduces a method for blind image deblurring, which is the process of recovering a sharp image from a blurred one without prior knowledge about the blur kernel. The proposed method leverages a prior that targets the blur kernel to achieve effective deblurring across a wide range of image types. The authors' extensive empirical analysis shows that their algorithm achieves results that are competitive with the state-of-the-art blind image deblurring algorithms, and it offers up to two times faster inference, making it a highly efficient solution.
Deblurring in the Wild: A Real-World Dataset from Smartphone High-Speed Videos
Under review, 2025
We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging frames to produce blurry images, while using the temporally centered frame as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximately 10 times larger than widely used datasets, with 8 times as many distinct scenes, spanning indoor and outdoor environments and varying object and camera motions. We benchmark multiple state-of-the-art (SOTA) deblurring models on our dataset and observe significant performance degradation, highlighting the complexity and diversity of our benchmark. Our dataset serves as a challenging new benchmark to facilitate the development of robust and generalizable deblurring models.
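The frame-averaging idea described above can be illustrated with a small, dependency-free sketch. This is only an illustration under stated assumptions, not the dataset's actual pipeline: the function name `make_blur_sharp_pair`, the grayscale nested-list frame representation, and the choice of index `n // 2` as the "temporally centered" frame are all hypothetical, and the sketch does not address details such as colour channels, frame alignment, or whether averaging is done in linear or gamma-encoded intensity.

```python
from typing import List, Tuple

# Hypothetical representation: one grayscale frame as rows of pixel intensities.
Frame = List[List[float]]


def make_blur_sharp_pair(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Return (blurry, sharp) from a burst of equally sized frames."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])

    # Blurry image: per-pixel mean over time, approximating a long exposure.
    blurry = [
        [sum(f[y][x] for f in frames) / n for x in range(width)]
        for y in range(height)
    ]

    # Sharp reference: the middle frame of the burst (assumed to be index
    # n // 2; for an even count this is one of the two central frames).
    sharp = [row[:] for row in frames[n // 2]]
    return blurry, sharp


if __name__ == "__main__":
    # Toy burst: a bright dot moving one pixel per frame across a 1x5 image.
    burst = [[[1.0 if x == t else 0.0 for x in range(5)]] for t in range(5)]
    blurry, sharp = make_blur_sharp_pair(burst)
    print(blurry)  # the dot is smeared evenly along its path
    print(sharp)   # the dot sits at the centre position
```

In the toy burst, the averaged image spreads the moving dot across every position it visited, while the middle frame keeps it at a single pixel, which is the blur-sharp pairing the description refers to.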
Cross-Modal Deception: Hiding Adversarial Text in Images to Jailbreak Multi-Modal LLMs
Under preparation, 2025
We investigate a method to embed a malicious textual payload directly into a completely benign-looking image, making the payload imperceptible to humans but machine-readable by an MLLM’s vision pipeline. The prompt is injected via pixel perturbation in the YCbCr color space after frequency domain transformations. We are making this perturbation trainable using differentiable proxies (such as CRAFT and CRNN OCR) so that the prompt stays invisible to the human eye yet readable to multimodal LLMs, while also surviving the model's input preprocessing.
The Visual Grammar of Bias: Investigating Compositional Bias in Generative Vision-Language Models
Under preparation, 2025
Recent advancements in generative Vision-Language Models (VLMs) have enabled the creation of highly realistic and diverse imagery from text prompts. While critical research has focused on representational harms, such as demographic underrepresentation and stereotypical attribute association, this paper argues for a new dimension of analysis: Compositional Bias. We define this as the tendency of VLMs to systematically employ the principles of visual communication—such as subject placement, focus, posture, and environmental context—to reinforce societal hierarchies and stereotypes. This bias operates not at the level of what is depicted, but how it is depicted. We propose a novel research framework to identify and quantify this bias, breaking it down into three core categories: 1) Positional & Focal Bias, 2) Action & Agency Bias, and 3) Environmental & Contextual Bias. By developing a methodology that leverages established computer vision techniques to analyze generated images at scale, this work aims to provide a deeper, more nuanced understanding of how bias is encoded and perpetuated in generative visual media.
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.
teaching
CSE220: Data Structures
Undergraduate Course, CSE Department, BRAC University, 2024
This course introduces the concepts of the fundamental data structures of computer science, such as
- arrays (linear and multidimensional)
- linked lists (singly and doubly)
- binary trees
- stacks
- heaps
- hashing
- graphs
CSE221: Algorithms
Undergraduate Course, CSE Department, BRAC University, 2025
This course digs deep into analysing time and space complexities of algorithms, and introduces some classic algorithms of computer science involving
- sorting
- searching
- greedy algorithms
- dynamic programming
- graph algorithms
CSE230: Discrete Mathematics
Undergraduate Course, CSE Department, BRAC University, 2025
This is a foundational course that introduces the mathematical concepts underlying problem-solving. The course discusses
- propositional logic
- proofs
- sets
- linear homogeneous recurrence relations
- prime numbers and divisibility
- integer representations
- pigeonhole principle
- permutations and combinations
CSE320: Data Communications
Undergraduate Course, CSE Department, BRAC University, 2025
This course discusses the layers of the OSI model and the TCP/IP protocol suite, and covers the functionalities and mathematical notions of the physical and data link layers. Topics include
- network models
- addressing principles of models
- data and signals
- digital transmissions
- digital-to-analog conversion
- analog transmission
- multiplexing
- error detection in data link layer
- multiple access in data link layer
