Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

Hands-On Deep Learning and Open-source Large Language Model Workshop

The Cognitive Agents and Interaction Lab at the University of Dhaka arranged a workshop on deep learning and open-source LLMs, where I was one of the presenters and trainers. The workshop covered multilayer perceptrons, neural networks and their implementation in PyTorch, transformer architectures, attention mechanisms, and LLM fine-tuning and deployment.

portfolio

Apellai

A Subsonic client built in Kotlin for storing, filtering, and searching music libraries and podcasts on servers, with additional options for like/dislike, media controls, and server switching.

Deversorium

A web application built with the MERN stack for managing the residence and meal system of hostels, with separate interfaces for tenants and owners.

Habitrix

A Flutter application for tracking progress in forming new habits and visualising it over defined periods of time, with the additional feature of priority-based task scheduling.

publications

Vision Transformer and FFT-ReLU Fusion for Advanced Image Deblurring

Preprint, 2024

In this paper, we use the FFT-ReLU prior, which enhances relevant frequency components using the Fast Fourier Transform (FFT) while applying ReLU sparsity to suppress noise. Our approach uses a Vision Transformer as a pre-processing model to generate a less blurry intermediate image by capturing both local and global features. This intermediate image is then refined through FFT-ReLU, resulting in a sharp, high-quality output. Our experimental results demonstrate that our method consistently outperforms state-of-the-art image deblurring models, providing sharper and more visually compelling images.

Download Paper

Blind Image Deblurring With FFT-ReLU Sparsity Prior

IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

The paper introduces a method for blind image deblurring, which is the process of recovering a sharp image from a blurred one without prior knowledge about the blur kernel. The proposed method leverages a prior that targets the blur kernel to achieve effective deblurring across a wide range of image types. The authors' extensive empirical analysis shows that their algorithm achieves results that are competitive with the state-of-the-art blind image deblurring algorithms, and it offers up to two times faster inference, making it a highly efficient solution.

Download Paper

Deblurring in the Wild: A Real-World Dataset from Smartphone High-Speed Videos

Under review, 2025

We introduce the largest real-world image deblurring dataset constructed from smartphone slow-motion videos. Using 240 frames captured over one second, we simulate realistic long-exposure blur by averaging the frames to produce a blurry image, and we use the temporally centered frame as the sharp reference. Our dataset contains over 42,000 high-resolution blur-sharp image pairs, making it approximately 10 times larger than widely used datasets, with 8 times as many distinct scenes, including indoor and outdoor environments with varying object and camera motion. We benchmark multiple state-of-the-art (SOTA) deblurring models on our dataset and observe significant performance degradation, highlighting the complexity and diversity of our benchmark. Our dataset serves as a challenging new benchmark to facilitate the development of robust and generalizable deblurring models.

Download Paper
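The frame-averaging idea described for the dataset above can be illustrated with a minimal sketch. It assumes uniform averaging of grayscale pixel intensities over every frame in a clip; the actual pipeline may differ (for example in colour handling, weighting, or window length), and the name `make_blur_sharp_pair` is hypothetical, not taken from the paper.

```python
from typing import List, Tuple

# One grayscale frame, stored as rows of pixel intensities (an assumed layout).
Frame = List[List[float]]


def make_blur_sharp_pair(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Average all frames into a blurry image and pick the centre frame as sharp."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    # Uniform per-pixel mean over time approximates a long exposure.
    blurry = [
        [sum(f[r][c] for f in frames) / n for c in range(width)]
        for r in range(height)
    ]
    # The temporally centred frame serves as the sharp reference.
    sharp = frames[n // 2]
    return blurry, sharp


# Toy clip: one bright pixel moving left to right across a 1x3 image.
clip = [
    [[1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0]],
]
blurry, sharp = make_blur_sharp_pair(clip)
```

In the toy clip, the moving pixel is smeared evenly across the row in `blurry`, while `sharp` keeps it at the centre position, which mirrors how motion during the one-second window turns into blur.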

Cross-Modal Deception: Hiding Adversarial Text in Images to Jailbreak Multi-Modal LLMs

Under preparation, 2025

We investigate a method to embed a malicious textual payload directly into a completely benign-looking image, making the payload imperceptible to humans but machine-readable by an MLLM’s vision pipeline. The prompt is injected via pixel perturbation in the YCbCr color space after frequency domain transformations. We are making this perturbation trainable using differentiable proxies (such as CRAFT and CRNN OCR) so that the prompt stays invisible to the human eye yet readable to multimodal LLMs, while also surviving the model's input preprocessing.

The Visual Grammar of Bias: Investigating Compositional Bias in Generative Vision-Language Models

Under preparation, 2025

Recent advancements in generative Vision-Language Models (VLMs) have enabled the creation of highly realistic and diverse imagery from text prompts. While critical research has focused on representational harms, such as demographic underrepresentation and stereotypical attribute association, this paper argues for a new dimension of analysis: Compositional Bias. We define this as the tendency of VLMs to systematically employ the principles of visual communication—such as subject placement, focus, posture, and environmental context—to reinforce societal hierarchies and stereotypes. This bias operates not at the level of what is depicted, but how it is depicted. We propose a novel research framework to identify and quantify this bias, breaking it down into three core categories: 1) Positional & Focal Bias, 2) Action & Agency Bias, and 3) Environmental & Contextual Bias. By developing a methodology that leverages established computer vision techniques to analyze generated images at scale, this work aims to provide a deeper, more nuanced understanding of how bias is encoded and perpetuated in generative visual media.

talks

teaching

CSE220: Data Structures

Undergraduate Course, CSE Department, BRAC University, 2024

This course introduces the concepts of the fundamental data structures of computer science, such as

  • arrays (linear and multidimensional)
  • linked lists (singly and doubly)
  • binary trees
  • stacks
  • heaps
  • hashing
  • graphs

CSE221: Algorithms

Undergraduate Course, CSE Department, BRAC University, 2025

This course digs deep into analysing time and space complexities of algorithms, and introduces some classic algorithms of computer science involving

  • sorting
  • searching
  • greedy algorithms
  • dynamic programming
  • graph algorithms

CSE230: Discrete Mathematics

Undergraduate Course, CSE Department, BRAC University, 2025

This foundational course introduces the mathematical concepts that underpin problem-solving. The course discusses

  • propositional logic
  • proofs
  • sets
  • linear homogeneous recurrence relations
  • prime numbers and divisibility
  • integer representations
  • pigeonhole principle
  • permutations and combinations

CSE320: Data Communications

Undergraduate Course, CSE Department, BRAC University, 2025

This course covers the layers of the OSI model and the TCP/IP protocol suite, and examines the functions and mathematical foundations of the physical and data link layers. Topics include

  • network models
  • addressing principles of models
  • data and signals
  • digital transmission
  • digital-to-analog conversion
  • analog transmission
  • multiplexing
  • error detection in data link layer
  • multiple access in data link layer