From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring

International Conference on Agents and Artificial Intelligence, 2026

Image deblurring is a crucial task in computer vision that aims to recover sharp images from inputs blurred by camera shake, motion, or other factors. Traditional methods often struggle with complex or severe blur, especially in high-resolution images. Recent deep learning approaches, notably Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), have shown promise, but each has limitations, whether in capturing long-range dependencies or in computational efficiency. In this paper, we propose a novel image deblurring approach that combines the strengths of Vision Transformers with the Fast Fourier Transform (FFT) and ReLU (Rectified Linear Unit) sparsity. Our method first passes the blurry image through a Vision Transformer architecture designed for image restoration, which efficiently captures both local and global features to reduce blur. The output is then post-processed using the FFT with ReLU sparsity, which targets and removes blur-related frequencies while preserving sharpness and clarity. Extensive experiments on benchmark datasets demonstrate that our method produces sharper, more visually appealing images than state-of-the-art models. Subjective human evaluations, together with the standard metrics PSNR and SSIM, provide further evidence of the practical effectiveness of the technique. Our results indicate that the proposed method not only excels in quantitative measures but also significantly enhances perceptual image quality, making it well suited to real-world applications.
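The abstract does not spell out how the FFT and ReLU are combined, so the following is only a rough sketch of one plausible reading, not the paper's implementation: a ReLU applied to shifted FFT magnitudes zeroes weak frequency coefficients (promoting sparsity) while the original phases are kept. The function name `fft_relu_shrink` and the threshold `tau` are hypothetical, and the Vision Transformer stage is not modeled here.

```python
import numpy as np

def fft_relu_shrink(image, tau):
    """Hypothetical helper: sparsify an image's spectrum with a ReLU.

    Assumption: "FFT with ReLU sparsity" is read here as ReLU(|F| - tau)
    applied to the FFT magnitudes, with the phases left unchanged.
    """
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)
    # ReLU: coefficients whose magnitude is below tau become exactly zero.
    shrunk = np.maximum(magnitude - tau, 0.0)
    # Rescale each coefficient; where the magnitude is zero the scale stays 0.
    scale = np.divide(shrunk, magnitude,
                      out=np.zeros_like(magnitude), where=magnitude > 0)
    # For a real input the scale is symmetric, so the inverse is real
    # up to floating-point round-off; drop the residual imaginary part.
    return np.fft.ifft2(spectrum * scale).real
```

In this sketch, `tau = 0` returns the input unchanged, and larger values of `tau` discard progressively more frequency content. In the pipeline described above, such a step would be applied to the transformer's output rather than to the raw blurry image.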


Prothito Shovon Majumder
