TITLE:
Dual-Dilated Large Kernel Convolution for Visual Attention Network
AUTHORS:
Kwok-Wai Cheung, Yuk Tai Siu, Ka Lok Sobel Chan
KEYWORDS:
Attention, Large Kernel, Dilated Convolution
JOURNAL NAME:
Intelligent Information Management, Vol.17 No.6, November 11, 2025
ABSTRACT: Visual Attention Networks (VANs) leveraging Large Kernel Attention (LKA) have demonstrated remarkable performance across diverse computer vision tasks, in some cases outperforming Vision Transformers (ViTs). LKA combines the strengths of Convolutional Neural Networks (CNNs), such as sensitivity to local structure, with the long-range dependency modeling and adaptability of self-attention, while maintaining linear computational complexity. This paper introduces the Dual-Dilated Large Kernel (D2LK), a novel attention mechanism designed to enhance LKA's kernel decomposition. D2LK improves upon LKA by incorporating an additional depth-wise dilated convolution layer, which approximates larger kernel convolutions at further reduced computational cost. This decomposition allows for a more efficient representation of larger effective receptive fields. Our experiments demonstrate that D2LK achieves a superior balance between efficiency and performance. For instance, a D2LK module configured with a kernel size of 29 and 32 channels uses 11% fewer parameters (3,008) than an LKA module with the same specifications (3,392 parameters). When integrated into the VAN-B0 architecture, D2LK with a larger kernel size of 29 yields a Top-1 accuracy of 85.1% on ImageNet100 classification, a slight improvement over the LKA baseline (kernel size 21), which achieves 85.0%. Critically, this performance gain is accomplished with a marginally reduced overall parameter count (3.8649 million for D2LK vs. 3.8745 million for LKA). These results validate D2LK as an efficient and effective attention mechanism for Visual Attention Networks, enabling larger receptive fields at lower computational overhead.
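For illustration, below is a minimal PyTorch sketch of the D2LK idea described in the abstract: an LKA-style stack (depth-wise convolution, depth-wise dilated convolution, 1x1 point-wise convolution, with the result used as an attention map that gates the input) extended with a second depth-wise dilated convolution. The module name, kernel sizes (5, 5, 3), and dilations (1, 2, 8) are illustrative assumptions chosen only so that the effective receptive field works out to 29; the paper's exact decomposition and bias settings are not given in the abstract, so the parameter count of this sketch need not match the reported 3,008.

```python
import torch
import torch.nn as nn


class D2LKAttention(nn.Module):
    """Hypothetical sketch of a dual-dilated large-kernel attention block.

    Follows the LKA pattern (depth-wise conv -> depth-wise dilated conv
    -> 1x1 conv, output gating the input) with one extra depth-wise
    dilated convolution, as described in the abstract. Kernel sizes and
    dilation rates are assumptions, not the paper's configuration.
    Effective receptive field: 1 + (5-1)*1 + (5-1)*2 + (3-1)*8 = 29.
    """

    def __init__(self, channels: int = 32):
        super().__init__()
        # Local depth-wise convolution (dense, dilation 1).
        self.dw = nn.Conv2d(channels, channels, kernel_size=5,
                            padding=2, groups=channels)
        # First depth-wise dilated convolution.
        self.dw_d1 = nn.Conv2d(channels, channels, kernel_size=5,
                               padding=4, dilation=2, groups=channels)
        # Second depth-wise dilated convolution: the additional layer
        # that distinguishes D2LK from LKA in this sketch.
        self.dw_d2 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=8, dilation=8, groups=channels)
        # Point-wise (1x1) convolution mixing channels.
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw(x)
        attn = self.dw_d1(attn)
        attn = self.dw_d2(attn)
        attn = self.pw(attn)
        return x * attn  # attention map gates the input, as in LKA


if __name__ == "__main__":
    block = D2LKAttention(channels=32)
    x = torch.randn(1, 32, 56, 56)
    print(block(x).shape)  # torch.Size([1, 32, 56, 56])
    # Parameter count of this illustrative configuration (not the paper's).
    print(sum(p.numel() for p in block.parameters()))
```

In a VAN-style block, such a module would typically sit between a 1x1 projection and a feed-forward layer; the padding of each depth-wise convolution is set to dilation * (kernel_size - 1) / 2 so the spatial resolution is preserved.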