TITLE:
Dual-Dilated Large Kernel Convolution for Visual Attention Network
AUTHORS:
Kwok-Wai Cheung, Yuk Tai Siu, Ka Lok Sobel Chan
KEYWORDS:
Attention, Large Kernel, Dilated Convolution
JOURNAL NAME:
Intelligent Information Management, Vol.17 No.6, November 11, 2025
ABSTRACT: Visual Attention Networks (VANs) leveraging Large Kernel Attention (LKA) have demonstrated remarkable performance across diverse computer vision tasks, in some cases outperforming Vision Transformers (ViTs). LKA combines the strengths of Convolutional Neural Networks (CNNs), such as sensitivity to local structure, with the long-range dependency modeling and adaptability of self-attention, while maintaining linear computational complexity. This paper introduces the Dual-Dilated Large Kernel (D2LK), a novel attention mechanism designed to enhance LKA's kernel decomposition. D2LK improves upon LKA by incorporating an additional depth-wise dilated convolution layer, which approximates larger kernel convolutions at further reduced computational cost. This decomposition allows for a more efficient representation of larger effective receptive fields. Our experiments demonstrate that D2LK achieves a superior balance between efficiency and performance. For instance, a D2LK module configured with a kernel size of 29 and 32 channels uses 11% fewer parameters (3,008) than an LKA module with the same specifications (3,392 parameters). When integrated into the VAN-B0 architecture, D2LK with a larger kernel size of 29 yields a Top-1 accuracy of 85.1% on ImageNet100 classification, a slight improvement over the LKA baseline (kernel size 21), which achieves 85.0%. Critically, this performance gain is accomplished with a marginally reduced overall parameter count (3.8649 million for D2LK vs. 3.8745 million for LKA). These results validate D2LK as an efficient and effective attention mechanism for Visual Attention Networks, enabling larger receptive fields at lower computational overhead.
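For illustration, below is a minimal PyTorch sketch of the D2LK idea described in the abstract: an LKA-style stack (depth-wise convolution, depth-wise dilated convolution, 1x1 point-wise convolution, with the result used as an attention map that gates the input) extended with a second depth-wise dilated convolution. The module name, kernel sizes (5, 5, 3), and dilations (1, 2, 8) are illustrative assumptions chosen only so that the effective receptive field works out to 29; the paper's exact decomposition and bias settings are not given in the abstract, so the parameter count of this sketch need not match the reported 3,008.

```python
import torch
import torch.nn as nn


class D2LKAttention(nn.Module):
    """Hypothetical sketch of a dual-dilated large-kernel attention block.

    Follows the LKA pattern (depth-wise conv -> depth-wise dilated conv
    -> 1x1 conv, output gating the input) with one extra depth-wise
    dilated convolution, as described in the abstract. Kernel sizes and
    dilation rates are assumptions, not the paper's configuration.
    Effective receptive field: 1 + (5-1)*1 + (5-1)*2 + (3-1)*8 = 29.
    """

    def __init__(self, channels: int = 32):
        super().__init__()
        # Local depth-wise convolution (dense, dilation 1).
        self.dw = nn.Conv2d(channels, channels, kernel_size=5,
                            padding=2, groups=channels)
        # First depth-wise dilated convolution.
        self.dw_d1 = nn.Conv2d(channels, channels, kernel_size=5,
                               padding=4, dilation=2, groups=channels)
        # Second depth-wise dilated convolution: the additional layer
        # that distinguishes D2LK from LKA in this sketch.
        self.dw_d2 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=8, dilation=8, groups=channels)
        # Point-wise (1x1) convolution mixing channels.
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw(x)
        attn = self.dw_d1(attn)
        attn = self.dw_d2(attn)
        attn = self.pw(attn)
        return x * attn  # attention map gates the input, as in LKA


if __name__ == "__main__":
    block = D2LKAttention(channels=32)
    x = torch.randn(1, 32, 56, 56)
    print(block(x).shape)  # torch.Size([1, 32, 56, 56])
    # Parameter count of this illustrative configuration (not the paper's).
    print(sum(p.numel() for p in block.parameters()))
```

In a VAN-style block, such a module would typically sit between a 1x1 projection and a feed-forward layer; the padding of each depth-wise convolution is set to dilation * (kernel_size - 1) / 2 so the spatial resolution is preserved.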