Visual-Attention-Network

Abstract

While originally designed for natural language processing (NLP) tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structures. (2) The quadratic complexity is too expensive for high-resolution images. (3) It only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel large kernel attention (LKA) module to enable self-adaptive and long-range correlations in self-attention while avoiding the above issues. We further introduce a novel neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple and efficient, VAN outperforms state-of-the-art vision transformers and convolutional neural networks by a large margin in extensive experiments, including image classification, object detection, semantic segmentation, and instance segmentation, among others.
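For reference, below is a minimal PyTorch sketch of an LKA-style block. The decomposition (a large ~21x21 receptive field built from a 5x5 depth-wise convolution, a 7x7 depth-wise convolution with dilation 3, and a 1x1 point-wise convolution, whose output gates the input elementwise) follows the paper; the class and variable names here are illustrative, not the exact mmpretrain implementation.

```python
import torch
import torch.nn as nn


class LKASketch(nn.Module):
    """Illustrative large kernel attention block (not the exact mmpretrain module)."""

    def __init__(self, dim: int):
        super().__init__()
        # 5x5 depth-wise convolution captures local structure.
        self.dw_conv = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # 7x7 depth-wise dilated convolution (dilation 3) extends the
        # receptive field to roughly 21x21 at depth-wise cost.
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)
        # 1x1 convolution mixes channels, providing channel adaptability.
        self.pw_conv = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        # The learned map gates the input elementwise (the "attention").
        return x * attn


# Quick shape check on a dummy feature map.
feats = torch.rand(1, 64, 56, 56)
print(LKASketch(64)(feats).shape)  # torch.Size([1, 64, 56, 56])
```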

How to use it?

```python
from mmpretrain import inference_model

# Run single-image inference with the pretrained VAN-Tiny weights.
predict = inference_model('van-tiny_3rdparty_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
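If you need the bare network rather than the full inference pipeline, mmpretrain also exposes `get_model`. The sketch below runs a dummy forward pass as a shape check; the default forward mode returns the raw head output (unnormalized class scores), so the expected shape noted in the comment is an assumption about this checkpoint's 1000-class ImageNet head.

```python
import torch
from mmpretrain import get_model

# Build VAN-Tiny with the converted weights and run a dummy forward pass.
model = get_model('van-tiny_3rdparty_in1k', pretrained=True)
model.eval()
with torch.no_grad():
    scores = model(torch.rand(1, 3, 224, 224))
print(scores.shape)  # expected: torch.Size([1, 1000]) raw class scores
```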

Models and results

Image Classification on ImageNet-1k

| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :------------------------ | :----------- | ---------: | --------: | --------: | --------: | :----- | :------- |
| van-tiny_3rdparty_in1k*   | From scratch | 4.11       | 0.88      | 75.41     | 93.02     | config | model    |
| van-small_3rdparty_in1k*  | From scratch | 13.86      | 2.52      | 81.01     | 95.63     | config | model    |
| van-base_3rdparty_in1k*   | From scratch | 26.58      | 5.03      | 82.80     | 96.21     | config | model    |
| van-large_3rdparty_in1k*  | From scratch | 44.77      | 8.99      | 83.86     | 96.73     | config | model    |

Models with * are converted from the official repo. The config files of these models are only for inference; we haven't reproduced the training results.

Citation

```bibtex
@article{guo2022visual,
  title={Visual Attention Network},
  author={Guo, Meng-Hao and Lu, Cheng-Ze and Liu, Zheng-Ning and Cheng, Ming-Ming and Hu, Shi-Min},
  journal={arXiv preprint arXiv:2202.09741},
  year={2022}
}
```