HorNet

Abstract

Recent progress in vision Transformers exhibits great success in various tasks driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution (gnConv) that performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable, which is compatible with various variants of convolution and extends the two-order interactions in self-attention to arbitrary orders without introducing significant extra computation. gnConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on the operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architecture and training configurations. HorNet also shows favorable scalability to more training data and a larger model size. Apart from the effectiveness in visual encoders, we also show that gnConv can be applied to task-specific decoders and consistently improve dense prediction performance with less computation. Our results demonstrate that gnConv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet.
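To make the recursion concrete, below is a minimal PyTorch sketch of a gnConv-style block, written from the description above: a depth-wise convolution supplies spatial context, which gates the projected input through successive element-wise products over progressively wider channel groups. The class name `gnConvSketch` and the exact channel-splitting scheme are illustrative simplifications (the scaling factor and the global-filter "gf" variant are omitted); see the official repo linked above for the reference implementation.

```python
import torch
import torch.nn as nn


class gnConvSketch(nn.Module):
    """Simplified recursive gated convolution (gnConv) sketch.

    Performs `order`-order spatial interactions: a depth-wise conv
    produces spatial features that gate the projected input through
    `order` successive element-wise products.
    """

    def __init__(self, dim: int, order: int = 3):
        super().__init__()
        self.order = order
        # Channel widths double at each order: [dim / 2^(order-1), ..., dim].
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        # One depth-wise conv shared across all orders for spatial mixing.
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))
        # 1x1 convs that widen features between successive gating steps.
        self.pws = nn.ModuleList([
            nn.Conv2d(self.dims[i], self.dims[i + 1], kernel_size=1)
            for i in range(order - 1)
        ])
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj_in(x)
        p, q = torch.split(x, (self.dims[0], sum(self.dims)), dim=1)
        gates = torch.split(self.dwconv(q), self.dims, dim=1)
        x = p * gates[0]                    # 1st-order interaction
        for i in range(self.order - 1):     # higher-order interactions
            x = self.pws[i](x) * gates[i + 1]
        return self.proj_out(x)


# Quick shape check: dim must be divisible by 2 ** (order - 1).
y = gnConvSketch(dim=64, order=3)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Note that the cost stays close to a single gated convolution regardless of `order`, since the channel widths of the early interaction steps shrink geometrically.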

How to use it?

```python
from mmpretrain import inference_model

# Single-image inference with the pretrained HorNet-Tiny checkpoint;
# the weights are downloaded automatically on first use.
predict = inference_model('hornet-tiny_3rdparty_in1k', 'demo/bird.JPEG')
print(predict['pred_class'])
print(predict['pred_score'])
```
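If you need the bare network rather than a one-shot prediction (e.g. for feature extraction or fine-tuning), mmpretrain also provides `get_model`. A minimal sketch, assuming the pretrained weights can be fetched automatically:

```python
import torch
from mmpretrain import get_model

# Build HorNet-Tiny with pretrained ImageNet-1k weights.
model = get_model('hornet-tiny_3rdparty_in1k', pretrained=True)
model.eval()

# Dummy forward pass; ImageNet models expect 224x224 RGB inputs.
with torch.no_grad():
    out = model(torch.rand(1, 3, 224, 224))
print(type(out))
```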

Models and results

Image Classification on ImageNet-1k

| Model | Pretrain | Params (M) | Flops (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :--------------------------- | :----------- | ---------: | --------: | --------: | --------: | :----- | :------- |
| hornet-tiny_3rdparty_in1k\* | From scratch | 22.41 | 3.98 | 82.84 | 96.24 | config | model |
| hornet-tiny-gf_3rdparty_in1k\* | From scratch | 22.99 | 3.90 | 82.98 | 96.38 | config | model |
| hornet-small_3rdparty_in1k\* | From scratch | 49.53 | 8.83 | 83.79 | 96.75 | config | model |
| hornet-small-gf_3rdparty_in1k\* | From scratch | 50.40 | 8.71 | 83.98 | 96.77 | config | model |
| hornet-base_3rdparty_in1k\* | From scratch | 87.26 | 15.58 | 84.24 | 96.94 | config | model |
| hornet-base-gf_3rdparty_in1k\* | From scratch | 88.42 | 15.42 | 84.32 | 96.95 | config | model |

Models with \* are converted from the official repo. The config files of these models are only for inference; we haven't reproduced the training results.

Citation

```bibtex
@article{rao2022hornet,
  title={HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions},
  author={Rao, Yongming and Zhao, Wenliang and Tang, Yansong and Zhou, Jie and Lim, Ser-Nam and Lu, Jiwen},
  journal={arXiv preprint arXiv:2207.14284},
  year={2022}
}
```