TNT
- class mmcls.models.TNT(arch='b', img_size=224, patch_size=16, in_channels=3, ffn_ratio=4, qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, first_stride=4, num_fcs=2, init_cfg=[{'type': 'TruncNormal', 'layer': 'Linear', 'std': 0.02}, {'type': 'Constant', 'layer': 'LayerNorm', 'val': 1.0, 'bias': 0.0}])
Transformer in Transformer.
A PyTorch implementation of: Transformer in Transformer
Inspired by the timm implementation: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tnt.py
- Parameters
arch (str | dict) – Vision Transformer architecture. Default: 'b'.
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int | tuple) – Size of a single patch. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
ffn_ratio (int) – A ratio to calculate the hidden dimensions in the FFN layers. Default: 4.
qkv_bias (bool) – Enable bias for the qkv projections if True. Default: False.
drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Default: 0.
attn_drop_rate (float) – The dropout rate for the attention layers. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Default: 0.
act_cfg (dict) – The activation config for FFNs. Default: dict(type='GELU').
norm_cfg (dict) – Config dict for the normalization layer. Default: dict(type='LN').
first_stride (int) – The stride of the conv2d layer used in the pixel embedding; a conv2d layer and an unfold layer together implement the image-to-pixel embedding. Default: 4.
num_fcs (int) – The number of fully-connected layers in the FFNs. Default: 2.
init_cfg (dict, optional) – Initialization config dict.
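A minimal usage sketch, assuming MMClassification 0.x and PyTorch are installed. The input tensor and the printed shapes are illustrative; the exact output feature shapes depend on the chosen architecture.

```python
import torch

from mmcls.models import TNT

# Build the backbone directly with the defaults documented above.
model = TNT(arch='b', img_size=224, patch_size=16)
model.init_weights()
model.eval()

# A dummy batch with a single 224x224 RGB image.
inputs = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    outputs = model(inputs)

# Backbones in mmcls return a tuple of feature tensors.
for feat in outputs:
    print(feat.shape)
```

Equivalently, the backbone can be built from a config dict, e.g. mmcls.models.build_backbone(dict(type='TNT', arch='b')), which is how it is instantiated from the config files.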