Note
You are reading the documentation for MMClassification 0.x, which will be deprecated at the end of 2022. We recommend upgrading to MMClassification 1.0 for the new features and better performance brought by OpenMMLab 2.0. See the installation tutorial, migration tutorial, and changelog for more details.
T2T_ViT
- class mmcls.models.T2T_ViT(img_size=224, in_channels=3, embed_dims=384, num_layers=14, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, final_norm=True, with_cls_token=True, output_cls_token=True, interpolate_mode='bicubic', t2t_cfg={}, layer_cfgs={}, init_cfg=None)
Tokens-to-Token Vision Transformer (T2T-ViT)
A PyTorch implementation of Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
- Parameters
img_size (int | tuple) – The expected input image shape. Because we support dynamic input shape, just set the argument to the most common input image shape. Defaults to 224.
in_channels (int) – Number of input channels. Defaults to 3.
embed_dims (int) – Embedding dimension. Defaults to 384.
num_layers (int) – Number of transformer layers in the encoder. Defaults to 14.
out_indices (Sequence | int) – Indices of the stages whose outputs are returned. Defaults to -1, which means the last stage.
drop_rate (float) – Dropout rate after position embedding. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.
with_cls_token (bool) – Whether to concatenate the class token to the image tokens as transformer input. Defaults to True.
output_cls_token (bool) – Whether to output the cls_token. If set to True, with_cls_token must also be True. Defaults to True.
interpolate_mode (str) – Interpolation mode used to resize the position embedding vector. Defaults to "bicubic".
t2t_cfg (dict) – Extra config of Tokens-to-Token module. Defaults to an empty dict.
layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.
init_cfg (dict, optional) – Config for initialization. Defaults to None.
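
As a rough illustration of how some of these arguments interact, the sketch below writes a backbone config as a plain dict using only parameters documented above, then checks two rules stated in the list: output_cls_token=True requires with_cls_token=True, and a negative out_indices entry counts back from the last stage. The helpers normalize_out_indices and check_cls_token_flags are hypothetical names introduced for this example and are not part of the mmcls API; the real class may validate its arguments differently. The drop_path_rate value of 0.1 is an arbitrary illustrative choice.

```python
from collections.abc import Sequence


def normalize_out_indices(out_indices, num_layers):
    """Resolve an int or a sequence of stage indices to non-negative indices.

    Hypothetical helper (not from mmcls): negative entries count back from
    the last stage, in the same way as Python list indexing.
    """
    if isinstance(out_indices, int):
        out_indices = [out_indices]
    if not isinstance(out_indices, Sequence):
        raise TypeError('out_indices must be an int or a sequence of ints')
    resolved = []
    for index in out_indices:
        if index < 0:
            index = num_layers + index
        if not 0 <= index < num_layers:
            raise ValueError(f'invalid out_indices entry: {index}')
        resolved.append(index)
    return resolved


def check_cls_token_flags(with_cls_token, output_cls_token):
    """Hypothetical check for the documented constraint on the two flags."""
    if output_cls_token and not with_cls_token:
        raise ValueError(
            'output_cls_token=True requires with_cls_token=True')


# Config dict in the usual OpenMMLab style; every key except `type`
# is a parameter from the list above.
backbone_cfg = dict(
    type='T2T_ViT',
    img_size=224,
    in_channels=3,
    embed_dims=384,
    num_layers=14,
    out_indices=-1,            # last stage
    drop_path_rate=0.1,        # illustrative value, not a recommendation
    with_cls_token=True,
    output_cls_token=True,
    interpolate_mode='bicubic',
)

check_cls_token_flags(backbone_cfg['with_cls_token'],
                      backbone_cfg['output_cls_token'])
stages = normalize_out_indices(backbone_cfg['out_indices'],
                               backbone_cfg['num_layers'])
print(stages)  # [13]
```

With num_layers=14, the default out_indices=-1 resolves to index 13, the last transformer layer. Setting with_cls_token=False while leaving output_cls_token=True would make the hypothetical check above raise a ValueError.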