TNT

class mmcls.models.TNT(arch='b', img_size=224, patch_size=16, in_channels=3, ffn_ratio=4, qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, first_stride=4, num_fcs=2, init_cfg=[{'type': 'TruncNormal', 'layer': 'Linear', 'std': 0.02}, {'type': 'Constant', 'layer': 'LayerNorm', 'val': 1.0, 'bias': 0.0}])[source]

Transformer in Transformer.

A PyTorch implementation of: Transformer in Transformer

Inspired by https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tnt.py

Parameters
  • arch (str | dict) – Vision Transformer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size. Defaults to 224.

  • patch_size (int | tuple) – The patch size. Defaults to 16.

  • in_channels (int) – Number of input channels. Defaults to 3.

  • ffn_ratio (int) – A ratio used to calculate the hidden dimensions of the FFN layer. Defaults to 4.

  • qkv_bias (bool) – Enable bias for qkv projections if True. Defaults to False.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.

  • attn_drop_rate (float) – The dropout rate for the attention layer. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • act_cfg (dict) – The activation config for FFNs. Defaults to GELU.

  • norm_cfg (dict) – Config dict for the normalization layer. Defaults to layer normalization.

  • first_stride (int) – The stride of the conv2d layer. A conv2d layer and an unfold layer are used to implement the image-to-pixel embedding. Defaults to 4.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Defaults to 2.

  • init_cfg (dict, optional) – Initialization config dict.
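The size parameters above interact in a simple way. As a rough sketch (plain arithmetic, not the actual mmcls implementation; `embed_dims` here is a hypothetical outer embedding width chosen for illustration), the defaults yield the following token counts and FFN width:

```python
# Hedged sketch: how TNT's size parameters relate (pure arithmetic,
# not mmcls code).
img_size = 224      # default img_size
patch_size = 16     # default patch_size
first_stride = 4    # default first_stride
ffn_ratio = 4       # default ffn_ratio
embed_dims = 640    # hypothetical embedding width, for illustration only

# Outer (patch) tokens: the image is divided into a grid of patches.
num_patches = (img_size // patch_size) ** 2       # 14 * 14 = 196

# Inner (pixel) tokens per patch: the conv2d + unfold pixel embedding
# downsamples each patch by first_stride.
pixels_per_patch = (patch_size // first_stride) ** 2  # 4 * 4 = 16

# ffn_ratio scales the hidden width of each FFN.
ffn_hidden_dims = embed_dims * ffn_ratio          # 640 * 4 = 2560

print(num_patches, pixels_per_patch, ffn_hidden_dims)  # → 196 16 2560
```

This is why `img_size` must be divisible by `patch_size`, and `patch_size` by `first_stride`, for the default embedding scheme to tile cleanly.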

forward(x)[source]

Forward computation.

Parameters

x (tensor | tuple[tensor]) – x can be a torch.Tensor or a tuple of torch.Tensor containing the input data for forward computation.
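Since forward accepts either a single tensor or a tuple of tensors, calling code that post-processes the output often normalizes the argument first. A minimal sketch of such a helper (hypothetical, not part of mmcls):

```python
# Hedged sketch: normalize forward's input contract so downstream code
# can always iterate over a tuple. This helper is illustrative only.
def normalize_inputs(x):
    """Wrap a lone input in a tuple; pass tuples through unchanged."""
    if isinstance(x, tuple):
        return x
    return (x,)

print(normalize_inputs("tensor"))    # → ('tensor',)
print(normalize_inputs(("a", "b")))  # → ('a', 'b')
```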
