iTPNPretrainDecoder

class mmpretrain.models.necks.iTPNPretrainDecoder(num_patches=196, patch_size=16, in_chans=3, embed_dim=512, fpn_dim=256, fpn_depth=2, decoder_embed_dim=512, decoder_depth=6, decoder_num_heads=16, mlp_ratio=4, norm_cfg={'eps': 1e-06, 'type': 'LN'}, reconstruction_type='pixel', num_outs=3, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, predict_feature_dim=None, init_cfg=None)

The neck module of iTPN (transformer pyramid network).

Parameters:
  • num_patches (int) – The number of total patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – Number of channels of the input image. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 512.

  • fpn_dim (int) – The FPN dimension (number of channels). Defaults to 256.

  • fpn_depth (int) – The number of layers in the feature pyramid. Defaults to 2.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of decoder. Defaults to 6.

  • decoder_num_heads (int) – Number of attention heads of decoder. Defaults to 16.

  • mlp_ratio (int) – Ratio of mlp hidden dim to decoder’s embedding dim. Defaults to 4.

  • norm_cfg (dict) – Config of the normalization layer. Defaults to LayerNorm with eps=1e-6.

  • reconstruction_type (str) – The type of reconstruction supervision; iTPN supports two kinds. Defaults to ‘pixel’.

  • num_outs (int) – The output number of neck (transformer pyramid network). Defaults to 3.

  • predict_feature_dim (int) – The output dimension of the prediction used for supervision. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.
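The parameters above can be collected into a config dict. The following is a minimal sketch, not a tested training config: it assumes the usual OpenMMLab convention of selecting a module through a `type` key, and the 224-pixel input size (`img_size`) is a hypothetical value used only to show where the default `num_patches=196` comes from. All other values copy the defaults from the signature.

```python
# Hypothetical input resolution (an assumption for illustration);
# the neck itself only receives num_patches and patch_size.
img_size = 224
patch_size = 16

# A square image split into non-overlapping patch_size x patch_size patches.
num_patches = (img_size // patch_size) ** 2

# Values mirror the defaults listed in the class signature.
neck = dict(
    type='iTPNPretrainDecoder',
    num_patches=num_patches,
    patch_size=patch_size,
    in_chans=3,
    embed_dim=512,
    fpn_dim=256,
    fpn_depth=2,
    decoder_embed_dim=512,
    decoder_depth=6,
    decoder_num_heads=16,
    mlp_ratio=4,
    reconstruction_type='pixel',
    num_outs=3,
)

print(num_patches)  # 196
```

Because every value here equals its default, passing only `type` would presumably give the same module; spelling the values out is useful mainly when one of them needs to change with the encoder.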

property decoder_norm

The normalization layer of decoder.

forward(x, ids_restore=None)

The forward function.

The process combines the visible patches’ feature vectors with the mask tokens and outputs feature vectors, which will be used for reconstruction.

Parameters:
  • x (torch.Tensor) – Hidden features of the visible patches, of shape B x (L * (1 - mask_ratio)) x C, where mask_ratio is the fraction of patches that are masked.

  • ids_restore (torch.Tensor) – ids to restore original image.

Returns:

The reconstructed feature vectors, which are of shape B x (num_patches) x C.

Return type:

torch.Tensor
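The role of ids_restore can be illustrated with the MAE-style restore step: mask tokens are appended to the visible tokens, and the combined sequence is reordered back to the original patch order. The sketch below uses plain PyTorch and is not the library's implementation. The helper name `restore_with_mask_tokens`, the absence of a class token, and the 75% masking used in the demo are all assumptions made for illustration.

```python
import torch


def restore_with_mask_tokens(x, ids_restore, mask_token):
    """Append mask tokens to visible tokens and undo the shuffle.

    x:           (B, L_visible, C) features of the visible patches
    ids_restore: (B, L) indices mapping shuffled order back to patch order
    mask_token:  (1, 1, C) placeholder vector for masked patches
    """
    B, L_visible, C = x.shape
    L = ids_restore.shape[1]
    # One mask token per masked patch.
    mask_tokens = mask_token.expand(B, L - L_visible, C)
    # Shuffled order: visible tokens first, then mask tokens.
    x_full = torch.cat([x, mask_tokens], dim=1)
    # Gather along the patch dimension to restore the original order.
    index = ids_restore.unsqueeze(-1).expand(-1, -1, C)
    return torch.gather(x_full, dim=1, index=index)


# Demo with hypothetical sizes: 196 patches, 49 kept (75% masked).
torch.manual_seed(0)
B, L, C, keep = 2, 196, 512, 49
ids_shuffle = torch.argsort(torch.rand(B, L), dim=1)  # random permutation
ids_restore = torch.argsort(ids_shuffle, dim=1)       # its inverse
visible = torch.randn(B, keep, C)
mask_token = torch.zeros(1, 1, C)

out = restore_with_mask_tokens(visible, ids_restore, mask_token)
print(out.shape)  # torch.Size([2, 196, 512])
```

After this step every patch position holds either a visible-patch feature or the mask token, so the output has one vector per patch, matching the B x (num_patches) x C shape described above.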

init_weights()

Initialize position embedding and mask token of MAE decoder.

rescale_init_weight()

Rescale the initialized weights.
