iTPNHiViT

class mmpretrain.models.selfsup.iTPNHiViT(arch='base', img_size=224, patch_size=16, inner_patches=4, stem_mlp_ratio=3.0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, ape=True, rpe=False, layer_scale_init_value=0.0, mask_ratio=0.75, reconstruction_type='pixel', **kwargs)

HiViT for iTPN pre-training.

Parameters:

img_size (int | tuple) – Input image size. Defaults to 224.
patch_size (int | tuple) – The patch size. Defaults to 16.
inner_patches (int) – Inner patch. Defaults to 4.
stem_mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim in the first two stages. Defaults to 3.0.
mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim in the last stage. Defaults to 4.0.
qkv_bias (bool) – Enable bias for qkv projections if True. Defaults to True.
qk_scale (float, optional) – Scale factor applied to the q @ k product. Defaults to None.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
attn_drop_rate (float) – The dropout rate for attention output weights. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).
ape (bool) – If True, add absolute position embedding to the patch embedding. Defaults to True.
rpe (bool) – If True, add relative position embedding to the patch embedding. Defaults to False.
layer_scale_init_value (float) – Layer-scale init values. Defaults to 0.
mask_ratio (float) – The ratio of patches to be masked. Defaults to 0.75.
reconstruction_type (str) – The reconstruction target for self-supervised learning. Defaults to 'pixel'.
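
As a rough illustration of how img_size, patch_size and mask_ratio interact, the sketch below computes token counts for the default arguments. It assumes the token grid has img_size // patch_size tokens per side and that the number of kept tokens is int(L * (1 - mask_ratio)), as in MAE-style masking; the helper `token_counts` is hypothetical and not part of mmpretrain.

```python
def token_counts(img_size=224, patch_size=16, mask_ratio=0.75):
    """Return (total, visible, masked) token counts.

    Assumption: a square token grid of img_size // patch_size per side,
    with int(total * (1 - mask_ratio)) tokens kept visible.
    """
    side = img_size // patch_size
    total = side * side
    visible = int(total * (1 - mask_ratio))
    return total, visible, total - visible


total, visible, masked = token_counts()
print(total, visible, masked)  # 196 49 147
```

Under these assumptions, the default configuration would leave 49 of 196 tokens visible to the backbone.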

forward(x, mask=True)

Generate features for masked images.

The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from the images without masking.

Parameters:

x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.

Returns:

Hidden features, mask and the ids to restore original image.

x (torch.Tensor): hidden features of the visible patches, which is of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): mask used to mask image.
ids_restore (torch.Tensor): ids to restore original image.

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
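
To make the three return values more concrete, here is a minimal single-sample sketch of MAE-style random masking in plain Python. It is an assumption that the library follows this scheme; the function `random_masking`, its `seed` argument, and the 0 = visible / 1 = masked convention are illustrative and not taken from the mmpretrain source.

```python
import random


def random_masking(num_patches, mask_ratio, seed=None):
    """Sketch of per-sample random masking (no batch dimension).

    Returns (ids_keep, mask, ids_restore), where mask uses
    0 for visible patches and 1 for masked patches (assumed convention).
    """
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))

    # One random score per patch; the lowest scores are kept visible.
    noise = [rng.random() for _ in range(num_patches)]
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)

    # Inverse permutation: rank of each original patch in the shuffled order.
    ids_restore = [0] * num_patches
    for rank, patch_id in enumerate(ids_shuffle):
        ids_restore[patch_id] = rank

    ids_keep = ids_shuffle[:len_keep]
    # Mask in original patch order.
    mask = [0 if ids_restore[i] < len_keep else 1 for i in range(num_patches)]
    return ids_keep, mask, ids_restore


ids_keep, mask, ids_restore = random_masking(16, 0.75, seed=0)
print(len(ids_keep), sum(mask))  # 4 12
```

In a batched tensor implementation the same idea is typically expressed with an argsort over random noise followed by a gather.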

forward_clip(x, mask=True)

Generate features for masked images.

The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from the images without masking.

Parameters:

x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.

Returns:

Hidden features, mask and the ids to restore original image.

x (torch.Tensor): hidden features, which is of shape B x (L * mask_ratio) x C.
mask (torch.Tensor): mask used to mask image.
ids_restore (torch.Tensor): ids to restore original image.

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

forward_pixel(x, mask=True)

Generate features for masked images.

The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from the images without masking.

Parameters:

x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.

Returns:

Hidden features, mask and the ids to restore original image.

x (torch.Tensor): hidden features of the visible patches, which is of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): mask used to mask image.
ids_restore (torch.Tensor): ids to restore original image.

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
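
The role of ids_restore can be illustrated with a small plain-Python sketch: visible tokens are padded with placeholder mask tokens and then reordered back to the original patch order. This mirrors how MAE-style decoders commonly consume ids_restore; whether the decoder paired with this backbone does exactly this is an assumption, and `restore_tokens` and the `"[M]"` placeholder are hypothetical.

```python
def restore_tokens(visible_tokens, ids_restore, mask_token="[M]"):
    """Put visible tokens back in original patch order, filling gaps.

    visible_tokens: tokens in shuffled order (kept patches first).
    ids_restore: for each original patch, its rank in the shuffled order.
    """
    num_patches = len(ids_restore)
    # Shuffled sequence: visible tokens first, then mask placeholders.
    shuffled = list(visible_tokens) + [mask_token] * (num_patches - len(visible_tokens))
    # Index the shuffled sequence by rank to undo the shuffle.
    return [shuffled[rank] for rank in ids_restore]


# Four patches a, b, c, d shuffled to order [c, a, d, b]; the first two are kept.
restored = restore_tokens(["c", "a"], ids_restore=[1, 3, 0, 2])
print(restored)  # ['a', '[M]', 'c', '[M]']
```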

init_weights(): Initialize position embedding, patch embedding and cls token.

rescale_init_weight(): Rescale the initialized weights.