MAEHiViT
- class mmpretrain.models.selfsup.MAEHiViT(arch='b', img_size=224, patch_size=16, inner_patches=4, out_indices=[23], drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, ape=True, rpe=False, layer_scale_init_value=0.0, mask_ratio=0.75, init_cfg=None)
HiViT for MAE pre-training.
A PyTorch implementation of: HiViT: A Simple and More Efficient Design of Hierarchical Vision Transformer. This module implements the patch masking used in MAE and initializes the position embedding with a sine-cosine position embedding.
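The sine-cosine initialization mentioned above can be illustrated with a minimal, framework-free sketch. The function name `sincos_2d`, the `temperature` value, and the channel ordering (sin/cos of x, then sin/cos of y) are assumptions chosen for illustration; the ordering used inside the library may differ.

```python
import math


def sincos_2d(grid_size, embed_dim, temperature=10000.0):
    """Build a fixed 2D sine-cosine table with grid_size**2 rows (illustrative)."""
    # Each of the four groups (sin x, cos x, sin y, cos y) gets embed_dim / 4 channels.
    assert embed_dim % 4 == 0, "embed_dim must be divisible by 4"
    pos_dim = embed_dim // 4
    # Geometric progression of frequencies, from 1 down to roughly 1 / temperature.
    omega = [1.0 / temperature ** (i / pos_dim) for i in range(pos_dim)]
    table = []
    for y in range(grid_size):
        for x in range(grid_size):
            row = (
                [math.sin(x * w) for w in omega]
                + [math.cos(x * w) for w in omega]
                + [math.sin(y * w) for w in omega]
                + [math.cos(y * w) for w in omega]
            )
            table.append(row)
    return table


# A 14 x 14 token grid with a hypothetical embedding width of 512.
pos_embed = sincos_2d(grid_size=14, embed_dim=512)
```

Because the table is computed rather than learned, it is typically kept fixed during pre-training.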
- Parameters:
arch (str | dict) – Vision Transformer architecture. Defaults to 'b'.
img_size (int) – The input image size. Defaults to 224.
patch_size (int | tuple) – The patch size. Defaults to 16.
inner_patches (int) – The number of inner patches within a token. Defaults to 4.
out_indices (Sequence | int) – Output from which stages. Defaults to [23].
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for the normalization layer. Defaults to dict(type='LN', eps=1e-6).
ape (bool) – Whether to use the absolute position embedding. Defaults to True.
rpe (bool) – Whether to use the relative position embedding. Defaults to False.
layer_scale_init_value (float) – The layer scale init value. Defaults to 0.0.
mask_ratio (float) – The ratio of the total number of patches to be masked. Defaults to 0.75.
init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.
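As a rough illustration of how these parameters fit together, the fragment below writes the signature defaults out as a config-style dict. It is a sketch only: using `'MAEHiViT'` as the registry `type` name is an assumption, and the values are simply copied from the class signature.

```python
# Hypothetical backbone config fragment; values mirror the signature defaults.
backbone = dict(
    type='MAEHiViT',      # assumed registry name for this class
    arch='b',
    img_size=224,
    patch_size=16,
    inner_patches=4,
    ape=True,             # absolute position embedding on
    rpe=False,            # relative position embedding off
    mask_ratio=0.75,      # mask three quarters of the patches
)
```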
- forward(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly masks some patches and returns the hidden features of the visible patches, which means the function runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from images without a mask.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, the mask, and the ids to restore the original image.
x (torch.Tensor): Hidden features of the visible patches, of shape B x (L * (1 - mask_ratio)) x C.
mask (torch.Tensor): The mask used to mask the image.
ids_restore (torch.Tensor): The ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
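To make the token dimension of the returned features concrete, here is a small arithmetic sketch. Since only the visible patches are returned when mask is True, the token count is the unmasked share of L. The helper `visible_token_count` is hypothetical, and it assumes that L is counted on the `patch_size` grid of a square image.

```python
def visible_token_count(img_size=224, patch_size=16, mask_ratio=0.75):
    """Return (total tokens L, visible tokens) under the stated assumptions."""
    tokens_per_side = img_size // patch_size
    total = tokens_per_side ** 2
    # Truncate toward zero, as is common in MAE-style masking code.
    visible = int(total * (1 - mask_ratio))
    return total, visible


total, visible = visible_token_count()
print(total, visible)  # 196 49
```

With the default arguments, a 224 x 224 image yields 196 tokens, of which 49 stay visible and 147 are masked.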
- masking_id(batch_size, mask_ratio)
Generate the mask for MAE Pre-training.
- Parameters:
batch_size – The batch size of the input data.
mask_ratio – The ratio of total patches to mask. The method signature declares no default; the class-level mask_ratio defaults to 0.75.
- Returns:
The ids of the retained tokens, the ids to restore the original image, and the mask.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
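The three return values can be illustrated with a single-sample, framework-free sketch of MAE-style random masking: draw one random score per token, sort the scores, and keep the lowest-scoring tokens. The function `random_masking_ids` and its `num_tokens` and `seed` arguments are hypothetical; the actual method works on batched tensors, and its internals are not shown in this reference.

```python
import random


def random_masking_ids(num_tokens, mask_ratio=0.75, seed=None):
    """Return (ids_keep, ids_restore, mask) for one sample (illustrative)."""
    rng = random.Random(seed)
    len_keep = int(num_tokens * (1 - mask_ratio))
    noise = [rng.random() for _ in range(num_tokens)]
    # Argsort of the noise: tokens with the smallest noise come first.
    ids_shuffle = sorted(range(num_tokens), key=noise.__getitem__)
    # Inverse permutation: position of each original token in the shuffled order.
    ids_restore = sorted(range(num_tokens), key=ids_shuffle.__getitem__)
    ids_keep = ids_shuffle[:len_keep]
    # In shuffled order the first len_keep tokens are kept (0), the rest masked (1).
    mask_shuffled = [0] * len_keep + [1] * (num_tokens - len_keep)
    # Unshuffle so that mask[i] refers to original token i.
    mask = [mask_shuffled[pos] for pos in ids_restore]
    return ids_keep, ids_restore, mask


ids_keep, ids_restore, mask = random_masking_ids(196, mask_ratio=0.75, seed=0)
```

In this sketch a mask value of 0 marks a retained token and 1 marks a masked one, following the usual MAE convention.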