PatchMerging
- class mmpretrain.models.utils.PatchMerging(in_channels, out_channels, kernel_size=2, stride=None, padding='corner', dilation=1, bias=False, norm_cfg={'type': 'LN'}, use_post_norm=False, init_cfg=None)
Merge patch feature map.
Modified from mmcv. Unlike the mmcv version, this module lets you choose whether to apply the normalization layer after the linear layer (post-norm) instead of before it.
This layer groups the feature map by kernel_size and applies norm and linear layers to the grouped feature map (as used in Swin Transformer). Our implementation uses torch.nn.Unfold to merge patches, which is about 25% faster than the original implementation. However, pretrained models need to be modified for compatibility.
- Parameters:
in_channels (int) – The number of input channels.
out_channels (int) – The number of output channels.
kernel_size (int | tuple, optional) – The kernel size of the unfold layer. Defaults to 2.
stride (int | tuple, optional) – The stride of the sliding blocks in the unfold layer. Defaults to None, which means it is set to kernel_size.
padding (int | tuple | string) – The padding length of the unfold layer. When it is a string, it specifies the adaptive padding mode, which pads the input so that it is fully covered by the kernel and stride you specified; "same" and "corner" are supported. Defaults to "corner".
dilation (int | tuple, optional) – The dilation of the unfold layer. Defaults to 1.
bias (bool, optional) – Whether to add a bias to the linear layer. Defaults to False.
norm_cfg (dict, optional) – Config dict for the normalization layer. Defaults to dict(type='LN').
use_post_norm (bool) – Whether to apply the normalization layer after the linear layer (post-norm) rather than before it. Defaults to False.
init_cfg (dict, optional) – The extra config for initialization. Defaults to None.
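The Unfold-based grouping can be compared with Swin-style strided slicing. The sketch below is illustrative only: `unfold_merge`, `slice_merge` and `unfold_to_slice_perm` are hypothetical helpers, and it assumes a 2x2 kernel with stride 2, an input laid out as (B, H*W, C), and even H and W. Both helpers collect the same values for every merged token, only in a different channel order, which is one plausible reason pretrained weights need adjusting.

```python
import torch
import torch.nn as nn


def unfold_merge(x: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Group 2x2 neighbourhoods with nn.Unfold: (B, H*W, C) -> (B, H/2*W/2, 4C)."""
    b, _, c = x.shape
    x = x.transpose(1, 2).reshape(b, c, h, w)    # (B, C, H, W)
    x = nn.Unfold(kernel_size=2, stride=2)(x)    # (B, C*2*2, L), channel-major
    return x.transpose(1, 2)                     # (B, L, 4C)


def slice_merge(x: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Swin-style grouping by strided slicing and concatenation."""
    b, _, c = x.shape
    x = x.view(b, h, w, c)
    x0 = x[:, 0::2, 0::2, :]  # top-left of each 2x2 window
    x1 = x[:, 1::2, 0::2, :]  # bottom-left
    x2 = x[:, 0::2, 1::2, :]  # top-right
    x3 = x[:, 1::2, 1::2, :]  # bottom-right
    x = torch.cat([x0, x1, x2, x3], dim=-1)      # (B, H/2, W/2, 4C), position-major
    return x.reshape(b, -1, 4 * c)


def unfold_to_slice_perm(c: int) -> list:
    """Indices that reorder unfold channels (ch*4 + p) into slice order (k*C + ch)."""
    # Unfold offset p = 2*row + col for each slice position k = row + 2*col.
    p_of_k = [0, 2, 1, 3]
    return [ch * 4 + p_of_k[k] for k in range(4) for ch in range(c)]


x = torch.randn(2, 4 * 6, 3)
u, s = unfold_merge(x, 4, 6), slice_merge(x, 4, 6)
print(torch.equal(u[..., unfold_to_slice_perm(3)], s))
```

Because the two layouts differ only by a fixed channel permutation, a linear layer trained on one layout could in principle be reused on the other by permuting its input weights accordingly.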
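A minimal end-to-end sketch of how the parameters above fit together, assuming an input laid out as (B, H*W, C) with an explicit (H, W) size, dilation 1, and only the string padding modes. `TinyPatchMerging` is a hypothetical class, not the mmpretrain implementation, and the padding rule (pad until ceil(size / stride) windows fit) is our assumption about how the adaptive padding behaves.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPatchMerging(nn.Module):
    """Hypothetical simplified patch merging (not the mmpretrain class)."""

    def __init__(self, in_channels, out_channels, kernel_size=2, stride=None,
                 padding='corner', use_post_norm=False):
        super().__init__()
        self.k = kernel_size
        self.s = stride if stride is not None else kernel_size  # stride defaults to kernel_size
        self.padding = padding
        self.use_post_norm = use_post_norm
        sample_dim = kernel_size * kernel_size * in_channels
        self.sampler = nn.Unfold(kernel_size=self.k, stride=self.s)
        self.reduction = nn.Linear(sample_dim, out_channels, bias=False)
        # Pre-norm acts on the k*k*C merged features, post-norm on the reduced ones.
        self.norm = nn.LayerNorm(out_channels if use_post_norm else sample_dim)

    def _pad(self, x):
        h, w = x.shape[-2:]
        # Assumed adaptive-padding rule: pad so ceil(size / stride) windows fit.
        pad_h = max((math.ceil(h / self.s) - 1) * self.s + self.k - h, 0)
        pad_w = max((math.ceil(w / self.s) - 1) * self.s + self.k - w, 0)
        if self.padding == 'corner':  # all padding on the bottom and right
            return F.pad(x, [0, pad_w, 0, pad_h])
        # 'same': split the padding between both sides
        return F.pad(x, [pad_w // 2, pad_w - pad_w // 2,
                         pad_h // 2, pad_h - pad_h // 2])

    def forward(self, x, input_size):
        h, w = input_size
        b, _, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # (B, C, H, W)
        x = self._pad(x)
        ph, pw = x.shape[-2:]
        out_size = ((ph - self.k) // self.s + 1, (pw - self.k) // self.s + 1)
        x = self.sampler(x).transpose(1, 2)        # (B, L, k*k*C)
        if self.use_post_norm:
            x = self.norm(self.reduction(x))       # linear, then norm
        else:
            x = self.reduction(self.norm(x))       # norm, then linear
        return x, out_size


m = TinyPatchMerging(8, 16, use_post_norm=True)
y, size = m(torch.randn(2, 7 * 7, 8), (7, 7))
print(y.shape, size)
```

With a 7x7 input and the default 2x2 kernel, "corner" padding adds one row and one column, giving a 4x4 output grid. Note that switching `use_post_norm` changes the size of the LayerNorm as well as its position, so in this sketch pre-norm and post-norm weights are not interchangeable.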