PatchMerging

class mmpretrain.models.utils.PatchMerging(in_channels, out_channels, kernel_size=2, stride=None, padding='corner', dilation=1, bias=False, norm_cfg={'type': 'LN'}, use_post_norm=False, init_cfg=None)

Merge patch feature map.

Modified from the mmcv implementation; this module additionally supports specifying whether to use post-norm.

This layer groups the feature map by kernel_size, then applies norm and linear layers to the grouped feature map (as used in Swin Transformer). Our implementation uses torch.nn.Unfold to merge patches, which is about 25% faster than the original implementation. However, pretrained models need to be modified for compatibility.
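The grouping step described above can be sketched in plain PyTorch. This is a minimal illustration, not the library's code: the class name `UnfoldPatchMerging` is hypothetical, and the sketch assumes pre-norm ordering, an integer kernel_size with stride equal to kernel_size, no padding, and no dilation.

```python
import torch
import torch.nn as nn


class UnfoldPatchMerging(nn.Module):
    """Hypothetical, simplified stand-in for PatchMerging.

    Assumptions: pre-norm, integer kernel_size, stride == kernel_size,
    no padding, no dilation.
    """

    def __init__(self, in_channels, out_channels, kernel_size=2):
        super().__init__()
        self.kernel_size = kernel_size
        # nn.Unfold gathers each k x k neighbourhood into one column.
        self.sampler = nn.Unfold(kernel_size=kernel_size, stride=kernel_size)
        sample_dim = kernel_size * kernel_size * in_channels
        self.norm = nn.LayerNorm(sample_dim)
        self.reduction = nn.Linear(sample_dim, out_channels, bias=False)

    def forward(self, x, input_size):
        B, L, C = x.shape
        H, W = input_size
        assert L == H * W, "input_size does not match the sequence length"

        # (B, H*W, C) -> (B, C, H, W), the layout nn.Unfold expects.
        x = x.reshape(B, H, W, C).permute(0, 3, 1, 2)
        # (B, C, H, W) -> (B, k*k*C, Merged_H * Merged_W)
        x = self.sampler(x)
        out_size = (H // self.kernel_size, W // self.kernel_size)

        # Move the grouped features last, then normalize and project.
        x = x.transpose(1, 2)
        x = self.reduction(self.norm(x))
        return x, out_size


merge = UnfoldPatchMerging(in_channels=4, out_channels=8)
tokens = torch.randn(2, 8 * 8, 4)       # (B, H*W, C_in)
merged, merged_size = merge(tokens, (8, 8))
print(merged.shape, merged_size)
```

With an 8 x 8 input and kernel_size=2, each 2 x 2 neighbourhood becomes one token, so the sequence length drops from 64 to 16 while the channel count changes from C_in to C_out.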

Parameters:

in_channels (int) – The num of input channels.
out_channels (int) – The num of output channels.
kernel_size (int | tuple, optional) – The kernel size in the unfold layer. Defaults to 2.
stride (int | tuple, optional) – The stride of the sliding blocks in the unfold layer. Defaults to None, in which case it is set to kernel_size.
padding (int | tuple | str) – The padding length in the unfold layer. When it is a string, it selects the adaptive padding mode, which pads the input so that it is fully covered by the kernel and stride you specified; “same” and “corner” are currently supported. Defaults to “corner”.
dilation (int | tuple, optional) – The dilation parameter in the unfold layer. Defaults to 1.
bias (bool, optional) – Whether to add bias in linear layer or not. Defaults to False.
norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type='LN').
use_post_norm (bool) – Whether to use post normalization here. Defaults to False.
init_cfg (dict, optional) – The extra config for initialization. Defaults to None.

forward(x, input_size)

Parameters:

x (Tensor) – Has shape (B, H*W, C_in).
input_size (tuple[int]) – The spatial shape of x, arranged as (H, W).

Returns:

The merged result and its spatial shape.

x (Tensor): Has shape (B, Merged_H * Merged_W, C_out).
out_size (tuple[int]): Spatial shape of x, arranged as (Merged_H, Merged_W).

Return type:

tuple
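To estimate (Merged_H, Merged_W) ahead of time, the arithmetic can be sketched in pure Python. The helper `merged_size` is hypothetical and not part of mmpretrain. It assumes integer kernel_size, stride and dilation, and it assumes that adaptive padding (“corner” or “same”) pads each spatial dimension just enough for the sliding blocks to cover the whole input, followed by the standard torch.nn.Unfold output-size formula.

```python
import math


def merged_size(input_size, kernel_size=2, stride=None, dilation=1):
    """Estimate (Merged_H, Merged_W) under adaptive padding.

    Hypothetical helper; assumes integer kernel_size, stride and dilation.
    """
    if stride is None:
        stride = kernel_size  # mirrors the documented default

    span = dilation * (kernel_size - 1) + 1  # extent of one sliding block
    result = []
    for length in input_size:
        # Number of blocks needed to cover the input completely.
        target = math.ceil(length / stride)
        # Assumed adaptive padding: add only what full coverage requires.
        pad = max((target - 1) * stride + span - length, 0)
        padded = length + pad
        # Standard sliding-window output size, as used by torch.nn.Unfold.
        result.append((padded - span) // stride + 1)
    return tuple(result)


print(merged_size((56, 56)))  # evenly divisible input
print(merged_size((7, 9)))    # odd sizes are padded up before merging
```

Under these assumptions, the default settings halve each spatial dimension and round up, so a (7, 9) input yields a (4, 5) output.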