class mmcls.models.SwinTransformer(arch='tiny', img_size=224, patch_size=4, in_channels=3, window_size=7, drop_rate=0.0, drop_path_rate=0.1, out_indices=(3,), out_after_downsample=False, use_abs_pos_embed=False, interpolate_mode='bicubic', with_cp=False, frozen_stages=-1, norm_eval=False, pad_small_map=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, patch_cfg={}, init_cfg=None)

Swin Transformer.

A PyTorch implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Inspiration from

  • arch (str | dict) –

    Swin Transformer architecture. If a string, choose from ‘tiny’, ‘small’, ‘base’ and ‘large’. If a dict, it must contain the following keys:

    • embed_dims (int): The dimensions of embedding.

    • depths (List[int]): The number of blocks in each stage.

    • num_heads (List[int]): The number of heads in attention modules of each stage.

    Defaults to ‘tiny’.

  • img_size (int | tuple) – The expected input image shape. Because dynamic input shapes are supported, set this to the most common input image shape. Defaults to 224.

  • patch_size (int | tuple) – The patch size in patch embedding. Defaults to 4.

  • in_channels (int) – The number of input channels. Defaults to 3.

  • window_size (int) – The height and width of the window. Defaults to 7.

  • drop_rate (float) – Dropout rate after embedding. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.

  • out_after_downsample (bool) – Whether to output the feature map of a stage after the following downsample layer. Defaults to False.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.

  • interpolate_mode (str) – The interpolation mode used to resize the absolute position embedding. Defaults to “bicubic”.

  • with_cp (bool) – Whether to use gradient checkpointing, which saves memory at the cost of slower training. Defaults to False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: this affects only Batch Norm and its variants. Defaults to False.

  • pad_small_map (bool) – If True, pad the small feature map to the window size, which is commonly used in detection and segmentation. If False, avoid shifting the window and shrink the window size to the size of the feature map, which is commonly used in classification. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer for all output features. Defaults to dict(type='LN').

  • stage_cfgs (Sequence[dict] | dict) – Extra config dict for each stage. Defaults to an empty dict.

  • patch_cfg (dict) – Extra config dict for patch embedding. Defaults to an empty dict.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
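
As a rough illustration of how arch, img_size, patch_size and out_indices interact, the sketch below estimates the (channels, height, width) of each selected stage output. It is not part of mmcls: `stage_shapes` and `ARCH_PRESETS` are hypothetical names, the preset values are taken from the Swin Transformer paper (the mmcls presets are assumed to match), and the sketch assumes the default behaviour in which each downsample layer halves the resolution and doubles the channels, with out_after_downsample=False. Only an integer img_size and patch_size are handled.

```python
# Hypothetical helper, not an mmcls API: estimates per-stage output shapes.
# Preset values follow the Swin Transformer paper (assumed to match mmcls).
ARCH_PRESETS = {
    'tiny': {'embed_dims': 96, 'depths': [2, 2, 6, 2],
             'num_heads': [3, 6, 12, 24]},
    'small': {'embed_dims': 96, 'depths': [2, 2, 18, 2],
              'num_heads': [3, 6, 12, 24]},
    'base': {'embed_dims': 128, 'depths': [2, 2, 18, 2],
             'num_heads': [4, 8, 16, 32]},
    'large': {'embed_dims': 192, 'depths': [2, 2, 18, 2],
              'num_heads': [6, 12, 24, 48]},
}


def stage_shapes(arch='tiny', img_size=224, patch_size=4, out_indices=(3,)):
    """Return (channels, height, width) for each stage in out_indices."""
    cfg = ARCH_PRESETS[arch] if isinstance(arch, str) else arch
    missing = {'embed_dims', 'depths', 'num_heads'} - set(cfg)
    if missing:
        raise KeyError(f'arch dict is missing keys: {sorted(missing)}')

    side = img_size // patch_size  # resolution after patch embedding
    shapes = []
    for i in range(len(cfg['depths'])):
        # Assumption: stage i sees i downsample layers, each halving the
        # resolution and doubling the channel width.
        channels = cfg['embed_dims'] * 2 ** i
        res = side // 2 ** i
        if i in out_indices:
            shapes.append((channels, res, res))
    return shapes


print(stage_shapes())                                  # [(768, 7, 7)]
print(stage_shapes('base', out_indices=(0, 1, 2, 3)))
```

Under these assumptions the default configuration yields a single 768-channel, 7×7 feature map; passing more indices in out_indices returns one entry per selected stage.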


>>> from mmcls.models import SwinTransformer
>>> import torch
>>> extra_config = dict(
...     arch='tiny',
...     stage_cfgs=dict(downsample_cfg={'kernel_size': 3,
...                                     'expansion_ratio': 3}))
>>> self = SwinTransformer(**extra_config)
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(output.shape)
(1, 2592, 4)
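
The numbers in the printed shape above can be reproduced with simple arithmetic. The sketch below is an interpretation, not the library's code: it assumes that each of the three downsample layers multiplies the channel width by expansion_ratio and shrinks each spatial side like an unpadded convolution whose stride equals kernel_size. `merged_shape` is a hypothetical helper, and the embed_dims value of 96 for ‘tiny’ is taken from the Swin Transformer paper.

```python
def merged_shape(img_size=224, patch_size=4, embed_dims=96,
                 kernel_size=3, expansion_ratio=3, num_downsamples=3):
    """Return (channels, num_tokens) after all downsample layers.

    Hypothetical helper illustrating the assumed arithmetic only.
    """
    side = img_size // patch_size  # 224 // 4 = 56 tokens per side
    channels = embed_dims
    for _ in range(num_downsamples):
        # Assumed: unpadded sliding window with stride == kernel_size.
        side = (side - kernel_size) // kernel_size + 1
        channels *= expansion_ratio
    return channels, side * side


print(merged_shape())                                  # (2592, 4)
print(merged_shape(kernel_size=2, expansion_ratio=2))  # (768, 49)
```

With kernel_size=3 the side length goes 56 → 18 → 6 → 2, giving 4 tokens, and the channel width goes 96 → 288 → 864 → 2592. The second call shows the same arithmetic for a 2×2 merge with channel doubling.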

Forward computation.


x (tensor | tuple[tensor]) – x can be a torch.Tensor or a tuple of torch.Tensor, containing the input data for forward computation.


Initialize the weights.


Set module status before forward computation.


mode (bool) – Whether to set training mode (True) or evaluation mode (False).
