VisionTransformerClsHead

class mmcls.models.VisionTransformerClsHead(num_classes, in_channels, hidden_dim=None, act_cfg={'type': 'Tanh'}, init_cfg={'layer': 'Linear', 'type': 'Constant', 'val': 0}, *args, **kwargs)[source]

Vision Transformer classifier head.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • hidden_dim (int) – Dimension of the hidden layer. Defaults to None, which means no extra hidden layer is added.

  • act_cfg (dict) – The activation config. Only available during pre-training. Defaults to dict(type='Tanh').

  • init_cfg (dict) – The extra initialization configs. Defaults to dict(type='Constant', layer='Linear', val=0).
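
As a rough illustration of how these parameters fit together, the head might be declared in an mmcls-style config dict as sketched below. The values 1000, 768 and 3072 are illustrative assumptions rather than defaults from this page. Any further keyword arguments a real config needs (passed through **kwargs) are left out.

```python
# Illustrative head config; the numeric values are assumptions, not defaults.
head = dict(
    type='VisionTransformerClsHead',
    num_classes=1000,           # number of target categories (assumed)
    in_channels=768,            # width of the cls token from the backbone (assumed)
    hidden_dim=3072,            # set to None to skip the extra hidden layer
    act_cfg=dict(type='Tanh'),  # documented default activation
)
```

With hidden_dim=None, the parameter description above implies that no extra hidden layer is built.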

init_weights()[source]

Initialize the weights.

simple_test(x, softmax=True, post_process=True)[source]

Inference without augmentation.

Parameters
  • x (tuple[tuple[tensor, tensor]]) – The input features. Multi-stage inputs are accepted, but only the last stage is used for classification. Each item should be a tuple of (patch token, cls token). Only the cls token is used for classification, and its shape should be (num_samples, in_channels).

  • softmax (bool) – Whether to apply softmax to the classification scores.

  • post_process (bool) – Whether to post-process the inference results. If True, the output is converted to a list.

Returns

The inference results.

  • If post_process is False, the output is a tensor with shape (num_samples, num_classes).

  • If post_process is True, the output is a nested list of floats with dimensions (num_samples, num_classes).

Return type

Tensor | list
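
The data flow described for simple_test can be sketched without any dependencies, using nested lists in place of tensors. This is not the mmcls implementation: the function name simple_test_sketch, the helper softmax_row, and the toy weights are all hypothetical. It assumes hidden_dim=None, so the head reduces to a single linear layer.

```python
import math


def softmax_row(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def simple_test_sketch(x, weight, bias, softmax=True):
    """Hypothetical stand-in for simple_test, operating on nested lists.

    x: tuple of stages, each a (patch_token, cls_token) pair.
    weight: num_classes rows of in_channels floats; bias: num_classes floats.
    """
    # Only the last stage is used, and only its cls token.
    _, cls_token = x[-1]
    # One linear layer: (num_samples, in_channels) -> (num_samples, num_classes).
    scores = [
        [sum(w * c for w, c in zip(w_row, sample)) + b
         for w_row, b in zip(weight, bias)]
        for sample in cls_token
    ]
    if softmax:
        scores = [softmax_row(row) for row in scores]
    # Already a nested list, which mirrors the post-processed output format.
    return scores


# Toy sizes (assumed): num_samples=2, in_channels=3, num_classes=2.
cls_token = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
patch_token = None  # the patch token is ignored in this sketch
weight = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
bias = [0.0, 0.0]

result = simple_test_sketch(((patch_token, cls_token),), weight, bias)
```

Here result has one row per sample and one column per class, and each row sums to 1 because softmax is applied.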
