DeiTClsHead

class mmcls.models.DeiTClsHead(*args, **kwargs)[source]

Distilled Vision Transformer classifier head.

Compared with VisionTransformerClsHead, this head adds an extra linear layer to handle the dist token. The final classification score is the average of the linear projections of the cls_token and the dist_token.
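
The following is a minimal, framework-free sketch of that averaging step. The layer names head and head_dist are illustrative and are not necessarily the attribute names used inside mmcls:

```python
import torch
import torch.nn as nn

in_channels, num_classes = 768, 1000          # illustrative sizes (DeiT-base / ImageNet)
head = nn.Linear(in_channels, num_classes)      # projects the cls token
head_dist = nn.Linear(in_channels, num_classes) # projects the dist token

cls_token = torch.randn(4, in_channels)
dist_token = torch.randn(4, in_channels)

# The final classification score is the average of the two linear projections.
cls_score = (head(cls_token) + head_dist(dist_token)) / 2
print(cls_score.shape)  # torch.Size([4, 1000])
```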

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • hidden_dim (int) – Dimension of the hidden layer. Defaults to None, which means no extra hidden layer.

  • act_cfg (dict) – The activation config. Only available during pre-training. Defaults to dict(type='Tanh').

  • init_cfg (dict) – The extra initialization configs. Defaults to dict(type='Constant', layer='Linear', val=0).
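
A config-style sketch of this head, using only the parameters documented above; the concrete values are illustrative (e.g. 768-dimensional embeddings and 1000 classes):

```python
# Head section of an mmcls classifier config (values are assumptions, not defaults
# of any particular model).
head = dict(
    type='DeiTClsHead',
    num_classes=1000,
    in_channels=768,
    hidden_dim=None,  # no extra hidden layer (the default)
)
```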

simple_test(x, softmax=True, post_process=True)[source]

Inference without augmentation.

Parameters
  • x (tuple[tuple[tensor, tensor, tensor]]) – The input features. Multi-stage inputs are acceptable, but only the last stage is used for classification. Each item should be a tuple containing the patch token, cls token and dist token. The cls token and dist token are used for classification, and each should have shape (num_samples, in_channels).

  • softmax (bool) – Whether to apply softmax to the classification score.

  • post_process (bool) – Whether to post-process the inference results. If True, the output is converted to a list.

Returns

The inference results.

  • Without post-processing, the output is a tensor with shape (num_samples, num_classes).

  • With post-processing, the output is a nested list of floats with dimensions (num_samples, num_classes).

Return type

Tensor | list
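
A hedged usage sketch of simple_test(); the import path, shapes and values are assumptions based on the parameter descriptions above (the patch-token shape in particular is illustrative):

```python
import torch
from mmcls.models import DeiTClsHead  # assumed import path

head = DeiTClsHead(num_classes=1000, in_channels=768)

num_samples = 4
patch_token = torch.randn(num_samples, 196, 768)  # illustrative shape
cls_token = torch.randn(num_samples, 768)
dist_token = torch.randn(num_samples, 768)
x = ((patch_token, cls_token, dist_token),)  # single-stage input

scores = head.simple_test(x, softmax=True, post_process=False)
# -> Tensor of shape (num_samples, num_classes)

results = head.simple_test(x, softmax=True, post_process=True)
# -> list of num_samples items, each containing num_classes floats
```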
