mmcls.apis

mmcls.apis.inference_model(model, img)[source]

Inference image(s) with the classifier.

Parameters
  • model (nn.Module) – The loaded classifier.

  • img (str/ndarray) – The image filename or loaded image.

Returns

The classification result, which contains class_name, pred_label and pred_score.

Return type

result (dict)
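
Example

A minimal usage sketch; the config and checkpoint paths below are placeholders for your own files:

>>> from mmcls.apis import init_model, inference_model
>>> model = init_model('configs/resnet/resnet18_b32x8_imagenet.py',
...                    'checkpoints/resnet18.pth', device='cpu')
>>> result = inference_model(model, 'demo/demo.JPEG')
>>> # result is a dict with the class name, predicted label and score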

mmcls.apis.init_model(config, checkpoint=None, device='cuda:0', options=None)[source]

Initialize a classifier from a config file.

Parameters
  • config (str or mmcv.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

  • options (dict) – Options to override some settings in the used config.

Returns

The constructed classifier.

Return type

nn.Module

mmcls.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[source]

Test model with multiple gpus.

This method tests a model with multiple gpus and collects the results under two different modes: gpu and cpu. By setting gpu_collect=True, it encodes results to gpu tensors and uses gpu communication for results collection. In cpu mode, it saves the results on different gpus to tmpdir and collects them by the rank 0 worker.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (DataLoader) – PyTorch data loader.

  • tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.

  • gpu_collect (bool) – Option to use either gpu or cpu to collect results.

Returns

The prediction results.

Return type

list

mmcls.apis.set_random_seed(seed, deterministic=False)[source]

Set random seed.

Parameters
  • seed (int) – Seed to be used.

  • deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
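
Example

A short sketch; call this once before building datasets and the model so runs are repeatable:

>>> from mmcls.apis import set_random_seed
>>> # deterministic=True trades training speed for reproducibility
>>> set_random_seed(0, deterministic=True)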

mmcls.apis.show_result_pyplot(model, img, result, fig_size=(15, 10), title='result', wait_time=0)[source]

Visualize the classification results on the image.

Parameters
  • model (nn.Module) – The loaded classifier.

  • img (str or np.ndarray) – Image filename or loaded image.

  • result (list) – The classification result.

  • fig_size (tuple) – Figure size of the pyplot figure. Defaults to (15, 10).

  • title (str) – Title of the pyplot figure. Defaults to ‘result’.

  • wait_time (int) – How many seconds to display the image. Defaults to 0.
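
Example

A hedged sketch continuing the inference example above (the image path is a placeholder):

>>> from mmcls.apis import show_result_pyplot
>>> # overlay the class name and score from `result` on the input image
>>> show_result_pyplot(model, 'demo/demo.JPEG', result, title='prediction')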

mmcls.core

evaluation

class mmcls.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, by_epoch=True, **eval_kwargs)[source]

Distributed evaluation hook.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval (by epochs). Default: 1.

  • tmpdir (str, optional) – Temporary directory to save the results of all processes. Default: None.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

class mmcls.core.evaluation.EvalHook(dataloader, interval=1, by_epoch=True, **eval_kwargs)[source]

Evaluation hook.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval (by epochs). Default: 1.

mmcls.core.evaluation.average_performance(pred, target, thr=None, k=None)[source]

Calculate CP, CR, CF1, OP, OR, OF1, where C stands for per-class average, O stands for overall average, P stands for precision, R stands for recall and F1 stands for F1-score.

Parameters
  • pred (torch.Tensor | np.ndarray) – The model prediction with shape (N, C), where C is the number of classes.

  • target (torch.Tensor | np.ndarray) – The target of each prediction with shape (N, C), where C is the number of classes. 1 stands for positive examples, 0 stands for negative examples and -1 stands for difficult examples.

  • thr (float) – The confidence threshold. Defaults to None.

  • k (int) – Top-k performance. Note that if thr and k are both given, k will be ignored. Defaults to None.

Returns

(CP, CR, CF1, OP, OR, OF1)

Return type

tuple
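
Example

A small sketch with random multi-label scores; the target uses 1/0/-1 as described above, and mAP (documented below) accepts the same (pred, target) pair:

>>> import numpy as np
>>> from mmcls.core.evaluation import average_performance
>>> pred = np.random.rand(4, 3)              # (N, C) scores for 4 samples, 3 classes
>>> target = np.array([[1, 0, -1],
...                    [0, 1, 0],
...                    [1, 1, 0],
...                    [0, 0, 1]])
>>> CP, CR, CF1, OP, OR, OF1 = average_performance(pred, target, thr=0.5)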

mmcls.core.evaluation.average_precision(pred, target)[source]

Calculate the average precision for a single class.

AP summarizes a precision-recall curve as the weighted mean of maximum precisions obtained for any r’>r, where r is the recall:

\[\text{AP} = \sum_n (R_n - R_{n-1}) P_n\]

Note that no approximation is involved since the curve is piecewise constant.

Parameters
  • pred (np.ndarray) – The model prediction with shape (N, ).

  • target (np.ndarray) – The target of each prediction with shape (N, ).

Returns

a single float as average precision value.

Return type

float
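
For intuition, here is a minimal NumPy re-derivation of the summation above; it is a reference sketch, not the library's implementation. Recall changes only at positive hits, so walking predictions in decreasing score order and summing (R_n - R_{n-1}) P_n reduces to averaging the precisions at the hits:

>>> import numpy as np
>>> def ap_reference(pred, target):
...     # sort by decreasing score; assumes at least one positive in `target`
...     order = np.argsort(-pred)
...     hits = target[order] == 1
...     cum_tp = np.cumsum(hits)
...     precision = cum_tp / np.arange(1, len(pred) + 1)
...     recall = cum_tp / hits.sum()
...     # AP = sum_n (R_n - R_{n-1}) * P_n; R only changes at hits
...     r = np.concatenate(([0.0], recall[hits]))
...     return float(np.sum((r[1:] - r[:-1]) * precision[hits]))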

mmcls.core.evaluation.calculate_confusion_matrix(pred, target)[source]

Calculate confusion matrix according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction with shape (N, C).

  • target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).

Returns

Confusion matrix

The shape is (C, C), where C is the number of classes.

Return type

torch.Tensor
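
Example

A quick sketch with three samples and two classes; the row/column convention is the library's, and the shape is always (C, C):

>>> import torch
>>> from mmcls.core.evaluation import calculate_confusion_matrix
>>> pred = torch.tensor([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4]])  # (N, C) scores
>>> target = torch.tensor([0, 1, 1])
>>> cm = calculate_confusion_matrix(pred, target)              # shape (2, 2)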

mmcls.core.evaluation.f1_score(pred, target, average_mode='macro', thrs=0.0)[source]

Calculate F1 score according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction with shape (N, C).

  • target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).

  • average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.

  • thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.0.

Returns

F1 score.

Return type

float | np.array | list[float | np.array]

The return type depends on the arguments:

                            thrs is number    thrs is tuple
    average_mode = “macro”  float             list[float]
    average_mode = “none”   np.array          list[np.array]

mmcls.core.evaluation.mAP(pred, target)[source]

Calculate the mean average precision with respect of classes.

Parameters
  • pred (torch.Tensor | np.ndarray) – The model prediction with shape (N, C), where C is the number of classes.

  • target (torch.Tensor | np.ndarray) – The target of each prediction with shape (N, C), where C is the number of classes. 1 stands for positive examples, 0 stands for negative examples and -1 stands for difficult examples.

Returns

A single float as mAP value.

Return type

float

mmcls.core.evaluation.precision(pred, target, average_mode='macro', thrs=0.0)[source]

Calculate precision according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction with shape (N, C).

  • target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).

  • average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.

  • thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.0.

Returns

Precision.

Return type

float | np.array | list[float | np.array]

The return type depends on the arguments:

                            thrs is number    thrs is tuple
    average_mode = “macro”  float             list[float]
    average_mode = “none”   np.array          list[np.array]

mmcls.core.evaluation.precision_recall_f1(pred, target, average_mode='macro', thrs=0.0)[source]

Calculate precision, recall and f1 score according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction with shape (N, C).

  • target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).

  • average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.

  • thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.0.

Returns

tuple containing precision, recall, f1 score.

The type of precision, recall, f1 score is one of the following:

                            thrs is number    thrs is tuple
    average_mode = “macro”  float             list[float]
    average_mode = “none”   np.array          list[np.array]

Return type

tuple
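
Example

A short sketch showing how thrs switches the return shape, matching the table above: a single number yields one value per metric, a tuple yields a list per metric.

>>> import torch
>>> from mmcls.core.evaluation import precision_recall_f1
>>> pred = torch.rand(8, 5)               # (N, C) scores
>>> target = torch.randint(0, 5, (8,))    # (N, ) labels
>>> p, r, f1 = precision_recall_f1(pred, target, thrs=0.5)            # three floats
>>> ps, rs, f1s = precision_recall_f1(pred, target, thrs=(0.3, 0.6))  # three lists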

mmcls.core.evaluation.recall(pred, target, average_mode='macro', thrs=0.0)[source]

Calculate recall according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction with shape (N, C).

  • target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).

  • average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.

  • thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.0.

Returns

Recall.

Return type

float | np.array | list[float | np.array]

The return type depends on the arguments:

                            thrs is number    thrs is tuple
    average_mode = “macro”  float             list[float]
    average_mode = “none”   np.array          list[np.array]

mmcls.core.evaluation.support(pred, target, average_mode='macro')[source]

Calculate the total number of occurrences of each label according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction with shape (N, C).

  • target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).

  • average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted sum. Defaults to ‘macro’.

Returns

Support.

  • If the average_mode is set to macro, the function returns a single float.

  • If the average_mode is set to none, the function returns an np.array of shape (C, ).

Return type

float | np.array

mmcls.models

models

mmcls.models.build_backbone(cfg)[source]

Build backbone.

mmcls.models.build_head(cfg)[source]

Build head.

mmcls.models.build_loss(cfg)[source]

Build loss.

mmcls.models.build_neck(cfg)[source]

Build neck.
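
All four builders instantiate modules from config dicts through the model registry: the type key selects a registered class and the remaining keys are passed to its constructor. A minimal sketch:

>>> from mmcls.models import build_backbone, build_neck, build_head
>>> backbone = build_backbone(dict(type='ResNet', depth=18))
>>> neck = build_neck(dict(type='GlobalAveragePooling'))
>>> head = build_head(dict(type='LinearClsHead', num_classes=10, in_channels=512))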

classifiers

class mmcls.models.classifiers.BaseClassifier(init_cfg=None)[source]

Base class for classifiers.

forward(img, return_loss=True, **kwargs)[source]

Calls either forward_train or forward_test depending on whether return_loss=True.

Note that this setting changes the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double-nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test-time augmentations.

forward_test(imgs, **kwargs)[source]
Parameters

imgs (List[Tensor]) – The outer list indicates test-time augmentations and the inner Tensor should have shape NxCxHxW, containing all images in the batch.

abstract forward_train(imgs, **kwargs)[source]
Parameters
  • imgs (list[Tensor]) – List of tensors of shape (1, C, H, W). Typically these should be mean centered and std scaled.

  • kwargs (keyword arguments) – Specific to concrete implementation.

show_result(img, result, text_color='white', font_scale=0.5, row_width=20, show=False, fig_size=(15, 10), win_name='', wait_time=0, out_file=None)[source]

Draw result over img.

Parameters
  • img (str or ndarray) – The image to be displayed.

  • result (dict) – The classification results to draw over img.

  • text_color (str or tuple or Color) – Color of texts.

  • font_scale (float) – Font scales of texts.

  • row_width (int) – width between each row of results on the image.

  • show (bool) – Whether to show the image. Default: False.

  • fig_size (tuple) – Image show figure size. Defaults to (15, 10).

  • win_name (str) – The window name.

  • wait_time (int) – How many seconds to display the image. Defaults to 0.

  • out_file (str or None) – The filename to write the image. Default: None.

Returns

Image with overlaid results.

Return type

img (ndarray)

train_step(data, optimizer=None, **kwargs)[source]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models (such as GANs), the whole process, including back propagation and optimizer updating, is also defined in this method.

Parameters
  • data (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict, optional) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

Dict of outputs. The following fields are contained.
  • loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.

  • log_vars (dict): Dict contains all the variables to be sent to the logger.

  • num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict

val_step(data, optimizer=None, **kwargs)[source]

The iteration step during validation.

This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

Parameters
  • data (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict, optional) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

Dict of outputs. The following fields are contained.
  • loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.

  • log_vars (dict): Dict contains all the variables to be sent to the logger.

  • num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict

class mmcls.models.classifiers.ImageClassifier(backbone, neck=None, head=None, pretrained=None, train_cfg=None, init_cfg=None)[source]

extract_feat(img)[source]

Directly extract features from the backbone + neck.

forward_train(img, gt_label, **kwargs)[source]

Forward computation during training.

Parameters
  • img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • gt_label (Tensor) – It should be of shape (N, 1) encoding the ground-truth label of input images for the single-label task, or of shape (N, C) for the multi-label task.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

simple_test(img, img_metas)[source]

Test without augmentation.
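
Example

A hedged end-to-end sketch wiring a backbone, neck and head into an ImageClassifier and running one training forward pass; in_channels=512 matches the last stage of ResNet-18, and gt_label is given as a 1-D tensor of class indices:

>>> import torch
>>> from mmcls.models.classifiers import ImageClassifier
>>> model = ImageClassifier(
...     backbone=dict(type='ResNet', depth=18),
...     neck=dict(type='GlobalAveragePooling'),
...     head=dict(type='LinearClsHead', num_classes=10, in_channels=512))
>>> img = torch.rand(2, 3, 224, 224)
>>> gt_label = torch.randint(0, 10, (2,))
>>> losses = model.forward_train(img, gt_label)  # dict of loss components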

backbones

class mmcls.models.backbones.AlexNet(num_classes=-1)[source]

AlexNet backbone.

The input for AlexNet is a 224x224 RGB image.

Parameters

num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmcls.models.backbones.LeNet5(num_classes=-1)[source]

LeNet5 backbone.

The input for LeNet-5 is a 32×32 grayscale image.

Parameters

num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmcls.models.backbones.MlpMixer(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_cfg={}, layer_cfgs={}, init_cfg=None)[source]

Mlp-Mixer backbone.

A PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision.

Parameters
  • arch (str | dict) – MLP-Mixer architecture. Defaults to ‘b’.

  • img_size (int | tuple) – Input image size.

  • patch_size (int | tuple) – The patch size.

  • out_indices (Sequence | int) – Output from which layer. Defaults to -1, meaning the last layer.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • act_cfg (dict) – The activation config for FFNs. Default GELU.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each mixer block layer. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmcls.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

MobileNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

make_layer(out_channels, num_blocks, stride, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, out_indices=None, frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d'], 'nonlinearity': 'leaky_relu'}, {'type': 'Normal', 'layer': ['Linear'], 'std': 0.01}, {'type': 'Constant', 'layer': ['BatchNorm2d'], 'val': 1}])[source]

MobileNetV3 backbone.

Parameters
  • arch (str) – Architecture of MobileNetV3, from {small, large}. Default: small.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • out_indices (None or Sequence[int]) – Output from which stages. Default: None, which means output tensors from final stage.

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=None)[source]

RegNet backbone.

More details can be found in the paper.

Parameters
  • arch (dict) – The parameters of RegNets:

    • w0 (int): Initial width.

    • wa (float): Slope of width.

    • wm (float): Quantization parameter to quantize the width.

    • depth (int): Depth of the backbone.

    • group_w (int): Width of group.

    • bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Default: 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmcls.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)

adjust_width_group(widths, bottleneck_ratio, groups)[source]

Adjusts the compatibility of widths and groups.

Parameters
  • widths (list[int]) – Width of each stage.

  • bottleneck_ratio (float) – Bottleneck ratio.

  • groups (int) – Number of groups in each stage.

Returns

The adjusted widths and groups of each stage.

Return type

tuple(list)

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]

Generates per block width from RegNet parameters.

Parameters
  • initial_width (int) – Initial width of the backbone.

  • width_slope (float) – Slope of the quantized linear function.

  • width_parameter (int) – Parameter used to quantize the width.

  • depth (int) – Depth of the backbone.

  • divisor (int) – The divisor of channels. Defaults to 8.

Returns

tuple containing:
  • list: Widths of each stage.

  • int: The number of stages.

Return type

tuple

get_stages_from_blocks(widths)[source]

Gets widths/stage_blocks of network at each stage.

Parameters

widths (list[int]) – Width in each stage.

Returns

width and depth of each stage

Return type

tuple(list)

static quantize_float(number, divisor)[source]

Converts a float to the closest non-zero int divisible by divisor.

Parameters
  • number (int) – Original number to be quantized.

  • divisor (int) – Divisor used to quantize the number.

Returns

Quantized number that is divisible by divisor.

Return type

int

class mmcls.models.backbones.RepVGG(arch, in_channels=3, base_channels=64, out_indices=(3, ), strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False, deploy=False, norm_eval=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

RepVGG backbone.

A PyTorch implementation of RepVGG: Making VGG-style ConvNets Great Again.

Parameters
  • arch (str | dict) –

    The parameter of RepVGG. If it’s a dict, it should contain the following keys:

    • num_blocks (Sequence[int]): Number of blocks in each stage.

    • width_factor (Sequence[float]): Width factor of each stage.

    • group_layer_map (dict | None): Declares which RepVGG blocks apply group convolution.

    • se_cfg (dict | None): SE layer config.

  • in_channels (int) – Number of input image channels. Default: 3.

  • base_channels (int) – Base channels of RepVGG backbone, work with width_factor together. Default: 64.

  • out_indices (Sequence[int]) – Output from which stages. Default: (3, ).

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (2, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • deploy (bool) – Whether to switch the model structure to deployment mode. Default: False.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.Res2Net(scales=4, base_width=26, style='pytorch', deep_stem=True, avg_down=True, init_cfg=None, **kwargs)[source]

Res2Net backbone.

A PyTorch implementation of Res2Net: A New Multi-scale Backbone Architecture.

Parameters
  • depth (int) – Depth of Res2Net, choose from {50, 101, 152}.

  • scales (int) – Scales used in Res2Net. Defaults to 4.

  • base_width (int) – Basic width of each scale. Defaults to 26.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • num_stages (int) – Number of Res2Net stages. Defaults to 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (3, ).

  • style (str) – “pytorch” or “caffe”. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Defaults to “pytorch”.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to True.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottle2neck. Defaults to True.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmcls.models import Res2Net
>>> import torch
>>> model = Res2Net(depth=50,
...                 scales=4,
...                 base_width=26,
...                 out_indices=(0, 1, 2, 3))
>>> model.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = model.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)

class mmcls.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152, 200}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 1.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2.

  • reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

class mmcls.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

ResNeXt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

class mmcls.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

ResNet backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • base_channels (int) – Middle channels of the first stage. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmcls.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d backbone.

This variant is described in the Bag of Tricks paper.

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In the downsampling block, a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.

class mmcls.models.backbones.ResNet_CIFAR(depth, deep_stem=False, **kwargs)[source]

ResNet backbone for CIFAR.

Compared to standard ResNet, it uses kernel_size=3 and stride=1 in conv1, and does not apply MaxPooling after the stem. It has been proven to be more efficient than standard ResNet in other public codebases, e.g., https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py.

Parameters
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • base_channels (int) – Middle channels of the first stage. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – This network has a specifically designed stem, thus it is asserted to be False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmcls.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

SEResNeXt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

class mmcls.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]

SEResNet backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmcls.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)

class mmcls.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=None)[source]

ShuffleNetV1 backbone.

Parameters
  • groups (int) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.

  • widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (2, )

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]

Initialize the weights.

make_layer(out_channels, num_blocks, first_block=False)[source]

Stack ShuffleUnit blocks to make a layer.

Parameters
  • out_channels (int) – out_channels of the block.

  • num_blocks (int) – Number of blocks.

  • first_block (bool) – Whether it is the first ShuffleUnit of a sequence of ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=None)[source]

ShuffleNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (3, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.SwinTransformer(arch='T', img_size=224, in_channels=3, drop_rate=0.0, drop_path_rate=0.1, out_indices=(3, ), use_abs_pos_embed=False, auto_pad=False, with_cp=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, patch_cfg={}, init_cfg=None)[source]

Swin Transformer.

A PyTorch implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Inspiration from https://github.com/microsoft/Swin-Transformer

Parameters
  • arch (str | dict) – Swin Transformer architecture Defaults to ‘T’.

  • img_size (int | tuple) – The size of input image. Defaults to 224.

  • in_channels (int) – The num of input channels. Defaults to 3.

  • drop_rate (float) – Dropout rate after embedding. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

  • auto_pad (bool) – If True, auto pad feature map to fit window_size. Defaults to False.

  • norm_cfg (dict, optional) – Config dict for the normalization layer at the end of the backbone. Defaults to dict(type=’LN’).

  • stage_cfgs (Sequence | dict, optional) – Extra config dict for each stage. Defaults to empty dict.

  • patch_cfg (dict, optional) – Extra config dict for patch embedding. Defaults to empty dict.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

Examples

>>> from mmcls.models import SwinTransformer
>>> import torch
>>> extra_config = dict(
...     arch='tiny',
...     stage_cfgs=dict(downsample_cfg={'kernel_size': 3,
...                                     'expansion_ratio': 3}),
...     auto_pad=True)
>>> self = SwinTransformer(**extra_config)
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(output.shape)
(1, 2592, 4)

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]

Initialize the weights.

class mmcls.models.backbones.T2T_ViT(img_size=224, in_channels=3, embed_dims=384, t2t_cfg={}, drop_rate=0.0, num_layers=14, out_indices=-1, layer_cfgs={}, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, final_norm=True, output_cls_token=True, init_cfg=None)[source]

Tokens-to-Token Vision Transformer (T2T-ViT)

A PyTorch implementation of Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet (https://arxiv.org/abs/2101.11986).

Parameters
  • img_size (int) – Input image size.

  • in_channels (int) – Number of input channels.

  • embed_dims (int) – Embedding dimension.

  • t2t_cfg (dict) – Extra config of Tokens-to-Token module. Defaults to an empty dict.

  • drop_rate (float) – Dropout rate after position embedding. Defaults to 0.

  • num_layers (int) – Num of transformer layers in encoder. Defaults to 14.

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • drop_path_rate (float) – stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether output the cls_token. Defaults to True.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]

Initialize the weights.

class mmcls.models.backbones.TIMMBackbone(model_name, pretrained=False, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]

Wrapper to use backbones from the timm library. More details can be found in the timm documentation.

Parameters
  • model_name (str) – Name of timm model to instantiate.

  • pretrained (bool) – Load pretrained weights if True.

  • checkpoint_path (str) – Path of checkpoint to load after model is initialized.

  • in_channels (int) – Number of input image channels. Default: 3.

  • init_cfg (dict, optional) – Initialization config dict

  • **kwargs – Other timm & model specific arguments.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmcls.models.backbones.TNT(arch='b', img_size=224, patch_size=16, in_channels=3, ffn_ratio=4, qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, first_stride=4, num_fcs=2, init_cfg=[{'type': 'TruncNormal', 'layer': 'Linear', 'std': 0.02}, {'type': 'Constant', 'layer': 'LayerNorm', 'val': 1.0, 'bias': 0.0}])[source]

Transformer in Transformer.

A PyTorch implementation of Transformer in Transformer.

Inspiration from https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tnt.py

Parameters
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’.

  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int | tuple) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • ffn_ratio (int) – A ratio to calculate the hidden_dims in the ffn layer. Default: 4.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Default: 0.

  • attn_drop_rate (float) – The drop out rate for the attention layer. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.

  • act_cfg (dict) – The activation config for FFNs. Defaults to GELU.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to layer normalization.

  • first_stride (int) – The stride of the conv2d layer. We use a conv2d layer and an unfold layer to implement image to pixel embedding.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • init_cfg (dict, optional) – Initialization config dict.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmcls.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1.0, 'layer': ['_BatchNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]

VGG backbone.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_norm (bool) – Use BatchNorm or not.

  • num_classes (int) – number of classes for classification.

  • num_stages (int) – VGG stages, normally 5.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int], optional) – Output from which stages. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), output the last feature map before classifier. If num_classes > 0, the default value is (5, ), output the classification score. Default: None.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.

  • with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.

forward(x)[source]

Forward computation.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is in training mode (True) or evaluation mode (False).

class mmcls.models.backbones.VisionTransformer(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', patch_cfg={}, layer_cfgs={}, init_cfg=None)[source]

Vision Transformer.

A PyTorch implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (https://arxiv.org/abs/2010.11929).

Parameters
  • arch (str | dict) – Vision Transformer architecture. Default: ‘b’

  • img_size (int | tuple) – Input image size

  • patch_size (int | tuple) – The patch size

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • output_cls_token (bool) – Whether to output the cls_token. If set True, with_cls_token must be True. Defaults to True.

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resize. Defaults to “bicubic”.

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.

forward(x)[source]

Forward computation.

Parameters

x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing the input data for forward computation.

init_weights()[source]

Initialize the weights.

static resize_pos_embed(pos_embed, src_shape, dst_shape, mode='bicubic')[source]

Resize pos_embed weights.

Parameters
  • pos_embed (torch.Tensor) – Position embedding weights with shape [1, L, C].

  • src_shape (tuple) – The resolution of the downsampled original training image.

  • dst_shape (tuple) – The resolution of downsampled new training image.

  • mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'bicubic'

Returns

The resized pos_embed of shape [1, L_new, C]

Return type

torch.Tensor
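
Example

The following is a hedged usage sketch: the shapes assume the position embedding carries one class token in front of a 14x14 patch grid (the usual ViT-B/16 layout for 224x224 input), which is an assumption about the token layout.

>>> import torch
>>> pos_embed = torch.randn(1, 14 * 14 + 1, 768)
>>> resized = VisionTransformer.resize_pos_embed(
...     pos_embed, src_shape=(14, 14), dst_shape=(24, 24), mode='bicubic')
>>> resized.shape  # 24 * 24 patches + 1 cls token = 577
torch.Size([1, 577, 768])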

necks

class mmcls.models.necks.GlobalAveragePooling(dim=2)[source]

Global Average Pooling neck.

Note that we use view to remove extra channel after pooling. We do not use squeeze as it will also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.

Parameters

dim (int) – Dimensions of each sample channel, can be one of {1, 2, 3}. Default: 2
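
The note above can be reproduced with plain PyTorch (a minimal sketch, independent of this class):

>>> import torch
>>> pooled = torch.nn.functional.adaptive_avg_pool2d(torch.randn(1, 512, 7, 7), 1)
>>> pooled.squeeze().shape  # squeeze also drops the batch dimension of size 1
torch.Size([512])
>>> pooled.view(pooled.size(0), -1).shape  # view keeps the batch dimension
torch.Size([1, 512])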

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

heads

class mmcls.models.heads.ClsHead(loss={'loss_weight': 1.0, 'type': 'CrossEntropyLoss'}, topk=(1, ), cal_acc=False, init_cfg=None)[source]

Classification head.

Parameters
  • loss (dict) – Config of classification loss.

  • topk (int | tuple) – Top-k accuracy.

  • cal_acc (bool) – Whether to calculate accuracy during training. If you use Mixup/CutMix or something like that during training, it is not reasonable to calculate accuracy. Defaults to False.

simple_test(cls_score)[source]

Test without augmentation.

class mmcls.models.heads.LinearClsHead(num_classes, in_channels, init_cfg={'layer': 'Linear', 'std': 0.01, 'type': 'Normal'}, *args, **kwargs)[source]

Linear classifier head.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • init_cfg (dict, optional) – The extra init config of layers. Defaults to use dict(type=’Normal’, layer=’Linear’, std=0.01).

simple_test(x)[source]

Test without augmentation.

class mmcls.models.heads.MultiLabelClsHead(loss={'loss_weight': 1.0, 'reduction': 'mean', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg=None)[source]

Classification head for multilabel task.

Parameters

loss (dict) – Config of classification loss.

class mmcls.models.heads.MultiLabelLinearClsHead(num_classes, in_channels, loss={'loss_weight': 1.0, 'reduction': 'mean', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg={'layer': 'Linear', 'std': 0.01, 'type': 'Normal'})[source]

Linear classification head for multilabel task.

Parameters
  • num_classes (int) – Number of categories.

  • in_channels (int) – Number of channels in the input feature map.

  • loss (dict) – Config of classification loss.

  • init_cfg (dict, optional) – The extra init config of layers. Defaults to use dict(type=’Normal’, layer=’Linear’, std=0.01).

simple_test(x)[source]

Test without augmentation.

class mmcls.models.heads.StackedLinearClsHead(num_classes: int, in_channels: int, mid_channels: Sequence, dropout_rate: float = 0.0, norm_cfg: Optional[Dict] = None, act_cfg: Dict = {'type': 'ReLU'}, **kwargs)[source]

Classifier head with several hidden fc layer and a output fc layer.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • mid_channels (Sequence) – Number of channels in the hidden fc layers.

  • dropout_rate (float) – Dropout rate after each hidden fc layer, except the last layer. Defaults to 0.

  • norm_cfg (dict, optional) – Config dict of normalization layer after each hidden fc layer, except the last layer. Defaults to None.

  • act_cfg (dict, optional) – Config dict of activation function after each hidden layer, except the last layer. Defaults to use “ReLU”.

init_weights()[source]

Initialize the weights.

simple_test(x)[source]

Test without augmentation.

class mmcls.models.heads.VisionTransformerClsHead(num_classes, in_channels, hidden_dim=None, act_cfg={'type': 'Tanh'}, init_cfg={'layer': 'Linear', 'type': 'Constant', 'val': 0}, *args, **kwargs)[source]

Vision Transformer classifier head.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • hidden_dim (int) – Number of the dimensions for hidden layer. Only available during pre-training. Default None.

  • act_cfg (dict) – The activation config. Only available during pre-training. Defaults to Tanh.

init_weights()[source]

Initialize the weights.

simple_test(x)[source]

Test without augmentation.

losses

class mmcls.models.losses.Accuracy(topk=(1, ))[source]
forward(pred, target)[source]

Forward function to calculate accuracy.

Parameters
  • pred (torch.Tensor) – Prediction of models.

  • target (torch.Tensor) – Target for each prediction.

Returns

The accuracies under different topk criterions.

Return type

list[float]

class mmcls.models.losses.AsymmetricLoss(gamma_pos=0.0, gamma_neg=4.0, clip=0.05, reduction='mean', loss_weight=1.0)[source]

Asymmetric loss.

Parameters
  • gamma_pos (float) – positive focusing parameter. Defaults to 0.0.

  • gamma_neg (float) – Negative focusing parameter. We usually set gamma_neg > gamma_pos. Defaults to 4.0.

  • clip (float, optional) – Probability margin. Defaults to 0.05.

  • reduction (str) – The method used to reduce the loss into a scalar.

  • loss_weight (float) – Weight of loss. Defaults to 1.0.

forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]

Asymmetric loss.

class mmcls.models.losses.CrossEntropyLoss(use_sigmoid=False, use_soft=False, reduction='mean', loss_weight=1.0, class_weight=None, pos_weight=None)[source]

Cross entropy loss.

Parameters
  • use_sigmoid (bool) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.

  • use_soft (bool) – Whether to use the soft version of CrossEntropyLoss. Defaults to False.

  • reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.

  • loss_weight (float) – Weight of the loss. Defaults to 1.0.

  • class_weight (List[float], optional) – The weight for each class with shape (C), C is the number of classes. Default None.

  • pos_weight (List[float], optional) – The positive weight for each class with shape (C), C is the number of classes. Only enabled in BCE loss when use_sigmoid is True. Default None.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcls.models.losses.FocalLoss(gamma=2.0, alpha=0.25, reduction='mean', loss_weight=1.0)[source]

Focal loss.

Parameters
  • gamma (float) – Focusing parameter in focal loss. Defaults to 2.0.

  • alpha (float) – The parameter in balanced form of focal loss. Defaults to 0.25.

  • reduction (str) – The method used to reduce the loss into a scalar. Options are “none” and “mean”. Defaults to ‘mean’.

  • loss_weight (float) – Weight of loss. Defaults to 1.0.

forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]

Sigmoid focal loss.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, *).

  • target (torch.Tensor) – The ground truth label of the prediction with shape (N, *), (N, ) or (N, 1).

  • weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.

Returns

Loss.

Return type

torch.Tensor

class mmcls.models.losses.LabelSmoothLoss(label_smooth_val, num_classes=None, mode=None, reduction='mean', loss_weight=1.0)[source]

Label smoothed cross entropy loss.

Refers to Rethinking the Inception Architecture for Computer Vision

This decreases the gap between output scores and encourages generalization. Labels provided to forward can be one-hot like vectors (NxC) or class indices (Nx1), and it also accepts linear combinations of one-hot like labels from mixup or cutmix, except for multi-label tasks.

Parameters
  • label_smooth_val (float) – The degree of label smoothing.

  • num_classes (int, optional) – Number of classes. Defaults to None.

  • mode (str) – Refer to the notes below. Options are ‘original’, ‘classy_vision’ and ‘multi_label’. Defaults to ‘classy_vision’

  • reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.

  • loss_weight (float) – Weight of the loss. Defaults to 1.0.

Notes

If the mode is “original”, this will use the same label smoothing method as the original paper:

\[(1-\epsilon)\delta_{k, y} + \frac{\epsilon}{K}\]

where epsilon is the label_smooth_val, K is the num_classes and delta(k, y) is the Dirac delta, which equals 1 for k=y and 0 otherwise.

If the mode is “classy_vision”, this will use the same label smoothing method as the facebookresearch/ClassyVision repo:

\[\frac{\delta_{k, y} + \epsilon/K}{1+\epsilon}\]

If the mode is “multi_label”, this will accept labels from multi-label tasks and smooth them as:

\[(1-2\epsilon)\delta_{k, y} + \epsilon\]
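
As a worked example of the “original” mode: with label_smooth_val 0.1, num_classes 5 and target class 2, the one-hot target [0, 0, 1, 0, 0] becomes [0.02, 0.02, 0.92, 0.02, 0.02], since every entry receives epsilon/K = 0.02 and the target class keeps (1 - epsilon) + epsilon/K = 0.92.
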
forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Label smooth loss.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, *).

  • label (torch.Tensor) – The ground truth label of the prediction with shape (N, *).

  • weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.

Returns

Loss.

Return type

torch.Tensor

generate_one_hot_like_label(label)[source]

This function takes one-hot or index label vectors and computes one-hot like label vectors (float).

class mmcls.models.losses.SeesawLoss(use_sigmoid=False, p=0.8, q=2.0, num_classes=1000, eps=0.01, reduction='mean', loss_weight=1.0)[source]

Implementation of seesaw loss.

Refers to Seesaw Loss for Long-Tailed Instance Segmentation (CVPR 2021)

Parameters
  • use_sigmoid (bool) – Whether the prediction uses sigmoid instead of softmax. Only False is supported. Defaults to False.

  • p (float) – The p in the mitigation factor. Defaults to 0.8.

  • q (float) – The q in the compensation factor. Defaults to 2.0.

  • num_classes (int) – The number of classes. Default to 1000 for the ImageNet dataset.

  • eps (float) – The minimal value of divisor to smooth the computation of compensation factor, default to 1e-2.

  • reduction (str) – The method that reduces the loss to a scalar. Options are “none”, “mean” and “sum”. Default to “mean”.

  • loss_weight (float) – The weight of the loss. Defaults to 1.0

forward(cls_score, labels, weight=None, avg_factor=None, reduction_override=None)[source]

Forward function.

Parameters
  • cls_score (torch.Tensor) – The prediction with shape (N, C).

  • labels (torch.Tensor) – The learning label of the prediction.

  • weight (torch.Tensor, optional) – Sample-wise loss weight.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to None.

Returns

The calculated loss

Return type

torch.Tensor

mmcls.models.losses.accuracy(pred, target, topk=1, thrs=0.0)[source]

Calculate accuracy according to the prediction and target.

Parameters
  • pred (torch.Tensor | np.array) – The model prediction.

  • target (torch.Tensor | np.array) – The target of each prediction

  • topk (int | tuple[int]) – If the predictions in topk matches the target, the predictions will be regarded as correct ones. Defaults to 1.

  • thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Default to 0.

Returns

Accuracy
  • float: If both topk and thrs are single values.

  • list[float]: If one of topk or thrs is a tuple.

  • list[list[float]]: If both topk and thrs are tuples, where the first dim is topk and the second dim is thrs.

Return type

float | list[float] | list[list[float]]
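
Example

A hedged usage sketch (the values are illustrative; the returned accuracies may be wrapped in tensors rather than plain floats):

>>> import torch
>>> pred = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
>>> target = torch.tensor([1, 0, 0])
>>> accuracy(pred, target, topk=1)       # top-1 matches 2 of 3 samples -> about 66.67
>>> accuracy(pred, target, topk=(1, 2))  # a list with one accuracy per k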

mmcls.models.losses.asymmetric_loss(pred, target, weight=None, gamma_pos=1.0, gamma_neg=4.0, clip=0.05, reduction='mean', avg_factor=None)[source]

Asymmetric loss.

Please refer to the paper for details.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, *).

  • target (torch.Tensor) – The ground truth label of the prediction with shape (N, *).

  • weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, ). Defaults to None.

  • gamma_pos (float) – Positive focusing parameter. Defaults to 1.0.

  • gamma_neg (float) – Negative focusing parameter. We usually set gamma_neg > gamma_pos. Defaults to 4.0.

  • clip (float, optional) – Probability margin. Defaults to 0.05.

  • reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’ , loss is same shape as pred and label. Defaults to ‘mean’.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

Returns

Loss.

Return type

torch.Tensor

mmcls.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, pos_weight=None)[source]

Calculate the binary CrossEntropy loss with logits.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, *).

  • label (torch.Tensor) – The gt label with shape (N, *).

  • weight (torch.Tensor, optional) – Element-wise weight of loss with shape (N, ). Defaults to None.

  • reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’ , loss is same shape as pred and label. Defaults to ‘mean’.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • class_weight (torch.Tensor, optional) – The weight for each class with shape (C), C is the number of classes. Default None.

  • pos_weight (torch.Tensor, optional) – The positive weight for each class with shape (C), C is the number of classes. Default None.

Returns

The calculated loss

Return type

torch.Tensor

mmcls.models.losses.convert_to_one_hot(targets: torch.Tensor, classes) → torch.Tensor[source]

This function converts target class indices to one-hot vectors, given the number of classes.

Parameters
  • targets (Tensor) – The ground truth label of the prediction with shape (N, 1)

  • classes (int) – the number of classes.

Returns

The one-hot encoded targets with shape (N, classes).

Return type

Tensor
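
Example

A hedged sketch (the exact dtype of the output is an implementation detail):

>>> import torch
>>> targets = torch.tensor([[1], [0], [2]])  # class indices with shape (N, 1)
>>> convert_to_one_hot(targets, 3)           # rows: [0, 1, 0], [1, 0, 0], [0, 0, 1]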

mmcls.models.losses.cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None)[source]

Calculate the CrossEntropy loss.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, C), C is the number of classes.

  • label (torch.Tensor) – The gt label of the prediction.

  • weight (torch.Tensor, optional) – Sample-wise loss weight.

  • reduction (str) – The method used to reduce the loss.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • class_weight (torch.Tensor, optional) – The weight for each class with shape (C), C is the number of classes. Default None.

Returns

The calculated loss

Return type

torch.Tensor

mmcls.models.losses.reduce_loss(loss, reduction)[source]

Reduce loss as specified.

Parameters
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

Returns

Reduced loss tensor.

Return type

Tensor

mmcls.models.losses.sigmoid_focal_loss(pred, target, weight=None, gamma=2.0, alpha=0.25, reduction='mean', avg_factor=None)[source]

Sigmoid focal loss.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, *).

  • target (torch.Tensor) – The ground truth label of the prediction with shape (N, *).

  • weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, ). Defaults to None.

  • gamma (float) – The gamma for calculating the modulating factor. Defaults to 2.0.

  • alpha (float) – A balanced form for Focal Loss. Defaults to 0.25.

  • reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’ , loss is same shape as pred and label. Defaults to ‘mean’.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

Returns

Loss.

Return type

torch.Tensor

mmcls.models.losses.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]

Apply element-wise weight and reduce loss.

Parameters
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

Returns

Processed loss values.

Return type

Tensor

mmcls.models.losses.weighted_loss(loss_func)[source]

Create a weighted version of a given loss function.

To use this decorator, the loss function must have the signature like loss_func(pred, target, **kwargs). The function only needs to compute element-wise loss without any reduction. This decorator will add weight and reduction arguments to the function. The decorated function will have the signature like loss_func(pred, target, weight=None, reduction='mean', avg_factor=None, **kwargs).

Example

>>> import torch
>>> @weighted_loss
>>> def l1_loss(pred, target):
>>>     return (pred - target).abs()
>>> pred = torch.Tensor([0, 2, 3])
>>> target = torch.Tensor([1, 1, 1])
>>> weight = torch.Tensor([1, 0, 1])
>>> l1_loss(pred, target)
tensor(1.3333)
>>> l1_loss(pred, target, weight)
tensor(1.)
>>> l1_loss(pred, target, reduction='none')
tensor([1., 1., 2.])
>>> l1_loss(pred, target, weight, avg_factor=2)
tensor(1.5000)

utils

class mmcls.models.utils.Augments(augments_cfg)[source]

Data augments.

We implement some data augmentation methods, such as mixup, cutmix.

Parameters

augments_cfg (list[mmcv.ConfigDict] | mmcv.ConfigDict) – Config dict of augments

Example

>>> augments_cfg = [
        dict(type='BatchCutMix', alpha=1., num_classes=10, prob=0.5),
        dict(type='BatchMixup', alpha=1., num_classes=10, prob=0.3)
    ]
>>> augments = Augments(augments_cfg)
>>> imgs = torch.randn(16, 3, 32, 32)
>>> label = torch.randint(0, 10, (16, ))
>>> imgs, label = augments(imgs, label)

To decide which augmentation within the Augments block is used, the following rule is applied: we pick an augmentation based on the probabilities. In the example above, BatchCutMix is used with probability 0.5 and BatchMixup with probability 0.3. As Identity is not in augments_cfg, Identity is used with probability 1 - 0.5 - 0.3 = 0.2.

class mmcls.models.utils.HybridEmbed(backbone, img_size=224, feature_size=None, in_channels=3, embed_dims=768, conv_cfg=None, init_cfg=None)[source]

CNN Feature Map Embedding.

Extract feature map from CNN, flatten, project to embedding dim.

Parameters
  • backbone (nn.Module) – CNN backbone

  • img_size (int | tuple) – The size of input image. Default: 224

  • feature_size (int | tuple, optional) – Size of feature map extracted by CNN backbone. Default: None

  • in_channels (int) – The num of input channels. Default: 3

  • embed_dims (int) – The dimensions of embedding. Default: 768

  • conv_cfg (dict, optional) – The config dict for conv layers. Default: None.

  • init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcls.models.utils.InvertedResidual(in_channels, out_channels, mid_channels, kernel_size=3, stride=1, se_cfg=None, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False, init_cfg=None)[source]

Inverted Residual Block.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • mid_channels (int) – The input channels of the depthwise convolution.

  • kernel_size (int) – The kernel size of the depthwise convolution. Default: 3.

  • stride (int) – The stride of the depthwise convolution. Default: 1.

  • se_cfg (dict) – Config dict for se layer. Default: None, which means no se layer.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

Returns

The output tensor.

Return type

Tensor

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcls.models.utils.MultiheadAttention(embed_dims, num_heads, input_dims=None, attn_drop=0.0, proj_drop=0.0, dropout_layer={'drop_prob': 0.0, 'type': 'Dropout'}, qkv_bias=True, qk_scale=None, proj_bias=True, v_shortcut=False, init_cfg=None)[source]

Multi-head Attention Module.

This module implements multi-head attention that supports different input dims and embed dims. It also supports a shortcut from value, which is useful when the input dims are not the same as the embed dims.

Parameters
  • embed_dims (int) – The embedding dimension.

  • num_heads (int) – Parallel attention heads.

  • input_dims (int, optional) – The input dimension, and if None, use embed_dims. Defaults to None.

  • attn_drop (float) – Dropout rate of the dropout layer after the attention calculation of query and key. Defaults to 0.

  • proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.

  • dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • proj_bias (bool) – Defaults to True.

  • v_shortcut (bool) – Add a shortcut from value to output. It’s usually used if input_dims is different from embed_dims. Defaults to False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcls.models.utils.PatchEmbed(img_size=224, in_channels=3, embed_dims=768, norm_cfg=None, conv_cfg=None, init_cfg=None)[source]

Image to Patch Embedding.

We use a conv layer to implement PatchEmbed.

Parameters
  • img_size (int | tuple) – The size of input image. Default: 224

  • in_channels (int) – The num of input channels. Default: 3

  • embed_dims (int) – The dimensions of embedding. Default: 768

  • norm_cfg (dict, optional) – Config dict for normalization layer. Default: None

  • conv_cfg (dict, optional) – The config dict for conv layers. Default: None

  • init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcls.models.utils.PatchMerging(input_resolution, in_channels, expansion_ratio, kernel_size=2, stride=None, padding=0, dilation=1, bias=False, norm_cfg={'type': 'LN'}, init_cfg=None)[source]

Merge patch feature map.

This layer uses nn.Unfold to group the feature map by kernel_size, and uses a norm layer and a linear layer to embed the grouped feature map.

Parameters
  • input_resolution (tuple) – The size of input patch resolution.

  • in_channels (int) – The num of input channels.

  • expansion_ratio (Number) – Expansion ratio of output channels. The num of output channels is equal to int(expansion_ratio * in_channels).

  • kernel_size (int | tuple, optional) – the kernel size in the unfold layer. Defaults to 2.

  • stride (int | tuple, optional) – the stride of the sliding blocks in the unfold layer. Defaults to be equal with kernel_size.

  • padding (int | tuple, optional) – zero padding width in the unfold layer. Defaults to 0.

  • dilation (int | tuple, optional) – dilation parameter in the unfold layer. Defaults to 1.

  • bias (bool, optional) – Whether to add bias in linear layer or not. Defaults to False.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type=’LN’).

  • init_cfg (dict, optional) – The extra config for initialization. Defaults to None.

forward(x)[source]

x (Tensor) – The input feature with shape (B, H*W, C).

class mmcls.models.utils.SELayer(channels, squeeze_channels=None, ratio=16, divisor=8, bias='auto', conv_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}), init_cfg=None)[source]

Squeeze-and-Excitation Module.

Parameters
  • channels (int) – The input (and output) channels of the SE layer.

  • squeeze_channels (None or int) – The intermediate channel number of SELayer. Default: None, which means the value of squeeze_channels is make_divisible(channels // ratio, divisor).

  • ratio (int) – Squeeze ratio in SELayer, the intermediate channel will be make_divisible(channels // ratio, divisor). Only used when squeeze_channels is None. Default: 16.

  • divisor (int) – The divisor to true divide the channel number. Only used when squeeze_channels is None. Default: 8.

  • conv_cfg (None or dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • act_cfg (dict or Sequence[dict]) – Config dict for activation layer. If act_cfg is a dict, two activation layers will be configured by this dict. If act_cfg is a sequence of dicts, the first activation layer will be configured by the first dict and the second activation layer will be configured by the second dict. Default: (dict(type=’ReLU’), dict(type=’Sigmoid’))

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcls.models.utils.ShiftWindowMSA(embed_dims, input_resolution, num_heads, window_size, shift_size=0, qkv_bias=True, qk_scale=None, attn_drop=0, proj_drop=0, dropout_layer={'drop_prob': 0.0, 'type': 'DropPath'}, auto_pad=False, init_cfg=None)[source]

Shift Window Multihead Self-Attention Module.

Parameters
  • embed_dims (int) – Number of input channels.

  • input_resolution (Tuple[int, int]) – The resolution of the input feature map.

  • num_heads (int) – Number of attention heads.

  • window_size (int) – The height and width of the window.

  • shift_size (int, optional) – The shift step of each window towards right-bottom. If zero, act as regular window-msa. Defaults to 0.

  • qkv_bias (bool, optional) – If True, add a learnable bias to q, k, v. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • attn_drop (float, optional) – Dropout ratio of attention weight. Defaults to 0.0.

  • proj_drop (float, optional) – Dropout ratio of output. Defaults to 0.

  • dropout_layer (dict, optional) – The dropout_layer used before output. Defaults to dict(type=’DropPath’, drop_prob=0.).

  • auto_pad (bool, optional) – Auto pad the feature map to be divisible by window_size. Defaults to False.

  • init_cfg (dict, optional) – The extra config for initialization. Default: None.

forward(query)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcls.models.utils.channel_shuffle(x, groups)[source]

Channel Shuffle operation.

This function enables cross-group information flow for multiple group convolution layers.

Parameters
  • x (Tensor) – The input tensor.

  • groups (int) – The number of groups to divide the input tensor in the channel dimension.

Returns

The output tensor after channel shuffle operation.

Return type

Tensor
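
Example

A hedged sketch of the standard reshape-and-transpose formulation on a tiny input:

>>> import torch
>>> x = torch.arange(8.).view(1, 8, 1, 1)   # 8 channels: 0..7
>>> channel_shuffle(x, groups=2).flatten()  # channels become 0, 4, 1, 5, 2, 6, 3, 7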

mmcls.models.utils.make_divisible(value, divisor, min_value=None, min_ratio=0.9)[source]

Make divisible function.

This function rounds the channel number to the nearest value that is divisible by the divisor.

Parameters
  • value (int) – The original channel number.

  • divisor (int) – The divisor to fully divide the channel number.

  • min_value (int, optional) – The minimum value of the output channel. Default: None, which means the minimum value is equal to the divisor.

  • min_ratio (float) – The minimum ratio of the rounded channel number to the original channel number. Default: 0.9.

Returns

The modified output channel number

Return type

int
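
Example

A hedged sketch, assuming the MobileNet-style rounding described above:

>>> make_divisible(34, 8)  # nearest multiple of 8 -> 32
>>> make_divisible(10, 8)  # 8 would fall below min_ratio * 10 = 9, so the result is bumped up to 16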

mmcls.datasets

datasets

class mmcls.datasets.BaseDataset(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

Base dataset.

Parameters
  • data_prefix (str) – the prefix of data path

  • pipeline (list) – a list of dicts, where each element represents an operation defined in mmcls.datasets.pipelines

  • ann_file (str | None) – the annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix

  • test_mode (bool) – whether in train mode or test mode

property class_to_idx

Mapping from class name to class index.

Returns

mapping from class name to class index.

Return type

dict

evaluate(results, metric='accuracy', metric_options=None, logger=None)[source]

Evaluate the dataset.

Parameters
  • results (list) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Default value is accuracy.

  • metric_options (dict, optional) – Options for calculating metrics. Allowed keys are ‘topk’, ‘thrs’ and ‘average_mode’. Defaults to None.

  • logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.

Returns

evaluation results

Return type

dict

get_cat_ids(idx: int) → List[int][source]

Get category id by index.

Parameters

idx (int) – Index of data.

Returns

Image category of specified index.

Return type

cat_ids (List[int])

classmethod get_classes(classes=None)[source]

Get class names of current dataset.

Parameters

classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.

Returns

Names of categories of the dataset.

Return type

tuple[str] or list[str]

get_gt_labels()[source]

Get all ground-truth labels (categories).

Returns

categories for all images.

Return type

list[int]

class mmcls.datasets.CIFAR10(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

CIFAR10 Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

class mmcls.datasets.CIFAR100(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

CIFAR100 Dataset.

class mmcls.datasets.ClassBalancedDataset(dataset, oversample_thr)[source]

A wrapper of repeated dataset with repeat factor.

Suitable for training on class imbalanced datasets like LVIS. Following the sampling strategy in [2], in each epoch, an image may appear multiple times based on its “repeat factor”.

The repeat factor for an image is a function of the frequency of the rarest category labeled in that image. The “frequency of category c” in [0, 1] is defined by the fraction of images in the training set (without repeats) in which category c appears.

The dataset needs to implement self.get_cat_ids() to support ClassBalancedDataset.

The repeat factor is computed as follows.

  1. For each category c, compute the fraction \(f(c)\) of images that contain it.

  2. For each category c, compute the category-level repeat factor

    \[r(c) = \max(1, \sqrt{\frac{t}{f(c)}})\]
  3. For each image I and its labels \(L(I)\), compute the image-level repeat factor

    \[r(I) = \max_{c \in L(I)} r(c)\]

References

[2] https://arxiv.org/pdf/1908.03195.pdf

Parameters
  • dataset (CustomDataset) – The dataset to be repeated.

  • oversample_thr (float) – frequency threshold below which data is repeated. For categories with f_c >= oversample_thr, there is no oversampling. For categories with f_c < oversample_thr, the degree of oversampling follows the square-root inverse frequency heuristic above.
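
For example, with oversample_thr t = 0.01, a category appearing in at least 1% of the images keeps r(c) = 1, while a category with frequency f(c) = 0.0001 gets r(c) = sqrt(0.01 / 0.0001) = 10; each image is then repeated according to the rarest category it contains.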

class mmcls.datasets.ConcatDataset(datasets)[source]

A wrapper of concatenated dataset.

Same as torch.utils.data.dataset.ConcatDataset, but adds the get_cat_ids function.

Parameters

datasets (list[Dataset]) – A list of datasets.

class mmcls.datasets.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, round_up=True)[source]
class mmcls.datasets.FashionMNIST(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

Fashion-MNIST Dataset.

class mmcls.datasets.ImageNet(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

ImageNet Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/imagenet.py

class mmcls.datasets.ImageNet21k(data_prefix, pipeline, classes=None, ann_file=None, multi_label=False, recursion_subdir=False, test_mode=False)[source]

ImageNet21k Dataset.

The ImageNet21k dataset is extremely big, containing 21k+ classes and 1.4B files. To save memory usage and loading time, this class improves the following points on the basis of the ImageNet class:

  • Delete the samples attribute

  • Use __slots__ to create a Data_item class to replace dict

  • Move setting the info dict from the load_annotations function to the prepare_data function

  • Use int instead of np.array(…, np.int64)

Parameters
  • data_prefix (str) – the prefix of data path

  • pipeline (list) – a list of dicts, where each element represents an operation defined in mmcls.datasets.pipelines

  • ann_file (str | None) – the annotation file. When ann_file is str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix

  • test_mode (bool) – whether in train mode or test mode

  • multi_label (bool) – use multi label or not

  • recursion_subdir (bool) – whether to use pictures in sub-directories that meet the conditions in the folder under the category directory

get_cat_ids(idx: int) → List[int][source]

Get category id by index.

Parameters

idx (int) – Index of data.

Returns

Image category of specified index.

Return type

cat_ids (List[int])

load_annotations()[source]

load dataset annotations.

class mmcls.datasets.MNIST(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

MNIST Dataset.

This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py

class mmcls.datasets.MultiLabelDataset(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]

Multi-label Dataset.

evaluate(results, metric='mAP', metric_options=None, logger=None, **deprecated_kwargs)[source]

Evaluate the dataset.

Parameters
  • results (list) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated. Default value is ‘mAP’. Options are ‘mAP’, ‘CP’, ‘CR’, ‘CF1’, ‘OP’, ‘OR’ and ‘OF1’.

  • metric_options (dict, optional) – Options for calculating metrics. Allowed keys are ‘k’ and ‘thr’. Defaults to None

  • logger (logging.Logger | str, optional) – Logger used for printing related information during evaluation. Defaults to None.

  • deprecated_kwargs (dict) – Used for containing deprecated arguments.

Returns

evaluation results

Return type

dict

get_cat_ids(idx: int) → List[int][source]

Get category ids by index.

Parameters

idx (int) – Index of data.

Returns

Image categories of specified index.

Return type

cat_ids (List[int])

class mmcls.datasets.RepeatDataset(dataset, times)[source]

A wrapper of repeated dataset.

The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

Parameters
  • dataset (Dataset) – The dataset to be repeated.

  • times (int) – Repeat times.

class mmcls.datasets.VOC(**kwargs)[source]

Pascal VOC Dataset.

load_annotations()[source]

Load annotations.

Returns

Annotation info from XML file.

Return type

list[dict]

mmcls.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, round_up=True, seed=None, pin_memory=True, persistent_workers=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • round_up (bool) – Whether to round up the length of dataset by adding extra samples to make it evenly divisible. Default: True.

  • pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the worker Dataset instances alive. This argument only takes effect in PyTorch>=1.7.0. Default: True

  • kwargs – any keyword argument to be used to initialize DataLoader

Returns

A PyTorch dataloader.

Return type

DataLoader
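
Example

A hedged, non-distributed sketch; the construction of dataset is assumed, and the 'img'/'gt_label' keys assume the usual Collect(keys=['img', 'gt_label']) pipeline:

>>> loader = build_dataloader(
...     dataset, samples_per_gpu=32, workers_per_gpu=2,
...     num_gpus=1, dist=False, shuffle=True)
>>> for data in loader:
...     imgs, labels = data['img'], data['gt_label']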

pipelines

class mmcls.datasets.pipelines.AutoAugment(policies, hparams={'pad_val': 128})[source]

Auto augmentation.

This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.

Parameters
  • policies (list[list[dict]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy, and is composed by several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.

  • hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.

class mmcls.datasets.pipelines.AutoContrast(prob=0.5)[source]

Auto adjust image contrast.

Parameters

prob (float) – The probability of performing auto contrast, which should be in range [0, 1]. Defaults to 0.5.

class mmcls.datasets.pipelines.Brightness(magnitude, prob=0.5, random_negative_prob=0.5)[source]

Adjust images brightness.

Parameters
  • magnitude (int | float) – The magnitude used for adjusting brightness. A positive magnitude would enhance the brightness and a negative magnitude would make the image darker. A magnitude=0 gives the original image.

  • prob (float) – The probability of performing brightness adjusting, which should be in range [0, 1]. Defaults to 0.5.

  • random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.

class mmcls.datasets.pipelines.CenterCrop(crop_size, efficientnet_style=False, crop_padding=32, interpolation='bilinear', backend='cv2')[source]

Center crop the image.

Parameters
  • crop_size (int | tuple) – Expected size after cropping with the format of (h, w).

  • efficientnet_style (bool) – Whether to use efficientnet style center crop. Defaults to False.

  • crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet style is True. Defaults to 32.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Only valid if efficientnet_style is True. Defaults to ‘bilinear’.

  • backend (str) – The image resize backend type, accepted values are cv2 and pillow. Only valid if efficientnet style is True. Defaults to cv2.

Notes

  • If the image is smaller than the crop size, return the original image.

  • If efficientnet_style is set to False, the pipeline would be a simple center crop using the crop_size.

  • If efficientnet_style is set to True, the pipeline will first perform the center crop with the crop_size_ computed as:

\[\text{crop\_size\_} = \frac{\text{crop\_size}}{\text{crop\_size} + \text{crop\_padding}} \times \text{short\_edge}\]

And then the pipeline resizes the img to the input crop size.
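
For example, with crop_size 224, crop_padding 32 and a short edge of 256, crop_size_ = 224 / (224 + 32) x 256 = 224, so the full 224 crop is taken from the 256-pixel short edge before the final resize to crop_size.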

class mmcls.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg'))[source]

Collect data from the loader relevant to the specific task.

This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img” and “gt_label”.

Parameters
  • keys (Sequence[str]) – Keys of results to be collected in data.

  • meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: (‘filename’, ‘ori_shape’, ‘img_shape’, ‘flip’, ‘flip_direction’, ‘img_norm_cfg’)

Returns

The result dict contains the following keys

  • keys in self.keys

  • img_metas if available

Return type

dict

class mmcls.datasets.pipelines.ColorJitter(brightness, contrast, saturation)[source]

Randomly change the brightness, contrast and saturation of an image.

Parameters
  • brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].

  • contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].

  • saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].

class mmcls.datasets.pipelines.ColorTransform(magnitude, prob=0.5, random_negative_prob=0.5)[source]

Adjust images color balance.

Parameters
  • magnitude (int | float) – The magnitude used for color transform. A positive magnitude would enhance the color and a negative magnitude would make the image grayer. A magnitude=0 gives the original image.

  • prob (float) – The probability of performing ColorTransform, which should be in range [0, 1]. Defaults to 0.5.

  • random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.

class mmcls.datasets.pipelines.Compose(transforms)[source]

Compose a data pipeline with a sequence of transforms.

Parameters

transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
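
Example

A hedged sketch of a typical classification pipeline built from config dicts; the transform arguments are illustrative and taken from the transforms documented below:

>>> pipeline = Compose([
...     dict(type='LoadImageFromFile'),
...     dict(type='RandomResizedCrop', size=224),
...     dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
...     dict(type='Normalize', mean=[123.675, 116.28, 103.53],
...          std=[58.395, 57.12, 57.375], to_rgb=True),
... ])
>>> results = pipeline(dict(img_prefix='data/', img_info=dict(filename='a.jpg')))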

class mmcls.datasets.pipelines.Contrast(magnitude, prob=0.5, random_negative_prob=0.5)[source]

Adjust images contrast.

Parameters
  • magnitude (int | float) – The magnitude used for adjusting contrast. A positive magnitude would enhance the contrast and a negative magnitude would make the image grayer. A magnitude=0 gives the original image.

  • prob (float) – The probability of performing contrast adjusting, which should be in range [0, 1]. Defaults to 0.5.

  • random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.

class mmcls.datasets.pipelines.Cutout(shape, pad_val=128, prob=0.5)[source]

Cutout images.

Parameters
  • shape (int | float | tuple(int | float)) – Expected cutout shape (h, w). If given as a single value, the value will be used for both h and w.

  • pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If it is a sequence, it must have the same length with the image channels. Defaults to 128.

  • prob (float) – The probability of performing cutout, which should be in range [0, 1]. Defaults to 0.5.

class mmcls.datasets.pipelines.Equalize(prob=0.5)[source]

Equalize the image histogram.

Parameters

prob (float) – The probability of performing equalize, which should be in range [0, 1]. Defaults to 0.5.

class mmcls.datasets.pipelines.Invert(prob=0.5)[source]

Invert images.

Parameters

prob (float) – The probability of performing invert, which should be in range [0, 1]. Defaults to 0.5.

class mmcls.datasets.pipelines.Lighting(eigval, eigvec, alphastd=0.1, to_rgb=True)[source]

Adjust images lighting using AlexNet-style PCA jitter.

Parameters
  • eigval (list) – the eigenvalues of the covariance matrix of pixel values.

  • eigvec (list[list]) – the eigenvectors of the covariance matrix of pixel values.

  • alphastd (float) – The standard deviation for distribution of alpha. Defaults to 0.1

  • to_rgb (bool) – Whether to convert img to rgb.

class mmcls.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Load an image from file.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.

  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

class mmcls.datasets.pipelines.Normalize(mean, std, to_rgb=True)[source]

Normalize the image.

Parameters
  • mean (sequence) – Mean values of 3 channels.

  • std (sequence) – Std values of 3 channels.

  • to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.

class mmcls.datasets.pipelines.Pad(size=None, pad_to_square=False, pad_val=0, padding_mode='constant')[source]

Pad images.

Parameters
  • size (tuple[int] | None) – Expected padding size (h, w). Conflicts with pad_to_square. Defaults to None.

  • pad_to_square (bool) – Pad any image to square shape. Defaults to False.

  • pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default to 0.

  • padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default to “constant”.

class mmcls.datasets.pipelines.Posterize(bits, prob=0.5)[source]

Posterize images (reduce the number of bits for each color channel).

Parameters
  • bits (int | float) – Number of bits for each pixel in the output img, which should be less than or equal to 8.

  • prob (float) – The probability of posterizing, which should be in range [0, 1]. Defaults to 0.5.

class mmcls.datasets.pipelines.RandAugment(policies, num_policies, magnitude_level, magnitude_std=0.0, total_level=30, hparams={'pad_val': 128})[source]

Random augmentation.

This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.

Parameters
  • policies (list[dict]) – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For augmentations that have a magnitude (whose argument is named differently in different augmentations), magnitude_key and magnitude_range shall be the name of the magnitude argument (str) and the range of the magnitude (a tuple in the format (val1, val2)), respectively. Note that val1 is not necessarily less than val2.

  • num_policies (int) – Number of policies to select from policies each time.

  • magnitude_level (int | float) – Magnitude level for all the augmentation selected.

  • total_level (int | float) – Total level for the magnitude. Defaults to 30.

  • magnitude_std (Number | str) –

    Deviation of magnitude noise applied.

    • If positive number, magnitude is sampled from normal distribution (mean=magnitude, std=magnitude_std).

    • If 0 or negative number, magnitude remains unchanged.

    • If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).

  • hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.

Note

magnitude_std introduces some randomness into the policy; this behavior is modified from https://github.com/rwightman/pytorch-image-models.

When magnitude_std=0, we calculate the magnitude as follows:

\[\text{magnitude} = \frac{\text{magnitude\_level}} {\text{total\_level}} \times (\text{val2} - \text{val1}) + \text{val1}\]
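
For example, with magnitude_level 10, total_level 30 and a policy whose magnitude_range is (0, 0.9), the resulting magnitude is 10 / 30 x (0.9 - 0) + 0 = 0.3.
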
class mmcls.datasets.pipelines.RandomCrop(size, padding=None, pad_if_needed=False, pad_val=0, padding_mode='constant')[source]

Crop the given Image at a random location.

Parameters
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

  • padding (int or sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively. Default: None, which means no padding.

  • pad_if_needed (bool) – It will pad the image if it is smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding is effectively done at a random offset. Default: False.

  • pad_val (Number | Sequence[Number]) – Pixel pad_val value for constant fill. If a tuple of length 3, it is used to pad_val R, G, B channels respectively. Default: 0.

  • padding_mode (str) –

    Type of padding. Defaults to “constant”. Should be one of the following:

    • constant: Pads with a constant value, this value is specified with pad_val.

    • edge: pads with the last value at the edge of the image.

    • reflect: Pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: Pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].

static get_params(img, output_size)[source]

Get parameters for crop for a random crop.

Parameters
  • img (ndarray) – Image to be cropped.

  • output_size (tuple) – Expected output size of the crop.

Returns

Params (xmin, ymin, target_height, target_width) to be passed to crop for a random crop.

Return type

tuple

class mmcls.datasets.pipelines.RandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)[source]

Randomly selects a rectangle region in an image and erases its pixels.

Parameters
  • erase_prob (float) – Probability that image will be randomly erased. Default: 0.5

  • min_area_ratio (float) – Minimum erased area / input image area Default: 0.02

  • max_area_ratio (float) – Maximum erased area / input image area Default: 0.4

  • aspect_range (sequence | float) – Aspect ratio range of the erased area. If float, it will be converted to (aspect_ratio, 1/aspect_ratio). Default: (3/10, 10/3)

  • mode (str) –

    Fill method in erased area, can be:

    • const (default): All pixels are assigned the same value.

    • rand: each pixel is assigned a random value in [0, 255]

  • fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).

  • fill_std (sequence | Number, optional) – If set and mode is ‘rand’, fill erased area with random color from normal distribution (mean=fill_color, std=fill_std); If not set, fill erased area with random color from uniform distribution (0~255). Defaults to None.

Note

See Random Erasing Data Augmentation

This paper provides 4 modes: RE-R, RE-M, RE-0 and RE-255, and uses RE-M as the default. The configs of these 4 modes are:

  • RE-R: RandomErasing(mode=’rand’)

  • RE-M: RandomErasing(mode=’const’, fill_color=(123.67, 116.3, 103.5))

  • RE-0: RandomErasing(mode=’const’, fill_color=0)

  • RE-255: RandomErasing(mode=’const’, fill_color=255)
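
A hedged config sketch of the RE-R mode inside a training pipeline; erase_prob, the area ratios and the fill statistics are illustrative:

>>> train_pipeline = [
...     dict(type='LoadImageFromFile'),
...     dict(type='RandomResizedCrop', size=224),
...     dict(type='RandomErasing', erase_prob=0.25, mode='rand',
...          min_area_ratio=0.02, max_area_ratio=0.3333,
...          fill_color=(103.53, 116.28, 123.675),
...          fill_std=(57.375, 57.12, 58.395)),
... ]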

class mmcls.datasets.pipelines.RandomFlip(flip_prob=0.5, direction='horizontal')[source]

Flip the image randomly.

Flip the image randomly based on flip probability and flip direction.

Parameters
  • flip_prob (float) – probability of the image being flipped. Default: 0.5

  • direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.

class mmcls.datasets.pipelines.RandomGrayscale(gray_prob=0.1)[source]

Randomly convert image to grayscale with a probability of gray_prob.

Parameters

gray_prob (float) – Probability that image should be converted to grayscale. Default: 0.1.

Returns

Image after randomly grayscale transform.

Return type

ndarray

Notes

  • If the input image has 1 channel: the grayscale version has 1 channel.

  • If the input image has 3 channels: the grayscale version has 3 channels with r == g == b.
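
These two probabilistic transforms are often chained in a training pipeline; a sketch using the default probabilities:

augmentations = [
    # Flip horizontally with probability 0.5.
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    # Convert to grayscale with probability 0.1.
    dict(type='RandomGrayscale', gray_prob=0.1),
]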

class mmcls.datasets.pipelines.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), max_attempts=10, efficientnet_style=False, min_covered=0.1, crop_padding=32, interpolation='bilinear', backend='cv2')[source]

Crop the given image to random size and aspect ratio.

A crop of random size (default: 0.08 to 1.0 of the original size) and of random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.

Parameters
  • size (sequence | int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

  • scale (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).

  • ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).

  • max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.

  • efficientnet_style (bool) – Whether to use an EfficientNet-style RandomResizedCrop. Defaults to False.

  • min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.

  • crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet_style is true. Defaults to 32.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.

  • backend (str) – The image resize backend type, accepted values are cv2 and pillow. Defaults to cv2.
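
A sketch of the common ImageNet-style usage, spelling out the defaults explicitly (the output size of 224 is a conventional choice, not a requirement of the class):

random_resized_crop = dict(
    type='RandomResizedCrop',
    size=224,                  # final crop is resized to 224x224
    scale=(0.08, 1.0),         # range of the random crop size relative to the original image
    ratio=(3. / 4., 4. / 3.),  # range of the random aspect ratio
    interpolation='bilinear',
)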

static get_params(img, scale, ratio, max_attempts=10)[source]

Get the parameters for a random sized crop.

Parameters
  • img (ndarray) – Image to be cropped.

  • scale (tuple) – Range of the random size of the cropped image compared to the original image size.

  • ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image.

  • max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.

Returns

Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.

Return type

tuple

static get_params_efficientnet_style(img, size, scale, ratio, max_attempts=10, min_covered=0.1, crop_padding=32)[source]

Get the parameters for a random sized crop in EfficientNet style.

Parameters
  • img (ndarray) – Image to be cropped.

  • size (sequence) – Desired output size of the crop.

  • scale (tuple) – Range of the random size of the cropped image compared to the original image size.

  • ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image.

  • max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.

  • min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.

  • crop_padding (int) – The crop padding parameter in efficientnet style center crop. Defaults to 32.

Returns

Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.

Return type

tuple

class mmcls.datasets.pipelines.Resize(size, interpolation='bilinear', adaptive_side='short', backend='cv2')[source]

Resize images.

Parameters
  • size (int | tuple) – Image scales for resizing (h, w). When size is an int, the default behavior is to resize the image to (size, size). When size is a tuple and the second value is -1, the image will be resized according to adaptive_side. For example, when size is 224, the image is resized to 224x224. When size is (224, -1) and adaptive_side is “short”, the short side is resized to 224 and the other side is computed from the short side, maintaining the aspect ratio.

  • interpolation (str) – Interpolation method. For “cv2” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”. For “pillow” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming”. More details can be found in mmcv.image.geometric.

  • adaptive_side (str) – Adaptive resize policy, accepted values are “short”, “long”, “height”, “width”. Defaults to “short”.

  • backend (str) – The image resize backend type, accepted values are cv2 and pillow. Default: cv2.
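
For example, the classic “resize the short side, keep the aspect ratio” preprocessing step can be written as below (a sketch; 256 is an arbitrary example value):

# The short side is resized to 256; the other side follows the aspect ratio.
resize_cfg = dict(type='Resize', size=(256, -1), adaptive_side='short', backend='cv2')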

class mmcls.datasets.pipelines.Rotate(angle, center=None, scale=1.0, pad_val=128, prob=0.5, random_negative_prob=0.5, interpolation='nearest')[source]

Rotate images.

Parameters
  • angle (float) – The angle used for rotation. Positive values stand for clockwise rotation.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If None, the center of the image will be used. Defaults to None.

  • scale (float) – Isotropic scale factor. Defaults to 1.0.

  • pad_val (int, Sequence[int]) – Pixel fill value for constant padding. If a sequence of length 3, it is used to fill the R, G, B channels respectively. Defaults to 128.

  • prob (float) – The probability of performing the rotation, which should be in the range [0, 1]. Defaults to 0.5.

  • random_negative_prob (float) – The probability of turning the angle negative, which should be in the range [0, 1]. Defaults to 0.5.

  • interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
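
A sketch of a rotation entry: with random_negative_prob=0.5 the 30-degree angle is negated half the time, so the image rotates clockwise or counter-clockwise with equal probability (the angle is an arbitrary example value):

rotate_cfg = dict(type='Rotate', angle=30., prob=0.5, random_negative_prob=0.5, pad_val=128)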

class mmcls.datasets.pipelines.Sharpness(magnitude, prob=0.5, random_negative_prob=0.5)[source]

Adjust images sharpness.

Parameters
  • magnitude (int | float) – The magnitude used for adjusting sharpness. A positive magnitude enhances the sharpness, while a negative magnitude blurs the image. A magnitude of 0 returns the original image.

  • prob (float) – The probability of performing the sharpness adjustment, which should be in the range [0, 1]. Defaults to 0.5.

  • random_negative_prob (float) – The probability of turning the magnitude negative, which should be in the range [0, 1]. Defaults to 0.5.
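
A sketch of a sharpness entry (the magnitude of 0.5 is an arbitrary example; when the random negation fires, the same entry blurs instead of sharpens):

sharpness_cfg = dict(type='Sharpness', magnitude=0.5, prob=0.5, random_negative_prob=0.5)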

class mmcls.datasets.pipelines.Shear(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='bicubic')[source]

Shear images.

Parameters
  • magnitude (int | float) – The magnitude used for shear.

  • pad_val (int, Sequence[int]) – Pixel fill value for constant padding. If a sequence of length 3, it is used to fill the R, G, B channels respectively. Defaults to 128.

  • prob (float) – The probability of performing the shear, which should be in the range [0, 1]. Defaults to 0.5.

  • direction (str) – The shearing direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.

  • random_negative_prob (float) – The probability of turning the magnitude negative, which should be in the range [0, 1]. Defaults to 0.5.

  • interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bicubic’.
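
A sketch of a horizontal shear entry (the magnitude of 0.3 is an arbitrary example value):

shear_cfg = dict(type='Shear', magnitude=0.3, direction='horizontal', prob=0.5, pad_val=128)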

class mmcls.datasets.pipelines.Solarize(thr, prob=0.5)[source]

Solarize images (invert all pixel values above a threshold).

Parameters
  • thr (int | float) – The threshold above which pixel values will be inverted.

  • prob (float) – The probability of performing the solarization, which should be in the range [0, 1]. Defaults to 0.5.

class mmcls.datasets.pipelines.SolarizeAdd(magnitude, thr=128, prob=0.5)[source]

SolarizeAdd images (add a certain value to pixels below a threshold).

Parameters
  • magnitude (int | float) – The value to be added to pixels below thr.

  • thr (int | float) – The threshold below which pixel values will be adjusted.

  • prob (float) – The probability of performing the solarization, which should be in the range [0, 1]. Defaults to 0.5.
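
To make the two transforms’ complementary thresholds concrete, a side-by-side sketch (the threshold and magnitude values are arbitrary examples):

solarize_variants = [
    # Invert pixel values above 128 with probability 0.5.
    dict(type='Solarize', thr=128, prob=0.5),
    # Add 30 to pixel values below 128 with probability 0.5.
    dict(type='SolarizeAdd', magnitude=30, thr=128, prob=0.5),
]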

class mmcls.datasets.pipelines.Translate(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='nearest')[source]

Translate images.

Parameters
  • magnitude (int | float) – The magnitude used for translation. Note that the offset is calculated as magnitude * size in the corresponding direction; with a magnitude of 1, the whole image is moved out of the frame.

  • pad_val (int, Sequence[int]) – Pixel fill value for constant padding. If a sequence of length 3, it is used to fill the R, G, B channels respectively. Defaults to 128.

  • prob (float) – The probability of performing the translation, which should be in the range [0, 1]. Defaults to 0.5.

  • direction (str) – The translating direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.

  • random_negative_prob (float) – The probability of turning the magnitude negative, which should be in the range [0, 1]. Defaults to 0.5.

  • interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
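
Since the offset is magnitude * size, a magnitude of 0.1 applied horizontally to a 224-pixel-wide image shifts it by roughly 22 pixels. A sketch (all values are arbitrary examples):

# Shift horizontally by 10% of the image width, filling the exposed area with 128.
translate_cfg = dict(type='Translate', magnitude=0.1, direction='horizontal', prob=0.5, pad_val=128)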

mmcls.datasets.pipelines.to_tensor(data)[source]

Convert objects of various python types to torch.Tensor.

Supported types are: numpy.ndarray, torch.Tensor, Sequence, int and float.
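
A minimal usage sketch covering two of the supported input types:

import numpy as np
from mmcls.datasets.pipelines import to_tensor

t1 = to_tensor(np.zeros((3, 224, 224), dtype=np.float32))  # from numpy.ndarray
t2 = to_tensor([1, 2, 3])                                  # from a sequence of ints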

mmcls.utils

mmcls.utils.collect_env()[source]

Collect the information of the running environments.
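
A quick sketch for dumping the collected environment information, assuming it is returned as a dict of name/value pairs:

from mmcls.utils import collect_env

for name, value in collect_env().items():
    print(f'{name}: {value}')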

mmcls.utils.load_json_logs(json_logs)[source]

Load and convert json_logs to log_dicts.

Parameters

json_logs (str) – Paths of the json logs.

Returns

Each key is an epoch, and each value is a sub dict. The keys of the sub dict are the different metrics, e.g. memory, bbox_mAP; the value of the sub dict is a list of the corresponding values of all iterations.

Return type

list[dict(int: dict())]
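
A usage sketch. The path is a placeholder, and following the plural “paths” in the description, a list of log files is passed; the 'loss' key is only an example of a metric name:

from mmcls.utils import load_json_logs

log_dicts = load_json_logs(['path/to/log.json'])
first_run = log_dicts[0]
# Each entry maps an epoch number to a dict of metric lists, e.g. first_run[1]['loss'].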
