mmcls.apis¶
 mmcls.apis.inference_model(model, img)[source]¶
Inference image(s) with the classifier.
 Parameters
model (nn.Module) – The loaded classifier.
img (str/ndarray) – The image filename or loaded image.
 Returns
 The classification results, containing
class_name, pred_label and pred_score.
 Return type
result (dict)
 mmcls.apis.init_model(config, checkpoint=None, device='cuda:0', options=None)[source]¶
Initialize a classifier from a config file.
 Parameters
config (str or mmcv.Config) – Config file path or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.
options (dict) – Options to override some settings in the used config.
 Returns
The constructed classifier.
 Return type
nn.Module
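For instance, a minimal sketch combining init_model and inference_model; the config and checkpoint paths below are hypothetical placeholders, not files shipped with this documentation:

from mmcls.apis import init_model, inference_model

# Hypothetical paths; substitute a real config and checkpoint.
config_file = 'configs/resnet/resnet50_b32x8_imagenet.py'
checkpoint_file = 'checkpoints/resnet50.pth'

# Build the classifier and load weights (CPU here to avoid assuming a GPU).
model = init_model(config_file, checkpoint_file, device='cpu')

# Single-image inference; the result is a dict of classification info.
result = inference_model(model, 'demo/demo.JPEG')
print(result)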
 mmcls.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[source]¶
Test model with multiple gpus.
This method tests the model with multiple GPUs and collects the results under two different modes: GPU and CPU. By setting 'gpu_collect=True' it encodes results to GPU tensors and uses GPU communication for results collection. In CPU mode it saves the results on different GPUs to 'tmpdir' and collects them by the rank 0 worker.
 Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.DataLoader) – PyTorch data loader.
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
 Returns
The prediction results.
 Return type
list
 mmcls.apis.set_random_seed(seed, deterministic=False)[source]¶
Set random seed.
 Parameters
seed (int) – Seed to be used.
deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
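For example, to make runs reproducible at the cost of some CUDNN speed:

from mmcls.apis import set_random_seed

# Seeds Python, NumPy and PyTorch RNGs; deterministic=True also fixes CUDNN.
set_random_seed(42, deterministic=True)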
 mmcls.apis.show_result_pyplot(model, img, result, fig_size=(15, 10), title='result', wait_time=0)[source]¶
Visualize the classification results on the image.
 Parameters
model (nn.Module) – The loaded classifier.
img (str or np.ndarray) – Image filename or loaded image.
result (list) – The classification result.
fig_size (tuple) – Figure size of the pyplot figure. Defaults to (15, 10).
title (str) – Title of the pyplot figure. Defaults to ‘result’.
wait_time (int) – How many seconds to display the image. Defaults to 0.
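Continuing the inference sketch above, the result dict can be drawn onto the image (the image path is again a placeholder):

from mmcls.apis import show_result_pyplot

# Overlay the predicted class and score on the input image.
show_result_pyplot(model, 'demo/demo.JPEG', result, fig_size=(15, 10))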
mmcls.core¶
evaluation¶
 class mmcls.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, by_epoch=True, **eval_kwargs)[source]¶
Distributed evaluation hook.
 Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
tmpdir (str, optional) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
 class mmcls.core.evaluation.EvalHook(dataloader, interval=1, by_epoch=True, **eval_kwargs)[source]¶
Evaluation hook.
 Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
 mmcls.core.evaluation.average_performance(pred, target, thr=None, k=None)[source]¶
Calculate CP, CR, CF1, OP, OR, OF1, where C stands for per-class average, O stands for overall average, P stands for precision, R stands for recall and F1 stands for F1-score.
 Parameters
pred (torch.Tensor | np.ndarray) – The model prediction with shape (N, C), where C is the number of classes.
target (torch.Tensor | np.ndarray) – The target of each prediction with shape (N, C), where C is the number of classes. 1 stands for positive examples, 0 stands for negative examples and -1 stands for difficult examples.
thr (float) – The confidence threshold. Defaults to None.
k (int) – Top-k performance. Note that if thr and k are both given, k will be ignored. Defaults to None.
 Returns
(CP, CR, CF1, OP, OR, OF1)
 Return type
tuple
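A small sketch with hand-made multi-label data (values chosen only for illustration):

import numpy as np
from mmcls.core.evaluation import average_performance

# Two samples, three classes; scores above thr=0.5 count as positive.
pred = np.array([[0.9, 0.1, 0.4],
                 [0.2, 0.8, 0.6]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

CP, CR, CF1, OP, OR, OF1 = average_performance(pred, target, thr=0.5)
# Every thresholded prediction matches its target here, so each of the
# six metrics should come out at its maximum value.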
 mmcls.core.evaluation.average_precision(pred, target)[source]¶
Calculate the average precision for a single class.
AP summarizes a precision-recall curve as the weighted mean of maximum precisions obtained for any r' > r, where r is the recall:
\[\text{AP} = \sum_n (R_n - R_{n-1}) P_n\]
Note that no approximation is involved since the curve is piecewise constant.
 Parameters
pred (np.ndarray) – The model prediction with shape (N, ).
target (np.ndarray) – The target of each prediction with shape (N, ).
 Returns
a single float as average precision value.
 Return type
float
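A tiny single-class sketch (scores and labels are made up for illustration):

import numpy as np
from mmcls.core.evaluation import average_precision

# Scores for four samples of one class, with binary ground truth.
pred = np.array([0.9, 0.8, 0.3, 0.2])
target = np.array([1, 0, 1, 0])

ap = average_precision(pred, target)
# Positives sit at ranks 1 and 3 after sorting by score,
# so AP = (1/1 + 2/3) / 2, roughly 0.83.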
 mmcls.core.evaluation.calculate_confusion_matrix(pred, target)[source]¶
Calculate confusion matrix according to the prediction and target.
 Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
 Returns
 Confusion matrix with shape (C, C), where C is the number of classes.
 Return type
torch.Tensor
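A quick sketch; exactly how rows and columns map to true vs. predicted labels follows the mmcls implementation, so treat the orientation comment below as an assumption:

import torch
from mmcls.core.evaluation import calculate_confusion_matrix

# Three samples, three classes; predictions are per-class scores.
pred = torch.tensor([[0.8, 0.1, 0.1],
                     [0.1, 0.7, 0.2],
                     [0.3, 0.3, 0.4]])
target = torch.tensor([0, 1, 1])

matrix = calculate_confusion_matrix(pred, target)
print(matrix.shape)  # torch.Size([3, 3])
# Sample 3 is labelled 1 but argmax-predicted as 2, so one off-diagonal
# entry is non-zero (assumed orientation: row = true, column = predicted).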
 mmcls.core.evaluation.f1_score(pred, target, average_mode='macro', thrs=0.0)[source]¶
Calculate F1 score according to the prediction and target.
 Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
 Returns
F1 score.
 Return type
float | np.array | list[float | np.array]
The return type depends on the arguments:

Args                    | thrs is number | thrs is tuple
------------------------|----------------|---------------
average_mode = "macro"  | float          | list[float]
average_mode = "none"   | np.array       | list[np.array]
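A sketch showing how average_mode and thrs change the return type (random data, so the values themselves are not meaningful):

import torch
from mmcls.core.evaluation import f1_score

pred = torch.rand(16, 5)                 # 16 samples, 5 classes
target = torch.randint(0, 5, (16,))

macro = f1_score(pred, target)                           # single float
per_class = f1_score(pred, target, average_mode='none')  # np.array, shape (5,)
at_thrs = f1_score(pred, target, thrs=(0.3, 0.5))        # list of two floats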
 mmcls.core.evaluation.mAP(pred, target)[source]¶
Calculate the mean average precision with respect of classes.
 Parameters
pred (torch.Tensor | np.ndarray) – The model prediction with shape (N, C), where C is the number of classes.
target (torch.Tensor | np.ndarray) – The target of each prediction with shape (N, C), where C is the number of classes. 1 stands for positive examples, 0 stands for negative examples and -1 stands for difficult examples.
 Returns
A single float as mAP value.
 Return type
float
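A minimal multi-label sketch (same toy data as for average_performance above):

import numpy as np
from mmcls.core.evaluation import mAP

pred = np.array([[0.9, 0.1, 0.4],
                 [0.2, 0.8, 0.6]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

value = mAP(pred, target)  # mean of the per-class average precisions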
 mmcls.core.evaluation.precision(pred, target, average_mode='macro', thrs=0.0)[source]¶
Calculate precision according to the prediction and target.
 Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
 Returns
Precision.
 Return type
float | np.array | list[float | np.array]
The return type depends on the arguments:

Args                    | thrs is number | thrs is tuple
------------------------|----------------|---------------
average_mode = "macro"  | float          | list[float]
average_mode = "none"   | np.array       | list[np.array]
 mmcls.core.evaluation.precision_recall_f1(pred, target, average_mode='macro', thrs=0.0)[source]¶
Calculate precision, recall and f1 score according to the prediction and target.
 Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
 Returns
tuple containing precision, recall, f1 score.
The type of precision, recall, f1 score is one of the following:
Args                    | thrs is number | thrs is tuple
------------------------|----------------|---------------
average_mode = "macro"  | float          | list[float]
average_mode = "none"   | np.array       | list[np.array]
 Return type
tuple
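A short sketch returning all three metrics at once (random data):

import torch
from mmcls.core.evaluation import precision_recall_f1

pred = torch.rand(16, 5)
target = torch.randint(0, 5, (16,))

p, r, f1 = precision_recall_f1(pred, target, average_mode='macro')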
 mmcls.core.evaluation.recall(pred, target, average_mode='macro', thrs=0.0)[source]¶
Calculate recall according to the prediction and target.
 Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted mean. Defaults to ‘macro’.
thrs (Number | tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
 Returns
Recall.
 Return type
float | np.array | list[float | np.array]
The return type depends on the arguments:

Args                    | thrs is number | thrs is tuple
------------------------|----------------|---------------
average_mode = "macro"  | float          | list[float]
average_mode = "none"   | np.array       | list[np.array]
 mmcls.core.evaluation.support(pred, target, average_mode='macro')[source]¶
Calculate the total number of occurrences of each label according to the prediction and target.
 Parameters
pred (torch.Tensor | np.array) – The model prediction with shape (N, C).
target (torch.Tensor | np.array) – The target of each prediction with shape (N, 1) or (N,).
average_mode (str) – The type of averaging performed on the result. Options are ‘macro’ and ‘none’. If ‘none’, the scores for each class are returned. If ‘macro’, calculate metrics for each class, and find their unweighted sum. Defaults to ‘macro’.
 Returns
Support.
If the average_mode is set to "macro", the function returns a single float.
If the average_mode is set to "none", the function returns an np.array with shape C.
 Return type
float | np.array
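For instance (random labels, so the counts vary per run):

import torch
from mmcls.core.evaluation import support

pred = torch.rand(16, 5)
target = torch.randint(0, 5, (16,))

total = support(pred, target)                           # single float
per_class = support(pred, target, average_mode='none')  # occurrences per class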
mmcls.models¶
models¶
classifiers¶
 class mmcls.models.classifiers.BaseClassifier(init_cfg=None)[source]¶
Base class for classifiers.
 forward(img, return_loss=True, **kwargs)[source]¶
Calls either forward_train or forward_test depending on whether return_loss is True.
Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double-nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test-time augmentations.
 forward_test(imgs, **kwargs)[source]¶
 Parameters
imgs (List[Tensor]) – The outer list indicates test-time augmentations and the inner Tensor should have shape NxCxHxW, which contains all images in the batch.
 abstract forward_train(imgs, **kwargs)[source]¶
 Parameters
img (list[Tensor]) – List of tensors of shape (1, C, H, W). Typically these should be mean centered and std scaled.
kwargs (keyword arguments) – Specific to concrete implementation.
 show_result(img, result, text_color='white', font_scale=0.5, row_width=20, show=False, fig_size=(15, 10), win_name='', wait_time=0, out_file=None)[source]¶
Draw result over img.
 Parameters
img (str or ndarray) – The image to be displayed.
result (dict) – The classification results to draw over img.
text_color (str or tuple or Color) – Color of texts.
font_scale (float) – Font scales of texts.
row_width (int) – width between each row of results on the image.
show (bool) – Whether to show the image. Default: False.
fig_size (tuple) – Image show figure size. Defaults to (15, 10).
win_name (str) – The window name.
wait_time (int) – How many seconds to display the image. Defaults to 0.
out_file (str or None) – The filename to write the image. Default: None.
 Returns
Image with overlaid results.
 Return type
img (ndarray)
 train_step(data, optimizer=None, **kwargs)[source]¶
The iteration step during training.
This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating are also defined in this method, such as GAN.
 Parameters
data (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict, optional) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.
 Returns
 Dict of outputs. The following fields are contained.
loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.
log_vars (dict): Dict contains all the variables to be sent to the logger.
num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.
 Return type
dict
 val_step(data, optimizer=None, **kwargs)[source]¶
The iteration step during validation.
This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.
 Parameters
data (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict, optional) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.
 Returns
 Dict of outputs. The following fields are contained.
loss (torch.Tensor): A tensor for back propagation, which can be a weighted sum of multiple losses.
log_vars (dict): Dict contains all the variables to be sent to the logger.
num_samples (int): Indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.
 Return type
dict
 class mmcls.models.classifiers.ImageClassifier(backbone, neck=None, head=None, pretrained=None, train_cfg=None, init_cfg=None)[source]¶

 forward_train(img, gt_label, **kwargs)[source]¶
Forward computation during training.
 Parameters
img (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.
gt_label (Tensor) – It should be of shape (N, 1) encoding the ground-truth label of input images for single-label tasks, and of shape (N, C) encoding the ground-truth label of input images for multi-label tasks.
 Returns
a dictionary of loss components
 Return type
dict[str, Tensor]
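A hedged sketch of assembling an ImageClassifier from a config dict and calling forward_train. The config keys mirror the classes documented in this section; in_channels=512 assumes ResNet-18's final stage outputs 512 channels:

import torch
from mmcls.models import build_classifier

# Minimal classifier config: ResNet-18 backbone, GAP neck, linear head.
model_cfg = dict(
    type='ImageClassifier',
    backbone=dict(type='ResNet', depth=18, out_indices=(3, )),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(type='LinearClsHead', num_classes=10, in_channels=512))
model = build_classifier(model_cfg)

imgs = torch.rand(2, 3, 224, 224)             # a batch of two images
gt_label = torch.randint(0, 10, (2,))         # single-label targets
losses = model.forward_train(imgs, gt_label)  # dict of loss components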
backbones¶
 class mmcls.models.backbones.AlexNet(num_classes=-1)[source]¶
AlexNet backbone.
The input for AlexNet is a 224x224 RGB image.
 Parameters
num_classes (int) – Number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
 class mmcls.models.backbones.LeNet5(num_classes=-1)[source]¶
LeNet5 backbone.
The input for LeNet5 is a 32×32 grayscale image.
 Parameters
num_classes (int) – Number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
 class mmcls.models.backbones.MlpMixer(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_cfg={}, layer_cfgs={}, init_cfg=None)[source]¶
MlpMixer backbone.
A PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision
 Parameters
arch (str | dict) – MLP Mixer architecture. Defaults to 'b'.
img_size (int | tuple) – Input image size.
patch_size (int | tuple) – The patch size.
out_indices (Sequence | int) – Output from which layer. Defaults to -1, means the last layer.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
act_cfg (dict) – The activation config for FFNs. Default GELU.
patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.
layer_cfgs (Sequence | dict) – Configs of each mixer block layer. Defaults to an empty dict.
init_cfg (dict, optional) – Initialization config dict. Defaults to None.
 class mmcls.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
MobileNetV2 backbone.
 Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
 forward(x)[source]¶
Forward computation.
 Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
 make_layer(out_channels, num_blocks, stride, expand_ratio)[source]¶
Stack InvertedResidual blocks to build a layer for MobileNetV2.
 Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.
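A quick feature-extraction sketch; the 1280-channel, 7x7 output shape assumes the default out_indices=(7, ) and a 224x224 input:

import torch
from mmcls.models.backbones import MobileNetV2

model = MobileNetV2(widen_factor=1.0, out_indices=(7, ))
model.eval()

inputs = torch.rand(1, 3, 224, 224)
outs = model(inputs)
# One feature map per requested stage; stage 7 is the final 1280-channel
# conv output at 1/32 resolution, i.e. (1, 1280, 7, 7) here.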
 class mmcls.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.01, 'type': 'BN'}, out_indices=None, frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d'], 'nonlinearity': 'leaky_relu'}, {'type': 'Normal', 'layer': ['Linear'], 'std': 0.01}, {'type': 'Constant', 'layer': ['BatchNorm2d'], 'val': 1}])[source]¶
MobileNetV3 backbone.
 Parameters
arch (str) – Architecture of MobileNetV3, from {small, large}. Default: small.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
out_indices (None or Sequence[int]) – Output from which stages. Default: None, which means output tensors from final stage.
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
 class mmcls.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=None)[source]¶
RegNet backbone.
More details can be found in the paper.
 Parameters
arch (dict) – The parameter of RegNets, containing:
w0 (int): Initial width.
wa (float): Slope of width.
wm (float): Quantization parameter to quantize the width.
depth (int): Depth of the backbone.
group_w (int): Width of group.
bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Default: 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: "pytorch".
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Example
>>> from mmcls.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
 adjust_width_group(widths, bottleneck_ratio, groups)[source]¶
Adjusts the compatibility of widths and groups.
 Parameters
widths (list[int]) – Width of each stage.
bottleneck_ratio (float) – Bottleneck ratio.
groups (int) – number of groups in each stage
 Returns
The adjusted widths and groups of each stage.
 Return type
tuple(list)
 forward(x)[source]¶
Forward computation.
 Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
 generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]¶
Generates per block width from RegNet parameters.
 Parameters
initial_width (int) – Initial width of the backbone.
width_slope (float) – Slope of the quantized linear function.
width_parameter (int) – Parameter used to quantize the width.
depth (int) – Depth of the backbone.
divisor (int) – The divisor of channels. Defaults to 8.
 Returns
 tuple containing:
list: Widths of each stage.
int: The number of stages.
 Return type
tuple
 class mmcls.models.backbones.RepVGG(arch, in_channels=3, base_channels=64, out_indices=(3, ), strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False, deploy=False, norm_eval=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
RepVGG backbone.
A PyTorch implementation of: RepVGG: Making VGG-style ConvNets Great Again
 Parameters
arch (str | dict) –
The parameter of RepVGG. If it’s a dict, it should contain the following keys:
num_blocks (Sequence[int]): Number of blocks in each stage.
width_factor (Sequence[float]): Width deflator in each stage.
group_layer_map (dict | None): RepVGG Block that declares the need to apply group convolution.
se_cfg (dict | None): SE Layer config.
in_channels (int) – Number of input image channels. Default: 3.
base_channels (int) – Base channels of RepVGG backbone, work with width_factor together. Default: 64.
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
strides (Sequence[int]) – Strides of the first block of each stage. Default: (2, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
deploy (bool) – Whether to switch the model structure to deployment mode. Default: False.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict.
 class mmcls.models.backbones.Res2Net(scales=4, base_width=26, style='pytorch', deep_stem=True, avg_down=True, init_cfg=None, **kwargs)[source]¶
Res2Net backbone.
A PyTorch implementation of: Res2Net: A New Multi-scale Backbone Architecture
 Parameters
depth (int) – Depth of Res2Net, choose from {50, 101, 152}.
scales (int) – Scales used in Res2Net. Defaults to 4.
base_width (int) – Basic width of each scale. Defaults to 26.
in_channels (int) – Number of input image channels. Defaults to 3.
num_stages (int) – Number of Res2Net stages. Defaults to 4.
strides (Sequence[int]) – Strides of the first block of each stage. Defaults to (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Defaults to (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. Defaults to (3, ).
style (str) – "pytorch" or "caffe". If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Defaults to "pytorch".
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Defaults to True.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottle2neck. Defaults to True.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Defaults to True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmcls.models import Res2Net
>>> import torch
>>> model = Res2Net(depth=50,
...                 scales=4,
...                 base_width=26,
...                 out_indices=(0, 1, 2, 3))
>>> model.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = model.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
 class mmcls.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]¶
ResNeSt backbone.
Please refer to the paper for details.
 Parameters
depth (int) – Network depth, from {50, 101, 152, 200}.
groups (int) – Groups of conv2 in Bottleneck. Default: 1.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
radix (int) – Radix of SplitAttentionConv2d. Default: 2.
reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
 class mmcls.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]¶
ResNeXt backbone.
Please refer to the paper for details.
 Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
 class mmcls.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]¶
ResNet backbone.
Please refer to the paper for details.
 Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Example
>>> from mmcls.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
 class mmcls.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d backbone.
This variant is described in Bag of Tricks.
Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
 class mmcls.models.backbones.ResNet_CIFAR(depth, deep_stem=False, **kwargs)[source]¶
ResNet backbone for CIFAR.
Compared to standard ResNet, it uses kernel_size=3 and stride=1 in conv1, and does not apply MaxPooling after the stem. It has been proven to be more efficient than standard ResNet in other public codebases, e.g., https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py.
 Parameters
depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – This network has a specifically designed stem, thus it is asserted to be False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
 class mmcls.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]¶
SEResNeXt backbone.
Please refer to the paper for details.
 Parameters
depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
 class mmcls.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]¶
SEResNet backbone.
Please refer to the paper for details.
 Parameters
depth (int) – Network depth, from {50, 101, 152}.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).
style (str) – pytorch or caffe. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
Example
>>> from mmcls.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)
 class mmcls.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=None)[source]¶
ShuffleNetV1 backbone.
 Parameters
groups (int) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.
widen_factor (float) – Width multiplier, which adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
 forward(x)[source]¶
Forward computation.
 Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
 make_layer(out_channels, num_blocks, first_block=False)[source]¶
Stack ShuffleUnit blocks to make a layer.
 Parameters
out_channels (int) – out_channels of the block.
num_blocks (int) – Number of blocks.
first_block (bool) – Whether it is the first ShuffleUnit of a sequence of ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.
 class mmcls.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=None)[source]¶
ShuffleNetV2 backbone.
 Parameters
widen_factor (float) – Width multiplier, which adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
 class mmcls.models.backbones.SwinTransformer(arch='T', img_size=224, in_channels=3, drop_rate=0.0, drop_path_rate=0.1, out_indices=(3, ), use_abs_pos_embed=False, auto_pad=False, with_cp=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, patch_cfg={}, init_cfg=None)[source]¶
Swin Transformer. A PyTorch implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Inspiration from https://github.com/microsoft/Swin-Transformer
 Parameters
arch (str | dict) – Swin Transformer architecture. Defaults to 'T'.
img_size (int | tuple) – The size of input image. Defaults to 224.
in_channels (int) – The num of input channels. Defaults to 3.
drop_rate (float) – Dropout rate after embedding. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.
with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.
auto_pad (bool) – If True, auto pad feature map to fit window_size. Defaults to False.
norm_cfg (dict, optional) – Config dict for normalization layer at end of backbone. Defaults to dict(type='LN').
stage_cfgs (Sequence | dict, optional) – Extra config dict for each stage. Defaults to empty dict.
patch_cfg (dict, optional) – Extra config dict for patch embedding. Defaults to empty dict.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
Examples
>>> from mmcls.models import SwinTransformer
>>> import torch
>>> extra_config = dict(
>>>     arch='tiny',
>>>     stage_cfgs=dict(downsample_cfg={'kernel_size': 3,
>>>                                     'expansion_ratio': 3}),
>>>     auto_pad=True)
>>> self = SwinTransformer(**extra_config)
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(output.shape)
(1, 2592, 4)
 class mmcls.models.backbones.T2T_ViT(img_size=224, in_channels=3, embed_dims=384, t2t_cfg={}, drop_rate=0.0, num_layers=14, out_indices=-1, layer_cfgs={}, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, final_norm=True, output_cls_token=True, init_cfg=None)[source]¶
Tokens-to-Token Vision Transformer (T2T-ViT)
A PyTorch implementation of Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet (https://arxiv.org/abs/2101.11986).
 Parameters
img_size (int) – Input image size.
in_channels (int) – Number of input channels.
embed_dims (int) – Embedding dimension.
t2t_cfg (dict) – Extra config of TokenstoToken module. Defaults to an empty dict.
drop_rate (float) – Dropout rate after position embedding. Defaults to 0.
num_layers (int) – Num of transformer layers in encoder. Defaults to 14.
out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.
layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.
drop_path_rate (float) – stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.
output_cls_token (bool) – Whether to output the cls_token. Defaults to True.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
 class mmcls.models.backbones.TIMMBackbone(model_name, pretrained=False, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]¶
Wrapper to use backbones from the timm library. More details can be found in timm.
 Parameters
model_name (str) – Name of timm model to instantiate.
pretrained (bool) – Load pretrained weights if True.
checkpoint_path (str) – Path of checkpoint to load after model is initialized.
in_channels (int) – Number of input image channels. Default: 3.
init_cfg (dict, optional) – Initialization config dict
**kwargs – Other timm & model specific arguments.
 class mmcls.models.backbones.TNT(arch='b', img_size=224, patch_size=16, in_channels=3, ffn_ratio=4, qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, first_stride=4, num_fcs=2, init_cfg=[{'type': 'TruncNormal', 'layer': 'Linear', 'std': 0.02}, {'type': 'Constant', 'layer': 'LayerNorm', 'val': 1.0, 'bias': 0.0}])[source]¶
Transformer in Transformer. A PyTorch implementation of: Transformer in Transformer
Inspiration from https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/tnt.py
 Parameters
arch (str | dict) – Vision Transformer architecture. Default: 'b'.
img_size (int | tuple) – Input image size. Default to 224.
patch_size (int | tuple) – The patch size. Default to 16.
in_channels (int) – Number of input channels. Default to 3.
ffn_ratio (int) – A ratio to calculate the hidden_dims in ffn layer. Default: 4.
qkv_bias (bool) – Enable bias for qkv if True. Default False
drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Default 0.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.
drop_path_rate (float) – stochastic depth rate. Default 0.
act_cfg (dict) – The activation config for FFNs. Defaults to GELU.
norm_cfg (dict) – Config dict for normalization layer. Default layer normalization
first_stride (int) – The stride of the conv2d layer. We use a conv2d layer and a unfold layer to implement image to pixel embedding.
num_fcs (int) – The number of fullyconnected layers for FFNs. Default 2
init_cfg (dict, optional) – Initialization config dict
 class mmcls.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1.0, 'layer': ['_BatchNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]¶
VGG backbone.
 Parameters
depth (int) – Depth of VGG, from {11, 13, 16, 19}.
with_norm (bool) – Use BatchNorm or not.
num_classes (int) – number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int], optional) – Output from which stages. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), output the last feature map before classifier. If num_classes > 0, the default value is (5, ), output the classification score. Default: None.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.
with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
 class mmcls.models.backbones.VisionTransformer(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, output_cls_token=True, interpolate_mode='bicubic', patch_cfg={}, layer_cfgs={}, init_cfg=None)[source]¶
Vision Transformer.
A PyTorch implementation of: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (https://arxiv.org/abs/2010.11929).
 Parameters
arch (str | dict) – Vision Transformer architecture. Default: 'b'.
img_size (int | tuple) – Input image size.
patch_size (int | tuple) – The patch size.
out_indices (Sequence | int) – Output from which stages. Defaults to -1, means the last stage.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.
output_cls_token (bool) – Whether to output the cls_token. If set True, with_cls_token must be True. Defaults to True.
interpolate_mode (str) – Select the interpolate mode for position embedding vector resize. Defaults to "bicubic".
patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.
layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.
init_cfg (dict, optional) – Initialization config dict. Defaults to None.
 forward(x)[source]¶
Forward computation.
 Parameters
x (tensor | tuple[tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.
 static resize_pos_embed(pos_embed, src_shape, dst_shape, mode='bicubic')[source]¶
Resize pos_embed weights.
 Parameters
pos_embed (torch.Tensor) – Position embedding weights with shape [1, L, C].
src_shape (tuple) – The resolution of downsampled origin training image.
dst_shape (tuple) – The resolution of downsampled new training image.
mode (str) – Algorithm used for upsampling: 'nearest', 'linear', 'bilinear', 'bicubic' or 'trilinear'. Default: 'bicubic'.
 Returns
The resized pos_embed of shape [1, L_new, C]
 Return type
torch.Tensor
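A hedged sketch; the sequence length is assumed to include a class token (14*14 patches + 1), matching the [1, L, C] shape above:

import torch
from mmcls.models.backbones import VisionTransformer

# Position embedding for a 14x14 patch grid plus one class token.
pos_embed = torch.rand(1, 14 * 14 + 1, 768)

resized = VisionTransformer.resize_pos_embed(
    pos_embed, src_shape=(14, 14), dst_shape=(16, 16), mode='bicubic')
print(resized.shape)  # expect torch.Size([1, 257, 768]) under this assumption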
necks¶
 class mmcls.models.necks.GlobalAveragePooling(dim=2)[source]¶
Global Average Pooling neck.
Note that we use view to remove extra channel after pooling. We do not use squeeze as it will also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.
 Parameters
dim (int) – Dimensions of each sample channel, can be one of {1, 2, 3}. Default: 2
 forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
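A small sketch of the neck on a single backbone feature map:

import torch
from mmcls.models.necks import GlobalAveragePooling

neck = GlobalAveragePooling(dim=2)   # 2D pooling for (N, C, H, W) maps
feat = torch.rand(4, 512, 7, 7)      # e.g. a ResNet-18 final-stage output
out = neck(feat)
print(out.shape)                     # torch.Size([4, 512])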
heads¶
 class mmcls.models.heads.ClsHead(loss={'loss_weight': 1.0, 'type': 'CrossEntropyLoss'}, topk=(1, ), cal_acc=False, init_cfg=None)[source]¶
Classification head.
 Parameters
loss (dict) – Config of classification loss.
topk (int | tuple) – Top-k accuracy.
cal_acc (bool) – Whether to calculate accuracy during training. If you use Mixup/CutMix or something like that during training, it is not reasonable to calculate accuracy. Defaults to False.
 class mmcls.models.heads.LinearClsHead(num_classes, in_channels, init_cfg={'layer': 'Linear', 'std': 0.01, 'type': 'Normal'}, *args, **kwargs)[source]¶
Linear classifier head.
 Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
init_cfg (dict | optional) – The extra init config of layers. Defaults to use dict(type='Normal', layer='Linear', std=0.01).
 class mmcls.models.heads.MultiLabelClsHead(loss={'loss_weight': 1.0, 'reduction': 'mean', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg=None)[source]¶
Classification head for multilabel task.
 Parameters
loss (dict) – Config of classification loss.
 class mmcls.models.heads.MultiLabelLinearClsHead(num_classes, in_channels, loss={'loss_weight': 1.0, 'reduction': 'mean', 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg={'layer': 'Linear', 'std': 0.01, 'type': 'Normal'})[source]¶
Linear classification head for multilabel task.
 Parameters
num_classes (int) – Number of categories.
in_channels (int) – Number of channels in the input feature map.
loss (dict) – Config of classification loss.
init_cfg (dict | optional) – The extra init config of layers. Defaults to use dict(type='Normal', layer='Linear', std=0.01).
 class mmcls.models.heads.StackedLinearClsHead(num_classes: int, in_channels: int, mid_channels: Sequence, dropout_rate: float = 0.0, norm_cfg: Optional[Dict] = None, act_cfg: Dict = {'type': 'ReLU'}, **kwargs)[source]¶
Classifier head with several hidden fc layer and a output fc layer.
 Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
mid_channels (Sequence) – Number of channels in the hidden fc layers.
dropout_rate (float) – Dropout rate after each hidden fc layer, except the last layer. Defaults to 0.
norm_cfg (dict, optional) – Config dict of normalization layer after each hidden fc layer, except the last layer. Defaults to None.
act_cfg (dict, optional) – Config dict of activation function after each hidden layer, except the last layer. Defaults to use “ReLU”.
 class mmcls.models.heads.VisionTransformerClsHead(num_classes, in_channels, hidden_dim=None, act_cfg={'type': 'Tanh'}, init_cfg={'layer': 'Linear', 'type': 'Constant', 'val': 0}, *args, **kwargs)[source]¶
Vision Transformer classifier head.
 Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
hidden_dim (int) – Number of the dimensions for hidden layer. Only available during pretraining. Default None.
act_cfg (dict) – The activation config. Only available during pretraining. Defaults to Tanh.
losses¶
 class mmcls.models.losses.AsymmetricLoss(gamma_pos=0.0, gamma_neg=4.0, clip=0.05, reduction='mean', loss_weight=1.0)[source]¶
Asymmetric loss.
 Parameters
gamma_pos (float) – Positive focusing parameter. Defaults to 0.0.
gamma_neg (float) – Negative focusing parameter. We usually set gamma_neg > gamma_pos. Defaults to 4.0.
clip (float, optional) – Probability margin. Defaults to 0.05.
reduction (str) – The method used to reduce the loss into a scalar.
loss_weight (float) – Weight of loss. Defaults to 1.0.
 class mmcls.models.losses.CrossEntropyLoss(use_sigmoid=False, use_soft=False, reduction='mean', loss_weight=1.0, class_weight=None, pos_weight=None)[source]¶
Cross entropy loss.
 Parameters
use_sigmoid (bool) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.
use_soft (bool) – Whether to use the soft version of CrossEntropyLoss. Defaults to False.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
loss_weight (float) – Weight of the loss. Defaults to 1.0.
class_weight (List[float], optional) – The weight for each class with shape (C), C is the number of classes. Default None.
pos_weight (List[float], optional) – The positive weight for each class with shape (C), C is the number of classes. Only enabled in BCE loss when use_sigmoid is True. Default None.
 forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
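A minimal sketch of calling the loss module directly (the shapes and values are illustrative assumptions):
>>> import torch
>>> from mmcls.models.losses import CrossEntropyLoss
>>> criterion = CrossEntropyLoss(use_sigmoid=False, reduction='mean', loss_weight=1.0)
>>> cls_score = torch.randn(8, 5)       # raw logits with shape (N, C)
>>> label = torch.randint(0, 5, (8, ))  # class indices with shape (N, )
>>> loss = criterion(cls_score, label)  # a scalar tensor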
 class mmcls.models.losses.FocalLoss(gamma=2.0, alpha=0.25, reduction='mean', loss_weight=1.0)[source]¶
Focal loss.
 Parameters
gamma (float) – Focusing parameter in focal loss. Defaults to 2.0.
alpha (float) – The parameter in balanced form of focal loss. Defaults to 0.25.
reduction (str) – The method used to reduce the loss into a scalar. Options are “none” and “mean”. Defaults to ‘mean’.
loss_weight (float) – Weight of loss. Defaults to 1.0.
 forward(pred, target, weight=None, avg_factor=None, reduction_override=None)[source]¶
Sigmoid focal loss.
 Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
target (torch.Tensor) – The ground truth label of the prediction with shape (N, *), N or (N,1).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.
 Returns
Loss.
 Return type
torch.Tensor
 class mmcls.models.losses.LabelSmoothLoss(label_smooth_val, num_classes=None, mode=None, reduction='mean', loss_weight=1.0)[source]¶
Initializer for the label smoothed cross entropy loss.
Refers to Rethinking the Inception Architecture for Computer Vision
This decreases the gap between output scores and encourages generalization. Labels provided to forward can be one-hot-like vectors (N x C) or class indices (N x 1). This also accepts linear combinations of one-hot-like labels from mixup or cutmix, except for multi-label tasks.
 Parameters
label_smooth_val (float) – The degree of label smoothing.
num_classes (int, optional) – Number of classes. Defaults to None.
mode (str) – Refers to the notes below. Options are ‘original’, ‘classy_vision’ and ‘multi_label’. Defaults to ‘classy_vision’.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
loss_weight (float) – Weight of the loss. Defaults to 1.0.
Notes
if the mode is “original”, this will use the same label smoothing method as the original paper:
\[(1 - \epsilon)\delta_{k, y} + \frac{\epsilon}{K}\]
where \(\epsilon\) is the label_smooth_val, \(K\) is the num_classes and \(\delta_{k, y}\) is the Dirac delta, which equals 1 for k=y and 0 otherwise.
if the mode is “classy_vision”, this will use the same label smoothing method as the facebookresearch/ClassyVision repo:
\[\frac{\delta_{k, y} + \epsilon/K}{1 + \epsilon}\]
if the mode is “multi_label”, this will accept labels from multi-label tasks and smooth them:
\[(1 - 2\epsilon)\delta_{k, y} + \epsilon\]
 forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Label smooth loss.
 Parameters
cls_score (torch.Tensor) – The prediction with shape (N, *).
label (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.
 Returns
Loss.
 Return type
torch.Tensor
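A minimal sketch, assuming illustrative shapes, of the ‘original’ smoothing mode:
>>> import torch
>>> from mmcls.models.losses import LabelSmoothLoss
>>> criterion = LabelSmoothLoss(label_smooth_val=0.1, num_classes=5, mode='original')
>>> cls_score = torch.randn(8, 5)       # raw logits with shape (N, C)
>>> label = torch.randint(0, 5, (8, ))  # class indices; one-hot (N, C) also accepted
>>> loss = criterion(cls_score, label)  # a scalar tensor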
 class mmcls.models.losses.SeesawLoss(use_sigmoid=False, p=0.8, q=2.0, num_classes=1000, eps=0.01, reduction='mean', loss_weight=1.0)[source]¶
Implementation of seesaw loss.
Refers to Seesaw Loss for Long-Tailed Instance Segmentation (CVPR 2021).
 Parameters
use_sigmoid (bool) – Whether the prediction uses sigmoid or softmax. Only False is supported. Defaults to False.
p (float) – The p in the mitigation factor. Defaults to 0.8.
q (float) – The q in the compensation factor. Defaults to 2.0.
num_classes (int) – The number of classes. Defaults to 1000 for the ImageNet dataset.
eps (float) – The minimal value of divisor to smooth the computation of the compensation factor. Defaults to 1e-2.
reduction (str) – The method that reduces the loss to a scalar. Options are “none”, “mean” and “sum”. Default to “mean”.
loss_weight (float) – The weight of the loss. Defaults to 1.0
 forward(cls_score, labels, weight=None, avg_factor=None, reduction_override=None)[source]¶
Forward function.
 Parameters
cls_score (torch.Tensor) – The prediction with shape (N, C).
labels (torch.Tensor) – The learning label of the prediction.
weight (torch.Tensor, optional) – Sample-wise loss weight.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to None.
 Returns
The calculated loss
 Return type
torch.Tensor
 mmcls.models.losses.accuracy(pred, target, topk=1, thrs=0.0)[source]¶
Calculate accuracy according to the prediction and target.
 Parameters
pred (torch.Tensor  np.array) – The model prediction.
target (torch.Tensor  np.array) – The target of each prediction.
topk (int  tuple[int]) – If the predictions in topk matches the target, the predictions will be regarded as correct ones. Defaults to 1.
thrs (Number  tuple[Number], optional) – Predictions with scores under the thresholds are considered negative. Defaults to 0.
 Returns
 Accuracy
float: If both topk and thrs are single values.
list[float]: If one of topk or thrs is a tuple.
list[list[float]]: If both topk and thrs are tuples; the first dim is topk and the second dim is thrs.
 Return type
float  list[float]  list[list[float]]
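A small sketch of the return-type behavior described above (the scores are made up):
>>> import torch
>>> from mmcls.models.losses import accuracy
>>> pred = torch.tensor([[0.1, 0.7, 0.2],
...                      [0.5, 0.3, 0.2]])
>>> target = torch.tensor([1, 2])
>>> accuracy(pred, target)               # single topk and thrs -> a single value
>>> accuracy(pred, target, topk=(1, 2))  # tuple topk -> a list of values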
 mmcls.models.losses.asymmetric_loss(pred, target, weight=None, gamma_pos=1.0, gamma_neg=4.0, clip=0.05, reduction='mean', avg_factor=None)[source]¶
Asymmetric loss.
Please refer to the paper for details.
 Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
target (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, ). Defaults to None.
gamma_pos (float) – Positive focusing parameter. Defaults to 1.0.
gamma_neg (float) – Negative focusing parameter. We usually set gamma_neg > gamma_pos. Defaults to 4.0.
clip (float, optional) – Probability margin. Defaults to 0.05.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’, the loss is the same shape as pred and label. Defaults to ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
 Returns
Loss.
 Return type
torch.Tensor
 mmcls.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, pos_weight=None)[source]¶
Calculate the binary CrossEntropy loss with logits.
 Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
label (torch.Tensor) – The gt label with shape (N, *).
weight (torch.Tensor, optional) – Element-wise weight of loss with shape (N, ). Defaults to None.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’, the loss is the same shape as pred and label. Defaults to ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (torch.Tensor, optional) – The weight for each class with shape (C), C is the number of classes. Default None.
pos_weight (torch.Tensor, optional) – The positive weight for each class with shape (C), C is the number of classes. Default None.
 Returns
The calculated loss
 Return type
torch.Tensor
 mmcls.models.losses.convert_to_one_hot(targets: torch.Tensor, classes) → torch.Tensor[source]¶
This function converts target class indices to one-hot vectors, given the number of classes.
 Parameters
targets (Tensor) – The ground truth label of the prediction with shape (N, 1)
classes (int) – the number of classes.
 Returns
The one-hot encoded targets with shape (N, classes).
 Return type
Tensor
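A short sketch of the conversion (shapes follow the parameter description above):
>>> import torch
>>> from mmcls.models.losses import convert_to_one_hot
>>> targets = torch.tensor([[0], [2], [1]])   # class indices with shape (N, 1)
>>> one_hot = convert_to_one_hot(targets, 3)  # rows: [1, 0, 0], [0, 0, 1], [0, 1, 0]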
 mmcls.models.losses.cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None)[source]¶
Calculate the CrossEntropy loss.
 Parameters
pred (torch.Tensor) – The prediction with shape (N, C), C is the number of classes.
label (torch.Tensor) – The gt label of the prediction.
weight (torch.Tensor, optional) – Sample-wise loss weight.
reduction (str) – The method used to reduce the loss.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (torch.Tensor, optional) – The weight for each class with shape (C), C is the number of classes. Default None.
 Returns
The calculated loss
 Return type
torch.Tensor
 mmcls.models.losses.reduce_loss(loss, reduction)[source]¶
Reduce loss as specified.
 Parameters
loss (Tensor) – Element-wise loss tensor.
reduction (str) – Options are “none”, “mean” and “sum”.
 Returns
Reduced loss tensor.
 Return type
Tensor
 mmcls.models.losses.sigmoid_focal_loss(pred, target, weight=None, gamma=2.0, alpha=0.25, reduction='mean', avg_factor=None)[source]¶
Sigmoid focal loss.
 Parameters
pred (torch.Tensor) – The prediction with shape (N, *).
target (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, ). Defaults to None.
gamma (float) – The gamma for calculating the modulating factor. Defaults to 2.0.
alpha (float) – A balanced form for Focal Loss. Defaults to 0.25.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. If reduction is ‘none’, the loss is the same shape as pred and label. Defaults to ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
 Returns
Loss.
 Return type
torch.Tensor
 mmcls.models.losses.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]¶
Apply element-wise weight and reduce loss.
 Parameters
loss (Tensor) – Element-wise loss.
weight (Tensor) – Element-wise weights.
reduction (str) – Same as builtin losses of PyTorch.
avg_factor (float) – Average factor when computing the mean of losses.
 Returns
Processed loss values.
 Return type
Tensor
 mmcls.models.losses.weighted_loss(loss_func)[source]¶
Create a weighted version of a given loss function.
To use this decorator, the loss function must have the signature like loss_func(pred, target, **kwargs). The function only needs to compute element-wise loss without any reduction. This decorator will add weight and reduction arguments to the function. The decorated function will have the signature like loss_func(pred, target, weight=None, reduction='mean', avg_factor=None, **kwargs).
Example
>>> import torch
>>> @weighted_loss
>>> def l1_loss(pred, target):
>>>     return (pred - target).abs()
>>> pred = torch.Tensor([0, 2, 3])
>>> target = torch.Tensor([1, 1, 1])
>>> weight = torch.Tensor([1, 0, 1])
>>> l1_loss(pred, target)
tensor(1.3333)
>>> l1_loss(pred, target, weight)
tensor(1.)
>>> l1_loss(pred, target, reduction='none')
tensor([1., 1., 2.])
>>> l1_loss(pred, target, weight, avg_factor=2)
tensor(1.5000)
utils¶
 class mmcls.models.utils.Augments(augments_cfg)[source]¶
Data augments.
We implement some data augmentation methods, such as mixup, cutmix.
 Parameters
augments_cfg (list[mmcv.ConfigDict]  mmcv.ConfigDict) – Config dict of augments.
Example
>>> augments_cfg = [
...     dict(type='BatchCutMix', alpha=1., num_classes=10, prob=0.5),
...     dict(type='BatchMixup', alpha=1., num_classes=10, prob=0.3)
... ]
>>> augments = Augments(augments_cfg)
>>> imgs = torch.randn(16, 3, 32, 32)
>>> label = torch.randint(0, 10, (16, ))
>>> imgs, label = augments(imgs, label)
To decide which augmentation within the Augments block is used, the following rule is applied: we pick an augmentation based on the probabilities. In the example above, BatchCutMix is used with probability 0.5 and BatchMixup with probability 0.3. As Identity is not in augments_cfg, Identity is used with probability 1 - 0.5 - 0.3 = 0.2.
 class mmcls.models.utils.HybridEmbed(backbone, img_size=224, feature_size=None, in_channels=3, embed_dims=768, conv_cfg=None, init_cfg=None)[source]¶
CNN Feature Map Embedding.
Extract feature map from CNN, flatten, project to embedding dim.
 Parameters
backbone (nn.Module) – CNN backbone
img_size (int  tuple) – The size of input image. Default: 224
feature_size (int  tuple, optional) – Size of feature map extracted by CNN backbone. Default: None
in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
conv_cfg (dict, optional) – The config dict for conv layers. Default: None.
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.
 forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class mmcls.models.utils.InvertedResidual(in_channels, out_channels, mid_channels, kernel_size=3, stride=1, se_cfg=None, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False, init_cfg=None)[source]¶
Inverted Residual Block.
 Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
mid_channels (int) – The input channels of the depthwise convolution.
kernel_size (int) – The kernel size of the depthwise convolution. Default: 3.
stride (int) – The stride of the depthwise convolution. Default: 1.
se_cfg (dict) – Config dict for se layer. Default: None, which means no se layer.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
 Returns
The output tensor.
 Return type
Tensor
 forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class mmcls.models.utils.MultiheadAttention(embed_dims, num_heads, input_dims=None, attn_drop=0.0, proj_drop=0.0, dropout_layer={'drop_prob': 0.0, 'type': 'Dropout'}, qkv_bias=True, qk_scale=None, proj_bias=True, v_shortcut=False, init_cfg=None)[source]¶
Multi-head Attention Module.
This module implements multi-head attention that supports different input dims and embed dims. It also supports a shortcut from value, which is useful if the input dims are not the same as the embed dims.
 Parameters
embed_dims (int) – The embedding dimension.
num_heads (int) – Parallel attention heads.
input_dims (int, optional) – The input dimension, and if None, use embed_dims. Defaults to None.
attn_drop (float) – Dropout rate of the dropout layer after the attention calculation of query and key. Defaults to 0.
proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.
dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).
qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.
qk_scale (float, optional) – Override the default qk scale of head_dim ** -0.5 if set. Defaults to None.
proj_bias (bool) – Defaults to True.
v_shortcut (bool) – Add a shortcut from value to output. It’s usually used if input_dims is different from embed_dims. Defaults to False.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
 forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
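A standalone usage sketch (the embedding size, head count and token count are arbitrary choices):
>>> import torch
>>> from mmcls.models.utils import MultiheadAttention
>>> attn = MultiheadAttention(embed_dims=64, num_heads=4)
>>> x = torch.randn(2, 49, 64)  # (batch, num_tokens, embed_dims)
>>> out = attn(x)               # same shape as the input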
 class mmcls.models.utils.PatchEmbed(img_size=224, in_channels=3, embed_dims=768, norm_cfg=None, conv_cfg=None, init_cfg=None)[source]¶
Image to Patch Embedding.
We use a conv layer to implement PatchEmbed.
 Parameters
img_size (int  tuple) – The size of input image. Default: 224
in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None
conv_cfg (dict, optional) – The config dict for conv layers. Default: None
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None
 forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 class mmcls.models.utils.PatchMerging(input_resolution, in_channels, expansion_ratio, kernel_size=2, stride=None, padding=0, dilation=1, bias=False, norm_cfg={'type': 'LN'}, init_cfg=None)[source]¶
Merge patch feature map.
This layer uses nn.Unfold to group the feature map by kernel_size, and uses a norm and linear layer to embed the grouped feature map.
 Parameters
input_resolution (tuple) – The size of input patch resolution.
in_channels (int) – The num of input channels.
expansion_ratio (Number) – Expansion ratio of output channels. The num of output channels is equal to int(expansion_ratio * in_channels).
kernel_size (int  tuple, optional) – the kernel size in the unfold layer. Defaults to 2.
stride (int  tuple, optional) – the stride of the sliding blocks in the unfold layer. Defaults to be equal to kernel_size.
padding (int  tuple, optional) – zero padding width in the unfold layer. Defaults to 0.
dilation (int  tuple, optional) – dilation parameter in the unfold layer. Defaults to 1.
bias (bool, optional) – Whether to add bias in linear layer or not. Defaults to False.
norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type=’LN’).
init_cfg (dict, optional) – The extra config for initialization. Defaults to None.
 class mmcls.models.utils.SELayer(channels, squeeze_channels=None, ratio=16, divisor=8, bias='auto', conv_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'Sigmoid'}), init_cfg=None)[source]¶
Squeeze-and-Excitation Module.
 Parameters
channels (int) – The input (and output) channels of the SE layer.
squeeze_channels (None or int) – The intermediate channel number of SELayer. Default: None, which means the value of squeeze_channels is make_divisible(channels // ratio, divisor).
ratio (int) – Squeeze ratio in SELayer, the intermediate channel will be make_divisible(channels // ratio, divisor). Only used when squeeze_channels is None. Default: 16.
divisor (int) – The divisor to true divide the channel number. Only used when squeeze_channels is None. Default: 8.
conv_cfg (None or dict) – Config dict for convolution layer. Default: None, which means using conv2d.
act_cfg (dict or Sequence[dict]) – Config dict for activation layer. If act_cfg is a dict, two activation layers will be configured by this dict. If act_cfg is a sequence of dicts, the first activation layer will be configured by the first dict and the second activation layer will be configured by the second dict. Default: (dict(type=’ReLU’), dict(type=’Sigmoid’))
 forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
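For illustration (the channel counts are arbitrary), the layer re-weights the channels of a feature map:
>>> import torch
>>> from mmcls.models.utils import SELayer
>>> se = SELayer(channels=64, ratio=16)
>>> x = torch.randn(2, 64, 7, 7)
>>> out = se(x)  # channel-wise re-weighted feature map, same shape as x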
 class mmcls.models.utils.ShiftWindowMSA(embed_dims, input_resolution, num_heads, window_size, shift_size=0, qkv_bias=True, qk_scale=None, attn_drop=0, proj_drop=0, dropout_layer={'drop_prob': 0.0, 'type': 'DropPath'}, auto_pad=False, init_cfg=None)[source]¶
Shift Window Multi-head Self-Attention Module.
 Parameters
embed_dims (int) – Number of input channels.
input_resolution (Tuple[int, int]) – The resolution of the input feature map.
num_heads (int) – Number of attention heads.
window_size (int) – The height and width of the window.
shift_size (int, optional) – The shift step of each window towards the right-bottom. If zero, act as regular window-MSA. Defaults to 0.
qkv_bias (bool, optional) – If True, add a learnable bias to q, k, v. Default: True
qk_scale (float  None, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.
attn_drop (float, optional) – Dropout ratio of attention weight. Defaults to 0.0.
proj_drop (float, optional) – Dropout ratio of output. Defaults to 0.
dropout_layer (dict, optional) – The dropout_layer used before output. Defaults to dict(type=’DropPath’, drop_prob=0.).
auto_pad (bool, optional) – Auto pad the feature map to be divisible by window_size. Defaults to False.
init_cfg (dict, optional) – The extra config for initialization. Default: None.
 forward(query)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
 mmcls.models.utils.channel_shuffle(x, groups)[source]¶
Channel Shuffle operation.
This function enables cross-group information flow for multiple group convolution layers.
 Parameters
x (Tensor) – The input tensor.
groups (int) – The number of groups to divide the input tensor in the channel dimension.
 Returns
The output tensor after channel shuffle operation.
 Return type
Tensor
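A sketch of the operation (sizes are arbitrary):
>>> import torch
>>> from mmcls.models.utils import channel_shuffle
>>> x = torch.randn(1, 8, 4, 4)       # a feature map with 8 channels
>>> y = channel_shuffle(x, groups=2)  # channels interleaved across 2 groups, same shape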
 mmcls.models.utils.make_divisible(value, divisor, min_value=None, min_ratio=0.9)[source]¶
Make divisible function.
This function rounds the channel number to the nearest value that is divisible by the divisor.
 Parameters
value (int) – The original channel number.
divisor (int) – The divisor to fully divide the channel number.
min_value (int, optional) – The minimum value of the output channel. Default: None, which means the minimum value equals the divisor.
min_ratio (float) – The minimum ratio of the rounded channel number to the original channel number. Default: 0.9.
 Returns
The modified output channel number
 Return type
int
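Two illustrative calls (values chosen to show the nearest rounding and the min_ratio guard):
>>> from mmcls.models.utils import make_divisible
>>> make_divisible(34, 8)  # rounds to the nearest multiple of 8 -> 32
>>> make_divisible(10, 8)  # 8 would fall below min_ratio * 10 = 9, so -> 16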
mmcls.datasets¶
datasets¶
 class mmcls.datasets.BaseDataset(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
Base dataset.
 Parameters
data_prefix (str) – the prefix of data path.
pipeline (list) – a list of dicts, where each element represents an operation defined in mmcls.datasets.pipelines.
ann_file (str  None) – the annotation file. When ann_file is a str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix.
test_mode (bool) – whether in train mode or test mode.
 property class_to_idx¶
Mapping from class name to class index.
 Returns
mapping from class name to class index.
 Return type
dict
 evaluate(results, metric='accuracy', metric_options=None, logger=None)[source]¶
Evaluate the dataset.
 Parameters
results (list) – Testing results of the dataset.
metric (str  list[str]) – Metrics to be evaluated. Default value is accuracy.
metric_options (dict, optional) – Options for calculating metrics. Allowed keys are ‘topk’, ‘thrs’ and ‘average_mode’. Defaults to None.
logger (logging.Logger  str, optional) – Logger used for printing related information during evaluation. Defaults to None.
 Returns
evaluation results
 Return type
dict
 get_cat_ids(idx: int) → List[int][source]¶
Get category id by index.
 Parameters
idx (int) – Index of data.
 Returns
Image category of specified index.
 Return type
cat_ids (List[int])
 classmethod get_classes(classes=None)[source]¶
Get class names of current dataset.
 Parameters
classes (Sequence[str]  str  None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.
 Returns
Names of categories of the dataset.
 Return type
tuple[str] or list[str]
 class mmcls.datasets.CIFAR10(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
CIFAR10 Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py
 class mmcls.datasets.CIFAR100(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
CIFAR100 Dataset.
 class mmcls.datasets.ClassBalancedDataset(dataset, oversample_thr)[source]¶
A wrapper of repeated dataset with repeat factor.
Suitable for training on class-imbalanced datasets like LVIS. Following the sampling strategy in the LVIS paper, in each epoch, an image may appear multiple times based on its “repeat factor”.
The repeat factor for an image is a function of the frequency of the rarest category labeled in that image. The “frequency of category c” in [0, 1] is defined as the fraction of images in the training set (without repeats) in which category c appears.
The dataset needs to implement self.get_cat_ids() to support ClassBalancedDataset.
The repeat factor is computed as follows.
For each category c, compute the fraction \(f(c)\) of images that contain it.
For each category c, compute the category-level repeat factor:
\[r(c) = \max(1, \sqrt{\frac{t}{f(c)}})\]
where \(t\) is oversample_thr.
For each image I and its labels \(L(I)\), compute the image-level repeat factor:
\[r(I) = \max_{c \in L(I)} r(c)\]
 Parameters
dataset (CustomDataset) – The dataset to be repeated.
oversample_thr (float) – Frequency threshold below which data is repeated. For categories with f_c >= oversample_thr, there is no oversampling. For categories with f_c < oversample_thr, the degree of oversampling follows the square-root inverse frequency heuristic above.
 class mmcls.datasets.ConcatDataset(datasets)[source]¶
A wrapper of concatenated dataset.
Same as torch.utils.data.dataset.ConcatDataset, but adds the get_cat_ids function.
 Parameters
datasets (list[Dataset]) – A list of datasets.
 class mmcls.datasets.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, round_up=True)[source]¶
 class mmcls.datasets.FashionMNIST(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
FashionMNIST Dataset.
 class mmcls.datasets.ImageNet(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
ImageNet Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/imagenet.py
 class mmcls.datasets.ImageNet21k(data_prefix, pipeline, classes=None, ann_file=None, multi_label=False, recursion_subdir=False, test_mode=False)[source]¶
ImageNet21k Dataset.
The ImageNet21k dataset is extremely large, containing 21k+ classes and 1.4B files. To save memory usage and loading time, this class improves on the ImageNet class in the following ways:
Delete the samples attribute.
Use __slots__ to create a Data_item to replace dict.
Move setting the info dict from load_annotations to prepare_data.
Use int instead of np.array(…, np.int64).
 Parameters
data_prefix (str) – the prefix of data path.
pipeline (list) – a list of dicts, where each element represents an operation defined in mmcls.datasets.pipelines.
ann_file (str  None) – the annotation file. When ann_file is a str, the subclass is expected to read from the ann_file. When ann_file is None, the subclass is expected to read according to data_prefix.
test_mode (bool) – whether in train mode or test mode.
multi_label (bool) – whether to use multi-label or not.
recursion_subdir (bool) – whether to use pictures in subdirectories that meet the conditions under the category directory.
 class mmcls.datasets.MNIST(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
MNIST Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py
 class mmcls.datasets.MultiLabelDataset(data_prefix, pipeline, classes=None, ann_file=None, test_mode=False)[source]¶
Multi-label Dataset.
 evaluate(results, metric='mAP', metric_options=None, logger=None, **deprecated_kwargs)[source]¶
Evaluate the dataset.
 Parameters
results (list) – Testing results of the dataset.
metric (str  list[str]) – Metrics to be evaluated. Default value is ‘mAP’. Options are ‘mAP’, ‘CP’, ‘CR’, ‘CF1’, ‘OP’, ‘OR’ and ‘OF1’.
metric_options (dict, optional) – Options for calculating metrics. Allowed keys are ‘k’ and ‘thr’. Defaults to None.
logger (logging.Logger  str, optional) – Logger used for printing related information during evaluation. Defaults to None.
deprecated_kwargs (dict) – Used for containing deprecated arguments.
 Returns
evaluation results
 Return type
dict
 class mmcls.datasets.RepeatDataset(dataset, times)[source]¶
A wrapper of repeated dataset.
The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.
 Parameters
dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
 class mmcls.datasets.VOC(**kwargs)[source]¶
Pascal VOC Dataset.
 mmcls.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, round_up=True, seed=None, pin_memory=True, persistent_workers=True, **kwargs)[source]¶
Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In nondistributed training, there is only one dataloader for all GPUs.
 Parameters
dataset (Dataset) – A PyTorch dataset.
samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in nondistributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
round_up (bool) – Whether to round up the length of dataset by adding extra samples to make it evenly divisible. Default: True.
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True
persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the worker Dataset instances alive. The argument only takes effect when PyTorch >= 1.7.0. Default: True.
kwargs – any keyword argument used to initialize the DataLoader.
 Returns
A PyTorch dataloader.
 Return type
DataLoader
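A hedged end-to-end sketch (the data path and pipeline are illustrative placeholders, not canonical settings):
>>> from mmcls.datasets import build_dataset, build_dataloader
>>> dataset_cfg = dict(
...     type='ImageNet',
...     data_prefix='data/imagenet/train',  # placeholder path
...     pipeline=[
...         dict(type='LoadImageFromFile'),
...         dict(type='RandomResizedCrop', size=224),
...         dict(type='ImageToTensor', keys=['img']),
...         dict(type='ToTensor', keys=['gt_label']),
...         dict(type='Collect', keys=['img', 'gt_label']),
...     ])
>>> dataset = build_dataset(dataset_cfg)
>>> loader = build_dataloader(dataset, samples_per_gpu=32, workers_per_gpu=2,
...                           dist=False, shuffle=True)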
pipelines¶
 class mmcls.datasets.pipelines.AutoAugment(policies, hparams={'pad_val': 128})[source]¶
Auto augmentation.
This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.
 Parameters
policies (list[list[dict]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy, and is composed of several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
 class mmcls.datasets.pipelines.AutoContrast(prob=0.5)[source]¶
Auto adjust image contrast.
 Parameters
prob (float) – The probability for performing auto contrast therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Brightness(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Adjust images brightness.
 Parameters
magnitude (int  float) – The magnitude used for adjusting brightness. A positive magnitude would enhance the brightness and a negative magnitude would make the image darker. A magnitude=0 gives the original image.
prob (float) – The probability for performing brightness adjusting therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.CenterCrop(crop_size, efficientnet_style=False, crop_padding=32, interpolation='bilinear', backend='cv2')[source]¶
Center crop the image.
 Parameters
crop_size (int  tuple) – Expected size after cropping with the format of (h, w).
efficientnet_style (bool) – Whether to use efficientnet style center crop. Defaults to False.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet style is True. Defaults to 32.
interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Only valid if efficientnet_style is True. Defaults to ‘bilinear’.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Only valid if efficientnet_style is True. Defaults to cv2.
Notes
If the image is smaller than the crop size, return the original image.
If efficientnet_style is set to False, the pipeline is a simple center crop using the crop_size.
If efficientnet_style is set to True, the pipeline first performs the center crop with crop_size_ computed as:
\[\text{crop\_size\_} = \frac{\text{crop\_size}}{\text{crop\_size} + \text{crop\_padding}} \times \text{short\_edge}\]
And then the pipeline resizes the image to the input crop size.
 class mmcls.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg'))[source]¶
Collect data from the loader relevant to the specific task.
This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img” and “gt_label”.
 Parameters
keys (Sequence[str]) – Keys of results to be collected in data.
meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: (‘filename’, ‘ori_shape’, ‘img_shape’, ‘flip’, ‘flip_direction’, ‘img_norm_cfg’)
 Returns
 The result dict contains the following keys:
the keys in self.keys
img_metas if available
 Return type
dict
 class mmcls.datasets.pipelines.ColorJitter(brightness, contrast, saturation)[source]¶
Randomly change the brightness, contrast and saturation of an image.
 Parameters
brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1  brightness), 1 + brightness].
contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1  contrast), 1 + contrast].
saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1  saturation), 1 + saturation].
 class mmcls.datasets.pipelines.ColorTransform(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Adjust images color balance.
 Parameters
magnitude (int  float) – The magnitude used for color transform. A positive magnitude would enhance the color and a negative magnitude would make the image grayer. A magnitude=0 gives the origin img.
prob (float) – The probability for performing ColorTransform therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Compose(transforms)[source]¶
Compose a data pipeline with a sequence of transforms.
 Parameters
transforms (list[dict  callable]) – Either config dicts of transforms or transform objects.
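A short sketch (the file names are placeholders) of composing transforms from config dicts and applying them to a results dict:
>>> from mmcls.datasets.pipelines import Compose
>>> pipeline = Compose([
...     dict(type='LoadImageFromFile'),
...     dict(type='Resize', size=(256, -1)),
...     dict(type='CenterCrop', crop_size=224),
... ])
>>> results = dict(img_prefix='data/demo', img_info=dict(filename='cat.jpg'))
>>> results = pipeline(results)  # results['img'] now holds the cropped image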
 class mmcls.datasets.pipelines.Contrast(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Adjust images contrast.
 Parameters
magnitude (int  float) – The magnitude used for adjusting contrast. A positive magnitude would enhance the contrast and a negative magnitude would make the image grayer. A magnitude=0 gives the origin img.
prob (float) – The probability for performing contrast adjusting therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Cutout(shape, pad_val=128, prob=0.5)[source]¶
Cutout images.
 Parameters
shape (int  float  tuple(int  float)) – Expected cutout shape (h, w). If given as a single value, the value will be used for both h and w.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If it is a sequence, it must have the same length as the image channels. Defaults to 128.
prob (float) – The probability for performing cutout therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Equalize(prob=0.5)[source]¶
Equalize the image histogram.
 Parameters
prob (float) – The probability for performing equalize therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Invert(prob=0.5)[source]¶
Invert images.
 Parameters
prob (float) – The probability for performing invert therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Lighting(eigval, eigvec, alphastd=0.1, to_rgb=True)[source]¶
Adjust images lighting using AlexNet-style PCA jitter.
 Parameters
eigval (list) – the eigenvalues of the covariance matrix of pixel values.
eigvec (list[list]) – the eigenvectors of the covariance matrix of pixel values.
alphastd (float) – The standard deviation for distribution of alpha. Defaults to 0.1
to_rgb (bool) – Whether to convert img to rgb.
 class mmcls.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape) and “img_norm_cfg” (means=0 and stds=1).
 Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
 class mmcls.datasets.pipelines.Normalize(mean, std, to_rgb=True)[source]¶
Normalize the image.
 Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB, default is true.
 class mmcls.datasets.pipelines.Pad(size=None, pad_to_square=False, pad_val=0, padding_mode='constant')[source]¶
Pad images.
 Parameters
size (tuple[int]  None) – Expected padding size (h, w). Conflicts with pad_to_square. Defaults to None.
pad_to_square (bool) – Pad any image to square shape. Defaults to False.
pad_val (Number  Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default to 0.
padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default to “constant”.
 class mmcls.datasets.pipelines.Posterize(bits, prob=0.5)[source]¶
Posterize images (reduce the number of bits for each color channel).
 Parameters
bits (int  float) – Number of bits for each pixel in the output img, which should be less than or equal to 8.
prob (float) – The probability for posterizing therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.RandAugment(policies, num_policies, magnitude_level, magnitude_std=0.0, total_level=30, hparams={'pad_val': 128})[source]¶
Random augmentation.
This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.
 Parameters
policies (list[dict]) – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For those policies that have a magnitude (given that it is named differently across augmentations), magnitude_key and magnitude_range shall be the name of the magnitude argument (str) and the range of magnitude (a tuple in the format (val1, val2)), respectively. Note that val1 is not necessarily less than val2.
num_policies (int) – Number of policies to select from policies each time.
magnitude_level (int  float) – Magnitude level for all the augmentation selected.
total_level (int  float) – Total level for the magnitude. Defaults to 30.
magnitude_std (Number  str) –
Deviation of magnitude noise applied.
If positive number, magnitude is sampled from normal distribution (mean=magnitude, std=magnitude_std).
If 0 or negative number, magnitude remains unchanged.
If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
Note
magnitude_std will introduce some randomness to policy, modified by https://github.com/rwightman/pytorch-image-models.
When magnitude_std=0, we calculate the magnitude as follows:
\[\text{magnitude} = \frac{\text{magnitude\_level}}{\text{total\_level}} \times (\text{val2} - \text{val1}) + \text{val1}\]
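An illustrative policy config (the policy list and magnitude ranges are assumptions for demonstration, not recommended settings):
>>> from mmcls.datasets.pipelines import RandAugment
>>> policies = [
...     dict(type='AutoContrast'),
...     dict(type='Rotate', magnitude_key='angle', magnitude_range=(0, 30)),
...     dict(type='Brightness', magnitude_key='magnitude', magnitude_range=(0, 0.9)),
... ]
>>> rand_aug = RandAugment(policies=policies, num_policies=2,
...                        magnitude_level=9, total_level=30)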
 class mmcls.datasets.pipelines.RandomCrop(size, padding=None, pad_if_needed=False, pad_val=0, padding_mode='constant')[source]¶
Crop the given Image at a random location.
 Parameters
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
padding (int or sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively. Default: None, which means no padding.
pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset. Default: False.
pad_val (Number  Sequence[Number]) – Pixel pad_val value for constant fill. If a tuple of length 3, it is used to pad_val R, G, B channels respectively. Default: 0.
padding_mode (str) –
Type of padding. Defaults to “constant”. Should be one of the following:
constant: Pads with a constant value, this value is specified with pad_val.
edge: pads with the last value at the edge of the image.
reflect: Pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
symmetric: Pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
 static get_params(img, output_size)[source]¶
Get parameters for crop for a random crop.
 Parameters
img (ndarray) – Image to be cropped.
output_size (tuple) – Expected output size of the crop.
 Returns
 Params (xmin, ymin, target_height, target_width) to be passed to crop for a random crop.
 Return type
tuple
 class mmcls.datasets.pipelines.RandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)[source]¶
Randomly selects a rectangle region in an image and erase pixels.
 Parameters
erase_prob (float) – Probability that image will be randomly erased. Default: 0.5
min_area_ratio (float) – Minimum erased area / input image area Default: 0.02
max_area_ratio (float) – Maximum erased area / input image area Default: 0.4
aspect_range (sequence  float) – Aspect ratio range of erased area. If float, it will be converted to (aspect_ratio, 1/aspect_ratio). Default: (3/10, 10/3).
mode (str) –
Fill method in erased area, can be:
const (default): All pixels are assigned the same value.
rand: Each pixel is assigned a random value in [0, 255].
fill_color (sequence  Number) – Base color filled in erased area. Defaults to (128, 128, 128).
fill_std (sequence  Number, optional) – If set and mode is ‘rand’, fill erased area with random color from a normal distribution (mean=fill_color, std=fill_std); if not set, fill erased area with random color from a uniform distribution (0~255). Defaults to None.
Note
See Random Erasing Data Augmentation
This paper provided 4 modes: RE-R, RE-M, RE-0, RE-255, and uses RE-M as the default. The configs of these 4 modes are:
RE-R: RandomErasing(mode=’rand’)
RE-M: RandomErasing(mode=’const’, fill_color=(123.67, 116.3, 103.5))
RE-0: RandomErasing(mode=’const’, fill_color=0)
RE-255: RandomErasing(mode=’const’, fill_color=255)
 class mmcls.datasets.pipelines.RandomFlip(flip_prob=0.5, direction='horizontal')[source]¶
Flip the image randomly.
Flip the image randomly based on flip probability and flip direction.
 Parameters
flip_prob (float) – probability of the image being flipped. Default: 0.5
direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.
 class mmcls.datasets.pipelines.RandomGrayscale(gray_prob=0.1)[source]¶
Randomly convert image to grayscale with a probability of gray_prob.
 Parameters
gray_prob (float) – Probability that image should be converted to grayscale. Default: 0.1.
 Returns
Image after randomly grayscale transform.
 Return type
ndarray
Notes
If input image is 1 channel: grayscale version is 1 channel.
If input image is 3 channel: grayscale version is 3 channel with r == g == b.
 class mmcls.datasets.pipelines.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), max_attempts=10, efficientnet_style=False, min_covered=0.1, crop_padding=32, interpolation='bilinear', backend='cv2')[source]¶
Crop the given image to random size and aspect ratio.
A crop of random size (default: 0.08 to 1.0 of the original size) and random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.
 Parameters
size (sequence  int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
scale (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).
max_attempts (int) – Maximum number of attempts before falling back to Central Crop. Defaults to 10.
efficientnet_style (bool) – Whether to use efficientnet style Random ResizedCrop. Defaults to False.
min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet_style is true. Defaults to 32.
interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Defaults to cv2.
 static get_params(img, scale, ratio, max_attempts=10)[source]¶
Get parameters for crop for a random sized crop.
 Parameters
img (ndarray) – Image to be cropped.
scale (tuple) – Range of the random size of the cropped image compared to the original image size.
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image area.
max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.
 Returns
 Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.
 Return type
tuple
 static get_params_efficientnet_style(img, size, scale, ratio, max_attempts=10, min_covered=0.1, crop_padding=32)[source]¶
Get parameters for crop for a random sized crop in efficientnet style.
 Parameters
img (ndarray) – Image to be cropped.
size (sequence) – Desired output size of the crop.
scale (tuple) – Range of the random size of the cropped image compared to the original image size.
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image area.
max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.
min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Defaults to 32.
 Returns
 Params (ymin, xmin, ymax, xmax) to be passed to crop for a random sized crop.
 Return type
tuple
 class mmcls.datasets.pipelines.Resize(size, interpolation='bilinear', adaptive_side='short', backend='cv2')[source]¶
Resize images.
 Parameters
size (int  tuple) – Image scales for resizing (h, w). When size is int, the default behavior is to resize an image to (size, size). When size is a tuple and the second value is -1, the image will be resized according to adaptive_side. For example, when size is 224, the image is resized to 224x224. When size is (224, -1) and adaptive_side is “short”, the short side is resized to 224 and the other side is computed based on the short side, maintaining the aspect ratio.
interpolation (str) – Interpolation method. For “cv2” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”. For “pillow” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming”. More details can be found in mmcv.image.geometric.
adaptive_side (str) – Adaptive resize policy, accepted values are “short”, “long”, “height”, “width”. Default to “short”.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Default: cv2.
 class mmcls.datasets.pipelines.Rotate(angle, center=None, scale=1.0, pad_val=128, prob=0.5, random_negative_prob=0.5, interpolation='nearest')[source]¶
Rotate images.
 Parameters
angle (float) – The angle used for rotate. Positive values stand for clockwise rotation.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If None, the center of the image will be used. Defaults to None.
scale (float) – Isotropic scale factor. Defaults to 1.0.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If a sequence of length 3, it is used to pad_val R, G, B channels respectively. Defaults to 128.
prob (float) – The probability for performing Rotate therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the angle negative, which should be in range [0,1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
 class mmcls.datasets.pipelines.Sharpness(magnitude, prob=0.5, random_negative_prob=0.5)[source]¶
Adjust images sharpness.
 Parameters
magnitude (int  float) – The magnitude used for adjusting sharpness. A positive magnitude would enhance the sharpness and a negative magnitude would make the image blurrier. A magnitude=0 gives the original image.
prob (float) – The probability for performing sharpness adjusting therefore should be in range [0, 1]. Defaults to 0.5.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Shear(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='bicubic')[source]¶
Shear images.
 Parameters
magnitude (int  float) – The magnitude used for shear.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If a sequence of length 3, it is used to pad_val R, G, B channels respectively. Defaults to 128.
prob (float) – The probability for performing Shear therefore should be in range [0, 1]. Defaults to 0.5.
direction (str) – The shearing direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bicubic’.
 class mmcls.datasets.pipelines.Solarize(thr, prob=0.5)[source]¶
Solarize images (invert all pixel values above a threshold).
 Parameters
thr (int  float) – The threshold above which the pixel values will be inverted.
prob (float) – The probability for solarizing therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.SolarizeAdd(magnitude, thr=128, prob=0.5)[source]¶
SolarizeAdd images (add a certain value to pixels below a threshold).
 Parameters
magnitude (int  float) – The value to be added to pixels below the thr.
thr (int  float) – The threshold below which the pixel values will be adjusted.
prob (float) – The probability for solarizing therefore should be in range [0, 1]. Defaults to 0.5.
 class mmcls.datasets.pipelines.Translate(magnitude, pad_val=128, prob=0.5, direction='horizontal', random_negative_prob=0.5, interpolation='nearest')[source]¶
Translate images.
 Parameters
magnitude (int  float) – The magnitude used for translate. Note that the offset is calculated by magnitude * size in the corresponding direction. With a magnitude of 1, the whole image will be moved out of the range.
pad_val (int, Sequence[int]) – Pixel pad_val value for constant fill. If a sequence of length 3, it is used to pad_val R, G, B channels respectively. Defaults to 128.
prob (float) – The probability for performing translate therefore should be in range [0, 1]. Defaults to 0.5.
direction (str) – The translating direction. Options are ‘horizontal’ and ‘vertical’. Defaults to ‘horizontal’.
random_negative_prob (float) – The probability that turns the magnitude negative, which should be in range [0,1]. Defaults to 0.5.
interpolation (str) – Interpolation method. Options are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘nearest’.
mmcls.utils¶
 mmcls.utils.load_json_logs(json_logs)[source]¶
Load and convert json logs to log dicts.
 Parameters
json_logs (list[str]) – Paths of the json logs.
 Returns
 Key is the epoch, value is a sub dict. Keys of the sub dict are different metrics, e.g. memory, bbox_mAP; the value of the sub dict is a list of corresponding values of all iterations.
 Return type
list[dict(int: dict())]
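A hedged usage sketch (the log path is a placeholder):
>>> from mmcls.utils import load_json_logs
>>> log_dicts = load_json_logs(['work_dirs/exp/20211201_120000.log.json'])
>>> log_dicts[0][1]  # metrics recorded in epoch 1, e.g. {'loss': [...], 'lr': [...]}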