Data Transformations¶

In MMClassification, data preparation and the dataset are decoupled. The datasets only define how to get each sample's basic information from the file system. This basic information includes the ground-truth label and the raw image data or the paths of images.

To prepare the input data, we need to apply some transformations to this basic information. These transformations include loading, preprocessing and formatting, and a series of data transformations makes up a data pipeline. Therefore, you can find a pipeline argument in the dataset configs, for example:

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='Resize', size=256),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]

data = dict(
    train=dict(..., pipeline=train_pipeline),
    val=dict(..., pipeline=test_pipeline),
    test=dict(..., pipeline=test_pipeline),
)


Every item of a pipeline list is one of the following data transformation classes. If you want to add a custom data transformation class, the tutorial Custom Data Pipelines will help you.
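The chaining itself can be sketched in plain Python. This is an illustrative toy, not the actual MMClassification implementation; `add_shape` and `collect` below are hypothetical stand-ins for real transform classes:

```python
# A minimal sketch of how a pipeline composes transforms: each transform
# is a callable that takes a result dict and returns an updated dict
# (or None to drop the sample).

class Compose:
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, results):
        for t in self.transforms:
            results = t(results)
            if results is None:
                return None
        return results

# Toy transforms standing in for real pipeline classes.
def add_shape(results):
    h, w = len(results['img']), len(results['img'][0])
    results['img_shape'] = (h, w)
    return results

def collect(results, keys=('img', 'img_shape')):
    return {k: results[k] for k in keys}

pipeline = Compose([add_shape, collect])
out = pipeline({'img': [[0, 1], [2, 3]], 'filename': 'demo.png'})
print(out)  # {'img': [[0, 1], [2, 3]], 'img_shape': (2, 2)}
```

In the real library, each dict in the config (e.g. `dict(type='RandomFlip', ...)`) is instantiated into such a callable transform and chained in order.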

Loading¶

LoadImageFromFile¶

class mmcls.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Load an image from file.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape) and “img_norm_cfg” (means=0 and stds=1).

• to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

• color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

• file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

Preprocessing and Augmentation¶

CenterCrop¶

class mmcls.datasets.pipelines.CenterCrop(crop_size, efficientnet_style=False, crop_padding=32, interpolation='bilinear', backend='cv2')[source]

Center crop the image.

• crop_size (int | tuple) – Expected size after cropping with the format of (h, w).

• efficientnet_style (bool) – Whether to use efficientnet style center crop. Defaults to False.

• crop_padding (int) – The crop padding parameter in efficientnet-style center crop. Only valid if efficientnet_style is True. Defaults to 32.

• interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Only valid if efficientnet_style is True. Defaults to ‘bilinear’.

• backend (str) – The image resize backend type, accepted values are cv2 and pillow. Only valid if efficientnet_style is True. Defaults to cv2.

• If the image is smaller than the crop size, return the original image.

• If efficientnet_style is set to False, the pipeline is a simple center crop using crop_size.

• If efficientnet_style is set to True, the pipeline first performs a center crop with crop_size_ computed as:

$\text{crop\_size\_} = \frac{\text{crop\_size}}{\text{crop\_size} + \text{crop\_padding}} \times \text{short\_edge}$

The pipeline then resizes the image to the input crop_size.
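As a quick sanity check of the formula, here is the arithmetic for an illustrative input (the values are hypothetical, not from a real config):

```python
# efficientnet-style center crop: first crop crop_size_ pixels from the
# short edge, then resize the crop back to crop_size.
crop_size, crop_padding, short_edge = 224, 32, 256
crop_size_ = crop_size / (crop_size + crop_padding) * short_edge
print(round(crop_size_))  # 224
```

With short_edge equal to crop_size + crop_padding, the crop exactly matches crop_size; larger short edges yield proportionally larger crops before the final resize.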

Lighting¶

class mmcls.datasets.pipelines.Lighting(eigval, eigvec, alphastd=0.1, to_rgb=True)[source]

Adjust image lighting using AlexNet-style PCA jitter.

• eigval (list) – the eigenvalues of the covariance matrix of pixel values.

• eigvec (list[list]) – the eigenvectors of the covariance matrix of pixel values.

• alphastd (float) – The standard deviation for distribution of alpha. Defaults to 0.1

• to_rgb (bool) – Whether to convert the image to RGB. Defaults to True.
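The jitter itself can be sketched with NumPy. This is an illustrative sketch, not the mmcls implementation; the eigenvalues/eigenvectors below are the commonly used ImageNet PCA values in pixel scale:

```python
import numpy as np

# alpha is drawn per call from N(0, alphastd); the per-channel shift is
# eigvec @ (eigval * alpha), added to every pixel of the image.
eigval = np.array([55.46, 4.794, 1.148])
eigvec = np.array([[-0.5675, 0.7192, 0.4009],
                   [-0.5808, -0.0045, -0.8140],
                   [-0.5836, -0.6948, 0.4203]])

rng = np.random.default_rng(0)
alpha = rng.normal(0.0, 0.1, size=3)   # alphastd = 0.1
shift = eigvec @ (eigval * alpha)      # shape (3,), one offset per channel
img = np.full((2, 2, 3), 128.0)        # dummy image
jittered = img + shift                 # broadcast over H and W
print(jittered.shape)  # (2, 2, 3)
```

Because the shift follows the principal components of pixel color variation, the jitter changes overall lighting while keeping object colors plausible.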

Normalize¶

class mmcls.datasets.pipelines.Normalize(mean, std, to_rgb=True)[source]

Normalize the image.

• mean (sequence) – Mean values of 3 channels.

• std (sequence) – Std values of 3 channels.

• to_rgb (bool) – Whether to convert the image from BGR to RGB. Defaults to True.
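What Normalize computes can be sketched as follows (illustrative; the mean/std values are taken from the img_norm_cfg example at the top of this page):

```python
import numpy as np

# Per-channel standardization: (img - mean) / std, applied after an
# optional BGR -> RGB conversion when to_rgb=True.
mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])
img = np.full((2, 2, 3), 128.0)        # dummy RGB image
out = (img - mean) / std
print(out[0, 0].round(3))  # [0.074 0.205 0.426]
```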

Pad¶

class mmcls.datasets.pipelines.Pad(size=None, pad_to_square=False, pad_val=0, padding_mode='constant')[source]

Pad images.

• size (tuple[int] | None) – Expected padding size (h, w). Conflicts with pad_to_square. Defaults to None.

• pad_to_square (bool) – Pad any image to square shape. Defaults to False.

• pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default to 0.

• padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default to “constant”.

Resize¶

class mmcls.datasets.pipelines.Resize(size, interpolation='bilinear', adaptive_side='short', backend='cv2')[source]

Resize images.

• size (int | tuple) – Image scales for resizing (h, w). When size is an int, the default behavior is to resize the image to (size, size). When size is a tuple and the second value is -1, the image will be resized according to adaptive_side. For example, when size is 224, the image is resized to 224x224. When size is (224, -1) and adaptive_side is “short”, the short side is resized to 224 and the other side is computed from the short side, maintaining the aspect ratio.

• interpolation (str) – Interpolation method. For “cv2” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”. For “pillow” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming”. More details can be found in mmcv.image.geometric.

• adaptive_side (str) – Adaptive resize policy, accepted values are “short”, “long”, “height”, “width”. Default to “short”.

• backend (str) – The image resize backend type, accepted values are cv2 and pillow. Default: cv2.
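The adaptive (224, -1) behavior described above can be sketched with a toy helper (illustrative, not part of mmcls):

```python
# size=(224, -1) with adaptive_side='short': scale the short edge to the
# target and keep the aspect ratio for the other edge.
def adaptive_resize_shape(h, w, target=224):
    short, long = (h, w) if h < w else (w, h)
    scale = target / short
    new_short, new_long = target, round(long * scale)
    return (new_short, new_long) if h < w else (new_long, new_short)

print(adaptive_resize_shape(300, 600))  # (224, 448)
print(adaptive_resize_shape(600, 300))  # (448, 224)
```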

RandomCrop¶

class mmcls.datasets.pipelines.RandomCrop(size, padding=None, pad_if_needed=False, pad_val=0, padding_mode='constant')[source]

Crop the given image at a random location.

• size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

• padding (int or sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively. Default: None, which means no padding.

• pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset. Default: False.

• pad_val (Number | Sequence[Number]) – Pixel pad_val value for constant fill. If a tuple of length 3, it is used to pad_val R, G, B channels respectively. Default: 0.

• padding_mode (str) – Type of padding. Defaults to “constant”. Should be one of the following:

• constant: Pads with a constant value, this value is specified with pad_val.

• edge: Pads with the last value at the edge of the image.

• reflect: Pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

• symmetric: Pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
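The reflect/symmetric examples above can be reproduced in one dimension with plain Python (an illustration of the padding rule, not the mmcls implementation):

```python
# 1-D illustration of the two mirror padding modes: 'reflect' mirrors
# without repeating the edge value, 'symmetric' mirrors including it.
def pad1d(seq, n, mode):
    if mode == 'reflect':
        left = seq[1:n + 1][::-1]
        right = seq[-n - 1:-1][::-1]
    elif mode == 'symmetric':
        left = seq[:n][::-1]
        right = seq[-n:][::-1]
    return left + seq + right

print(pad1d([1, 2, 3, 4], 2, 'reflect'))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad1d([1, 2, 3, 4], 2, 'symmetric'))  # [2, 1, 1, 2, 3, 4, 4, 3]
```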

RandomErasing¶

class mmcls.datasets.pipelines.RandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)[source]

Randomly selects a rectangle region in an image and erases its pixels.

• erase_prob (float) – Probability that image will be randomly erased. Default: 0.5

• min_area_ratio (float) – Minimum erased area / input image area Default: 0.02

• max_area_ratio (float) – Maximum erased area / input image area Default: 0.4

• aspect_range (sequence | float) – Aspect ratio range of the erased area. If a float, it will be converted to (aspect_ratio, 1 / aspect_ratio). Defaults to (3/10, 10/3).

• mode (str) –

Fill method in erased area, can be:

• const (default): All pixels are assigned the same value.

• rand: Each pixel is assigned a random value in [0, 255].

• fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).

• fill_std (sequence | Number, optional) – If set and mode is ‘rand’, fill erased area with random color from normal distribution (mean=fill_color, std=fill_std); If not set, fill erased area with random color from uniform distribution (0~255). Defaults to None.

The original paper provides 4 modes: RE-R, RE-M, RE-0 and RE-255, and uses RE-M by default. The configs of these 4 modes are:

• RE-R: RandomErasing(mode='rand')

• RE-M: RandomErasing(mode='const', fill_color=(123.67, 116.3, 103.5))

• RE-0: RandomErasing(mode='const', fill_color=0)

• RE-255: RandomErasing(mode='const', fill_color=255)
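The area/aspect sampling behind random erasing can be sketched as follows (an illustrative sketch, not the mmcls implementation; real implementations typically retry up to a maximum number of attempts until the sampled box fits):

```python
import math
import random

# Sample one erase box: pick an area within the given ratio bounds and
# an aspect ratio, derive the box height/width, then pick a random
# top-left corner so the box stays inside the image.
def sample_erase_box(h, w, min_area=0.02, max_area=0.4,
                     aspect_range=(0.3, 10 / 3), rng=random.Random(0)):
    area = h * w * rng.uniform(min_area, max_area)
    aspect = rng.uniform(*aspect_range)
    eh = min(h, max(1, round(math.sqrt(area * aspect))))
    ew = min(w, max(1, round(math.sqrt(area / aspect))))
    top = rng.randrange(h - eh + 1)
    left = rng.randrange(w - ew + 1)
    return top, left, eh, ew

box = sample_erase_box(224, 224)
print(box)
```

The selected region is then filled according to mode: a constant fill_color for 'const', or per-pixel random values (optionally drawn around fill_color with fill_std) for 'rand'.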

RandomFlip¶

class mmcls.datasets.pipelines.RandomFlip(flip_prob=0.5, direction='horizontal')[source]

Flip the image randomly.

Flip the image randomly based on flip probability and flip direction.

• flip_prob (float) – probability of the image being flipped. Default: 0.5

• direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.

RandomGrayscale¶

class mmcls.datasets.pipelines.RandomGrayscale(gray_prob=0.1)[source]

Randomly convert image to grayscale with a probability of gray_prob.

gray_prob (float) – Probability that image should be converted to grayscale. Default: 0.1.

Returns: ndarray – the image after the random grayscale transform.

• If the input image has 1 channel, the grayscale version has 1 channel.

• If the input image has 3 channels, the grayscale version has 3 channels with r == g == b.

RandomResizedCrop¶

class mmcls.datasets.pipelines.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), max_attempts=10, efficientnet_style=False, min_covered=0.1, crop_padding=32, interpolation='bilinear', backend='cv2')[source]

Crop the given image to a random size and aspect ratio.

A crop of random size (default: 0.08 to 1.0 of the original size) and a random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.

• size (sequence | int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

• scale (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).

• ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).

• max_attempts (int) – Maximum number of attempts before falling back to Central Crop. Defaults to 10.

• efficientnet_style (bool) – Whether to use efficientnet-style RandomResizedCrop. Defaults to False.

• min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.

• crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet_style is true. Defaults to 32.

• interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.

• backend (str) – The image resize backend type, accepted values are cv2 and pillow. Defaults to cv2.

ColorJitter¶

class mmcls.datasets.pipelines.ColorJitter(brightness, contrast, saturation)[source]

Randomly change the brightness, contrast and saturation of an image.

• brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].

• contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].

• saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
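The way a jitter factor is drawn can be sketched as follows (illustrative, not the mmcls implementation):

```python
import random

# For a given jitter strength s, the factor is drawn uniformly from
# [max(0, 1 - s), 1 + s]; a factor of 1.0 leaves the image unchanged.
def jitter_factor(strength, rng=random.Random(0)):
    lo, hi = max(0.0, 1.0 - strength), 1.0 + strength
    return rng.uniform(lo, hi)

f = jitter_factor(0.4)  # brightness=0.4 -> factor in [0.6, 1.4]
print(0.6 <= f <= 1.4)  # True
```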

Composed Augmentation¶

Composed augmentation is a kind of method that composes a series of data augmentation transformations, such as AutoAugment and RandAugment.

AutoAugment¶

class mmcls.datasets.pipelines.AutoAugment(policies, hparams={'pad_val': 128})[source]

Auto augmentation.

This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.

• policies (list[list[dict]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy, and is composed by several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.

• hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.

RandAugment¶

class mmcls.datasets.pipelines.RandAugment(policies, num_policies, magnitude_level, magnitude_std=0.0, total_level=30, hparams={'pad_val': 128})[source]

Random augmentation.

This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.

• policies (list[dict]) – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For those that have a magnitude (given that it is named differently in different augmentations), magnitude_key and magnitude_range shall be the name of the magnitude argument (str) and the range of the magnitude (a tuple in the format (val1, val2)), respectively. Note that val1 is not necessarily less than val2.

• num_policies (int) – Number of policies to select from policies each time.

• magnitude_level (int | float) – Magnitude level for all the augmentation selected.

• total_level (int | float) – Total level for the magnitude. Defaults to 30.

• magnitude_std (Number | str) –

Deviation of magnitude noise applied.

• If positive number, magnitude is sampled from normal distribution (mean=magnitude, std=magnitude_std).

• If 0 or negative number, magnitude remains unchanged.

• If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).

• hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.

magnitude_std introduces some randomness to the policy; this behavior is modified from https://github.com/rwightman/pytorch-image-models.

When magnitude_std=0, we calculate the magnitude as follows:

$\text{magnitude} = \frac{\text{magnitude\_level}}{\text{total\_level}} \times (\text{val2} - \text{val1}) + \text{val1}$
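A worked example of the formula above (the magnitude_range values are hypothetical):

```python
# magnitude scales linearly between val1 and val2 with magnitude_level.
magnitude_level, total_level = 9, 30
val1, val2 = 0.0, 0.9   # a hypothetical magnitude_range
magnitude = magnitude_level / total_level * (val2 - val1) + val1
print(round(magnitude, 2))  # 0.27
```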

In composed augmentation, we need to specify several data transformations or several groups of data transformations (the policies argument) as the random sampling space. These data transformations are chosen from the table below. In addition, we provide some preset policies in this folder.

• AutoContrast – Auto adjust image contrast.

• Brightness – Adjust images brightness.

• ColorTransform – Adjust images color balance.

• Contrast – Adjust images contrast.

• Cutout – Cutout images.

• Equalize – Equalize the image histogram.

• Invert – Invert images.

• Posterize – Posterize images (reduce the number of bits for each color channel).

• Rotate – Rotate images.

• Sharpness – Adjust images sharpness.

• Shear – Shear images.

• Solarize – Solarize images (invert all pixel values above a threshold).

• SolarizeAdd – SolarizeAdd images (add a certain value to pixels below a threshold).

• Translate – Translate images.
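For illustration, a RandAugment pipeline entry in this config style might look like the following. The policy dicts, magnitude_key names and ranges here are hypothetical examples, not verified presets:

```python
# Illustrative RandAugment config fragment: two of the three policies
# are sampled per image, with a shared magnitude_level.
rand_augment = dict(
    type='RandAugment',
    policies=[
        dict(type='AutoContrast'),
        dict(type='Brightness', magnitude_key='magnitude',
             magnitude_range=(0.0, 0.9)),
        dict(type='Rotate', magnitude_key='angle',
             magnitude_range=(0.0, 30.0)),
    ],
    num_policies=2,
    magnitude_level=12,
)
```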

Formatting¶

Collect¶

class mmcls.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg'))[source]

This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img” and “gt_label”.

• keys (Sequence[str]) – Keys of results to be collected in data.

• meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: (‘filename’, ‘ori_filename’, ‘ori_shape’, ‘img_shape’, ‘flip’, ‘flip_direction’, ‘img_norm_cfg’).

Returns: dict – the result dict contains the following keys:

• keys in self.keys

• img_metas if available

ImageToTensor¶

class mmcls.datasets.pipelines.ImageToTensor(keys)[source]

ToNumpy¶

class mmcls.datasets.pipelines.ToNumpy[source]

ToPIL¶

class mmcls.datasets.pipelines.ToPIL[source]

ToTensor¶

class mmcls.datasets.pipelines.ToTensor(keys)[source]

Transpose¶

class mmcls.datasets.pipelines.Transpose(keys, order)[source]