Data Transformations

In MMClassification, data preparation and the dataset are decoupled. A dataset only defines how to obtain each sample's basic information from the file system, such as the ground-truth label and either the raw image data or the image path.

To prepare the input data, we apply a series of transformations to this basic information, covering loading, preprocessing, and formatting. Such a series of data transformations makes up a data pipeline, which is why dataset configs contain a pipeline argument. For example:

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=256),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]

data = dict(
    train=dict(..., pipeline=train_pipeline),
    val=dict(..., pipeline=test_pipeline),
    test=dict(..., pipeline=test_pipeline),
)

Every item in a pipeline list is one of the data transformation classes below. If you want to add a custom data transformation class, the tutorial Custom Data Pipelines will help you.
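Conceptually, the pipeline threads a single results dict through each transform in order. The sketch below is a hypothetical, minimal version of this composition (the real implementation lives in mmcv and also supports dropping samples by returning None); the lambda "transforms" are stand-ins, not real mmcls classes.

```python
class Compose:
    """Minimal sketch: apply each transform to the results dict in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, results):
        for t in self.transforms:
            results = t(results)
            if results is None:  # a transform may drop the sample
                return None
        return results


# Stand-in "transforms" that mutate the results dict:
pipeline = Compose([
    lambda r: {**r, 'img': r['img'] * 2},
    lambda r: {**r, 'loaded': True},
])
out = pipeline({'img': 3})
```

Each entry in a config's pipeline list is instantiated into such a callable and composed in exactly this order.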

Loading

LoadImageFromFile

class mmcls.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Load an image from file.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
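The key bookkeeping described above can be sketched as follows. This is a hypothetical stand-in, not the real transform: it substitutes a dummy array for the decoded image (the real class reads bytes via a FileClient and decodes them with mmcv.imfrombytes), but the required and added keys match the description.

```python
import os.path as osp

import numpy as np


def load_image_from_file(results, to_float32=False):
    """Sketch of LoadImageFromFile's results-dict bookkeeping."""
    # Required keys: 'img_prefix' and 'img_info' (with 'filename').
    filename = osp.join(results.get('img_prefix', ''),
                        results['img_info']['filename'])
    # Stand-in for the decoded file contents:
    img = np.zeros((224, 224, 3), dtype=np.uint8)
    if to_float32:
        img = img.astype(np.float32)
    # Added/updated keys:
    results['filename'] = filename
    results['img'] = img
    results['img_shape'] = img.shape
    results['ori_shape'] = img.shape            # same as img_shape
    results['img_norm_cfg'] = dict(             # means=0 and stds=1
        mean=np.zeros(3, dtype=np.float32),
        std=np.ones(3, dtype=np.float32),
        to_rgb=False)
    return results


out = load_image_from_file(
    {'img_prefix': 'data/train', 'img_info': {'filename': 'cat.jpg'}})
```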

Preprocessing and Augmentation

CenterCrop

class mmcls.datasets.pipelines.CenterCrop(crop_size, efficientnet_style=False, crop_padding=32, interpolation='bilinear', backend='cv2')[source]

Center crop the image.

Parameters
  • crop_size (int | tuple) – Expected size after cropping with the format of (h, w).

  • efficientnet_style (bool) – Whether to use efficientnet style center crop. Defaults to False.

  • crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet style is True. Defaults to 32.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Only valid if efficientnet_style is True. Defaults to ‘bilinear’.

  • backend (str) – The image resize backend type, accepted values are cv2 and pillow. Only valid if efficientnet_style is True. Defaults to cv2.

Notes

  • If the image is smaller than the crop size, return the original image.

  • If efficientnet_style is set to False, the pipeline would be a simple center crop using the crop_size.

  • If efficientnet_style is set to True, the pipeline first performs a center crop with crop_size_ computed as:

\[\text{crop_size_} = \frac{\text{crop_size}}{\text{crop_size} + \text{crop_padding}} \times \text{short_edge}\]

The pipeline then resizes the cropped image to the input crop_size.
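A quick worked example of the formula: with the defaults crop_size=224 and crop_padding=32, the ratio is 224 / 256 = 0.875, the familiar ImageNet crop ratio.

```python
crop_size, crop_padding = 224, 32
ratio = crop_size / (crop_size + crop_padding)   # 224 / 256 = 0.875

# A 256-pixel short edge yields a 224-pixel intermediate crop:
crop_a = int(ratio * 256)
# A 300-pixel short edge yields a larger intermediate crop (262),
# which the pipeline then resizes down to crop_size:
crop_b = int(ratio * 300)
```

The truncation to int here is an assumption of this sketch; the exact rounding is an implementation detail.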

Lighting

class mmcls.datasets.pipelines.Lighting(eigval, eigvec, alphastd=0.1, to_rgb=True)[source]

Adjust image lighting using AlexNet-style PCA jitter.

Parameters
  • eigval (list) – The eigenvalues of the covariance matrix of pixel values.

  • eigvec (list[list]) – The eigenvectors of the covariance matrix of pixel values.

  • alphastd (float) – The standard deviation of the distribution of alpha. Defaults to 0.1.

  • to_rgb (bool) – Whether to convert the image to RGB. Defaults to True.
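The PCA jitter itself is a small computation: sample one alpha per principal component, project through the eigenvectors, and add the resulting per-channel offset to every pixel. A minimal sketch, with illustrative (not mmcls's actual) eigenvalues and eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)


def lighting(img, eigval, eigvec, alphastd=0.1):
    """AlexNet-style PCA jitter sketch: one RGB offset added everywhere."""
    alpha = rng.normal(0, alphastd, size=3)          # one alpha per component
    offset = np.array(eigvec) @ (alpha * np.array(eigval))
    return img.astype(np.float32) + offset           # broadcast over pixels


# Illustrative eigen-decomposition for [0, 1]-range pixels (an assumption):
eigval = [0.2175, 0.0188, 0.0045]
eigvec = [[-0.5675, 0.7192, 0.4009],
          [-0.5808, -0.0045, -0.8140],
          [-0.5836, -0.6948, 0.4203]]
img = np.zeros((2, 2, 3))
out = lighting(img, eigval, eigvec)
```

Note the offset is shared by all pixels of one image; only the alphas are resampled per call.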

Normalize

class mmcls.datasets.pipelines.Normalize(mean, std, to_rgb=True)[source]

Normalize the image.

Parameters
  • mean (sequence) – Mean values of 3 channels.

  • std (sequence) – Std values of 3 channels.

  • to_rgb (bool) – Whether to convert the image from BGR to RGB. Defaults to True.
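The normalization is the standard per-channel (img - mean) / std, optionally preceded by a BGR-to-RGB channel flip (images are loaded as BGR by default). A self-contained sketch of the arithmetic (the real transform uses mmcv.imnormalize):

```python
import numpy as np

mean = np.array([123.675, 116.28, 103.53])
std = np.array([58.395, 57.12, 57.375])


def normalize(img_bgr, mean, std, to_rgb=True):
    """Channel flip (if requested) followed by per-channel standardization."""
    img = img_bgr[..., ::-1] if to_rgb else img_bgr   # BGR -> RGB
    return (img.astype(np.float32) - mean) / std


# A BGR image whose every pixel equals the (RGB) mean normalizes to ~0:
img = np.tile([103.53, 116.28, 123.675], (2, 2, 1))
out = normalize(img, mean, std)
```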

Pad

class mmcls.datasets.pipelines.Pad(size=None, pad_to_square=False, pad_val=0, padding_mode='constant')[source]

Pad images.

Parameters
  • size (tuple[int] | None) – Expected padding size (h, w). Conflicts with pad_to_square. Defaults to None.

  • pad_to_square (bool) – Pad any image to square shape. Defaults to False.

  • pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default to 0.

  • padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default to “constant”.

Resize

class mmcls.datasets.pipelines.Resize(size, interpolation='bilinear', adaptive_side='short', backend='cv2')[source]

Resize images.

Parameters
  • size (int | tuple) – Image scales for resizing (h, w). When size is an int, the default behavior is to resize an image to (size, size). When size is a tuple and the second value is -1, the image will be resized according to adaptive_side. For example, when size is 224, the image is resized to 224x224. When size is (224, -1) and adaptive_side is “short”, the short side is resized to 224 and the other side is computed from it, maintaining the aspect ratio.

  • interpolation (str) – Interpolation method. For “cv2” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”. For “pillow” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming”. More details can be found in mmcv.image.geometric.

  • adaptive_side (str) – Adaptive resize policy, accepted values are “short”, “long”, “height”, “width”. Default to “short”.

  • backend (str) – The image resize backend type, accepted values are cv2 and pillow. Default: cv2.
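The (size, -1) adaptive behavior amounts to scaling the short side to size and deriving the long side from the aspect ratio. A sketch, where the exact rounding is an assumption:

```python
def adaptive_resize_shape(h, w, size=224):
    """Sketch of size=(size, -1) with adaptive_side='short'.

    The short side becomes `size`; the long side is scaled to keep the
    aspect ratio (truncation to int is an assumption of this sketch).
    """
    if h <= w:
        return size, int(w * size / h)
    return int(h * size / w), size


# A 480x640 image: short side 480 -> 224, long side 640 -> 298.
shape = adaptive_resize_shape(480, 640)
```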

RandomCrop

class mmcls.datasets.pipelines.RandomCrop(size, padding=None, pad_if_needed=False, pad_val=0, padding_mode='constant')[source]

Crop the given image at a random location.

Parameters
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

  • padding (int or sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively. Default: None, which means no padding.

  • pad_if_needed (bool) – Whether to pad the image if it is smaller than the desired size, to avoid raising an exception. Since cropping is done after padding, the padding is effectively applied at a random offset. Default: False.

  • pad_val (Number | Sequence[Number]) – Pixel pad_val value for constant fill. If a tuple of length 3, it is used to pad_val R, G, B channels respectively. Default: 0.

  • padding_mode (str) –

    Type of padding. Defaults to “constant”. Should be one of the following:

    • constant: Pads with a constant value, this value is specified with pad_val.

    • edge: pads with the last value at the edge of the image.

    • reflect: Pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: Pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
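The reflect/symmetric distinction above maps directly onto numpy's padding modes, so the two examples can be reproduced verbatim:

```python
import numpy as np

row = np.array([1, 2, 3, 4])
# reflect: mirror WITHOUT repeating the edge value
reflect = np.pad(row, 2, mode='reflect')
# symmetric: mirror WITH the edge value repeated
symmetric = np.pad(row, 2, mode='symmetric')
```

Running this gives [3, 2, 1, 2, 3, 4, 3, 2] for reflect and [2, 1, 1, 2, 3, 4, 4, 3] for symmetric, matching the examples in the parameter description.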

RandomErasing

class mmcls.datasets.pipelines.RandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)[source]

Randomly select a rectangular region in an image and erase its pixels.

Parameters
  • erase_prob (float) – Probability that image will be randomly erased. Default: 0.5

  • min_area_ratio (float) – Minimum erased area / input image area Default: 0.02

  • max_area_ratio (float) – Maximum erased area / input image area Default: 0.4

  • aspect_range (sequence | float) – Aspect ratio range of the erased area. If a float, it will be converted to (aspect_ratio, 1/aspect_ratio). Default: (3/10, 10/3).

  • mode (str) –

    Fill method for the erased area, can be:

    • const (default): all pixels are assigned the same value.

    • rand: each pixel is assigned a random value in [0, 255].

  • fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).

  • fill_std (sequence | Number, optional) – If set and mode is ‘rand’, fill erased area with random color from normal distribution (mean=fill_color, std=fill_std); If not set, fill erased area with random color from uniform distribution (0~255). Defaults to None.

Note

See Random Erasing Data Augmentation

The paper proposes 4 modes: RE-R, RE-M, RE-0, and RE-255, and uses RE-M by default. The configs of these 4 modes are:

  • RE-R: RandomErasing(mode='rand')

  • RE-M: RandomErasing(mode='const', fill_color=(123.67, 116.3, 103.5))

  • RE-0: RandomErasing(mode='const', fill_color=0)

  • RE-255: RandomErasing(mode='const', fill_color=255)
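For intuition, one 'const'-mode erase can be sketched as: sample an area ratio, place a box of roughly that area at a random position, and fill it with fill_color. This sketch uses a square box for simplicity; the real transform also samples an aspect ratio from aspect_range and retries up to several attempts.

```python
import numpy as np

rng = np.random.default_rng(0)


def erase_const(img, fill_color=(128, 128, 128),
                min_area_ratio=0.02, max_area_ratio=0.4):
    """One 'const'-mode erasing attempt (square box; a simplification)."""
    h, w = img.shape[:2]
    area = h * w * rng.uniform(min_area_ratio, max_area_ratio)
    side = int(np.sqrt(area))
    top = rng.integers(0, h - side + 1)
    left = rng.integers(0, w - side + 1)
    img = img.copy()
    img[top:top + side, left:left + side] = fill_color
    return img


out = erase_const(np.zeros((64, 64, 3), dtype=np.uint8))
```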

RandomFlip

class mmcls.datasets.pipelines.RandomFlip(flip_prob=0.5, direction='horizontal')[source]

Flip the image randomly.

Flip the image randomly based on the flip probability and flip direction.

Parameters
  • flip_prob (float) – probability of the image being flipped. Default: 0.5

  • direction (str) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.

RandomGrayscale

class mmcls.datasets.pipelines.RandomGrayscale(gray_prob=0.1)[source]

Randomly convert image to grayscale with a probability of gray_prob.

Parameters

gray_prob (float) – Probability that image should be converted to grayscale. Default: 0.1.

Returns

Image after randomly grayscale transform.

Return type

ndarray

Notes

  • If input image is 1 channel: grayscale version is 1 channel.

  • If input image is 3 channel: grayscale version is 3 channel with r == g == b.

RandomResizedCrop

class mmcls.datasets.pipelines.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), max_attempts=10, efficientnet_style=False, min_covered=0.1, crop_padding=32, interpolation='bilinear', backend='cv2')[source]

Crop the given image to random size and aspect ratio.

A crop of random size (default: 0.08 to 1.0 of the original area) and random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.

Parameters
  • size (sequence | int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.

  • scale (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).

  • ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).

  • max_attempts (int) – Maximum number of attempts before falling back to Central Crop. Defaults to 10.

  • efficientnet_style (bool) – Whether to use efficientnet style Random ResizedCrop. Defaults to False.

  • min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.

  • crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet_style is true. Defaults to 32.

  • interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.

  • backend (str) – The image resize backend type, accepted values are cv2 and pillow. Defaults to cv2.

ColorJitter

class mmcls.datasets.pipelines.ColorJitter(brightness, contrast, saturation)[source]

Randomly change the brightness, contrast and saturation of an image.

Parameters
  • brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].

  • contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].

  • saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
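All three jitter factors above are sampled the same way; a hedged sketch of that sampling rule (sample_factor is a hypothetical helper, not part of the mmcls API):

```python
import random


def sample_factor(strength):
    """Draw a jitter factor uniformly from [max(0, 1 - s), 1 + s]."""
    lo = max(0.0, 1.0 - strength)
    hi = 1.0 + strength
    return random.uniform(lo, hi)


# brightness=0.4 gives a factor somewhere in [0.6, 1.4];
# a factor of 1.0 leaves the image unchanged.
f = sample_factor(0.4)
```

The max(0, …) clamp matters for strengths above 1: the lower bound never goes negative, so the factor is always a valid multiplier.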

Composed Augmentation

Composed augmentations are methods that compose a series of data augmentation transformations, such as AutoAugment and RandAugment.

class mmcls.datasets.pipelines.AutoAugment(policies, hparams={'pad_val': 128})[source]

Auto augmentation.

This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.

Parameters
  • policies (list[list[dict]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy, and is composed by several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.

  • hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.

class mmcls.datasets.pipelines.RandAugment(policies, num_policies, magnitude_level, magnitude_std=0.0, total_level=30, hparams={'pad_val': 128})[source]

Random augmentation.

This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.

Parameters
  • policies (list[dict]) – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For augmentations that have a magnitude (the argument is named differently in different augmentations), magnitude_key (str) shall be the name of the magnitude argument and magnitude_range (a tuple in the format (val1, val2)) its range. Note that val1 is not necessarily less than val2.

  • num_policies (int) – Number of policies to select from policies each time.

  • magnitude_level (int | float) – Magnitude level for all the augmentation selected.

  • total_level (int | float) – Total level for the magnitude. Defaults to 30.

  • magnitude_std (Number | str) –

    Deviation of magnitude noise applied.

    • If positive number, magnitude is sampled from normal distribution (mean=magnitude, std=magnitude_std).

    • If 0 or negative number, magnitude remains unchanged.

    • If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).

  • hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.

Note

magnitude_std introduces some randomness into the policy, following https://github.com/rwightman/pytorch-image-models.

When magnitude_std=0, we calculate the magnitude as follows:

\[\text{magnitude} = \frac{\text{magnitude_level}}{\text{total_level}} \times (\text{val2} - \text{val1}) + \text{val1}\]
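A worked instance of this formula, including the case where val1 > val2 (the magnitude then decreases as the level grows):

```python
def magnitude(magnitude_level, total_level, val1, val2):
    """Linear interpolation of the magnitude within (val1, val2)."""
    return magnitude_level / total_level * (val2 - val1) + val1


# Level 15 of 30 over the range (0, 30): halfway in, magnitude 15.
m_up = magnitude(15, 30, 0, 30)
# A decreasing range (val1 > val2): at full level the magnitude hits val2.
m_down = magnitude(30, 30, 0.9, 0.0)
```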

In composed augmentation, we specify several data transformations or several groups of data transformations (the policies argument) as the random sampling space. These data transformations are chosen from the table below. In addition, some preset policies are provided in this folder.

AutoContrast

Auto adjust image contrast.

Brightness

Adjust images brightness.

ColorTransform

Adjust images color balance.

Contrast

Adjust images contrast.

Cutout

Cutout images.

Equalize

Equalize the image histogram.

Invert

Invert images.

Posterize

Posterize images (reduce the number of bits for each color channel).

Rotate

Rotate images.

Sharpness

Adjust images sharpness.

Shear

Shear images.

Solarize

Solarize images (invert all pixel values above a threshold).

SolarizeAdd

SolarizeAdd images (add a certain value to pixels below a threshold).

Translate

Translate images.

Formatting

Collect

class mmcls.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg'))[source]

Collect data from the loader relevant to the specific task.

This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img” and “gt_label”.

Parameters
  • keys (Sequence[str]) – Keys of results to be collected in data.

  • meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: (‘filename’, ‘ori_filename’, ‘ori_shape’, ‘img_shape’, ‘flip’, ‘flip_direction’, ‘img_norm_cfg’)

Returns

The result dict contains the following keys

  • keys in self.keys

  • img_metas if available

Return type

dict
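The key-selection logic of Collect can be sketched as two dict comprehensions. This is a simplification: the real transform wraps the meta dict in mmcv.DataContainer, which this sketch omits.

```python
def collect(results, keys, meta_keys):
    """Sketch: keep only the requested keys; bundle meta info separately."""
    data = {k: results[k] for k in keys}
    data['img_metas'] = {k: results[k] for k in meta_keys if k in results}
    return data


out = collect(
    {'img': 'tensor', 'gt_label': 1, 'filename': 'cat.jpg', 'flip': False},
    keys=['img', 'gt_label'],
    meta_keys=['filename', 'flip'])
```

Anything not named in keys or meta_keys (intermediate results from earlier transforms) is dropped here, which is why Collect is usually the last stage of the pipeline.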

ImageToTensor

class mmcls.datasets.pipelines.ImageToTensor(keys)[source]

ToNumpy

class mmcls.datasets.pipelines.ToNumpy[source]

ToPIL

class mmcls.datasets.pipelines.ToPIL[source]

ToTensor

class mmcls.datasets.pipelines.ToTensor(keys)[source]

Transpose

class mmcls.datasets.pipelines.Transpose(keys, order)[source]