Data Transformations
In MMClassification, data preparation and the dataset are decoupled. The datasets only define how to get samples' basic information from the file system. This basic information includes the ground-truth label and the raw image data or the paths of images.
To prepare the input data, we need to apply some transformations to this basic information. These transformations include loading, preprocessing and formatting, and a series of data transformations makes up a data pipeline. Therefore, you can find a pipeline argument in the dataset configs, for example:
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=256),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]
data = dict(
    train=dict(..., pipeline=train_pipeline),
    val=dict(..., pipeline=test_pipeline),
    test=dict(..., pipeline=test_pipeline),
)
Every item of a pipeline list is one of the following data transformation classes. If you want to add a custom data transformation class, the tutorial Custom Data Pipelines will help you.
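As a minimal sketch (not part of the original docs), a pipeline can also be built and applied manually through mmcls.datasets.pipelines.Compose; the image path below is hypothetical and the exact output keys depend on the transforms used.
from mmcls.datasets.pipelines import Compose

pipeline = Compose([
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
])

# The dataset provides only this basic information; the pipeline adds the rest.
results = dict(img_prefix='demo', img_info=dict(filename='cat.jpg'))
results = pipeline(results)
print(results['img'].shape)  # e.g. torch.Size([3, 224, 224])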
mmcls.datasets.pipelines
Loading
LoadImageFromFile
- class mmcls.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})
Load an image from file.
Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to 'color'.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
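A hedged illustration (not from the original docs) of the keys this transform reads and writes; the paths are hypothetical:
# Before the transform: the dataset provides the image location.
results = dict(img_prefix='data/imagenet/val', img_info=dict(filename='cat.jpg'))
# After LoadImageFromFile(to_float32=False):
#   results['filename']     -> 'data/imagenet/val/cat.jpg'
#   results['img']          -> uint8 ndarray of shape (H, W, 3), BGR with the default cv2 backend
#   results['img_shape']    -> (H, W, 3), same as results['ori_shape']
#   results['img_norm_cfg'] -> identity normalization (means=0, stds=1)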
Preprocessing and Augmentation
CenterCrop
- class mmcls.datasets.pipelines.CenterCrop(crop_size, efficientnet_style=False, crop_padding=32, interpolation='bilinear', backend='cv2')
Center crop the image.
- Parameters
crop_size (int | tuple) – Expected size after cropping with the format of (h, w).
efficientnet_style (bool) – Whether to use efficientnet style center crop. Defaults to False.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet style is True. Defaults to 32.
interpolation (str) – Interpolation method, accepted values are 'nearest', 'bilinear', 'bicubic', 'area', 'lanczos'. Only valid if efficientnet_style is True. Defaults to 'bilinear'.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Only valid if efficientnet_style is True. Defaults to cv2.
Notes
If the image is smaller than the crop size, the original image is returned.
If efficientnet_style is set to False, the pipeline is a simple center crop using the crop_size.
If efficientnet_style is set to True, the pipeline first performs a center crop with crop_size_ computed as
\[\text{crop_size_} = \frac{\text{crop_size}}{\text{crop_size} + \text{crop_padding}} \times \text{short_edge}\]
and then resizes the image to the input crop_size.
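For example (an illustrative calculation, not from the original docs), with crop_size=224, crop_padding=32 and an image whose short edge is 300 pixels, the intermediate crop is 224 / (224 + 32) × 300 ≈ 262 pixels per side, which is then resized to 224×224. A hedged config sketch using only documented arguments:
dict(type='CenterCrop', crop_size=224, efficientnet_style=True, crop_padding=32, interpolation='bicubic')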
Lighting
- class mmcls.datasets.pipelines.Lighting(eigval, eigvec, alphastd=0.1, to_rgb=True)
Adjust image lighting using AlexNet-style PCA jitter.
- Parameters
eigval (list) – the eigenvalues of the covariance matrix of pixel values.
eigvec (list[list]) – the eigenvectors of the covariance matrix of pixel values.
alphastd (float) – The standard deviation for the distribution of alpha. Defaults to 0.1.
to_rgb (bool) – Whether to convert img to rgb.
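A hedged config sketch (not from the original docs): the eigval/eigvec values below are the AlexNet-style ImageNet PCA statistics on the 0–255 pixel scale commonly paired with this transform; treat them as assumptions and verify against your own setup.
img_lighting_cfg = dict(
    eigval=[55.4625, 4.7940, 1.1475],      # assumed ImageNet eigenvalues (0-255 scale)
    eigvec=[[-0.5675, 0.7192, 0.4009],     # assumed ImageNet eigenvectors
            [-0.5808, -0.0045, -0.8140],
            [-0.5836, -0.6948, 0.4203]],
    alphastd=0.1,
    to_rgb=True)
train_pipeline = [
    ...,
    dict(type='Lighting', **img_lighting_cfg),
    ...,
]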
Normalize
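The class entry for Normalize is not expanded on this page; in the configs above it is used as dict(type='Normalize', **img_norm_cfg), which subtracts the per-channel mean, divides by the per-channel std, and converts BGR images to RGB when to_rgb=True.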
Pad
- class mmcls.datasets.pipelines.Pad(size=None, pad_to_square=False, pad_val=0, padding_mode='constant')
Pad images.
- Parameters
size (tuple[int] | None) – Expected padding size (h, w). Conflicts with pad_to_square. Defaults to None.
pad_to_square (bool) – Pad any image to square shape. Defaults to False.
pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default to 0.
padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default to “constant”.
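A hedged config sketch (values are illustrative, only documented arguments are used):
# Pad every image to a fixed (h, w) with a constant value.
dict(type='Pad', size=(256, 256), pad_val=0, padding_mode='constant')
# Or pad each image to a square, since size conflicts with pad_to_square.
dict(type='Pad', pad_to_square=True, pad_val=0)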
Resize
- class mmcls.datasets.pipelines.Resize(size, interpolation='bilinear', adaptive_side='short', backend='cv2')
Resize images.
- Parameters
size (int | tuple) – Image scales for resizing (h, w). When size is int, the default behavior is to resize an image to (size, size). When size is tuple and the second value is -1, the image will be resized according to adaptive_side. For example, when size is 224, the image is resized to 224x224. When size is (224, -1) and adaptive_side is "short", the short side is resized to 224 and the other side is computed based on the short side, maintaining the aspect ratio.
interpolation (str) – Interpolation method. For “cv2” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos”. For “pillow” backend, accepted values are “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming”. More details can be found in mmcv.image.geometric.
adaptive_side (str) – Adaptive resize policy, accepted values are “short”, “long”, “height”, “width”. Default to “short”.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Default: cv2.
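A hedged config sketch (values are illustrative): resize the short edge to 256 and keep the aspect ratio, letting the other side be computed from it.
dict(type='Resize', size=(256, -1), adaptive_side='short', interpolation='bilinear', backend='cv2')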
RandomCrop
- class mmcls.datasets.pipelines.RandomCrop(size, padding=None, pad_if_needed=False, pad_val=0, padding_mode='constant')
Crop the given image at a random location.
- Parameters
size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
padding (int or sequence, optional) – Optional padding on each border of the image. If a sequence of length 4 is provided, it is used to pad left, top, right, bottom borders respectively. If a sequence of length 2 is provided, it is used to pad left/right, top/bottom borders, respectively. Default: None, which means no padding.
pad_if_needed (boolean) – It will pad the image if smaller than the desired size to avoid raising an exception. Since cropping is done after padding, the padding seems to be done at a random offset. Default: False.
pad_val (Number | Sequence[Number]) – Pixel pad_val value for constant fill. If a tuple of length 3, it is used to pad_val R, G, B channels respectively. Default: 0.
padding_mode (str) –
Type of padding. Defaults to “constant”. Should be one of the following:
constant: Pads with a constant value, this value is specified with pad_val.
edge: pads with the last value at the edge of the image.
reflect: Pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
symmetric: Pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
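A hedged, CIFAR-style config sketch (values are illustrative): pad 4 pixels on every border, then take a random 32×32 crop.
dict(type='RandomCrop', size=32, padding=4, pad_val=0, padding_mode='constant')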
RandomErasing
- class mmcls.datasets.pipelines.RandomErasing(erase_prob=0.5, min_area_ratio=0.02, max_area_ratio=0.4, aspect_range=(0.3, 3.3333333333333335), mode='const', fill_color=(128, 128, 128), fill_std=None)
Randomly selects a rectangle region in an image and erases pixels.
- Parameters
erase_prob (float) – Probability that the image will be randomly erased. Default: 0.5.
min_area_ratio (float) – Minimum erased area / input image area. Default: 0.02.
max_area_ratio (float) – Maximum erased area / input image area. Default: 0.4.
aspect_range (sequence | float) – Aspect ratio range of the erased area. If float, it will be converted to (aspect_ratio, 1/aspect_ratio). Default: (3/10, 10/3).
mode (str) –
Fill method in erased area, can be:
const (default): All pixels are assigned the same value.
rand: Each pixel is assigned a random value in [0, 255].
fill_color (sequence | Number) – Base color filled in erased area. Defaults to (128, 128, 128).
fill_std (sequence | Number, optional) – If set and mode is 'rand', fill the erased area with a random color from a normal distribution (mean=fill_color, std=fill_std); if not set, fill the erased area with a random color from a uniform distribution (0~255). Defaults to None.
Note
See Random Erasing Data Augmentation
This paper provides 4 modes: RE-R, RE-M, RE-0 and RE-255, and uses RE-M as the default. The configs of these 4 modes are:
RE-R: RandomErasing(mode='rand')
RE-M: RandomErasing(mode='const', fill_color=(123.67, 116.3, 103.5))
RE-0: RandomErasing(mode='const', fill_color=0)
RE-255: RandomErasing(mode='const', fill_color=255)
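A hedged config sketch (not from the original docs) in the RE-R style, loosely modeled on common ImageNet training recipes; all numbers, including the per-channel fill values, are assumptions.
dict(
    type='RandomErasing',
    erase_prob=0.25,
    mode='rand',
    min_area_ratio=0.02,
    max_area_ratio=1 / 3,
    fill_color=[103.53, 116.28, 123.675],  # assumed per-channel base color
    fill_std=[57.375, 57.12, 58.395])      # assumed per-channel std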
RandomFlip
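The class entry for RandomFlip is not expanded on this page; in the train_pipeline above it is configured as dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'), which flips the image with probability flip_prob along the given direction.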
RandomGrayscale
- class mmcls.datasets.pipelines.RandomGrayscale(gray_prob=0.1)
Randomly convert image to grayscale with a probability of gray_prob.
- Parameters
gray_prob (float) – Probability that image should be converted to grayscale. Default: 0.1.
- Returns
Image after randomly grayscale transform.
- Return type
ndarray
Notes
If input image is 1 channel: grayscale version is 1 channel.
If input image is 3 channel: grayscale version is 3 channel with r == g == b.
RandomResizedCrop
- class mmcls.datasets.pipelines.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), max_attempts=10, efficientnet_style=False, min_covered=0.1, crop_padding=32, interpolation='bilinear', backend='cv2')
Crop the given image to random size and aspect ratio.
A crop of a random size (default: 0.08 to 1.0 of the original size) and a random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.
- Parameters
size (sequence | int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made.
scale (tuple) – Range of the random size of the cropped image compared to the original image. Defaults to (0.08, 1.0).
ratio (tuple) – Range of the random aspect ratio of the cropped image compared to the original image. Defaults to (3. / 4., 4. / 3.).
max_attempts (int) – Maximum number of attempts before falling back to central crop. Defaults to 10.
efficientnet_style (bool) – Whether to use efficientnet-style RandomResizedCrop. Defaults to False.
min_covered (Number) – Minimum ratio of the cropped area to the original area. Only valid if efficientnet_style is true. Defaults to 0.1.
crop_padding (int) – The crop padding parameter in efficientnet style center crop. Only valid if efficientnet_style is true. Defaults to 32.
interpolation (str) – Interpolation method, accepted values are ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’, ‘lanczos’. Defaults to ‘bilinear’.
backend (str) – The image resize backend type, accepted values are cv2 and pillow. Defaults to cv2.
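A hedged config sketch (values are illustrative), the usual ImageNet-style training crop:
dict(type='RandomResizedCrop', size=224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), interpolation='bicubic', backend='pillow')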
ColorJitter
- class mmcls.datasets.pipelines.ColorJitter(brightness, contrast, saturation)
Randomly change the brightness, contrast and saturation of an image.
- Parameters
brightness (float) – How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
contrast (float) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
saturation (float) – How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
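A hedged config sketch (values are illustrative):
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4)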
Composed Augmentation
Composed augmentation is a kind of method that composes a series of data augmentation transformations, such as AutoAugment and RandAugment.
- class mmcls.datasets.pipelines.AutoAugment(policies, hparams={'pad_val': 128})
Auto augmentation.
This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data.
- Parameters
policies (list[list[dict]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy composed of several augmentations (dict). When AutoAugment is called, a random policy in policies will be selected to augment images.
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
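A hedged sketch of the policies format (not the preset ImageNet policy; the sub-policies, transform argument names such as bits, angle, thr and prob, and all values are illustrative assumptions):
policies = [
    # Each inner list is one sub-policy; its transforms are applied in order,
    # each with its own probability.
    [dict(type='Posterize', bits=4, prob=0.4),
     dict(type='Rotate', angle=30., prob=0.6)],
    [dict(type='Equalize', prob=0.8),
     dict(type='Solarize', thr=128, prob=0.6)],
]
train_pipeline = [
    ...,
    dict(type='AutoAugment', policies=policies, hparams=dict(pad_val=128)),
    ...,
]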
- class mmcls.datasets.pipelines.RandAugment(policies, num_policies, magnitude_level, magnitude_std=0.0, total_level=30, hparams={'pad_val': 128})
Random augmentation.
This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.
- Parameters
policies (list[dict]) – The policies of random augmentation. Each policy in policies is one specific augmentation policy (dict). The policy shall at least have the key type, indicating the type of augmentation. For augmentations that have a magnitude (which is named differently in different augmentations), magnitude_key and magnitude_range shall be the name of the magnitude argument (str) and the range of magnitude (a tuple in the format of (val1, val2)), respectively. Note that val1 is not necessarily less than val2.
num_policies (int) – Number of policies to select from policies each time.
magnitude_level (int | float) – Magnitude level for all the augmentation selected.
total_level (int | float) – Total level for the magnitude. Defaults to 30.
magnitude_std (Number | str) –
Deviation of magnitude noise applied.
If positive number, magnitude is sampled from normal distribution (mean=magnitude, std=magnitude_std).
If 0 or negative number, magnitude remains unchanged.
If str “inf”, magnitude is sampled from uniform distribution (range=[min, magnitude]).
hparams (dict) – Configs of hyperparameters. Hyperparameters will be used in policies that require these arguments if these arguments are not set in policy dicts. Defaults to use _HPARAMS_DEFAULT.
Note
magnitude_std introduces some randomness to the policy, modified from https://github.com/rwightman/pytorch-image-models.
When magnitude_std=0, we calculate the magnitude as follows:
\[\text{magnitude} = \frac{\text{magnitude_level}}{\text{total_level}} \times (\text{val2} - \text{val1}) + \text{val1}\]
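A hedged config sketch (not from the original docs): two of the three listed policies are sampled on each call, and each magnitude is mapped from magnitude_level / total_level into its magnitude_range; the policy names and ranges here are illustrative assumptions.
dict(
    type='RandAugment',
    policies=[
        dict(type='AutoContrast'),
        dict(type='Rotate', magnitude_key='angle', magnitude_range=(0, 30)),
        dict(type='Brightness', magnitude_key='magnitude', magnitude_range=(0, 0.9)),
    ],
    num_policies=2,
    magnitude_level=9,
    magnitude_std=0.5,
    total_level=10,
    hparams=dict(pad_val=128))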
In composed augmentation, we need to specify several data transformations or several groups of data transformations (the policies argument) as the random sampling space. These data transformations are chosen from the table below. In addition, we provide some preset policies in this folder.
AutoContrast – Auto adjust image contrast.
Brightness – Adjust images brightness.
ColorTransform – Adjust images color balance.
Contrast – Adjust images contrast.
Cutout – Cutout images.
Equalize – Equalize the image histogram.
Invert – Invert images.
Posterize – Posterize images (reduce the number of bits for each color channel).
Rotate – Rotate images.
Sharpness – Adjust images sharpness.
Shear – Shear images.
Solarize – Solarize images (invert all pixel values above a threshold).
SolarizeAdd – SolarizeAdd images (add a certain value to pixels below a threshold).
Translate – Translate images.
Formatting
Collect
- class mmcls.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg'))
Collect data from the loader relevant to the specific task.
This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img” and “gt_label”.
- Parameters
keys (Sequence[str]) – Keys of results to be collected in the output.
meta_keys (Sequence[str]) – Meta keys to be collected in img_metas. Defaults to ('filename', 'ori_filename', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'img_norm_cfg').
- Returns
The result dict contains the keys in self.keys, plus img_metas if available.
- Return type
dict
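For instance (a hedged illustration), dict(type='Collect', keys=['img', 'gt_label']) keeps only results['img'] and results['gt_label'] in the output and packs whichever of the listed meta keys are present into results['img_metas'].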