CutMix
- class mmcls.models.utils.batch_augments.CutMix(alpha, cutmix_minmax=None, correct_lam=True)
CutMix batch augmentation.
CutMix is a method to improve the network’s generalization capability. It was proposed in "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features" <https://arxiv.org/abs/1905.04899>.
With this method, patches are cut and pasted among training images, and the ground truth labels are mixed in proportion to the area of the patches.
- Parameters
alpha (float) – Parameter of the Beta distribution used to generate the mixing ratio. It should be a positive number. More details can be found in Mixup.
cutmix_minmax (List[float], optional) – The min/max area ratio of the patches. If not None, the bounding box of patches is uniformly sampled within this ratio range, and alpha is ignored. Otherwise, the bounding box is generated according to alpha. Defaults to None.
correct_lam (bool) – Whether to apply lambda correction when the CutMix bbox is clipped by image borders. Defaults to True.
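The effect of correct_lam can be illustrated with a minimal sketch. The helper name `corrected_lam` and the `(y1, y2, x1, x2)` box layout are assumptions made for this illustration and are not part of the mmcls API. The idea is that a box clipped at the image border covers less area than requested, so the mixing ratio is recomputed from the area that was actually pasted.

```python
def corrected_lam(bbox, img_shape):
    """Recompute lambda from the area of a (possibly clipped) box.

    Hypothetical helper for illustration; not an mmcls function.
    bbox is (y1, y2, x1, x2) in pixels, img_shape is (H, W).
    """
    y1, y2, x1, x2 = bbox
    img_h, img_w = img_shape
    bbox_area = (y2 - y1) * (x2 - x1)
    # lambda is the fraction of the image that keeps its original pixels.
    return 1.0 - bbox_area / float(img_h * img_w)


# A 16x16 box requested at the corner of a 32x32 image is clipped to 8x8.
lam_requested = 1.0 - (16 * 16) / (32 * 32)          # 0.75
lam_actual = corrected_lam((0, 8, 0, 8), (32, 32))   # 0.9375
```

Without the correction, the labels in this example would be mixed as if a quarter of the image had been replaced, although only one sixteenth was.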
Note
If cutmix_minmax is None, how is the bounding box of patches generated according to alpha?
First, generate a \(\lambda\); details can be found in Mixup. Then the ratio of the bounding box's height and width to those of the image is calculated by:
\[\text{ratio} = \sqrt{1-\lambda}\]
so the box covers about \(1-\lambda\) of the image area.
- mix(batch_inputs, batch_scores)
Mix the batch inputs and batch one-hot format ground truth.
- Parameters
batch_inputs (Tensor) – A batch of image tensors in the shape of (N, C, H, W).
batch_scores (Tensor) – A batch of one-hot format labels in the shape of (N, num_classes).
- Returns
The mixed inputs and labels.
- Return type
Tuple[Tensor, Tensor]
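As a rough illustration of what mixing a batch involves, the following sketch applies CutMix to NumPy arrays. It is not the mmcls implementation: the function name `cutmix_sketch`, the use of NumPy instead of torch tensors, and the way the box centre is sampled are all assumptions made for this example.

```python
import numpy as np


def cutmix_sketch(batch_inputs, batch_scores, alpha=1.0, rng=None):
    """Illustrative CutMix on NumPy arrays; not the mmcls implementation.

    batch_inputs: (N, C, H, W) array, batch_scores: (N, num_classes) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _, img_h, img_w = batch_inputs.shape
    lam = rng.beta(alpha, alpha)
    # Sample i receives a patch from sample index[i].
    index = rng.permutation(n)

    # Box with sides sqrt(1 - lam) of the image sides, clipped to the image.
    ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(img_h * ratio), int(img_w * ratio)
    cy, cx = int(rng.integers(img_h)), int(rng.integers(img_w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, img_h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, img_w)

    mixed = batch_inputs.copy()
    mixed[:, :, y1:y2, x1:x2] = batch_inputs[index, :, y1:y2, x1:x2]
    # Recompute lam from the clipped box so labels match the pasted area.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / float(img_h * img_w)
    mixed_scores = lam * batch_scores + (1.0 - lam) * batch_scores[index]
    return mixed, mixed_scores


rng = np.random.default_rng(0)
images = rng.random((4, 3, 8, 8))
labels = np.eye(5)[[0, 1, 2, 3]]  # one-hot labels, shape (4, 5)
mixed_images, mixed_labels = cutmix_sketch(images, labels, rng=rng)
```

The same box is used for every sample in the batch in this sketch, and each row of the mixed labels still sums to 1 because it is a convex combination of two one-hot vectors.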
- rand_bbox(img_shape, lam, margin=0.0, count=None)
Standard CutMix bounding box that generates a random square bbox based on the lambda value. This implementation includes support for enforcing a border margin as a percentage of the bbox dimensions.
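A minimal sketch of lambda-based box sampling, using only the Python standard library, might look as follows. The function name `rand_bbox_sketch`, the `rng` argument, and the `(y1, y2, x1, x2)` return layout are assumptions for illustration; the sketch samples a single box rather than `count` boxes, and the box is square only when the image itself is square.

```python
import math
import random


def rand_bbox_sketch(img_shape, lam, margin=0.0, rng=random):
    """Sample one box whose sides are sqrt(1 - lam) of the image sides.

    Illustrative sketch only; names and details are assumptions.
    Returns (y1, y2, x1, x2), clipped to the image.
    """
    img_h, img_w = img_shape
    ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(img_h * ratio), int(img_w * ratio)
    # Keep the box centre at least margin * box size away from the border.
    margin_y, margin_x = int(margin * cut_h), int(margin * cut_w)
    cy = rng.randrange(margin_y, img_h - margin_y)
    cx = rng.randrange(margin_x, img_w - margin_x)
    # Clip so the box never leaves the image.
    y1 = max(cy - cut_h // 2, 0)
    y2 = min(cy + cut_h // 2, img_h)
    x1 = max(cx - cut_w // 2, 0)
    x2 = min(cx + cut_w // 2, img_w)
    return y1, y2, x1, x2


random.seed(0)
y1, y2, x1, x2 = rand_bbox_sketch((32, 32), lam=0.75)
```

With `lam=0.75` on a 32x32 image, the requested box is 16x16; boxes centred near a border come out smaller after clipping, which is the case the lambda correction is meant to handle.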
- rand_bbox_minmax(img_shape, count=None)
Min-max CutMix bounding box, inspired by the Darknet CutMix implementation. It generates a random rectangular bbox based on min/max percent values applied to each dimension of the input image.
Typical values for minmax are in the 0.2-0.3 range for min and the 0.8-0.9 range for max.
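The min-max variant can be sketched in the same spirit. Here the function name `rand_bbox_minmax_sketch`, the explicit `minmax` argument, and the `rng` argument are assumptions for illustration (the documented method takes only `img_shape` and `count`), and a single box is sampled.

```python
import random


def rand_bbox_minmax_sketch(img_shape, minmax, rng=random):
    """Sample one rectangular box with sides between min and max fractions.

    Illustrative sketch only; names and details are assumptions.
    Returns (y1, y2, x1, x2), fully inside the image.
    """
    img_h, img_w = img_shape
    lo, hi = minmax
    # Height and width are drawn independently, so the box is rectangular.
    cut_h = rng.randrange(int(img_h * lo), int(img_h * hi))
    cut_w = rng.randrange(int(img_w * lo), int(img_w * hi))
    # Place the top-left corner so the whole box stays inside the image.
    y1 = rng.randrange(0, img_h - cut_h)
    x1 = rng.randrange(0, img_w - cut_w)
    return y1, y1 + cut_h, x1, x1 + cut_w


random.seed(0)
y1, y2, x1, x2 = rand_bbox_minmax_sketch((32, 32), minmax=(0.2, 0.8))
# In this sketch the mixing ratio is derived from the box area.
lam = 1.0 - (y2 - y1) * (x2 - x1) / float(32 * 32)
```

Because the box size does not come from a Beta-distributed lambda in this mode, the sketch derives the label mixing ratio from the sampled box area.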