LabelSmoothLoss¶
- class mmpretrain.models.losses.LabelSmoothLoss(label_smooth_val, num_classes=None, use_sigmoid=None, mode='original', reduction='mean', loss_weight=1.0, class_weight=None, pos_weight=None)[source]¶
Initializer for the label smoothed cross entropy loss.
Refers to Rethinking the Inception Architecture for Computer Vision
Label smoothing decreases the gap between output scores and encourages generalization. The labels provided to forward can be one-hot-like vectors (NxC) or class indices (Nx1). Linear combinations of one-hot-like labels (e.g. from mixup or cutmix) are also accepted, except in the multi-label task.
- Parameters:
label_smooth_val (float) – The degree of label smoothing.
num_classes (int, optional) – Number of classes. Defaults to None.
mode (str) – See the Notes section below. Options are ‘original’, ‘classy_vision’ and ‘multi_label’. Defaults to ‘original’.
use_sigmoid (bool, optional) – Whether the prediction uses sigmoid or softmax. Defaults to None, which means to use sigmoid in “multi_label” mode and softmax in other modes.
reduction (str) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
loss_weight (float) – Weight of the loss. Defaults to 1.0.
class_weight (List[float], optional) – The weight for each class with shape (C). Defaults to None.
pos_weight (List[float], optional) – The positive weight for each class with shape (C), only valid in “multi_label” mode. Defaults to None.
Notes
If the mode is “original”, this will use the same label-smoothing method as the original paper:
\[(1-\epsilon)\delta_{k, y} + \frac{\epsilon}{K}\]
where \(\epsilon\) is the label_smooth_val, \(K\) is the num_classes, and \(\delta_{k, y}\) is the Dirac delta, which equals 1 for \(k=y\) and 0 otherwise.
If the mode is “classy_vision”, this will use the same label-smoothing method as the facebookresearch/ClassyVision repo:
\[\frac{\delta_{k, y} + \epsilon/K}{1+\epsilon}\]
If the mode is “multi_label”, this will accept labels from a multi-label task and smooth them as:
\[(1-2\epsilon)\delta_{k, y} + \epsilon\]
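The three smoothing formulas above can be sketched in plain Python. This is a minimal illustration of how each mode transforms a one-hot label vector, not the mmpretrain implementation (which operates on torch.Tensor batches):

```python
def smooth_label(one_hot, eps, mode="original"):
    """Apply label smoothing to a one-hot label vector (list of 0/1)."""
    K = len(one_hot)
    if mode == "original":
        # (1 - eps) * delta + eps / K
        return [(1 - eps) * d + eps / K for d in one_hot]
    if mode == "classy_vision":
        # (delta + eps / K) / (1 + eps)
        return [(d + eps / K) / (1 + eps) for d in one_hot]
    if mode == "multi_label":
        # (1 - 2 * eps) * delta + eps
        return [(1 - 2 * eps) * d + eps for d in one_hot]
    raise ValueError(f"unknown mode: {mode}")

# Class index 1 out of K = 4 classes, label_smooth_val = 0.1:
# "original" gives 0.925 on the true class and 0.025 elsewhere,
# and the smoothed vector still sums to 1.
smoothed = smooth_label([0, 1, 0, 0], eps=0.1, mode="original")
```

Note that “original” and “classy_vision” both produce valid probability distributions (they sum to 1), whereas “multi_label” smooths each class independently toward \(\epsilon\), which suits sigmoid-based multi-label targets.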
- forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Label smooth loss.
- Parameters:
cls_score (torch.Tensor) – The prediction with shape (N, *).
label (torch.Tensor) – The ground truth label of the prediction with shape (N, *).
weight (torch.Tensor, optional) – Sample-wise loss weight with shape (N, *). Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The method used to reduce the loss into a scalar. Options are “none”, “mean” and “sum”. Defaults to None.
- Returns:
Loss.
- Return type:
torch.Tensor
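As a rough illustration of what forward computes in “original” mode, here is a hand-rolled sketch in plain Python using index labels and softmax log-probabilities. It is an assumption-laden approximation for intuition only; the real method works on torch.Tensor batches and supports weights, avg_factor, and one-hot labels:

```python
import math

def label_smooth_ce(cls_score, label, eps, reduction="mean"):
    """Cross entropy against label-smoothed targets.

    cls_score: list of N score vectors (each of length K)
    label: list of N ground-truth class indices
    """
    losses = []
    for scores, y in zip(cls_score, label):
        K = len(scores)
        # Numerically stable softmax log-probabilities.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        log_p = [s - log_z for s in scores]
        # Smoothed target: (1 - eps) * delta + eps / K.
        target = [(1 - eps) * (k == y) + eps / K for k in range(K)]
        losses.append(-sum(t * lp for t, lp in zip(target, log_p)))
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # reduction == "none"
```

With eps=0 this reduces to ordinary cross entropy, which is a convenient sanity check; the reduction argument mirrors the “none” / “mean” / “sum” options documented above.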