# SingleLabelMetric¶

class mmcls.evaluation.SingleLabelMetric(thrs=0.0, items=('precision', 'recall', 'f1-score'), average='macro', num_classes=None, collect_device='cpu', prefix=None)[source]

A collection of precision, recall, f1-score and support for single-label tasks.

The collection of metrics is for single-label multi-class classification. And all these metrics are based on the confusion matrix of every category:

All metrics can be formulated use variables above:

Precision is the fraction of correct predictions in all predictions:

$\text{Precision} = \frac{TP}{TP+FP}$

Recall is the fraction of correct predictions in all targets:

$\text{Recall} = \frac{TP}{TP+FN}$

F1-score is the harmonic mean of the precision and recall:

$\text{F1-score} = \frac{2\times\text{Recall}\times\text{Precision}}{\text{Recall}+\text{Precision}}$

Support is the number of samples:

$\text{Support} = TP + TN + FN + FP$
Parameters
• thrs (Sequence[float | None] | float | None) – If a float, predictions with score lower than the threshold will be regard as the negative prediction. If None, only the top-1 prediction will be regard as the positive prediction. If the parameter is a tuple, accuracy based on all thresholds will be calculated and outputted together. Defaults to 0.

• items (Sequence[str]) – The detailed metric items to evaluate, select from “precision”, “recall”, “f1-score” and “support”. Defaults to ('precision', 'recall', 'f1-score').

• average (str | None) –

How to calculate the final metrics from the confusion matrix of every category. It supports three modes:

• ”macro”: Calculate metrics for each category, and calculate the mean value over all categories.

• ”micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.

• None: Calculate metrics of every category and output directly.

Defaults to “macro”.

• num_classes (int, optional) – The number of classes. Defaults to None.

• collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

• prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

Examples

>>> import torch
>>> from mmcls.evaluation import SingleLabelMetric
>>> # -------------------- The Basic Usage --------------------
>>> y_pred = [0, 1, 1, 3]
>>> y_true = [0, 2, 1, 3]
>>> # Output precision, recall, f1-score and support.
>>> SingleLabelMetric.calculate(y_pred, y_true, num_classes=4)
(tensor(62.5000), tensor(75.), tensor(66.6667), tensor(4))
>>> # Calculate with different thresholds.
>>> y_score = torch.rand((1000, 10))
>>> y_true = torch.zeros((1000, ))
>>> SingleLabelMetric.calculate(y_score, y_true, thrs=(0., 0.9))
[(tensor(10.), tensor(0.9500), tensor(1.7352), tensor(1000)),
(tensor(10.), tensor(0.5500), tensor(1.0427), tensor(1000))]
>>>
>>> # ------------------- Use with Evalutor -------------------
>>> from mmcls.structures import ClsDataSample
>>> from mmengine.evaluator import Evaluator
>>> data_samples = [
...     ClsDataSample().set_gt_label(i%5).set_pred_score(torch.rand(5))
...     for i in range(1000)
... ]
>>> evaluator = Evaluator(metrics=SingleLabelMetric())
>>> evaluator.process(data_samples)
>>> evaluator.evaluate(1000)
{'single-label/precision': 19.650691986083984,
'single-label/recall': 19.600000381469727,
'single-label/f1-score': 19.619548797607422}
>>> # Evaluate on each class
>>> evaluator = Evaluator(metrics=SingleLabelMetric(average=None))
>>> evaluator.process(data_samples)
>>> evaluator.evaluate(1000)
{
'single-label/precision_classwise': [21.1, 18.7, 17.8, 19.4, 16.1],
'single-label/recall_classwise': [18.5, 18.5, 17.0, 20.0, 18.0],
'single-label/f1-score_classwise': [19.7, 18.6, 17.1, 19.7, 17.0]
}

static calculate(pred, target, thrs=(0.0,), average='macro', num_classes=None)[source]

Calculate the precision, recall, f1-score and support.

Parameters
• pred (torch.Tensor | np.ndarray | Sequence) – The prediction results. It can be labels (N, ), or scores of every class (N, C).

• target (torch.Tensor | np.ndarray | Sequence) – The target of each prediction with shape (N, ).

• thrs (Sequence[float | None]) – Predictions with scores under the thresholds are considered negative. It’s only used when pred is scores. None means no thresholds. Defaults to (0., ).

• average (str | None) –

How to calculate the final metrics from the confusion matrix of every category. It supports three modes:

• ”macro”: Calculate metrics for each category, and calculate the mean value over all categories.

• ”micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.

• None: Calculate metrics of every category and output directly.

Defaults to “macro”.

• num_classes (Optional, int) – The number of classes. If the pred is label instead of scores, this argument is required. Defaults to None.

Returns

The tuple contains precision, recall and f1-score. And the type of each item is:

• torch.Tensor: If the pred is a sequence of label instead of score (number of dimensions is 1). Only returns a tensor for each metric. The shape is (1, ) if classwise is False, and (C, ) if classwise is True.

• List[torch.Tensor]: If the pred is a sequence of score (number of dimensions is 2). Return the metrics on each thrs. The shape of tensor is (1, ) if classwise is False, and (C, ) if classwise is True.

Return type

Tuple

compute_metrics(results)[source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

Dict

process(data_batch, data_samples)[source]

Process one batch of data samples.

The processed results should be stored in self.results, which will be used to computed the metrics when all batches have been processed.

Parameters
• data_batch – A batch of data from the dataloader.

• data_samples (Sequence[dict]) – A batch of outputs from the model.