MultiLabelMetric

class mmcls.evaluation.MultiLabelMetric(thr=None, topk=None, items=('precision', 'recall', 'f1-score'), average='macro', collect_device='cpu', prefix=None)[source]

A collection of precision, recall, f1-score and support for multi-label tasks.

These metrics are for multi-label classification, and all of them are based on the confusion matrix of every category:

../../_images/confusion-matrix.png

All metrics can be formulated using the variables above:

Precision is the fraction of correct predictions in all predictions:

\[\text{Precision} = \frac{TP}{TP+FP}\]

Recall is the fraction of correct predictions in all targets:

\[\text{Recall} = \frac{TP}{TP+FN}\]

F1-score is the harmonic mean of the precision and recall:

\[\text{F1-score} = \frac{2\times\text{Recall}\times\text{Precision}}{\text{Recall}+\text{Precision}}\]

Support is the number of samples:

\[\text{Support} = TP + TN + FN + FP\]
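The precision, recall and F1-score formulas above can be sketched in plain Python as follows. This is a minimal illustration, not the library's implementation: the helper name `precision_recall_f1` is hypothetical, and returning 0.0 for an empty denominator is an assumption of the sketch. Values are returned as fractions, whereas the documented examples report percentages.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) as fractions in [0, 1].

    Empty denominators yield 0.0; this zero-division convention is an
    assumption of the sketch, not something the formulas specify.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# One category with 1 true positive, 1 false positive and 3 false negatives.
p, r, f1 = precision_recall_f1(tp=1, fp=1, fn=3)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# precision=0.5000 recall=0.2500 f1=0.3333
```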
Parameters
  • thr (float, optional) – Predictions with scores under the threshold are considered as negative. If None, the topk predictions will be considered as positive. If the topk is also None, use thr=0.5 as default. Defaults to None.

  • topk (int, optional) – Predictions with the k highest scores are considered as positive. If None, use thr to determine positive predictions. If both thr and topk are not None, use thr. Defaults to None.

  • items (Sequence[str]) – The detailed metric items to evaluate, select from “precision”, “recall”, “f1-score” and “support”. Defaults to ('precision', 'recall', 'f1-score').

  • average (str | None) –

    How to calculate the final metrics from the confusion matrix of every category. It supports three modes:

    • “macro”: Calculate metrics for each category, and calculate the mean value over all categories.

    • “micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.

    • None: Calculate metrics of every category and output directly.

    Defaults to “macro”.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
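How thr and topk interact can be sketched for a single sample's score vector as below. The function name `select_positives` is hypothetical, and two details are assumptions of the sketch: scores equal to the threshold count as positive, and ties in the top-k are broken by the lower index.

```python
def select_positives(scores, thr=None, topk=None):
    """Return a 0/1 list marking which class scores count as positive."""
    if thr is None and topk is None:
        thr = 0.5  # documented default when neither argument is given
    if thr is not None:
        # thr takes precedence over topk when both are set.
        return [1 if s >= thr else 0 for s in scores]
    # Otherwise keep the k highest-scoring classes.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:topk])
    return [1 if i in top else 0 for i in range(len(scores))]


scores = [0.4575, 0.7335, 0.3934, 0.2572]
print(select_positives(scores))           # [0, 1, 0, 0]
print(select_positives(scores, thr=0.1))  # [1, 1, 1, 1]
print(select_positives(scores, topk=1))   # [0, 1, 0, 0]
```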

Examples

>>> import torch
>>> from mmcls.evaluation import MultiLabelMetric
>>> # ------ The Basic Usage for category indices labels -------
>>> y_pred = [[0], [1], [0, 1], [3]]
>>> y_true = [[0, 3], [0, 2], [1], [3]]
>>> # Output precision, recall, f1-score and support
>>> MultiLabelMetric.calculate(
...     y_pred, y_true, pred_indices=True, target_indices=True, num_classes=4)
(tensor(50.), tensor(50.), tensor(45.8333), tensor(6))
>>> # ----------- The Basic Usage for one-hot labels -----------
>>> y_pred = torch.tensor([[1, 1, 0, 0],
...                        [1, 1, 0, 0],
...                        [0, 0, 1, 0],
...                        [0, 1, 0, 0],
...                        [0, 1, 0, 0]])
>>> y_true = torch.Tensor([[1, 1, 0, 0],
...                        [0, 0, 1, 0],
...                        [1, 1, 1, 0],
...                        [1, 0, 0, 0],
...                        [1, 0, 0, 0]])
>>> MultiLabelMetric.calculate(y_pred, y_true)
(tensor(43.7500), tensor(31.2500), tensor(33.3333), tensor(8))
>>> # --------- The Basic Usage for one-hot pred scores ---------
>>> y_pred = torch.rand(y_true.size())
>>> y_pred
tensor([[0.4575, 0.7335, 0.3934, 0.2572],
        [0.1318, 0.1004, 0.8248, 0.6448],
        [0.8349, 0.6294, 0.7896, 0.2061],
        [0.4037, 0.7308, 0.6713, 0.8374],
        [0.3779, 0.4836, 0.0313, 0.0067]])
>>> # Calculate with different threshold.
>>> MultiLabelMetric.calculate(y_pred, y_true, thr=0.1)
(tensor(42.5000), tensor(75.), tensor(53.1746), tensor(8))
>>> # Calculate with topk.
>>> MultiLabelMetric.calculate(y_pred, y_true, topk=1)
(tensor(62.5000), tensor(31.2500), tensor(39.1667), tensor(8))
>>>
>>> # ------------------- Use with Evaluator -------------------
>>> from mmcls.structures import ClsDataSample
>>> from mmengine.evaluator import Evaluator
>>> data_sampels = [
...     ClsDataSample().set_pred_score(pred).set_gt_score(gt)
...     for pred, gt in zip(torch.rand(1000, 5), torch.randint(0, 2, (1000, 5)))]
>>> evaluator = Evaluator(metrics=MultiLabelMetric(thr=0.5))
>>> evaluator.process(data_sampels)
>>> evaluator.evaluate(1000)
{
    'multi-label/precision': 50.72898037055408,
    'multi-label/recall': 50.06836461357571,
    'multi-label/f1-score': 50.384466955258475
}
>>> # Evaluate on each class by using topk strategy
>>> evaluator = Evaluator(metrics=MultiLabelMetric(topk=1, average=None))
>>> evaluator.process(data_sampels)
>>> evaluator.evaluate(1000)
{
    'multi-label/precision_top1_classwise': [48.22, 50.54, 50.99, 44.18, 52.5],
    'multi-label/recall_top1_classwise': [18.92, 19.22, 19.92, 20.0, 20.27],
    'multi-label/f1-score_top1_classwise': [27.18, 27.85, 28.65, 27.54, 29.25]
}
static calculate(pred, target, pred_indices=False, target_indices=False, average='macro', thr=None, topk=None, num_classes=None)[source]

Calculate the precision, recall, f1-score and support.

Parameters
  • pred (torch.Tensor | np.ndarray | Sequence) – The prediction results. A torch.Tensor or np.ndarray with shape (N, num_classes) or a sequence of index/onehot format labels.

  • target (torch.Tensor | np.ndarray | Sequence) – The ground-truth labels. A torch.Tensor or np.ndarray with shape (N, num_classes) or a sequence of index/onehot format labels.

  • pred_indices (bool) – Whether the pred is a sequence of category index labels. If True, num_classes must be set. Defaults to False.

  • target_indices (bool) – Whether the target is a sequence of category index labels. If True, num_classes must be set. Defaults to False.

  • average (str | None) –

    How to calculate the final metrics from the confusion matrix of every category. It supports three modes:

    • “macro”: Calculate metrics for each category, and calculate the mean value over all categories.

    • “micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.

    • None: Calculate metrics of every category and output directly.

    Defaults to “macro”.

  • thr (float, optional) – Predictions with scores under the threshold are considered as negative. Defaults to None.

  • topk (int, optional) – Predictions with the k highest scores are considered as positive. Defaults to None.

  • num_classes (int, optional) – The number of classes. Required when pred or target is given as category indices rather than one-hot labels. Defaults to None.
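The reason num_classes is needed for index-format labels can be seen by converting them to one-hot rows by hand: the largest index that happens to appear does not reveal how many categories exist. The helper `indices_to_onehot` below is a hypothetical illustration, not the library's own conversion routine.

```python
def indices_to_onehot(index_labels, num_classes):
    """Convert per-sample lists of category indices to one-hot rows."""
    onehot = []
    for indices in index_labels:
        row = [0] * num_classes
        for idx in indices:
            row[idx] = 1
        onehot.append(row)
    return onehot


# Same index-format predictions as in the class-level example.
y_pred = [[0], [1], [0, 1], [3]]
print(indices_to_onehot(y_pred, num_classes=4))
# [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
```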

Returns

A tuple of precision, recall, f1-score and support, matching the four values returned in the examples above. The type of each item is:

  • torch.Tensor: A tensor for each metric. The shape is (1, ) if average is not None, and (C, ) if average is None.
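The effect of the average argument on the output can be sketched for precision alone, using plain lists instead of tensors. The helper names are hypothetical, zero denominators are mapped to 0.0 by assumption, and the micro mode sums the per-category counts (summing and averaging give the same ratio). The data is the one-hot example from the class documentation, whose documented macro precision is 43.75.

```python
def confusion_counts(pred, target):
    """Per-category (tp, fp, fn) tuples from one-hot rows of equal shape."""
    counts = []
    for c in range(len(pred[0])):
        pairs = list(zip(pred, target))
        tp = sum(1 for p, t in pairs if p[c] == 1 and t[c] == 1)
        fp = sum(1 for p, t in pairs if p[c] == 1 and t[c] == 0)
        fn = sum(1 for p, t in pairs if p[c] == 0 and t[c] == 1)
        counts.append((tp, fp, fn))
    return counts


def safe_div(num, den):
    return num / den if den else 0.0


def precision_by_mode(pred, target, average="macro"):
    """Precision in percent: a float for macro/micro, a list for None."""
    counts = confusion_counts(pred, target)
    per_class = [100 * safe_div(tp, tp + fp) for tp, fp, _ in counts]
    if average is None:
        return per_class  # one value per category
    if average == "macro":
        return sum(per_class) / len(per_class)
    if average == "micro":
        tp = sum(c[0] for c in counts)
        fp = sum(c[1] for c in counts)
        return 100 * safe_div(tp, tp + fp)
    raise ValueError(f"unsupported average mode: {average!r}")


pred = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
target = [[1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(precision_by_mode(pred, target, average=None))     # [50.0, 25.0, 100.0, 0.0]
print(precision_by_mode(pred, target, average="macro"))  # 43.75
print(precision_by_mode(pred, target, average="micro"))
```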

Return type

Tuple

Notes

If both thr and topk are set, use thr to determine positive predictions. If neither is set, use thr=0.5 as default.

compute_metrics(results)[source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

Dict

process(data_batch, data_samples)[source]

Process one batch of data samples.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of outputs from the model.
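The division of labour between process and compute_metrics can be sketched with a toy class. `ToyMetric` and the dictionary keys "pred" and "gt" are hypothetical and chosen only for illustration; the sketch mimics the accumulate-then-compute flow described above and does not depend on mmengine or reproduce the real data sample layout.

```python
class ToyMetric:
    """Hypothetical stand-in showing the accumulate-then-compute pattern."""

    def __init__(self):
        self.results = []

    def process(self, data_batch, data_samples):
        # Keep only what the final computation needs from each sample.
        for sample in data_samples:
            self.results.append({"pred": sample["pred"], "gt": sample["gt"]})

    def compute_metrics(self, results):
        # Called once, after every batch has been processed.
        tp = fp = 0
        for r in results:
            for p, g in zip(r["pred"], r["gt"]):
                tp += int(p == 1 and g == 1)
                fp += int(p == 1 and g == 0)
        precision = 100 * tp / (tp + fp) if (tp + fp) else 0.0
        return {"precision": precision}


metric = ToyMetric()
metric.process(None, [{"pred": [1, 1, 0], "gt": [1, 0, 0]}])
metric.process(None, [{"pred": [0, 1, 1], "gt": [0, 1, 1]}])
print(metric.compute_metrics(metric.results))
# {'precision': 75.0}
```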
