MultiLabelMetric¶
- class mmcls.evaluation.MultiLabelMetric(thr=None, topk=None, items=('precision', 'recall', 'f1-score'), average='macro', collect_device='cpu', prefix=None)[source]¶
A collection of precision, recall, f1-score and support for multi-label tasks.
The collection of metrics is for multi-label classification, and all these metrics are based on the confusion matrix of every category, where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives of that category, respectively.
All metrics can be formulated using the variables above:
Precision is the fraction of correct predictions in all predictions:
\[\text{Precision} = \frac{TP}{TP+FP}\]
Recall is the fraction of correct predictions in all targets:
\[\text{Recall} = \frac{TP}{TP+FN}\]
F1-score is the harmonic mean of the precision and recall:
\[\text{F1-score} = \frac{2\times\text{Recall}\times\text{Precision}}{\text{Recall}+\text{Precision}}\]
Support is the number of positive targets of each category (see the example outputs below):
\[\text{Support} = TP + FN\]
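As a quick sanity check of these formulas, the plain-PyTorch sketch below recomputes the macro numbers of the one-hot example shown in the Examples section further down; only torch is assumed, and 0/0 cases are clamped to zero, which matches the documented output.

import torch

# One-hot predictions and targets taken from the Examples section below.
y_pred = torch.tensor([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 1, 0],
                       [0, 1, 0, 0],
                       [0, 1, 0, 0]])
y_true = torch.tensor([[1, 1, 0, 0],
                       [0, 0, 1, 0],
                       [1, 1, 1, 0],
                       [1, 0, 0, 0],
                       [1, 0, 0, 0]])

# Per-category counts of the confusion matrix.
tp = ((y_pred == 1) & (y_true == 1)).sum(0).float()
fp = ((y_pred == 1) & (y_true == 0)).sum(0).float()
fn = ((y_pred == 0) & (y_true == 1)).sum(0).float()

# Apply the formulas per category; 0/0 is treated as 0 by clamping denominators.
precision = tp / (tp + fp).clamp(min=1)
recall = tp / (tp + fn).clamp(min=1)
f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-12)

# "macro" averaging: mean over categories, reported in percent.
print(100 * precision.mean())  # tensor(43.7500)
print(100 * recall.mean())     # tensor(31.2500)
print(100 * f1.mean())         # tensor(33.3333)
print((tp + fn).sum())         # support: tensor(8.)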
- Parameters
thr (float, optional) – Predictions with scores under the threshold are considered as negative. If None, the topk predictions will be considered as positive. If topk is also None, use thr=0.5 as default. Defaults to None.
topk (int, optional) – Predictions with the k-th highest scores are considered as positive. If None, use thr to determine positive predictions. If both thr and topk are not None, use thr. Defaults to None.
items (Sequence[str]) – The detailed metric items to evaluate, select from “precision”, “recall”, “f1-score” and “support”. Defaults to ('precision', 'recall', 'f1-score').
average (str | None) – How to calculate the final metrics from the confusion matrix of every category. It supports three modes:
“macro”: Calculate metrics for each category, and calculate the mean value over all categories.
“micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.
None: Calculate metrics of every category and output directly.
Defaults to “macro”.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
Examples
>>> import torch
>>> from mmcls.evaluation import MultiLabelMetric
>>> # ------ The Basic Usage for category indices labels -------
>>> y_pred = [[0], [1], [0, 1], [3]]
>>> y_true = [[0, 3], [0, 2], [1], [3]]
>>> # Output precision, recall, f1-score and support
>>> MultiLabelMetric.calculate(
...     y_pred, y_true, pred_indices=True, target_indices=True, num_classes=4)
(tensor(50.), tensor(50.), tensor(45.8333), tensor(6))
>>> # ----------- The Basic Usage for one-hot labels -----------
>>> y_pred = torch.tensor([[1, 1, 0, 0],
...                        [1, 1, 0, 0],
...                        [0, 0, 1, 0],
...                        [0, 1, 0, 0],
...                        [0, 1, 0, 0]])
>>> y_true = torch.Tensor([[1, 1, 0, 0],
...                        [0, 0, 1, 0],
...                        [1, 1, 1, 0],
...                        [1, 0, 0, 0],
...                        [1, 0, 0, 0]])
>>> MultiLabelMetric.calculate(y_pred, y_true)
(tensor(43.7500), tensor(31.2500), tensor(33.3333), tensor(8))
>>> # --------- The Basic Usage for one-hot pred scores ---------
>>> y_pred = torch.rand(y_true.size())
>>> y_pred
tensor([[0.4575, 0.7335, 0.3934, 0.2572],
        [0.1318, 0.1004, 0.8248, 0.6448],
        [0.8349, 0.6294, 0.7896, 0.2061],
        [0.4037, 0.7308, 0.6713, 0.8374],
        [0.3779, 0.4836, 0.0313, 0.0067]])
>>> # Calculate with different threshold.
>>> MultiLabelMetric.calculate(y_pred, y_true, thr=0.1)
(tensor(42.5000), tensor(75.), tensor(53.1746), tensor(8))
>>> # Calculate with topk.
>>> MultiLabelMetric.calculate(y_pred, y_true, topk=1)
(tensor(62.5000), tensor(31.2500), tensor(39.1667), tensor(8))
>>>
>>> # ------------------- Use with Evaluator -------------------
>>> from mmcls.structures import ClsDataSample
>>> from mmengine.evaluator import Evaluator
>>> data_samples = [
...     ClsDataSample().set_pred_score(pred).set_gt_score(gt)
...     for pred, gt in zip(torch.rand(1000, 5), torch.randint(0, 2, (1000, 5)))]
>>> evaluator = Evaluator(metrics=MultiLabelMetric(thr=0.5))
>>> evaluator.process(data_samples)
>>> evaluator.evaluate(1000)
{
    'multi-label/precision': 50.72898037055408,
    'multi-label/recall': 50.06836461357571,
    'multi-label/f1-score': 50.384466955258475
}
>>> # Evaluate on each class by using topk strategy
>>> evaluator = Evaluator(metrics=MultiLabelMetric(topk=1, average=None))
>>> evaluator.process(data_samples)
>>> evaluator.evaluate(1000)
{
    'multi-label/precision_top1_classwise': [48.22, 50.54, 50.99, 44.18, 52.5],
    'multi-label/recall_top1_classwise': [18.92, 19.22, 19.92, 20.0, 20.27],
    'multi-label/f1-score_top1_classwise': [27.18, 27.85, 28.65, 27.54, 29.25]
}
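The metric can also be configured declaratively. The snippet below is a sketch assuming the usual MMEngine convention of declaring evaluators in an MMClassification config file (val_evaluator / test_evaluator); the field values are illustrative only.

# Binarize scores at 0.5 and report macro precision/recall/f1-score.
val_evaluator = dict(
    type='MultiLabelMetric',
    thr=0.5,
    items=('precision', 'recall', 'f1-score'),
    average='macro',
)

# Several metric configs can be combined, e.g. macro scores plus per-class results.
test_evaluator = [
    dict(type='MultiLabelMetric', average='macro'),
    dict(type='MultiLabelMetric', topk=1, average=None),
]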
- static calculate(pred, target, pred_indices=False, target_indices=False, average='macro', thr=None, topk=None, num_classes=None)[source]¶
Calculate the precision, recall, f1-score and support.
- Parameters
pred (torch.Tensor | np.ndarray | Sequence) – The prediction results. A torch.Tensor or np.ndarray with shape (N, num_classes) or a sequence of index/onehot format labels.
target (torch.Tensor | np.ndarray | Sequence) – The target labels. A torch.Tensor or np.ndarray with shape (N, num_classes) or a sequence of index/onehot format labels.
pred_indices (bool) – Whether the pred is a sequence of category index labels. If True, num_classes must be set. Defaults to False.
target_indices (bool) – Whether the target is a sequence of category index labels. If True, num_classes must be set. Defaults to False.
average (str | None) – How to calculate the final metrics from the confusion matrix of every category. It supports three modes:
“macro”: Calculate metrics for each category, and calculate the mean value over all categories.
“micro”: Average the confusion matrix over all categories and calculate metrics on the mean confusion matrix.
None: Calculate metrics of every category and output directly.
Defaults to “macro”.
thr (float, optional) – Predictions with scores under the threshold are considered as negative. Defaults to None.
topk (int, optional) – Predictions with the k-th highest scores are considered as positive. Defaults to None.
num_classes (int, optional) – The number of classes. If the pred is indices instead of onehot, this argument is required. Defaults to None.
- Returns
The tuple contains precision, recall, f1-score and support. And the type of each item is:
torch.Tensor: A tensor for each metric. The shape is (1, ) if average is not None, and (C, ) if average is None.
- Return type
Tuple
Notes
If both thr and topk are set, use thr to determine positive predictions. If neither is set, use thr=0.5 as default.
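A small sketch of the resulting shapes with random inputs (the numeric values will vary from run to run): with average=None every returned metric keeps one entry per class, while "macro" or "micro" reduce them to a single score.

import torch
from mmcls.evaluation import MultiLabelMetric

y_pred = torch.rand(8, 3)             # scores for 8 samples, 3 classes
y_true = torch.randint(0, 2, (8, 3))  # one-hot style targets

# Per-class results: each metric has shape (num_classes, ).
precision, recall, f1, support = MultiLabelMetric.calculate(
    y_pred, y_true, thr=0.5, average=None)
assert precision.shape == recall.shape == f1.shape == (3, )

# Macro averaging collapses the class dimension to a single score per metric.
p, r, f, s = MultiLabelMetric.calculate(y_pred, y_true, thr=0.5, average='macro')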
- compute_metrics(results)[source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict
- process(data_batch, data_samples)[source]¶
Process one batch of data samples.
The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
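process and compute_metrics are normally driven by mmengine's Evaluator, as in the Examples above. The sketch below feeds the metric by hand under the assumption that, like the Evaluator, the caller passes each data sample as a plain dict (via to_dict()) and then calls evaluate(); treat it as an illustration rather than the canonical API.

import torch
from mmcls.evaluation import MultiLabelMetric
from mmcls.structures import ClsDataSample

data_samples = [
    ClsDataSample().set_pred_score(pred).set_gt_score(gt)
    for pred, gt in zip(torch.rand(100, 5), torch.randint(0, 2, (100, 5)))]

metric = MultiLabelMetric(thr=0.5)
# The Evaluator converts each data sample to a dict before calling process.
metric.process(data_batch=None, data_samples=[ds.to_dict() for ds in data_samples])
# evaluate() (inherited from mmengine's BaseMetric) collects self.results and
# calls compute_metrics, returning e.g. {'multi-label/precision': ..., ...}.
results = metric.evaluate(size=100)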