Customize Evaluation Metrics

Use metrics in MMClassification

MMClassification provides several metrics for both single-label and multi-label classification:

Single-label Classification:

Accuracy
SingleLabelMetric, including precision, recall, f1-score and support.

Multi-label Classification:

AveragePrecision, or AP (mAP).
MultiLabelMetric, including precision, recall, f1-score and support.

To use these metrics during validation and testing, modify the val_evaluator and test_evaluator fields in the config file.

Here are several examples:

Calculate top-1 and top-5 accuracy during both validation and test.

val_evaluator = dict(type='Accuracy', topk=(1, 5))
test_evaluator = val_evaluator
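
To make the topk option concrete, the following is a minimal pure-Python sketch of what top-k accuracy measures. It is not the implementation behind the Accuracy metric; the function name topk_accuracy and the toy scores are hypothetical, and the result is expressed as a percentage by assumption.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  k: int = 1) -> float:
    """Percentage of samples whose true label is among the k highest scores.

    Hypothetical helper for illustration; not part of MMClassification.
    """
    correct = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c],
                        reverse=True)
        if label in ranked[:k]:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy predictions: four samples, three classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```

A sample counts as correct for top-k as soon as its ground-truth class appears anywhere in the k best-scoring classes, so top-2 accuracy can never be lower than top-1 accuracy on the same predictions.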

Calculate top-1 accuracy, top-5 accuracy, precision and recall during both validation and test.

val_evaluator = [
  dict(type='Accuracy', topk=(1, 5)),
  dict(type='SingleLabelMetric', items=['precision', 'recall']),
]
test_evaluator = val_evaluator

Calculate mAP (mean Average Precision), CP (Class-wise mean Precision), CR (Class-wise mean Recall), CF1 (Class-wise mean F1-score), OP (Overall mean Precision), OR (Overall mean Recall) and OF1 (Overall mean F1-score) during both validation and test.

val_evaluator = [
  dict(type='AveragePrecision'),
  dict(type='MultiLabelMetric', average='macro'),  # class-wise mean
  dict(type='MultiLabelMetric', average='micro'),  # overall mean
]
test_evaluator = val_evaluator
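
The difference between the 'macro' (class-wise) and 'micro' (overall) settings can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the MultiLabelMetric implementation: the helper names are hypothetical, predictions are assumed to be already binarized, and the macro F1-score is taken as the mean of the per-class F1-scores.

```python
from typing import List, Sequence, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def multilabel_prf(preds: Sequence[Sequence[int]],
                   targets: Sequence[Sequence[int]],
                   average: str) -> Tuple[float, ...]:
    """Hypothetical helper: macro- or micro-averaged precision/recall/F1."""
    num_classes = len(targets[0])
    counts: List[Tuple[int, int, int]] = []
    for c in range(num_classes):
        tp = sum(p[c] == 1 and t[c] == 1 for p, t in zip(preds, targets))
        fp = sum(p[c] == 1 and t[c] == 0 for p, t in zip(preds, targets))
        fn = sum(p[c] == 0 and t[c] == 1 for p, t in zip(preds, targets))
        counts.append((tp, fp, fn))

    if average == 'micro':
        # Pool the counts over all classes, then compute the scores once.
        return prf(*(sum(col) for col in zip(*counts)))

    # 'macro': compute the scores per class, then take the unweighted mean.
    per_class = [prf(*c) for c in counts]
    return tuple(sum(vals) / num_classes for vals in zip(*per_class))


# Toy data: four samples, three classes, binary indicator vectors.
preds = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
targets = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

macro = multilabel_prf(preds, targets, average='macro')  # CP, CR, CF1
micro = multilabel_prf(preds, targets, average='micro')  # OP, OR, OF1
print(macro, micro)
```

Macro averaging gives every class the same weight regardless of how many samples it has, whereas micro averaging weights every individual label decision equally, so frequent classes dominate the overall scores.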

Add new metrics

MMClassification lets you implement customized evaluation metrics when the built-in ones do not cover your needs.

Create a new file under mmcls/evaluation/metrics, for example mmcls/evaluation/metrics/my_metric.py, and in it define a customized evaluation metric class MyMetric that inherits from BaseMetric in MMEngine.

The class must override two methods: process, which handles the data format of each batch, and compute_metrics, which calculates the metric. Register the class in the METRICS registry so that it can be referenced from config files.

from typing import Dict, List, Sequence

from mmengine.evaluator import BaseMetric
from mmcls.registry import METRICS


@METRICS.register_module()
class MyMetric(BaseMetric):

    def process(self, data_batch: Sequence[Dict], data_samples: Sequence[Dict]):
        """Process one batch of data and model outputs.

        The processed results should be stored in ``self.results``, which will
        be used to compute the metrics when all batches have been processed.
        ``data_batch`` stores the batch data from the dataloader,
        and ``data_samples`` stores the batch outputs from the model.
        """
        ...

    def compute_metrics(self, results: List):
        """Compute the metrics from the processed results and return the evaluation results."""
        ...
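
As an illustration of how process and compute_metrics cooperate, here is a minimal self-contained sketch. To stay runnable without MMEngine installed it uses a hypothetical StandInBaseMetric instead of the real BaseMetric; the ErrorRate metric and the 'pred' and 'gt' keys are likewise assumptions made for this example. The real BaseMetric does more, for instance collecting results across processes, so treat this only as a picture of the data flow.

```python
from typing import Dict, List, Sequence


class StandInBaseMetric:
    """Hypothetical stand-in for MMEngine's BaseMetric (illustration only).

    It keeps just what this sketch needs: a ``results`` list that
    ``process`` appends to, and an ``evaluate`` method that passes the
    collected results to ``compute_metrics``.
    """

    def __init__(self) -> None:
        self.results: List = []

    def evaluate(self) -> Dict[str, float]:
        metrics = self.compute_metrics(self.results)
        self.results.clear()  # start fresh for the next evaluation round
        return metrics


class ErrorRate(StandInBaseMetric):
    """Hypothetical metric: percentage of misclassified samples."""

    def process(self, data_batch, data_samples: Sequence[Dict]) -> None:
        # Keep only what compute_metrics needs for each sample.
        # The 'pred' and 'gt' keys are assumptions of this sketch.
        for sample in data_samples:
            self.results.append(dict(pred=sample['pred'], gt=sample['gt']))

    def compute_metrics(self, results: List) -> Dict[str, float]:
        wrong = sum(r['pred'] != r['gt'] for r in results)
        return {'error_rate': 100.0 * wrong / len(results)}


metric = ErrorRate()
# Two batches of two samples each; data_batch is unused in this sketch.
metric.process(None, [dict(pred=0, gt=0), dict(pred=2, gt=1)])
metric.process(None, [dict(pred=1, gt=1), dict(pred=1, gt=1)])
summary = metric.evaluate()
print(summary)
```

Storing only small per-sample records in process, rather than whole batches, keeps memory use low and leaves all aggregation to compute_metrics, which runs once after every batch has been seen.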

Then, import it in mmcls/evaluation/metrics/__init__.py to add it to the mmcls.evaluation package.

# In mmcls/evaluation/metrics/__init__.py
...
from .my_metric import MyMetric

__all__ = [..., 'MyMetric']

Finally, use MyMetric in the val_evaluator and test_evaluator fields of the config file.

val_evaluator = dict(type='MyMetric', ...)
test_evaluator = val_evaluator

Note

More details can be found in the MMEngine documentation: Evaluation.