> **Note**
>
> You are reading the documentation for MMClassification 0.x, which will be deprecated at the end of 2022. We recommend you upgrade to MMClassification 1.0 to enjoy the fruitful new features and better performance brought by OpenMMLab 2.0. Check the installation tutorial, migration tutorial and changelog for more details.

# NPU (HUAWEI Ascend)

## Usage

### General Usage

Please install MMCV with NPU device support according to the tutorial.

To train the model with 8 NPUs, run the following command:

```shell
bash ./tools/dist_train.sh configs/resnet/resnet50_8xb32_in1k.py 8 --device npu
```


You can also train the model with a single NPU:

```shell
python ./tools/train.py configs/resnet/resnet50_8xb32_in1k.py --device npu
```
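The `--device npu` flag requires the Ascend PyTorch adapter (`torch_npu`) to be installed alongside PyTorch. Below is a minimal sketch of a pre-launch check that falls back to CPU when no NPU is usable; `pick_device` is a hypothetical helper written for illustration, and it assumes that importing `torch_npu` registers the `torch.npu` namespace, as the Ascend adapter does:

```python
def pick_device() -> str:
    """Return "npu" when an Ascend device is usable, otherwise "cpu"."""
    try:
        import torch_npu  # noqa: F401  # Ascend adapter (assumption: registers torch.npu)
        import torch

        if torch.npu.is_available():
            return "npu"
    except (ImportError, AttributeError):
        # torch_npu (or torch itself) is not installed, or no NPU namespace
        pass
    return "cpu"


print(pick_device())
```

On a machine without the Ascend toolchain this simply prints `cpu`, so the same script can be shared across heterogeneous hosts.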


### High-performance Usage on ARM server

Because ARM CPUs handle resource preemption less efficiently than x86 CPUs during multi-card training, we provide a high-performance startup script to accelerate training:

```shell
# Example for 8 NPUs on a single machine
bash tools/dist_train_arm.sh configs/resnet/resnet50_8xb32_in1k.py 8 --device npu --cfg-options data.workers_per_gpu=$(($(nproc)/8))
```
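The `$(($(nproc)/8))` expression divides the machine's CPU count evenly among the 8 training processes, so the dataloader workers do not oversubscribe the ARM cores. A small illustration of the arithmetic (the core count of 192 is an assumed figure for illustration, not a measured value):

```shell
# Suppose nproc reports 192 cores on the server (assumed figure);
# each of the 8 training processes then gets 192 / 8 = 24 workers.
ncpus=192
nprocs=8
workers_per_gpu=$(( ncpus / nprocs ))
echo "data.workers_per_gpu=${workers_per_gpu}"
```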


For ResNet-50 training on 8 NPUs with batch size (`data.samples_per_gpu`) = 512, the performance data is shown below:

| CPU | Start Script | IterTime(s) |
| --- | --- | --- |
| ARM (Kunpeng920 *4) | `./tools/dist_train.sh` | ~0.9 (0.85-1.0) |
| ARM (Kunpeng920 *4) | `./tools/dist_train_arm.sh` | ~0.8 (0.78-0.85) |
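Taking the midpoints reported above, the ARM launch script cuts the per-iteration time from roughly 0.9 s to 0.8 s, which is about an 11% speedup. A quick check of that arithmetic:

```python
# Approximate mean iteration times from the table above (seconds).
baseline = 0.9   # ./tools/dist_train.sh
arm_tuned = 0.8  # ./tools/dist_train_arm.sh

speedup_pct = (baseline - arm_tuned) / baseline * 100
print(f"~{speedup_pct:.0f}% faster per iteration")
```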

## Models Results

| Model | Top-1 (%) | Top-5 (%) | Config | Download |
| --- | --- | --- | --- | --- |
| ResNet-50 | 76.38 | 93.22 | config | model \| log |
| ResNeXt-50-32x4d | 77.55 | 93.75 | config | model \| log |
| HRNet-W18 | 77.01 | 93.46 | config | model \| log |
| ResNetV1D-152 | 79.11 | 94.54 | config | model \| log |
| SE-ResNet-50 | 77.64 | 93.76 | config | model \| log |
| VGG-11 | 68.92 | 88.83 | config | model \| log |
| ShuffleNetV2 1.0x | 69.53 | 88.82 | config | model \| log |
| MobileNetV2 | 71.758 | 90.394 | config | model \| log |
| MobileNetV3-Small | 67.522 | 87.316 | config | model \| log |
| *CSPResNeXt50 | 77.10 | 93.55 | config | model \| log |
| *EfficientNet-B4 (AA + AdvProp) | 75.55 | 92.86 | config | model \| log |
| **DenseNet121 | 72.62 | 91.04 | config | model \| log |

Notes:

• Unless otherwise marked, the results on the NPU are nearly identical to the FP32 results on the GPU.

• (*) These models score lower than the results in their corresponding READMEs, mainly because the README results were obtained by directly evaluating timm weights, whereas the results here come from retraining with mmcls using the corresponding config. Training the same config on the GPU yields results consistent with the NPU results.

• (**) The accuracy of this model is slightly lower because the config targets 4 cards while we ran it on 8; users can adjust the hyperparameters to get the best accuracy.

All above models are provided by Huawei Ascend group.