torch_npu Inference Configuration on Ascend Devices

1. NPU inference approach for the Ascend 310B1

  When running PyTorch inference on the Ascend 310B1 NPU, torch_npu is used to replace the original GPU/CUDA operations.

  torch_npu technical reference: pytorch: Ascend Extension for PyTorch
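
  A minimal sketch of the device swap (the tiny model and input here are illustrative placeholders; torch.npu becomes available once torch_npu is imported):

    import torch
    import torch_npu  # registers the "npu" device type with PyTorch

    device = torch.device("npu:0" if torch.npu.is_available() else "cpu")
    model = torch.nn.Linear(16, 4).to(device).eval()         # stand-in model
    x = torch.randn(2, 16, dtype=torch.float32).to(device)   # instead of .to("cuda")
    with torch.no_grad():
        out = model(x)
    print(out.device)  # npu:0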


2. Problems that may arise during inference, and their solutions

1. The NPU does not support the double data type

  • The error log shows:

    Warning: Device do not support double dtype now, dtype cast repalce with float.

       This indicates that the NPU (Ascend 310B1) does not support the double (i.e. float64) data type, so the framework automatically casts it to float (i.e. float32).

  • Solution
    Make sure every tensor uses float32 rather than double. The dtype can be specified explicitly when creating tensors:

    device = torch.device("npu:0")  # NPU device (requires torch_npu to be imported)
    x = torch.randn(2, 3, dtype=torch.float32).to(device)
    y = torch.randn(2, 3, dtype=torch.float32).to(device)
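
    If a model or tensor has already ended up in double precision, the standard PyTorch casts bring everything back to float32 (a minimal sketch with a placeholder model):

    import torch

    model = torch.nn.Linear(3, 3).double()  # a model that ended up in float64
    model = model.float()                   # cast all parameters and buffers to float32
    x = torch.randn(2, 3).double().float()  # cast an existing double tensor back down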

2. Environment variables not configured correctly

  • The error log shows:

    ImportError: libhccl.so: cannot open shared object file: No such file or directory

       A missing libhccl.so is generally an environment-variable problem: the environment setup command must be run before executing any code that imports torch_npu.

  • Solution
      Before running the actual Python file, run the environment setup command in the shell.
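
      A typical setup command (an assumption based on the default CANN toolkit install location, which matches the path visible in the compile log further below; adjust it to your installation):

    source /usr/local/Ascend/ascend-toolkit/set_env.sh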

3. The NPU does not support certain operators

  • The error log shows:

    RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is MaxPoolWithArgmaxV1.

       The likely cause is that the MaxPoolWithArgmaxV1 operator is not supported by the current CANN toolkit version.

  • Solution
      If an operator is not supported, try replacing it with a functionally similar one. Here MaxPool2d (which maps to MaxPoolWithArgmaxV1 on the NPU) is unsupported, so AvgPool2d can be tried in its place, as in the sketch below.
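
      A minimal sketch of the swap (the pooling parameters are the ones used in module 1 of the ResNet structure later in this post; whether average pooling is an acceptable substitute depends on the model's accuracy requirements):

    import torch.nn as nn

    # nn.MaxPool2d lowers to MaxPoolWithArgmaxV1 on the NPU and fails here:
    # pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    # Functionally similar replacement that avoids the unsupported operator:
    pool = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)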

  • The error log shows:

    [W compiler_depend.ts:387] Warning: E40021: Failed to compile Op [DropOutDoMask]. (oppath: [Compile /usr/local/Ascend/ascend-toolkit/7.0.RC1/opp/built-in/op_impl/ai_core/tbe/impl/drop_out_do_mask.py failed with errormsg/stack: File "/root/miniconda3/envs/mindspore_py39/lib/python3.9/site-packages/tbe/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
        raise get_last_ffi_error()
    

       The likely cause is that the DropOutDoMask operator is not supported by the current CANN toolkit version.

  • Solution
      For a relatively simple operator like dropout, you can implement it by hand:

    import torch
    import torch.nn as nn

    # Custom DropoutLayer that avoids the unsupported DropOutDoMask operator
    class DropoutLayer(nn.Module):
        def __init__(self, p=0.5):
            super().__init__()
            self.p = p

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training:  # apply dropout only in training mode
                # Build the mask from uniform random numbers: positions above p are kept
                mask = (torch.rand_like(x) > self.p).float()
                # Zero the dropped elements and rescale the survivors (inverted dropout)
                return x * mask / (1 - self.p)
            return x  # in eval mode, return the input unchanged
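
      Swapping this in for nn.Dropout is then a one-line change in the model definition, and in eval mode it is a no-op (continuing from the class above):

    drop = DropoutLayer(p=0.5)
    drop.eval()                   # inference mode: input passes through unchanged
    y = drop(torch.randn(2, 8))   # no DropOutDoMask kernel is ever invoked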

3. Actual CPU vs. NPU performance
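
The per-module latencies below were collected by timing each top-level module separately. A rough sketch of one way to take such measurements follows; the helper name time_modules, the warm-up and repeat counts, and the assumption that the model is decomposed into sequential child modules (including any flatten step) are all illustrative, not the exact script behind these tables:

    import time
    import torch
    import torch_npu  # registers the "npu" device type with PyTorch

    def time_modules(model, x, device, warmup=5, runs=20):
        """Average forward latency (ms) of each top-level child module."""
        model = model.to(device).eval()
        x = x.to(device)
        results = []
        with torch.no_grad():
            for name, module in model.named_children():
                for _ in range(warmup):       # exclude one-time compile/cache costs
                    module(x)
                if device.type == "npu":
                    torch.npu.synchronize()   # NPU ops are asynchronous; drain first
                start = time.time()
                for _ in range(runs):
                    module(x)
                if device.type == "npu":
                    torch.npu.synchronize()
                results.append((name, (time.time() - start) / runs * 1000.0))
                x = module(x)                 # feed the output to the next module
        return results

    # e.g. time_modules(model, torch.randn(1, 3, 224, 224), torch.device("npu:0"))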

1. AlexNet

The model structure is as follows:

------------------------------------------------------------------
1-Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(1, 1)) 

------------------------------------------------------------------
2-Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)) 

------------------------------------------------------------------
3-Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) 

------------------------------------------------------------------
4-Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) 

------------------------------------------------------------------
5-Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) 

------------------------------------------------------------------
6-Linear(in_features=9216, out_features=4096, bias=True) 

------------------------------------------------------------------
7-Linear(in_features=4096, out_features=4096, bias=True) 

------------------------------------------------------------------
8-Linear(in_features=4096, out_features=1000, bias=True) 
Device\Module      1        2        3        4        5         6         7        8     Total (ms)
CPU              5.349    8.966    5.186    5.349    4.071   186.742    81.339    9.404     306.406
NPU              0.456    0.922    3.265    4.786    2.954    13.936     7.239    1.836      35.394

2. LeNet

The model structure is as follows:

------------------------------------------------------------------
1-Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)) 

------------------------------------------------------------------
2-Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1)) 

------------------------------------------------------------------
3-Linear(in_features=46656, out_features=1024, bias=True) 

------------------------------------------------------------------
4-Linear(in_features=1024, out_features=1024, bias=True) 

------------------------------------------------------------------
5-Linear(in_features=1024, out_features=1000, bias=True) 
Device\Module      1        2         3        4        5     Total (ms)
CPU              6.490    4.001   226.313    2.632    1.814     241.250
NPU              0.434    0.245    11.871    0.446    0.416      13.412

3. ResNet

The model structure is as follows. The block layout (3, 4, 6, 3 BasicBlocks) matches ResNet-34, and note that the initial max pooling layer in module 1 has been replaced with AvgPool2d, consistent with the operator workaround described above:

------------------------------------------------------------------
1-Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): AvgPool2d(kernel_size=3, stride=2, padding=1)
) 

------------------------------------------------------------------
2-BasicBlock(
  (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
3-BasicBlock(
  (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
4-BasicBlock(
  (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
5-BasicBlock(
  (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (downsample): Sequential(
    (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
) 

------------------------------------------------------------------
6-BasicBlock(
  (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
7-BasicBlock(
  (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
8-BasicBlock(
  (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
9-BasicBlock(
  (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (downsample): Sequential(
    (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
) 

------------------------------------------------------------------
10-BasicBlock(
  (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
11-BasicBlock(
  (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
12-BasicBlock(
  (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
13-BasicBlock(
  (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
14-BasicBlock(
  (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
15-BasicBlock(
  (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (downsample): Sequential(
    (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
) 

------------------------------------------------------------------
16-BasicBlock(
  (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
17-BasicBlock(
  (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
  (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
) 

------------------------------------------------------------------
18-Sequential(
  (0): AdaptiveAvgPool2d(output_size=(1, 1))
  (1): Flatten(start_dim=1, end_dim=-1)
  (2): Linear(in_features=512, out_features=1000, bias=True)
) 



Module    CPU (ms)    NPU (ms)
1           16.252       3.313
2           16.860       1.062
3           16.917       0.993
4           17.021       0.979
5           16.515       1.633
6           15.840       1.520
7           15.983       1.499
8           16.053       1.509
9           22.174       4.218
10          24.503       5.088
11          26.659       5.087
12          24.864       5.074
13          25.237       5.075
14          25.101       5.087
15          63.421      16.119
16          57.389      21.140
17          57.276      21.282
18           1.259       0.364
Total      458.324     101.042

End to end, the NPU comes out well ahead on all three models: roughly 8.7x faster for AlexNet (306.4 ms vs 35.4 ms), 18x for LeNet (241.3 ms vs 13.4 ms), and 4.5x for ResNet (458.3 ms vs 101.0 ms). The large fully connected layers dominate the CPU time (186.7 ms for AlexNet's first Linear layer, 226.3 ms for LeNet's) and shrink the most on the NPU.
