Error when training ResNet101 with MindSpore on GPU

multiprocessing.context.TimeoutError

RuntimeError: mindspore/ccsrc/backend/session/kernel_build_client.h:109 Response] Response is empty

[Steps to reproduce & symptoms]

1. Modify the training dataset path in resnet101_imagenet2012_config.yaml and change the number of classes to fit the new dataset.

2. Under models/official/cv/resnet/, run the command python train.py to start training.

Answer:

This problem is most likely caused by the graph kernel fusion feature being enabled and the AKG operator compilation hanging until it times out. To investigate further, you would need to modify the network script and set save_graphs=True; a kernel_meta directory containing some relevant INFO files will then be generated locally, and sending it to us would let us pinpoint the issue. (Nothing more specific can be read from the current log.)
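As a minimal sketch of how to turn this on through the context API (where exactly the call goes in your script, and the save_graphs_path value, are just examples here):

```python
from mindspore import context

# Dump intermediate compilation artifacts so the failing kernel can be inspected.
# save_graphs_path only controls where the dumped files are written.
context.set_context(save_graphs=True, save_graphs_path="./graphs")
```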

Of course, if you just want to get the network to run, you can also try slightly modifying the set_graph_kernel_context function in train.py: set enable_graph_kernel to False, comment out the line after it, run again, and see whether training goes through.
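A rough sketch of that workaround follows; the actual body of set_graph_kernel_context in the model-zoo train.py is assumed here and may differ:

```python
from mindspore import context

def set_graph_kernel_context(run_platform, net_name):
    """Assumed shape of the helper in train.py; adjust to match the real file."""
    if run_platform == "GPU" and net_name == "resnet101":
        # Disable graph kernel fusion to bypass the AKG compilation timeout.
        context.set_context(enable_graph_kernel=False)
        # The line that originally followed (extra graph-kernel flags) should be
        # commented out, as suggested above.
```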

Enabling graph kernel fusion only potentially improves time performance; disabling it has no impact on accuracy or convergence.

MindSpore is a deep learning framework developed by Huawei, and ResNet101 is a deep convolutional neural network that can be used for tasks such as image classification and object detection. In MindSpore, a ResNet101 model for image classification can be defined as follows:

```python
import mindspore.nn as nn


class Bottleneck(nn.Cell):
    """Standard ResNet bottleneck block: 1x1 -> 3x3 -> 1x1 convolutions plus a residual connection."""
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, has_bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, pad_mode='pad', has_bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * Bottleneck.expansion, kernel_size=1, has_bias=False)
        self.bn3 = nn.BatchNorm2d(planes * Bottleneck.expansion)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def construct(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out = out + identity
        out = self.relu(out)
        return out


class ResNet101(nn.Cell):
    """ResNet-101: stage depths (3, 4, 23, 3) built from Bottleneck blocks."""

    def __init__(self, num_classes=1000):
        super(ResNet101, self).__init__()
        self.inplanes = 64  # channels entering the first residual stage
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               has_bias=False, pad_mode='pad')
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
        self.layer1 = self._make_layer(64, 3, stride=1)
        self.layer2 = self._make_layer(128, 4, stride=2)
        self.layer3 = self._make_layer(256, 23, stride=2)
        self.layer4 = self._make_layer(512, 3, stride=2)
        self.avgpool = nn.AvgPool2d(kernel_size=7, stride=1)
        self.flatten = nn.Flatten()
        self.fc = nn.Dense(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, planes, blocks, stride):
        # Project the identity branch when the spatial size or channel count changes.
        downsample = None
        if stride != 1 or self.inplanes != planes * Bottleneck.expansion:
            downsample = nn.SequentialCell([
                nn.Conv2d(self.inplanes, planes * Bottleneck.expansion,
                          kernel_size=1, stride=stride, has_bias=False),
                nn.BatchNorm2d(planes * Bottleneck.expansion)
            ])
        layers = [Bottleneck(self.inplanes, planes, stride, downsample)]
        self.inplanes = planes * Bottleneck.expansion
        for _ in range(1, blocks):
            layers.append(Bottleneck(self.inplanes, planes))
        return nn.SequentialCell(layers)

    def construct(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x
```

This code defines a network model named ResNet101 that can be trained and tested with datasets in MindSpore.
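To sanity-check the network above, a quick forward pass with random data can be run; the 224x224 input size matches the 7x7 average pooling at the end, while the batch size of 2 and num_classes=10 are just example values:

```python
import numpy as np
from mindspore import Tensor

# Build the model and push one random batch through it.
net = ResNet101(num_classes=10)
dummy = Tensor(np.random.randn(2, 3, 224, 224).astype(np.float32))
logits = net(dummy)
print(logits.shape)  # expected: (2, 10)
```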
