What does "Python cannot initialize device" mean? A roundup of common PyTorch problems

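This article summarizes problems you may run into when moving from Keras to PyTorch: handling a NaN loss, timing a model correctly, initializing parameters, getting the output of layers in a pretrained model, and fixing several errors that show up in multi-process training. It is meant to help PyTorch newcomers understand and solve problems that come up in practice.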

I recently started switching from Keras to PyTorch. Here is a collection of common problems I ran into along the way.

1 Loss is NaN

You can turn on NaN checking by calling the following function at the top of your Python file:

Python

torch.autograd.set_detect_anomaly(True)


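With anomaly detection on, autograd raises an exception when a backward function produces NaN values. If you are lucky, the message tells you where the NaN came from. For example, I once got: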

PowerShell

RuntimeError: Function 'SmoothL1LossBackward' returned nan values in its 0th output.


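Sometimes training with Adam produces no NaN while SGD does. This usually means the learning rate is too large, so try lowering it.

Other things worth checking when tracking down the source of the NaN:

1. Dirty data: the input itself contains NaN.

2. Gradient clipping: enable it if it is not set.

3. Parameter initialization: try a different initialization method.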
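The first two checks in the list above can be sketched as follows. This is only a minimal illustration under stated assumptions: the helper name `check_finite`, the toy linear model, the learning rate, and `max_norm=1.0` are all made up for the example and would need to be adapted to a real training loop.

```python
import torch
import torch.nn as nn

def check_finite(name, tensor):
    """Raise early if a tensor contains NaN or Inf (hypothetical helper)."""
    if not torch.isfinite(tensor).all():
        raise ValueError('{} contains NaN or Inf values'.format(name))

torch.manual_seed(0)
model = nn.Linear(4, 1)                      # toy model standing in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.SmoothL1Loss()

inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# 1. Dirty data: validate the batch before it reaches the model.
check_finite('inputs', inputs)
check_finite('targets', targets)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# 2. Gradient clipping: rescale gradients so their total norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

check_finite('loss', loss)
```

Checking the batch before the forward pass makes it easier to tell bad data apart from a numerical problem inside the network.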

2 Timing a model correctly

To measure the running time of the model's forward pass, first put the model in evaluation mode:

Python

model.eval()


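When timing on a GPU, also call torch.cuda.synchronize() before reading the clock. CUDA operations run asynchronously, so without synchronization the timer may stop before the GPU has finished its work: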

Python

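import time
import torch

torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
start = time.time()
result = model(input)
torch.cuda.synchronize()  # wait for the forward pass to finish on the GPU
end = time.time()
print('forward time: {:.4f} s'.format(end - start))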
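A slightly fuller sketch of the same measurement is shown here. It is not from the original post: the helper `time_forward`, the warm-up count, the number of runs, and the tiny convolutional model are assumptions for illustration. It averages over several runs and only synchronizes when the input actually lives on a GPU, so it also runs on a CPU-only machine.

```python
import time
import torch
import torch.nn as nn

def time_forward(model, example, warmup=3, runs=10):
    """Return the average forward time in seconds (hypothetical helper)."""
    use_cuda = example.is_cuda
    model.eval()
    with torch.no_grad():                 # no autograd bookkeeping while timing
        for _ in range(warmup):           # warm-up runs are not timed
            model(example)
        if use_cuda:
            torch.cuda.synchronize()      # drain pending GPU work before starting
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
        if use_cuda:
            torch.cuda.synchronize()      # wait until the timed kernels finish
        end = time.perf_counter()
    return (end - start) / runs

device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).to(device)
x = torch.randn(1, 3, 32, 32, device=device)
avg = time_forward(net, x)
print('average forward time: {:.6f} s'.format(avg))
```

The warm-up runs keep one-time costs such as memory allocation out of the measurement.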


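3 Parameter initialization

In some tasks you train a blank network from scratch instead of starting from existing trained parameters. In that case, initializing the parameters (for example those of nn.Conv2d layers) helps the model converge faster. One such initialization scheme is shown below; it can usually go at the end of the model's __init__ function: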

Python

# weight initialization
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, 0, 0.01)
        nn.init.zeros_(m.bias)
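To show where that loop sits in a full module, here is a minimal sketch. The class `TinyNet`, its layer sizes, and the method name `_init_weights` are hypothetical; only the initialization loop itself comes from the snippet above.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Small example network (hypothetical) that initializes itself."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(8, num_classes)
        self._init_weights()              # run at the end of __init__

    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = x.mean([2, 3])                # global average pooling
        return self.fc(x)

net = TinyNet()
out = net(torch.randn(2, 3, 16, 16))
```

Calling the initializer last in `__init__` ensures every submodule already exists when `self.modules()` is traversed.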


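4 Getting the output of a layer in torchvision

In engineering practice it is common to take a torchvision model with pretrained parameters and then extract or modify some of its layers. There are two ways to do this.

The first is to copy the full model code and return the intermediate layers you need:

For example, the shufflenetv2 code can be modified as follows to return the layers you need (_forward_impl is the original, _forward_impl_with_layers is the modified version):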

Python

def _forward_impl(self, x):
    # See note [TorchScript super()]
    x = self.conv1(x)
    x = self.maxpool(x)
    x = self.stage2(x)
    x = self.stage3(x)
    x = self.stage4(x)
    x = self.conv5(x)
    x = x.mean([2, 3])  # globalpool
    x = self.fc(x)
    return x

def _forward_impl_with_layers(self, x):
    # See note [TorchScript super()]
    layer1 = self.conv1(x)
    layer2 = self.maxpool(layer1)
    layer3 = self.stage2(layer2)
    layer4 = self.stage3(layer3)
    layer5 = self.stage4(layer4)
    x = self.conv5(layer5)
    x = x.mean([2, 3])  # globalpool
    x = self.fc(x)
    return layer1, layer2, layer3, layer4, layer5, x

def forward(self, x):
    return self._forward_impl_with_layers(x)


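The other way is to use the layers of the torchvision model directly, without copying the code. You may need to study each model's implementation to find the layer you want, for example by printing the model: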

Python

from torchvision import models

# pretrained=True downloads the ImageNet weights on first use
model = models.shufflenet_v2_x0_5(pretrained=True)
print('model = {}'.format(model))


The output looks like this:

PowerShell

model = ShuffleNetV2(

(conv1): Sequential(

(0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)

(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

)

(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)

(stage2): Sequential(

(0): InvertedResidual(

(branch1): Sequential(

(0): Conv2d(24, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=24, bias=False)

(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(4): ReLU(inplace=True)

)

(branch2): Sequential(

(0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(24, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=24, bias=False)

(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(1): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)

(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(2): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)

(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(3): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=24, bias=False)

(4): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

)

(stage3): Sequential(

(0): InvertedResidual(

(branch1): Sequential(

(0): Conv2d(48, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=48, bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(3): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(4): ReLU(inplace=True)

)

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(1): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(2): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(3): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(4): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(5): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(6): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(7): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=48, bias=False)

(4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(48, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

)

(stage4): Sequential(

(0): InvertedResidual(

(branch1): Sequential(

(0): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)

(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(4): ReLU(inplace=True)

)

(branch2): Sequential(

(0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)

(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(1): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)

(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(2): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)

(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

(3): InvertedResidual(

(branch1): Sequential()

(branch2): Sequential(

(0): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

(3): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)

(4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(5): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)

(6): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(7): ReLU(inplace=True)

)

)

)

(conv5): Sequential(

(0): Conv2d(192, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)

(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

(2): ReLU(inplace=True)

)

(fc): Linear(in_features=1024, out_features=1000, bias=True)

)


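For example, the conv1 layer is available as the attribute below, and calling it on an input tensor gives that layer's output: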

Python

model.conv1
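When only the output of a layer is needed, a forward hook is another option that avoids copying the model code. The sketch below uses `register_forward_hook`, a standard `nn.Module` method. The small `nn.Sequential` model is a stand-in (an assumption, chosen so the example needs no weight download), and the names `captured` and `save_output` are made up; the same pattern should apply to `model.conv1` of the ShuffleNetV2 printed above.

```python
import torch
import torch.nn as nn

# Stand-in model; with torchvision you would hook e.g. model.conv1 instead.
model = nn.Sequential(
    nn.Conv2d(3, 24, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(24),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
model.eval()

captured = {}

def save_output(name):
    # Returns a hook that stores the layer output under the given name.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

handle = model[0].register_forward_hook(save_output('conv'))

with torch.no_grad():
    final = model(torch.randn(1, 3, 64, 64))

handle.remove()                            # detach the hook when done
print(captured['conv'].shape)              # output of the first conv layer
print(final.shape)                         # output of the whole model
```

Removing the hook afterwards keeps later forward passes from overwriting the stored tensor.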


5 Fixing the "The NVIDIA driver on your system is too old" error

Installing a particular version of PyTorch (for example 1.5.0) sometimes produces the following error:

Shell

The NVIDIA driver on your system is too old (found version 10000).

Please update your GPU driver by downloading and installing a new

version from the URL: http://www.nvidia.com/Download/index.aspx

Alternatively, go to: https://pytorch.org to install

a PyTorch version that has been compiled with your version

of the CUDA driver.


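When you install PyTorch you usually pick a build for a specific CUDA version. This error probably means that the matching CUDA version is not installed, or that this CUDA version does not match your GPU driver version.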
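The number in "found version 10000" is, as far as I can tell, the CUDA version supported by the installed driver, written as a single integer. CUDA normally encodes versions as 1000 × major + 10 × minor, so 10000 would correspond to CUDA 10.0. A small sketch of that decoding, with the function name `decode_cuda_version` made up for illustration:

```python
def decode_cuda_version(code):
    """Split an integer CUDA version code into (major, minor).

    Assumes the usual CUDA encoding: code = 1000 * major + 10 * minor.
    """
    major = code // 1000
    minor = (code % 1000) // 10
    return major, minor

major, minor = decode_cuda_version(10000)
print('driver supports up to CUDA {}.{}'.format(major, minor))
```

If the decoded version is lower than the CUDA version of the installed PyTorch build, either the driver or the PyTorch build has to change.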

NVIDIA publishes the matching driver and CUDA versions here: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

If you need to upgrade the driver, you can do it as follows:

1) Add the package source:

ZSH

sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt update


2) List the available versions:

ZSH

ubuntu-drivers devices


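For example, the query result on my machine is:

3) Install the chosen version (pick a suitable one according to the version table linked above):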

ZSH

sudo apt install nvidia-VERSION_NUMBER_HERE


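If you run into conflicts, try uninstalling the existing driver first and then installing again. The pattern is quoted so that the shell passes it to apt unchanged:

ZSH

sudo apt --purge autoremove 'nvidia*'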


PS: Alternatively, instead of upgrading the driver, you can first check the local CUDA version with:

nvcc --version


For example, mine shows:

So I should install a build that supports CUDA 10.0. PyTorch 1.5 may not be usable in that case, but PyTorch 1.4 is, and it can be installed with:

pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html


PS: Other useful commands:

Check the GPU model:

ZSH

lspci | grep -i nvidia


Check the driver version:

ZSH

cat /proc/driver/nvidia/version


Check the CUDA version PyTorch uses by running the following script in your PyTorch environment:

Python

import torch

print('torch.__version__ = {}'.format(torch.__version__))

print('torch.version.cuda = {}'.format(torch.version.cuda))

print('torch.cuda.is_available() = {}'.format(torch.cuda.is_available()))


6 Fixing the "Expected more than 1 value per channel when training" error

If you hit the following error during training:

Shell

  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 107, in forward
    exponential_average_factor, self.eps)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 32, 1])


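One possible cause is an input batch with batch_size = 1: in training mode BatchNorm needs more than one value per channel to compute batch statistics. This typically happens when the last batch of an epoch contains a single sample. Passing drop_last=True to the DataLoader avoids it by discarding the final batch when it is smaller than the batch size. For example: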

Python

train_loader = torch.utils.data.DataLoader(dataset=train_set, shuffle=False,
                                           batch_size=opt.batch_size,
                                           drop_last=True)


If this cannot be avoided, or you really need to train with batch_size = 1, consider replacing the BatchNorm layers in the network with InstanceNorm.
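The effect of drop_last can be checked on a small synthetic dataset. This is a sketch with made-up sizes (9 samples, batch size 4, 32 features); the variable names are assumptions, while `TensorDataset`, `DataLoader`, and `nn.BatchNorm1d` are standard PyTorch.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# 9 samples with batch_size=4 leave a final batch of size 1.
dataset = TensorDataset(torch.randn(9, 32), torch.randn(9, 1))

keep_last = DataLoader(dataset, batch_size=4, shuffle=False)
drop_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=True)

sizes_keep = [x.shape[0] for x, _ in keep_last]   # [4, 4, 1]
sizes_drop = [x.shape[0] for x, _ in drop_last]   # [4, 4]

bn = nn.BatchNorm1d(32)
bn.train()
try:
    bn(torch.randn(1, 32))       # a single value per channel: no batch statistics
    failed = False
except ValueError as err:
    failed = True
    print(err)

for x, _ in drop_last:           # every remaining batch is safe for BatchNorm
    bn(x)
```

Note that drop_last=True silently skips up to batch_size - 1 samples per epoch, which is usually acceptable when the data is shuffled.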

7 Fixing the "Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead" error

If you hit the following error when reading the value of a variable:

Shell

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.


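There are usually two cases:

In the first, the variable is part of a computation that involves trainable parameters and needs backpropagation. Use var.detach().numpy() to read its value.

In the second, the variable is not being trained and does not need backpropagation. Wrap the relevant code in with torch.no_grad() as follows: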

Python

with torch.no_grad():
    pass  # your code here

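The two cases can be compared side by side. This is a minimal sketch on CPU tensors; the tensor w is a hypothetical stand-in for a trainable parameter.

```python
import torch

w = torch.ones(3, requires_grad=True)  # stands in for a trainable parameter

# Case 1: the tensor is part of the autograd graph. detach() returns a
# tensor without gradient tracking that can be converted to a NumPy array.
y = w * 2
y_np = y.detach().numpy()

# Case 2: no gradient is needed at all. Results computed under no_grad()
# do not require grad, so numpy() can be called on them directly.
with torch.no_grad():
    z = w * 2
z_np = z.numpy()

print(y.requires_grad, z.requires_grad)  # True False
print(y_np, z_np)
```

For a tensor that lives on the GPU, .cpu() is additionally needed before .numpy().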

8 Fixing the "RuntimeError: error executing torch_shm_manager" error

If an error like the following appears when training with multiple worker processes:

RuntimeError: error executing torch_shm_manager at "/hdd/kps_pipeline/venv/lib/python3.6/site-packages/torch/bin/torch_shm_manager" at /pytorch/torch/lib/libshm/core.cpp:99

torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory

torch_shm_manager: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory


A possible fix is to comment out the following setting, if your code has it:

Python

# torch.multiprocessing.set_sharing_strategy('file_system')


9 Fixing the "RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad" error

If an error like the following appears when training with multiple worker processes:

Cowardly refusing to serialize non-leaf tensor which requires_grad,

since autograd does not support crossing process boundaries.

If you just want to transfer the data, call detach() on the tensor

before serializing (e.g., putting it on the queue).


The function that raises this error looks like this:

Python

# Excerpt from torch/multiprocessing/reductions.py; the rest of the function is omitted.
def reduce_tensor(tensor):
    storage = tensor.storage()

    if tensor.requires_grad and not tensor.is_leaf:
        raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
                           "since autograd does not support crossing process boundaries.  "
                           "If you just want to transfer the data, call detach() on the tensor "
                           "before serializing (e.g., putting it on the queue).")

    check_serializing_named_tensor(tensor)
    torch.utils.hooks.warn_if_has_hooks(tensor)


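In my case, the cause turned out to be that a model was used to generate data inside a multi-worker DataLoader, while some of that model's parameters had requires_grad = True. The tensors it produced were therefore non-leaf tensors that require grad, which cannot be passed between processes.

The model can be handled as follows so that the tensors it generates do not require grad: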

Python

# The model is only used for inference here, so no backward pass is needed.
# Fixes RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad
for param in self.superpoint.parameters():
    param.requires_grad = False
self.superpoint.eval()

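Why freezing the parameters helps can be seen on a small example. In this sketch, feature_net is a hypothetical stand-in for the model used inside the Dataset (self.superpoint above); it also shows detach(), which the error message itself suggests, as an alternative that leaves the model untouched.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model used inside the Dataset.
feature_net = nn.Linear(4, 2)
x = torch.randn(1, 4)

# With trainable parameters the output is a non-leaf tensor that requires
# grad, which is exactly what the serialization check rejects.
out_before = feature_net(x)

# Freeze the parameters, as in the fix above.
for param in feature_net.parameters():
    param.requires_grad = False
feature_net.eval()
out_after = feature_net(x)

# Alternative: keep the model as it is and detach the result.
out_detached = out_before.detach()

print(out_before.requires_grad, out_before.is_leaf)  # True False
print(out_after.requires_grad)                       # False
print(out_detached.requires_grad)                    # False
```

Wrapping the model call in with torch.no_grad() should have the same effect on the outputs without changing the parameters.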

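10 Fixing identical numpy random values across DataLoader workers

DataLoader workers are separate processes rather than threads. When they are created by fork (the default on Linux), each worker inherits a copy of the parent's global numpy random state, so every worker produces the same sequence of "random" numbers unless it is reseeded. With num_workers > 0 this can cause problems, for example when each sample is supposed to get different random data. There are a few ways to fix it:

Option 1

Use worker_init_fn to reseed numpy in each worker. Example: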

Python

ds = DataLoader(ds, 10, shuffle=False, num_workers=4, worker_init_fn=lambda _: np.random.seed())


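Option 2

Add the following two lines at the start of the program so that workers are started with spawn instead of fork (in a script, place the call under if __name__ == '__main__':):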

Python

import torch.multiprocessing as mp
mp.set_start_method('spawn')

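The two options do not combine well as written, because a lambda passed as worker_init_fn generally cannot be pickled when workers are started with spawn. A module-level function avoids that. The sketch below is one possible variant, not the author's code: the name seed_worker is an assumption, and it derives the numpy seed from the per-worker seed that PyTorch exposes through torch.initial_seed(), which also keeps runs reproducible after torch.manual_seed().

```python
import numpy as np
import torch


def seed_worker(worker_id):
    """Seed numpy from the seed PyTorch assigned to the current process."""
    # Inside a DataLoader worker, torch.initial_seed() differs per worker.
    # numpy seeds must fit into 32 bits, hence the modulo.
    np.random.seed(torch.initial_seed() % 2**32)


# Usage sketch (train_set is a placeholder for your own dataset):
# loader = torch.utils.data.DataLoader(train_set, batch_size=10,
#                                      num_workers=4,
#                                      worker_init_fn=seed_worker)
```

Called in the main process, the function simply reseeds numpy from the main PyTorch seed, which is a quick way to check that it behaves deterministically.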

11 "KeyboardInterrupt" when debugging a PyTorch project in PyCharm

If this only happens in Debug mode and not in Run mode, it is caused by PyCharm's subprocess debugging feature. Go to File -> Settings -> Build, Execution, Deployment -> Python Debugger and turn off "Attach to subprocess automatically while debugging".

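12 Fixing the "RuntimeError: CUDA error: no kernel image is available for execution on the device" error

If this error appears when running PyTorch, one possible cause is that your GPU is no longer supported by newer PyTorch releases.

For example, as of a recent update, PyTorch 1.3.1 and later require a GPU with Compute Capability >= 3.7. The full list of devices and their Compute Capability is as follows: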

https://developer.nvidia.com/cuda-gpus

GPU                           Compute Capability
NVIDIA TITAN RTX              7.5
GeForce RTX 2080 Ti           7.5
GeForce RTX 2080              7.5
GeForce RTX 2070              7.5
GeForce RTX 2060              7.5
NVIDIA TITAN V                7.0
NVIDIA TITAN Xp               6.1
NVIDIA TITAN X                6.1
GeForce GTX 1080 Ti           6.1
GeForce GTX 1080              6.1
GeForce GTX 1070              6.1
GeForce GTX 1060              6.1
GeForce GTX 1050              6.1
GeForce GTX TITAN X           5.2
GeForce GTX TITAN Z           3.5
GeForce GTX TITAN Black       3.5
GeForce GTX TITAN             3.5
GeForce GTX 980 Ti            5.2
GeForce GTX 980               5.2
GeForce GTX 970               5.2
GeForce GTX 960               5.2
GeForce GTX 950               5.2
GeForce GTX 780 Ti            3.5
GeForce GTX 780               3.5
GeForce GTX 770               3.0
GeForce GTX 760               3.0
GeForce GTX 750 Ti            5.0
GeForce GTX 750               5.0
GeForce GTX 690               3.0
GeForce GTX 680               3.0
GeForce GTX 670               3.0
GeForce GTX 660 Ti            3.0
GeForce GTX 660               3.0
GeForce GTX 650 Ti BOOST      3.0
GeForce GTX 650 Ti            3.0
GeForce GTX 650               3.0
GeForce GTX 560 Ti            2.1
GeForce GTX 550 Ti            2.1
GeForce GTX 460               2.1
GeForce GTS 450               2.1
GeForce GTS 450*              2.1
GeForce GTX 590               2.0
GeForce GTX 580               2.0
GeForce GTX 570               2.0
GeForce GTX 480               2.0
GeForce GTX 470               2.0
GeForce GTX 465               2.0
GeForce GT 740                3.0
GeForce GT 730                3.5
GeForce GT 730 DDR3,128bit    2.1
GeForce GT 720                3.5
GeForce GT 705*               3.5
GeForce GT 640 (GDDR5)        3.5
GeForce GT 640 (GDDR3)        2.1
GeForce GT 630                2.1
GeForce GT 620                2.1
GeForce GT 610                2.1
GeForce GT 520                2.1
GeForce GT 440                2.1
GeForce GT 440*               2.1
GeForce GT 430                2.1
GeForce GT 430*               2.1
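Instead of looking the card up by hand, the capability can also be queried from PyTorch. The sketch below is an illustration only: is_supported and check_current_gpu are hypothetical helper names, and the default minimum of (3, 7) is simply the threshold quoted in this section, which may not match the PyTorch build you have installed.

```python
def is_supported(capability, minimum=(3, 7)):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


def check_current_gpu(minimum=(3, 7)):
    """Print the capability of GPU 0, if PyTorch can see one."""
    import torch  # imported lazily so is_supported() works without PyTorch

    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch")
        return None
    capability = torch.cuda.get_device_capability(0)
    status = "supported" if is_supported(capability, minimum) else "too old"
    print(torch.cuda.get_device_name(0), capability, status)
    return capability


print(is_supported((7, 5)))  # True, e.g. GeForce RTX 2080
print(is_supported((3, 5)))  # False, e.g. GeForce GTX 780
```

Calling check_current_gpu() on the affected machine prints the device name together with its capability, which can then be compared against the table.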

There are two ways to fix this:

1) The simplest is to downgrade to an earlier release, such as PyTorch 1.2:

Shell

conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch



Copyright secured by Digiprove © 2020 Liu Xiao. All Rights Reserved. Original content here is published under these license terms:

License Type: Read Only

Abstract: You may read the original content in the context in which it is published (at this web address). No other copying or use is permitted without written agreement from the author.
