Deep Learning 03 - Fully Connected Networks

What is a fully connected network?

Each neuron in a layer is connected to every neuron in the adjacent layers.

Linear layer

Forward pass (forward)

output = input * weight + bias; out = x*w + b

 out = x.view(x.shape[0], -1).mm(w) + b

x has shape (N, d_1, ..., d_k), where N is the number of samples

Each sample has shape (d_1, ..., d_k)

w shape: (D, M)

b shape: (M,)

Because the shapes of the input and the weight do not line up for a matrix multiply, x.view() reshapes x to (N, D), where D = d_1 * ... * d_k.
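A quick shape check with concrete sizes (the numbers below are made up purely for illustration):

import torch

N, d1, d2, d3, M = 2, 3, 32, 32, 10       # example sizes for illustration only
D = d1 * d2 * d3                          # flattened per-sample dimension

x = torch.randn(N, d1, d2, d3)
w = torch.randn(D, M)
b = torch.randn(M)

out = x.view(x.shape[0], -1).mm(w) + b    # (N, D) @ (D, M) + (M,) -> (N, M)
print(out.shape)                          # torch.Size([2, 10])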

Backward pass (backward)

z = xw + b; local gradients: dz/dx = w, dz/dw = x, dz/db = 1

The gradients are computed from the upstream gradient:

downstream gradient = local gradient × upstream gradient

# dout: (N, M) upstream gradient
# dw = x^T · dout  ->  (D, M)
dw = x.view(x.shape[0], -1).t().mm(dout)

# w: (D, M)
# dx = dout · w^T, reshaped back to the shape of x
dx = dout.mm(w.t()).reshape(x.shape)

# b: (M,), so db sums dout over the batch dimension
db = torch.sum(dout, dim=0)
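The forward and backward code above is wrapped into the modular Linear class that the rest of this post calls. A minimal sketch, assuming the cache simply stores (x, w, b) between the two passes:

import torch

class Linear(object):

    @staticmethod
    def forward(x, w, b):
        out = x.view(x.shape[0], -1).mm(w) + b        # flatten each sample, then affine transform
        cache = (x, w, b)                             # keep the inputs for the backward pass
        return out, cache

    @staticmethod
    def backward(dout, cache):
        x, w, b = cache
        dx = dout.mm(w.t()).reshape(x.shape)          # (N, M) @ (M, D), restored to x's shape
        dw = x.view(x.shape[0], -1).t().mm(dout)      # (D, N) @ (N, M)
        db = torch.sum(dout, dim=0)                   # sum over the batch dimension
        return dx, dw, db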

Activation functions:

1. ReLU

forward

relu(x) = max(0, x)

out = x.clone()
out[out < 0] = 0

backward

dx = dout * (x > 0)
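The same forward/backward pair can be wrapped into a ReLU module in the same modular style (a sketch; the cache convention is an assumption):

class ReLU(object):

    @staticmethod
    def forward(x):
        out = x.clone()
        out[out < 0] = 0              # zero out the negative activations
        cache = x                     # keep the input to decide where gradients flow
        return out, cache

    @staticmethod
    def backward(dout, cache):
        x = cache
        dx = dout * (x > 0)           # pass the upstream gradient only where x was positive
        return dx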

"Sandwich" layers

input layer -> hidden layer (activation function) -> output layer

With the ReLU activation function:

Layer 1: linear layer (z = xw + b)

Layer 2: ReLU (s = \alpha(z))

Layer 3: output layer
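The linear layer and ReLU are glued together into the Linear_ReLU "sandwich" module used in the code below. A minimal sketch, assuming the Linear and ReLU modules sketched above:

class Linear_ReLU(object):

    @staticmethod
    def forward(x, w, b):
        a, fc_cache = Linear.forward(x, w, b)         # affine transform
        out, relu_cache = ReLU.forward(a)             # non-linearity
        return out, (fc_cache, relu_cache)

    @staticmethod
    def backward(dout, cache):
        fc_cache, relu_cache = cache
        da = ReLU.backward(dout, relu_cache)          # back through the ReLU
        dx, dw, db = Linear.backward(da, fc_cache)    # back through the linear layer
        return dx, dw, db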

Two-layer network

Layer 1: z_1 = x W_1 + b_1, h_1 = \alpha(z_1)

Layer 2: z_2 = h_1 W_2 + b_2

Initialize W_1, W_2, b_1, b_2

weight_scale = 1e-3
input_dim = 3 * 32 * 32
hidden_dim = 100
num_classes = 10

# first layer
# weights are initialized from a Gaussian centered at 0.0 with standard deviation weight_scale
self.W1 = weight_scale * torch.randn(input_dim, hidden_dim, dtype=dtype).to(device)
# biases are initialized to zero
self.b1 = torch.zeros(hidden_dim, dtype=dtype).to(device)
# second layer: maps the hidden representation to the class scores
self.W2 = weight_scale * torch.randn(hidden_dim, num_classes, dtype=dtype).to(device)
self.b2 = torch.zeros(num_classes, dtype=dtype).to(device)
# all parameters are stored in the dictionary self.params
self.params = {'W1': self.W1, 'b1': self.b1, 'W2': self.W2, 'b2': self.b2}
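A standalone shape check for these sizes (mirroring the initialization above without the surrounding class):

import torch

weight_scale, input_dim, hidden_dim, num_classes = 1e-3, 3 * 32 * 32, 100, 10
W1 = weight_scale * torch.randn(input_dim, hidden_dim)
b1 = torch.zeros(hidden_dim)
W2 = weight_scale * torch.randn(hidden_dim, num_classes)
b2 = torch.zeros(num_classes)
print(W1.shape, b1.shape, W2.shape, b2.shape)
# torch.Size([3072, 100]) torch.Size([100]) torch.Size([100, 10]) torch.Size([10])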

Compute the loss

Step 1: compute h_1

N = X.shape[0]
X_mat = X.view(N, -1)  # reshape X to (N, D) so it can be multiplied with W1
# h1 = relu(W1*x+b1)
h1, cache1 = Linear_ReLU.forward(X_mat, self.params['W1'], self.params['b1'])

Step 2: compute the scores (predictions)

scores, cache2 = Linear.forward(h1, self.params['W2'], self.params['b2'])

Step 3: compute the loss with softmax plus an L2 regularization penalty, then backpropagate to get the gradients of the parameters

loss, dloss = softmax_loss(scores, y)
loss += self.reg * (torch.sum(self.params['W1'] * self.params['W1']) + torch.sum(self.params['W2'] * self.params['W2']))
dh1, dW2, db2 = Linear.backward(dloss, cache2)
dx, dW1, db1 = Linear_ReLU.backward(dh1, cache1)

dW1 += 2 * self.reg * self.params['W1']
dW2 += 2 * self.reg * self.params['W2']
grads = {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2}
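softmax_loss is called above but not shown in these notes. A minimal sketch of what it computes (mean cross-entropy loss over the batch, plus the gradient with respect to the scores), written as an illustration rather than the exact helper:

import torch

def softmax_loss(scores, y):
    # scores: (N, C) raw class scores; y: (N,) integer labels in [0, C)
    N = scores.shape[0]
    shifted = scores - scores.max(dim=1, keepdim=True).values     # subtract the max for stability
    log_probs = shifted - shifted.exp().sum(dim=1, keepdim=True).log()
    loss = -log_probs[torch.arange(N), y].mean()                  # mean negative log-likelihood
    dscores = log_probs.exp()                                     # softmax probabilities
    dscores[torch.arange(N), y] -= 1
    dscores /= N                                                  # gradient of the mean loss
    return loss, dscores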

Multi-layer network

{linear - relu - [dropout]} x (L - 1) - linear - softmax

Initialization, loss, and gradient check

Step 1: initialize w and b

params = []
self.hidden_dims = hidden_dims
# input layer
params.append(('W1', weight_scale * torch.randn(input_dim, hidden_dims[0], device=device)))
params.append(('b1', torch.zeros(hidden_dims[0], device=device)))
# hidden layers
for i in range(2, len(hidden_dims) + 1):
    params.append(('W' + str(i), weight_scale * torch.randn(hidden_dims[i - 2], hidden_dims[i - 1], device=device)))
    params.append(('b' + str(i), torch.zeros(hidden_dims[i - 1], device=device)))
# output layer
params.append(('W' + str(len(hidden_dims) + 1), weight_scale * torch.randn(hidden_dims[-1], num_classes, device=device)))
params.append(('b' + str(len(hidden_dims) + 1), torch.zeros(num_classes, device=device)))

self.params = dict(params)
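For example, with hidden_dims = [100, 50], input_dim = 3 * 32 * 32, and num_classes = 10, the three append blocks produce the shapes shown below (a compact standalone sketch that chains the layer sizes end to end):

import torch

hidden_dims, input_dim, num_classes, weight_scale = [100, 50], 3 * 32 * 32, 10, 1e-2
dims = [input_dim] + hidden_dims + [num_classes]     # layer sizes from input to output
params = {}
for i in range(1, len(dims)):
    params['W' + str(i)] = weight_scale * torch.randn(dims[i - 1], dims[i])
    params['b' + str(i)] = torch.zeros(dims[i])
for name in sorted(params):
    print(name, tuple(params[name].shape))
# W1 (3072, 100)  W2 (100, 50)  W3 (50, 10)  b1 (100,)  b2 (50,)  b3 (10,)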

Step 2: forward pass to compute the scores

cache_dict = {}
last_out = X                      # Linear.forward flattens each sample internally
# input layer + hidden layers
for n in range(self.num_layers - 1):
    i = n + 1
    last_out, cache_dict['cache_LR{}'.format(i)] = Linear_ReLU.forward(last_out, self.params['W{}'.format(i)], self.params['b{}'.format(i)])
    if self.use_dropout:
        last_out, cache_dict['cache_Dropout{}'.format(i)] = Dropout.forward(last_out, self.dropout_param)
# output layer
i += 1
last_out, cache_dict['cache_L{}'.format(i)] = Linear.forward(last_out, self.params['W{}'.format(i)], self.params['b{}'.format(i)])
scores = last_out

Step 3: compute the loss (softmax) and backpropagate to compute the gradients

# softmax loss on the scores
loss, dout = softmax_loss(scores, y)
grads = {}
# regularization term and gradients for the output layer
loss += (self.params['W{}'.format(i)] * self.params['W{}'.format(i)]).sum() * self.reg
last_dout, dw, db = Linear.backward(dout, cache_dict['cache_L{}'.format(i)])
grads['W{}'.format(i)] = dw + 2 * self.params['W{}'.format(i)] * self.reg
grads['b{}'.format(i)] = db
# walk back through the hidden layers
for n in range(self.num_layers - 1)[::-1]:
    i = n + 1
    if self.use_dropout:
        last_dout = Dropout.backward(last_dout, cache_dict['cache_Dropout{}'.format(i)])
    last_dout, dw, db = Linear_ReLU.backward(last_dout, cache_dict['cache_LR{}'.format(i)])
    grads['W{}'.format(i)] = dw + 2 * self.params['W{}'.format(i)] * self.reg
    grads['b{}'.format(i)] = db
    loss += (self.params['W{}'.format(i)] * self.params['W{}'.format(i)]).sum() * self.reg
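To verify these analytic gradients (the gradient check mentioned above), they can be compared against central-difference numeric gradients. A minimal sketch; net, X, and y in the usage comments are hypothetical, and net is assumed to expose a loss(X, y) method returning (loss, grads) as above:

import torch

def numeric_gradient(f, w, h=1e-6):
    # Central-difference numeric gradient of the scalar function f with respect to tensor w.
    grad = torch.zeros_like(w)
    flat_w, flat_g = w.view(-1), grad.view(-1)
    for i in range(flat_w.numel()):
        old = flat_w[i].item()
        flat_w[i] = old + h
        f_plus = f()
        flat_w[i] = old - h
        f_minus = f()
        flat_w[i] = old
        flat_g[i] = (f_plus - f_minus) / (2 * h)
    return grad

# example usage (hypothetical net, X, y):
# loss, grads = net.loss(X, y)
# num_dW1 = numeric_gradient(lambda: net.loss(X, y)[0], net.params['W1'])
# rel_err = (grads['W1'] - num_dW1).abs().max() / (grads['W1'].abs() + num_dW1.abs()).max()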

Optimizer

link:
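As a minimal sketch of the update rules used to train these networks (the (w, dw, config) signature follows the assignment's convention, which is an assumption here), vanilla SGD and SGD with momentum look like this:

import torch

def sgd(w, dw, config=None):
    # Vanilla stochastic gradient descent: step opposite to the gradient.
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    w -= config['learning_rate'] * dw
    return w, config

def sgd_momentum(w, dw, config=None):
    # SGD with momentum: keep an exponentially decaying velocity of past gradients.
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)
    v = config.get('velocity', torch.zeros_like(w))
    v = config['momentum'] * v - config['learning_rate'] * dw
    w += v
    config['velocity'] = v
    return w, config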

Dropout

Randomly drop some neurons in the input and hidden layers.


Purpose: to prevent overfitting. If the network learns from every feature at every layer, it can come to rely on too many co-adapted features; dropping some neurons keeps it from learning unnecessary ones.

Forward pass:

Suppose each neuron is dropped with probability p. Then:

# mode = 'train'
mask = torch.rand(x.shape) < p    # True with probability p: the neurons to drop
out = x.clone()
out[mask] = 0

# mode = 'test'
out = x

Backward pass

# mode = 'train'
# dout: upstream derivatives, of any shape
dx = dout.clone()
dx[mask] = 0    # dropped neurons get zero gradient

# mode = 'test'
dx = dout
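Note that this version does not rescale the surviving activations, so the expected activation differs between training and test time. A common variant, "inverted dropout", divides by (1 - p) during training so that the test-time pass stays an identity; a minimal sketch:

import torch

def dropout_forward(x, p, mode):
    # p is the drop probability, as above.
    if mode == 'train':
        mask = (torch.rand_like(x) > p).to(x.dtype) / (1 - p)   # keep with probability 1-p and rescale
        out = x * mask
    else:
        mask = None
        out = x                                                 # identity at test time, no rescaling needed
    return out, mask

def dropout_backward(dout, mask, mode):
    if mode == 'train':
        return dout * mask                                      # same mask and scaling as in the forward pass
    return dout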

A more detailed explanation of dropout by Lei Mao

Ref:

1. Upstream, Downstream, and Local Gradients

2. L1 and L2 Regularization Methods

3. Dropout in Neural Networks
