Deep Learning 03: Fully Connected Networks

itspollyyy

Last modified: 2023-11-07 06:14:48

Tags: deep learning, artificial intelligence

First published: 2023-11-06 06:27:02

Article link: https://blog.csdn.net/weixin_43399179/article/details/134208713


What is a fully connected network?

Every neuron in a layer is connected to every neuron in the adjacent layer.

Linear layer

Forward pass (forward)

output = input × weight + bias: out = x*w + b

 out = x.view(x.shape[0], -1).mm(w) + b

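x has shape (N, d_1, ..., d_k), where N is the number of samples.

Each sample has shape (d_1, ..., d_k).

w shape: (D, M)

b shape: (M,)

Because the input and the weight have different shapes, x.view() reshapes x to (N, D), where D = d_1 × ... × d_k.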
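The forward pass above can be sketched as a small standalone function. This is a minimal illustration, not the author's full implementation: the function name `linear_forward` and the tensor sizes are assumptions chosen for the example.

```python
import torch

def linear_forward(x, w, b):
    """Flatten each sample to a row vector, then compute out = x·w + b."""
    N = x.shape[0]
    x_flat = x.reshape(N, -1)   # (N, D) with D = d_1 * ... * d_k
    out = x_flat.mm(w) + b      # (N, D) @ (D, M) -> (N, M); b broadcasts over rows
    return out

# Hypothetical sizes: 2 samples of shape (3, 4), so D = 12, and M = 5 outputs.
x = torch.randn(2, 3, 4)
w = torch.randn(12, 5)
b = torch.zeros(5)
out = linear_forward(x, w, b)
print(tuple(out.shape))  # (2, 5)
```

Flattening first means the same layer works for images, sequences, or any other per-sample shape, as long as w has D rows.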

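Backward pass (backward)

For $z = wx + b$, the local gradients are $\frac{\partial z}{\partial x} = w$, $\frac{\partial z}{\partial w} = x$, $\frac{\partial z}{\partial b} = 1$.

Each gradient is computed from the upstream gradient: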

downstream gradient = local gradient × upstream gradient
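As a sketch of how this rule looks for the linear layer in matrix form, assume the forward pass is $z = xW + b$ with $x$ flattened to shape $(N, D)$, and let $\frac{\partial L}{\partial z}$ (shape $(N, M)$) be the upstream gradient. The symbol $L$ for the final loss is introduced here only for illustration.

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} W^{\top}, \qquad \frac{\partial L}{\partial W} = x^{\top} \frac{\partial L}{\partial z}, \qquad \frac{\partial L}{\partial b} = \sum_{n=1}^{N} \frac{\partial L}{\partial z_n}$$

The results have shapes $(N, D)$, $(D, M)$ and $(M,)$, the same as $x$, $W$ and $b$.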

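# dout: upstream gradient, shape (N, M)
# dw = x^T · dout, shape (D, M)
dw = x.view(x.shape[0], -1).t().mm(dout)

# w: shape (D, M)
# dx = dout · w^T, shape (N, D), reshaped back to the shape of x
dx = dout.mm(w.t()).reshape(x.shape)

# b: shape (M,); sum dout over the batch dimension
db = torch.sum(dout, dim=0)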
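One way to gain confidence in these three formulas is to compare them with PyTorch's autograd. The sketch below is an assumption-laden illustration: `linear_backward` is a hypothetical helper name, and the tensor sizes are arbitrary.

```python
import torch

def linear_backward(dout, x, w):
    """Gradients of out = x_flat·w + b given the upstream gradient dout (N, M)."""
    N = x.shape[0]
    x_flat = x.reshape(N, -1)             # (N, D)
    dx = dout.mm(w.t()).reshape(x.shape)  # (N, M) @ (M, D) -> back to x's shape
    dw = x_flat.t().mm(dout)              # (D, N) @ (N, M) -> (D, M)
    db = dout.sum(dim=0)                  # (M,)
    return dx, dw, db

torch.manual_seed(0)
x = torch.randn(4, 2, 3, dtype=torch.float64, requires_grad=True)
w = torch.randn(6, 5, dtype=torch.float64, requires_grad=True)
b = torch.randn(5, dtype=torch.float64, requires_grad=True)
dout = torch.randn(4, 5, dtype=torch.float64)

# Reference gradients from autograd.
out = x.reshape(4, -1).mm(w) + b
out.backward(dout)

dx, dw, db = linear_backward(dout, x.detach(), w.detach())
print(torch.allclose(dx, x.grad), torch.allclose(dw, w.grad), torch.allclose(db, b.grad))
```

If any of the three comparisons prints False, the usual culprit is a missing transpose or a sum over the wrong dimension.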

Activation functions:

1. ReLU

forward

$\mathrm{ReLU}(x) = \max(0, x)$

out = x.clone()
out[out < 0] = 0

backward

dx = dout * (x > 0)
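Put together, the ReLU forward and backward passes might look like the following sketch. The function names and the choice to cache the input x are assumptions for illustration; the arithmetic is the same as in the snippets above.

```python
import torch

def relu_forward(x):
    """Return max(0, x) elementwise, plus the input as cache for the backward pass."""
    out = x.clamp(min=0)
    return out, x

def relu_backward(dout, cache):
    """Pass the upstream gradient through only where the input was positive."""
    x = cache
    return dout * (x > 0)

x = torch.tensor([[-1.0, 2.0], [0.0, 3.0]])
out, cache = relu_forward(x)
dx = relu_backward(torch.ones_like(x), cache)
print(out.tolist())  # [[0.0, 2.0], [0.0, 3.0]]
print(dx.tolist())   # [[0.0, 1.0], [0.0, 1.0]]
```

Note that with the `x > 0` test the gradient at exactly x = 0 is taken to be 0.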

"Sandwich" layers

input layer -> hidden layer (activation function) -> output layer

With ReLU as the activation function:

Layer 1: linear layer ($z = wx + b$)

Layer 2: ReLU ($s = \alpha(z)$)

Layer 3: output layer
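A "sandwich" of a linear layer followed by ReLU can be sketched as one forward function and one backward function that share a cache. The names `linear_relu_forward` and `linear_relu_backward` and the cache layout are assumptions made for this example; the post's own `Linear_ReLU` class is not shown, so this is not necessarily how it is written.

```python
import torch

def linear_relu_forward(x, w, b):
    """z = x·w + b followed by s = ReLU(z); cache what backward needs."""
    z = x.reshape(x.shape[0], -1).mm(w) + b
    s = z.clamp(min=0)
    return s, (x, w, z)

def linear_relu_backward(dout, cache):
    """Backprop through ReLU first, then through the linear layer."""
    x, w, z = cache
    dz = dout * (z > 0)                         # ReLU gate
    dx = dz.mm(w.t()).reshape(x.shape)
    dw = x.reshape(x.shape[0], -1).t().mm(dz)
    db = dz.sum(dim=0)
    return dx, dw, db

# Compare against autograd on arbitrary, hypothetical sizes.
torch.manual_seed(0)
x = torch.randn(3, 4, dtype=torch.float64, requires_grad=True)
w = torch.randn(4, 2, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, dtype=torch.float64, requires_grad=True)
dout = torch.randn(3, 2, dtype=torch.float64)

torch.relu(x.mm(w) + b).backward(dout)
s, cache = linear_relu_forward(x.detach(), w.detach(), b.detach())
dx, dw, db = linear_relu_backward(dout, cache)
print(torch.allclose(dx, x.grad), torch.allclose(dw, w.grad), torch.allclose(db, b.grad))
```

Bundling the two layers keeps the calling code short, since most hidden layers use this pair anyway.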

Two-layer network

Layer 1: $z_1 = xW_1 + b_1,\ h_1 = \alpha(z_1)$

Layer 2: $z_2 = h_1W_2 + b_2$

Initialize $W_1, W_2, b_1, b_2$

weight_scale=1e-3
input_dim=3 * 32 * 32
hidden_dim=100
num_classes=10  # assumed here: 10 output classes; the value is not given in the original snippet

# first layer
# weights are drawn from a Gaussian centered at 0.0 with standard deviation equal to weight_scale
self.W1 = weight_scale * torch.randn(input_dim, hidden_dim, dtype=dtype).to(device)
# biases should be initialized to zero
self.b1 = torch.zeros(hidden_dim, dtype=dtype).to(device)
# second layer: maps the hidden layer to the class scores
self.W2 = weight_scale * torch.randn(hidden_dim, num_classes, dtype=dtype).to(device)
self.b2 = torch.zeros(num_classes, dtype=dtype).to(device)
# all parameters are stored in the dictionary self.params
self.params = {'W1': self.W1, 'b1': self.b1, 'W2': self.W2, 'b2': self.b2}

Computing the loss

step 1: compute h1

N = X.shape[0]
X_mat = X.view(N, -1) # reshape X to (N, D) so it can be multiplied with W1
# h1 = relu(X·W1 + b1)
h1, cache1 = Linear_ReLU.forward(X_mat, self.params['W1'], self.params['b1'])

step 2: compute the scores (predictions)

scores, cache2 = Linear.forward(h1, self.params['W2'], self.params['b2'])

step 3: compute the loss with softmax, add an L2 regularization penalty, and compute the partial derivatives with respect to the parameters

loss, dloss = softmax_loss(scores, y)
loss += self.reg * (torch.sum(self.params['W1'] * self.params['W1']) + torch.sum(self.params['W2'] * self.params['W2']))

dh1, dW2, db2 = Linear.backward(dloss, cache2)
dx, dW1, db1 = Linear_ReLU.backward(dh1, cache1)

dW1 += 2 * self.reg * self.params['W1']
dW2 += 2 * self.reg * self.params['W2']
grads = {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2}
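The `softmax_loss` function called in step 3 is not defined in this post. The following is a minimal sketch of what such a function could look like, assuming it takes scores of shape (N, C) and integer labels y of shape (N,) and returns the mean cross-entropy loss together with its gradient with respect to the scores. The result is compared with `torch.nn.functional.cross_entropy` as a sanity check.

```python
import torch
import torch.nn.functional as F

def softmax_loss(scores, y):
    """Mean softmax cross-entropy loss and its gradient w.r.t. scores."""
    N = scores.shape[0]
    # Subtract the row-wise max for numerical stability; softmax is unchanged.
    shifted = scores - scores.max(dim=1, keepdim=True).values
    log_probs = shifted - shifted.exp().sum(dim=1, keepdim=True).log()
    loss = -log_probs[torch.arange(N), y].mean()
    # d loss / d scores = (softmax probabilities - one-hot labels) / N
    dscores = log_probs.exp()
    dscores[torch.arange(N), y] -= 1
    dscores /= N
    return loss, dscores

torch.manual_seed(0)
scores = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
y = torch.tensor([0, 2, 1, 1])

loss, dscores = softmax_loss(scores.detach(), y)
ref = F.cross_entropy(scores, y)  # reference implementation, mean reduction
ref.backward()
print(torch.allclose(loss, ref.detach()), torch.allclose(dscores, scores.grad))
```

The L2 term in step 3 then adds reg · ΣW² to this loss, which is why 2 · reg · W is added to each weight gradient.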

Multi-layer network

{linear - relu - [dropout]} x (L - 1) - linear - softmax

Initial loss and gradient check

step 1: initialize w and b

params = []
self.hidden_dims = hidden_dims
# input layer
params.append(('W1', weight_scale * torch.randn(input_dim, hidden_dims[0], device=device)))
params.append(('b1', torch.zeros(hidden_dims[0], device=device)))
# hidden layers
for i in range(2, len(hidden_dims) + 1):
    params.append(('W' + str(i), weight_scale * torch.randn(hidden_dims[i - 2], hidden_dims[i - 1], device=device)))
    params.append(('b' + str(i), torch.zeros(hidden_dims[i - 1], device=device)))
# output layer
params.append(('W' + str(len(hidden_dims) + 1), weight_scale * torch.randn(hidden_dims[-1], num_classes, device=device)))
params.append(('b' + str(len(hidden_dims) + 1), torch.zeros(num_classes, device=device)))

self.params = dict(params)
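The same initialization can be written more compactly by listing all layer widths once and looping over consecutive pairs. This is an alternative sketch, not the author's code: the function name `init_params` and the example sizes are assumptions.

```python
import torch

def init_params(input_dim, hidden_dims, num_classes, weight_scale=1e-2, device="cpu"):
    """Create W1..WL and b1..bL for a network with the given layer widths."""
    dims = [input_dim] + list(hidden_dims) + [num_classes]
    params = {}
    for i in range(1, len(dims)):
        # Layer i maps dims[i - 1] inputs to dims[i] outputs.
        params[f"W{i}"] = weight_scale * torch.randn(dims[i - 1], dims[i], device=device)
        params[f"b{i}"] = torch.zeros(dims[i], device=device)
    return params

# Hypothetical configuration: 3x32x32 inputs, two hidden layers, 10 classes.
params = init_params(3 * 32 * 32, [100, 50], 10)
for name, p in params.items():
    print(name, tuple(p.shape))
# W1 (3072, 100), b1 (100,), W2 (100, 50), b2 (50,), W3 (50, 10), b3 (10,)
```

Treating the input, hidden and output layers uniformly removes the special cases for the first and last layer.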

step 2: forward pass to compute the scores

cache_dict = {}
last_out = X  # input batch; Linear.forward flattens it to (N, D)
# input layer + hidden layers
for n in range(self.num_layers - 1):
    i = n + 1
    last_out, cache_dict['cache_LR{}'.format(i)] = Linear_ReLU.forward(last_out, self.params['W{}'.format(i)], self.params['b{}'.format(i)])
    if self.use_dropout:
        last_out, cache_dict['cache_Dropout{}'.format(i)] = Dropout.forward(last_out, self.dropout_param)

# output layer
i = self.num_layers
last_out, cache_dict['cache_L{}'.format(i)] = Linear.forward(last_out, self.params['W{}'.format(i)], self.params['b{}'.format(i)])
scores = last_out

step 3: compute the loss (softmax) and backpropagate to compute the gradients

grads = {}
# softmax
loss, dout = softmax_loss(scores, y)
# output layer: L2 regularization and gradients
loss += (self.params['W{}'.format(i)] * self.params['W{}'.format(i)]).sum() * self.reg
last_dout, dw, db = Linear.backward(dout, cache_dict['cache_L{}'.format(i)])
grads['W{}'.format(i)] = dw + 2 * self.params['W{}'.format(i)] * self.reg
grads['b{}'.format(i)] = db
# hidden layers, from last to first
for n in range(self.num_layers - 1)[::-1]:
    i = n + 1
    if self.use_dropout:
        last_dout = Dropout.backward(last_dout, cache_dict['cache_Dropout{}'.format(i)])
    last_dout, dw, db = Linear_ReLU.backward(last_dout, cache_dict['cache_LR{}'.format(i)])
    grads['W{}'.format(i)] = dw + 2 * self.params['W{}'.format(i)] * self.reg
    grads['b{}'.format(i)] = db
    loss += (self.params['W{}'.format(i)] * self.params['W{}'.format(i)]).sum() * self.reg
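The "initial loss and gradient check" can be sketched end to end on a tiny network. The code below is a standalone illustration under stated assumptions: it leaves out dropout, uses a hypothetical `forward_backward` function instead of the class methods above, and picks small arbitrary sizes. It checks two things: with small weights and no regularization the initial loss should be close to ln(C) for C classes, and the hand-written gradients should match autograd.

```python
import math
import torch
import torch.nn.functional as F

def forward_backward(params, X, y, reg=0.0):
    """{linear - relu} x (L - 1) - linear - softmax, without dropout."""
    L = len(params) // 2
    h = X.reshape(X.shape[0], -1)
    caches = []
    for i in range(1, L):
        z = h.mm(params[f"W{i}"]) + params[f"b{i}"]
        caches.append((h, z))            # layer input and pre-activation
        h = z.clamp(min=0)
    scores = h.mm(params[f"W{L}"]) + params[f"b{L}"]

    # Softmax loss plus L2 penalty on every weight matrix.
    N = X.shape[0]
    probs = F.softmax(scores, dim=1)
    loss = -probs[torch.arange(N), y].log().mean()
    loss = loss + reg * sum((params[f"W{i}"] ** 2).sum() for i in range(1, L + 1))
    dout = probs.detach().clone()
    dout[torch.arange(N), y] -= 1
    dout /= N

    # Backward pass: output layer first, then hidden layers in reverse.
    grads = {}
    grads[f"W{L}"] = h.t().mm(dout) + 2 * reg * params[f"W{L}"]
    grads[f"b{L}"] = dout.sum(dim=0)
    dh = dout.mm(params[f"W{L}"].t())
    for i in range(L - 1, 0, -1):
        h_in, z = caches[i - 1]
        dz = dh * (z > 0)
        grads[f"W{i}"] = h_in.t().mm(dz) + 2 * reg * params[f"W{i}"]
        grads[f"b{i}"] = dz.sum(dim=0)
        dh = dz.mm(params[f"W{i}"].t())
    return loss, grads

torch.manual_seed(0)
dims = [20, 15, 10, 10]   # hypothetical: 20 inputs, two hidden layers, 10 classes
params = {}
for i in range(1, len(dims)):
    params[f"W{i}"] = (1e-3 * torch.randn(dims[i - 1], dims[i], dtype=torch.float64)).requires_grad_()
    params[f"b{i}"] = torch.zeros(dims[i], dtype=torch.float64, requires_grad=True)
X = torch.randn(8, 20, dtype=torch.float64)
y = torch.randint(0, 10, (8,))

# Check 1: initial loss without regularization is about ln(10).
loss0, _ = forward_backward(params, X, y, reg=0.0)
print(float(loss0), math.log(10))

# Check 2: manual gradients agree with autograd.
loss, grads = forward_backward(params, X, y, reg=0.1)
loss.backward()
max_err = max((grads[k].detach() - params[k].grad).abs().max().item() for k in params)
print(max_err)
```

A loss far from ln(C) at initialization, or a large `max_err`, usually points to a bug in the forward or backward pass rather than to a training problem.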

Optimizer

Link:

Dropout

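In the input and hidden layers, some neurons are randomly dropped.

Purpose: to prevent overfitting. If every layer learns from all of the features in its input, the network can pick up too many features; dropping some neurons keeps it from learning unnecessary ones.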

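Forward pass:

Suppose each neuron is dropped with probability p; then

# mode = 'train'
# mask is True for the neurons to drop, each with probability p
mask = torch.rand(x.shape, device=x.device) < p
out = x.clone()
out[mask] = 0

# mode = 'test'
out = x

Backward pass:

# mode = 'train'
# dout: upstream derivatives, of any shape
dx = dout.clone()  # copy so the upstream gradient is not modified in place
dx[mask] = 0

# mode = 'test'
dx = dout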
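The version above does not rescale the surviving activations, so the expected output differs between training and test time. A common variant, usually called inverted dropout, divides the kept activations by (1 - p) during training so that the test-time pass can stay the identity. The sketch below shows that variant, which is a different technique from the snippet above; the function names and signatures are assumptions for illustration.

```python
import torch

def dropout_forward(x, p, mode):
    """Inverted dropout: drop each unit with probability p, rescale the rest."""
    if mode == "train":
        keep = (torch.rand_like(x) >= p).to(x.dtype)  # 1 = keep, 0 = drop
        mask = keep / (1 - p)                         # rescale kept units
        out = x * mask
    else:
        mask = None                                   # test mode: identity
        out = x
    return out, mask

def dropout_backward(dout, mask):
    """Apply the same mask (and scaling) to the upstream gradient."""
    return dout if mask is None else dout * mask

torch.manual_seed(0)
p = 0.3
x = torch.ones(1000, 100)
out, mask = dropout_forward(x, p, "train")
dx = dropout_backward(torch.ones_like(x), mask)

dropped = (out == 0).float().mean().item()
print(round(dropped, 2))            # close to p
print(round(out.mean().item(), 2))  # close to 1.0, the mean of x
```

Because of the rescaling, the mean activation during training stays close to the mean at test time.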

A more detailed explanation of dropout, by Lei Mao

References:

1. Upstream, Downstream, and Local Gradients

2. L1 and L2 Regularization Methods

3. Dropout in Neural Networks
