Deep Learning Beginner Notes: CS231n Assignment 2 (Fully-Connected Nets)

This post is aimed at deep learning beginners and walks through the second assignment of the CS231n course, focusing on the structure and operations of fully-connected neural networks. It covers np.prod for computing the product of array elements along an axis, np.linspace for generating evenly spaced values, and np.concatenate for joining arrays. It also explains the network architecture built from affine layers, batch normalization, the ReLU activation, dropout regularization, and the softmax loss with its gradient. Finally, a Solver class is used to run the training loop.

I. Fully-Connected Neural Networks

[Note 1] np.prod(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Returns the product of the array's elements along the given axis (or over all elements when axis is None).
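A quick sketch of how the axis argument behaves (the printed values are shown in the comments):

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.prod(a))          # 720, product of all elements
print(np.prod(a, axis=0))  # [ 4 10 18], column-wise product
print(np.prod(a, axis=1))  # [  6 120], row-wise product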

[Note 2] np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

Returns num evenly spaced numbers over the interval from start to stop (50 by default); stop is included in the output when endpoint=True.
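For example:

import numpy as np

print(np.linspace(0, 1, num=5))                  # [0.   0.25 0.5  0.75 1.  ]
print(np.linspace(0, 1, num=5, endpoint=False))  # [0.  0.2 0.4 0.6 0.8]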

[Note 3] Unpacking a sequence with the * operator:

a = (2, 3, 4)

print(*a)  # prints: 2 3 4

[Note 4] np.concatenate((a1, a2, ...), axis=0)

Joins a sequence of arrays along the given axis.
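For example, stacking along different axes:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
print(np.concatenate((a, b), axis=0))    # rows stacked, result shape (3, 2)
print(np.concatenate((a, b.T), axis=1))  # columns stacked, result shape (2, 3)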

[Note 5] The dict method items() iterates over both keys and values, whereas keys() iterates over the keys only.
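For example:

d = {'W1': 0.1, 'b1': 0.0}
for k, v in d.items():  # yields (key, value) pairs
    print(k, v)
for k in d.keys():      # yields keys only
    print(k, d[k])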


Network architecture:

{affine - [batch norm] - relu - [dropout]} * (L-1) - affine - softmax


1. Forward pass

(The code snippets below assume numpy has been imported as np.)

[Fully-connected (affine) layer: x*w + b]

def affine_forward(x, w, b):
  """
  Computes the forward pass for an affine (fully-connected) layer.

  The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
  examples, where each example x[i] has shape (d_1, ..., d_k). We will
  reshape each input into a vector of dimension D = d_1 * ... * d_k, and
  then transform it to an output vector of dimension M.

  Inputs:
  - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
  - w: A numpy array of weights, of shape (D, M)
  - b: A numpy array of biases, of shape (M,)
  
  Returns a tuple of:
  - out: output, of shape (N, M)
  - cache: (x, w, b)
  """

  #############################################################################
  # TODO: Implement the affine forward pass. Store the result in out. You     #
  # will need to reshape the input into rows.                                 #
  #############################################################################
  x_N = x.reshape(x.shape[0], -1)  # reshape input to (N, D)
  out = np.dot(x_N, w) + b
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  cache = (x, w, b)
  return out, cache
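As a quick shape check (the data here is random and purely illustrative of the reshape from (N, d_1, ..., d_k) to (N, D)):

import numpy as np

x = np.random.randn(2, 4, 5, 6)  # N=2 examples, each of shape (4, 5, 6), so D=120
w = np.random.randn(120, 3)      # weights of shape (D, M) with M=3
b = np.random.randn(3)

out, cache = affine_forward(x, w, b)
print(out.shape)                 # (2, 3)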

[ReLU layer]

def relu_forward(x):
  """Compute the forward pass for a ReLU unit: out = max(0, x), elementwise."""

  #############################################################################
  # TODO: Implement the ReLU forward pass.                                    #
  #############################################################################
  out=np.maximum(0,x)
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  cache = x
  return out, cache

The two layers above are combined into a single convenience unit:

def affine_relu_forward(x, w, b):
  """
  Convenience layer that performs an affine transform followed by a ReLU

  Inputs:
  - x: Input to the affine layer
  - w, b: Weights for the affine layer

  Returns a tuple of:
  - out: Output from the ReLU
  - cache: Object to give to the backward pass
  """
  a, fc_cache = affine_forward(x, w, b)
  out, relu_cache = relu_forward(a)
  cache = (fc_cache, relu_cache)
  return out, cache


[Batch normalization layer]

def batchnorm_forward(x, gamma, beta, bn_param):
  """
  Forward pass for batch normalization.

  During training we normalize with the minibatch mean and variance and keep
  exponential running averages of both; at test time we normalize with those
  running averages instead.

  Inputs:
  - x: data of shape (N, D)
  - gamma, beta: scale and shift parameters of shape (D,)
  - bn_param: dict with key 'mode' ('train' or 'test') and optional keys
    'eps', 'momentum', 'running_mean', 'running_var'

  Returns a tuple of:
  - out: output of shape (N, D)
  - cache: values needed for the backward pass
  """

  mode = bn_param['mode']
  eps = bn_param.get('eps', 1e-5)
  momentum = bn_param.get('momentum', 0.9)

  N, D = x.shape
  running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
  running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))

  out, cache = None, None
  if mode == 'train':
    sample_mean = np.mean(x, axis=0)  # per-feature mean, shape (D,)
    sample_var = np.var(x, axis=0)    # per-feature variance, shape (D,)
    x_normalized = (x - sample_mean) / np.sqrt(sample_var + eps)
    out = gamma * x_normalized + beta
    # exponential running averages of the statistics, used at test time
    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var
    cache = (x, sample_mean, sample_var, x_normalized, beta, gamma, eps)
  elif mode == 'test':
    # use the running statistics accumulated during training
    x_normalized = (x - running_mean) / np.sqrt(running_var + eps)
    out = gamma * x_normalized + beta
  else:
    raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

  # Store the updated running means back into bn_param
  bn_param['running_mean'] = running_mean
  bn_param['running_var'] = running_var

  return out, cache
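Tying this back to the architecture {affine - [batch norm] - relu - [dropout]} * (L-1) - affine - softmax, here is a rough sketch of how the layer functions above chain into a full forward pass. It is illustrative only, not the assignment's FullyConnectedNet code: the parameter-naming scheme ('W1', 'b1', 'gamma1', 'beta1') and the loop structure are assumptions, and dropout is omitted because dropout_forward is defined later in the assignment.

def full_forward_sketch(X, params, L, bn_params, use_bn=False):
  # Chain {affine -> [batchnorm] -> relu} for layers 1..L-1, then a final affine.
  # softmax_loss would then be applied to the returned scores.
  out, caches = X, []
  for i in range(1, L):
    out, c = affine_forward(out, params['W%d' % i], params['b%d' % i])
    caches.append(c)
    if use_bn:
      out, c = batchnorm_forward(out, params['gamma%d' % i],
                                 params['beta%d' % i], bn_params[i - 1])
      caches.append(c)
    out, c = relu_forward(out)
    caches.append(c)
  scores, c = affine_forward(out, params['W%d' % L], params['b%d' % L])
  caches.append(c)
  return scores, caches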

