Coursera Andrew Ng Machine Learning Week 2 Notes

Latest recommended article published on 2022-02-24 17:07:16

loserChen.

Categories: Andrew Ng Machine Learning Notes, Machine Learning. Tags: Machine Learning, Coursera, Notes, Andrew Ng

Article link: https://blog.csdn.net/qq_35564813/article/details/104226704

Neural Network

Dendrites: the neuron's inputs.

Axon: the neuron's output.

In the model, $x_0$ is the bias unit and always equals 1.

In neural networks, we use the same logistic function as in classification, $\frac{1}{1 + e^{-\theta^T x}}$, but we sometimes call it a sigmoid (logistic) activation function. In this setting, the "theta" parameters are sometimes called "weights".
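
As a minimal sketch of the activation function described above, the following Octave snippet evaluates a single unit. The weight and input values are made up for illustration; only the formula itself comes from the notes.

```octave
% Sigmoid (logistic) activation, applied element-wise.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical weights and input for a single unit.
% x(1) plays the role of the bias unit x0 = 1.
theta = [-1; 2; 0.5];
x = [1; 0.5; -2];

% Activation of the unit: a value strictly between 0 and 1.
a = sigmoid(theta' * x);
```

Here `theta' * x` evaluates to -1, so `a` is the sigmoid of -1.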

Layers: input layer → hidden layer → output layer
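
To make the layer structure concrete, here is a rough forward-propagation sketch for a network with one hidden layer. The weights are illustrative values that make the network compute XNOR; they are an assumption for this example and not part of the notes.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical weights: one hidden layer with 2 units computing XNOR.
Theta1 = [-30  20  20;    % hidden unit 1: x1 AND x2
           10 -20 -20];   % hidden unit 2: (NOT x1) AND (NOT x2)
Theta2 = [-10  20  20];   % output unit: OR of the two hidden units

x = [1; 1];               % input features, without the bias unit

a1 = [1; x];                       % input layer plus bias unit x0 = 1
a2 = [1; sigmoid(Theta1 * a1)];    % hidden layer activations plus bias unit
h  = sigmoid(Theta2 * a2);         % output layer: the hypothesis for x
```

For the input `[1; 1]` the output `h` is close to 1, as expected for XNOR.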

Cost function

$L$ = total number of layers in the network
$s_l$ = number of units (not counting the bias unit) in layer $l$
$K$ = number of output units/classes
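
The notes do not write the cost function out in text. With the definitions above, the regularized cost function as it is usually stated for this course can be written as follows. This is a reconstruction of the standard formulation; $m$ (number of training examples), $\lambda$ (regularization parameter) and $h_\Theta$ (the network's hypothesis) are assumed symbols that the notes do not define here.

```latex
J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right)
       + (1 - y_k^{(i)}) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right]
  + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
    \left( \Theta_{j,i}^{(l)} \right)^2
```

The first term sums the logistic cost over all $K$ output units; the second term sums the squares of all non-bias weights across the $L-1$ weight matrices.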

Backpropagation Algorithm

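To minimize the cost function, we need to compute $J(\Theta)$ and its partial derivatives $\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$.

To carry out this computation, we use the backpropagation algorithm.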
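
As a sketch of what the algorithm computes, the standard per-example update equations are given below. They assume sigmoid activations, accumulators $\Delta^{(l)}$ initialized to zero, and $\odot$ for the element-wise product; $\lambda$ and $m$ are the regularization parameter and number of examples. For each training example $(x^{(i)}, y^{(i)})$, set $a^{(1)} = x^{(i)}$, run forward propagation to obtain $a^{(l)}$ for $l = 2, \dots, L$, and then:

```latex
\begin{aligned}
\delta^{(L)} &= a^{(L)} - y^{(i)} \\
\delta^{(l)} &= \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) \odot a^{(l)} \odot (1 - a^{(l)}),
  \quad l = L-1, \dots, 2 \\
\Delta^{(l)} &:= \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T \\
D_{i,j}^{(l)} &= \frac{1}{m} \left( \Delta_{i,j}^{(l)} + \lambda \Theta_{i,j}^{(l)} \right)
  \quad \text{if } j \neq 0 \\
D_{i,j}^{(l)} &= \frac{1}{m} \Delta_{i,j}^{(l)} \quad \text{if } j = 0
\end{aligned}
```

After all examples have been processed, $D_{i,j}^{(l)}$ equals the partial derivative of $J(\Theta)$ with respect to $\Theta_{i,j}^{(l)}$. The bias component of $\delta^{(l)}$ is dropped before it is used in the accumulation step.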

Unrolling Parameters
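
Optimization routines typically expect the parameters as a single vector, so the weight matrices are unrolled into one vector and reshaped back when needed. A minimal sketch, assuming two hypothetical weight matrices of size 10x11 and 1x11:

```octave
% Hypothetical weight matrices, filled with distinct values so the
% round trip is easy to check.
Theta1 = reshape(1:110, 10, 11);
Theta2 = reshape(111:121, 1, 11);

% Unroll both matrices into one long column vector (column-major order).
thetaVec = [Theta1(:); Theta2(:)];

% Recover the original matrices from the unrolled vector.
Theta1_back = reshape(thetaVec(1:110), 10, 11);
Theta2_back = reshape(thetaVec(111:121), 1, 11);
```

The index ranges follow from the matrix sizes: the first 110 entries belong to `Theta1` and the remaining 11 to `Theta2`.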

Gradient Checking

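Gradient checking approximates each partial derivative numerically with a two-sided difference:

$\frac{\partial}{\partial \Theta_j} J(\Theta) \approx \frac{J(\Theta_1, \dots, \Theta_j + \epsilon, \dots, \Theta_n) - J(\Theta_1, \dots, \Theta_j - \epsilon, \dots, \Theta_n)}{2\epsilon}$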

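epsilon = 1e-4;
gradApprox = zeros(size(theta));  % preallocate the result
for i = 1:n
  % Perturb only the i-th parameter by +epsilon and -epsilon
  thetaPlus = theta;
  thetaPlus(i) = thetaPlus(i) + epsilon;
  thetaMinus = theta;
  thetaMinus(i) = thetaMinus(i) - epsilon;
  % Two-sided difference approximation of dJ/dtheta(i)
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * epsilon);
end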

We then check that gradApprox ≈ deltaVector, where deltaVector is the gradient computed by backpropagation.

Once you have verified that your backpropagation implementation is correct, you don't need to compute gradApprox again, because the code that computes it can be very slow.
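
To see the check in action, here is a small self-contained sketch that uses a made-up cost function with a known gradient in place of a real network. `J`, `theta` and `analyticGrad` are illustrative stand-ins; in practice `analyticGrad` would be the gradient from backpropagation.

```octave
% Hypothetical cost with a known gradient:
% J(theta) = sum(theta.^2), so dJ/dtheta = 2 * theta.
J = @(theta) sum(theta .^ 2);
theta = [1; -2; 0.5];
analyticGrad = 2 * theta;   % stands in for the backpropagation gradient

epsilon = 1e-4;
gradApprox = zeros(size(theta));
for i = 1:numel(theta)
  % Step vector that perturbs only the i-th parameter
  step = zeros(size(theta));
  step(i) = epsilon;
  gradApprox(i) = (J(theta + step) - J(theta - step)) / (2 * epsilon);
end

% Relative difference between the two gradients; a very small value
% suggests that they agree.
relDiff = norm(gradApprox - analyticGrad) / norm(gradApprox + analyticGrad);
```

Comparing with a relative difference rather than exact equality allows for floating-point error in the numerical estimate.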

Random Initialization
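
If all weights start at zero, every hidden unit in a layer computes the same value and receives the same update, so the units never differentiate. The usual remedy is to initialize each weight to a random value in a small symmetric range. A minimal sketch, where `INIT_EPSILON` and the matrix sizes are assumed values chosen only for illustration:

```octave
INIT_EPSILON = 0.12;   % hypothetical half-width of the initialization range

% rand() draws from (0, 1), so each weight lands in
% (-INIT_EPSILON, INIT_EPSILON).
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(1, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
```

This `INIT_EPSILON` is unrelated to the epsilon used in gradient checking.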

Putting it Together

First, pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers in total you want to have.

Number of input units = dimension of the features $x^{(i)}$
Number of output units = number of classes
Number of hidden units per layer = usually the more the better (balanced against the cost of computation, which increases with more hidden units)
Default: 1 hidden layer. If you have more than 1 hidden layer, it is recommended that every hidden layer have the same number of units.
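
Once the architecture is chosen, the shapes of the weight matrices follow from it: the matrix that maps layer l to layer l+1 has one row per unit in layer l+1 and one column per unit in layer l plus one for the bias unit. A rough sketch for a hypothetical architecture (the layer sizes are made up):

```octave
% Hypothetical architecture: 4 input features, two hidden layers of
% 5 units each, 3 output classes. Bias units are not counted.
layerSizes = [4 5 5 3];
L = numel(layerSizes);

% Theta{l} maps layer l to layer l+1 and has size s_{l+1} x (s_l + 1).
% zeros() is used here only to show the shapes; real weights should be
% initialized randomly.
Theta = cell(1, L - 1);
for l = 1:(L - 1)
  Theta{l} = zeros(layerSizes(l + 1), layerSizes(l) + 1);
end

% Total number of weights in the network.
numParams = sum(cellfun(@numel, Theta));
```

Adding hidden units increases `numParams`, which is the computational cost the guideline above asks you to balance.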

Training a Neural Network

Randomly initialize the weights
Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
Implement the cost function
Implement backpropagation to compute partial derivatives
Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.

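Forward propagation and backpropagation are performed in a loop over all training examples:

for i = 1:m,
   Perform forward propagation and backpropagation using example $(x^{(i)}, y^{(i)})$
   (Get activations $a^{(l)}$ and delta terms $\delta^{(l)}$ for $l = 2, \dots, L$)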
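
The loop above can be sketched in Octave roughly as follows. This is an illustrative example and not the course's reference implementation: the XOR data set, the 2-unit hidden layer, the learning rate `alpha`, the iteration count, and the absence of regularization are all assumptions made to keep it short.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy data: XOR, one training example per row.
X = [0 0; 0 1; 1 0; 1 1];
y = [0; 1; 1; 0];
m = size(X, 1);

% Vectorized hypothesis and unregularized cost, used to monitor training.
hyp  = @(T1, T2) sigmoid([ones(m, 1), sigmoid([ones(m, 1), X] * T1')] * T2');
cost = @(T1, T2) -mean(y .* log(hyp(T1, T2)) + (1 - y) .* log(1 - hyp(T1, T2)));

% Randomly initialize the weights in (-1, 1).
Theta1 = rand(2, 3) * 2 - 1;   % 2 hidden units x (2 inputs + bias)
Theta2 = rand(1, 3) * 2 - 1;   % 1 output unit x (2 hidden units + bias)

alpha = 0.3;                   % learning rate (hypothetical)
J0 = cost(Theta1, Theta2);     % cost before training

for iter = 1:500
  Delta1 = zeros(size(Theta1));
  Delta2 = zeros(size(Theta2));
  for i = 1:m
    % Forward propagation for example i
    a1 = [1; X(i, :)'];
    a2 = [1; sigmoid(Theta1 * a1)];
    a3 = sigmoid(Theta2 * a2);
    % Backpropagation: delta terms for layers 3 and 2
    d3 = a3 - y(i);
    d2 = (Theta2' * d3) .* a2 .* (1 - a2);
    d2 = d2(2:end);            % drop the bias component
    % Accumulate the gradient contributions of example i
    Delta2 = Delta2 + d3 * a2';
    Delta1 = Delta1 + d2 * a1';
  end
  % Average the gradients and take one gradient descent step
  D1 = Delta1 / m;
  D2 = Delta2 / m;
  Theta1 = Theta1 - alpha * D1;
  Theta2 = Theta2 - alpha * D2;
end

J1 = cost(Theta1, Theta2);     % cost after training
```

Comparing `J1` with `J0` gives a quick sanity check that gradient descent is reducing the cost. Whether the network fully learns XOR depends on the random starting point and the number of iterations.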