Andrew Ng Machine Learning, Weeks 4 and 5
Preface
NetEase Cloud Classroom (bilingual subtitles, streams smoothly): https://study.163.com/course/courseMain.htm?courseId=1004570029
Coursera: https://www.coursera.org/learn/machine-learning
I am a beginner: I watch the lectures on NetEase Cloud Classroom first, then do the assignments on Coursera. I started this blog to keep notes; all images referenced in this post are screenshots from the course.
Neural Networks (Neural Network)
1. Model Description
Tips: With only 2 features, a logistic hypothesis that adds quadratic (or higher-order) terms through the sigmoid is still computationally acceptable; but when there are many features, the number of polynomial terms explodes and logistic regression becomes far too expensive.
2. Model Representation
《1》 Basic model
Tips: Layer 1 is called the input layer, hθ(x) is the output layer, and all the computation in between forms the hidden layer. Each hidden unit is an independent sigmoid function, g(θ0x0 + θ1x1 + …), where x0 = 1 is the bias unit of the input layer; likewise a0 = 1 is the bias unit of layer 2. Each unit has its own set of weights θ.
《2》 Forward propagation
Tips: Abbreviate the linear combination inside each sigmoid as z. For example, a(2)1 = g(z(2)1), where z(2)1 = θ0x0 + θ1x1 + …. Applying this to every layer in vectorized form turns the whole computation into a sequence of matrix multiplications.
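The vectorized forward-propagation step can be sketched in Python (NumPy). The weight values and the 2-2-1 layer sizes below are made-up illustrations, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(a_prev, Theta):
    # prepend the bias unit a0 = 1, then compute a = g(z) with z = Theta @ a
    a_prev = np.insert(a_prev, 0, 1.0)
    return sigmoid(Theta @ a_prev)

# hypothetical weights for a 2-input / 2-hidden-unit / 1-output network
Theta1 = np.array([[-1.0,  2.0,  2.0],
                   [ 1.0, -2.0, -2.0]])   # layer 1 -> layer 2
Theta2 = np.array([[-0.5,  1.0,  1.0]])   # layer 2 -> output

x  = np.array([1.0, 0.0])
a2 = forward_layer(x, Theta1)    # hidden-layer activations a(2)
h  = forward_layer(a2, Theta2)   # output h_Theta(x)
```

Each layer is just one matrix-vector product followed by an element-wise sigmoid, which is why no explicit loop over units is needed.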
3. Intuition
《1》 Logical AND, OR, and NOT
Tips: From the figure above, write out the truth table for each network.
《2》 A more complex function: XNOR
Tips: The red line is x1 AND x2, the cyan line is (NOT x1) AND (NOT x2), and the green line is a1 OR a2; combining them step by step builds the more complex logical function XNOR (i.e. NOT XOR).
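This construction can be checked in Python. The weight triples below (−30/20/20 etc.) are the large-magnitude values shown in the lecture, chosen so the sigmoid saturates to roughly 0 or 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, x1, x2):
    # one sigmoid unit: g(theta0 + theta1*x1 + theta2*x2)
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

AND_THETA = [-30.0,  20.0,  20.0]   # x1 AND x2
NOR_THETA = [ 10.0, -20.0, -20.0]   # (NOT x1) AND (NOT x2)
OR_THETA  = [-10.0,  20.0,  20.0]   # a1 OR a2

def xnor(x1, x2):
    a1 = unit(AND_THETA, x1, x2)    # first hidden unit (red)
    a2 = unit(NOR_THETA, x1, x2)    # second hidden unit (cyan)
    return unit(OR_THETA, a1, a2)   # output unit (green)
```

Evaluating `xnor` on all four binary inputs reproduces the XNOR truth table: the output is close to 1 exactly when x1 equals x2.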
4. Multiclass Classification
5. Cost Function
6. Backpropagation
Tips: δ denotes the "error" of a node, i.e. how far its activation deviates from what the training example requires.
Tips: For the last layer (the output layer), the error is the activation a computed by forward propagation minus the label from the training set.
Tips: Every earlier layer's δ is computed from the next layer's δ. The derivative g′(z(3)) can be worked out with calculus: it equals a .* (1 − a), where .* is element-wise multiplication and a is that layer's activation. There is no δ(1), because layer 1 is the input layer and carries no error.
Tips: From these δ terms we obtain the partial derivatives of the cost function.
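The δ computations for one training example can be sketched as follows. The small 2-2-1 network and its weight values are made-up for illustration; the formulas (output error a − y, hidden error scaled by a .* (1 − a), gradients δ·aᵀ) follow the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical 3-layer network: 2 inputs, 2 hidden units, 1 output
Theta1 = np.array([[ 0.1, 0.3, -0.2],
                   [-0.4, 0.2,  0.5]])
Theta2 = np.array([[ 0.2, -0.3, 0.1]])

x = np.array([1.0, 0.5])
y = np.array([1.0])

# forward pass (bias units prepended)
a1 = np.insert(x, 0, 1.0)
z2 = Theta1 @ a1
a2 = np.insert(sigmoid(z2), 0, 1.0)
a3 = sigmoid(Theta2 @ a2)              # = h_Theta(x)

# backward pass
delta3 = a3 - y                        # output-layer error: activation minus label
# hidden-layer error: (Theta2' @ delta3) .* g'(z2), with g'(z) = a .* (1 - a);
# drop the bias component, which has no error term of its own
delta2 = (Theta2.T @ delta3)[1:] * sigmoid(z2) * (1.0 - sigmoid(z2))
# no delta1: layer 1 is the input layer

# gradients for this one example: dJ/dTheta(l) = delta(l+1) * a(l)'
grad2 = np.outer(delta3, a2)
grad1 = np.outer(delta2, a1)
```

Note that `grad1` and `grad2` have exactly the same shapes as `Theta1` and `Theta2`, which is what a gradient-descent update requires.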
7. Backpropagation Intuition
Tips: what follows relies on the definition of the cost function.
8. Gradient Checking
Tips: An alternative way to estimate the derivative of J(θ): pick a tiny number ε and compute the two-sided difference (J(θ + ε) − J(θ − ε)) / (2ε), i.e. Δy/Δx over a small interval around θ.
Tips: Then check whether this numerical derivative approximately equals the derivative produced by backpropagation.
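The check can be sketched like this; the toy cost function below is made-up so that the exact gradient is known in closed form:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    # approximate dJ/dtheta_i by (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        bump = np.zeros_like(theta)
        bump[i] = eps
        grad[i] = (J(theta + bump) - J(theta - bump)) / (2.0 * eps)
    return grad

# toy cost: J(theta) = theta0^2 + 3*theta1, so the exact gradient is [2*theta0, 3]
J = lambda t: t[0] ** 2 + 3.0 * t[1]
theta = np.array([1.5, -2.0])
approx = numerical_gradient(J, theta)
exact = np.array([2 * theta[0], 3.0])
```

In practice the backpropagation gradient replaces `exact` here; if the two disagree beyond a few decimal places, the backpropagation implementation has a bug. The check is slow, so it is switched off once the implementation is verified.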
9. Random Initialization
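A minimal sketch of symmetry-breaking initialization; the value ε = 0.12 is one common choice, not a requirement:

```python
import numpy as np

def rand_init(rows, cols, epsilon=0.12):
    # initialize weights uniformly in [-epsilon, epsilon]; all-zero
    # initialization would make every hidden unit compute the same function
    return np.random.uniform(-epsilon, epsilon, size=(rows, cols))

# hypothetical layer: 2 hidden units, 2 inputs + 1 bias column
Theta1 = rand_init(2, 3)
```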
10. Summary
1. Choose a network architecture suited to the problem
2. Randomly initialize the parameters
3. Run forward propagation to compute the activations a
4. Compute the cost function J(θ)
5. Run backpropagation to compute the partial derivatives of J(θ)
6. Use gradient checking to verify that the derivatives are correct
7. Use gradient descent to adjust θ and minimize J(θ)
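The steps above can be put together into one small training loop. The 2-2-1 architecture, the toy OR dataset, the learning rate, and the iteration count are all made-up stand-ins for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: learn logical OR (an easy task for a tiny network)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 1.])

# steps 1-2: pick a structure (2 inputs, 2 hidden units, 1 output) and
# randomly initialize the parameters near zero to break symmetry
rng = np.random.default_rng(0)
Theta1 = rng.uniform(-0.12, 0.12, (2, 3))
Theta2 = rng.uniform(-0.12, 0.12, (1, 3))

alpha = 1.0
for _ in range(5000):
    G1, G2, cost = np.zeros_like(Theta1), np.zeros_like(Theta2), 0.0
    for xi, yi in zip(X, y):
        # step 3: forward propagation
        a1 = np.insert(xi, 0, 1.0)
        z2 = Theta1 @ a1
        a2 = np.insert(sigmoid(z2), 0, 1.0)
        a3 = float(sigmoid(Theta2 @ a2)[0])
        # step 4: accumulate the (unregularized) cross-entropy cost
        cost += -yi * np.log(a3) - (1.0 - yi) * np.log(1.0 - a3)
        # step 5: backpropagation
        d3 = a3 - yi
        d2 = (Theta2.T * d3)[1:, 0] * sigmoid(z2) * (1.0 - sigmoid(z2))
        G2 += d3 * a2
        G1 += np.outer(d2, a1)
    # step 7: one gradient-descent step (step 6, gradient checking,
    # would be run once here and then switched off for speed)
    Theta1 -= alpha * G1 / len(X)
    Theta2 -= alpha * G2 / len(X)
```

After training, the rounded network outputs match the OR labels and the average cost is far below the log(2) a constant predictor would give.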
Quiz
1. Question 1
Which of the following statements are true? Check all that apply.
Answer: CD
2. Question 2
Consider the following neural network, which takes two binary-valued inputs x1, x2 ∈ {0,1} and outputs hΘ(x). Which of the following logical functions does it (approximately) compute?
Answer: B
3. Question 3
Consider the neural network given below. Which of the following equations correctly computes the activation a(3)1? Note: g(z) is the sigmoid activation function.
Answer: A
4. Question 4
You have the following neural network:
You'd like to compute the activations of the hidden layer a(2) ∈ R3. One way to do so is the following Octave code:
You want to have a vectorized implementation of this (i.e., one that does not use for loops). Which of the following implementations correctly compute a(2)? Check all that apply.
Answer: A
(B is wrong: Θ1 is 3×3, while x there is only a 3-dimensional vector.)
5. Question 5
Answer: A
6. Question 6
You are training a three-layer neural network and would like to use backpropagation to compute the gradient of the cost function. In the backpropagation algorithm, one of the steps is to update Δ(2)i,j := Δ(2)i,j + δ(3)i · (a(2))j for every i, j. Which of the following is a correct vectorization of this step?
Answer: B
7. Question 7
Suppose Theta1 is a 5x3 matrix, and Theta2 is a 4x6 matrix. You set thetaVec = [Theta1(:); Theta2(:)]. Which of the following correctly recovers Theta2?
Answer: A
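The indexing can be verified with a quick NumPy sketch; `order='F'` mimics Octave's column-major `Theta(:)` unrolling, and the example matrix contents are made-up:

```python
import numpy as np

Theta1 = np.arange(15).reshape(5, 3)          # 5x3, 15 entries
Theta2 = np.arange(100, 124).reshape(4, 6)    # 4x6, 24 entries

# unroll column-wise, matching thetaVec = [Theta1(:); Theta2(:)] in Octave
thetaVec = np.concatenate([Theta1.flatten(order='F'),
                           Theta2.flatten(order='F')])

# Theta1 occupies the first 15 entries, so Theta2 is entries 16..39
# in Octave's 1-based indexing, i.e. thetaVec[15:39] in Python
Theta2_recovered = thetaVec[15:39].reshape(4, 6, order='F')
```

This matches the Octave expression reshape(thetaVec(16:39), 4, 6).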
8. Question 8
Answer: B
9. Question 9
Which of the following statements are true? Check all that apply.
Answer: AB
10. Question 10
Which of the following statements are true? Check all that apply.
Answer: CD