When the number of features is large, logistic regression struggles on classification problems; this is where neural networks come in.
The Origins of Neural Networks
Origins: algorithms that try to mimic the brain.
Very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: a state-of-the-art technique for many applications (one reason for the recent resurgence is the large increase in available computing power).
A simple neuron model: the logistic unit
The green neuron in the middle can be thought of as performing the following operations:
1. Collect the input values into a vector x:
\[x = \left[ {\begin{array}{c}
{x_0}\\
{x_1}\\
{x_2}\\
{x_3}
\end{array}} \right]\]
2. Multiply x by θ, apply the sigmoid function to the result, and output it:
\[\theta = \left[ {\begin{array}{c}
{\theta _0}\\
{\theta _1}\\
{\theta _2}\\
{\theta _3}
\end{array}} \right]\]
\[{h_\theta }\left( x \right) = \frac{1}{{1 + {e^{ - {\theta ^T}x}}}}\]
Note: x0 is not drawn in the diagram; its value is always equal to 1, and it is usually called the "bias unit".
The sigmoid function from logistic regression is called the "activation function" here:
\[g\left( z \right) = \frac{1}{{1 + {e^{ - z}}}}\]
The θ from logistic regression is called the "weights" here.
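The single logistic unit described above can be sketched in a few lines of NumPy. The weight and input values below are made up purely for illustration; only the formula h(x) = g(θᵀx) comes from the text.

```python
import numpy as np

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and inputs for one neuron with three features.
theta = np.array([-1.0, 0.5, 0.5, 0.5])   # theta_0 .. theta_3 (illustrative values)
x = np.array([1.0, 2.0, 0.0, 0.0])        # x_0 = 1 is the bias unit

h = sigmoid(theta @ x)                     # h_theta(x) = g(theta^T x)
print(h)
```

Here θᵀx = −1·1 + 0.5·2 = 0, so the unit outputs g(0) = 0.5.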
A standard neural network
- 1. This is a neural network with three layers: the input layer, the hidden layer, and the output layer.
- 2. By convention, the "bias units" x0 and a0 are added.
Notation:
\[a_i^{\left( j \right)}\]
the "activation" of unit i in layer j
\[{\Theta ^{\left( j \right)}}\]
the matrix of weights controlling the function mapping from layer j to layer j+1
For the network shown in the figure above:
\[\begin{array}{l}
a_1^{\left( 2 \right)} = g\left( {\Theta _{10}^{\left( 1 \right)}{x_0} + \Theta _{11}^{\left( 1 \right)}{x_1} + \Theta _{12}^{\left( 1 \right)}{x_2} + \Theta _{13}^{\left( 1 \right)}{x_3}} \right)\\
a_2^{\left( 2 \right)} = g\left( {\Theta _{20}^{\left( 1 \right)}{x_0} + \Theta _{21}^{\left( 1 \right)}{x_1} + \Theta _{22}^{\left( 1 \right)}{x_2} + \Theta _{23}^{\left( 1 \right)}{x_3}} \right)\\
a_3^{\left( 2 \right)} = g\left( {\Theta _{30}^{\left( 1 \right)}{x_0} + \Theta _{31}^{\left( 1 \right)}{x_1} + \Theta _{32}^{\left( 1 \right)}{x_2} + \Theta _{33}^{\left( 1 \right)}{x_3}} \right)\\
{h_\theta }\left( x \right) = a_1^{\left( 3 \right)} = g\left( {\Theta _{10}^{\left( 2 \right)}a_0^{\left( 2 \right)} + \Theta _{11}^{\left( 2 \right)}a_1^{\left( 2 \right)} + \Theta _{12}^{\left( 2 \right)}a_2^{\left( 2 \right)} + \Theta _{13}^{\left( 2 \right)}a_3^{\left( 2 \right)}} \right)
\end{array}\]
If the network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ^(j) will be of dimension s_{j+1} × (s_j + 1) (the extra column multiplies the "bias unit").
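The dimension rule can be checked concretely. For the 3-3-1 network in the figure (three inputs, three hidden units, one output), the two weight matrices come out 3×4 and 1×4; the random values below are placeholders for illustration.

```python
import numpy as np

# Layer sizes for the 3-layer network above: 3 inputs, 3 hidden units, 1 output.
s = [3, 3, 1]

# Theta^(j) maps layer j to layer j+1 and has shape s_{j+1} x (s_j + 1);
# the extra column multiplies the bias unit.
rng = np.random.default_rng(0)
Thetas = [rng.standard_normal((s[j + 1], s[j] + 1)) for j in range(len(s) - 1)]

print([T.shape for T in Thetas])   # [(3, 4), (1, 4)]
```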
Forward propagation: vectorized implementation
Define
\[z_1^{\left( 2 \right)} = \Theta _{10}^{\left( 1 \right)}{x_0} + \Theta _{11}^{\left( 1 \right)}{x_1} + \Theta _{12}^{\left( 1 \right)}{x_2} + \Theta _{13}^{\left( 1 \right)}{x_3}\]
so that
\[a_1^{\left( 2 \right)} = g\left( {z_1^{\left( 2 \right)}} \right)\]
and similarly
\[\begin{array}{l}
a_2^{\left( 2 \right)} = g\left( {z_2^{\left( 2 \right)}} \right)\\
a_3^{\left( 2 \right)} = g\left( {z_3^{\left( 2 \right)}} \right)\\
a_1^{\left( 3 \right)} = g\left( {z_1^{\left( 3 \right)}} \right)
\end{array}\]
Define
\[x = \left[ {\begin{array}{c}
{x_0}\\
{x_1}\\
{x_2}\\
{x_3}
\end{array}} \right] = {a^{\left( 1 \right)}},\quad {z^{\left( 2 \right)}} = \left[ {\begin{array}{c}
{z_1^{\left( 2 \right)}}\\
{z_2^{\left( 2 \right)}}\\
{z_3^{\left( 2 \right)}}
\end{array}} \right]\]
Then
\[\begin{array}{l}
{z^{\left( 2 \right)}} = {\Theta ^{\left( 1 \right)}}{a^{\left( 1 \right)}}\\
{a^{\left( 2 \right)}} = g\left( {{z^{\left( 2 \right)}}} \right)\\
\text{(add the bias unit }a_0^{\left( 2 \right)} = 1\text{)}\\
{z^{\left( 3 \right)}} = {\Theta ^{\left( 2 \right)}}{a^{\left( 2 \right)}}\\
{h_\theta }\left( x \right) = {a^{\left( 3 \right)}} = g\left( {{z^{\left( 3 \right)}}} \right)
\end{array}\]
A deeper look
If you cover up the left half of the neural network, what remains is exactly logistic regression, except that the original inputs x are replaced by the activations a(2).
And a(2) is in turn determined by the inputs a(1) = x.
Other neural network architectures
A neural network can have many layers: the first is the input layer, the last is the output layer, and all layers in between are hidden layers.