1. Why Introduce Neural Networks
In one sentence: when the number of features $n$ is large, the polynomial approach blows up. For example, with $n = 100$, the second-order terms alone ($x_1^2, x_1x_2, x_1x_3, \dots, x_1x_{100};\ x_2^2, x_2x_3, \dots, x_2x_{100};\ \dots$) already number about 5000 (the sum $100 + 99 + \dots + 1 = 5050$). In real problems $n$ is often in the millions or more, so this approach easily leads to overfitting as well as a heavy computational cost. This is why neural networks were introduced.
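The count above can be checked directly: for $n$ features there are $n$ squared terms plus $\binom{n}{2}$ cross terms. A minimal sketch (the function name is ours, for illustration only):

```python
from math import comb

def num_quadratic_terms(n):
    """Count the second-order terms x_i * x_j with i <= j:
    n squares plus C(n, 2) distinct cross terms."""
    return n + comb(n, 2)

print(num_quadratic_terms(100))  # 5050, the "about 5000" cited above
```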
2. Neural Network Model
Let’s examine how we will represent a hypothesis function using neural networks. At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical inputs (called “spikes”) that are channeled to outputs (axons). In our model, our dendrites are like the input features $x_1, \dots, x_n$, and the output is the result of our hypothesis function. In this model our $x_0$ input node is sometimes called the “bias unit”; it is always equal to 1. In neural networks, we use the same logistic function as in classification, $\frac{1}{1 + e^{-\theta^T x}}$, yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”.
The figure shows a model containing just one neuron; the yellow circle is the cell body. A real neural network is built by combining several such neurons, as in the figure below.
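The single-neuron model described above can be sketched in a few lines. The weights and inputs here are arbitrary values chosen for illustration:

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(theta, x):
    """Hypothesis of a single neuron: g(theta^T x).
    By convention x[0] = 1 is the bias unit."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-1.0, 2.0, 0.5]   # example weights, arbitrary
x = [1.0, 0.8, 0.4]        # x[0] = 1 is the bias unit
h = neuron_output(theta, x)
```

With all weights zero the output is exactly 0.5, the midpoint of the sigmoid, which is a handy sanity check.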
Here $x_0 = 1$ is called the bias unit, and $a_0^{(2)}$ is likewise a bias unit, also equal to 1. We usually do not draw these explicitly; it is enough to know they exist. We call Layer 1 the input layer, Layer 3 the output layer, and all layers in between (here only Layer 2) hidden layers. In this example, $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ are called activation units.
3. Mathematical Definition of the Neural Network
$\Theta^{(j)}$ is a matrix of weights controlling the mapping from layer $j$ to layer $j+1$; every layer has such a matrix, and it is applied to that layer's activation units (together with the bias unit) to compute the activations of the next layer. If layer $j$ has $s_j$ units and layer $j+1$ has $s_{j+1}$ units, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$.
For example, a $4 \times 3$ weight matrix:
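One step of this layer-to-layer mapping can be sketched as follows; the $4 \times 3$ matrix maps 2 input features (plus the bias unit) to 4 activation units. The numeric values are arbitrary, for illustration only:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(Theta, a_prev):
    """Compute the next layer's activations: a = g(Theta . a_prev).
    Theta has shape (s_next, s_prev + 1); a_prev includes the bias 1."""
    return [sigmoid(sum(w * a for w, a in zip(row, a_prev))) for row in Theta]

# A 4x3 weight matrix Theta^(1): values are arbitrary, for illustration.
Theta1 = [
    [0.1, -0.3, 0.5],
    [0.2,  0.4, -0.1],
    [-0.5, 0.3, 0.2],
    [0.0,  0.1, 0.6],
]
x = [1.0, 0.7, -0.2]        # x[0] = 1 is the bias unit
a2 = layer_forward(Theta1, x)
print(len(a2))  # 4 activation units in layer 2
```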