1. Why Introduce Neural Networks
In one sentence: when the number of features $n$ is large, the polynomial approach blows up. For example, with $n = 100$, the second-order terms alone ($x_1^2, x_1x_2, x_1x_3, \dots, x_1x_{100};\ x_2^2, x_2x_3, \dots, x_2x_{100};\ \dots$) already number about 5000 (the sum $100 + 99 + \dots + 1 = 5050$). In real problems $n$ is often in the millions or more, so this approach easily leads to overfitting as well as a heavy computational cost. This is why neural networks were introduced.
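The count above can be checked directly: for $n$ features there are $n$ squared terms plus $\binom{n}{2}$ cross terms. A minimal sketch (the function name is ours, for illustration only):

```python
from math import comb

def num_quadratic_terms(n):
    """Count the second-order terms x_i * x_j with i <= j:
    n squares plus C(n, 2) distinct cross terms."""
    return n + comb(n, 2)

print(num_quadratic_terms(100))  # 5050, the "about 5000" cited above
```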
2. Neural Network Model
Let’s examine how we will represent a hypothesis function using neural networks. At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical inputs (called “spikes”) that are channeled to outputs (axons). In our model, our dendrites are like the input features $x_1, \dots, x_n$, and the output is the result of our hypothesis function. In this model our $x_0$ input node is sometimes called the “bias unit”; it is always equal to 1. In neural networks, we use the same logistic function as in classification, $\frac{1}{1 + e^{-\theta^T x}}$, yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”.
The figure shows a model containing just one neuron; the yellow circle is the cell body. A real neural network is built by combining several such neurons, as in the figure below.
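The single-neuron model described above can be sketched in a few lines. The weights and inputs here are arbitrary values chosen for illustration:

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(theta, x):
    """Hypothesis of a single neuron: g(theta^T x).
    By convention x[0] = 1 is the bias unit."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-1.0, 2.0, 0.5]   # example weights, arbitrary
x = [1.0, 0.8, 0.4]        # x[0] = 1 is the bias unit
h = neuron_output(theta, x)
```

With all weights zero the output is exactly 0.5, the midpoint of the sigmoid, which is a handy sanity check.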
Here $x_0 = 1$ is called the bias unit, and $a_0^{(2)}$ is likewise a bias unit, also equal to 1. We usually do not draw these explicitly; it is enough to know they exist. We call Layer 1 the input layer, Layer 3 the output layer, and all layers in between (here only Layer 2) hidden layers. In this example, $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ are called activation units.
3. Mathematical Definition of the Neural Network
$\Theta^{(j)}$ is a matrix of weights controlling the mapping from layer $j$ to layer $j+1$; every layer has such a matrix, and it is applied to that layer's activation units (together with the bias unit) to compute the activations of the next layer. If layer $j$ has $s_j$ units and layer $j+1$ has $s_{j+1}$ units, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$.
For example, a $4 \times 3$ weight matrix:
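One step of this layer-to-layer mapping can be sketched as follows; the $4 \times 3$ matrix maps 2 input features (plus the bias unit) to 4 activation units. The numeric values are arbitrary, for illustration only:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(Theta, a_prev):
    """Compute the next layer's activations: a = g(Theta . a_prev).
    Theta has shape (s_next, s_prev + 1); a_prev includes the bias 1."""
    return [sigmoid(sum(w * a for w, a in zip(row, a_prev))) for row in Theta]

# A 4x3 weight matrix Theta^(1): values are arbitrary, for illustration.
Theta1 = [
    [0.1, -0.3, 0.5],
    [0.2,  0.4, -0.1],
    [-0.5, 0.3, 0.2],
    [0.0,  0.1, 0.6],
]
x = [1.0, 0.7, -0.2]        # x[0] = 1 is the bias unit
a2 = layer_forward(Theta1, x)
print(len(a2))  # 4 activation units in layer 2
```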