Introduction to neural networks
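Aside:
MATLAB's Neural Network Toolbox has done a great deal to popularize neural networks and has drawn more people to study and use them. Many of those people, however, simply call the MATLAB API: experts have already written all the relevant processing functions, so you only need to call them, and sometimes you do not even have to write code. You can just open the MATLAB toolbox and use it.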
-------------------------------------------------------------------------------------------------------------------------------------------------
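I suggest jumping to the end for a quick look at the implementation code first, then coming back to read from here :)
I. What is a neural network?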
Humans and other animals process information with neural networks. These are formed from trillions of neurons (nerve cells) exchanging brief electrical pulses called action potentials. Computer algorithms that mimic these biological structures are formally called artificial neural networks to distinguish them from the squishy things inside of animals. However, most scientists and engineers are not this formal and use the term neural network to include both biological and nonbiological systems.
This neural network is formed in three layers, called the input layer, hidden layer, and output layer. Each layer consists of one or more nodes, represented in this diagram by the small circles. The lines between the nodes indicate the flow of information from one node to the next. In this particular type of neural network, the information flows only from the input to the output (that is, from left to right). Other types of neural networks have more intricate connections, such as feedback paths.
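As I understand it, a neural network is essentially "a brute-force approximate solution of a system of nonlinear equations, driven by random probability". Why that is so is hard to explain with theory alone; a concrete example works much better.
II. A simple neural network implementation: digit detection
Thanks again to the open-minded folks at Washington University. My program is adapted from their demo. Their original program tested only a single kind of input, so I made some simple improvements in the hope of explaining neural networks better :)
The demo lets the user enter 7×5 dot matrices (7 rows by 5 columns, made of the capital letter O and spaces); the examples in the figure are the dot-matrix forms of the digits 0, 1, and 2. A computer cannot recognize such dot matrices out of the box, just as it cannot recognize human faces out of the box. The arrangement of these dots can hardly be written as a mathematical equation, which makes this a typical nonlinear model.
First, we abstract our object, the neural network, into data structures.
Note that this post only introduces how a neural network is implemented, so it uses the very simplest kind of network: there is no hidden layer, only an input layer (InputLayer) and an output layer (OutputLayer). INT and REAL are the types int and double respectively (I have to say, typedef is wonderful for abstraction and encapsulation in C~)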
typedef struct {                /* A LAYER OF A NET: */
    INT     Units;              /* - number of units in this layer */
    REAL*   Activation;         /* - activation of ith unit */
    INT*    Output;             /* - output of ith unit */
    REAL*   Error;              /* - error term of ith unit */
    REAL**  Weight;             /* - connection weights to ith unit */
} LAYER;

typedef struct {                /* A NET: */
    LAYER*  InputLayer;         /* - input layer */
    LAYER*  OutputLayer;        /* - output layer */
    REAL    Eta;                /* - learning rate */
    REAL    Error;              /* - total net error */
    REAL    Epsilon;            /* - net error to terminate training */
} NET;
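The members of these layer and network structs are described further below.
Next come the functions that generate random values. RandomEqualINT() returns a random integer in [Low, High], while RandomEqualREAL() returns a random real number (a double) in [Low, High].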
void InitializeRandoms()
{
    srand(4711);
}

INT RandomEqualINT(INT Low, INT High)
{
    return rand() % (High-Low+1) + Low;
}

REAL RandomEqualREAL(REAL Low, REAL High)
{
    return ((REAL) rand() / RAND_MAX) * (High-Low) + Low;
}
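To get a feel for what these helpers return, here is a small sketch that repeats the two functions from above and adds a checker, DrawsStayInRange. That checker is a hypothetical name of my own, not part of the original demo; it simply draws many values and verifies that they stay inside the requested bounds. Note that rand() % n is slightly biased whenever n does not evenly divide the number of values rand() can return, which should not matter for a demo of this size.

```c
#include <stdlib.h>

typedef int    INT;
typedef double REAL;

/* Random integer in [Low, High], as in the article. */
INT RandomEqualINT(INT Low, INT High)
{
    return rand() % (High - Low + 1) + Low;
}

/* Random real in [Low, High], as in the article. */
REAL RandomEqualREAL(REAL Low, REAL High)
{
    return ((REAL) rand() / RAND_MAX) * (High - Low) + Low;
}

/* Hypothetical helper: returns 1 if `draws` samples of both
   generators all stay inside their requested ranges, else 0. */
int DrawsStayInRange(int draws)
{
    int k;
    for (k = 0; k < draws; k++) {
        INT  n = RandomEqualINT(0, 9);
        REAL w = RandomEqualREAL(-0.5, 0.5);
        if (n < 0 || n > 9 || w < -0.5 || w > 0.5)
            return 0;
    }
    return 1;
}
```

Calling srand(4711) first, as InitializeRandoms() does, makes the sequence repeatable from run to run on the same C library.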
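The Pattern array defines NUM_DATA (10) two-dimensional dot matrices that depict the Arabic digits 0 through 9.
The macros X and Y are the number of columns and rows of each dot matrix, and N is the total number of dots in one matrix.
The network takes X×Y (35) input elements and produces M (10) output elements.
InitializeApplication converts each two-dimensional matrix into a one-dimensional vector
(that is, it flattens the 2D dot matrix row by row; for example, the matrix
0 1 0
0 1 0
0 1 0
becomes 0 1 0 0 1 0 0 1 0
)
If the Pattern matrix stores the character 'O' at a position, the corresponding element of the global Input array is set to HI (1); otherwise it is set to LO (-1).
There are NUM_DATA dot matrices, so this yields ten one-dimensional vectors, which together make up the global variable Input.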
void InitializeApplication(NET* Net)
{
    INT n,i,j;

    Net->Eta     = 0.001;
    Net->Epsilon = 0.0001;
    for (n=0; n<NUM_DATA; n++) {
        for (i=0; i<Y; i++) {
            for (j=0; j<X; j++) {
                Input[n][i*X+j] = (Pattern[n][i][j] == 'O') ? HI : LO;
            }
        }
    }
    f = fopen("ADALINE.txt", "w");
}
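The row-by-row flattening can be tried in isolation on a tiny grid. The sketch below is my own illustration, not code from the demo: FlattenPattern, ROWS and COLS are hypothetical names, and the 3×3 size is chosen only to keep the example short. It uses the same index formula i*X+j and the same 'O' to HI, anything else to LO mapping as InitializeApplication.

```c
#define ROWS 3      /* hypothetical small grid, not the article's 7x5 */
#define COLS 3
#define HI   1
#define LO (-1)

/* Flatten a ROWS x COLS character grid row by row:
   cell (i, j) lands at index i*COLS + j; 'O' becomes HI, anything else LO.
   Each row is a C string, hence the COLS + 1 for the terminating NUL. */
void FlattenPattern(const char grid[ROWS][COLS + 1], int flat[ROWS * COLS])
{
    int i, j;
    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            flat[i * COLS + j] = (grid[i][j] == 'O') ? HI : LO;
        }
    }
}
```

For the grid { " O ", " O ", " O " } (a vertical bar), the flattened vector is LO HI LO LO HI LO LO HI LO, which is the ±1 version of the 0 1 0 0 1 0 0 1 0 example above.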
RandomWeights initializes the weights of the output layer with random numbers between -0.5 and 0.5. It is nothing more than random initialization. Don't panic :)
void RandomWeights(NET* Net)
{
    INT i,j;

    for (i=1; i<=Net->OutputLayer->Units; i++) {
        for (j=0; j<=Net->InputLayer->Units; j++) {
            Net->OutputLayer->Weight[i][j] = RandomEqualREAL(-0.5, 0.5);
        }
    }
}
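Having covered some of the functions, let's look at the main part of the main function and use it to introduce the remaining ones, tackling questions as they come up.
Notice the BOOL variable Stop: it is set to TRUE (1) at the start of every pass, so by default the do-while loop is expected to stop.
The loop condition is while (NOT Stop), where NOT is a macro for !. The local variable Error inside main records the largest per-pattern network error seen in the current pass.
Training stops once the network error Net.Error is below the required precision Net.Epsilon for every pattern; otherwise training continues.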
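The bookkeeping around Stop and Error can be sketched separately from the network. CheckStop below is a hypothetical helper of my own (the demo does this inline in main); it assumes the per-pattern errors have already been collected into an array, and it reproduces both the stop test and the percentage that the demo prints as training progress.

```c
/* Hypothetical helper mirroring the bookkeeping in main():
   returns 1 (stop) only if every per-pattern error is below epsilon,
   and stores the progress percentage printed by the demo in *percent. */
int CheckStop(const double *pattern_error, int count, double epsilon,
              double *percent)
{
    double worst = 0.0;
    int stop = 1;
    int n;

    for (n = 0; n < count; n++) {
        if (pattern_error[n] > worst)
            worst = pattern_error[n];
        stop = stop && (pattern_error[n] < epsilon);
    }
    if (worst < epsilon)    /* same clamp as Error = MAX(Error, Net.Epsilon) */
        worst = epsilon;
    *percent = (epsilon / worst) * 100.0;
    return stop;
}
```

With epsilon = 0.0001 and errors {0.00005, 0.002}, the second pattern is still too far off, so the function returns 0 and reports about 5%. Once all errors are below epsilon it returns 1 and reports 100%.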
The first for loop calls SimulateNet(). Note that the BOOL arguments Training and Protocoling are both FALSE (0) here.
void SimulateNet(NET* Net, INT* Input, INT* Target, BOOL Training, BOOL Protocoling)
{
    INT Output[M];

    SetInput(Net, Input, Protocoling);
    PropagateNet(Net);
    GetOutput(Net, Output, Protocoling);
    ComputeOutputError(Net, Target);
    if (Training)
        AdjustWeights(Net);
}
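It calls four functions in turn (AdjustWeights is added only when Training is TRUE).
First comes SetInput(). Remember that in this first for loop Protocoling is FALSE.
It copies the current input vector, Input_current_vector, into the outputs of the network's input layer.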
void SetInput(NET* Net, INT* Input_current_vector, BOOL Protocoling)
{
    INT i;

    for (i=1; i<=Net->InputLayer->Units; i++) {
        Net->InputLayer->Output[i] = Input_current_vector[i-1];
    }
    if (Protocoling) {
        WriteInput(Net, Input_current_vector);
    }
}
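Net->OutputLayer->Weight[i][j] holds the randomly initialized weights (they are later corrected again and again according to the error, until the required precision is reached).
PropagateNet() forms a weighted sum of all the inputs for each output unit. The resulting value Sum is stored as the unit's Activation, and the unit's Output is set to HI if Sum is greater than or equal to 0, and to LO otherwise.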
void PropagateNet(NET* Net)
{
    INT  i,j;
    REAL Sum;

    for (i=1; i<=Net->OutputLayer->Units; i++) {
        Sum = 0;
        for (j=0; j<=Net->InputLayer->Units; j++) {
            Sum += Net->OutputLayer->Weight[i][j] * Net->InputLayer->Output[j];
        }
        Net->OutputLayer->Activation[i] = Sum;
        if (Sum >= 0)
            Net->OutputLayer->Output[i] = HI;
        else
            Net->OutputLayer->Output[i] = LO;
    }
}
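To make the weighted sum and the threshold concrete, here is a sketch for a single output unit with only two inputs plus the bias. PropagateUnit is a hypothetical name of my own, and the weights used in the note below are made-up illustration values, not weights from a trained network.

```c
#define HI   1
#define LO (-1)

/* Weighted sum of inputs followed by a hard threshold.
   input[0] is the constant bias input (1); input[1..n] are the pixels.
   The raw sum is written to *activation; the return value is HI or LO. */
int PropagateUnit(const double *weight, const int *input, int n,
                  double *activation)
{
    double sum = 0.0;
    int j;

    for (j = 0; j <= n; j++) {
        sum += weight[j] * input[j];
    }
    *activation = sum;
    return (sum >= 0) ? HI : LO;
}
```

With weights {0.2, -0.4, 0.1} and inputs {1, 1, -1} the sum is 0.2 - 0.4 - 0.1 = -0.3, so the unit outputs LO. With weights {0.0, 0.5, 0.5} and the same inputs the sum is exactly 0, which the >= comparison counts as HI.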
GetOutput() copies the current network output, Net->OutputLayer->Output, into the output vector Output_current_vector.
void GetOutput(NET* Net, INT* Output_current_vector, BOOL Protocoling)
{
    INT i;

    for (i=1; i<=Net->OutputLayer->Units; i++) {
        Output_current_vector[i-1] = Net->OutputLayer->Output[i];
    }
    if (Protocoling) {
        WriteOutput(Net, Output_current_vector);
    }
}
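ComputeOutputError() takes the difference between Target (the training target, here a row of the Output matrix) and the output layer's Activation, that is, the weighted sum before thresholding, and stores it as Err in the output layer's Error[i].
The network's Error is reset to 0 and then accumulates 0.5×sqr(Err) for each output unit.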
void ComputeOutputError(NET* Net, INT* Target)
{
    INT  i;
    REAL Err;

    Net->Error = 0;
    for (i=1; i<=Net->OutputLayer->Units; i++) {
        Err = Target[i-1] - Net->OutputLayer->Activation[i];
        Net->OutputLayer->Error[i] = Err;
        Net->Error += 0.5 * sqr(Err);
    }
}
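A worked example with two output units may help. SquaredError is a hypothetical stand-alone version of the same computation; unlike the demo's layer arrays, which start at index 1, it uses plain 0-based arrays, and the activations in the note are made-up values.

```c
/* Half the sum of squared differences between targets and activations,
   the same quantity ComputeOutputError() accumulates in Net->Error.
   The per-unit differences are stored in err[]. */
double SquaredError(const int *target, const double *activation,
                    double *err, int m)
{
    double total = 0.0;
    int i;

    for (i = 0; i < m; i++) {
        err[i] = target[i] - activation[i];
        total += 0.5 * err[i] * err[i];
    }
    return total;
}
```

For targets {HI, LO} = {1, -1} and activations {0.8, -1.2}, both differences are 0.2 and the total is 0.5×0.04 + 0.5×0.04 = 0.04. Note that the second unit is penalized even though its thresholded output (LO) is already correct, because the error is measured on the activation.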
Next, the second for loop inside main trains the network. Note that the fourth argument of SimulateNet, Training, is now TRUE.
for (m=0; m<10*NUM_DATA; m++) {
    n = RandomEqualINT(0, NUM_DATA-1);
    SimulateNet(&Net, Input[n], Output[n], TRUE, FALSE);
}
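This makes SimulateNet call AdjustWeights(), which is where the feedback actually happens: it adjusts the weights of the output layer. Specifically, each weight Weight[i][j] is increased by the output error Error[i] × the input layer's output Output[j] × the learning rate Net->Eta.
You may wonder: why do it this way?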
void AdjustWeights(NET* Net)
{
    INT  i,j;
    INT  Out;
    REAL Err;

    for (i=1; i<=Net->OutputLayer->Units; i++) {
        for (j=0; j<=Net->InputLayer->Units; j++) {
            Out = Net->InputLayer->Output[j];
            Err = Net->OutputLayer->Error[i];
            Net->OutputLayer->Weight[i][j] += Net->Eta * Err * Out;
        }
    }
}
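One way to see why AdjustWeights() uses this particular update, which the original demo does not spell out, is to read it as a gradient-descent step on the error computed by ComputeOutputError(). The symbols are my own shorthand: t_i for Target[i-1], a_i for Activation[i], x_j for the input layer's Output[j] (with x_0 the bias), w_ij for Weight[i][j], and η for Eta.

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_{i=1}^{M}\bigl(t_i - a_i\bigr)^2,
  \qquad a_i = \sum_{j=0}^{N} w_{ij}\,x_j,\quad x_0 = 1,\\
\frac{\partial E}{\partial w_{ij}} &= -\bigl(t_i - a_i\bigr)\,x_j,\\
w_{ij} &\leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}
        = w_{ij} + \eta\,\bigl(t_i - a_i\bigr)\,x_j .
\end{aligned}
```

The last line is exactly Weight[i][j] += Eta * Err * Out. Since every x_j is +1 or -1, one update changes a_i by η(N+1)(t_i - a_i), so for the pattern just presented the error of unit i shrinks by the factor 1 - η(N+1). With N = 35 and η = 0.001 that factor is 0.964.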
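Recall that if the output layer's Error[i] is greater than 0, the current activation is below the desired output; if it is less than 0, the opposite holds.
A positive Error[i] means the contribution of input unit j to output unit i has to grow, which shrinks the gap between actual and desired output (the error goes from above 0 toward 0). Because the update is multiplied by the input's Output[j], the weight goes up when that input is HI and down when it is LO; either way the activation rises.
Likewise, if the current activation is above the desired output, Error[i] is negative, the contribution is reduced, and the error shrinks (from below 0 toward 0).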
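The effect of a single update can be checked numerically on one output unit. AdjustUnit is a hypothetical helper of my own that combines the weighted sum, the error and the weight update for one unit; the starting weights, inputs and learning rate in the note are made-up illustration values.

```c
/* One delta-rule update for a single output unit.
   input[0] is the bias input; input[1..n] are the remaining inputs.
   Returns the error (target - activation) measured BEFORE the update. */
double AdjustUnit(double *weight, const int *input, int n,
                  int target, double eta)
{
    double sum = 0.0;
    double err;
    int j;

    for (j = 0; j <= n; j++)
        sum += weight[j] * input[j];
    err = target - sum;
    for (j = 0; j <= n; j++)
        weight[j] += eta * err * input[j];
    return err;
}
```

Starting from weights {0.2, -0.4, 0.1}, inputs {1, 1, -1}, target HI (1) and eta = 0.1, the first call reports an error of 1.3 and the second call, on the updated weights, reports 0.91. The error shrinks by the factor 1 - 0.1×3 = 0.7 because there are three inputs of magnitude 1.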
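Aha~ that is the whole framework. What remains is testing; the important code has now basically all been explained, so we can happily play with the tests.
I deliberately introduced some errors into the input (the standard definitions of the dot-matrix digits are in the Pattern array) and looked at the detection results. The results actually seem pretty good, hehe: the network shows a certain tolerance to faults.
After all, there is no hidden layer, so the robustness of neural networks has not really come into play yet. In my understanding, adding hidden layers makes a network more robust, but it also increases the amount of computation and the time consumed.
Below is my modified code: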
/*********************************************************
Code writer : EOF
Code date : 2014.10.24
Code file : adaline_netwrok.c
e-mail: : jasonleaster@gmail.com
Code Description:
This program is a demo for beginners to understand
neural networks.
If there is something wrong with my code, please
contact me by e-mail.
**********************************************************/
#include <stdlib.h>
#include <stdio.h>
typedef int BOOL;
typedef char CHAR;
typedef int INT;
typedef double REAL;
#define FALSE 0
#define TRUE 1
#define NOT !
#define AND &&
#define OR ||
#define MIN(x,y) ((x)<(y) ? (x) : (y))
#define MAX(x,y) ((x)>(y) ? (x) : (y))
#define LO -1
#define HI +1
#define BIAS 1
#define sqr(x) ((x)*(x))
typedef struct { /* A LAYER OF A NET: */
INT Units; /* - number of units in this layer */
REAL* Activation; /* - activation of ith unit */
INT* Output; /* - output of ith unit */
REAL* Error; /* - error term of ith unit */
REAL** Weight; /* - connection weights to ith unit */
} LAYER;
typedef struct { /* A NET: */
LAYER* InputLayer; /* - input layer */
LAYER* OutputLayer; /* - output layer */
REAL Eta; /* - learning rate */
REAL Error; /* - total net error */
REAL Epsilon; /* - net error to terminate training */
} NET;
/******************************************************************************
R A N D O M S D R A W N F R O M D I S T R I B U T I O N S
******************************************************************************/
void InitializeRandoms()
{
srand(4711);
}
INT RandomEqualINT(INT Low, INT High)
{
return rand() % (High-Low+1) + Low;
}
REAL RandomEqualREAL(REAL Low, REAL High)
{
return ((REAL) rand() / RAND_MAX) * (High-Low) + Low;
}
/******************************************************************************
A P P L I C A T I O N - S P E C I F I C C O D E
******************************************************************************/
#define NUM_DATA 10
#define X 5
#define Y 7
#define N (X * Y)
#define M 10
CHAR Pattern[NUM_DATA][Y][X] = { { " OOO ",
"O O",
"O O",
"O O",
"O O",
"O O",
" OOO " },
{ " O ",
" OO ",
"O O ",
" O ",
" O ",
" O ",
" O " },
{ " OOO ",
"O O",
" O",
" O ",
" O ",
" O ",
"OOOOO" },
{ " OOO ",
"O O",
" O",
" OOO ",
" O",
"O O",
" OOO " },
{ " O ",
" OO ",
" O O ",
"O O ",
"OOOOO",
" O ",
" O " },
{ "OOOOO",
"O ",
"O ",
"OOOO ",
" O",
"O O",
" OOO " },
{ " OOO ",
"O O",
"O ",
"OOOO ",
"O O",
"O O",
" OOO " },
{ "OOOOO",
" O",
" O",
" O ",
" O ",
" O ",
"O " },
{ " OOO ",
"O O",
"O O",
" OOO ",
"O O",
"O O",
" OOO " },
{ " OOO ",
"O O",
"O O",
" OOOO",
" O",
"O O",
" OOO " } };
CHAR Pattern_for_testing[NUM_DATA][Y][X] = { { " OOO ",
" ",
" O",
" O",
" O",
" O",
" O " },
{ " O ",
" O ",
" O ",
" O ",
" O ",
" O ",
" O "},
{ " OOO ",
"O O",
" O",
" O ",
" O ",
" O ",
"OOOOO" },
{ " OOO ",
"O O",
"O O",
"OOOOO",
" O",
"O O",
" OOO " },
{ " O ",
" OO ",
" O O ",
"O O ",
"OOOOO",
" ",
" " },
{ "OOOOO",
"O ",
"O ",
"OOOOO",
"O O",
"O O",
"OOOOO" },
{ " OOO ",
"O O",
"O ",
"OOOO ",
"O O",
"O O",
" OOO " },
{ "OOOOO",
" O",
" O",
" O ",
" O ",
" O ",
"O " },
{ " OOO ",
"O O",
"O O",
" ",
"O O",
"O O",
" OOO " },
{ " OOO ",
"O O",
"O O",
"O O",
"O O",
"O O",
" OOO " } };
INT Input [NUM_DATA][N];
INT Input_for_testing [NUM_DATA][N];//EOF added
INT Output[NUM_DATA][M] =
{ {HI, LO, LO, LO, LO, LO, LO, LO, LO, LO},
{LO, HI, LO, LO, LO, LO, LO, LO, LO, LO},
{LO, LO, HI, LO, LO, LO, LO, LO, LO, LO},
{LO, LO, LO, HI, LO, LO, LO, LO, LO, LO},
{LO, LO, LO, LO, HI, LO, LO, LO, LO, LO},
{LO, LO, LO, LO, LO, HI, LO, LO, LO, LO},
{LO, LO, LO, LO, LO, LO, HI, LO, LO, LO},
{LO, LO, LO, LO, LO, LO, LO, HI, LO, LO},
{LO, LO, LO, LO, LO, LO, LO, LO, HI, LO},
{LO, LO, LO, LO, LO, LO, LO, LO, LO, HI} };
FILE* f;
void InitializeApplication(NET* Net)
{
INT n,i,j;
Net->Eta = 0.001;
Net->Epsilon = 0.0001;
for (n=0; n<NUM_DATA; n++) {
for (i=0; i<Y; i++) {
for (j=0; j<X; j++) {
Input[n][i*X+j] = (Pattern[n][i][j] == 'O') ? HI : LO;
/*
** EOF added
*/
Input_for_testing[n][i*X+j] = (Pattern_for_testing[n][i][j] == 'O') ? HI : LO;
}
}
}
f = fopen("ADALINE.txt", "w");
}
void WriteInput(NET* Net, INT* Input_current_vector)
{
INT i;
for (i=0; i<N; i++) {
if (i%X == 0) {
fprintf(f, "\n");
}
fprintf(f, "%c", (Input_current_vector[i] == HI) ? 'O' : ' ');
}
fprintf(f, " -> ");
}
void WriteOutput(NET* Net, INT* Output_current_vector)
{
INT i;
INT Count, Index;
Count = 0;
for (i=0; i<M; i++) {
if (Output_current_vector[i] == HI) {
Count++;
Index = i;
}
}
if (Count == 1)
fprintf(f, "%i\n", Index);
else
fprintf(f, "%s\n", "invalid");
}
void FinalizeApplication(NET* Net)
{
fclose(f);
}
/******************************************************************************
I N I T I A L I Z A T I O N
******************************************************************************/
void GenerateNetwork(NET* Net)
{
INT i;
Net->InputLayer = (LAYER*) malloc(sizeof(LAYER));
Net->OutputLayer = (LAYER*) malloc(sizeof(LAYER));
Net->InputLayer->Units = N;
Net->InputLayer->Output = (INT*) calloc(N+1, sizeof(INT));
Net->InputLayer->Output[0] = BIAS;
Net->OutputLayer->Units = M;
Net->OutputLayer->Activation = (REAL*) calloc(M+1, sizeof(REAL));
Net->OutputLayer->Output = (INT*) calloc(M+1, sizeof(INT));
Net->OutputLayer->Error = (REAL*) calloc(M+1, sizeof(REAL));
Net->OutputLayer->Weight = (REAL**) calloc(M+1, sizeof(REAL*));
for (i=1; i<=M; i++) {
Net->OutputLayer->Weight[i] = (REAL*) calloc(N+1, sizeof(REAL));
}
Net->Eta = 0.1;
Net->Epsilon = 0.01;
}
void RandomWeights(NET* Net)
{
INT i,j;
for (i=1; i<=Net->OutputLayer->Units; i++) {
for (j=0; j<=Net->InputLayer->Units; j++) {
Net->OutputLayer->Weight[i][j] = RandomEqualREAL(-0.5, 0.5);
}
}
}
void SetInput(NET* Net, INT* Input_current_vector, BOOL Protocoling)
{
INT i;
for (i=1; i<=Net->InputLayer->Units; i++) {
Net->InputLayer->Output[i] = Input_current_vector[i-1];
}
if (Protocoling) {
WriteInput(Net, Input_current_vector);
}
}
void GetOutput(NET* Net, INT* Output_current_vector, BOOL Protocoling)
{
INT i;
for (i=1; i<=Net->OutputLayer->Units; i++) {
Output_current_vector[i-1] = Net->OutputLayer->Output[i];
}
if (Protocoling) {
WriteOutput(Net, Output_current_vector);
}
}
/******************************************************************************
P R O P A G A T I N G S I G N A L S
******************************************************************************/
void PropagateNet(NET* Net)
{
INT i,j;
REAL Sum;
for (i=1; i<=Net->OutputLayer->Units; i++) {
Sum = 0;
for (j=0; j<=Net->InputLayer->Units; j++) {
Sum += Net->OutputLayer->Weight[i][j] * Net->InputLayer->Output[j];
}
Net->OutputLayer->Activation[i] = Sum;
if (Sum >= 0)
Net->OutputLayer->Output[i] = HI;
else
Net->OutputLayer->Output[i] = LO;
}
}
/******************************************************************************
A D J U S T I N G W E I G H T S
******************************************************************************/
void ComputeOutputError(NET* Net, INT* Target)
{
INT i;
REAL Err;
Net->Error = 0;
for (i=1; i<=Net->OutputLayer->Units; i++) {
Err = Target[i-1] - Net->OutputLayer->Activation[i];
Net->OutputLayer->Error[i] = Err;
Net->Error += 0.5 * sqr(Err);
}
}
void AdjustWeights(NET* Net)
{
INT i,j;
INT Out;
REAL Err;
for (i=1; i<=Net->OutputLayer->Units; i++) {
for (j=0; j<=Net->InputLayer->Units; j++) {
Out = Net->InputLayer->Output[j];
Err = Net->OutputLayer->Error[i];
Net->OutputLayer->Weight[i][j] += Net->Eta * Err * Out;
}
}
}
/******************************************************************************
S I M U L A T I N G T H E N E T
******************************************************************************/
void SimulateNet(NET* Net, INT* Input, INT* Target, BOOL Training, BOOL Protocoling)
{
INT Output[M];
SetInput(Net, Input, Protocoling);
PropagateNet(Net);
GetOutput(Net, Output, Protocoling);
ComputeOutputError(Net, Target);
if (Training)
AdjustWeights(Net);
}
/******************************************************************************
M A I N
******************************************************************************/
int main(void)
{
NET Net;
REAL Error;
BOOL Stop;
INT n,m;
InitializeRandoms();
GenerateNetwork(&Net);
RandomWeights(&Net);
InitializeApplication(&Net);
do {
Error = 0;
Stop = TRUE;
for (n=0; n<NUM_DATA; n++) {
SimulateNet(&Net, Input[n], Output[n], FALSE, FALSE);
Error = MAX(Error, Net.Error);
Stop = Stop AND (Net.Error < Net.Epsilon);
}
Error = MAX(Error, Net.Epsilon);
printf("Training %0.0f%% completed ...\n", (Net.Epsilon / Error) * 100);
if (NOT Stop) {
for (m=0; m<10*NUM_DATA; m++) {
n = RandomEqualINT(0, NUM_DATA-1);
SimulateNet(&Net, Input[n], Output[n], TRUE, FALSE);
}
}
} while (NOT Stop);
for (n=0; n<NUM_DATA; n++) {
SimulateNet(&Net, Input_for_testing[n], Output[n], FALSE, TRUE);
}
FinalizeApplication(&Net);
}
update : 2014.10.27
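Afterword:
Neural networks are a way to "try" to handle problems that current mathematics cannot solve directly. The idea of "learning" built into their design implies letting the algorithm run on its own, with humans no longer intervening much in the process. It is like a machine: you press the "start" button, and it takes care of the rest by itself.
The so-called learning is a process of continual self-correction: the coefficients inside the network are adjusted again and again (in any case, once this is used in a real project, the amount of computation becomes frighteningly large), so that the difference between the desired output and the actual output keeps shrinking, and "learning" stops as soon as the required precision is met.
Typical projects such as face recognition and prediction problems can all make use of neural networks. As I said at the beginning of this blog post, the essence of a neural network is a brute-force solution of nonlinear problems based on a probabilistic model.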
update: 2015.02.01
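@XuGuowei asked why the initial parameters are randomized here.
I think the purpose is to minimize the average time it takes to train the network.
If you have written quicksort (Quick Sort) or know it reasonably well, you will have noticed that its O(n lg n) time complexity is an average-case figure, unlike heapsort, whose O(n lg n) holds even in the worst case. Similarly, a plain binary search tree (BST) may well be unbalanced, which can make the search time complexity poor, whereas in a BST built with randomization the left and right subtrees tend to be "as balanced as possible", so the search time complexity does not turn out badly. Because of the randomness, the time complexity is an averaged one.
If the initial values here were all 0, or some extreme or special values, that would often hurt the convergence of the network.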