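I want to learn neural networks from scratch, using Python. I am not familiar with Python either, so that counts as starting from scratch too. I also have no clear idea how to go about learning neural networks, so I will simply work through a book, Neural Networks for Applied Sciences and Engineering, to get a first look at what they are. I am not sure how much of this blog I will manage to write, or how far I will get with neural networks. I have never really done anything well. I will take it one step at a time. Enough preamble for now.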
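The Perceptron
Going from simple to complex, the first thing to understand is a single perceptron. A single perceptron is a single neuron. A single neuron is shown in the figure.
The dendrites receive data, the cell body processes it, the axon is the wire that transmits it, the myelin sheath is the insulation that protects the wire, and the synapse is the output. The perceptron is used to model this neuron. How? Roughly as shown in the figure below: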
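What does this mean? Take it slowly and start with the dendrites. Dendrites receive data, and there are many of them, like the tributaries of a large river. Some tributaries carry a lot of water and some carry little. This corresponds to the inputs of the perceptron: some inputs matter more and some matter less, which is expressed by giving each input a weight. Next, the cell body. It integrates the incoming data and sends the result to the axon. In the perceptron, the cell body is a function. For now we only discuss the simplest case, the threshold function. Its output has only two possible values, 0 or 1. Think of the river again: when the flow exceeds some threshold we call it a large river, otherwise a small one. For a threshold function, a single synapse is enough to carry the output. If an electrical signal arrives from the axon the output is 1, and if there is no signal it is 0. That is as simple as it gets.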
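To summarize: the neuron we study here is a simplified one, with several dendrites, one cell body, an axon, and one synapse. The cell body can only produce 0 or 1 and pass it to the synapse. In the river analogy, several tributaries flow into one main stream, and we judge from the flow whether the river is large or small. Now look again at our model neuron, the perceptron. As in the figure above, it has several inputs, each with a weight that represents the size of a tributary. Then comes an input-output function, which plays the role of the cell body. The value of that function is the output.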
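To make the summary concrete, here is a minimal sketch of such a threshold perceptron in Python. The function names, the flows, the weights, and the threshold values are all made up for illustration; they do not come from the book.

```python
def threshold(value, theta=0.0):
    """Threshold function: 1 if value exceeds theta, otherwise 0."""
    return 1 if value > theta else 0


def perceptron_output(inputs, weights, theta=0.0):
    """Weighted sum of the inputs, passed through the threshold function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return threshold(total, theta)


# Three "tributaries" with hypothetical flows and weights.
flows = [1.0, 0.5, 2.0]
weights = [0.2, 0.4, 0.1]

# The weighted sum is about 0.6, so the answer depends on the threshold.
print(perceptron_output(flows, weights, theta=0.5))  # 1, a "large river"
print(perceptron_output(flows, weights, theta=1.0))  # 0, a "small river"
```

The same inputs give a different answer when only the threshold changes, which is exactly the large-river versus small-river decision described above.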
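Training the Perceptron
Suppose we have a perceptron, that is, a single neuron. How do we train it so that it can classify? Say we have a pile of fruit and we teach a small child which ones are apples and which are not. After enough repetition the child can judge on his own. For the child, the training changes what is stored in the brain. For the perceptron, what we can change is the weight of each input, like changing the flow of each tributary. So how should the input weights be changed, and by what rule? This is where Hebbian learning comes in; it is worth looking up at this point. Put simply: if two neurons have outputs x and y, and x stimulates the production of y, then the connection between the two neurons becomes stronger. Expressed mathematically, the change in the weight between the two neurons is proportional to the product of x and y.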
For ease of computation we write it in the following form:
$$\Delta w = \beta \, x \, y$$
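The coefficient β of this product is called the learning rate, and it determines how fast learning proceeds. Larger is not always better, though: if it is too large, learning does not converge.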
The new weight can therefore be written as
$$w_{\text{new}} = w_{\text{old}} + \beta \, x \, y$$
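As a small sketch of this update, assuming one weight per input and a single output y, the rule could look like the following in Python. The function name hebbian_update and the numbers are hypothetical.

```python
def hebbian_update(weights, x, y, beta=0.1):
    """Return new weights w_i + beta * x_i * y for every input i."""
    return [w + beta * xi * y for w, xi in zip(weights, x)]


w = [0.0, 0.0]
# Only the first input is active, so only its weight grows.
w = hebbian_update(w, x=[1.0, 0.0], y=1, beta=0.5)
print(w)  # [0.5, 0.0]
```

An input that is 0 leaves its weight untouched, and a larger beta makes each step bigger.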
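Example
Do you now have a basic picture of the perceptron? I am going to write a program that simulates one. Our data comes from the following website:
The MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)
The site offers four files for download: the training images, the training labels, the test images, and the test labels. The training image file contains 60000 images of handwritten digits, and the matching training labels say which digit each image shows. A picture makes this clearer.
The first few handwritten digit images I took from the training image set are shown below.
The corresponding Python code is as follows: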
import struct

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

if __name__ == '__main__':
    print('read start')
    # the experiment files can be downloaded at
    # http://yann.lecun.com/exdb/mnist/
    # download train-images-idx3-ubyte and uncompress it,
    # then rename it to train_images
    raw_data_images = open('train_images', 'rb')
    # download train-labels-idx1-ubyte, uncompress, rename to train_labels
    raw_data_labels = open('train_labels', 'rb')
    filedata_labels = raw_data_labels.read()
    print('read end')
    fig = plt.figure()
    # the file format is described on the website
    # http://yann.lecun.com/exdb/mnist/
    # skip the 16-byte header of the image file
    raw_data_images.seek(16)
    # show four images
    for j in range(4):
        # each image is 28*28 pixels, one unsigned byte per pixel
        imgg = raw_data_images.read(28 * 28)
        # interpret the raw bytes as unsigned 8-bit integers
        img = np.frombuffer(imgg, dtype=np.uint8)
        # reshape the data from 1D to 2D
        img = np.reshape(img, (28, 28))
        # new_im is used as a picture
        new_im = Image.fromarray(img)
        # enlarge new_im so it is easier to see
        new_im = new_im.resize((140, 140))
        # place the four images on one canvas
        imageshow = fig.add_subplot(1, 4, j + 1)
        # labels start after an 8-byte header, one unsigned byte per label
        (a,) = struct.unpack('>B', filedata_labels[8 + j:9 + j])
        # title for each image
        plt.title(str(a))
        imageshow.imshow(new_im)
        # turn off the axis
        plt.axis('off')
    plt.show()
    # close the two files
    raw_data_images.close()
    raw_data_labels.close()
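The seek(16) and the offset 8 in the code above come from the headers of the two files. As far as I understand the format description on the MNIST page, the image file starts with four big-endian 32-bit integers (magic number, number of images, rows, columns) and the label file with two (magic number, number of labels). A small sketch that reads these headers is shown below. The function names are my own, and the header bytes are built by hand here rather than read from the real files.

```python
import struct


def parse_image_header(data):
    """Return (magic, count, rows, cols) from the first 16 bytes."""
    return struct.unpack('>IIII', data[:16])


def parse_label_header(data):
    """Return (magic, count) from the first 8 bytes."""
    return struct.unpack('>II', data[:8])


# Hand-made header bytes for illustration, not read from the real files.
fake_images = struct.pack('>IIII', 2051, 60000, 28, 28)
fake_labels = struct.pack('>II', 2049, 60000)

print(parse_image_header(fake_images))  # (2051, 60000, 28, 28)
print(parse_label_header(fake_labels))  # (2049, 60000)
```

With the real files you would pass the first bytes of train_images and train_labels instead of the hand-made values.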
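We now have experimental data, but it needs one modification. The perceptron discussed above is a binary perceptron, which means it can only say "yes" or "no". So we have to adapt the data. Let us decide whether a handwritten image is a 0: we set the label of every image that is not a 0 to 1. Images of 0 keep the label 0, and all other images get the label 1. With that the perceptron can do its job. It only has to tell us whether an image is a 0 or not.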
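The relabeling itself is tiny. A minimal sketch, with a hypothetical helper name and a made-up list of digits, could be:

```python
def binarize_labels(labels):
    """Keep label 0 as 0 and turn every other digit into 1."""
    return [0 if label == 0 else 1 for label in labels]


# Hypothetical digit labels, just for illustration.
print(binarize_labels([5, 0, 4, 1, 9]))  # [1, 0, 1, 1, 1]
```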
The perceptron source code is as follows:
# encoding=utf-8
# @Author: WenDesi
# @Date: 09-08-16
# @Email: wendesi@foxmail.com
# @Last modified by: YuXiaoyu
# @Last modified time: 16-6-2017
import struct

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from sklearn.metrics import accuracy_score


class Perceptron(object):
    # initialization of the class
    def __init__(self):
        # learning rate
        self.learning_step = 0.00001
        # kept from the original code; train() makes a single pass and never reads it
        self.max_iteration = 5000
        # one weight per input pixel (28*28 = 784)
        self.w = [0.0] * 784

    # predict 0 or 1 for a single image
    def predict_(self, x):
        # sum up the products of each weight and its input
        wx = sum([self.w[j] * x[j] for j in range(len(self.w))])
        return int(wx > 0)

    # train the perceptron
    # one pass over all images, without checking how much has been learned
    def train(self, features, labels):
        for index in range(len(labels)):
            x = list(features[index])
            # map the label 0/1 to -1/+1
            y = 2 * labels[index] - 1
            wx = sum([self.w[j] * x[j] for j in range(len(self.w))])
            # if the perceptron does not predict correctly,
            # it learns something according to the Hebbian rule
            if wx * y <= 0:
                for i in range(len(self.w)):
                    self.w[i] += self.learning_step * (y * x[i])

    # output predicted labels for the test images
    def predict(self, features):
        labels = []
        for feature in features:
            x = list(feature)
            labels.append(self.predict_(x))
        return labels


if __name__ == '__main__':
    # the experiment files can be downloaded at
    # http://yann.lecun.com/exdb/mnist/
    # download train-images-idx3-ubyte and uncompress it,
    # then rename it to train_images
    print('preparing data')
    train_images_file = open('train_images', 'rb')
    # why seek 16 bytes? Check the file format
    # at the bottom of the website
    # http://yann.lecun.com/exdb/mnist/
    train_images_file.seek(16)
    train_images_data = train_images_file.read()
    # download train-labels-idx1-ubyte, uncompress, rename to train_labels
    train_labels_file = open('train_labels', 'rb')
    train_labels_data = train_labels_file.read()
    # download t10k-images-idx3-ubyte, uncompress, rename to test_images
    test_images_file = open('test_images', 'rb')
    test_images_file.seek(16)
    test_images_data = test_images_file.read()
    # download t10k-labels-idx1-ubyte, uncompress, rename to test_labels
    test_labels_file = open('test_labels', 'rb')
    test_labels_data = test_labels_file.read()

    # the training set has 60000 handwritten images, but we use 6000 here
    train_images = []
    for tid in range(6000):
        # each image is 28*28 pixels, one unsigned byte per pixel
        img = np.frombuffer(train_images_data, dtype=np.uint8,
                            count=28 * 28, offset=tid * 28 * 28)
        # tolist() gives plain Python ints, which avoids uint8 overflow in train()
        train_images.append(img.tolist())

    # training labels corresponding to the training images
    train_labels = []
    for il in range(6000):
        # why 8+il? Check the file format at
        # the website http://yann.lecun.com/exdb/mnist/
        (a,) = struct.unpack('>B', train_labels_data[8 + il:9 + il])
        # if the handwritten digit is not 0, change the label to 1,
        # otherwise leave it unchanged
        train_labels.append(0 if a == 0 else 1)

    # test images for the test set
    test_images = []
    for tid in range(60):
        img = np.frombuffer(test_images_data, dtype=np.uint8,
                            count=28 * 28, offset=tid * 28 * 28)
        test_images.append(img.tolist())

    # test labels corresponding to the test images
    test_labels = []
    for il in range(60):
        (a,) = struct.unpack('>B', test_labels_data[8 + il:9 + il])
        test_labels.append(0 if a == 0 else 1)

    print('Start training')
    p = Perceptron()
    p.train(train_images, train_labels)
    print('Start predicting')
    test_predict = p.predict(test_images)
    # performance of our perceptron
    score = accuracy_score(test_labels, test_predict)
    print('The accuracy score is', score)

    # show some images and the labels we predicted
    fig = plt.figure()
    # N*N is the number of images we want to display:
    # N rows and N columns of images
    N = 6
    for i in range(N):
        for j in range(N):
            imgg = np.array(test_images[i * N + j], dtype=np.uint8)
            imgg = np.reshape(imgg, (28, 28))
            new_im = Image.fromarray(imgg)
            new_im = new_im.resize((140, 140))
            imageshow = fig.add_subplot(N, N, i * N + j + 1)
            plt.title(str(test_predict[i * N + j]))
            imageshow.imshow(new_im)
            plt.axis('off')
    plt.show()

    train_images_file.close()
    train_labels_file.close()
    test_images_file.close()
    test_labels_file.close()
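The pure-Python loops in the Perceptron class are easy to follow but slow. As a sketch only, and not the code used above, the same single-pass update could be written with NumPy dot products. The function names and the toy two-pixel "images" below are made up so that the example runs without the MNIST files.

```python
import numpy as np


def train_perceptron(features, labels, learning_step=0.00001):
    """One pass of the mistake-driven update, using NumPy per sample."""
    X = np.asarray(features, dtype=np.float64)
    # map the labels 0/1 to -1/+1, as in the train() method above
    y = 2 * np.asarray(labels, dtype=np.int64) - 1
    w = np.zeros(X.shape[1])
    for xi, yi in zip(X, y):
        # update only when the current weights get this sample wrong
        if yi * np.dot(w, xi) <= 0:
            w += learning_step * yi * xi
    return w


def predict(w, features):
    """Return 1 where the weighted sum is positive, otherwise 0."""
    X = np.asarray(features, dtype=np.float64)
    return (X @ w > 0).astype(int).tolist()


# Toy data: hypothetical 2-pixel "images" with 0/1 labels.
toy_features = [[255, 0], [0, 255], [200, 10], [5, 220]]
toy_labels = [1, 0, 1, 0]

w = train_perceptron(toy_features, toy_labels)
print(predict(w, toy_features))  # [1, 0, 1, 0]
```

On this toy data the first two samples trigger an update and the last two are already classified correctly, so one pass is enough. On the real MNIST data a single pass is not guaranteed to be enough.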