ML: Neural Networks

V_lq6h

Article link: https://blog.csdn.net/V_lq6h/article/details/87889544

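The main topics covered are:

The past and present of neural networks
How neural networks work, and rectifying nonlinearities
Tuning the model parameters of a neural network
Training a handwritten digit recognition model with a neural network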

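I. The past and present of neural networks

Neural networks are nothing new. As early as 1943, the American neurophysiologist Warren McCulloch and the mathematician Walter Pitts proposed the first abstract model of a brain neuron, known as the M-P model (McCulloch-Pitts neuron, MCP).

1. The origin of neural networks

Neurons are interconnected nerve cells in the brain that process and transmit chemical and electrical signals. Interestingly, a neuron has two regular working states, excitation and inhibition, which closely mirrors the "1" and "0" of a computer. A neuron can therefore be described as a logic gate with a binary output: when incoming nerve impulses raise the cell membrane potential above a threshold, the cell enters the excited state and produces an impulse that is sent out along the axon; conversely, when the incoming impulses leave the membrane potential below the threshold, the cell enters the inhibited state and outputs no impulse.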
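The threshold behavior described above can be sketched in a few lines. This is only an illustration: the function name `mp_neuron`, the unit weights, and the choice to fire when the weighted sum reaches the threshold (rather than strictly exceeds it) are assumptions made for the example, not something taken from the original M-P paper.

```python
import numpy as np

def mp_neuron(inputs, weights, threshold):
    """Return 1 (excited) if the weighted input sum reaches the threshold, else 0 (inhibited)."""
    return int(np.dot(inputs, weights) >= threshold)

# All four binary input pairs, in the order (0,0), (0,1), (1,0), (1,1)
pairs = [(a, b) for a in (0, 1) for b in (0, 1)]

# With unit weights, a threshold of 2 behaves like AND and a threshold of 1 like OR
and_out = [mp_neuron(p, [1, 1], threshold=2) for p in pairs]
or_out = [mp_neuron(p, [1, 1], threshold=1) for p in pairs]

print("AND:", and_out)
print("OR: ", or_out)
```

Changing only the threshold turns the same unit from one logic gate into another, which is the sense in which a neuron can be treated as a binary logic gate.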

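2. The father of neural networks: Geoffrey Hinton

In the 1980s, Geoffrey Hinton and his collaborators popularized the backpropagation (BP) algorithm, which made the complex computation needed to train two-layer neural networks tractable and revived interest in the field.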

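II. How neural networks work and how to use them

1. Rectifying nonlinearities in neural networks

Mathematically, if every hidden layer only computes a weighted sum, the result is no different from an ordinary linear model. To make the model more powerful than a plain linear model, we need one more processing step.

That step is to apply a nonlinear function to the result of each hidden layer: either a rectifying nonlinearity, known as relu (rectified linear unit), or the hyperbolic tangent (tangens hyperbolicus), known as tanh. A plot shows the two functions directly.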

# Import numpy
import numpy as np
# Import the plotting tools
import matplotlib.pyplot as plt

# Generate 200 evenly spaced points from -5 to 5
line=np.linspace(-5,5,200)

# Plot the two nonlinear functions
plt.plot(line,np.tanh(line),label='tanh')
plt.plot(line,np.maximum(line,0),label='relu')

# Set the legend position
plt.legend(loc='best')

plt.xlabel('x')
plt.ylabel('relu(x) and tanh(x)')

plt.show()

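[Result analysis] The tanh function squashes the values of feature x into the interval from -1 to 1: small values of x map toward -1 and large values toward 1. The relu function simply discards every value of x below 0 and replaces it with 0. Both nonlinear treatments simplify the feature values, and this is what lets the neural network learn from complex, nonlinear datasets.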
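The earlier claim that hidden layers doing only weighted sums are no better than a linear model can be checked numerically. The sketch below uses made-up random weights (`W1`, `b1`, `W2`, `b2` are illustrative names, not part of any trained model): two stacked linear layers collapse into a single linear map, while inserting relu between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                            # 5 samples, 3 features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)   # hidden layer with 4 nodes
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)   # output layer with 2 nodes

# Two layers of plain weighted sums
two_linear = (X @ W1 + b1) @ W2 + b2

# The same computation written as ONE linear layer
W, b = W1 @ W2, b1 @ W2 + b2
one_linear = X @ W + b
collapses = np.allclose(two_linear, one_linear)

# With relu applied to the hidden layer, the single linear layer no longer matches
with_relu = np.maximum(X @ W1 + b1, 0) @ W2 + b2
relu_differs = not np.allclose(with_relu, one_linear)

print("two linear layers == one linear layer:", collapses)
print("relu network differs from that linear layer:", relu_differs)
```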

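2. Setting the parameters of a neural network

# Import the MLP neural network
from sklearn.neural_network import MLPClassifier
# Import the wine dataset
from sklearn.datasets import load_wine
# Import the dataset splitting tool
from sklearn.model_selection import train_test_split

wine=load_wine()
# Keep only the first two features so the result can be plotted in 2D
X=wine.data[:,:2]
y=wine.target

# Split the dataset
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0)

# Define the classifier
mlp=MLPClassifier(solver='lbfgs')
mlp.fit(X_train,y_train)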

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='lbfgs', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

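Let's look at what the key parameters mean:

alpha is the same as the alpha of linear models: it is an L2 penalty term that controls the strength of regularization, and its default value is 0.0001.

hidden_layer_sizes defaults to [100,], which means the model has a single hidden layer with 100 nodes. Setting hidden_layer_sizes to [10,10] gives the model two hidden layers with 10 nodes each.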
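One way to see what hidden_layer_sizes does is to inspect the fitted weight matrices, which scikit-learn exposes as the `coefs_` attribute. The sketch below is illustrative only: the variable name `mlp_demo` and the fixed `random_state=0` are choices made for this example.

```python
import warnings
from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# lbfgs may stop at max_iter on unscaled data; the layer shapes are unaffected
warnings.filterwarnings("ignore", category=ConvergenceWarning)

wine = load_wine()
X, y = wine.data[:, :2], wine.target   # 2 input features, 3 classes

mlp_demo = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[10, 10], random_state=0)
mlp_demo.fit(X, y)

# One weight matrix per connection between consecutive layers
weight_shapes = [w.shape for w in mlp_demo.coefs_]
print(weight_shapes)
```

The shapes read as input-to-hidden (2 features by 10 nodes), hidden-to-hidden (10 by 10), and hidden-to-output (10 nodes by 3 classes).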

# Import the plotting tools
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Use different color patches for the different classes
cmap_light=ListedColormap(['#FFAAAA','#AAFFAA','#AAAAFF'])
cmap_bold=ListedColormap(['#FF0000','#00FF00','#0000FF'])

x_min,x_max=X_train[:,0].min()-1,X_train[:,0].max()+1
y_min,y_max=X_train[:,1].min()-1,X_train[:,1].max()+1

xx,yy=np.meshgrid(np.arange(x_min,x_max,.02),np.arange(y_min,y_max,.02))

Z=mlp.predict(np.c_[xx.ravel(),yy.ravel()])

Z=Z.reshape(xx.shape)

plt.figure()
plt.pcolormesh(xx,yy,Z,cmap=cmap_light)

# Draw the data points as a scatter plot
plt.scatter(X[:,0],X[:,1],c=y,edgecolor='k',s=60)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())

plt.title('MLPClassifier:solver=lbfgs')

plt.show()

Next, let's reduce the number of nodes in the hidden layer, say to 10, and see what happens.

# Set the number of nodes in the hidden layer to 10
mlp_10=MLPClassifier(solver='lbfgs',hidden_layer_sizes=[10])
mlp_10.fit(X_train,y_train)

Z10=mlp_10.predict(np.c_[xx.ravel(),yy.ravel()])

Z10=Z10.reshape(xx.shape)

plt.figure()
plt.pcolormesh(xx,yy,Z10,cmap=cmap_light)

# Draw X as a scatter plot
plt.scatter(X[:,0],X[:,1],c=y,edgecolor='k',s=60)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())

plt.title("MLPClassifier:nodes=10")

plt.show()

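[Result analysis] With the relu activation, the number of nodes in a hidden layer sets the maximum number of straight line segments that can make up the decision boundary; the larger this number, the smoother the decision boundary looks. Besides increasing the number of nodes in a single hidden layer, there are two other ways to make the boundary look finer: add more hidden layers, or change the activation parameter to tanh.

Now let's increase the number of hidden layers in the MLP classifier, say to 2.

# Give the neural network two hidden layers with 10 nodes each
mlp_2L=MLPClassifier(solver='lbfgs',hidden_layer_sizes=[10,10])
mlp_2L.fit(X_train,y_train)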

Z2L=mlp_2L.predict(np.c_[xx.ravel(),yy.ravel()])

Z2L=Z2L.reshape(xx.shape)

plt.figure()
plt.pcolormesh(xx,yy,Z2L,cmap=cmap_light)

# Draw X as a scatter plot
plt.scatter(X[:,0],X[:,1],c=y,edgecolor='k',s=60)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())

plt.title("MLPClassifier:layers=2")

plt.show()

Next, let's try activation='tanh'.

# Set the activation function to tanh
mlp_tanh=MLPClassifier(solver='lbfgs',hidden_layer_sizes=[10,10],activation='tanh')
mlp_tanh.fit(X_train,y_train)

Z2=mlp_tanh.predict(np.c_[xx.ravel(),yy.ravel()])

Z2=Z2.reshape(xx.shape)

plt.figure()
plt.pcolormesh(xx,yy,Z2,cmap=cmap_light)

# Draw X as a scatter plot
plt.scatter(X[:,0],X[:,1],c=y,edgecolor='k',s=60)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())

plt.title("MLPClassifier:layers=2 with tanh")

plt.show()

Adjusting alpha to control model complexity

# Change the model's alpha parameter
mlp_alpha=MLPClassifier(solver='lbfgs',hidden_layer_sizes=[10,10],activation='tanh',alpha=1)
mlp_alpha.fit(X_train,y_train)

Z3=mlp_alpha.predict(np.c_[xx.ravel(),yy.ravel()])

Z3=Z3.reshape(xx.shape)

plt.figure()
plt.pcolormesh(xx,yy,Z3,cmap=cmap_light)

# Draw X as a scatter plot
plt.scatter(X[:,0],X[:,1],c=y,edgecolor='k',s=60)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())

plt.title("MLPClassifier:alpha=1")

plt.show()

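So far we have four ways to adjust the complexity of the model: first, adjust the number of nodes in each hidden layer; second, adjust the number of hidden layers; third, change the activation function; fourth, change the degree of regularization by adjusting alpha.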
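As a rough complement to the plots, the four adjustments can also be compared by score. The sketch below refits the configurations used so far on the same two wine features; the `configs` dictionary, its labels, and the fixed `random_state=0` are illustrative choices, and the actual scores will depend on the scikit-learn version.

```python
import warnings
from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# lbfgs can hit max_iter on unscaled features; silence that warning for the comparison
warnings.filterwarnings("ignore", category=ConvergenceWarning)

wine = load_wine()
X, y = wine.data[:, :2], wine.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One entry per configuration tried in this section
configs = {
    'nodes=100': dict(hidden_layer_sizes=[100]),
    'nodes=10': dict(hidden_layer_sizes=[10]),
    'layers=2': dict(hidden_layer_sizes=[10, 10]),
    'layers=2, tanh': dict(hidden_layer_sizes=[10, 10], activation='tanh'),
    'layers=2, tanh, alpha=1': dict(hidden_layer_sizes=[10, 10], activation='tanh', alpha=1),
}

scores = {}
for name, params in configs.items():
    model = MLPClassifier(solver='lbfgs', random_state=0, **params)
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print('{:<26} train={:.2f} test={:.2f}'.format(name, *scores[name]))
```

A large gap between the training and test score for a configuration is a hint that it is more complex than the data supports.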

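[Note] In a neural network, the weights applied to the sample features are generated randomly before the model starts learning, and different random weights lead to a differently shaped model. So unless random_state is specified, the decision boundary differs from run to run even when every parameter is identical, and re-running the code above will give different results. This is nothing to worry about: as long as the model complexity stays the same, the prediction accuracy is usually not affected much.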
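The effect of random_state can be checked directly. In the sketch below, the helper `fit_predict` and the seeds 0 and 1 are illustrative: fitting twice with the same seed should reproduce the same predictions, while a different seed may or may not agree on every test point.

```python
import warnings
import numpy as np
from sklearn.datasets import load_wine
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

wine = load_wine()
X, y = wine.data[:, :2], wine.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def fit_predict(seed):
    """Fit a small MLP whose initial weights are drawn with the given seed."""
    model = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[10, 10], random_state=seed)
    return model.fit(X_train, y_train).predict(X_test)

pred_a = fit_predict(0)
pred_b = fit_predict(0)   # same seed: same initial weights
pred_c = fit_predict(1)   # different seed: different initial weights

same_seed_identical = np.array_equal(pred_a, pred_b)
print("same seed gives identical predictions:", same_seed_identical)
print("fraction of test points where seeds 0 and 1 agree:", np.mean(pred_a == pred_c))
```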

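III. A neural network example: handwriting recognition

In learning neural networks, training an image recognizer on the MNIST dataset is like writing "hello world" for a beginning programmer: a very basic required exercise.

1. Using the MNIST dataset

MNIST is a large dataset built for training image processing systems. It contains 70000 images of handwritten digits, of which 60000 are training data and the other 10000 are test data, and it is also widely used in machine learning for training and testing models. MNIST was derived from the original NIST datasets: half of its training set and half of its test set come from NIST's training set, and the other halves come from NIST's test set.

Next we use scikit-learn's fetch_mldata to get the MNIST dataset with the following code: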

# Import the dataset fetching tool
from sklearn.datasets import fetch_mldata

# Load the MNIST handwritten digit dataset
mnist=fetch_mldata('MNIST original')
mnist

E:\Anaconda\envs\mytensorflow\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function fetch_mldata is deprecated; fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22
  warnings.warn(msg, category=DeprecationWarning)
E:\Anaconda\envs\mytensorflow\lib\site-packages\sklearn\utils\deprecation.py:77: DeprecationWarning: Function mldata_filename is deprecated; mldata_filename was deprecated in version 0.20 and will be removed in version 0.22
  warnings.warn(msg, category=DeprecationWarning)



---------------------------------------------------------------------------

TimeoutError                              Traceback (most recent call last)

<ipython-input-25-c42d12ebe31a> in <module>()
      3 
      4 # Load the MNIST handwritten digit dataset
----> 5 mnist=fetch_mldata('MNIST original')
      6 mnist


E:\Anaconda\envs\mytensorflow\lib\site-packages\sklearn\utils\deprecation.py in wrapped(*args, **kwargs)
     76         def wrapped(*args, **kwargs):
     77             warnings.warn(msg, category=DeprecationWarning)
---> 78             return fun(*args, **kwargs)
     79 
     80         wrapped.__doc__ = self._update_doc(wrapped.__doc__)


E:\Anaconda\envs\mytensorflow\lib\site-packages\sklearn\datasets\mldata.py in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
    131         urlname = MLDATA_BASE_URL % quote(dataname)
    132         try:
--> 133             mldata_url = urlopen(urlname)
    134         except HTTPError as e:
    135             if e.code == 404:


E:\Anaconda\envs\mytensorflow\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url, data, timeout)
    224 
    225 def install_opener(opener):


E:\Anaconda\envs\mytensorflow\lib\urllib\request.py in open(self, fullurl, data, timeout)
    524             req = meth(req)
    525 
--> 526         response = self._open(req, data)
    527 
    528         # post-process response


E:\Anaconda\envs\mytensorflow\lib\urllib\request.py in _open(self, req, data)
    542         protocol = req.type
    543         result = self._call_chain(self.handle_open, protocol, protocol +
--> 544                                   '_open', req)
    545         if result:
    546             return result


E:\Anaconda\envs\mytensorflow\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    502         for handler in handlers:
    503             func = getattr(handler, meth_name)
--> 504             result = func(*args)
    505             if result is not None:
    506                 return result


E:\Anaconda\envs\mytensorflow\lib\urllib\request.py in http_open(self, req)
   1344 
   1345     def http_open(self, req):
-> 1346         return self.do_open(http.client.HTTPConnection, req)
   1347 
   1348     http_request = AbstractHTTPHandler.do_request_


E:\Anaconda\envs\mytensorflow\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1319             except OSError as err: # timeout error
   1320                 raise URLError(err)
-> 1321             r = h.getresponse()
   1322         except:
   1323             h.close()


E:\Anaconda\envs\mytensorflow\lib\http\client.py in getresponse(self)
   1329         try:
   1330             try:
-> 1331                 response.begin()
   1332             except ConnectionError:
   1333                 self.close()


E:\Anaconda\envs\mytensorflow\lib\http\client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break


E:\Anaconda\envs\mytensorflow\lib\http\client.py in _read_status(self)
    256 
    257     def _read_status(self):
--> 258         line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    259         if len(line) > _MAXLINE:
    260             raise LineTooLong("status line")


E:\Anaconda\envs\mytensorflow\lib\socket.py in readinto(self, b)
    584         while True:
    585             try:
--> 586                 return self._sock.recv_into(b)
    587             except timeout:
    588                 self._timeout_occurred = True


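TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.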

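Loading the MNIST dataset with fetch_mldata can fail with the error above; see: reference document
Re-running the code gives:

# Import the dataset fetching tool
from sklearn.datasets import fetch_mldata

# Load the MNIST handwritten digit dataset
mnist=fetch_mldata('MNIST original')
mnist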

{'COL_NAMES': ['label', 'data'],
 'DESCR': 'mldata.org dataset: mnist-original',
 'data': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ..., 
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
 'target': array([ 0.,  0.,  0., ...,  9.,  9.,  9.])}
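The deprecation warning shown earlier says fetch_mldata was scheduled for removal in scikit-learn 0.22, so on newer versions the call above will not import. A possible substitute, sketched here as an assumption rather than as the method used in this article, is fetch_openml with the OpenML dataset name 'mnist_784'. The wrapper name `load_mnist_openml` is hypothetical; the first call needs network access, and the returned object layout differs from the dictionary printed above.

```python
import numpy as np
from sklearn.datasets import fetch_openml

def load_mnist_openml(data_home=None):
    """Fetch MNIST from OpenML and return (pixels, labels) as NumPy arrays.

    Assumes the OpenML dataset name 'mnist_784'; downloads on first use
    and reads from the local scikit-learn cache afterwards.
    """
    X, y = fetch_openml('mnist_784', version=1, return_X_y=True,
                        as_frame=False, data_home=data_home)
    # OpenML returns the labels as strings; convert them to floats
    # so they match the float targets shown in the output above
    return X, y.astype(np.float64)
```

Calling `X_all, y_all = load_mnist_openml()` would then stand in for `mnist.data` and `mnist.target` in the rest of the code.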

print("Number of samples: {}, number of features: {}".format(mnist.data.shape[0],mnist.data.shape[1]))

Number of samples: 70000, number of features: 784

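[Result analysis] The dataset has 70000 samples, each with 784 features. Each sample stores the pixel values of a 28x28-pixel image of a handwritten digit, so the number of features is 28x28=784.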
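The relationship between a 28x28 image and its 784 features can be illustrated with a synthetic image. The array below is made up for the example (a vertical stroke, not a real MNIST sample); it is flattened row by row, so the pixel at row r and column c ends up at feature index r*28+c.

```python
import numpy as np

# A synthetic 28x28 grayscale image: black background with a white vertical stroke
image = np.zeros((28, 28), dtype=np.uint8)
image[5:23, 13:15] = 255

# Flatten row by row into a single feature vector of length 784
features = image.reshape(-1)
print(features.shape)

# The pixel at (row=5, col=13) lands at index 5*28+13
index = 5 * 28 + 13
print(features[index])

# Reshaping back recovers the original image
restored = features.reshape(28, 28)
print(np.array_equal(restored, image))
```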

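Before training the MLP neural network we need some preprocessing. The sample features are grayscale values from 0 to 255; to make them better suited to modeling, we divide every feature value by 255 so that all values lie between 0 and 1, and then use the familiar train_test_split function to split the dataset into a training set and a test set.

# Build the training and test datasets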
X=mnist.data/255.
y=mnist.target
X_train,X_test,y_train,y_test=train_test_split(X,y,train_size=5000,test_size=1000,random_state=62)

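To limit the training time, we use only 5000 samples as the training set and 1000 samples as the test set. To select the same data on every run, we set random_state to 62.

2. Training the MLP neural network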

# Set up a neural network with two hidden layers of 100 nodes each
mlp_hw=MLPClassifier(solver='lbfgs',hidden_layer_sizes=[100,100],activation='relu',alpha=1e-5,random_state=62)

# Train the neural network model on the data
mlp_hw.fit(X_train,y_train)

print('Test set score: {:.2f}%'.format(mlp_hw.score(X_test,y_test)*100))

Test set score: 93.60%
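If MNIST cannot be downloaded, the same workflow can be tried offline on scikit-learn's bundled load_digits dataset. This is a stand-in chosen for this sketch, not the data used above: it holds 8x8 images with pixel values from 0 to 16, so the scaling constant and the resulting score differ from the MNIST run. The variable names with the `_small` and `s_` parts are illustrative.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

digits = load_digits()
X_small = digits.data / 16.0    # scale the 0-16 pixel values into [0, 1]
y_small = digits.target

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_small, y_small, random_state=62)

# Same architecture and settings as mlp_hw above, on the smaller images
mlp_small = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[100, 100],
                          activation='relu', alpha=1e-5, random_state=62)
mlp_small.fit(Xs_train, ys_train)

small_score = mlp_small.score(Xs_test, ys_test)
print('Test set score: {:.2f}%'.format(small_score * 100))
```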

3. Recognizing a digit with the model

Note that because the image is only 28x28 pixels, it looks blurry when enlarged.

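# Import the image processing tool
from PIL import Image
# Open the image and convert it to 32-bit floating point grayscale
image=Image.open('8.png').convert('F')

# Resize the image to 28x28 pixels
image=image.resize((28,28))
arr=[]
# Use the image pixels as the features of the data point to predict;
# 1.0 - value/255 inverts the image so that dark ink becomes a high value
for i in range(28):
    for j in range(28):
        pixel=1.0-float(image.getpixel((j,i)))/255.
        arr.append(pixel)
        
# There is only one sample, so reshape it into a single row
arr1=np.array(arr).reshape(1,-1)

# Recognize the digit in the image
print("The digit in the image is: {:.0f}".format(mlp_hw.predict(arr1)[0]))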
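The pixel loop above can also be written as a single NumPy expression. The sketch below checks that the two forms agree, using a randomly generated image in place of '8.png' so that it runs without any file; the names `fake`, `loop_pixels`, and `vectorized` are illustrative.

```python
import numpy as np
from PIL import Image

# A random 56x56 grayscale image stands in for the file read from disk
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(56, 56), dtype=np.uint8)
image = Image.fromarray(fake).convert('F').resize((28, 28))

# Loop version, as in the code above: i is the row (y), j is the column (x)
loop_pixels = [1.0 - float(image.getpixel((j, i))) / 255.
               for i in range(28) for j in range(28)]
arr_loop = np.array(loop_pixels).reshape(1, -1)

# Vectorized version: the image converts to a (28, 28) array in the same row order
vectorized = (1.0 - np.asarray(image, dtype=np.float64) / 255.).reshape(1, -1)

print(arr_loop.shape, vectorized.shape)
print(np.allclose(arr_loop, vectorized))
```

Either array can be passed to `predict`; the vectorized form simply avoids 784 separate `getpixel` calls.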