Multilayer Perceptron
A multilayer perceptron (MLP) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors. An MLP can be viewed as a directed graph made up of multiple layers of nodes, with each layer fully connected to the next. Except for the input nodes, every node is a neuron (or processing unit) with a nonlinear activation function. A supervised learning method called backpropagation is commonly used to train an MLP. The MLP generalizes the perceptron and overcomes the perceptron's inability to classify data that is not linearly separable.
Preface
This post introduces the structure of the multilayer perceptron and implements the network's initialization and training in code. It is a casual note, so the writing is a bit rough.
Model Structure
A previous post covered the perceptron, which can be seen as a single neuron. The multilayer perceptron is an extension and development of it.
First, the simplest possible MLP.
The input vector is x, and the stimulus it produces is z = wx + b, where
w is the weight vector, and
b is the bias (the bias often gets made to sound mysterious online, but this is really all it is).
The activation function here is the simplest choice, the sigmoid (though many others can be used).
Simple enough: every line in the model diagram is one z = wx + b, so the diagram needs two w vectors and two b vectors (just count the lines).
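To make z = wx + b and the sigmoid activation concrete, here is a minimal sketch of a single neuron with made-up scalar values for x, w, and b:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical single neuron: scalar input, scalar weight and bias
x = 0.5
w = 2.0
b = -1.0

z = w * x + b     # the "stimulus": 2.0 * 0.5 - 1.0 = 0.0
y = sigmoid(z)    # the activation: sigmoid(0) = 0.5
print(z, y)
```

With a whole layer of neurons, w becomes a matrix and b a vector, but each line of the diagram is still exactly this computation.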
A slightly more complex one.
Note: by convention the input layer is not counted as a layer of the network. Also, a single neuron outputs a single real number, so we can treat the whole input layer as one vector X. (It is not that one neuron produces the whole X vector; one neuron only produces a single x1.)
As the figure shows: the input layer has three values, which we can view as x1, x2, x3 making up the vector X.
The first hidden neuron's "stimulus" is w1·X plus its component of b1, which the activation function turns into y1 (input layer -> hidden layer).
Likewise, the "stimulus" w2·X plus the second component of b1 is turned into y2.
So we can draw the following conclusions:
w1 and w2 stack into a single 2×3 matrix;
b1 is a 2×1 matrix.
Similarly:
the "stimulus" w3·Y + b2 gives the final result Z (hidden layer -> output layer);
w3 is a 1×2 matrix;
b2 is a 1×1 matrix.
To summarize:
initializing this network takes two W matrices and two b matrices;
each weight matrix W has as many rows as there are neurons at the ends of its connections (output units)
and as many columns as there are units at their starts (input units);
each bias matrix b has as many rows as there are output units,
and one column.
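This shape rule (each W is out_units × in_units, each b is out_units × 1) can be checked with a few lines of NumPy; the layer sizes [3, 2, 1] below match the figure and are otherwise arbitrary:

```python
import numpy as np

sizes = [3, 2, 1]  # 3 inputs, 2 hidden units, 1 output unit

# one weight matrix and one bias vector per connection between layers
weights = [np.random.randn(ch_out, ch_in)
           for ch_in, ch_out in zip(sizes[:-1], sizes[1:])]
biases = [np.random.randn(ch_out, 1) for ch_out in sizes[1:]]

print([w.shape for w in weights])  # [(2, 3), (1, 2)]
print([b.shape for b in biases])   # [(2, 1), (1, 1)]
```

The same zip over adjacent layer sizes is what the initialization code below uses, just with [784, 30, 10] instead.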
Initialization Code
import random

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of the sigmoid: s(z) * (1 - s(z))
    return sigmoid(z) * (1 - sigmoid(z))
class MLP_np:

    def __init__(self, sizes):
        """
        :param sizes: [784, 30, 10]
        """
        self.sizes = sizes
        self.num_layers = len(sizes) - 1
        # sizes: [784, 30, 10]
        # w: [ch_out, ch_in]
        # b: [ch_out, 1]
        self.weights = [np.random.randn(ch2, ch1)
                        for ch1, ch2 in zip(sizes[:-1], sizes[1:])]  # [30, 784], [10, 30]
        self.biases = [np.random.randn(ch, 1) for ch in sizes[1:]]
    def forward(self, x):
        """
        :param x: [784, 1]
        :return: [10, 1]
        """
        for b, w in zip(self.biases, self.weights):
            # [30, 784] @ [784, 1] => [30, 1], + [30, 1] => [30, 1], i.e. z = w@x + b
            z = np.dot(w, x) + b
            x = sigmoid(z)
        return x
    def backprop(self, x, y):
        """
        :param x: [784, 1]
        :param y: [10, 1], one-hot encoding
        :return:
        """
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        # 1. forward
        # save the activations of every layer
        activations = [x]
        # save z for every layer
        zs = []
        activation = x
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            activation = sigmoid(z)
            zs.append(z)
            activations.append(activation)
        loss = np.power(activations[-1] - y, 2).sum()
        # 2. backward
        # 2.1 compute the gradient on the output layer
        # delta = (a - y) * sigmoid'(z), with sigmoid'(z) = a * (1 - a)
        delta = activations[-1] * (1 - activations[-1]) * (activations[-1] - y)
        # [10, 1] with [10, 1] => [10, 1]
        nabla_b[-1] = delta
        # [10, 1] @ [1, 30] => [10, 30]
        # activations[-2]: [30, 1]
        nabla_w[-1] = np.dot(delta, activations[-2].T)
        # 2.2 compute the hidden-layer gradients
        for l in range(2, self.num_layers + 1):
            l = -l
            z = zs[l]
            a = activations[l]
            # [10, 30].T @ [10, 1] => [30, 1]
            delta = np.dot(self.weights[l + 1].T, delta) * a * (1 - a)
            nabla_b[l] = delta
            # [30, 1] @ [784, 1].T => [30, 784]
            nabla_w[l] = np.dot(delta, activations[l - 1].T)
        return nabla_w, nabla_b, loss
    def train(self, training_data, epochs, batchsz, lr, test_data=None):
        """
        :param training_data: list of (x, y)
        :param epochs: 1000
        :param batchsz: 10
        :param lr: 0.01
        :param test_data: list of (x, y)
        :return:
        """
        if test_data:
            n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k + batchsz]
                for k in range(0, n, batchsz)]
            # run one update for every batch in the shuffled data
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, lr)
            if test_data:
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))
    def update_mini_batch(self, batch, lr):
        """
        :param batch: list of (x, y)
        :param lr: 0.01
        """
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        loss = 0
        # accumulate the gradient of every sample in the current batch
        for x, y in batch:
            # list of per-layer w/b gradients, e.g. [w1_grad, w2_grad]
            nabla_w_, nabla_b_, loss_ = self.backprop(x, y)
            nabla_w = [accu + cur for accu, cur in zip(nabla_w, nabla_w_)]
            nabla_b = [accu + cur for accu, cur in zip(nabla_b, nabla_b_)]
            loss += loss_
        # average over the batch
        nabla_w = [w / len(batch) for w in nabla_w]
        nabla_b = [b / len(batch) for b in nabla_b]
        loss = loss / len(batch)
        # gradient descent step: w = w - lr * nabla_w
        self.weights = [w - lr * nabla for w, nabla in zip(self.weights, nabla_w)]
        self.biases = [b - lr * nabla for b, nabla in zip(self.biases, nabla_b)]
        return loss
    def evaluate(self, test_data):
        """
        :param test_data: list of (x, y), where y is the integer label
        :return: number of correctly classified samples
        """
        result = [(np.argmax(self.forward(x)), y) for x, y in test_data]
        correct = sum(int(pred == y) for pred, y in result)
        return correct
def main():
    import mnist_loader
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
    print(len(training_data), training_data[0][0].shape, training_data[0][1].shape)
    print(len(test_data), test_data[0][0].shape, test_data[0][1].shape)
    net = MLP_np([784, 30, 10])
    net.train(training_data, 1000, 10, 0.1, test_data=test_data)

if __name__ == '__main__':
    main()
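The output-layer delta used in backprop, a * (1 - a) * (a - y), is the gradient of the quadratic loss L = 0.5 * sum((a - y)^2) with respect to z (note the 1/2 factor, which the loss value that backprop reports omits). A self-contained sketch with made-up values can confirm this against a numerical finite-difference gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(z, y):
    # quadratic loss with a 1/2 factor so the gradient carries no stray 2
    a = sigmoid(z)
    return 0.5 * np.sum((a - y) ** 2)

# hypothetical output layer with 3 units
rng = np.random.default_rng(0)
z = rng.standard_normal((3, 1))
y = np.array([[1.0], [0.0], [0.0]])  # one-hot target

# analytic gradient, exactly the delta formula from backprop
a = sigmoid(z)
analytic = a * (1 - a) * (a - y)

# numerical gradient by central finite differences
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (loss(zp, y) - loss(zm, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be vanishingly small
```

The same finite-difference trick is a handy sanity check for any hand-derived gradient, layer by layer.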