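Artificial Neural Networks

Original post, 2017-05-24 23:45:16

I. The neural network model:


Figure 1. Two-layer fully connected neural network model

    This is a neural network with two fully connected layers. By convention, the input layer is not counted when stating the number of layers.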

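1. Neurons:


Figure 2. Mathematical model of a neuron

    Taken alone, each neuron can be viewed as a perceptron that makes a decision. As the figure shows, it forms a linear combination of its inputs and passes the result through a function f to produce a prediction. With the Sigmoid function, for example, an output above 0.5 can be classified as positive and an output below 0.5 as negative. What does this resemble? Logistic regression! Here is an example of a neuron:


Figure 3. Example of a neuron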
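
    The decision rule described above can be sketched in a few lines. This is only a minimal illustration: the weights, bias, and inputs are hypothetical values that I picked for the example, not the ones in the figure.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_predict(x, w, b):
    """Single neuron: weighted sum of the inputs, then Sigmoid, then a 0.5 threshold."""
    p = sigmoid(np.dot(w, x) + b)
    return p, int(p > 0.5)

# Hypothetical weights, bias, and inputs, chosen only for illustration.
w = np.array([2.0, -1.0])
b = -0.5
p_pos, label_pos = neuron_predict(np.array([1.0, 0.5]), w, b)  # z = 2.0 - 0.5 - 0.5 = 1.0
p_neg, label_neg = neuron_predict(np.array([0.0, 1.0]), w, b)  # z = -1.0 - 0.5 = -1.5
print(p_pos, label_pos)
print(p_neg, label_neg)
```

    The first input lands on the positive side of the threshold and the second on the negative side, which is exactly what a logistic regression classifier with the same parameters would do.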

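2. Activation functions

    There are many kinds of activation functions, such as linear, Sigmoid, tanh, and ReLU. This post covers only three of them: Sigmoid, tanh, and ReLU.


Figure 4. The Sigmoid and tanh functions

    Sigmoid and tanh share a common problem: when the input is large in magnitude (the two directions indicated in the figure), the gradient vanishes during backpropagation. When a neuron's activation is close to 0 or 1 (for tanh, close to -1 or 1), the local gradient is almost zero, so the gradient gets "killed" on its way back. This is easy to see: in those two directions the curve becomes flatter and flatter, so its derivative is zero or very close to zero. The backpropagated gradient is then (nearly) zero as well and the weights barely get updated, which is why these two functions are no longer commonly used in neural networks. Between the two, tanh is preferred over Sigmoid because its output is zero-centered.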
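
    To make the saturation concrete, here is a small sketch that evaluates the derivatives of both functions at a few inputs. The sample points are my own choice; the derivative formulas are the standard ones, s(z)(1 - s(z)) for Sigmoid and 1 - tanh(z)^2 for tanh.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the Sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2, at most 1."""
    return 1.0 - np.tanh(z) ** 2

# Sample points chosen for illustration; larger z means deeper saturation.
zs = np.array([0.0, 2.0, 5.0, 10.0])
sig_grads = sigmoid_grad(zs)
tanh_grads = tanh_grad(zs)
for z, gs, gt in zip(zs, sig_grads, tanh_grads):
    print("z = %5.1f  sigmoid' = %.2e  tanh' = %.2e" % (z, gs, gt))
```

    Both derivatives shrink quickly as z grows, so a gradient that is multiplied by them on the way back becomes tiny.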


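Figure 5. The ReLU function

    The pros and cons of ReLU are noted in the figure. As for the drawback, my understanding is this: when a very large gradient flows through a neuron, its weights can be pushed to large negative values, so that afterwards the neuron's input (the pre-activation) is negative for every sample and ReLU outputs 0. Because the derivative of ReLU is also 0 for negative inputs, no gradient flows back through the neuron, its weights stop being updated, and the input never becomes positive again. The neuron is then irreversibly dead.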
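
    The following sketch reproduces a dying ReLU on one neuron. Everything in it is an assumption made for illustration: the inputs are random non-negative numbers, the upstream gradient is fixed to 1, and the step size of 100 is deliberately far too large.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 3)              # hypothetical non-negative inputs in [0, 1)

w = np.array([0.5, 0.5, 0.5])     # hypothetical starting weights
b = 0.0

def relu_neuron_grad(X, w, b, upstream):
    """Return (output, dL/dw) for one ReLU neuron given the upstream gradient dL/dout."""
    z = X.dot(w) + b
    out = np.maximum(0, z)
    dz = upstream * (z > 0)       # ReLU passes gradient only where z > 0
    return out, X.T.dot(dz)

upstream = np.ones(100)
_, grad_before = relu_neuron_grad(X, w, b, upstream)

# One oversized update drives every weight far below zero.
w_dead = w - 100.0 * grad_before
out_dead, grad_dead = relu_neuron_grad(X, w_dead, b, upstream)

print("weights after the step:", w_dead)
print("nonzero outputs:", np.count_nonzero(out_dead))
print("gradient after the step:", grad_dead)
```

    After the oversized step the neuron outputs 0 for every sample and its weight gradient is exactly zero, so later updates cannot revive it.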

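With that, we can form a picture of neural networks: a network defines a family of functions, and its weights are the parameters of those functions. Given enough hidden units, a neural network can approximate any continuous function on a compact domain to arbitrary accuracy (this is the universal approximation theorem, which has been proven).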

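II. Forward propagation and backpropagation:

    Forward propagation is simple. Consider the following example:


Figure 6. Example neural network

    The blue numbers are the values of the weights and biases. Our goal is for the network to output 0.5, or something very close to it.

    The input to hidden neuron h1 is 0.15 * 0.05 + 0.2 * 0.1 + 0.35 = 0.3775 ≈ 0.38, and its output is f(0.38) = 0.59, where f is the Sigmoid function. The input and output of h2 are computed the same way, which gives the figure below:


Figure 7. Forward propagation example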
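
    The arithmetic for h1 can be checked with a few lines of Python. Only the numbers quoted in the text are used here; the remaining weights of the network appear only in the figure, so the sketch stops at h1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The three terms quoted in the text for hidden neuron h1.
net_h1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35
out_h1 = sigmoid(net_h1)
print("net_h1 = %.4f, out_h1 = %.4f" % (net_h1, out_h1))
```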

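    The output is 0.63, which is not the 0.5 we want, or at least not very close to it, so we need to update the weights and biases. Training a neural network means training its weights and biases so that the network outputs what we want. How do we update the weights? With backpropagation, which is really just a series of derivatives combined through the chain rule. This post does not go through the theoretical derivation; for that, see:

http://blog.csdn.net/hearthougan/article/details/55812728

    I would not recommend only reading the derivation, either. Working through a small example by hand gives a much firmer understanding than following the theory alone. Here I only sketch how backpropagation proceeds. Suppose we have a loss function and the current loss, and we want to update w5, w6, and b3. To do so we need the gradient of each of them; the figure below shows the computation for w5:


Figure 8. Backpropagation example

    Once w5 is clear, the others follow the same way. With a learning rate of 0.5, the weight is then updated by gradient descent: w5_new = w5 - 0.5 * dL/dw5, where dL/dw5 is the gradient of the loss with respect to w5.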
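
    As a rough sketch of the chain rule for w5, the code below computes the gradient analytically and checks it against a finite difference. Several things here are assumptions rather than values from the figure: I assume a squared-error loss and a Sigmoid output neuron, and h2, w5, w6, and b3 are hypothetical numbers. Only h1 = 0.59, the target 0.5, and the learning rate 0.5 come from the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w5, w6, b3, h1, h2):
    """Output neuron: Sigmoid of the weighted sum of the two hidden outputs."""
    return sigmoid(w5 * h1 + w6 * h2 + b3)

def loss(o, target):
    return 0.5 * (target - o) ** 2    # assumed squared-error loss

# h1, target and lr come from the text; h2, w5, w6, b3 are hypothetical.
h1, h2 = 0.59, 0.60
w5, w6, b3 = 0.40, 0.45, 0.10
target, lr = 0.5, 0.5

o = forward(w5, w6, b3, h1, h2)

# Chain rule: dL/dw5 = dL/do * do/dnet * dnet/dw5
dL_do = o - target
do_dnet = o * (1.0 - o)
dnet_dw5 = h1
grad_w5 = dL_do * do_dnet * dnet_dw5

# Central-difference check of the analytic gradient.
eps = 1e-6
numeric = (loss(forward(w5 + eps, w6, b3, h1, h2), target)
           - loss(forward(w5 - eps, w6, b3, h1, h2), target)) / (2 * eps)

# Gradient descent step with the learning rate from the text.
w5_new = w5 - lr * grad_w5
print("analytic %.8f, numeric %.8f, updated w5 %.6f" % (grad_w5, numeric, w5_new))
```

    The same three-factor product, with h2 or 1 in place of h1, gives the gradients for w6 and b3.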


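III. Experimental results on the CIFAR10 and MNIST datasets

    To see how well the network works, I tested it on both CIFAR10 and MNIST. The accuracy is 97.2% on MNIST but only 52% on CIFAR10. Even so, this is considerably higher than what SVM and Softmax classifiers achieve. I believe the accuracy on CIFAR10 is lower because its images have more complex backgrounds and three channels, whereas MNIST images have simple backgrounds and a single channel.

The experiment ran 75 hyperparameter trials in total, with three hidden-layer sizes, [50, 75, 100]. For each hidden-layer size, the gifs below show the first-layer weights W1, the loss, and the accuracy on the training and validation sets over the course of the search across learning rates and regularization strengths: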

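1. CIFAR10:


Figure 9. W1 for 50 hidden neurons, with the loss and accuracy during the search


Figure 10. W1 for 75 hidden neurons, with the loss and accuracy during the search


Figure 11. W1 for 100 hidden neurons, with the loss and accuracy during the search

    After the full search on CIFAR10, the best W1 and the corresponding loss and accuracy are:


Figure 12. Best parameters

    With the best parameters, the accuracy on the test set is 52%.


2. MNIST:

    (Sorry) The MNIST gifs exceed 2 MB and cannot be uploaded, so here is only a static image of the best parameters:


    The accuracy on the test set is 97.2%.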

Fully connected network code:

# -*- coding: utf-8 -*-
"""
Created on Fri May 19 23:23:42 2017

@author: Abner
"""

import numpy as np
import matplotlib.pyplot as plt


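class TwoLayerNet(object):
    """
    A two-layer fully connected neural network. The input layer has D units,
    the hidden layer has H units, and the output layer has C units. The network
    is trained with a Softmax loss and L2 regularization, and the first fully
    connected layer uses a ReLU activation.

    The network structure is:
    input - fully connected layer (hidden layer) - ReLU - fully connected layer (output layer) - Softmax

    The second fully connected layer is the output layer; its outputs are the
    scores for each class.
    """
    def __init__(self, input_size, hidden_size, output_size, std = 1e-4):
        """
        Initialize the model.
        Weights are initialized to small random values and biases to zero.
        Both are stored in self.params, a dictionary with the keys:

        W1: weights of the first fully connected layer, shape (D, H)
        b1: biases of the first layer, shape (H,)
        W2: weights of the second fully connected layer, shape (H, C)
        b2: biases of the second layer, shape (C,)

        input_size: dimension of the input, D
        hidden_size: number of hidden neurons, H
        output_size: number of output classes, C

        """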
        self.params = {}
        # randn draws zero-mean Gaussian values, so the small weights are centered on zero
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)
    
    
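    def loss(self, X, y = None, reg = 0.0):
        """
        Compute the loss and gradients of the two-layer fully connected network.

        Inputs:
        X: input data of shape (N, D); each X[i] is one sample
        y: training labels; y[i] is the label of X[i]. This argument is optional:
        if y is not given, the method returns only the scores; if it is given,
        the method returns the loss and the gradients.
        reg: regularization strength

        Returns:
        If y is None, a scores matrix of shape (N, C), where scores[i, c] is the
        score of class c for sample X[i].

        If y is not None, a tuple of:
        - loss: loss for this batch of samples (data loss plus regularization loss)
        - grads: a dictionary mapping each weight (or bias) name to its gradient

        """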
    
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape
        
        # Forward pass
        scores = None
        
        f = lambda x : np.maximum(0, x)
        h1 = f(np.dot(X, W1) + b1)
        h2 = np.dot(h1, W2) + b2
        
        scores = h2
        
        if y is None:
            return scores
        
        loss = None
        shift_scores = scores - np.max(scores, axis = 1).reshape(-1, 1)
        softmax_output = np.exp(shift_scores)/np.sum(np.exp(shift_scores), axis = 1).reshape(-1, 1)
        loss = -np.sum(np.log(softmax_output[range(N), list(y)]))
        loss /= N
        loss += reg * 0.5 * (np.sum(W1 * W1) + np.sum(W2 * W2))
        
        # Backward pass: compute the gradients
        grads = {}
        """
        Differentiate the loss with respect to each weight and bias and store the
        results in a dictionary. For example, grads['W1'] holds the gradient of W1
        and has the same shape as W1; grads['b1'] has the same shape as b1.
        """
        # Gradient of the loss with respect to the scores
        descores = softmax_output.copy()  # (N, C)
        descores[range(N), list(y)] -= 1  # (N, C)
        grads['W2'] = 1.0/N * h1.T.dot(descores) + reg * W2  # (H, C)
        grads['b2'] = 1.0/N * np.sum(descores, axis = 0)  # (C,)

        dh1 = descores.dot(W2.T)  # (N, H)
        dh1_ReLU = (X.dot(W1) + b1 > 0) * dh1  # (N, H)
        grads['W1'] = 1.0/N * X.T.dot(dh1_ReLU) + reg * W1  # (D, H)
        grads['b1'] = 1.0/N * np.sum(dh1_ReLU, axis = 0)  # (H,)
        
        return loss, grads
    
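    def train(self, X, y, X_val, y_val, learning_rate = 1e-3,
              learning_rate_decay = 0.95, reg = 1e-5, num_iters = 100,
              batch_size = 200, verbose = False):
        """
        Train the network with stochastic gradient descent.
        Inputs:
        X: training data, shape (N, D)
        y: training labels, shape (N,)
        X_val: validation data, shape (N_val, D)
        y_val: validation labels, shape (N_val,)
        learning_rate: step size used for optimization
        learning_rate_decay: factor by which the learning rate is multiplied after each epoch
        reg: regularization strength
        num_iters: number of optimization iterations
        batch_size: number of samples per minibatch
        verbose: boolean; if True, print progress during optimization
        """

        num_train = X.shape[0]
        # Integer division keeps the per-epoch modulo test below exact on Python 3 as well
        iterations_per_epoch = max(num_train // batch_size, 1)

        # Use stochastic gradient descent (SGD) to optimize the parameters in self.params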
        loss_history = []
        train_acc_history = []
        val_acc_history = []
        
        for it in range(num_iters):
            X_batch = None
            y_batch = None

            # Draw a random minibatch of training data and labels (sampled with
            # replacement) and store it in X_batch and y_batch

            idx = np.random.choice(num_train, batch_size, replace = True)
            X_batch = X[idx]
            y_batch = y[idx]

            # Compute the loss and gradients on the current minibatch
            loss, grads = self.loss(X_batch, y = y_batch, reg = reg)
            loss_history.append(loss)

            # Update the parameters in self.params using the gradients in grads
            self.params['W1'] = self.params['W1'] - learning_rate*grads['W1']
            self.params['W2'] = self.params['W2'] - learning_rate*grads['W2']
            self.params['b1'] = self.params['b1'] - learning_rate*grads['b1']
            self.params['b2'] = self.params['b2'] - learning_rate*grads['b2']

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

            # Once per epoch, check the training and validation accuracy and decay the learning rate
            if it % iterations_per_epoch == 0:
                # Check accuracy (training accuracy is measured on the current minibatch)
                train_acc = (self.predict(X_batch) == y_batch).mean()
                val_acc = (self.predict(X_val) == y_val).mean()
                train_acc_history.append(train_acc)
                val_acc_history.append(val_acc)

                # Decay the learning rate
                learning_rate *= learning_rate_decay
        return {
            'loss_history':loss_history,
            'train_acc_history':train_acc_history,
            'val_acc_history':val_acc_history,
        }
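    def predict(self, X):
        """
        Use the trained weights of the two-layer fully connected network to predict
        labels. For each sample we compute the scores of the C classes and assign
        the label with the highest score.

        Inputs:
        - X: input data, shape (N, D)
        Returns:
        - y_pred: predicted labels for the given data
        """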
        
        y_pred = None
        
        f = lambda x: np.maximum(0, x)
        h1 = f(np.dot(X, self.params['W1']) + self.params['b1'])
        h2 = np.dot(h1, self.params['W2']) + self.params['b2']
        scores = h2
        
        y_pred = np.argmax(scores, axis = 1)
        
        return y_pred
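
When writing a loss method like the one above, it helps to verify the gradient numerically. The snippet below is a standalone sketch rather than part of Fullc_NN: `softmax_loss` is a helper I define here for illustration, and the scores and labels are random toy data. It checks that the gradient of the mean Softmax loss with respect to the scores is (probabilities - one-hot labels) / N.

```python
import numpy as np

def softmax_loss(scores, y):
    """Mean cross-entropy loss and its gradient w.r.t. the scores, shape (N, C)."""
    N = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)   # for numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1                          # subtract the one-hot labels
    return loss, dscores / N

rng = np.random.RandomState(0)
scores = rng.randn(4, 3)          # toy scores: 4 samples, 3 classes
y = np.array([0, 2, 1, 2])        # toy labels

loss, analytic = softmax_loss(scores, y)

# Central-difference estimate of the same gradient, one entry at a time.
numeric = np.zeros_like(scores)
eps = 1e-5
for idx in np.ndindex(*scores.shape):
    old = scores[idx]
    scores[idx] = old + eps
    loss_plus, _ = softmax_loss(scores, y)
    scores[idx] = old - eps
    loss_minus, _ = softmax_loss(scores, y)
    scores[idx] = old
    numeric[idx] = (loss_plus - loss_minus) / (2 * eps)

max_err = np.abs(analytic - numeric).max()
print("max abs difference:", max_err)
```

The same finite-difference idea can be applied to W1, b1, W2, and b2 of the network by perturbing one parameter entry at a time and comparing against the corresponding entry in grads.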
        
        
Loading the CIFAR10 dataset:

# -*- coding: utf-8 -*-
"""
Created on Sun May  7 19:32:30 2017

@author: admin
"""
import numpy as np
import pickle
import os

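def Load_CIFAR_Batch(filename):
    with open(filename, 'rb') as f:
        # The CIFAR-10 batches were pickled under Python 2; encoding='latin1'
        # lets Python 3 read them (drop the argument when running on Python 2)
        datadict = pickle.load(f, encoding='latin1')
        X = datadict['data']
        Y = datadict['labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype('float')  # (10000, 32, 32, 3)
        Y = np.array(Y)
        return X, Y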

def Load_CIFAR10(Root):
    xs = []
    ys = []
    
    for b in range(1,  6):
        f = os.path.join(Root, 'data_batch_%d'%(b, ))
        X, Y = Load_CIFAR_Batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    
    del X, Y
    Xte, Yte = Load_CIFAR_Batch(os.path.join(Root, 'test_batch'))
    return Xtr, Ytr, Xte, Yte
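
The reshape and transpose in Load_CIFAR_Batch are easy to get wrong, so here is a small sketch of what they do. It uses a fake two-image batch instead of the real files and relies on the CIFAR-10 row layout, in which each 3072-value row stores 1024 red values, then 1024 green, then 1024 blue.

```python
import numpy as np

# A fake "batch" of 2 images in the CIFAR-10 row layout (R plane, G plane, B plane).
fake = np.zeros((2, 3072), dtype=np.uint8)
fake[0, :1024] = 255          # image 0: pure red
fake[1, 2048:] = 255          # image 1: pure blue

# (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3): channels move to the last axis.
images = fake.reshape(2, 3, 32, 32).transpose(0, 2, 3, 1).astype('float')

print(images.shape)
print(images[0, 0, 0], images[1, 0, 0])   # RGB value of the top-left pixel of each image
```

After the transpose every pixel carries its three channel values together, which is the layout plt.imshow expects.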
Loading the MNIST dataset:

# -*- coding: utf-8 -*-
"""
Created on Sun May 21 16:58:49 2017

@author: Abner
"""

'''
    Parse binary files with Python
'''
import numpy as np
import struct



def Load_MNIST(Data_dir,label_dir):
    """
    Read the image data
    """
    binfile = open(Data_dir, 'rb') # open the binary file
    buffers = binfile.read()

    head = struct.unpack_from('>IIII', buffers, 0) # read the first 4 integers; returns a tuple

    offset = struct.calcsize('>IIII')  # position where the pixel data starts
    imgNum = head[1]
    width = head[2]
    height = head[3]

    bits = imgNum * width * height  # the training set has 60000*28*28 pixel values
    bitsString = '>' + str(bits) + 'B'  # format string, e.g. '>47040000B'

    imgs = struct.unpack_from(bitsString, buffers, offset) # read the pixel data; returns a tuple

    binfile.close()
    imgs = np.reshape(imgs, [imgNum, width * height]) # reshape to an array of shape [60000, 784]
    """
    Read the labels
    """
    binfile = open(label_dir, 'rb') # open the binary file
    buffers = binfile.read()

    head = struct.unpack_from('>II', buffers, 0) # read the first 2 integers of the label file

    labelNum = head[1]
    offset = struct.calcsize('>II')  # position where the label data starts

    numString = '>' + str(labelNum) + "B" # format string, e.g. '>60000B'
    labels = struct.unpack_from(numString, buffers, offset) # read the label data

    binfile.close()
    labels = np.reshape(labels, [labelNum]) # convert to a one-dimensional array
    
    return imgs, labels
    
def Load_MNIST_Data():
    train_path_lable_dir = 'MNIST\\train-labels.idx1-ubyte'
    train_path_Data_dir = 'MNIST\\train-images.idx3-ubyte'
    test_path_Data_dir = 'MNIST\\t10k-images.idx3-ubyte'
    test_path_lable_dir = 'MNIST\\t10k-labels.idx1-ubyte'
    
    Xtr, ytr = Load_MNIST(train_path_Data_dir ,train_path_lable_dir)
    Xte, yte = Load_MNIST(test_path_Data_dir, test_path_lable_dir)
    
    return Xtr, ytr, Xte, yte
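
The header parsing in Load_MNIST can be tried out without the real files. The sketch below builds a tiny synthetic IDX3 buffer in memory and parses it; `parse_idx_images` is a hypothetical helper written for this example, and it uses np.frombuffer as an alternative to unpacking every pixel with struct, which avoids creating a very large tuple.

```python
import struct
import numpy as np

def parse_idx_images(buf):
    """Parse an IDX3 buffer: big-endian header (magic, count, rows, cols), then uint8 pixels."""
    magic, num, rows, cols = struct.unpack_from('>IIII', buf, 0)
    offset = struct.calcsize('>IIII')
    pixels = np.frombuffer(buf, dtype=np.uint8, count=num * rows * cols, offset=offset)
    return magic, pixels.reshape(num, rows * cols)

# Synthetic buffer: 2 images of 2x3 pixels with pixel values 0..11.
# 2051 is the magic number of MNIST image files.
header = struct.pack('>IIII', 2051, 2, 2, 3)
body = bytes(bytearray(range(12)))
magic, imgs = parse_idx_images(header + body)

print(magic, imgs.shape)
print(imgs)
```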

"""
if __name__ == "__main__":
    Xtr, ytr, Xte, yte = Load_MNIST_Data()

    print("Xtr: ", Xtr.shape)
    print("ytr: ", ytr.shape)

    print('----------divider-----------')

    print("Xte: ", Xte.shape)
    print("yte: ", yte.shape)
"""

Weight visualization:

# -*- coding: utf-8 -*-
"""
Created on Sun May 21 19:08:24 2017

@author: Abner
"""

from math import sqrt, ceil
import numpy as np

def visualize_grid(Xs, ubound=255.0, padding=1):
  """
  Reshape a 4D tensor of image data to a grid for easy visualization.

  Inputs:
  - Xs: Data of shape (N, H, W, C)
  - ubound: Output grid will have values scaled to the range [0, ubound]
  - padding: The number of blank pixels between elements of the grid
  """
  (N, H, W, C) = Xs.shape
  grid_size = int(ceil(sqrt(N)))
  grid_height = H * grid_size + padding * (grid_size - 1)
  grid_width = W * grid_size + padding * (grid_size - 1)
  grid = np.zeros((grid_height, grid_width, C))
  next_idx = 0
  y0, y1 = 0, H
  for y in range(grid_size):
    x0, x1 = 0, W
    for x in range(grid_size):
      if next_idx < N:
        img = Xs[next_idx]
        low, high = np.min(img), np.max(img)
        grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low)
        # grid[y0:y1, x0:x1] = Xs[next_idx]
        next_idx += 1
      x0 += W + padding
      x1 += W + padding
    y0 += H + padding
    y1 += H + padding
  return grid

def visualize_grid1(Xs, ubound=255.0, padding=1):
  """
  Reshape a 3D tensor of single-channel image data to a grid for easy visualization.

  Inputs:
  - Xs: Data of shape (N, H, W)
  - ubound: Output grid will have values scaled to the range [0, ubound]
  - padding: The number of blank pixels between elements of the grid
  """
  (N, H, W) = Xs.shape
  grid_size = int(ceil(sqrt(N)))
  grid_height = H * grid_size + padding * (grid_size - 1)
  grid_width = W * grid_size + padding * (grid_size - 1)
  grid = np.zeros((grid_height, grid_width))
  next_idx = 0
  y0, y1 = 0, H
  for y in range(grid_size):
    x0, x1 = 0, W
    for x in range(grid_size):
      if next_idx < N:
        img = Xs[next_idx]
        low, high = np.min(img), np.max(img)
        grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low)
        # grid[y0:y1, x0:x1] = Xs[next_idx]
        next_idx += 1
      x0 += W + padding
      x1 += W + padding
    y0 += H + padding
    y1 += H + padding
  return grid

def vis_grid(Xs):
  """ visualize a grid of images """
  (N, H, W, C) = Xs.shape
  A = int(ceil(sqrt(N)))
  G = np.ones((A*H+A, A*W+A, C), Xs.dtype)
  G *= np.min(Xs)
  n = 0
  for y in range(A):
    for x in range(A):
      if n < N:
        G[y*H+y:(y+1)*H+y, x*W+x:(x+1)*W+x, :] = Xs[n,:,:,:]
        n += 1
  # normalize to [0,1]
  maxg = G.max()
  ming = G.min()
  G = (G - ming)/(maxg-ming)
  return G
  
def vis_nn(rows):
  """ visualize array of arrays of images """
  N = len(rows)
  D = len(rows[0])
  H,W,C = rows[0][0].shape
  Xs = rows[0][0]
  G = np.ones((N*H+N, D*W+D, C), Xs.dtype)
  for y in range(N):
    for x in range(D):
      G[y*H+y:(y+1)*H+y, x*W+x:(x+1)*W+x, :] = rows[y][x]
  # normalize to [0,1]
  maxg = G.max()
  ming = G.min()
  G = (G - ming)/(maxg-ming)
  return G
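
To get a feel for the layout that visualize_grid and visualize_grid1 produce, the sketch below computes only the canvas size, using the same formula as those functions. `grid_shape` is a hypothetical helper for this example; the tile sizes and paddings match the MNIST and CIFAR10 calls used in the training script.

```python
from math import sqrt, ceil

def grid_shape(n_images, height, width, padding=1):
    """Canvas size of a square grid of tiles with `padding` blank pixels between tiles."""
    g = int(ceil(sqrt(n_images)))
    return (height * g + padding * (g - 1), width * g + padding * (g - 1))

# 50 hidden units, each reshaped to a 28x28 MNIST-sized tile, padding 1.
shape_50 = grid_shape(50, 28, 28, padding=1)
# 100 hidden units shown as 32x32 CIFAR10-sized tiles, padding 3.
shape_100 = grid_shape(100, 32, 32, padding=3)
print(shape_50, shape_100)
```

With 50 tiles the grid is 8 by 8, so the last 14 cells stay empty (zero) in the output image.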

 
Classification with the fully connected network:

# -*- coding: utf-8 -*-
"""
Created on Sat May 20 10:55:11 2017

@author: Abner
"""

import numpy as np
import matplotlib

# Select the non-interactive Agg backend before pyplot is imported
matplotlib.use('Agg')
import matplotlib.pyplot as plt

from vis_uitls import visualize_grid
from vis_uitls import visualize_grid1
from LoadData import Load_CIFAR10
from Load_MNIST import Load_MNIST_Data
from Fullc_NN import TwoLayerNet

#matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
#%load_ext autoreload
#%autoreload 2

# Load the MNIST dataset
##############################################################################
def get_MNIST_data(num_training=59000, num_validation=1000, num_test=1000):
    """
    Load the MNIST dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw MNIST data
    X_train, y_train, X_test, y_test = Load_MNIST_Data()
    X_train = X_train.reshape(60000, 28, 28)
    X_test = X_test.reshape(10000, 28, 28)
    print("Before reshape Data:")
    print("X_train: ", X_train.shape)
    print("y_train: ", y_train.shape)
    print("X_test: ", X_test.shape)
    print("y_test: ", y_test.shape)
        
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train = X_train - mean_image
    X_val = X_val - mean_image
    X_test = X_test - mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)


    return X_train, y_train, X_val, y_val, X_test, y_test
    
    ##########################################################################

'''
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cifar-10-batches-py'
    X_train, y_train, X_test, y_test = Load_CIFAR10(cifar10_dir)
    print("Before reshape Data:")
    print("X_train: ", X_train.shape)
    print("y_train: ", y_train.shape)
    print("X_test: ", X_test.shape)
    print("y_test: ", y_test.shape)
        
    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)


    return X_train, y_train, X_val, y_val, X_test, y_test

'''
# Invoke the above function to get our data.
#X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()

#MNIST
X_train, y_train, X_val, y_val, X_test, y_test = get_MNIST_data()

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)


'''
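##############################################################################
"""
Without hyperparameter search: the hidden layer is fixed at 50 neurons
"""

# The number of input neurons equals the data dimension; the hidden layer has 50 neurons and the output layer has 10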
#input_size = 32 * 32 * 3
input_size = 28*28
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.5, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)

plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
train_acc, = plt.plot(stats['train_acc_history'], label='train')
val_acc, = plt.plot(stats['val_acc_history'], label='val')
plt.legend([train_acc, val_acc], ['Training Accuracy', 'Validation Accuracy'], loc='lower right')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
#plt.show()
plt.savefig('E:\\MNIST\\loss.jpg')

plt.close()

#def show_net_weights(net):
#  W1 = net.params['W1']
#  W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
#  plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
#  plt.gca().axis('off')
##  plt.show()
#  plt.savefig('E:\\MNIST\\weight.jpg')

def show_net_weights(net):
  W1 = net.params['W1']
  W1 = W1.reshape(28, 28, -1).transpose(2, 0, 1)
  plt.imshow(visualize_grid1(W1, padding=1).astype('uint8'))
  plt.gca().axis('off')
#  plt.show()
  plt.savefig('E:\\MNIST\\weight.jpg')
show_net_weights(net)

##############################################################################
'''



best_net = None # store the best model into this

#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################

stats = {}
results = {}
best_val = -1
best_stats = None

#input_size = 32 * 32 * 3
input_size = 28*28
num_classes = 10

# hidden_sizes = (100 * np.random.rand(5)).round().astype(int)
# learning_rates = (5e-3 - 5e-5) * np.random.rand(5) + 5e-5
# regularization_strengths = np.random.rand(5)

# hidden_sizes = [50, 75, 100]
# learning_rates = [5e-4]
# regularization_strengths = [0.65, 0.75, 0.85]

# hidden_sizes = np.round(10 ** np.random.uniform(1.7,2.3,3)).astype(int)
#hidden_sizes = [100]

#CIFAR10
#def show_net_weights(net, k):
#  W1 = net.params['W1']
#  W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
#  plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
#  plt.gca().axis('off')
#  plt.savefig('E:\\NN\\%d times_W.jpg' % k)
#  plt.show()

def show_net_weights(net, k):
  W1 = net.params['W1']
  W1 = W1.reshape(28, 28, -1).transpose(2, 0, 1)
  plt.imshow(visualize_grid1(W1, padding=1).astype('uint8'))
  plt.gca().axis('off')
  plt.savefig('E:\\MNIST\\%d times_W1.jpg' % k)


hidden_sizes = [50, 75, 100]
learning_rates = 10 ** np.random.uniform(-3.5,-2.5,5)
regularization_strengths = 10 ** np.random.uniform(-5,1,5)

k = 0
for hidden_size in hidden_sizes:
    
    for learning_rate in learning_rates:

        for regularization_strength in regularization_strengths:
            
            # Print hyperparameters
            print('Size = %d, Learning rate = %e, Reg. Strength = %e' % (
                hidden_size, learning_rate, regularization_strength))

            # Initialize net
            net = TwoLayerNet(input_size, hidden_size, num_classes)

            # Training
            stats[hidden_size, learning_rate, regularization_strength] = \
                net.train(X_train, y_train, X_val, y_val,
                          num_iters=2000, batch_size=500,
                          learning_rate=learning_rate, learning_rate_decay=0.95,
                          reg=regularization_strength, verbose=True)

            # Testing
            y_train_pred = net.predict(X_train)
            y_val_pred = net.predict(X_val)

            # Evaluation
            train_num_correct = np.sum(y_train_pred == y_train)
            training_accuracy = float(train_num_correct) / X_train.shape[0]

            val_num_correct = np.sum(y_val_pred == y_val)
            validation_accuracy = float(val_num_correct) / X_val.shape[0]

            results[hidden_size, learning_rate, regularization_strength] = training_accuracy, validation_accuracy

            if validation_accuracy > best_val:
                best_val = validation_accuracy
                best_net = net
                best_stats = stats[hidden_size, learning_rate, regularization_strength]
                
            current_stats = stats[hidden_size, learning_rate, regularization_strength]
                
            # Plot the loss function and train / validation accuracies
            plt.subplot(2, 1, 1)
            plt.plot(current_stats['loss_history'])
            plt.title('Loss history')
            plt.xlabel('Iteration')
            plt.ylabel('Loss')
            
            plt.subplot(2, 1, 2)
            train_acc, = plt.plot(current_stats['train_acc_history'], label='train')
            val_acc, = plt.plot(current_stats['val_acc_history'], label='val')
            plt.legend([train_acc, val_acc], ['Training Accuracy', 'Validation Accuracy'], loc = 'lower right')
            plt.title('Classification accuracy history')
            plt.xlabel('Epoch')  # training and validation accuracy for each trial of the search
            plt.ylabel('Classification accuracy')
#            plt.show()
            k += 1
            plt.savefig('E:\\MNIST\\%d times_Loss.jpg' % (k))
            # visualize the weights of the current network
            plt.close()
            print("Weights for hyperparameter trial %d" % k)
            show_net_weights(net, k)

pass

# Print out results.
for hid_size, lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(hid_size, lr, reg)]
    print('size %d lr %e reg %e train accuracy: %f val accuracy: %f' % (
                hid_size, lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)

plt.close()
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(best_stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
train_acc, = plt.plot(best_stats['train_acc_history'], label='train')
val_acc, = plt.plot(best_stats['val_acc_history'], label='val')
plt.legend([train_acc, val_acc], ['Training Accuracy', 'Validation Accuracy'], loc = 'lower right')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
#plt.show()
plt.savefig('E:\\MNIST\\Best_Loss.jpg')
plt.close()

# visualize the weights of the best network
print("Weights of the best network:")
show_net_weights(best_net, 0)

pass
#################################################################################
#                               END OF YOUR CODE                                #
#################################################################################

test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)
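
The search in the script above draws learning rates and regularization strengths log-uniformly, that is, uniformly in the exponent. The sketch below isolates that sampling step to show where the 75 trials come from: 3 hidden sizes x 5 learning rates x 5 regularization strengths. The fixed seed is my own addition for reproducibility; the ranges are the ones used in the script.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, an assumption for reproducibility

hidden_sizes = [50, 75, 100]
# Sample the exponent uniformly, then raise 10 to it (log-uniform sampling).
learning_rates = 10 ** rng.uniform(-3.5, -2.5, 5)
regularization_strengths = 10 ** rng.uniform(-5, 1, 5)

# Every combination is one trial of the search.
configs = [(h, lr, reg)
           for h in hidden_sizes
           for lr in learning_rates
           for reg in regularization_strengths]

print(len(configs))
print(configs[0])
```

Sampling in log space spreads the trials evenly across orders of magnitude, which matters here because the regularization range spans six of them.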









Copyright notice: this is an original article by the blog author; please credit the source when reposting.
