Neural Network for Handwritten Digit Recognition (Machine Learning)

We have a large set of images of the digits 0-9 and want to train a network to recognize the digits automatically. The dataset contains 5000 images, each 20×20 pixels.

Each image is flattened, so every record has 400 features; the last column is the label, where 1-9 stand for digits 1-9 and 10 stands for digit 0. Dataset: ex_2/ex3data1.mat · Orange_Xiao/Machine_Learning - 码云 - 开源中国 (gitee.com)

Introduction to One-Hot Encoding

First, we need to convert y into a one-hot encoding.

Below is the label data y. The first entry is 10, which corresponds to digit 0. We need to turn each label into a vector of length 10 that has a single 1 in the position for that label.

That is, the original 5000×1 matrix becomes a 5000×10 matrix.

array([[10],
       [ 9],
       [ 8],
       ...])

Each label is therefore converted into a vector of the form

y=\begin{bmatrix} 1\\ 0 \\ \vdots \\ 0 \end{bmatrix}

In each encoded sample, exactly one position is 1 and all the others are 0. The vector above represents label 1: whichever position holds the 1 tells us the label.
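To make this concrete, here is a minimal hand-rolled sketch in plain NumPy (the small labels array is made up for illustration). Labels are assumed to take the values 1-10 as in this dataset, so label v goes to column v-1 and label 10 (digit 0) lands in the last column:

import numpy as np

labels = np.array([[10], [9], [8], [1]])            # a few made-up raw labels, shape (4, 1)
num_classes = 10
onehot = np.zeros((labels.shape[0], num_classes))   # start with all zeros
onehot[np.arange(labels.shape[0]), labels.ravel() - 1] = 1   # label v -> column v-1
print(onehot[0])   # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]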

Loading the Data

import pandas as pd
import numpy as np
import scipy.io as sio
import matplotlib
from skimage import transform
from PIL import Image
from scipy.optimize import minimize

matplotlib.use('tkAgg')
import matplotlib.pyplot as plt

file_path = "D:\\JD\\Documents\\大学等等等\\自学部分\\机器学习自学画图\\手写数字识别\\ex3data1.mat"
data = sio.loadmat(file_path)   # load the MATLAB-format dataset
row_X = data['X']               # (5000, 400) flattened 20x20 images
row_y = data['y']               # (5000, 1) labels; 10 stands for digit 0
print("-------------------------------------------------")
print(row_X.shape, row_y.shape)

-------------------------------------------------
(5000, 400) (5000, 1)
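Before going on, it helps to sanity-check the data by displaying one image. A minimal sketch using the variables loaded above; the order='F' (column-major) reshape is an assumption about how the MATLAB export stores each 20×20 image:

sample = row_X[0].reshape((20, 20), order='F')   # one flattened row back to a 20x20 image
plt.imshow(sample, cmap='gray')
plt.title("label: {}".format(row_y[0, 0]))       # label 10 means digit 0
plt.show()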

Next, how to convert y into a one-hot encoding:

We can use a library for this.

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)   # return a dense array rather than a sparse matrix
y_onehot = encoder.fit_transform(row_y)        # (5000, 1) labels -> (5000, 10) one-hot rows
print(y_onehot.shape)
print(y_onehot[0])

(5000, 10)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
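OneHotEncoder sorts the label values, so label 10 (digit 0) ends up in the last column, which is why the first row prints as above. To map a one-hot row back to its original label, take the argmax and look it up in encoder.categories_:

col = np.argmax(y_onehot[0])          # position of the 1
print(encoder.categories_[0][col])    # -> 10, the original label of the first example
print(row_y[0, 0])                    # matches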
 

Network Structure

The network has three layers: a 400-unit input layer (one unit per pixel), a 25-unit hidden layer, and a 10-unit output layer (one unit per class).

Input:

\vec{x}=\begin{pmatrix} x_{1,1} &\cdots & x_{1,n}\\ \vdots& \vdots& \vdots\\ x_{m,1}& \cdots &x_{m,n} \end{pmatrix}

After prepending a bias column of ones:

a_{1}=\begin{pmatrix} x_{1,0}&x_{1,1} &\cdots & x_{1,n}\\ \vdots&\vdots& \vdots& \vdots\\ x_{m,0}&x_{m,1}& \cdots &x_{m,n} \end{pmatrix}

\theta_{1}=\begin{pmatrix} \theta^{[1]}_{1,0}& \cdots& \theta^{[1]}_{1,n}\\ \vdots& \vdots& \vdots\\ \theta^{[1]}_{layer1,0}& \cdots& \theta^{[1]}_{layer1,n}\end{pmatrix}

z_{2} =a_{1}\cdot \theta_{1}^{T}=\begin{pmatrix} \sum_{i=0}^{n}x_{1,i}\theta_{1,i}^{[1]} &\cdots & \sum_{i=0}^{n}x_{1,i}\theta_{layer1,i}^{[1]}\\ \vdots& \ddots& \vdots\\ \sum_{i=0}^{n}x_{m,i}\theta_{1,i}^{[1]} & \cdots&\sum_{i=0}^{n}x_{m,i}\theta_{layer1,i}^{[1]} \end{pmatrix}

Then a_{2}=g(z_{2}) (the sigmoid applied elementwise), a bias column of ones is prepended to a_{2}, and finally z_{3}=a_{2}\cdot\theta_{2}^{T} and h=g(z_{3}).
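A quick shape check of this forward pass (a self-contained sketch with random values; the sizes match this network: m = 5000 examples, 400 inputs, 25 hidden units):

import numpy as np

m, n, hidden = 5000, 400, 25
a1 = np.hstack([np.ones((m, 1)), np.random.random((m, n))])   # (m, 401): bias column + inputs
theta1 = np.random.random((hidden, n + 1))                    # (25, 401)
z2 = a1 @ theta1.T                                            # (m, 25): one row per example
print(a1.shape, theta1.shape, z2.shape)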

Cost Function

Note that essentially every cost function in machine learning expresses the gap between the predicted values and the true values.

So we have:

J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left [ -y_{k}^{(i)}\log(h_{\theta}(x^{(i)})_{k})-(1-y_{k}^{(i)})\log(1-(h_{\theta}(x^{(i)}))_{k}) \right ]+\frac{\lambda}{2m}\left [ \sum_{j=1}^{25}\sum_{k=1}^{400}(\Theta_{j,k}^{(1)})^{2} + \sum_{j=1}^{10}\sum_{k=1}^{25}(\Theta_{j,k}^{(2)})^{2}\right ]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def forward_propagate(X, theta1, theta2):
    m = X.shape[0]
    a1 = np.insert(X, 0, values=np.ones(m), axis=1)   # prepend a column of ones for the bias term
    z2 = a1 * theta1.T
    a2 = np.insert(sigmoid(z2), 0, values=np.ones(m), axis=1)   # apply the sigmoid, then prepend the bias column
    z3 = a2 * theta2.T
    h = sigmoid(z3)
    return a1, z2, a2, z3, h


def cost(params, input_size, hidden_size, num_labels, X, y, lamda):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    # unroll the flat parameter vector back into the two weight matrices
    theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
    theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
    J = 0
    for i in range(m):   # accumulate the cross-entropy term example by example
        first_item = np.multiply(-y[i, :], np.log(h[i, :]))
        second_item = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_item - second_item)
    J = J / m

    # regularization term, skipping the bias column of each theta
    J += (float(lamda) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))
    return J


input_size = 400
hidden_size = 25
num_labels = 10
lamda = 1
# random initialization of all weights in one flat vector, values in (-0.125, 0.125)
params = (np.random.random(size=hidden_size * (input_size + 1) + num_labels * (hidden_size + 1)) - 0.5) * 0.25
X = np.matrix(row_X)
y = np.matrix(row_y)
m = X.shape[0]

theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))

print(theta1.shape, theta2.shape)

print(cost(params, input_size, hidden_size, num_labels, X, y_onehot, lamda))
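The per-example loop in cost is easy to read but slow. The same value can be computed in one shot; a vectorized sketch (it should agree with cost above as long as y is the one-hot matrix):

def cost_vectorized(params, input_size, hidden_size, num_labels, X, y, lamda):
    m = X.shape[0]
    theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, input_size + 1)))
    theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, hidden_size + 1)))
    _, _, _, _, h = forward_propagate(np.matrix(X), theta1, theta2)
    y = np.matrix(y)
    # cross-entropy over all examples and classes at once
    J = np.sum(np.multiply(-y, np.log(h)) - np.multiply(1 - y, np.log(1 - h))) / m
    # regularization, skipping the bias column of each theta
    J += (float(lamda) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))
    return J


print(cost_vectorized(params, input_size, hidden_size, num_labels, X, y_onehot, lamda))  # should match cost(...)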

Computing the Gradient

We now use a_{1}^{(t)} to denote a_{1} for the t-th training example, and define the per-example loss

J_{1}=\sum_{k=1}^{K}\left[-y^{(t)}_{k}\log(h^{(t)}_{k})-(1-y^{(t)}_{k})\log(1-h^{(t)}_{k})\right]

Let d_{3}^{(t)} denote the gradient of the loss with respect to z_{3}^{(t)}. Using the derivative of the sigmoid,

\frac{\partial}{\partial x}\frac{1}{1+e^{-x}}=\frac{1}{1+e^{-x}}\frac{e^{-x}}{1+e^{-x}}

\frac{\partial \log(h_{k}^{(t)})}{\partial z_{3,k}^{(t)}}=\frac{1}{h_{k}^{(t)}}\frac{\mathrm{d} h_{k}^{(t)}}{\mathrm{d} z_{3,k}^{(t)}}=\frac{h_{k}^{(t)}(1-h_{k}^{(t)})}{h_{k}^{(t)}}=1-h_{k}^{(t)}

\frac{\partial \log(1-h_{k}^{(t)})}{\partial z_{3,k}^{(t)}}=-h_{k}^{(t)}

\frac{\partial J_{1}}{\partial z_{3,k}^{(t)}}=h_{k}^{(t)}-y_{k}^{(t)}

\frac{\partial J_{1}}{\partial a_{2}^{(t)}}=\frac{\partial J_{1}}{\partial z_{3}^{(t)}}\theta_{2}

\frac{\partial J_{1}}{\partial z_{2,k}^{(t)}}=\frac{\partial J_{1}}{\partial a_{2,k}^{(t)}}\frac{\partial a_{2,k}^{(t)}}{\partial z_{2,k}^{(t)}}=\frac{\partial J_{1}}{\partial a_{2,k}^{(t)}}\,a_{2,k}^{(t)}(1-a_{2,k}^{(t)})
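These sigmoid-derivative identities are easy to check numerically; a small sketch comparing the closed form g(z)(1-g(z)) with a centered finite difference (using the sigmoid defined earlier):

z, eps = 1.7, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # finite-difference slope
analytic = sigmoid(z) * (1 - sigmoid(z))                      # closed form used in the derivation
print(abs(numeric - analytic))   # should be tiny (around 1e-10 or less)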

Special note: when y=x\theta^{T} with \theta=\bigl(\begin{smallmatrix} a &b \\ c& d \end{smallmatrix}\bigr), we have

(y_{1},y_{2})=(x_{1},x_{2})\begin{pmatrix} a & c\\ b& d \end{pmatrix}

y_{1}=ax_{1}+bx_{2},\qquad y_{2}=cx_{1}+dx_{2}

\frac{\partial J}{\partial x}=\frac{\partial J}{\partial y}\theta

\frac{\partial J}{\partial \theta}=(\frac{\partial J}{\partial y})^{T}x
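These two matrix rules can be verified numerically on the 2×2 example. A small self-contained sketch, using the hypothetical scalar loss J = Σ y_k²/2 (chosen only so that ∂J/∂y = y) and checking ∂J/∂x against finite differences:

import numpy as np

def J_of(x, theta):
    y = x @ theta.T
    return 0.5 * np.sum(y ** 2)         # hypothetical loss, so dJ/dy = y

x = np.array([[1.0, 2.0]])
theta = np.array([[0.5, -1.0],
                  [2.0, 3.0]])          # [[a, b], [c, d]]
y = x @ theta.T
analytic_dx = y @ theta                 # rule: dJ/dx = (dJ/dy) * theta

eps = 1e-6
numeric_dx = np.zeros_like(x)
for i in range(x.shape[1]):             # perturb each x_i and difference the loss
    xp, xm = x.copy(), x.copy()
    xp[0, i] += eps
    xm[0, i] -= eps
    numeric_dx[0, i] = (J_of(xp, theta) - J_of(xm, theta)) / (2 * eps)

print(np.abs(analytic_dx - numeric_dx).max())   # should be essentially zero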

def sigmoid_gradient(z):
    return np.multiply(sigmoid(z), (1 - sigmoid(z)))


def backprop(params, input_size, hidden_size, num_labels, X, y, lamda):
    m = X.shape[0]
    X = np.matrix(X)
    y = np.matrix(y)
    theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
    theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
    J = 0
    delta1 = np.zeros(theta1.shape)
    delta2 = np.zeros(theta2.shape)
    # cost: same cross-entropy plus regularization as in cost()
    for i in range(m):
        first_term = np.multiply(-y[i, :], np.log(h[i, :]))
        second_term = np.multiply((1 - y[i, :]), np.log(1 - h[i, :]))
        J += np.sum(first_term - second_term)
    J = J / m
    J += (float(lamda) / (2 * m)) * (np.sum(np.power(theta1[:, 1:], 2)) + np.sum(np.power(theta2[:, 1:], 2)))

    # backpropagation, one example at a time
    for t in range(m):
        a1t = a1[t, :]    # (1, 401)
        z2t = z2[t, :]    # (1, 25)
        a2t = a2[t, :]    # (1, 26)
        ht = h[t, :]      # (1, 10)
        yt = y[t, :]      # (1, 10)
        d3t = ht - yt                                   # gradient at the output layer
        z2t = np.insert(z2t, 0, values=np.ones(1))      # add the bias position so shapes line up
        d2t = np.multiply((theta2.T * d3t.T).T, sigmoid_gradient(z2t))
        delta1 = delta1 + (d2t[:, 1:]).T * a1t
        delta2 = delta2 + d3t.T * a2t

    delta1 = delta1 / m
    delta2 = delta2 / m

    # gradient of the regularization term (the bias column is not regularized)
    delta1[:, 1:] = delta1[:, 1:] + (theta1[:, 1:] * lamda) / m
    delta2[:, 1:] = delta2[:, 1:] + (theta2[:, 1:] * lamda) / m
    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))

    return J, grad


J, grad = backprop(params, input_size, hidden_size, num_labels, X, y_onehot, lamda)

print("+++++++++++++++++++++++++++++++")
print(J, grad.shape)
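Before handing backprop to the optimizer, it is worth checking its gradient numerically. Doing full finite differences over all the parameters would be slow, so this sketch only perturbs a few randomly chosen parameters on a small slice of the data:

X_small, y_small = X[:20, :], y_onehot[:20, :]
_, grad_small = backprop(params, input_size, hidden_size, num_labels, X_small, y_small, lamda)

eps = 1e-4
for idx in np.random.choice(params.size, 5, replace=False):   # spot-check 5 random parameters
    p_plus, p_minus = params.copy(), params.copy()
    p_plus[idx] += eps
    p_minus[idx] -= eps
    J_plus, _ = backprop(p_plus, input_size, hidden_size, num_labels, X_small, y_small, lamda)
    J_minus, _ = backprop(p_minus, input_size, hidden_size, num_labels, X_small, y_small, lamda)
    print(idx, (J_plus - J_minus) / (2 * eps), grad_small[idx])   # the two values should be close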

# minimize the cost using the gradients from backprop (jac=True means backprop returns both J and grad)
fmin = minimize(fun=backprop, x0=params, args=(input_size, hidden_size, num_labels, X, y_onehot, lamda),
                method='TNC', jac=True, options={'maxiter': 250})
print(fmin)
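Once the optimizer finishes, fmin.x holds the trained parameters. A sketch of how to evaluate training accuracy: reshape the parameters back into theta1/theta2, run a forward pass, and remember that column k of the output corresponds to label k+1 in this dataset:

theta1_opt = np.matrix(np.reshape(fmin.x[:hidden_size * (input_size + 1)], (hidden_size, input_size + 1)))
theta2_opt = np.matrix(np.reshape(fmin.x[hidden_size * (input_size + 1):], (num_labels, hidden_size + 1)))
_, _, _, _, h = forward_propagate(X, theta1_opt, theta2_opt)

y_pred = np.array(np.argmax(h, axis=1) + 1)      # columns 0..9 correspond to labels 1..10
accuracy = np.mean(y_pred == np.array(row_y))
print('training accuracy: {:.2%}'.format(accuracy))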
