MNIST Handwritten Digit Recognition (Algorithm Fundamentals)

Quick tutorial

A 10-Minute Introduction to Neural Networks: Handwritten Digit Recognition with PyTorch


In-depth tutorial

Introduction to Deep Learning with PyTorch

Simple Regression Problem - 1

Gradient Descent Algorithm

$loss = x^2 \cdot \sin(x)$

Taking the derivative:

$f'(x) = 2x\sin x + x^2\cos x$

Update rule:
$x' = x - \delta x$

Multiplying $\delta x$ by a learning rate $lr$ lets the iterate descend the gradient gradually, approaching a good solution and oscillating around the optimum, which yields an approximate solution.
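As a minimal sketch (my own illustration, not from the tutorial), gradient descent on $loss = x^2 \sin(x)$ using the analytic derivative above might look like this; the starting point and learning rate are arbitrary assumptions:

import math

def f(x):
    return x ** 2 * math.sin(x)

def df(x):
    # analytic derivative: 2x sin x + x^2 cos x
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)

x = -1.0    # arbitrary starting point (assumption)
lr = 0.01   # learning rate
for _ in range(1000):
    x = x - lr * df(x)   # x' = x - lr * delta x

print(x, f(x))           # ends near the local minimum around x ≈ -2.29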

Solvers

  • sgd
  • rmsprop
  • adam
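These names map directly onto optimizers in torch.optim. A minimal sketch (the parameter and learning rate are placeholder assumptions):

import torch

w = torch.randn(2, requires_grad=True)   # placeholder parameter

# the three solvers listed above, as provided by torch.optim
opt_sgd = torch.optim.SGD([w], lr=0.01)
opt_rmsprop = torch.optim.RMSprop([w], lr=0.01)
opt_adam = torch.optim.Adam([w], lr=0.01)

loss = (w ** 2).sum()   # dummy loss
loss.backward()
opt_sgd.step()          # one SGD update on w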

Solving a Simple Linear Equation (two unknowns, $w$ and $b$)

Noise

Real data contains Gaussian noise. We take such data as observations; when the data appear linearly distributed, we keep optimizing the loss, i.e., we seek the minimum of the loss.

  • $y = w x + b + \epsilon$

  • $\epsilon \sim N(0.01, 1)$

Minimizing the loss gives values where $y$ is approximated by $Wx + b$:
$loss = (Wx + b - y)^2$
Summed over all observations:
$loss = \sum_i (w x_i + b - y_i)^2$
so that
$w' x + b' \rightarrow y'$
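A small sketch of this setup (the true $w$, $b$ and the noise draw are assumptions for illustration): generate noisy observations and evaluate the summed squared loss.

import numpy as np

np.random.seed(0)
true_w, true_b = 1.5, 0.1                       # assumed ground truth
x = np.linspace(0, 10, 100)
eps = np.random.normal(0.01, 1, size=x.shape)   # eps ~ N(0.01, 1)
y = true_w * x + true_b + eps                   # noisy observations

def loss(w, b):
    # sum_i (w * x_i + b - y_i)^2
    return np.sum((w * x + b - y) ** 2)

print(loss(true_w, true_b))   # small: only the noise remains
print(loss(0.0, 0.0))         # much larger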

Simple Regression Problem - 2

Convex optimization (worth reading up on if you are interested)


  • Linear Regression

The predicted value ranges over a continuous set.

$wx + b$
$w_1 + b$
$\cdots$
$w_n + b$

Using the observed data above, predict $WX + b$:

# Applying gradient descent
import numpy as np

def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # gradients of loss = (1/N) * sum_i (y_i - (w * x_i + b))^2
        b_gradient += -(2/N) * (y - ((w_current * x) + b_current))
        w_gradient += -(2/N) * x * (y - ((w_current * x) + b_current))

    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)

    return [new_b, new_w]


def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m

    for i in range(num_iterations):
        b, m = step_gradient(b, m, np.array(points), learning_rate)

    return [b, m]
  • Logistic Regression

The output is squashed into the range [0, 1].

  • Classification

Built on the previous kind of regression, but the probabilities over all classes sum to 1, as sketched below.
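A quick sketch of the difference (my own illustration in PyTorch): sigmoid squashes each score into [0, 1] independently, while softmax makes the class probabilities sum to 1.

import torch

logits = torch.tensor([2.0, -1.0, 0.5])   # arbitrary scores (assumption)

# Logistic-regression style: each value squashed to [0, 1] independently
print(torch.sigmoid(logits))              # tensor([0.8808, 0.2689, 0.6225])

# Classification style: probabilities over classes sum to 1
probs = torch.softmax(logits, dim=0)
print(probs, probs.sum())                 # probs.sum() == 1.0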

A Hands-On Simple Regression Example

import numpy as np

# y = wx + b; returns the mean squared error of the line over all points
def compute_error_for_line_given_points(b, w, points):
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (w * x + b)) ** 2
    return totalError / float(len(points))

# one gradient-descent step: move b and w against the MSE gradient
def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((w_current * x) + b_current))
        w_gradient += -(2/N) * x * (y - ((w_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = w_current - (learningRate * w_gradient)
    return [new_b, new_m]

# repeat step_gradient for num_iterations from the starting guesses
def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, np.array(points), learning_rate)
    return [b, m]

def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    learning_rate = 0.0001
    initial_b = 0 # initial y-intercept guess
    initial_m = 0 # initial slope guess
    num_iterations = 1000
    print("Starting gradient descent at b = {0}, m = {1}, error = {2}"
          .format(initial_b, initial_m,
                  compute_error_for_line_given_points(initial_b, initial_m, points))
          )
    print("Running...")
    [b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, m = {2}, error = {3}".
          format(num_iterations, b, m,
                 compute_error_for_line_given_points(b, m, points))
          )

if __name__ == '__main__':
    run()
# Run result
Starting gradient descent at b = 0, m = 0, error = 5565.107834483211
Running...
After 1000 iterations b = 0.08893651993741346, m = 1.4777440851894448, error = 112.61481011613473

Introduction to Classification - 1

The MNIST Dataset

  • Each digit has 7,000 images
  • Train/test split: 60k and 10k images
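For reference, a minimal way to pull these splits with torchvision (the download path is a placeholder assumption):

from torchvision import datasets, transforms

# downloads MNIST into ./data on first run
train = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
test = datasets.MNIST("data", train=False, download=True, transform=transforms.ToTensor())

print(len(train), len(test))   # 60000 10000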
From the network diagram: the last hidden layer $H_3$ has shape $[1, d_3]$, and the output $Y$ is one of $[0/1/\dots/9]$.

(1) Nutshell

On top of the simplest linear model, nest three linear layers, each layer's output feeding the next as input, which makes the linear output more stable:
$pred = W_3 \{ W_2 [ W_1 X + b_1 ] + b_2 \} + b_3$

(2) Non-linear Factor

  • sigmoid

  • ReLU

    • avoids the vanishing-gradient problem

      Three nested layers with the ReLU rectifier:

      $H_1 = relu(XW_1 + b_1)$

      $H_2 = relu(H_1 W_2 + b_2)$

      $H_3 = relu(H_2 W_3 + b_3)$

      The added non-linearity gives the model more tolerance for non-linear variation.
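As a sketch of this stack in PyTorch (the hidden widths and random weights are placeholder choices, with $H_3$ taken directly as the 10 class scores):

import torch
import torch.nn.functional as F

d1, d2 = 256, 128                  # assumed hidden widths
W1, b1 = torch.randn(784, d1), torch.zeros(d1)
W2, b2 = torch.randn(d1, d2), torch.zeros(d2)
W3, b3 = torch.randn(d2, 10), torch.zeros(10)

x = torch.randn(1, 784)            # one flattened 28x28 MNIST image

h1 = F.relu(x @ W1 + b1)           # H1 = relu(X W1 + b1)
h2 = F.relu(h1 @ W2 + b2)          # H2 = relu(H1 W2 + b2)
h3 = F.relu(h2 @ W3 + b3)          # H3 = relu(H2 W3 + b3) -> class scores
print(h3.shape)                    # torch.Size([1, 10])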

(3) Gradient Descent

$objective = \sum (pred - Y)^2$

  • $[W_1, W_2, W_3]$
  • $[b_1, b_2, b_3]$

In plain terms: gradient descent makes the model fit the real variation of the data ever more closely (from normal handwriting to slanted, blurred, or oddly stroked digits) so that it predicts better.
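A compact sketch of one such update via autograd (two layers instead of three for brevity; the shapes, one-hot targets, batch, and learning rate are all assumptions):

import torch
import torch.nn.functional as F

W1 = torch.randn(784, 128, requires_grad=True)
b1 = torch.zeros(128, requires_grad=True)
W2 = torch.randn(128, 10, requires_grad=True)
b2 = torch.zeros(10, requires_grad=True)

x = torch.randn(8, 784)                 # a fake batch of 8 images
y = F.one_hot(torch.randint(0, 10, (8,)), num_classes=10).float()

pred = F.relu(x @ W1 + b1) @ W2 + b2    # nested linear layers with ReLU
objective = ((pred - y) ** 2).sum()     # objective = sum (pred - Y)^2
objective.backward()                    # gradients w.r.t. [W1, W2], [b1, b2]

lr = 1e-3
with torch.no_grad():
    for p in (W1, b1, W2, b2):
        p -= lr * p.grad                # one gradient-descent update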
