Machine Learning Study Notes 4

This post looks at how convolutional neural networks (L7 CNN) work, covering 1D and 2D image representations, convolutional and max-pooling layers, and a taste of backpropagation. It then introduces Markov decision processes (L8) for decision making: state machines, actions, reward functions, and computing the value of a policy.

L7 CNN: Convolutional Neural Networks

Images

Grayscale image: each pixel holds a value between 0 and 1, where 0 is black and 1 is white

An image can be represented as a 1D vector x or a 2D matrix X and used as the input to a NN

For an RGB image, use a 3D tensor

Convolutional Layer

1D example

  •  A 1D image : [\tilde{0},0,0,1,1,1,0,1,0,0,0,\tilde{0}], where \tilde{0} denotes zero-padding at the boundary
  • A filter : [-1,1,-1]=[\omega_1,\omega_2,\omega_3], with bias b
  • After convolution* : [0,-1,0,-1,0,-2,1,-1,0,0]

                *dot product: conv(x^{(1)},x^{(2)})=\sum_i x_i^{(1)}x_i^{(2)}

  • After ReLU : [0,0,0,0,0,0,1,0,0,0]

What is happening in this image? The filter finds a lonely pixel (a 1 whose neighbors are all 0), which is the only entry that survives the ReLU.
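As a quick check, here is a minimal numpy sketch of the 1D example above: slide the filter over the zero-padded image, take one dot product per position, and apply ReLU. The variable names are mine, not from the notes.

```python
import numpy as np

image  = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 0], dtype=float)
filt   = np.array([-1, 1, -1], dtype=float)   # [w1, w2, w3], bias b = 0 here
padded = np.pad(image, 1)                     # the tilde-0 boundary extension

# one dot product per position: conv_i = filt . padded[i : i+3]
conv = np.array([filt @ padded[i:i + 3] for i in range(len(image))])
relu = np.maximum(conv, 0)

print(conv)   # [ 0. -1.  0. -1.  0. -2.  1. -1.  0.  0.]
print(relu)   # [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]  <- only the lonely pixel survives
```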

2D example

  •  A 2D image :

                \begin{bmatrix} 1&0 &1 &0 &0 \\ 1& 0 &1 &0 &1 \\ 1& 1& 1& 0& 0\\ 1& 0 &1 &0 &1 \\ 1& 0& 1& 0&1 \end{bmatrix}

  • A filter : \begin{bmatrix} -1 & -1 &-1 \\ -1 & 1& -1\\ -1& -1 & -1 \end{bmatrix}
  • After convolution & ReLU: \begin{bmatrix} 0& 0 &0 &0 &0 \\ 0& 0& 0& 0 & 1\\ 0 &0 & 0 &0&0 \\ 0& 0 &0& 0& 0\\ 0&0&0 &0&0 \end{bmatrix} 
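The same idea in 2D, again as a small numpy sketch: zero-pad the image, slide the 3x3 filter, and keep only positive responses. The loop-based implementation is for clarity, not speed.

```python
import numpy as np

image = np.array([[1, 0, 1, 0, 0],
                  [1, 0, 1, 0, 1],
                  [1, 1, 1, 0, 0],
                  [1, 0, 1, 0, 1],
                  [1, 0, 1, 0, 1]], dtype=float)
filt = np.array([[-1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1, -1]], dtype=float)

padded = np.pad(image, 1)            # zero padding keeps the output 5x5
out = np.zeros_like(image)
for r in range(5):
    for c in range(5):
        out[r, c] = np.sum(filt * padded[r:r + 3, c:c + 3])

print(np.maximum(out, 0))   # a single 1 at row 2, column 5 (counting from 1): the only isolated pixel
```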

Max Pooling Layer

2D example

  • Output from the convolutional layer & ReLU:

        \begin{bmatrix} 0 & 0 &0 &0 &0 &0 \\ 0 & 0 &0 &0 &1 &0 \\ 0 & 0 &0 &0 &0 &0 \\ 0 & 1 &0 &0 &0 &0 \\ 0 & 0 &0 &0 &0 &0 \\ 0 & 0 &0 &0 &0 &0 \end{bmatrix}

  • Max pooling : returns the max of its arguments

    e.g. size 3×3 ("size 3") with stride 3 (sketched after this list)

  • After max pooling:

        \begin{bmatrix} 0 &1 \\ 1& 0 \end{bmatrix}

  • Can use stride with filters too
  • No weights in max pooling (nothing is learned)
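A minimal numpy sketch of 3x3 max pooling with stride 3 applied to the 6x6 activation map above (indices in the code are 0-based):

```python
import numpy as np

# conv + ReLU output from the example above: 6x6, with 1s at (1, 4) and (3, 1)
A = np.zeros((6, 6))
A[1, 4] = 1.0
A[3, 1] = 1.0

size, stride = 3, 3
pooled = np.array([[A[r:r + size, c:c + size].max()
                    for c in range(0, A.shape[1], stride)]
                   for r in range(0, A.shape[0], stride)])
print(pooled)   # [[0. 1.]
                #  [1. 0.]]
```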

CNNs: typical architecture

input \to feature\ learning \to classification;\quad x \to NN(x;W,W_0)
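A minimal sketch of this conv, ReLU, pool, fully-connected pattern in PyTorch. The 1-channel 28×28 input, 8 filters, and 10 output classes are illustrative assumptions, not values from the notes.

```python
import torch.nn as nn

model = nn.Sequential(
    # feature learning
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),    # 28x28 -> 14x14
    # classification
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),
)
```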

A familiar pattern

For the ith data point x^{(i)}, each model makes a prediction for that point and has a training loss over points 1 to n:

  • Logistic regression: prediction LogiReg(x^{(i)};\theta,\theta_0), training loss J_{Logi}(\theta,\theta_0)
  • Linear regression: prediction LinReg(x^{(i)};\theta,\theta_0), training loss J_{Lin}(\theta,\theta_0)
  • Neural networks: prediction NN(x^{(i)};W,W_0), training loss J_{NN}(W,W_0)

CNNs: a taste of backpropagation

Regression setup: 1 filter of size 3, with padding; input x^{(j)} dimension: 5\times 1

Forward pass:

        Z_i^1 = (W^1)^{\top}X_{[i-1,i,i+1]}\ \ (Z^1: 5\times1)\\ A_i^1 = ReLU(Z_i^1)\ \ (A^1: 5\times1) \\A^2=(W^2)^{\top}A^1\ \ (1\times 1) \\L(A^2,y) = (A^2-y)^2\ \ (1\times 1)

Part of the derivative of SGD : 

\frac{\partial loss}{\partial W^1}=\frac{\partial Z^1}{\partial W^1}\cdot\frac{\partial A^1}{\partial Z^1}\cdot\frac{\partial loss}{\partial A^1}\\ (3\times1)= (3\times 5)\cdot (5\times5)\cdot (5\times 1)
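A numpy sketch of this forward pass and of the chain-rule product for \partial loss/\partial W^1, mainly as a shape check. The input, target, and weights are random placeholders; only the shapes come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(5, 1))    # one input, 5x1
y  = 1.0                        # scalar regression target (placeholder)
W1 = rng.normal(size=(3, 1))    # one filter of size 3
W2 = rng.normal(size=(5, 1))    # second-layer weights

# forward pass
x_pad  = np.vstack([[0.0], x, [0.0]])                    # zero padding
X_cols = np.hstack([x_pad[i:i + 3] for i in range(5)])   # 3x5, column i = x_[i-1, i, i+1]
Z1 = (W1.T @ X_cols).T                                   # 5x1
A1 = np.maximum(Z1, 0)                                   # ReLU, 5x1
A2 = W2.T @ A1                                           # 1x1
loss = (A2 - y) ** 2                                     # 1x1

# backward pass: dloss/dW1 = dZ1/dW1 . dA1/dZ1 . dloss/dA1
dloss_dA1 = W2 * (2 * (A2 - y))                  # 5x1
dA1_dZ1   = np.diagflat((Z1 > 0).astype(float))  # 5x5 (ReLU derivative on the diagonal)
dZ1_dW1   = X_cols                               # 3x5
dloss_dW1 = dZ1_dW1 @ dA1_dZ1 @ dloss_dA1        # (3x5)(5x5)(5x1) = 3x1
print(dloss_dW1.shape)                           # (3, 1)
```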

L8 State Machines and Markov Decision Processes

Markov Decision Process

  • S = set of possible states     {rich,poor}
  • A = set of possible actions   {plant,fallow}
  • T:S\times A\times S\to\mathbb{R}: transition model    e.g.

         0.9=P(S_t=poor\mid S_{t-1}=rich,A_{t-1}=plant)=T(rich,plant,poor)

  • R:S\times A\to\mathbb{R}: reward function

        e.g. R(rich,plant) = 100 bushels; R(poor,plant) = 10 bushels; R(rich,fallow) = 0 bushels; R(poor,fallow) = 0 bushels

  • A discount factor :  \gamma
  • A policy \pi:S\to A
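A small sketch of this MDP as plain Python dictionaries. The notes only give T(rich,plant,poor)=0.9 explicitly; the remaining transition probabilities below are assumptions chosen to be consistent with the Q-values computed later (fallowing tends to restore the soil, planting tends to deplete it), and the discount factor and policy are just examples.

```python
S = ["rich", "poor"]       # states
A = ["plant", "fallow"]    # actions

# reward function R(s, a), in bushels
R = {("rich", "plant"): 100, ("poor", "plant"): 10,
     ("rich", "fallow"): 0,  ("poor", "fallow"): 0}

# transition model: T[(s, a)][s'] = P(s' | s, a)
T = {("rich", "plant"):  {"rich": 0.1, "poor": 0.9},   # given: T(rich, plant, poor) = 0.9
     ("rich", "fallow"): {"rich": 0.9, "poor": 0.1},   # assumed
     ("poor", "plant"):  {"rich": 0.1, "poor": 0.9},   # assumed
     ("poor", "fallow"): {"rich": 0.9, "poor": 0.1}}   # assumed

gamma = 0.9                                    # example discount factor
policy = {"rich": "plant", "poor": "fallow"}   # an example policy pi: S -> A
```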

What's the value of a policy?

  • h: horizon (e.g. how many growing seasons are left)

  •  V_{\pi}^h(s) : value (expected reward) over horizon h, following policy \pi starting at s

        V_{\pi}^0(s)=0;V_{\pi}^h(s)=R(s,\pi_h(s))+\sum_{s^{'}}T(s,\pi_h(s),s^{'})\cdot V_{\pi}^{h-1}(s^{'})
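A sketch of this recursion for a stationary policy (the notes allow the policy \pi_h to depend on the horizon; using the same policy at every step is a simplification here). The transition model is the assumed one from the earlier sketch, repeated so this runs on its own.

```python
R = {("rich", "plant"): 100, ("poor", "plant"): 10,
     ("rich", "fallow"): 0,  ("poor", "fallow"): 0}
T = {("rich", "plant"):  {"rich": 0.1, "poor": 0.9},
     ("rich", "fallow"): {"rich": 0.9, "poor": 0.1},
     ("poor", "plant"):  {"rich": 0.1, "poor": 0.9},
     ("poor", "fallow"): {"rich": 0.9, "poor": 0.1}}
policy = {"rich": "plant", "poor": "fallow"}

def V(s, h):
    """V_pi^h(s): expected reward over h steps, following policy pi from state s."""
    if h == 0:
        return 0.0
    a = policy[s]
    return R[(s, a)] + sum(p * V(s2, h - 1) for s2, p in T[(s, a)].items())

print(V("rich", 1), V("rich", 2))   # 100.0, then 100 + 0.1*100 + 0.9*0 = 110.0
```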

What's the best policy?

  • Q^h(s,a) : expected reward if starting at s, taking action a, and then taking the 'best' actions for the h-1 steps left
  • With Q, we can find an optimal policy: \pi_h^{*}(s)=argmax_aQ^{h}(s,a)

        Q^0(s,a)=0;Q^h(s,a)=R(s,a)+\sum_{s^{'}}T(s,a,s^{'})max_{a^{'}}Q^{h-1}(s^{'},a^{'})

        Q^1(rich,plant)=100;Q^1(rich,fallow)=0; \\Q^1(poor,plant)=10;Q^1(poor,fallow)=0; \\Q^2(rich,plant)=119;Q^2(rich,fallow)=91; \\Q^2(poor,plant)=29;Q^2(poor,fallow)=91;

What's best? For any s, \pi_1^{*}(s)=plant;\ \pi_2^{*}(rich)=plant;\ \pi_2^{*}(poor)=fallow
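A sketch that reproduces the Q^1 and Q^2 values and the optimal actions above, using the same assumed transition model as in the earlier sketches:

```python
R = {("rich", "plant"): 100, ("poor", "plant"): 10,
     ("rich", "fallow"): 0,  ("poor", "fallow"): 0}
T = {("rich", "plant"):  {"rich": 0.1, "poor": 0.9},
     ("rich", "fallow"): {"rich": 0.9, "poor": 0.1},
     ("poor", "plant"):  {"rich": 0.1, "poor": 0.9},
     ("poor", "fallow"): {"rich": 0.9, "poor": 0.1}}
actions = ["plant", "fallow"]

def Q(s, a, h):
    """Q^h(s, a): take action a now, then act optimally for the remaining h-1 steps."""
    if h == 0:
        return 0.0
    return R[(s, a)] + sum(p * max(Q(s2, a2, h - 1) for a2 in actions)
                           for s2, p in T[(s, a)].items())

print(Q("rich", "plant", 2), Q("rich", "fallow", 2))   # 119.0 91.0
print(Q("poor", "plant", 2), Q("poor", "fallow", 2))   # 29.0 91.0
print(max(actions, key=lambda a: Q("poor", a, 2)))     # fallow  (= pi_2*(poor))
```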

What if I don't stop farming?

  • Problem: 100 bushels today > 100 bushels in ten years
    • A solution: discount factor \gamma:0<\gamma<1
    • Value of 1 bushel after t time steps : \gamma^t bushels
    • Example: What's the value of 1 bushel per year forever?

                V=1+\gamma+\gamma^2+\dots=1+\gamma V\ \Rightarrow\ V=1/(1-\gamma)\\ E.g.\ \gamma=0.99\ \Rightarrow V=100\ bushels

  • V_{\pi}(s) : value (expected discounted reward) with policy \pi starting at s, over an infinite horizon

        V_{\pi}(s)=R(s,\pi(s))+\gamma\sum_{s^{'}}T(s,\pi(s),s^{'})\cdot V_{\pi}(s^{'})

        |S| linear equations in |S| unknowns
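Since V_\pi appears linearly, these |S| equations can be solved directly. A numpy sketch, again using the assumed transition model, the example policy, and \gamma = 0.9 from the earlier sketches:

```python
import numpy as np

S = ["rich", "poor"]
policy = {"rich": "plant", "poor": "fallow"}
gamma = 0.9

R = {("rich", "plant"): 100, ("poor", "plant"): 10,
     ("rich", "fallow"): 0,  ("poor", "fallow"): 0}
T = {("rich", "plant"):  {"rich": 0.1, "poor": 0.9},
     ("rich", "fallow"): {"rich": 0.9, "poor": 0.1},
     ("poor", "plant"):  {"rich": 0.1, "poor": 0.9},
     ("poor", "fallow"): {"rich": 0.9, "poor": 0.1}}

# solve (I - gamma * T_pi) V = R_pi, one row per state
T_pi = np.array([[T[(s, policy[s])][s2] for s2 in S] for s in S])   # |S| x |S|
R_pi = np.array([R[(s, policy[s])] for s in S])                     # length |S|
V = np.linalg.solve(np.eye(len(S)) - gamma * T_pi, R_pi)
for s, v in zip(S, V):
    print(s, round(v, 1))   # roughly: rich 529.1, poor 470.9
```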

