C++ implementation of linear and CNN neural networks (with the math)

Math:
https://www.cnblogs.com/pinard/p/6422831.html
https://www.cnblogs.com/pinard/p/6494810.html
https://www.cnblogs.com/pinard/p/10750718.html
https://www.cnblogs.com/pinard/p/10773942.html
https://blog.csdn.net/qq_37951753/article/details/79672615
https://blog.csdn.net/evanxxxnnn/article/details/83552318
https://zhuanlan.zhihu.com/p/45310446
https://blog.csdn.net/qq_36342854/article/details/103863741
https://mp.weixin.qq.com/s/2xYgaeLlmmUfxiHCbCa8dQ
C++:
https://www.cnblogs.com/xuefeng00/p/11093425.html
http://www.cplusplus.com/reference/random/normal_distribution/
https://people.sc.fsu.edu/~jburkardt/cpp_src/truncated_normal/truncated_normal.html
https://www.cnblogs.com/jingshikongming/p/9037881.html
https://www.zhihu.com/question/63507542
https://blog.csdn.net/yahamatarge/article/details/89380164
https://www.cnblogs.com/jhmu0613/p/7750798.html
https://blog.csdn.net/qq_25175067/article/details/80266003
https://blog.csdn.net/lmb1612977696/article/details/80035487
http://blog.chinaunix.net/uid-20773165-id-1847733.html
https://baijiahao.baidu.com/s?id=1651645857687261494&wfr=spider&for=pc
https://blog.csdn.net/weixin_34007291/article/details/93528095

Theory

MLP forward propagation and derivatives

$$
\begin{aligned}
z^l &= w^{l}a^{l-1}+b^l \\
a^l &= \sigma(z^l)
\end{aligned}
$$

Shape check, first for a batch of $n$ samples, then for a single sample:

$$
\begin{aligned}
m_l\times n &= \sigma\big((m_{l}\times m_{l-1})\,(m_{l-1}\times n) + m_l\times 1\big) \\
(m_l,) &= \sigma\big((m_l, m_{l-1})\times(m_{l-1},)+(m_l,)\big)
\end{aligned}
$$
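As a concrete reference, here is a minimal sketch of this forward pass for one fully connected layer, for a single sample with a sigmoid activation (plain std::vector and row-major W stand in for the Tensor class described later):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// a^l = sigma(W^l a^{l-1} + b^l) for one sample.
// W is (m_l x m_{l-1}) row-major, a_prev is (m_{l-1},), b is (m_l,).
std::vector<double> dense_forward(const std::vector<double>& W,
                                  const std::vector<double>& b,
                                  const std::vector<double>& a_prev) {
    const std::size_t m = b.size();
    const std::size_t m_prev = a_prev.size();
    assert(W.size() == m * m_prev);  // same shape check as the formula above
    std::vector<double> a(m);
    for (std::size_t i = 0; i < m; ++i) {
        double z = b[i];
        for (std::size_t j = 0; j < m_prev; ++j)
            z += W[i * m_prev + j] * a_prev[j];
        a[i] = 1.0 / (1.0 + std::exp(-z));  // sigma = sigmoid here
    }
    return a;
}
```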

MLP backpropagation

Loss function: $J(y, \hat{y})$
Gradient propagation:
$$
\begin{aligned}
\frac{\partial J(y, \hat{y})}{\partial w^l} &= \frac{\partial J(y, \hat{y})}{\partial z^l}\frac{\partial z^l}{\partial w^l},
\qquad \text{write } \delta^l := \frac{\partial J(y, \hat{y})}{\partial z^l} \\
\frac{\partial J(y, \hat{y})}{\partial w^l} &= \delta^l\frac{\partial z^l}{\partial w^l}=\delta^l\,{a^{l-1}}^T \\
\frac{\partial J(y, \hat{y})}{\partial z^L} &= \nabla J(y,\hat{y}) \\
\delta^{L-1} = \frac{\partial J(y, \hat{y})}{\partial z^{L-1}} &= \frac{\partial J(y, \hat{y})}{\partial z^L}\frac{\partial z^L}{\partial a^{L-1}}\frac{\partial a^{L-1}}{\partial z^{L-1}}={w^L}^T\delta^L \odot \sigma'(z^{L-1}) \\
\frac{\partial J(y, \hat{y})}{\partial b^l} &= \sum_m\delta^l_m
\end{aligned}
$$
So each layer $l$ caches

$$
\frac{\partial z^{l}}{\partial w^l}=a^{l-1}
$$

and, at backpropagation time, each layer $l$ returns

$$
\frac{\partial J(y, \hat{y})}{\partial z^{l}}
$$
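A minimal sketch of the matching backward step, under the same assumptions as the forward sketch above (sigmoid activation, single sample; delta is the incoming $\partial J/\partial z^l$):

```cpp
#include <cstddef>
#include <vector>

// Given delta^l = dJ/dz^l, accumulate dW = delta a^{l-1 T} and db = delta,
// and return W^T delta ⊙ sigma'(z^{l-1}) for the layer below.
// a_prev holds sigma(z^{l-1}), so sigma'(z^{l-1}) = a_prev * (1 - a_prev).
std::vector<double> dense_backward(const std::vector<double>& W,      // (m x m_prev)
                                   const std::vector<double>& delta,  // (m,)
                                   const std::vector<double>& a_prev, // (m_prev,)
                                   std::vector<double>& dW,           // (m x m_prev)
                                   std::vector<double>& db) {         // (m,)
    const std::size_t m = delta.size(), m_prev = a_prev.size();
    std::vector<double> delta_prev(m_prev, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        db[i] += delta[i];
        for (std::size_t j = 0; j < m_prev; ++j) {
            dW[i * m_prev + j] += delta[i] * a_prev[j];    // delta * a^{l-1 T}
            delta_prev[j] += W[i * m_prev + j] * delta[i]; // W^T delta
        }
    }
    for (std::size_t j = 0; j < m_prev; ++j)
        delta_prev[j] *= a_prev[j] * (1.0 - a_prev[j]);    // ⊙ sigma'(z^{l-1})
    return delta_prev;
}
```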

CNN forward propagation

Convolution:

$$
\begin{aligned}
z^l &= a^{l-1}*W^l+b^l \\
a^l &= \sigma(z^l)
\end{aligned}
$$

Pooling:

$$
a^l = \mathrm{pooling}(a^{l-1})
$$
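A minimal sketch of the convolution $z = a * W + b$ for a single input channel and a single filter, valid padding and stride 1 (multi-channel, stride, and padding are straightforward extensions; note that, as usual in deep learning, "*" is implemented as cross-correlation):

```cpp
#include <cstddef>
#include <vector>

// Valid cross-correlation of an (H x W) input with a (kh x kw) kernel,
// stride 1, plus a scalar bias: the z = a*W + b of the formula above.
std::vector<double> conv2d_valid(const std::vector<double>& in, std::size_t H, std::size_t W,
                                 const std::vector<double>& k, std::size_t kh, std::size_t kw,
                                 double bias) {
    const std::size_t oh = H - kh + 1, ow = W - kw + 1;
    std::vector<double> out(oh * ow);
    for (std::size_t i = 0; i < oh; ++i)
        for (std::size_t j = 0; j < ow; ++j) {
            double s = bias;
            for (std::size_t u = 0; u < kh; ++u)
                for (std::size_t v = 0; v < kw; ++v)
                    s += in[(i + u) * W + (j + v)] * k[u * kw + v];
            out[i * ow + j] = s;
        }
    return out;
}
```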

CNN backpropagation

Convolution:

$$
\begin{aligned}
\frac{\partial J(y, \hat{y})}{\partial z^{l}} &= \left(\frac{\partial z^{l+1}}{\partial z^l}\right)^T\frac{\partial J(y, \hat{y})}{\partial z^{l+1}},
\qquad \delta^l := \frac{\partial J(y, \hat{y})}{\partial z^l} \\
z^{l+1} &= a^{l}*W^{l+1}+b^{l+1}=\sigma(z^{l})*W^{l+1}+b^{l+1} \\
\frac{\partial z^{l+1}}{\partial z^l} &= \mathrm{rot}(W^{l+1}) \odot \sigma'(z^{l}) \\
\frac{\partial J(y, \hat{y})}{\partial z^{l}} &= \delta^l=\delta^{l+1}*\mathrm{rot}(W^{l+1})\odot \sigma'(z^{l}) \\
\frac{\partial J(y, \hat{y})}{\partial w^{l}} &= a^{l-1}*\delta^l \\
\frac{\partial J(y, \hat{y})}{\partial b^{l}} &= \sum_{i,j} \delta^l_{i,j}
\end{aligned}
$$

Pooling: scatter $\frac{\partial J(y, \hat{y})}{\partial a^{l}}$ back into an input-shaped matrix according to the pooling weights, which yields $\frac{\partial J(y, \hat{y})}{\partial a^{l-1}}$.

Each layer $l$ maintains the Jacobian product:

$$
\frac{\partial J(y,\hat{y})}{\partial a^{l-1}}=\left(\frac{\partial z^{l}}{\partial a^{l-1}}\right)^T\frac{\partial J(y, \hat{y})}{\partial z^{l}} = \delta^{l}*\mathrm{rot}(W^{l})
$$
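A minimal sketch of the $\delta^{l} * \mathrm{rot}(W)$ term for the single-channel, stride-1 case: rotate the kernel by 180° and run a *full* convolution, i.e. zero-pad delta by kh-1 / kw-1 on each side, reusing conv2d_valid from above:

```cpp
#include <cstddef>
#include <vector>

std::vector<double> conv2d_valid(const std::vector<double>&, std::size_t, std::size_t,
                                 const std::vector<double>&, std::size_t, std::size_t, double);

// Input-shaped gradient = full_conv(delta, rot180(W)); sigma' is applied afterwards.
std::vector<double> conv_backward_input(const std::vector<double>& delta, std::size_t dh, std::size_t dw,
                                        const std::vector<double>& k, std::size_t kh, std::size_t kw) {
    // rot180: reversing the flat row-major array reverses both dimensions
    std::vector<double> kr(k.rbegin(), k.rend());
    // zero-pad delta by (kh-1, kw-1) so a valid conv becomes a full conv
    const std::size_t ph = dh + 2 * (kh - 1), pw = dw + 2 * (kw - 1);
    std::vector<double> pad(ph * pw, 0.0);
    for (std::size_t i = 0; i < dh; ++i)
        for (std::size_t j = 0; j < dw; ++j)
            pad[(i + kh - 1) * pw + (j + kw - 1)] = delta[i * dw + j];
    return conv2d_valid(pad, ph, pw, kr, kh, kw, 0.0); // (dh+kh-1) x (dw+kw-1)
}
```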

BatchNormal forward propagation

For $m$ samples in a batch:

$$
\begin{aligned}
\mu &= \frac{1}{m}\sum_{i}^{m} x_i \\
\sigma^2 &= \frac{1}{m}\sum_{i}^{m}(x_i-\mu)^2 \\
\hat{x}_i &= \frac{x_i-\mu}{\sqrt{\sigma^2+\epsilon}} \\
y_i &= \gamma\hat{x}_i+\beta
\end{aligned}
$$

BatchNormal backpropagation

$$
\begin{aligned}
\frac{\partial J}{\partial \beta}&=\sum_i\frac{\partial J}{\partial y_i} \\
\frac{\partial J}{\partial \gamma}&=\sum_i\frac{\partial J}{\partial y_i} \hat{x}_i \\
\frac{\partial J}{\partial x_{i}} &=\sum_{k}^{m} \frac{\partial J}{\partial \hat{x}_{k}} \frac{\partial \hat{x}_{k}}{\partial x_{i}} \\
&=\frac{\partial J}{\partial \hat{x}_{i}} \frac{1}{\sqrt{\sigma^{2}+\epsilon}}+\sum_{k}^{m} \frac{\partial J}{\partial \hat{x}_{k}} \frac{\partial \hat{x}_{k}}{\partial \sigma^{2}} \frac{\partial \sigma^{2}}{\partial x_{i}}+\sum_{k}^{m} \frac{\partial J}{\partial \hat{x}_{k}} \frac{\partial \hat{x}_{k}}{\partial \mu} \frac{\partial \mu}{\partial x_{i}} \\
&=\frac{\partial J}{\partial \hat{x}_{i}} \frac{1}{\sqrt{\sigma^{2}+\epsilon}}+\frac{\partial \sigma^{2}}{\partial x_{i}} \cdot \sum_{k}^{m} \frac{\partial J}{\partial \hat{x}_{k}} \frac{\partial \hat{x}_{k}}{\partial \sigma^{2}}+\frac{\partial \mu}{\partial x_{i}} \cdot \sum_{k}^{m} \frac{\partial J}{\partial \hat{x}_{k}} \frac{\partial \hat{x}_{k}}{\partial \mu} \\
\sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}} \frac{\partial \hat{x}_{i}}{\partial \sigma^{2}} &=\sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}}\left[-\frac{1}{2} \frac{x_{i}-\mu}{(\sqrt{\sigma^{2}+\epsilon})^{3}}\right]
=-\frac{1}{2} \frac{1}{(\sqrt{\sigma^{2}+\epsilon})^{3}} \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}}\left(x_{i}-\mu\right) \\
\frac{\partial \sigma^{2}}{\partial x_{i}} &=\frac{2}{m}\left(x_{i}-\mu\right) \\
\sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}} \frac{\partial \hat{x}_{i}}{\partial \mu} &=\frac{-1}{\sqrt{\sigma^{2}+\epsilon}} \cdot \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}} \\
\frac{\partial J}{\partial x_{i}} &=\frac{\partial J}{\partial \hat{x}_{i}} \frac{1}{\sqrt{\sigma^{2}+\epsilon}}+\left[-\frac{1}{2} \frac{1}{(\sqrt{\sigma^{2}+\epsilon})^{3}} \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}}\left(x_{i}-\mu\right)\right] \frac{2}{m}\left(x_{i}-\mu\right)+\frac{-1}{\sqrt{\sigma^{2}+\epsilon}} \cdot \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}} \frac{1}{m} \\
&=\frac{\partial J}{\partial \hat{x}_{i}} \frac{1}{\sqrt{\sigma^{2}+\epsilon}}-\frac{1}{m} \frac{x_{i}-\mu}{(\sqrt{\sigma^{2}+\epsilon})^{3}} \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}}\left(x_{i}-\mu\right)-\frac{1}{m} \frac{1}{\sqrt{\sigma^{2}+\epsilon}} \cdot \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}} \\
&=\frac{1}{m} \frac{1}{\sqrt{\sigma^{2}+\epsilon}} \cdot\left\{m \frac{\partial J}{\partial \hat{x}_{i}}-\frac{x_{i}-\mu}{\sigma^{2}+\epsilon} \sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}}\left(x_{i}-\mu\right)-\sum_{i}^{m} \frac{\partial J}{\partial \hat{x}_{i}}\right\}
\end{aligned}
$$
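The last closed form collapses into a short implementation. A minimal sketch of the backward pass for one normalized feature over a batch of $m$ values (the function and variable names are mine, not from the post):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// BatchNorm backward for one feature over a batch of m values, using the
// closed form derived above. Inputs: x, xhat = (x - mu)/sqrt(var + eps),
// dy = dJ/dy. Outputs: dx, plus scalar dgamma and dbeta.
void bn_backward(const std::vector<double>& x, const std::vector<double>& xhat,
                 const std::vector<double>& dy, double gamma, double mu,
                 double var, double eps,
                 std::vector<double>& dx, double& dgamma, double& dbeta) {
    const std::size_t m = x.size();
    dgamma = dbeta = 0.0;
    double sum_dxhat = 0.0, sum_dxhat_xmu = 0.0;
    std::vector<double> dxhat(m);
    for (std::size_t i = 0; i < m; ++i) {
        dbeta  += dy[i];            // dJ/dbeta  = sum_i dJ/dy_i
        dgamma += dy[i] * xhat[i];  // dJ/dgamma = sum_i dJ/dy_i * xhat_i
        dxhat[i] = dy[i] * gamma;   // dJ/dxhat_i, since y = gamma*xhat + beta
        sum_dxhat     += dxhat[i];
        sum_dxhat_xmu += dxhat[i] * (x[i] - mu);
    }
    const double inv_std = 1.0 / std::sqrt(var + eps);
    dx.resize(m);
    for (std::size_t i = 0; i < m; ++i)
        dx[i] = inv_std / m * (m * dxhat[i]
                               - (x[i] - mu) / (var + eps) * sum_dxhat_xmu
                               - sum_dxhat);
}
```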

softmax

Original formula:

$$
y_i=\frac{e^{x_i}}{\sum_j e^{x_j}}
$$
Problem:
$e^n$ overflows floating point as soon as $n$ gets moderately large.

Fix, the log-sum-exp trick:

$$
\log{\sum_ie^{x_i}}=a+\log{\sum_ie^{x_i-a}}
$$

where $a$ is the maximum of the $x_i$, so every exponent is at most 0.
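A minimal sketch of a numerically stable softmax built on this shift (plain std::vector here, in place of the Tensor class described below):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: shift by the max so no exponent exceeds 0.
std::vector<double> softmax(const std::vector<double>& x) {
    const double a = *std::max_element(x.begin(), x.end());
    std::vector<double> y(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - a);  // e^{x_i - a} <= 1, no overflow
        sum += y[i];
    }
    for (double& v : y) v /= sum;   // the shift cancels in the ratio
    return y;
}
```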

Initialization

truncated normal

Generate truncated-normal random numbers restricted to [ mean - 2 * stddev, mean + 2 * stddev ].
The naive approach simply loops: draw normal samples and discard any that fall outside the range.
Later, after getting over the firewall, I found a ready-made library for this (the truncated_normal link above).
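A minimal sketch of that rejection loop, using std::normal_distribution from &lt;random&gt;:

```cpp
#include <random>

// Draw from N(mean, stddev^2) truncated to [mean - 2*stddev, mean + 2*stddev]
// by rejection: resample until the value falls inside the interval.
double truncated_normal(double mean, double stddev, std::mt19937& gen) {
    std::normal_distribution<double> dist(mean, stddev);
    double x;
    do {
        x = dist(gen);
    } while (x < mean - 2.0 * stddev || x > mean + 2.0 * stddev);
    return x;
}
```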

Practice

The overall goal is to run fast, possibly at the expense of code readability and safety (written fast and loose).

Class design

Framework

  1. Tensor class: shape, arithmetic, name, initialization
  2. Layer class: input, output, parameters, backpropagation
    Subclasses include CNN, fully connected, pooling, batchnormal, activation, and loss-function layers
  3. Net class: a computation DAG evaluated node by node in topological order; every node is a layer with its in-degree recorded; holds the loss-function layer plus train and test interfaces (a minimal interface sketch follows this list)
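A minimal sketch of how these three classes could fit together; names and signatures are my guesses at the design sketched here, not the post's actual code:

```cpp
#include <string>
#include <vector>

// Flat storage plus shape; grad stays empty unless the tensor needs one.
struct Tensor {
    std::vector<double> data;
    std::vector<int> shape;
    std::vector<double> grad;   // empty, or same length as data
    std::string name;
};

// Every layer transforms an input tensor and can push gradients back.
struct Layer {
    virtual Tensor forward(const Tensor& in) = 0;
    virtual Tensor backward(const Tensor& grad_out) = 0;  // returns dJ/d(input)
    virtual ~Layer() = default;
};

// The net is a DAG of layers evaluated in topological order.
struct Net {
    std::vector<Layer*> layers;           // nodes
    std::vector<std::vector<int>> edges;  // adjacency list
    std::vector<int> indegree;            // for topological sort
    // init / train / test / save / load as described below
};
```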

Details

1. Tensor
Members
  1. data[], one-dimensional
  2. shape, a vector
  3. grad, null or the same length as data
  4. name, a char array, null by default
Methods
  1. initialization
    1. given a shape and initial values
    2. given a shape, random initialization
  2. dot
    element-wise multiply; check shapes, then iterate
  3. mul
    matrix multiplication
    check shapes
  4. add
    addition; check shapes
  5. sub
    subtraction; check shapes
  6. div
    division; check shapes
  7. print
    name (if present) and shape on one line
    data on one line
    grad (if present) on one line
    possibly via overloading the stream operator
  8. setName, getName()
  9. overloaded operators [] and ()
    to access data and grad (see the indexing sketch after this list)
  10. reshape
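A minimal sketch of the flat-storage indexing this design implies: a row-major offset computed from shape, which operator () would use for checked multi-dimensional access (my illustration, not the post's code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major offset into the flat data[] array for a multi-index.
// For shape (d0, d1, d2), index (i, j, k) maps to (i*d1 + j)*d2 + k.
std::size_t flat_index(const std::vector<int>& shape, const std::vector<int>& idx) {
    assert(shape.size() == idx.size());
    std::size_t off = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        assert(idx[d] >= 0 && idx[d] < shape[d]);
        off = off * shape[d] + idx[d];
    }
    return off;
}
```

With this layout, reshape only rewrites the shape vector; data never moves.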
2. Layer
Members

Methods
  1. forward
  2. backward
Subclass 1: linear
  1. number of neurons m, input dimension
  2. Tensor, 2-D, weights W
  3. Tensor, bias parameters
  4. Tensor, input
Subclass 2: cnn
  1. m, n, h, w: filter count, input channels, kernel height and width
  2. stride
  3. padding
  4. W, parameters, shape (m, n, h, w)
  5. bias, parameters, shape (m,)
Subclass 3: maxpooling
  1. h, w, stride
  2. a weight map, used for backpropagation (sketched after this list)
Subclass 4: softmax
Subclass 5: relu
Subclass 6: batchnormal
  1. momentum (mean = momentum * mean + (1.0 - momentum) * nowbatchmean)
  2. gamma, beta
  3. mean, var
  4. shapes depend on whether the previous layer is a CNN or a linear layer
Subclass 7: mse
Subclass 8: cross entropy loss
Subclass 9: custom loss
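A minimal sketch of the max-pooling "weight map" idea: the forward pass records which input won each window, and backward scatters the gradient back to exactly those positions (single channel, 2x2 non-overlapping windows; my illustration):

```cpp
#include <cstddef>
#include <vector>

// 2x2 max pooling, stride 2. argmax records, per output cell, the flat input
// index that produced the max -- the "weight map" used by backward().
std::vector<double> maxpool_forward(const std::vector<double>& in, std::size_t H, std::size_t W,
                                    std::vector<std::size_t>& argmax) {
    const std::size_t oh = H / 2, ow = W / 2;
    std::vector<double> out(oh * ow);
    argmax.assign(oh * ow, 0);
    for (std::size_t i = 0; i < oh; ++i)
        for (std::size_t j = 0; j < ow; ++j) {
            std::size_t best = (2 * i) * W + 2 * j;
            for (std::size_t u = 0; u < 2; ++u)
                for (std::size_t v = 0; v < 2; ++v) {
                    std::size_t p = (2 * i + u) * W + (2 * j + v);
                    if (in[p] > in[best]) best = p;
                }
            out[i * ow + j] = in[best];
            argmax[i * ow + j] = best;
        }
    return out;
}

// Backward: each output gradient flows only to the recorded argmax position.
std::vector<double> maxpool_backward(const std::vector<double>& grad_out,
                                     const std::vector<std::size_t>& argmax,
                                     std::size_t H, std::size_t W) {
    std::vector<double> grad_in(H * W, 0.0);
    for (std::size_t k = 0; k < grad_out.size(); ++k)
        grad_in[argmax[k]] += grad_out[k];
    return grad_in;
}
```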
3. Net
Members
  1. the DAG; whether to store it as a forward-star adjacency list or plain vectors is TBD
  2. each node is a Layer, plus input/output buffers so branching paths are not recomputed
Methods
  1. init, which builds the whole DAG up front (so no dynamic-graph support)
  2. train, which takes one batch
  3. test
  4. overloaded operator (), forward propagation (see the topological-sort sketch after this list)
  5. save
  6. load
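A minimal sketch of the topological-sort evaluation that init and operator () imply: Kahn's algorithm over the in-degree array, visiting each node once its inputs are ready (names are my guesses at this design):

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Kahn's algorithm: evaluate nodes whose inputs are all ready, then release
// their successors. `edges` is the adjacency list, `indegree` its in-degrees.
std::vector<int> topo_order(const std::vector<std::vector<int>>& edges,
                            std::vector<int> indegree) {
    std::vector<int> order;
    std::queue<int> ready;
    for (std::size_t v = 0; v < indegree.size(); ++v)
        if (indegree[v] == 0) ready.push(static_cast<int>(v));  // graph inputs
    while (!ready.empty()) {
        int v = ready.front(); ready.pop();
        order.push_back(v);  // here Net would run layers[v]->forward(...)
        for (int u : edges[v])
            if (--indegree[u] == 0) ready.push(u);
    }
    return order;  // order.size() < node count would indicate a cycle
}
```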