Implementing Logistic Regression from Scratch with Cross-Entropy Loss and Backpropagation

Background

We first review the basic concepts needed to implement logistic regression from scratch (without calling existing packages) using the cross-entropy loss and the backpropagation algorithm.

Logistic Regression

Logistic regression is a classic classification method in statistical learning; it can handle both binary and multi-class problems.
The continuous output of the linear part is passed through the Sigmoid activation function to obtain a probability between 0 and 1.
The model is:

$$P(Y=1\mid x)=\frac{\exp(wx+b)}{1+\exp(wx+b)}, \qquad P(Y=0\mid x)=\frac{1}{1+\exp(wx+b)}.$$
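Dividing the numerator and denominator of $P(Y=1\mid x)$ by $\exp(wx+b)$ shows that the model simply passes the linear score through the sigmoid function, which is the form implemented later in this post:

$$P(Y=1\mid x)=\sigma(wx+b), \qquad \sigma(t)=\frac{1}{1+\exp(-t)}, \qquad P(Y=0\mid x)=1-P(Y=1\mid x).$$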

Matrix Calculus

For more details on matrix derivatives, please refer to the freely available reference: The Matrix Cookbook.

Backpropagation

Backpropagation is the cornerstone of parameter optimization in neural networks. Below we give the backpropagation update combined with stochastic gradient descent.
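For the single-layer model considered in this post, one iteration of the update can be sketched as

$$w \leftarrow w - \alpha\,\frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b},$$

where $\alpha$ is the learning rate and the gradients are derived in the loss-function section below. The code later in this post applies this update to the full batch of m samples at every iteration; the same rule carries over unchanged to the single-sample or mini-batch updates of stochastic gradient descent.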

For more details on backpropagation in neural networks, see Deep Learning (Goodfellow et al.).

Loss Function

Suppose the dataset contains m samples, each described by k features; sample i is stored as the i-th column of the design matrix X:
$$Z = \begin{bmatrix} z_1 & z_2 & z_3 & z_4 & \cdots & z_m \end{bmatrix}^{T}.$$

$$w = \begin{bmatrix} w_1 & w_2 & w_3 & w_4 & \cdots & w_k \end{bmatrix}^{T}.$$

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} & \cdots & x_{1m} \\ x_{21} & x_{22} & x_{23} & x_{24} & \cdots & x_{2m} \\ x_{31} & x_{32} & x_{33} & x_{34} & \cdots & x_{3m} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & x_{k3} & x_{k4} & \cdots & x_{km} \end{bmatrix}$$

$$b = \begin{bmatrix} b_0 & b_0 & b_0 & b_0 & \cdots & b_0 \end{bmatrix}^{T}.$$

$$Z = w^{T}X + b.$$

$$g(z_i)=\frac{1}{1+\exp(-z_i)}, \quad z_i\in Z, \qquad \text{so that } g'(z_i)=g(z_i)\big(1-g(z_i)\big).$$

The loss function is:

$$J_{Sigmoid}=-\frac{1}{m}\sum_{i=1}^{m}\Big[y_{i}\log\big(g(z_i)\big)+(1-y_{i})\log\big(1-g(z_i)\big)\Big].$$

$$\begin{aligned} \frac{\partial J_{Sigmoid}}{\partial w}&=-\frac{1}{m}\sum_{i=1}^{m}\left[y_i\frac{1}{g(z_i)}g(z_i)\big(1-g(z_i)\big)\frac{\partial z_i}{\partial w}+(1-y_i)\frac{-1}{1-g(z_i)}g(z_i)\big(1-g(z_i)\big)\frac{\partial z_i}{\partial w}\right] \\ &=-\frac{1}{m}\sum_{i=1}^{m}\left[y_i\big(1-g(z_i)\big)\frac{\partial z_i}{\partial w}+(y_i-1)g(z_i)\frac{\partial z_i}{\partial w}\right] \\ &=\frac{1}{m}\sum_{i=1}^{m}\big(g(z_i)-y_i\big)\frac{\partial z_i}{\partial w} \end{aligned}$$

Here $\frac{\partial z_i}{\partial w}$ denotes the derivative of a scalar with respect to a vector.

$$\begin{aligned} \frac{\partial J_{Sigmoid}}{\partial b}&=-\frac{1}{m}\sum_{i=1}^{m}\left[y_i\frac{1}{g(z_i)}g(z_i)\big(1-g(z_i)\big)\frac{\partial z_i}{\partial b}+(1-y_i)\frac{-1}{1-g(z_i)}g(z_i)\big(1-g(z_i)\big)\frac{\partial z_i}{\partial b}\right] \\ &=-\frac{1}{m}\sum_{i=1}^{m}\left[y_i\big(1-g(z_i)\big)\frac{\partial z_i}{\partial b}+(y_i-1)g(z_i)\frac{\partial z_i}{\partial b}\right] \\ &=\frac{1}{m}\sum_{i=1}^{m}\big(g(z_i)-y_i\big) \end{aligned}$$

Here $\frac{\partial z_i}{\partial b}$ denotes the derivative of a scalar with respect to a scalar.
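Since $z_i = w^{T}x_i + b$, where $x_i$ is the $i$-th column of $X$, we have $\frac{\partial z_i}{\partial w} = x_i$ and $\frac{\partial z_i}{\partial b} = 1$. Writing $G = g(Z)$ for the $1 \times m$ row of activations and $Y$ for the corresponding labels, the two gradients take the matrix form used in the code below:

$$\frac{\partial J_{Sigmoid}}{\partial w} = \frac{1}{m}\,X\,(G-Y)^{T}, \qquad \frac{\partial J_{Sigmoid}}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(g(z_i)-y_i\big).$$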

Code Implementation

Sigmoid activation function

> binary.sigmoid = function(x) 1 / (1 + exp(-x)) # sigmoid function
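
binary.sigmoid is vectorized, so it maps the whole 1 x m matrix of linear scores to probabilities in one call. A couple of illustrative values:

> binary.sigmoid(0)            # 0.5
> binary.sigmoid(c(-2, 0, 2))  # 0.1192029 0.5000000 0.8807971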

Gradient and loss computation

> binary.summary = function(w, b, X, Y){
+     m = ncol(X)
+     G = binary.sigmoid(t(w) %*% X + b) # sigmoid activations, 1 x m
+     dw = (1 / m) * (X %*% t(G - Y))    # gradient with respect to w, k x 1
+     db = (1 / m) * rowSums(G - Y)      # gradient with respect to b, scalar
+     Loss = (-1 / m) * sum(Y * log(G) + (1 - Y) * log(1 - G)) # cross-entropy loss
+ 
+     gradient = list(dw, db) # collect the gradients
+     return(list(gradient, Loss)) }
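
As a quick sanity check, here is an illustrative toy problem (the numbers are made up): two features, three samples stored as columns, and all-zero initial parameters, for which the cross-entropy loss equals log(2):

> X.toy = matrix(c(1, 2,  0, 1,  3, 1), nrow = 2) # 2 features x 3 samples (columns)
> Y.toy = c(1, 0, 1)                              # labels for the 3 samples
> w.toy = matrix(0, nrow = 2, ncol = 1)           # all-zero weights
> res = binary.summary(w.toy, 0, X.toy, Y.toy)
> res[[1]][[1]] # dw, a 2 x 1 matrix
> res[[1]][[2]] # db, a scalar
> res[[2]]      # loss = log(2) ~ 0.6931 when all predictions are 0.5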


Binary prediction

> binary.predict = function(w, b, X){
+     m = ncol(X)
+     temp = matrix(0, nrow = m, ncol = 1)
+     G = binary.sigmoid(t(w) %*% X + b)
+     for(i in 1:m){ # predict class 1 when the sigmoid output exceeds 0.5
+         if(G[1, i] > 0.5) { temp[i, 1] = 1
+         } else { temp[i, 1] = 0 } }
+     return(temp) }

Backpropagation

> binary.optimize = function(w, b, X, Y, learning.rate){
+     info = binary.summary(w, b, X, Y) # compute gradients and loss once per update
+     grad = info[[1]]
+     Loss = info[[2]]
+ 
+     dw = matrix(grad[[1]])
+     db = grad[[2]]
+ 
+     w = w - learning.rate * dw # gradient descent step
+     b = b - learning.rate * db
+ 
+     params = list(w, b, Loss)
+     return(params) }
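
A single call to binary.optimize performs one full-batch update and returns the updated parameters together with the loss computed at the old parameters. Continuing the illustrative toy example from above:

> step1 = binary.optimize(w.toy, 0, X.toy, Y.toy, learning.rate = 0.1)
> step1[[1]] # updated w (k x 1 matrix)
> step1[[2]] # updated b
> step1[[3]] # cross-entropy loss at the old parameters, ~0.6931 here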

Model training and prediction

> binary.model = function(Xtrain, ytrain, Xtest, ytest, learning.rate,
+                number){ # number: number of training iterations
+     k = nrow(Xtrain)
+     init = binary.Initalize(k) # parameter initialization (see the sketch below)
+     w = init[[1]]
+     b = init[[2]]
+     df = matrix(0, nrow = number, ncol = 3,
+                 dimnames = list(NULL, c('train', 'test', 'Loss'))) # accuracy (%) and loss per iteration
+     for(i in 1:number){
+         res = binary.optimize(w, b, Xtrain, ytrain, learning.rate)
+         w = as.matrix(res[[1]])
+         b = res[[2]]
+         ytrain.pred = binary.predict(w, b, Xtrain)
+         ytest.pred = binary.predict(w, b, Xtest)
+         df[i,] = c(mean(ytrain == ytrain.pred) * 100,
+                    mean(ytest == ytest.pred) * 100, res[[3]])
+     }
+     return(df) }
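
binary.model calls binary.Initalize, which is not shown in the post itself. A minimal sketch consistent with how it is used above (it must return a list containing a k x 1 weight matrix and a scalar bias) could look like this; zero initialization is an assumption, and the downloadable code may differ:

> binary.Initalize = function(k){
+     w = matrix(0, nrow = k, ncol = 1) # k x 1 weight vector, zero-initialized (assumption)
+     b = 0                             # scalar bias
+     return(list(w, b)) }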

Visualization

Classification accuracy and loss under different learning rates and iteration counts

Below we only show learning rates of 0.01, 0.005 and 0.002 with 500 and 1000 iterations; interested readers are welcome to try other settings and to suggest improvements.
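
The calls below use df_train, y_train, df_test and y_test, which come from the dataset linked at the end of the post. From the way the functions above index the data, features are stored as rows and samples as columns, and the labels are plain 0/1 vectors. A hypothetical preparation step (train.data, test.data, feature.cols and label are made-up names; the real preprocessing is in the downloadable code) would be:

> # hypothetical preprocessing: features as rows, samples as columns
> df_train = t(as.matrix(train.data[, feature.cols])) # k x m_train feature matrix
> y_train  = train.data$label                         # 0/1 labels, length m_train
> df_test  = t(as.matrix(test.data[, feature.cols]))  # k x m_test feature matrix
> y_test   = test.data$label                          # 0/1 labels, length m_test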

> S1 = binary.model(df_train, y_train, df_test, y_test, 0.01, 500)
> S2 = binary.model(df_train, y_train, df_test, y_test, 0.005, 500)
> S3 = binary.model(df_train, y_train, df_test, y_test, 0.002, 500)
> S4 = binary.model(df_train, y_train, df_test, y_test, 0.01, 1000)
> S5 = binary.model(df_train, y_train, df_test, y_test, 0.005, 1000)
> S6 = binary.model(df_train, y_train, df_test, y_test, 0.002, 1000)
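
The plotting code below melts each result matrix with reshape2 and keeps only the first 2 x number rows of the molten data, i.e. the train and test accuracy curves. For the id variables 'x' and 'V4' to exist, an iteration-index column has to be added to each result first; a plausible preparation step (the column names follow the transcript, the conversion itself is an assumption) is:

> library(ggplot2)
> library(reshape2) # melt()
> library(cowplot)  # plot_grid()
> S1 = data.frame(S1); S1$x  = 1:nrow(S1) # iteration index used as the id variable
> S2 = data.frame(S2); S2$V4 = 1:nrow(S2)
> S3 = data.frame(S3); S3$V4 = 1:nrow(S3)
> S4 = data.frame(S4); S4$V4 = 1:nrow(S4)
> S5 = data.frame(S5); S5$V4 = 1:nrow(S5)
> S6 = data.frame(S6); S6$V4 = 1:nrow(S6)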
> p1 = ggplot(melt(S1, id.vars = c('x'))[1:1000,],
+        aes(x = x, y = value, colour = variable))+
+ geom_point()+
+ geom_line()+
+ labs(colour = 'accuracy')+
+ xlab('')
> 
> p2 = ggplot(melt(S2, id.vars = c('V4'))[1:1000,],
+        aes(x = V4, y = value, colour = variable))+
+ geom_point()+
+ geom_line()+
+ scale_colour_manual(values = c('#4DBBD5FF', '#00A087FF'))+
+ labs(colour = 'accuracy')+
+ xlab('')
> 
> p3 = ggplot(melt(S3, id.vars = c('V4'))[1:1000,],
+        aes(x = V4, y = value, colour = variable))+
+ geom_point()+
+ geom_line()+
+ scale_colour_manual(values = c('#4DBBD5FF', '#00A087FF'))+
+ labs(colour = 'accuracy')+
+ xlab('')
> 
> p4 = ggplot(melt(S4, id.vars = c('V4'))[1:2000,],
+        aes(x = V4, y = value, colour = variable))+
+ geom_point()+
+ geom_line()+
+ scale_colour_manual(values = c('#DC0000FF', '#7E6148FF'))+
+ labs(colour = 'accuracy')+
+ xlab('')
> 
> p5 = ggplot(melt(S5, id.vars = c('V4'))[1:2000,],
+        aes(x = V4, y = value, colour = variable))+
+ geom_point()+
+ geom_line()+
+ scale_colour_manual(values = c('#3C5488FF', '#F39B7FFF'))+
+ labs(colour = 'accuracy')+
+ xlab('')
> 
> p6 = ggplot(melt(S6, id.vars = c('V4'))[1:2000,],
+        aes(x = V4, y = value, colour = variable))+
+ geom_point()+
+ geom_line()+
+ scale_colour_manual(values = c('#DC0000FF', '#7E6148FF'))+
+ labs(colour = 'accuracy')+
+ xlab('')
> 
> plot_grid(p1, p2, p3, nrow = 3, ncol = 1)

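The loss curves below are drawn with base R graphics. y1 to y6 are not defined in the transcript; from the legends they are presumably the Loss columns of S1 to S6, for example:

> y1 = S1$Loss; y2 = S2$Loss; y3 = S3$Loss # losses for 500 iterations
> y4 = S4$Loss; y5 = S5$Loss; y6 = S6$Loss # losses for 1000 iterations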

> plot(NULL, type = 'n', xlab = ' ', ylab = 'Loss',
+      xlim = c(1,500), ylim = c(0,1),
+      xaxt = 'n', yaxt = 'n', cex.lab = .7, main = 'iterations = 500')
> lines(y1, col = '#E64B35FF', lty = 1, lwd = 2)
> lines(y2, col = '#4DBBD5FF', lty = 1, lwd = 2)
> lines(y3, col = '#00A087FF', lty = 1, lwd = 2)
> legend('topright', legend = c('learning rate 0.01',
+                   'learning rate 0.005', 'learning rate 0.002'),
+        col = c('#E64B35FF', '#4DBBD5FF', '#00A087FF'),
+        lty = c(1, 1, 1), lwd = c(2, 2, 2),
+        cex = .65, inset = .001)
> axis(side = 1, col = '#7E6148FF', cex.axis = .7)
> axis(side = 2, col = '#7E6148FF', cex.axis = .7)
> 
> plot(NULL, type = 'n', xlab = ' ', ylab = 'Loss',
+      xlim = c(1,1000), ylim = c(0,1),
+      xaxt = 'n', yaxt = 'n', cex.lab = .7, main = 'iterations = 1000')
> lines(y4, col = '#3C5488FF', lty = 2, lwd = 2)
> lines(y5, col = '#F39B7FFF', lty = 2, lwd = 2)
> lines(y6, col = '#8491B4FF', lty = 2, lwd = 2)
> legend('topright', legend = c('learning rate 0.01',
+                   'learning rate 0.005', 'learning rate 0.002'),
+        col = c('#3C5488FF', '#F39B7FFF', '#8491B4FF'),
+        lty = c(2, 2, 2), lwd = c(2, 2, 2),
+        cex = .65, inset = .001)
> axis(side = 1, col = '#7E6148FF', cex.axis = .7)
> axis(side = 2, col = '#7E6148FF', cex.axis = .7)



Data and Code Downloads

The link is as follows:
Download the data and code for this post.

That is all for implementing logistic regression with cross-entropy loss and backpropagation. Questions and suggestions from interested readers are very welcome!
