2020-10-28 LogisticRegression Exercise

data = data[which(data$FY == '2014'),]
data$FQ <- NULL

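I think the 2014 data alone is enough here, and the fiscal quarter (FQ) does not help in building the regression, so that column is dropped.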

We first build a model m0; its summary shows the coefficient of each factor.

m0 = glm(OnlineOrderFlag~Bikes+Clothing+Components+Accessories, data=data, 
         family=binomial(link='logit'))
summary(m0)

## 
## Call:
## glm(formula = OnlineOrderFlag ~ Bikes + Clothing + Components + 
##     Accessories, family = binomial(link = "logit"), data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5203   0.0639   0.0736   0.1408   0.5093  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   3.5617     0.1605  22.191   <2e-16 ***
## Bikes        -1.3016     0.1373  -9.477   <2e-16 ***
## Clothing     -0.2829     0.1122  -2.521   0.0117 *  
## Components  -23.4698   327.5423  -0.072   0.9429    
## Accessories   2.6323     0.1366  19.266   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10109  on 21461  degrees of freedom
## Residual deviance:  3121  on 21457  degrees of freedom
## AIC: 3131
## 
## Number of Fisher Scoring iterations: 18

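Note that the stars mark the significance level. Components has no stars (p = 0.9429), and its estimate of -23.4698 comes with a standard error of 327.5423, so we can treat this coefficient as extremely unreliable.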
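A huge standard error together with 18 Fisher scoring iterations is the usual signature of (quasi-)complete separation, where one level of a predictor almost perfectly determines the outcome. Below is a minimal sketch of how one might check for this. The data frame toy and its values are made up for illustration and are not the sales data.

```r
# Hypothetical toy data: every order that contains components is offline (flag 0)
toy <- data.frame(
  Components      = c(rep(1, 20), rep(0, 80)),
  OnlineOrderFlag = c(rep(0, 20), rep(1, 70), rep(0, 10))
)

# Cross-tabulate the predictor against the outcome;
# an empty cell is the sign of separation
xt <- table(Components = toy$Components, Online = toy$OnlineOrderFlag)
print(xt)

# Fitting anyway reproduces the symptom: a large negative estimate
# with an even larger standard error
fit <- glm(OnlineOrderFlag ~ Components, data = toy,
           family = binomial(link = "logit"))
est <- coef(summary(fit))["Components", c("Estimate", "Std. Error")]
print(est)
```

If the cross-tabulation has a zero cell, the maximum-likelihood estimate for that coefficient does not exist as a finite number, which is why the reported value is large and its z test is uninformative.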

So next we build model m1, which leaves out Components.

m1 = glm(OnlineOrderFlag~Bikes+Clothing+Accessories, data=data, 
         family=binomial(link='logit'))
summary(m1)

## 
## Call:
## glm(formula = OnlineOrderFlag ~ Bikes + Clothing + Accessories, 
##     family = binomial(link = "logit"), data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0767   0.1330   0.1915   0.3999   0.8769  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.99548    0.08708   34.40   <2e-16 ***
## Bikes       -1.50410    0.07391  -20.35   <2e-16 ***
## Clothing    -0.73385    0.06301  -11.65   <2e-16 ***
## Accessories  1.72858    0.06326   27.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 10108.9  on 21461  degrees of freedom
## Residual deviance:  8160.8  on 21458  degrees of freedom
## AIC: 8168.8
## 
## Number of Fisher Scoring iterations: 6

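The new model's AIC is larger (8168.8 versus 3131 for m0). A lower AIC indicates a better fit, so m1 fits the data worse than m0; what we gain is that every remaining coefficient now has a small standard error and is highly significant.
We now have a relatively reliable logistic regression model that relates whether an order includes bikes, accessories, or clothing to whether the customer orders online or buys in a store.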
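Because m1 is nested in m0, the two fits can be compared directly by AIC and by a likelihood-ratio test. The sketch below does this on simulated data. The variables x1, x2, y and the effect sizes are assumptions chosen only for illustration.

```r
# Simulated data (hypothetical): y depends on both x1 and x2
set.seed(1)
n  <- 500
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.3)
y  <- rbinom(n, 1, plogis(1 + 1.5 * x1 - 2 * x2))
sim <- data.frame(y, x1, x2)

full    <- glm(y ~ x1 + x2, data = sim, family = binomial(link = "logit"))
reduced <- glm(y ~ x1,      data = sim, family = binomial(link = "logit"))

# Lower AIC is better
aic <- AIC(full, reduced)
print(aic)

# Likelihood-ratio test: the drop in residual deviance is compared
# against a chi-squared distribution
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

Applied to the two summaries above, the same comparison gives a deviance difference of 8160.8 - 3121 = 5039.8 on 1 degree of freedom.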


That completes everything the Assignment requires.

Next comes the "What else can we do" part.

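  • We want to see what the model predicts (online purchase or not) from the existing factors, so we create a column class. A log-odds above 0 corresponds to a probability above 50%.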
m0.logodd = predict(m0, type="link")

m0.class  = as.numeric(m0.logodd>0)

data$class = m0.class
  • We want to see the probability of choosing to purchase online given these factors, so we create a column prob, expressed in % and rounded to 3 decimal places.
m0.prob   = predict(m0, type="response")
data$prob = round(m0.prob*100,3)
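To make the link between the log-odds and the probability concrete, the sketch below recomputes a few probabilities by hand from the coefficients printed in summary(m0). The helper name online_prob is hypothetical. Because the printed coefficients are rounded, the last digit may differ slightly from what predict() returns.

```r
# Coefficients copied from the summary(m0) output
b <- c(intercept = 3.5617, Bikes = -1.3016, Clothing = -0.2829,
       Components = -23.4698, Accessories = 2.6323)

# Hypothetical helper: probability of an online order for one basket
online_prob <- function(bikes, clothing, components, accessories) {
  log_odds <- b[["intercept"]] +
    b[["Bikes"]]       * bikes +
    b[["Clothing"]]    * clothing +
    b[["Components"]]  * components +
    b[["Accessories"]] * accessories
  plogis(log_odds)  # inverse logit: 1 / (1 + exp(-log_odds))
}

p_accessories <- online_prob(0, 0, 0, 1)  # accessories only
p_bikes       <- online_prob(1, 0, 0, 0)  # bikes only
p_components  <- online_prob(1, 0, 1, 0)  # bikes and components

print(round(100 * c(p_accessories, p_bikes, p_components), 3))
```

Since plogis(0) is exactly 0.5, classifying with log-odds > 0 is the same as classifying with a probability above 50%.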

Let's peek at what the data looks like now.

data[1:10,]
##        FY Bikes Clothing Components Accessories OnlineOrderFlag class   prob
## 9589 2014     0        0          0           1               1     1 99.796
## 9590 2014     1        0          0           0               1     1 90.552
## 9591 2014     1        0          1           0               0     0  0.000
## 9592 2014     1        0          0           1               1     1 99.255
## 9593 2014     0        1          0           1               1     1 99.730
## 9594 2014     1        1          0           1               1     1 99.014
## 9595 2014     1        1          0           0               1     1 87.839
## 9596 2014     0        1          0           1               1     1 99.730
## 9597 2014     1        0          0           1               1     1 99.255
## 9598 2014     0        0          0           1               1     1 99.796
Note the comma in [1:10,]: without it, R selects columns instead of rows and raises an error here.
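The difference between row and column indexing can be seen on a small made-up data frame (df below is hypothetical). head() is a common alternative that avoids the comma altogether.

```r
# Hypothetical data frame with 12 rows and 3 columns
df <- data.frame(id = 1:12, x = (1:12)^2, y = letters[1:12])

first_rows <- df[1:10, ]      # with the comma: rows 1 to 10, all columns
same_rows  <- head(df, 10)    # the same rows via head()
two_cols   <- df[1:2]         # without the comma: columns 1 and 2, all rows

# Asking for columns 1 to 10 of a 3-column data frame fails
col_error <- try(df[1:10], silent = TRUE)
print(inherits(col_error, "try-error"))
```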

Next we may want to see what class and prob come out for each combination of factors.

We load the tidyverse package, use its filter function, and peek at the first ten rows as before.

library(tidyverse)
q = data %>% filter(Bikes == 0, Clothing == 0, Components == 0, Accessories == 0)
q[1:10,]
##      FY Bikes Clothing Components Accessories OnlineOrderFlag class prob
## NA   NA    NA       NA         NA          NA              NA    NA   NA
## NA.1 NA    NA       NA         NA          NA              NA    NA   NA
## NA.2 NA    NA       NA         NA          NA              NA    NA   NA
## NA.3 NA    NA       NA         NA          NA              NA    NA   NA
## NA.4 NA    NA       NA         NA          NA              NA    NA   NA
## NA.5 NA    NA       NA         NA          NA              NA    NA   NA
## NA.6 NA    NA       NA         NA          NA              NA    NA   NA
## NA.7 NA    NA       NA         NA          NA              NA    NA   NA
## NA.8 NA    NA       NA         NA          NA              NA    NA   NA
## NA.9 NA    NA       NA         NA          NA              NA    NA   NA
e = data %>% filter(Bikes == 0, Clothing == 0, Components == 0, Accessories == 1)
e[1:10,]
##      FY Bikes Clothing Components Accessories OnlineOrderFlag class   prob
## 1  2014     0        0          0           1               1     1 99.796
## 2  2014     0        0          0           1               1     1 99.796
## 3  2014     0        0          0           1               1     1 99.796
## 4  2014     0        0          0           1               1     1 99.796
## 5  2014     0        0          0           1               1     1 99.796
## 6  2014     0        0          0           1               1     1 99.796
## 7  2014     0        0          0           1               1     1 99.796
## 8  2014     0        0          0           1               1     1 99.796
## 9  2014     0        0          0           1               1     1 99.796
## 10 2014     0        0          0           1               1     1 99.796
r = data %>% filter(Bikes == 1, Clothing == 1, Accessories == 1, OnlineOrderFlag == 0)
r[1:10,]
##      FY Bikes Clothing Components Accessories OnlineOrderFlag class prob
## 1  2014     1        1          1           1               0     0    0
## 2  2014     1        1          1           1               0     0    0
## 3  2014     1        1          1           1               0     0    0
## 4  2014     1        1          1           1               0     0    0
## 5  2014     1        1          1           1               0     0    0
## 6  2014     1        1          1           1               0     0    0
## 7  2014     1        1          1           1               0     0    0
## 8  2014     1        1          1           1               0     0    0
## 9  2014     1        1          1           1               0     0    0
## 10 2014     1        1          1           1               0     0    0
t = data %>% filter(Bikes == 1, Clothing == 1, Accessories == 1, OnlineOrderFlag == 1)
t[1:10,]
##      FY Bikes Clothing Components Accessories OnlineOrderFlag class   prob
## 1  2014     1        1          0           1               1     1 99.014
## 2  2014     1        1          0           1               1     1 99.014
## 3  2014     1        1          0           1               1     1 99.014
## 4  2014     1        1          0           1               1     1 99.014
## 5  2014     1        1          0           1               1     1 99.014
## 6  2014     1        1          0           1               1     1 99.014
## 7  2014     1        1          0           1               1     1 99.014
## 8  2014     1        1          0           1               1     1 99.014
## 9  2014     1        1          0           1               1     1 99.014
## 10 2014     1        1          0           1               1     1 99.014
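The all-NA rows printed for q arise because no order matches that filter: q has zero rows, and asking a zero-row data frame for rows 1 to 10 returns NA rows. The sketch below shows this on a made-up data frame (toy is hypothetical), along with one base R way to list every combination that does occur together with its class, prob, and row count.

```r
# Hypothetical toy data with the same kind of columns as above
toy <- data.frame(
  Bikes       = c(0, 0, 1, 1, 1),
  Accessories = c(1, 1, 0, 1, 1),
  class       = c(1, 1, 1, 1, 1),
  prob        = c(99.796, 99.796, 90.552, 99.255, 99.255)
)

# A filter that matches nothing gives a zero-row data frame
empty  <- toy[toy$Bikes == 0 & toy$Accessories == 0, ]
padded <- empty[1:3, ]        # indexing past the end yields all-NA rows
print(nrow(empty))
print(nrow(head(empty, 3)))   # head() does not pad with NA rows

# One row per observed combination, with the number of orders in column n
toy$n  <- 1
combos <- aggregate(n ~ Bikes + Accessories + class + prob,
                    data = toy, FUN = sum)
print(combos)
```

Checking nrow() first, or using head(), avoids mistaking an empty result for missing data.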

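You may be curious about what is going on with the Components factor that we dropped. We pull it out and look at the rows with Components equal to 1 and equal to 0 separately.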

check1 = data %>% filter(Components == 1)
summary(check1)
##        FY           Bikes           Clothing        Components  Accessories    
##  Min.   :2014   Min.   :0.0000   Min.   :0.0000   Min.   :1    Min.   :0.0000  
##  1st Qu.:2014   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:1    1st Qu.:0.0000  
##  Median :2014   Median :1.0000   Median :1.0000   Median :1    Median :0.0000  
##  Mean   :2014   Mean   :0.8107   Mean   :0.6341   Mean   :1    Mean   :0.4038  
##  3rd Qu.:2014   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1    3rd Qu.:1.0000  
##  Max.   :2014   Max.   :1.0000   Max.   :1.0000   Max.   :1    Max.   :1.0000  
##  OnlineOrderFlag     class        prob  
##  Min.   :0       Min.   :0   Min.   :0  
##  1st Qu.:0       1st Qu.:0   1st Qu.:0  
##  Median :0       Median :0   Median :0  
##  Mean   :0       Mean   :0   Mean   :0  
##  3rd Qu.:0       3rd Qu.:0   3rd Qu.:0  
##  Max.   :0       Max.   :0   Max.   :0
check1[1:10,]
##      FY Bikes Clothing Components Accessories OnlineOrderFlag class prob
## 1  2014     1        0          1           0               0     0    0
## 2  2014     1        1          1           1               0     0    0
## 3  2014     1        1          1           1               0     0    0
## 4  2014     1        1          1           1               0     0    0
## 5  2014     1        0          1           0               0     0    0
## 6  2014     1        1          1           1               0     0    0
## 7  2014     1        1          1           0               0     0    0
## 8  2014     1        1          1           1               0     0    0
## 9  2014     0        1          1           1               0     0    0
## 10 2014     0        0          1           0               0     0    0
check0 = data %>% filter(Components == 0)
summary(check0)
##        FY           Bikes          Clothing        Components  Accessories    
##  Min.   :2014   Min.   :0.000   Min.   :0.0000   Min.   :0    Min.   :0.0000  
##  1st Qu.:2014   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0    1st Qu.:1.0000  
##  Median :2014   Median :0.000   Median :0.0000   Median :0    Median :1.0000  
##  Mean   :2014   Mean   :0.443   Mean   :0.3419   Mean   :0    Mean   :0.8066  
##  3rd Qu.:2014   3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:0    3rd Qu.:1.0000  
##  Max.   :2014   Max.   :1.000   Max.   :1.0000   Max.   :0    Max.   :1.0000  
##  OnlineOrderFlag      class        prob      
##  Min.   :0.0000   Min.   :1   Min.   :87.84  
##  1st Qu.:1.0000   1st Qu.:1   1st Qu.:99.01  
##  Median :1.0000   Median :1   Median :99.25  
##  Mean   :0.9803   Mean   :1   Mean   :98.03  
##  3rd Qu.:1.0000   3rd Qu.:1   3rd Qu.:99.80  
##  Max.   :1.0000   Max.   :1   Max.   :99.80
check0[1:10,]
##      FY Bikes Clothing Components Accessories OnlineOrderFlag class   prob
## 1  2014     0        0          0           1               1     1 99.796
## 2  2014     1        0          0           0               1     1 90.552
## 3  2014     1        0          0           1               1     1 99.255
## 4  2014     0        1          0           1               1     1 99.730
## 5  2014     1        1          0           1               1     1 99.014
## 6  2014     1        1          0           0               1     1 87.839
## 7  2014     0        1          0           1               1     1 99.730
## 8  2014     1        0          0           1               1     1 99.255
## 9  2014     0        0          0           1               1     1 99.796
## 10 2014     0        0          0           1               1     1 99.796

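I did not go through everything in detail, but the gist is clear from the summaries: every order with Components equal to 1 has OnlineOrderFlag equal to 0, while orders without components are online about 98% of the time (mean 0.9803).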
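One way to summarise how well the class column agrees with the actual OnlineOrderFlag is a confusion matrix. A minimal sketch on made-up vectors follows. The vectors actual and predicted are hypothetical stand-ins for data$OnlineOrderFlag and data$class.

```r
# Hypothetical stand-ins for data$OnlineOrderFlag and data$class
actual    <- c(1, 1, 1, 0, 0, 1, 0, 1)
predicted <- c(1, 1, 1, 0, 0, 1, 1, 1)

# Rows are the true labels, columns are the predicted labels
cm <- table(Actual = actual, Predicted = predicted)
print(cm)

# Share of rows where the prediction matches the true label
accuracy <- mean(actual == predicted)
print(accuracy)
```

The off-diagonal cells count the two kinds of mistakes: offline orders predicted as online, and online orders predicted as offline.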

