Machine Learning in R, Part 2: Logistic Regression

岸芷汀兰whu

Categories: Machine Learning, R. Tags: machine learning, logistic

Article link: https://blog.csdn.net/u012432611/article/details/50066061

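What is logistic regression?

Logistic regression is a binary classification algorithm: it predicts a binary outcome from a given set of independent variables. The binary outcome is encoded as a dummy variable (0 or 1). Logistic regression can also be viewed as a special case of linear regression for a categorical outcome, in which the log odds serve as the dependent variable. In short, it predicts the probability that an event occurs by fitting the data to a logit function.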

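Where the logistic regression equation comes from

The basic equation of a generalized linear model (GLM) is:

$g(E(y)) = \alpha + \beta x_1 + \gamma x_2$

Here $g()$ is the link function, $E(y)$ is the expected value of the dependent variable, and the right-hand side is the linear predictor.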
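
Notes:

- A GLM does not assume a linear relationship between the dependent and independent variables. It assumes a linear relationship between the link function of the expected response and the independent variables.
- The dependent variable does not need to be normally distributed.
- Parameters are estimated by maximum likelihood.
- The errors must be independent, but they need not be normally distributed.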
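
The first note can be checked numerically. The following is a minimal sketch, assuming R's built-in `mtcars` data as a stand-in (it is not part of the original example): `am` is the 0/1 outcome and `wt` the single predictor. On the link scale the fitted values are an exact linear function of `wt`, while the probabilities are not.

```r
# Assumption: mtcars is only an illustrative stand-in data set
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Fitted values on the link (log-odds) scale
eta <- predict(fit, type = "link")

# The same values rebuilt by hand from the coefficients: intercept + slope * wt
manual_eta <- coef(fit)[1] + coef(fit)[2] * mtcars$wt

# Fitted probabilities; applying the inverse link (plogis) to eta gives the same numbers
p <- predict(fit, type = "response")

print(all.equal(unname(eta), unname(manual_eta)))
print(all.equal(unname(p), unname(plogis(eta))))
```

Both comparisons should report agreement, which is what "linear on the link scale" means in practice.
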
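Conditions the link function must satisfy
Since we only care about the probability of the outcome, the model must output a probability $p$, so $p$ has to satisfy two conditions:
- it is always positive;
- it is never greater than 1.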
Consider a simple linear regression in which a link function is applied to the dependent variable:
$g(y) = \beta_0 + \beta(Age)$ (a)
Write $p$ for the probability of success. By the first condition $p$ must be positive, so we put the linear expression in exponential form:
$p = \exp(\beta_0 + \beta(Age)) = e^{\beta_0 + \beta(Age)}$ (b)
By the second condition $p$ must not exceed 1, so we divide by a quantity larger than the numerator:
$p = \frac{\exp(\beta_0 + \beta(Age))}{\exp(\beta_0 + \beta(Age)) + 1} = \frac{e^{\beta_0 + \beta(Age)}}{e^{\beta_0 + \beta(Age)} + 1}$ (c)
Writing $y$ for the right-hand side of (a), equations (a), (b) and (c) give
$p = \frac{e^y}{1 + e^y}$ (d)
(d) is the logistic function. Since $1 - p = \frac{1}{1 + e^y}$, the odds are
$\frac{p}{1 - p} = e^y$
and taking logarithms gives the log odds (the logit), which is linear in the predictors:
$\log\left(\frac{p}{1 - p}\right) = y$
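
The derivation can be sanity-checked in a few lines of R. This is a small sketch; the helper names `logistic` and `logit` are chosen here for illustration, while `plogis()` is the base R equivalent of the logistic function.

```r
# Logistic function (d): maps a linear predictor y to a probability in (0, 1)
logistic <- function(y) exp(y) / (1 + exp(y))

# Logit (log odds): the inverse of the logistic function
logit <- function(p) log(p / (1 - p))

y <- c(-2, 0, 1.5)   # arbitrary example values of the linear predictor
p <- logistic(y)

print(p)                          # every value lies strictly between 0 and 1
print(all.equal(logit(p), y))     # the logit recovers the linear predictor
print(all.equal(p, plogis(y)))    # matches base R's built-in logistic function
```
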

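Evaluating a logistic regression model

- AIC (Akaike Information Criterion): the counterpart of adjusted $R^2$ in logistic regression is AIC, which penalizes the number of model parameters. Among candidate models, the one with the lowest AIC is preferred.
- Null deviance and residual deviance: the null deviance measures how well the response is predicted by a model with nothing but an intercept, and the residual deviance measures the fit once the independent variables are added. For both, a lower value means a better fit.
- Confusion matrix
- ROC curve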
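
Two definitions first:
Sensitivity: $S_n = \frac{TP}{TP + FN}$
Specificity: $S_p = \frac{TN}{TN + FP}$
That is, $S_n$ is the proportion of actual positives that are correctly detected, and $S_p$ is the proportion of actual negatives that are not misclassified as positive.
Varying the classification threshold makes the type I and type II error rates change continuously. The ROC curve plots sensitivity (the true positive rate) on the vertical axis against the false positive rate ($1 - S_p$) on the horizontal axis. A suitable operating point on the curve is then chosen according to the requirements, and that point fixes the likelihood-ratio threshold.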
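
The evaluation criteria above can be sketched on a small example. The code below is an illustration under stated assumptions, not the article's own analysis: it uses R's built-in `mtcars` data as a stand-in, and `sens_spec` is a hypothetical helper name. It reads the AIC and the two deviances from a fitted model, then computes sensitivity and specificity at a few thresholds, which are the points an ROC curve is built from.

```r
# Assumption: mtcars stands in for real data; am is the 0/1 outcome
fit <- glm(am ~ wt, data = mtcars, family = binomial)

aic       <- AIC(fit)            # penalizes the number of parameters
null_dev  <- fit$null.deviance   # intercept-only model
resid_dev <- fit$deviance        # model with the predictor added
print(c(AIC = aic, null = null_dev, residual = resid_dev))

p_hat  <- predict(fit, type = "response")
actual <- mtcars$am

# Hypothetical helper: sensitivity and specificity at one threshold
sens_spec <- function(actual, p_hat, threshold) {
  predicted <- as.integer(p_hat > threshold)
  tp <- sum(predicted == 1 & actual == 1)
  fn <- sum(predicted == 0 & actual == 1)
  tn <- sum(predicted == 0 & actual == 0)
  fp <- sum(predicted == 1 & actual == 0)
  c(threshold   = threshold,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# One row per threshold; (1 - specificity, sensitivity) is a point on the ROC curve
thresholds <- c(0.25, 0.5, 0.75)
roc_points <- t(sapply(thresholds, function(th) sens_spec(actual, p_hat, th)))
print(roc_points)
```

Raising the threshold can only lower sensitivity and raise specificity, which is the trade-off the ROC curve visualizes.
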
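## Dress recommendation case study ##
Dressify is a clothing company that wants to work out which dresses to recommend for sale, based on garment and market attributes.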

> head(train)
      ID    Style   Price Rating Size Season  NeckLine SleeveLength waiseline Material FabricType
1 100346     Sexy     Low    0.0 free Winter    v-neck    sleevless    empire   cotton    chiffon
2 100348   Casual     low    4.8 free Summer    o-neck    sleevless   natural   cotton       null
3 100349     work Average    4.7    M Spring    v-neck    sleevless      null     null       null
4 100351  Novelty Average    0.0 free winter    o-neck        short   natural polyster broadcloth
5 100352   Casual     Low    4.6 free Winter boat-neck         full   natural   cotton       null
6 100353 bohemian     Low    4.6 free winter    o-neck    sleevless    empire polyster       null
  Decoration Pattern.Type Area Recommended
1       null        solid    C           1
2       null        print    D           1
3       null         null    A           1
4       lace         null    A           1
5       null    patchwork    C           1
6       null    patchwork    A           1

## R implementation ##

# Load the data
train <- read.csv('Train_Old.csv')
# install.packages('caTools')  # run once if caTools is not installed
library(caTools)
set.seed(88)
split <- sample.split(train$Recommended, SplitRatio = 0.75)
# Split into training and test sets
dresstrain <- subset(train, split == TRUE)
dresstest <- subset(train, split == FALSE)
# Fit the logistic regression model on every column except ID
model <- glm(Recommended ~ . - ID, data = dresstrain, family = binomial)
summary(model)
# Predicted probabilities on the training set
# (named pred so that it does not mask the predict() function)
pred <- predict(model, type = 'response')
# Confusion matrix at a 0.5 threshold (rows: actual class, columns: pred > 0.5)
table(dresstrain$Recommended, pred > 0.5)
#     FALSE TRUE
#   0   142   12
#   1    15  100
# ROC curve
library(ROCR)
ROCRpred <- prediction(pred, dresstrain$Recommended)
ROCRperf <- performance(ROCRpred, 'tpr', 'fpr')
plot(ROCRperf, colorize = TRUE, text.adj = c(-0.2, 1.7))

[Figure: ROC curve of the model on the training set]
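
The training-set confusion matrix reported above can be turned into the sensitivity and specificity defined earlier. This short sketch only re-uses the four counts from that table; accuracy is added here as an extra summary that the article itself does not compute.

```r
# Counts copied from the confusion matrix reported above
# (rows: actual class, columns: predicted probability > 0.5)
tn <- 142; fp <- 12
fn <- 15;  tp <- 100

sensitivity <- tp / (tp + fn)                   # share of actual positives detected
specificity <- tn / (tn + fp)                   # share of actual negatives kept negative
accuracy    <- (tp + tn) / (tp + tn + fp + fn)  # extra summary, not in the article

print(round(c(sensitivity = sensitivity,
              specificity = specificity,
              accuracy    = accuracy), 3))
```

At the 0.5 threshold this gives a sensitivity of about 0.87 and a specificity of about 0.92. These are training-set figures, so they are likely to be optimistic compared with performance on `dresstest`.
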

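# Plot Recommended against Rating with the fitted logistic curve
library(ggplot2)
# Recent ggplot2 versions take the glm family through method.args
ggplot(dresstrain, aes(x = Rating, y = Recommended)) + geom_point() +
  stat_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE)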

[Figure: Recommended versus Rating with the fitted logistic regression curve]
