Logistic regression
A nonlinear regression model
Used to model whether a given outcome occurs under the influence of certain factors
Relationship to linear regression:
It maps the result of a linear regression onto the interval (0, 1)
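A minimal sketch of that mapping: the logistic (sigmoid) function takes any real-valued linear-regression output and returns a value strictly between 0 and 1. The function name `sigmoid` is our own; base R's `plogis()` computes the same function.

```r
# Logistic (sigmoid) function: maps any real z into the open interval (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

z <- c(-10, -2, 0, 2, 10)   # example linear-predictor values
p <- sigmoid(z)
print(round(p, 4))

# base R's plogis() is the same function
stopifnot(isTRUE(all.equal(p, plogis(z))))
```

Large negative outputs land near 0, large positive outputs near 1, and z = 0 maps to exactly 0.5.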
Y is a categorical (here binary) variable
With a single independent variable, the probability that the outcome is y = 1 when exposed to x is:
$$p(y=1\mid x) = \frac{1}{1+\exp[-(\beta_0+\beta_1 x)]}$$
where $z=\beta_0+\beta_1 x$ is the linear predictor, so that $p(y=1\mid x) = 1/(1+e^{-z})$.
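A small worked example of the formula, using hypothetical coefficients β0 = -4 and β1 = 0.8 chosen only for illustration. It also checks the inverse relationship: taking the log-odds log(p / (1 - p)) of the probability recovers the linear predictor z.

```r
# Hypothetical coefficients, for illustration only
beta0 <- -4
beta1 <- 0.8
x <- c(0, 5, 10)

z <- beta0 + beta1 * x          # linear predictor
p <- 1 / (1 + exp(-z))          # P(y = 1 | x)

# The log-odds (logit) of p recovers the linear predictor z
log_odds <- log(p / (1 - p))
print(data.frame(x, z, p = round(p, 4), log_odds))
```

This is why logistic regression is "linear on the log-odds scale": each unit increase in x adds β1 to the log-odds.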
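- x is a deterministic variable, not a random one
- the sample size is larger than the number of independent variables
- there is no exact linear relationship among the independent variables
- the error terms $\epsilon \sim N(0, \sigma^2)$ are normally distributed and mutually independent (an assumption of linear regression; in logistic regression y given x is Bernoulli-distributed instead)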
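The second and third conditions can be checked directly on a design matrix. A minimal sketch on the `iris` predictors used below: count observations against parameters, and use the rank of a QR decomposition to detect an exact linear relationship. The extra column `Petal.Sum` is a hypothetical, deliberately collinear column added only to show what a violation looks like.

```r
# Design matrix: intercept plus the four iris measurements
X <- model.matrix(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                  data = iris)
n <- nrow(X)                      # number of observations
k <- ncol(X)                      # number of parameters, including the intercept
full_rank <- qr(X)$rank == k      # TRUE if no column is an exact linear combination

cat("n =", n, " k =", k, " full column rank:", full_rank, "\n")

# Hypothetical collinear column: an exact sum of two existing columns
X_bad <- cbind(X, Petal.Sum = X[, "Petal.Length"] + X[, "Petal.Width"])
cat("rank with collinear column:", qr(X_bad)$rank, "of", ncol(X_bad), "\n")
```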
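b<-iris3                             ## iris3: the same data as a 50 x 4 x 3 array, one slice per species
a<-iris                              ## iris: a 150 x 5 data frame
dim(b)
mode(b)
dimnames(b)                          ## an array carries dimnames rather than names; names(b) is NULL
str(a)
attributes(a)
summary(a)
table(a$Species)                     ## 50 observations per species
hist(a$Sepal.Length)
plot(density(a$Sepal.Length))
plot(a$Sepal.Length,a$Sepal.Width)
plot(a)                              ## scatterplot matrix of all variables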
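b<- which(a$Species=="virginica")   ## row indices of the virginica samples
## negative indexing drops those rows and keeps all columns;
## droplevels() removes the now-empty "virginica" factor level
c<-droplevels(a[-b,])
dim(c)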
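s<-sample(100,80)   ## draw 80 of the 100 row indices at random
s<-sort(s)
## split into training and test sets with the sampled indices
ir_tr<-c[s,]
ir_te<-c[-s,]       ## the remaining 20 rows form the test set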
dim(ir_tr)
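Before fitting, it helps to know what "1" will mean in the predictions. For a factor response, `glm()` with a binomial family treats the first factor level as failure (0) and any other level as success (1), so after dropping virginica the model estimates P(versicolor). A short sketch confirming this coding on the same two-class subset (the names `two_class` and `y` are illustrative):

```r
# Drop virginica and the now-empty factor level
two_class <- droplevels(subset(iris, Species != "virginica"))
print(levels(two_class$Species))   # first level is treated as 0 by glm()

# For a factor response, glm(family = binomial) models
# P(Species is not the first level), i.e. P(versicolor) here.
y <- as.integer(two_class$Species == levels(two_class$Species)[2])
print(table(two_class$Species, y))
```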
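## train the model: glm() with a binomial family and logit link
model<-glm(Species~.,family=binomial(link="logit"),data=ir_tr)
summary(model)
d<-predict(model,type='response')                   ## fitted P(versicolor) on the training set
res_tr<-ifelse(d>0.5,1,0)                           ## classify with a 0.5 cutoff
e<-predict(model,type='response',newdata = ir_te)   ## predicted probabilities on the test set
res_te<-ifelse(e>0.5,1,0)                           ## classify the test set from e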
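The 0/1 predictions are only useful once compared with the true labels. A minimal sketch, assuming the same two-class data and an 80/20 split (the seed value and the names `truth`, `conf_mat`, and `accuracy` are our own), that builds a confusion table and test accuracy:

```r
# Rebuild the two-class data and a random 80/20 split
two_class <- droplevels(subset(iris, Species != "virginica"))
set.seed(1)
s <- sort(sample(nrow(two_class), 80))
ir_tr <- two_class[s, ]
ir_te <- two_class[-s, ]

# glm() may warn here (non-convergence, probabilities of 0 or 1):
# setosa and versicolor are perfectly separable
model <- suppressWarnings(
  glm(Species ~ ., family = binomial(link = "logit"), data = ir_tr))

# Predicted probabilities of versicolor on the test set, then 0/1 labels
e <- predict(model, newdata = ir_te, type = "response")
res_te <- ifelse(e > 0.5, 1, 0)

# True labels coded the same way (versicolor = 1)
truth <- as.integer(ir_te$Species == "versicolor")
conf_mat <- table(predicted = res_te, actual = truth)
print(conf_mat)
accuracy <- mean(res_te == truth)
cat("test accuracy:", accuracy, "\n")
```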
## control: raise the maximum number of IRLS iterations (the default is 25)
model<-glm(Species~.,family=binomial(link="logit"),data=ir_tr,control=list(maxit=100))
summary(model)
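The `control` list is passed on to `glm.control()`, whose `maxit` argument caps the number of IRLS (Fisher scoring) iterations. Raising it does not guarantee meaningful estimates, so it is worth inspecting the convergence fields that every `glm` fit records. A sketch on the full two-class subset (the name `fit` is ours):

```r
two_class <- droplevels(subset(iris, Species != "virginica"))

# Same model, with the iteration cap raised from the default 25 to 100
fit <- suppressWarnings(
  glm(Species ~ ., family = binomial(link = "logit"), data = two_class,
      control = glm.control(maxit = 100)))

# glm objects record whether IRLS converged and how many iterations it used
cat("converged:", fit$converged, " iterations:", fit$iter, "\n")
```

With separable classes the deviance keeps shrinking toward 0, so the `converged` flag alone says little; the warnings suppressed above and the size of the standard errors are better diagnostics.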
Call:
glm(formula = Species ~ ., family = binomial(link = "logit"),
    data = ir_tr)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.462e-05  -2.110e-08  -2.110e-08   2.110e-08   1.831e-05

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)     4.641 692624.045       0        1
Sepal.Length   -9.491 228472.780       0        1
Sepal.Width    -7.505 113300.415       0        1
Petal.Length   19.054 171147.486       0        1
Petal.Width    25.340 254828.923       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.1070e+02  on 79  degrees of freedom
Residual deviance: 1.0442e-09  on 75  degrees of freedom
AIC: 10

Number of Fisher Scoring iterations: 25
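This output shows the classic symptoms of complete separation: a residual deviance of essentially 0, standard errors in the hundreds of thousands, p-values of 1, and 25 Fisher scoring iterations (the default `maxit`). When a linear combination of predictors splits the two classes perfectly, the maximum-likelihood estimates do not exist (the coefficients grow without bound), so these estimates and p-values should not be interpreted. A quick sketch showing that petal length alone already separates setosa from versicolor (the names `pl` and `separated` are ours):

```r
two_class <- droplevels(subset(iris, Species != "virginica"))

# Petal length split by class, and the min/max within each class
pl <- split(two_class$Petal.Length, two_class$Species)
print(sapply(pl, range))

# Non-overlapping ranges mean this one predictor separates the classes
separated <- max(pl$setosa) < min(pl$versicolor)
cat("Petal.Length alone separates the classes:", separated, "\n")
```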