First, load the data file and inspect the type of each variable:
##################################
# 3. Logistic Regression Analysis
##################################
wine_log = read.csv('wine_logistic.csv')
str(wine_log)
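Here, the values 0 and 1 in Sales indicate whether the wine was sold (1) or not (0).
The logistic regression is fitted with the glm() function as follows: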
# Sales: whether it is sold or not
# By using the regression function `glm()` in R, we can conduct a logistic regression analysis.
# glm(DependentVariable ~ IndependentVariable, family = binomial, data = Data)
# Open the help page for glm()
?glm
model5 = glm(Sales ~ AGST+HarvestRain, family = binomial, data=wine_log)
summary(model5)
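The output is as follows:
From the coefficient estimates we obtain the fitted logistic regression model: Estimated log(p/(1-p)) = -76.69638 + 4.97193*AGST - 0.04033*HarvestRain, where p is the probability that Sales = 1.
The AIC reported in the summary is used to compare models: among models fitted to the same data, the smaller the AIC, the better the model.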
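To make the fitted equation concrete, the sketch below converts a log-odds value into a probability by hand. The coefficients are copied from the equation above. The AGST and HarvestRain values are made up for illustration and do not come from the data set.

```r
# Coefficients copied from the fitted equation above
b0     <- -76.69638
b_agst <- 4.97193
b_rain <- -0.04033

# Hypothetical vintage (illustrative values, not taken from the data)
agst         <- 16.5
harvest_rain <- 100

# Linear predictor = estimated log(p / (1 - p))
log_odds <- b0 + b_agst * agst + b_rain * harvest_rain

# Inverse logit: plogis(x) is the same as 1 / (1 + exp(-x))
p_sold <- plogis(log_odds)
print(p_sold)
```

This should agree, up to rounding of the coefficients, with what predict(model5, type = 'response') returns for a row with the same AGST and HarvestRain.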
Next, load a test data set to check the model's accuracy:
wine_log_test = read.csv('wine_logistic_test.csv')
prediction = predict(model5, type = 'response', newdata = wine_log_test)
table(wine_log_test$Sales, prediction > 0.5)
The results on the test set are as follows:
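A table of actual against predicted classes like this one can be turned into a single accuracy figure. The sketch below shows one way to do that. The vectors `actual` and `predicted_prob` are small made-up stand-ins for wine_log_test$Sales and prediction, so the number it produces says nothing about the real model.

```r
# Hypothetical stand-ins for wine_log_test$Sales and prediction
actual         <- c(0, 0, 1, 1, 1, 0)
predicted_prob <- c(0.2, 0.6, 0.8, 0.4, 0.9, 0.1)

# Classify with the same 0.5 threshold; fixing the factor levels keeps
# both columns in the table even if one class is never predicted
predicted_class <- factor(predicted_prob > 0.5, levels = c(FALSE, TRUE))

# Rows: actual class, columns: predicted class
confusion <- table(Actual = actual, Predicted = predicted_class)
print(confusion)

# Accuracy = correctly classified cases (the diagonal) / all cases
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

With the real objects, the same calculation would use wine_log_test$Sales and prediction in place of the toy vectors.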