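Regression Analysis in R

Today we use the wine dataset as an example to study regression analysis. First, import and inspect the dataset. It has been uploaded to the attachment link and can be downloaded from there: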

# 2. Regression Analysis
##############################

# First, download the data from Canvas and save it to your working directory where you store the data.
# Read data in csv format.
setwd('D:/rstudio')  # set the working directory; adjust to your own path
wine <- read.csv('wine.csv')

# Let's take a look at the variable names.
names(wine)

# The variables that will be used include:
# Year: Vintage year
# Price: Price of the wine (dependent variable)
# WinterRain: Precipitation in Winter
# AGST: Average Growing Season Temperature
# HarvestRain: Precipitation during Harvest
# Age: Aging Period
# FrancePop: France Population

str(wine)
summary(wine)

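【Regression Analysis】

Call the lm function to run a regression. Model 1 studies the relationship between AGST and Price; the code is as follows: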

# By using the regression function `lm()` in R, we can conduct a regression analysis. 
# lm(DependentVariable ~ IndependentVariable, data=Data)
?lm

model1 <- lm(Price ~ AGST, data=wine)
summary(model1)

# Estimated Price = -3.4178 + 0.6351*AGST
# A coefficient is more statistically significant when:
# 1) the absolute value of t is larger, 2) Pr(>|t|) is less than 0.05, 3) it is flagged with more asterisks (*).
# In model 1, the coefficient of AGST is statistically significant and positive, so AGST has a significant positive association with Price.

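The result is: Price = -3.4178 + 0.6351*AGST. Pr(>|t|) is less than 0.05, so AGST has a statistically significant effect on Price. Because the Estimate is positive (0.6351), AGST is positively related to Price.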
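The values read off the `summary()` printout can also be extracted programmatically. The following is a minimal sketch on simulated data, not on `wine.csv`: the variable names `agst` and `price`, the sample size, and the "true" coefficients are all assumptions made for illustration. It fits a simple regression and checks the sign and p-value of the slope.

```r
# Simulated stand-in for the wine data (assumption: NOT the real dataset)
set.seed(42)
n <- 25
agst  <- runif(n, min = 15, max = 18)             # hypothetical temperatures
price <- -3.4 + 0.64 * agst + rnorm(n, sd = 0.3)  # hypothetical linear relation plus noise
sim <- data.frame(price = price, agst = agst)

fit <- lm(price ~ agst, data = sim)

# coef(summary(fit)) is a matrix with the columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- coef(summary(fit))
slope   <- coef_table["agst", "Estimate"]
p_value <- coef_table["agst", "Pr(>|t|)"]

# Significant at the 5% level and positive => significant positive association
is_significant <- p_value < 0.05
is_positive    <- slope > 0

print(coef_table)
```

Indexing the coefficient matrix by row and column name avoids copying numbers by hand from the printed summary, which is where transcription slips tend to creep in.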

Next, consider model 2, the effect of AGST + HarvestRain on Price, and model 3, the effect of AGST + HarvestRain + WinterRain + Age + FrancePop on Price:

# Regression becomes a more useful tool when researchers want to look at multiple factors simultaneously: multivariate regression
model2 <- lm(Price ~ AGST + HarvestRain, data=wine)
summary(model2)

model3 <- lm(Price ~ AGST + HarvestRain + WinterRain + Age + FrancePop, data=wine)
summary(model3)

# Better R-squared, but the coefficient of FrancePop is statistically insignificant.
# Let's try a regression model without FrancePop. 

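The results are as follows:

The Pr(>|t|) value for FrancePop is 0.493, which is greater than 0.05, so FrancePop does not have a statistically significant effect on Price.

Model 4: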

model4 <- lm(Price ~ AGST + HarvestRain + WinterRain + Age, data=wine)
summary(model4)

# Adjusted R-squared!
# Adjusted R-squared is a corrected goodness-of-fit (model accuracy) measure for linear models. 
# Adjusted R-squared can decrease if the additional variable(s) do not improve the model.

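The Adjusted R-squared is 0.7945. If a newly added variable does not improve the model, the Adjusted R-squared decreases.

Call the cor() function to compute correlation coefficients: the correlation between WinterRain and Age, the correlation between Age and FrancePop, and the correlations among all variables: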
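To see why Adjusted R-squared penalizes extra variables, it helps to compute it by hand from the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses simulated data; `noise_var` is a hypothetical predictor that is unrelated to the response, and `adj_r2` is a helper written for this illustration only.

```r
# Simulated data (assumption: NOT the wine dataset)
set.seed(1)
n <- 25
x1 <- rnorm(n)
noise_var <- rnorm(n)            # hypothetical predictor unrelated to y
y <- 1 + 2 * x1 + rnorm(n)
sim <- data.frame(y, x1, noise_var)

small <- lm(y ~ x1, data = sim)
big   <- lm(y ~ x1 + noise_var, data = sim)

# Illustrative helper: adjusted R-squared from the textbook formula
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)                 # number of observations used in the fit
  p  <- length(coef(fit)) - 1     # number of predictors, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Plain R-squared never decreases when a variable is added;
# the adjusted version can move in either direction.
print(c(r2_small = summary(small)$r.squared, r2_big = summary(big)$r.squared))
print(c(adj_small = adj_r2(small), adj_big = adj_r2(big)))
```

The hand-computed values should agree with `summary(fit)$adj.r.squared`, which is the number shown in the printed summary.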

# High correlation between independent variables can lead to multicollinearity issues -> unstable coefficient estimates with inflated standard errors
cor(wine$WinterRain, wine$Age)
cor(wine$Age, wine$FrancePop)
cor(wine)

plot(wine$WinterRain, wine$Age)
plot(wine$Age, wine$FrancePop)
plot(wine)

In the output of cor() and plot(), pay attention to the sign of each correlation (positive or negative).
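Beyond pairwise correlations, a common way to quantify multicollinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R²). The sketch below is a base-R illustration on simulated data, not the wine dataset. The variables `age`, `france_pop`, and `winter_rain` are hypothetical stand-ins, with `france_pop` built to be almost a linear function of `age`, and `vif_one` is a helper written for this example.

```r
# Simulated predictors (assumption: NOT the wine dataset)
set.seed(7)
n <- 200
age         <- rnorm(n)
france_pop  <- -0.99 * age + rnorm(n, sd = 0.1)  # hypothetical: nearly collinear with age
winter_rain <- rnorm(n)                          # hypothetical: unrelated to the others
sim <- data.frame(age, france_pop, winter_rain)

# Pairwise correlations: the sign gives the direction, the magnitude the strength
print(round(cor(sim), 2))

# Illustrative helper: VIF of one predictor = 1 / (1 - R^2),
# where R^2 comes from regressing it on all other predictors
vif_one <- function(var, data) {
  others <- setdiff(names(data), var)
  fit <- lm(reformulate(others, response = var), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs <- sapply(names(sim), vif_one, data = sim)
print(vifs)
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign that a predictor is largely explained by the others, in which case dropping one of the collinear variables is a reasonable option.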
