WEEK3-Quick guide to linear regression

Explore Statistics with R (EDX)



I. Introduction

Imagine you have measured two variables for each subject in a study, and that:
1. The variables are on an interval or ratio scale
2. The variables are at least roughly normally distributed

You suspect that the variables may be associated. Maybe a linear model would fit:

y = a + bx + error.

Here we will not go into any depth describing the theory of linear regression. Use the example code below to learn how to add a linear regression line to a scatter plot. If you already know something about linear regression, you will find some additional useful lines of code below.

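(In short: two variables have been measured; they are 1. on an interval or ratio scale and 2. roughly normally distributed; and they may be linearly related.)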


II. Example

#Create some data

set.seed(278)
x <- rnorm(25, mean=100, sd=10)
y <- 2 * x + 20 + rnorm(25, mean=10, sd=4)


plot(x,y)  #Do you think a linear model would fit?



Correlation coefficient r and coefficient of determination R^2

> cor(x,y)  #If you just want the correlation coefficient
[1] 0.9548738
> cor(x,y)^2 #Or the coefficient of determination
[1] 0.911784
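
For simple linear regression with one predictor, the squared correlation coefficient is the same number that lm() reports as "Multiple R-squared", and the fitted slope can be written as r * sd(y) / sd(x). The following is a minimal sketch of that relationship, assuming the same simulated data as above; the names fit, r_squared, r_squared_lm and slope_from_r are illustrative and not part of the course material.

```r
# Re-create the example data so this block runs on its own
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)

r <- cor(x, y)          # Pearson correlation coefficient
r_squared <- r^2        # coefficient of determination

fit <- lm(y ~ x)        # simple linear regression of y on x
r_squared_lm <- summary(fit)$r.squared

# Least-squares slope expressed through the correlation coefficient
slope_from_r <- r * sd(y) / sd(x)

print(c(r_squared = r_squared, r_squared_lm = r_squared_lm))
print(c(slope_from_r = slope_from_r, slope_lm = unname(coef(fit)["x"])))
```

Both printed pairs should agree up to floating-point rounding. Note that this identity holds for a model with a single predictor and an intercept; with several predictors, R-squared is no longer the square of one pairwise correlation.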


> lm.obj <- lm(y~x) # See how models are described in R. y depends on x
> abline(lm.obj)    #We can add the regression line to the scatterplot
> predict(lm.obj) #The predicted y-values for your x-values
       1        2        3        4        5        6        7        8 
226.2604 209.0715 239.2173 210.2221 219.6353 211.7828 202.1819 238.3487 
       9       10       11       12       13       14       15       16 
236.4330 215.9328 243.1476 228.5103 214.0388 241.6755 232.5671 206.4833 
      17       18       19       20       21       22       23       24 
216.5090 212.1111 234.9200 234.7542 239.8300 206.8384 248.6047 215.7893 
      25 
221.9042 
> points(x,predict(lm.obj), col="green") #Add predicted values to the graph
> summary(lm.obj)   #Let's look at the content of lm.obj

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7166 -2.1476 -0.5456  2.2163  9.9858 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  48.9354    11.4055    4.29 0.000273 ***
x             1.7973     0.1166   15.42 1.28e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.375 on 23 degrees of freedom
Multiple R-squared:  0.9118,	Adjusted R-squared:  0.9079 
F-statistic: 237.7 on 1 and 23 DF,  p-value: 1.284e-13
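
The numbers printed by summary() can also be picked out programmatically instead of being read off the screen. The sketch below assumes the same simulated data and model as above; the variable names (s, coef_table, slope_p and so on) are illustrative only.

```r
# Re-create the example data and model so this block runs on its own
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

s <- summary(lm.obj)

coef_table <- coef(s)                    # matrix: Estimate, Std. Error, t value, Pr(>|t|)
slope_p <- coef_table["x", "Pr(>|t|)"]   # p-value for the slope
r2 <- s$r.squared                        # "Multiple R-squared"
adj_r2 <- s$adj.r.squared                # "Adjusted R-squared"
sigma_hat <- s$sigma                     # "Residual standard error"
f_stat <- s$fstatistic                   # named vector: value, numdf, dendf

print(coef_table)
print(c(r2 = r2, adj_r2 = adj_r2, sigma_hat = sigma_hat, slope_p = slope_p))
print(f_stat)
```

Extracting values this way is handy when results need to go into a table or a later calculation rather than a console printout.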

> str(lm.obj)     #You can retrieve parts of the lm object
List of 12
 $ coefficients : Named num [1:2] 48.9 1.8
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
 $ residuals    : Named num [1:25] -1.659 3.734 0.911 -4.294 1.325 ...
  ..- attr(*, "names")= chr [1:25] "1" "2" "3" "4" ...
 $ effects      : Named num [1:25] -1121.35 67.46 1.89 -4.71 1.36 ...
  ..- attr(*, "names")= chr [1:25] "(Intercept)" "x" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:25] 226 209 239 210 220 ...
  ..- attr(*, "names")= chr [1:25] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:25, 1:2] -5 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:25] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "x"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.2 1.23
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 23
 $ xlevels      : Named list()
 $ call         : language lm(formula = y ~ x)
 $ terms        :Classes 'terms', 'formula' length 3 y ~ x
  .. ..- attr(*, "variables")= language list(y, x)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "y" "x"
  .. .. .. ..$ : chr "x"
  .. ..- attr(*, "term.labels")= chr "x"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(y, x)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
 $ model        :'data.frame':	25 obs. of  2 variables:
  ..$ y: num [1:25] 225 213 240 206 221 ...
  ..$ x: num [1:25] 98.7 89.1 105.9 89.7 95 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 y ~ x
  .. .. ..- attr(*, "variables")= language list(y, x)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "y" "x"
  .. .. .. .. ..$ : chr "x"
  .. .. ..- attr(*, "term.labels")= chr "x"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(y, x)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
 - attr(*, "class")= chr "lm"
> lm.obj$coefficients
(Intercept)           x 
  48.935353    1.797343 
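
Besides the $ operator, R provides accessor functions for the most common parts of a fitted model, and predict() accepts a newdata argument for x-values that were not in the original sample. The following is a small sketch assuming the same data and model as above; the x-values 90, 100 and 110 in new_x are hypothetical and chosen only for illustration.

```r
# Re-create the example data and model so this block runs on its own
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

print(coef(lm.obj))                    # same values as lm.obj$coefficients
print(confint(lm.obj, level = 0.95))   # 95% confidence intervals for intercept and slope
print(head(residuals(lm.obj)))         # first few residuals
print(head(fitted(lm.obj)))            # first few fitted values

# Predict y for new x-values (hypothetical values), with 95% prediction intervals.
# The column name in newdata must match the predictor name used in the formula.
new_x <- data.frame(x = c(90, 100, 110))
pred <- predict(lm.obj, newdata = new_x, interval = "prediction", level = 0.95)
print(pred)                            # columns: fit, lwr, upr
```

The accessor functions are generally preferable to reaching into the list directly, because they also work for other model classes that store their results differently.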


par(mfrow=c(2,2)) #prepare for a 2x2 layout
plot(lm.obj) #The built-in diagnostic plots for your regression analysis
par(mfrow=c(1,1)) #Restore 1x1 layout
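
Two of the diagnostic plots can be rebuilt by hand, which may help in seeing what they show: residuals against fitted values (looking for curvature or changing spread) and a normal Q-Q plot of the residuals. This is only a sketch assuming the same data and model as above; it approximates the built-in plots rather than reproducing them exactly (plot.lm, for example, uses standardized residuals in its Q-Q plot).

```r
# Re-create the example data and model so this block runs on its own
set.seed(278)
x <- rnorm(25, mean = 100, sd = 10)
y <- 2 * x + 20 + rnorm(25, mean = 10, sd = 4)
lm.obj <- lm(y ~ x)

res <- residuals(lm.obj)
fit_vals <- fitted(lm.obj)

old_par <- par(mfrow = c(1, 2))   # 1x2 layout; remember the previous settings

# Residuals vs fitted: points should scatter evenly around the zero line
plot(fit_vals, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

par(old_par)                      # restore the previous layout
```

To show a single built-in panel instead, plot(lm.obj, which = 1) draws only the residuals-vs-fitted plot and which = 2 only the Q-Q plot.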





