R swirl Tutorial (Regression Models) 1: Introduction

| Introduction to Regression Models. (Slides for this and other Data Science courses may be found at github https://github.com/DataScienceSpecialization/courses if you want to use them. They must be downloaded as a zip file and viewed locally. This lesson corresponds to Regression_Models/01_01_introduction.)

| This is the first lesson on Regression Models. We’ll begin with the concept of “regression toward the mean” and illustrate it with some pioneering work of the father of forensic science, Sir Francis Galton.

| Sir Francis studied the relationship between heights of parents and their children. His work showed that parents who were taller than average had children who were also tall but closer to the average height. Similarly, parents who were shorter than average had children who were also shorter than average but less so than mom and dad. That is, they were closer to the average height. From one generation to the next the heights moved closer to the average or regressed toward the mean.

| For this lesson we’ll use Sir Francis’s parent/child height data which we’ve taken the liberty to load for you as the variable, galton. (Data is from John Verzani’s website, http://wiener.math.csi.cuny.edu/UsingR/.) So let’s get started!

| Here is a plot of Galton’s data, a set of 928 parent/child height pairs. Moms’ and dads’ heights were averaged together (after moms’ heights were adjusted by a factor of 1.08). In our plot we used the R function “jitter” on the children’s heights to highlight heights that occurred most frequently. The dark spots in each column rise from left to right suggesting that children’s heights do depend on their parents’. Tall parents have tall children and short parents have short children.
| Here we add a red (45 degree) line of slope 1 and intercept 0 to the plot. If children tended to be the same height as their parents, we would expect the data to vary evenly about this line. We see this isn’t the case. On the left half of the plot we see a concentration of heights above the line, and on the right half we see the concentration below the line.

| Now we’ve added a blue regression line to the plot. This is the least-squares line, the one that minimizes the sum of squared vertical distances between the data and the line. (For theory see the slides.) Its slope is greater than zero, indicating that parents’ heights do affect their children’s. The slope is also less than 1, the value it would have had if children tended to be the same height as their parents.

| Now’s your chance to plot in R. Type “plot(child ~ parent, galton)” at the R prompt.

plot(child ~ parent, galton)

| You’ll notice that this plot looks a lot different than the original we displayed. Why? Many people are the same height to within measurement error, so points fall on top of one another. You can see that some circles appear darker than others. However, by using R’s function “jitter” on the children’s heights, we can spread out the data to simulate the measurement errors and make high frequency heights more visible.

| Now it’s your turn to try. Just type “plot(jitter(child,4) ~ parent,galton)” and see the magic.

plot(jitter(child, 4) ~ parent, galton)
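
To see what jitter does to tied values, here is a minimal sketch on made-up data. The `heights` vector is hypothetical (it is not Galton's data) and is only meant to mimic rounded measurements with many repeats. `jitter(x, factor)` adds a small amount of uniform random noise to each value, and a larger factor spreads the points further.

```r
# Hypothetical rounded heights with many ties (not Galton's data).
set.seed(1)
heights <- rep(c(64.5, 66.5, 68.5), times = c(5, 10, 5))

# jitter() adds small uniform noise; a larger factor spreads points more.
jittered <- jitter(heights, 4)

# Before jittering there are only 3 distinct values, so points overplot.
n_unique_before <- length(unique(heights))

# After jittering the ties are broken, so each point is drawn separately.
n_unique_after <- length(unique(jittered))

# Largest distance any point moved from its original value.
max_shift <- max(abs(jittered - heights))
```

Because the noise is random, the jittered plot looks slightly different on each run unless a seed is set.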

| Now for the regression line. This is quite easy in R. The function lm (linear model) needs a “formula” and dataset. You can type “?formula” for more information, but, in simple terms, we just need to specify the dependent variable (children’s heights) ~ the independent variable (parents’ heights).

| So generate the regression line and store it in the variable regrline. Type “regrline <- lm(child ~ parent, galton)”

regrline <- lm(child ~ parent, galton)
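
The formula interface can be sketched on a small simulated data set. The data frame `toy` and the numbers used to generate it are assumptions chosen for illustration, not Galton's data. The sketch shows the dependent variable on the left of `~`, the independent variable on the right, and `coef()` for pulling out the fitted intercept and slope.

```r
# Simulated stand-in for parent/child heights (assumption, not Galton's data).
set.seed(42)
parent <- rnorm(200, mean = 68, sd = 1.8)
child  <- 24 + 0.65 * parent + rnorm(200, sd = 2.2)
toy <- data.frame(parent = parent, child = child)

# Dependent variable on the left of ~, independent variable on the right.
fit <- lm(child ~ parent, data = toy)

# coef() returns a named vector with entries "(Intercept)" and "parent".
est <- coef(fit)
intercept <- est[["(Intercept)"]]
slope <- est[["parent"]]
```

The same pattern applies to the lesson's model: `coef(regrline)` would return the intercept and slope of the Galton fit.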

| Now add the regression line to the plot with “abline”. Make the line wide and red for visibility. Type
| “abline(regrline, lwd=3, col='red')”

abline(regrline, lwd = 3, col = 'red')

| The regression line will have a slope and intercept which are estimated from data. Estimates are not exact. Their accuracy is gauged by theoretical techniques and expressed in terms of “standard error.” You can use “summary(regrline)” to examine the Galton regression line. Do this now.

summary(regrline)
Call:
lm(formula = child ~ parent, data = galton)

Residuals:
    Min      1Q  Median      3Q     Max
-7.8050 -1.3661  0.0487  1.6339  5.9264

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.94153    2.81088   8.517   <2e-16 ***
parent       0.64629    0.04114  15.711   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.239 on 926 degrees of freedom
Multiple R-squared:  0.2105, Adjusted R-squared:  0.2096
F-statistic: 246.8 on 1 and 926 DF,  p-value: < 2.2e-16

| The slope of the line is the estimate of the coefficient, or multiplier, of “parent”, the independent variable of our data (in this case, the parents’ heights). From the output of “summary” what is the slope of the regression line?

1: .64629
2: .04114
3: 23.94153

Selection: 1

| What is the standard error of the slope?

1: .04114
2: 23.94153
3: .64629

Selection: 1

| A coefficient will be within 2 standard errors of its estimate about 95% of the time. This means the slope of our regression is significantly different from both 0 and 1, since the interval (.64629) +/- (2*.04114) contains neither 0 nor 1.
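
The two-standard-error check can be worked through directly. This sketch only uses the slope and standard error reported by summary(regrline); the variable names are chosen here for illustration.

```r
# Values copied from the summary(regrline) output.
slope <- 0.64629
se    <- 0.04114

# Rough 95% interval: estimate plus or minus two standard errors.
lower <- slope - 2 * se   # 0.56401
upper <- slope + 2 * se   # 0.72857

# Neither 0 nor 1 falls inside the interval.
excludes_zero <- lower > 0
excludes_one  <- upper < 1
```

Both checks come out TRUE, which is what supports the claim that the slope differs from 0 and from 1.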
| We’re now adding two blue lines to indicate the means of the children’s heights (horizontal) and the parents’ (vertical). Note that these lines and the regression line all intersect in a point. Pretty cool, huh? We’ll talk more about this in a later lesson. (Something you can look forward to.)
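
The intersection at the point of means can be checked numerically. The sketch below uses simulated data (an assumption, not Galton's data) and tests whether the fitted line, evaluated at the mean of the independent variable, returns the mean of the dependent variable.

```r
# Simulated stand-in for parent/child heights (assumption, not Galton's data).
set.seed(7)
parent <- rnorm(150, mean = 68, sd = 1.8)
child  <- 24 + 0.65 * parent + rnorm(150, sd = 2.2)
fit <- lm(child ~ parent)

# Predict the child height at the mean parent height.
at_mean <- predict(fit, newdata = data.frame(parent = mean(parent)))

# Difference from the mean child height; zero up to floating-point error.
gap <- abs(unname(at_mean) - mean(child))
```

This holds for any least-squares line fitted with an intercept, not just for this simulated sample.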

| The slope of a line shows how much of a change in the vertical direction is produced by a change in the horizontal direction. So, parents “1 inch” above the mean in height tend to have children who are only .65 inches above the mean. The green triangle illustrates this point. From the mean, moving a “1 inch distance” horizontally to the right (increasing the parents’ height) produces a “.65 inch” increase in the vertical direction (children’s height).

| Similarly, parents who are 1 inch below average in height have children who are only .65 inches below average height. The purple triangle illustrates this. From the mean, moving a “1 inch distance” horizontally to the left (decreasing the parents’ height) produces a “.65 inch” decrease in the vertical direction (children’s height).
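
The two triangles amount to multiplying a parent's distance from the mean by the slope. A minimal sketch, using the slope from the summary output and a few hypothetical deviations:

```r
# Slope copied from the summary(regrline) output.
slope <- 0.64629

# Hypothetical parent deviations from the parents' mean, in inches.
parent_dev <- c(-2, -1, 1, 2)

# Expected child deviations from the children's mean, in inches.
child_dev <- slope * parent_dev

# Every expected child deviation is smaller in size than the parent's.
shrinks <- abs(child_dev) < abs(parent_dev)
```

Since the slope lies between 0 and 1, the expected child deviation always has the same sign as the parent's but a smaller magnitude, which is the regression toward the mean described in this lesson.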

| This concludes our lesson on regression toward the mean. We hope you found it above average!
