SAS Module 4 Regression Analysis

Simple Linear Regression:
one independent variable
Multiple Linear Regression:
two or more independent variables
Regression Goal:
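find the line that most closely matches the observed relationship between X and Y, where “most closely” means the line with the minimum RSS (residual sum of squares), i.e., the smallest sum of squared differences between the observed and the fitted values of Y.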
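The RSS criterion can be illustrated with a minimal sketch in Python (the notes contain no code of their own; the function names and the four data points are made up for illustration). For one independent variable, the line that minimizes RSS has the closed form b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x):

```python
def fit_simple(x, y):
    """Least-squares intercept b0 and slope b1 for y ≈ b0 + b1*x (minimizes RSS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy and Sxx: centered cross-products and centered sum of squares of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def rss(x, y, b0, b1):
    """Residual sum of squares of the line b0 + b1*x on the data (x, y)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = fit_simple(x, y)
print(b0, b1, rss(x, y, b0, b1))  # b0 ≈ 0.0, b1 ≈ 1.9, RSS ≈ 0.7
```

Moving either coefficient away from these values (for example b1 + 0.1) gives a larger RSS, which is exactly what "most closely" means here.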
Standard Error:
The standard error (SE) of a coefficient estimate measures how close the estimate is likely to be to the true (unknown) coefficient. For the regression as a whole we use the RSE (residual standard error), a measure of how much the observed dependent variable typically deviates from the value estimated by the model. The SE is mainly used to answer two questions:

  • What is the likely range of values for the coefficients? (e.g., a 95% confidence interval)
  • Do the independent variables influence the value of the dependent variable in a statistically significant way?
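A minimal sketch of these quantities for simple linear regression, using the standard textbook formulas RSE = sqrt(RSS / (n - 2)), SE(b1) = RSE / sqrt(Sxx) and SE(b0) = RSE * sqrt(1/n + mean(x)^2 / Sxx). The function names are hypothetical. The interval estimate ± 2 * SE is only a rough large-sample approximation of the 95% confidence interval; SAS computes the interval with the exact t quantile, which for small samples is noticeably larger than 2 (about 4.30 with 2 degrees of freedom, as in this four-point example).

```python
import math


def fit_with_se(x, y):
    """Fit y ≈ b0 + b1*x by least squares; return (b0, b1, RSE, SE(b0), SE(b1))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    rse = math.sqrt(rss / (n - 2))                       # n - 2 degrees of freedom
    se_b1 = rse / math.sqrt(sxx)                         # standard error of the slope
    se_b0 = rse * math.sqrt(1 / n + mean_x ** 2 / sxx)   # standard error of the intercept
    return b0, b1, rse, se_b0, se_b1


def approx_ci95(estimate, se):
    """Rough 95% interval: estimate ± 2*SE (large-sample approximation only)."""
    return estimate - 2 * se, estimate + 2 * se


b0, b1, rse, se_b0, se_b1 = fit_with_se([1, 2, 3, 4], [2, 4, 5, 8])
print(rse, se_b1, approx_ci95(b1, se_b1))
```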

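Hypothesis Testing: by p-value.
For a simple linear regression model, the null hypothesis (H0) is that there is no relationship between X and Y, i.e., the slope coefficient is 0. If the p-value (the probability of seeing a result at least this extreme when H0 is true) is small enough, typically less than 5% or 1%, we reject the null hypothesis.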
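The test behind this p-value uses the statistic t = b1 / SE(b1). The sketch below is an approximation, not what SAS computes: it uses the standard normal distribution instead of the t distribution with n - 2 degrees of freedom, which is only reasonable for fairly large samples, so its p-values will differ from SAS output when n is small. The function names and the input numbers are illustrative.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value P(|Z| >= |t|) under a standard normal Z.

    Normal approximation of the t distribution; acceptable only for large n.
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))


def reject_h0(b1, se_b1, alpha=0.05):
    """Test H0: slope = 0 with t = b1 / SE(b1); return (t, p, whether H0 is rejected)."""
    t_stat = b1 / se_b1
    p = two_sided_p_normal(t_stat)
    return t_stat, p, p < alpha


print(reject_h0(1.9, 0.26))  # t ≈ 7.3, p far below 0.05, so H0 is rejected
```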

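If we reject the null hypothesis, we then use the RSE and R^2 to assess how well the model fits the data.
R^2 = 1 - RSS/TSS, where TSS is the total sum of squares of Y around its mean, is the proportion of the observed variation in Y that the model explains. It is often easier to interpret than the RSE because it always lies in the range [0,1] regardless of the units of Y: 1 means the model explains all of the observed variation (a perfect fit), and 0 means it explains none of it.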
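A minimal sketch of R^2 computed directly from its definition (the function name and the fitted values, which come from the least-squares line y ≈ 1.9x on x = 1..4, are illustrative):

```python
def r_squared(y, fitted):
    """R^2 = 1 - RSS/TSS: share of the variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total variation around the mean
    return 1 - rss / tss


print(r_squared([2, 4, 5, 8], [1.9, 3.8, 5.7, 7.6]))  # ≈ 0.963
```

Predicting every observation perfectly gives R^2 = 1, and predicting the mean of y for every observation gives R^2 = 0, matching the two ends of the range described above.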
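For a multiple linear regression model with p independent variables, the null hypothesis H0 is that all the slope coefficients are zero (β1 = β2 = ... = βp = 0), and the alternative hypothesis H1 is that at least one of them is non-zero.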
We test H0 with the F-statistic. In SAS, we look at the p-value associated with the F-statistic (Pr > F); if it is small enough, we reject H0, otherwise we fail to reject it.
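A minimal sketch of the overall F-statistic, F = ((TSS - RSS) / p) / (RSS / (n - p - 1)). The function name and the sums of squares in the usage line are hypothetical. Turning F into the Pr > F value requires the F distribution with (p, n - p - 1) degrees of freedom, which the standard library does not provide; with SciPy it could be obtained as `scipy.stats.f.sf(F, p, n - p - 1)`.

```python
def f_statistic(tss, rss, n, p):
    """Overall F-statistic for H0: beta1 = ... = betap = 0.

    tss: total sum of squares, rss: residual sum of squares,
    n: number of observations, p: number of independent variables.
    """
    explained_per_df = (tss - rss) / p       # regression mean square
    residual_per_df = rss / (n - p - 1)      # error mean square
    return explained_per_df / residual_per_df


print(f_statistic(tss=106.0, rss=2.0, n=8, p=3))  # ≈ 69.33
```

For a simple linear regression (p = 1) this F equals the square of the slope's t-statistic, so the F-test and the t-test lead to the same conclusion in that case.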
Independent variables can be qualitative or quantitative.
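If a categorical predictor has more than two levels, we create one fewer dummy variable than the number of levels; the level without a dummy serves as the baseline. For example, a three-level weight status can be represented by two dummy variables, one for “Normal” vs. “Not normal” and one for “Overweight” vs. “Not overweight”; an observation with both dummies equal to 0 belongs to the remaining (baseline) level.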
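A minimal sketch of k - 1 dummy coding. The level names ("Underweight", "Normal", "Overweight") and the choice of "Underweight" as the baseline are hypothetical; SAS creates these indicator variables for you when a categorical variable is used as a predictor.

```python
def dummy_encode(values, levels, baseline):
    """Encode a categorical variable with k levels as k - 1 indicator (0/1) columns.

    The baseline level gets no column: it is the row where every dummy is 0.
    """
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} is not one of the levels")
    columns = [level for level in levels if level != baseline]
    rows = []
    for value in values:
        if value not in levels:
            raise ValueError(f"unknown level {value!r}")
        rows.append([1 if value == level else 0 for level in columns])
    return columns, rows


# Hypothetical three-level weight status with "Underweight" as the baseline
levels = ["Underweight", "Normal", "Overweight"]
columns, rows = dummy_encode(["Normal", "Overweight", "Underweight"], levels, "Underweight")
print(columns)  # ['Normal', 'Overweight']
print(rows)     # [[1, 0], [0, 1], [0, 0]]
```

Creating a k-th dummy as well would make the dummies sum to 1 in every row, duplicating the intercept column, which is why one level is left out.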

Model Selection in SAS: based on variable importance

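  • optimize the subset of variables, e.g., with backward selection: start from all candidate variables and repeatedly remove the least useful one until no further removal improves the model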
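Backward selection can be sketched as below. This is a hedged illustration, not SAS's implementation: the selection criterion here is adjusted R^2 (an assumption; SAS lets you choose among several criteria), the predictors are assumed not to be perfectly collinear (otherwise the hand-written solver divides by zero), and all names are hypothetical. The small Gaussian-elimination solver only keeps the example dependency-free; with NumPy one would normally use `numpy.linalg.lstsq`.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 for a model with p predictors (penalizes extra variables)."""
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))


def backward_selection(data, y):
    """Drop one variable at a time while doing so increases adjusted R^2.

    data: dict mapping variable name -> list of values.
    """
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)

    def score(names):
        return adjusted_r2(ols_rss([data[v] for v in names], y), tss, n, len(names))

    selected = list(data)
    current = score(selected)
    while selected:
        # Score every model that leaves out exactly one of the remaining variables
        best_score, best_drop = max((score([v for v in selected if v != d]), d) for d in selected)
        if best_score <= current:
            break  # no single removal improves the criterion: stop
        selected.remove(best_drop)
        current = best_score
    return selected


# Constructed data: y depends on x1 and x2 only; x3 is unrelated to y
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
y = [5 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
print(backward_selection({"x1": x1, "x2": x2, "x3": x3}, y))  # ['x1', 'x2']
```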

Model Extension: Interactions

  • Sometimes the combined effect of two or more predictors (an interaction) has a bigger influence on the response than the predictors do individually.
  • Interaction effects can be detected with an interaction plot: group the observations by the different possible values of the hypothesized interaction variable and plot the dependent variable against a predictor separately for each group. If the resulting lines or scatter plots are roughly “parallel” or have roughly the same shape (even at different levels), there is likely no significant interaction. Otherwise, consider adding an interaction term to the model.
  • In SAS, use “+ New Data Item” and select “interaction effect…”
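The interaction plot can be mirrored numerically with a minimal sketch: fit the slope of y on x separately within each group and compare the slopes (parallel lines mean roughly equal slopes). The function names and data are hypothetical, and comparing slopes by eye is a heuristic; whether the difference is significant is decided by the p-value of the interaction term in the fitted model.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_group(x, y, group):
    """Numeric counterpart of an interaction plot: slope of y on x within each group."""
    points = {}
    for xi, yi, g in zip(x, y, group):
        xs, ys = points.setdefault(g, ([], []))
        xs.append(xi)
        ys.append(yi)
    return {g: slope(xs, ys) for g, (xs, ys) in points.items()}


def interaction_term(x1, x2):
    """Interaction regressor x1*x2, added to the model next to x1 and x2 themselves."""
    return [a * b for a, b in zip(x1, x2)]


x = [0, 1, 2, 3] * 2
group = ["A"] * 4 + ["B"] * 4
y = [1, 3, 5, 7, 1, 6, 11, 16]       # made-up data: group B rises faster
print(slopes_by_group(x, y, group))  # {'A': 2.0, 'B': 5.0}: not parallel
```

When the grouping variable is a 0/1 dummy and the model contains x, the dummy, and `interaction_term(x, dummy)`, the coefficient on the interaction term is the difference between the two groups' slopes.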

Polynomial Regression: add squared, cubic, and quartic terms to the model

  • the model should be neither too complex (overfitting) nor too simple (underfitting)
  • adding more polynomial terms to a model may lead to overfitting, so we partition the available data into a training set and a validation set
  • rate model performance on the validation data, and select the simplest model with the best validation assessment
  • In SAS, use “+ New Data Item” and select “Partition…”
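The partition-then-validate procedure can be sketched as follows. Several details are assumptions for illustration: a seeded random 70/30 split, validation mean squared error as the assessment (lower is better), candidate degrees 1 to 4, and a tolerance `tol` that sends near-ties to the lower degree to encode "select the simplest model". All names are hypothetical, and the hand-written solver only keeps the sketch dependency-free.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(x, y, degree):
    """Least-squares coefficients c0..c_degree of y ≈ sum(c_k * x**k)."""
    rows = [[xi ** k for k in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(coefs, x, y):
    """Mean squared prediction error of the polynomial on (x, y)."""
    preds = [sum(c * xi ** k for k, c in enumerate(coefs)) for xi in x]
    return sum((yi - p) ** 2 for yi, p in zip(y, preds)) / len(y)


def partition(x, y, valid_frac=0.3, seed=1):
    """Randomly split (x, y) into training and validation sets (seeded for reproducibility)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    n_valid = int(len(idx) * valid_frac)
    valid, train = idx[:n_valid], idx[n_valid:]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in valid], [y[i] for i in valid])


def choose_degree(x, y, max_degree=4, tol=1e-9):
    """Fit degrees 1..max_degree on the training set, score them on the validation set,
    and return the lowest degree whose validation MSE is within tol of the best."""
    x_tr, y_tr, x_va, y_va = partition(x, y)
    errors = {d: mse(fit_poly(x_tr, y_tr, d), x_va, y_va) for d in range(1, max_degree + 1)}
    best = min(errors.values())
    return min(d for d, e in errors.items() if e <= best + tol), errors


x = [i * 0.25 for i in range(20)]
y = [1 + 2 * xi - 0.5 * xi ** 2 for xi in x]  # noise-free quadratic, for illustration only
degree, errors = choose_degree(x, y)
print(degree)  # 2
```

With real, noisy data the higher degrees usually fit the training set better but score worse on the validation set, which is the overfitting the bullets above warn about.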

Model Comparison:

  • Start with an initial model that includes all meaningful variables
  • If many values are missing, consider using the “informative missingness” option
  • Remove variables that are not significant (variable selection)
  • Remove variables that are highly correlated with other predictors (keep one of each highly correlated pair)
  • Consider adding interaction effects
  • Consider adding nonlinear (polynomial) effects
  • Use data partitioning to avoid overfitting
  • Consider using “group by” to fit separate models for different values of a categorical variable
  • Finally, in SAS, use “model comparison” to compare the candidate models and select the best one
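For the step of removing highly correlated variables, a minimal sketch that flags predictor pairs whose Pearson correlation exceeds a threshold. The threshold of 0.9 and the variable names are assumptions for illustration; which variable of a flagged pair to drop remains a judgment call (for example, keep the one that is easier to interpret or to measure).

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(data, threshold=0.9):
    """Return (name1, name2, r) for every predictor pair with |r| >= threshold."""
    pairs = []
    for u, v in combinations(data, 2):
        r = pearson(data[u], data[v])
        if abs(r) >= threshold:
            pairs.append((u, v, r))
    return pairs


# Hypothetical predictors: the same height measured in two units, plus age
data = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [h / 2.54 for h in [150, 160, 170, 180]],
    "age": [30, 25, 40, 35],
}
print(correlated_pairs(data))  # flags only the (height_cm, height_in) pair
```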