Introduction to Statistical Learning | Reading Notes 11 | Polynomial Regression and Step Functions

ISLR (7) - Nonlinear Regression Analysis

Polynomial Regression and Step Functions

Note Summary:
0. From Idealized Linearity to Real-World Non-Linearity
1. Polynomial Regression
2. Step Functions
3. References

0. Moving Beyond Linearity

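Compared with other models, linear models are easier to describe and implement

  • they have advantages in interpretability and in inference theory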

However, standard linear regression can be significantly limited in predictive power

  • since the linearity assumption is almost always an approximation, and sometimes a poor one
  • Recall that Ridge Regression, the LASSO, PCR, etc. improve on Least Squares by reducing the complexity of the linear model
    • this reduces the variance of the estimates, but the fitted model is still linear

Goals Beyond Linearity:

Relax the Linearity assumption while

  • still maintaining interpretability as much as possible

Extensions of linear models

  1. Polynomial Regression (7.1)
  2. Step Functions (7.2)
  3. Regression Splines (7.4)
  4. Smoothing Splines (7.5)
  5. Local Regression (7.6)
  • the above approaches model the relationship between a response $Y$ and a single predictor $X$ in a flexible way.
  6. Generalized Additive Models (GAM) (7.7)
  • GAMs combine the above approaches to model the relationship between $Y$ and several predictors $X_1, \dots, X_p$

1. Polynomial Regression

❝ Polynomial Regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power:
  • e.g., cubic regression uses three variables, $X$, $X^2$, $X^3$, as predictors to provide a non-linear fit to the data

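Standard Linear Model to Polynomial

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad\longrightarrow\quad y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \cdots + \beta_d x_i^d + \epsilon_i$$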

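  • for a large enough 「degree d」, polynomial regression produces an extremely non-linear curve
  • the coefficients $\beta_0, \beta_1, \dots, \beta_d$ are still estimated by Least Squares, because this is just a linear model with predictors $x_i, x_i^2, \dots, x_i^d$
  • Generally, it is unusual to use $d$ greater than 3 or 4, since a large $d$ makes the polynomial curve overly flexible, and it can take on very strange shapes (especially near the boundary of $X$)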
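Because polynomial regression is ordinary least squares on the expanded predictors $1, x, x^2, \dots, x^d$, it can be sketched in a few lines. This is a minimal sketch assuming NumPy. The helper names `poly_design`, `fit_poly`, and `predict_poly` are hypothetical, and the noise-free cubic data are made up for illustration.

```python
import numpy as np

def poly_design(x, d):
    """Design matrix with columns 1, x, x^2, ..., x^d (hypothetical helper)."""
    return np.vander(np.asarray(x, dtype=float), N=d + 1, increasing=True)

def fit_poly(x, y, d):
    """Least-squares estimates beta_0, ..., beta_d of a degree-d polynomial."""
    X = poly_design(x, d)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

def predict_poly(beta, x0):
    """Evaluate f_hat(x0) = beta_0 + beta_1 x0 + ... + beta_d x0^d."""
    return poly_design(np.atleast_1d(x0), len(beta) - 1) @ beta

# Noise-free cubic data: least squares should recover the coefficients.
x = np.linspace(-2, 2, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.25 * x**3
beta = fit_poly(x, y, d=3)
print(np.round(beta, 6))
print(predict_poly(beta, 1.0))
```

For larger $d$ the raw powers become nearly collinear. Centering $x$ or using orthogonal polynomials improves numerical conditioning without changing the fitted curve.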

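Wage & Age: A Non-Linear Relationship

Fitting a degree-4 polynomial using least squares:

  • the individual coefficients are not of particular interest; what matters is the overall fitted function
  • Let $x_0$ be a particular value of age; to predict wage:

$$\hat f(x_0) = \hat\beta_0 + \hat\beta_1 x_0 + \hat\beta_2 x_0^2 + \hat\beta_3 x_0^3 + \hat\beta_4 x_0^4$$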

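「Variance」
To compute the variance of the fit, $\mathrm{Var}[\hat f(x_0)]$, we need:
  • the variance estimates for each of the fitted coefficients $\hat\beta_j$ from Least Squares
  • the covariances between pairs of coefficient estimates, $\mathrm{Cov}(\hat\beta_j, \hat\beta_{j'})$
    • let $\hat C$ be the 5×5 covariance matrix of the $\hat\beta_j$
  • let $\ell_0^T = (1, x_0, x_0^2, x_0^3, x_0^4)$; then

$$\mathrm{Var}[\hat f(x_0)] = \ell_0^T \hat C \ell_0$$

$\sqrt{\mathrm{Var}[\hat f(x_0)]}$ is the estimated pointwise standard error of $\hat f(x_0)$
  • this computation is repeated at EACH reference point $x_0$, giving the fitted curve together with twice the standard error on either side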

The pair of dotted curves on either side of the fit are the (2x) standard error curves, $\hat f(x_0) \pm 2 \cdot \mathrm{SE}[\hat f(x_0)]$

  • we plot twice the standard error because, for normally distributed error terms, this corresponds to an approximate 95% confidence interval
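The variance formula $\ell_0^T \hat C \ell_0$ can be evaluated directly, with $\hat C = \hat\sigma^2 (X^T X)^{-1}$ for least squares. This is a minimal sketch assuming NumPy. `poly_fit_with_se` is a hypothetical helper, and the age/wage data are synthetic stand-ins rather than the real Wage data. It standardizes $x$ first, which spans the same polynomial space (so the fit and its standard errors are unchanged) but keeps $X^T X$ well conditioned for $d = 4$.

```python
import numpy as np

def poly_fit_with_se(x, y, d, x_grid):
    """Degree-d least-squares fit evaluated on x_grid, plus the 2x pointwise SE curves."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Affine change of variable: same column space, better-conditioned X^T X.
    mu, sd = x.mean(), x.std()
    X = np.vander((x - mu) / sd, N=d + 1, increasing=True)       # n x (d+1)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)                             # estimate of Var(eps)
    C_hat = sigma2 * np.linalg.inv(X.T @ X)                      # covariance matrix of beta_hat
    # Each row of L is l_0^T for one grid point x_0.
    L = np.vander((np.asarray(x_grid, dtype=float) - mu) / sd, N=d + 1, increasing=True)
    fit = L @ beta                                               # f_hat(x_0)
    se = np.sqrt(np.einsum("ij,jk,ik->i", L, C_hat, L))          # sqrt(l_0^T C_hat l_0) per row
    return fit, fit - 2 * se, fit + 2 * se

rng = np.random.default_rng(0)
age = rng.uniform(18, 80, size=300)
# Synthetic data (assumption): a concave trend plus Gaussian noise.
wage = 50 + 3 * age - 0.03 * age**2 + rng.normal(0, 10, size=300)
grid = np.linspace(20, 75, 5)
fit, lower, upper = poly_fit_with_se(age, wage, d=4, x_grid=grid)
print(np.round(np.column_stack([grid, lower, fit, upper]), 2))
```

The `einsum` call computes the quadratic form row by row, so the whole curve and both bands come from one vectorized pass over the grid.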

[Figure: degree-4 polynomial fit of wage on age, with dotted 2x standard-error curves]

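「Logistic Regression」
We can treat wage as a binary variable by splitting it into 「high/low earners」 (high earners: wage > 250, in thousands of dollars)

  • logistic regression can then be fitted to predict this binary response:

$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}{1 + \exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}$$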
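The polynomial logistic model above can be fitted by maximum likelihood. This is a minimal sketch using Newton-Raphson (equivalently IRLS) and assuming NumPy only. `fit_poly_logistic` is a hypothetical helper, and the binary "high earner" data are synthetic, not the Wage data. A statistics library's GLM routine would normally do this.

```python
import numpy as np

def fit_poly_logistic(x, y, d, max_iter=50, tol=1e-10):
    """Logistic regression of binary y on a degree-d polynomial in x via Newton-Raphson.

    x is standardized first: this spans the same polynomial space, so fitted
    probabilities are unchanged, but the Newton system is better conditioned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = x.mean(), x.std()

    def design(t):
        z = (np.asarray(t, dtype=float) - mu) / sd
        return np.vander(z, N=d + 1, increasing=True)    # columns 1, z, ..., z^d

    X = design(x)
    beta = np.zeros(d + 1)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))            # current fitted probabilities
        grad = X.T @ (y - p)                             # score vector
        H = X.T @ (X * (p * (1.0 - p))[:, None])         # information matrix X^T W X
        step = np.linalg.solve(H, grad)                  # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    def predict(t):
        """Estimated Pr(Y = 1 | x = t)."""
        return 1.0 / (1.0 + np.exp(-(design(t) @ beta)))

    return beta, predict

rng = np.random.default_rng(0)
age = rng.uniform(18, 80, size=3000)
# Synthetic labels (assumption): probability of being a high earner rises with age.
true_p = 1.0 / (1.0 + np.exp(-(-3.0 + 0.04 * (age - 18))))
high = (rng.uniform(size=3000) < true_p).astype(float)
beta, predict = fit_poly_logistic(age, high, d=4)
grid = np.linspace(20, 75, 5)
print(np.round(predict(grid), 3))
```

With only a few positive cases, as with the 79 high earners, the information matrix is small. That is exactly why the coefficient estimates, and hence the confidence bands, become wide.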

Although the sample size is n = 3000, there are only 79 high earners,

  • this results in a high variance in the estimated coefficients and therefore fairly wide confidence intervals

2. Step Function

Using polynomial functions in a linear model imposes a 「global structure」 on the non-linear function of X

  • use step functions to avoid imposing such a global structure
❝ Step functions cut the range of a variable into distinct regions (bins) to produce a qualitative variable
  • this amounts to fitting a different constant in each bin, i.e. a piecewise-constant function
  • and converts a continuous variable into an ordered categorical variable
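Converting a continuous variable into an ordered categorical variable, and then into one indicator column per bin, can be sketched with NumPy's `np.digitize`. This is a minimal sketch: `step_basis` is a hypothetical helper and the cutpoints 35, 50, 65 are made up for illustration.

```python
import numpy as np

def step_basis(x, cuts):
    """Return (bin index, n x (K+1) indicator matrix) for sorted cutpoints c_1 < ... < c_K."""
    x = np.asarray(x, dtype=float)
    # For increasing `cuts`, np.digitize gives 0 for x < c_1, j for c_j <= x < c_{j+1},
    # and K for x >= c_K: the ordered category of each observation.
    bin_id = np.digitize(x, cuts)
    C = np.zeros((x.size, len(cuts) + 1))
    C[np.arange(x.size), bin_id] = 1.0       # one-hot encode the bin index
    return bin_id, C

ages = np.array([19.0, 34.9, 35.0, 49.0, 50.0, 70.0])
bin_id, C = step_basis(ages, cuts=[35.0, 50.0, 65.0])   # hypothetical cutpoints
print(bin_id)           # [0 0 1 1 2 3]
print(C.sum(axis=1))    # every row falls in exactly one bin
```

Each row of `C` has exactly one 1, which is why one indicator column must be dropped when an intercept is also in the model.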

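Create cutpoints $c_1, c_2, \dots, c_K$ in the range of $X$, and then construct $K+1$ new variables:

$$\begin{aligned}
C_0(X) &= I(X < c_1),\\
C_1(X) &= I(c_1 \le X < c_2),\\
C_2(X) &= I(c_2 \le X < c_3),\\
&\;\;\vdots\\
C_{K-1}(X) &= I(c_{K-1} \le X < c_K),\\
C_K(X) &= I(c_K \le X)
\end{aligned}$$

where $I(\cdot)$ is an indicator function that returns 1 if the condition is true and 0 otherwise.

Since $X$ must be in exactly one of the $K+1$ intervals, $C_0(X) + C_1(X) + \cdots + C_K(X) = 1$

  • use Least Squares to fit a linear model with $C_1(X), C_2(X), \dots, C_K(X)$ as predictors ($C_0(X)$ is left out because it is redundant with the intercept):

$$y_i = \beta_0 + \beta_1 C_1(x_i) + \beta_2 C_2(x_i) + \cdots + \beta_K C_K(x_i) + \epsilon_i$$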

  • $\beta_0$ can be interpreted as the mean value of $Y$ for $X < c_1$
  • $\beta_j$ represents the average increase in the response for $X$ in $c_j \le X < c_{j+1}$, relative to $X < c_1$
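The interpretation of the coefficients can be checked numerically: the least-squares $\hat\beta_0$ equals the mean response in the baseline bin, and $\hat\beta_j$ equals bin $j$'s mean minus that baseline mean. This is a minimal sketch assuming NumPy. The data and the cutpoints 35, 50, 65 are synthetic/hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(18, 80, size=500)
wage = 60 + 0.8 * age + rng.normal(0, 15, size=500)   # synthetic stand-in data (assumption)
cuts = np.array([35.0, 50.0, 65.0])                    # hypothetical cutpoints c_1 < c_2 < c_3
K = len(cuts)

bin_id = np.digitize(age, cuts)                        # 0 if age < c_1, ..., K if age >= c_K
# Design matrix: intercept plus C_1(X), ..., C_K(X); C_0 is the omitted baseline bin.
X = np.column_stack([np.ones(age.size)] + [(bin_id == j).astype(float) for j in range(1, K + 1)])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Mean response within each bin.
bin_means = np.array([wage[bin_id == j].mean() for j in range(K + 1)])
print(np.round(beta, 3))
print(np.round(np.concatenate([[bin_means[0]], bin_means[1:] - bin_means[0]]), 3))
```

The two printed rows should agree. The fitted step function is just the bin-wise average of the response.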

[Figure: piecewise-constant (step function) fit of wage on age]

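Fit the Logistic Regression Model to predict the probability of being a high earner (wage > 250):

$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 C_1(x_i) + \cdots + \beta_K C_K(x_i))}{1 + \exp(\beta_0 + \beta_1 C_1(x_i) + \cdots + \beta_K C_K(x_i))}$$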
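With step-function predictors, every observation falls in exactly one bin. The logistic maximum-likelihood fit therefore has a closed form: the fitted probability in each bin is the observed proportion of 1s there. So $\hat\beta_0$ is the log-odds in the baseline bin and $\hat\beta_j$ is a difference of log-odds. This is a minimal sketch assuming NumPy. The helper names and the synthetic data are hypothetical, and it assumes every bin contains both 0s and 1s (otherwise the MLE does not exist).

```python
import numpy as np

def logit(p):
    """Log-odds log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

def fit_step_logistic(x, y, cuts):
    """Closed-form MLE of beta_0, ..., beta_K for logistic regression on C_1(X), ..., C_K(X)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bin_id = np.digitize(x, cuts)                      # 0 for x < c_1, ..., K for x >= c_K
    props = np.array([y[bin_id == j].mean() for j in range(len(cuts) + 1)])
    if np.any(np.isnan(props) | (props <= 0.0) | (props >= 1.0)):
        raise ValueError("a bin is empty or has all 0s / all 1s; the MLE does not exist")
    beta0 = logit(props[0])                            # log-odds in the baseline bin X < c_1
    return np.concatenate([[beta0], logit(props[1:]) - beta0])

def predict_step_logistic(beta, x, cuts):
    """Pr(Y = 1 | x) = exp(eta) / (1 + exp(eta)), eta = beta_0 + sum_j beta_j C_j(x)."""
    bin_id = np.digitize(np.asarray(x, dtype=float), cuts)
    eta = beta[0] + np.concatenate([[0.0], beta[1:]])[bin_id]
    return np.exp(eta) / (1.0 + np.exp(eta))

rng = np.random.default_rng(2)
age = rng.uniform(18, 80, size=3000)
# Synthetic binary "high earner" labels (assumption, not the real Wage data).
high = (rng.uniform(size=3000) < 0.02 + 0.002 * (age - 18)).astype(float)
cuts = np.array([35.0, 50.0, 65.0])                    # hypothetical cutpoints
beta = fit_step_logistic(age, high, cuts)
p_hat = predict_step_logistic(beta, age, cuts)
print(np.round(beta, 3))
```

An iterative GLM solver applied to the same design would converge to these same coefficients. The closed form simply makes the bin-wise interpretation explicit.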

「Disadvantages:」
Unless there are natural breakpoints in the predictors, piecewise-constant functions can 「miss the action」

  • e.g., in the Wage data the first age bin misses the increasing trend of wage with age (roughly ages 20 to 30)

「Advantages:」
Step functions are nevertheless popular in biostatistics and epidemiology,

  • e.g., 5-year age groups are often used to define the bins

3. References:

  • An Introduction to Statistical Learning (ISLR)
    • Sections 7.1, 7.2

Up next: (7) Basis Functions and Splines
