An Introduction to Statistical Learning | Reading Notes 12 | Basis Functions & Regression Splines

ISLR 7.3 & 7.4 - Basis Functions & Regression Splines

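Key points:
1. Basis functions
2. Regression splines
-- Piecewise polynomials
-- Constraints and splines
-- The spline basis representation
-- Choosing the number and locations of the knots
-- Comparison of regression splines with polynomial regression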

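1. Basis Functions

Basis functions are a family of transformations

$$b_1(X), b_2(X), \ldots, b_K(X)$$

that can be applied to a variable $X$ to fit a non-linear model (polynomial and piecewise-constant regression are special cases):

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_K b_K(x_i) + \epsilon_i$$

The basis functions are fixed and known:

  • polynomial: $b_j(x_i) = x_i^j$
  • piecewise constant: $b_j(x_i) = I(c_j \le x_i < c_{j+1})$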

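Because the model is still linear in the transformed predictors, we can use least squares to estimate the unknown regression coefficients, and all of the inference tools for linear models, such as standard errors for the coefficient estimates and F-statistics for the model's overall significance, remain available.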
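As a minimal sketch of this idea (not code from ISLR, which uses R), the snippet below builds a polynomial basis and a piecewise-constant basis with NumPy and fits both by ordinary least squares. The helper names `polynomial_basis`, `step_basis`, and `fit_least_squares`, the cutpoints, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns b_j(x) = x**j for j = 1..degree."""
    return np.column_stack([x**j for j in range(1, degree + 1)])

def step_basis(x, cutpoints):
    """Indicator columns I(c_j <= x < c_{j+1}); the last bin is x >= c_K.
    The bin x < c_1 is omitted because the intercept absorbs it."""
    c = np.asarray(cutpoints, dtype=float)
    upper = np.append(c[1:], np.inf)
    cols = [(x >= lo) & (x < hi) for lo, hi in zip(c, upper)]
    return np.column_stack(cols).astype(float)

def fit_least_squares(basis, y):
    """Ordinary least squares on [1, b_1(x), ..., b_K(x)]."""
    design = np.column_stack([np.ones(len(y)), basis])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

beta_poly = fit_least_squares(polynomial_basis(x, 3), y)           # 4 coefficients
beta_step = fit_least_squares(step_basis(x, [2.5, 5.0, 7.5]), y)   # 4 coefficients
```

Both fits go through the same least squares routine; only the columns of the design matrix change, which is the point of the basis function approach.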

2. Regression Splines

Regression splines are a flexible class of basis functions that extends the polynomial and piecewise-constant regression approaches.

2.1 Piecewise Polynomials

「Idea」: Fit separate low-degree polynomials over different regions of $X$, instead of fitting a single high-degree polynomial over the entire range of $X$
「Knots」: the points where the coefficients change

  • Using more Knots leads to a more flexible piecewise polynomial
  • $K$ knots will end up fitting $K + 1$ different polynomials
  • Degrees of Freedom = $(d + 1)(K + 1)$ for degree-$d$ pieces, i.e. $4(K + 1)$ for a piecewise cubic

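Example with a piecewise cubic polynomial and a single knot at a point $c$:

$$y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c \end{cases}$$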
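The single-knot case can be sketched in a few lines: fit one cubic to the observations left of the knot and another to those on the right, which uses 4 + 4 = 8 coefficients. The knot location `c = 5.0` and the synthetic data are assumptions chosen for illustration. The `gap` value shows why constraints are needed next: nothing forces the two cubics to meet at the knot.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 300))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

c = 5.0  # hypothetical knot location
left, right = x < c, x >= c

# Unconstrained piecewise cubic: an independent cubic on each side of the knot.
coef_left = np.polyfit(x[left], y[left], deg=3)
coef_right = np.polyfit(x[right], y[right], deg=3)

n_params = coef_left.size + coef_right.size  # (d + 1)(K + 1) with d = 3, K = 1

# Jump between the two fitted pieces at the knot; generally not zero.
gap = np.polyval(coef_right, c) - np.polyval(coef_left, c)
```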

2.2 Constraints and Splines

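Each constraint imposed at a knot (for example, requiring the fitted curve to be continuous there) effectively frees up one degree of freedom

  • by reducing the complexity of the resulting piecewise polynomial fit, so that one fewer free parameter is used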


「Definition of a Degree-d Spline」: a piecewise degree-$d$ polynomial that is continuous, with continuous derivatives up to order 「d-1」, at each knot:

  • A cubic spline requires the function and its 1st and 2nd derivatives to be continuous at each knot
  • A cubic spline with $K$ knots uses a total of $4 + K$ degrees of freedom
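The degrees-of-freedom count for a cubic spline follows from a short bookkeeping argument: start from the unconstrained piecewise cubic and subtract one degree of freedom per constraint (continuity of the function, of its first derivative, and of its second derivative at every knot):

$$\underbrace{4(K + 1)}_{\text{unconstrained piecewise cubic}} - \underbrace{3K}_{\text{3 constraints at each of the } K \text{ knots}} = K + 4$$

The same argument for a general degree-$d$ spline, with $d$ constraints per knot, gives $(d + 1)(K + 1) - dK = K + d + 1$.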

2.3 The Spline Basis Representation

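A cubic spline with $K$ knots can be modeled as:

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i$$

The basis functions $b_1, b_2, \ldots, b_{K+3}$ are chosen ahead of time and the model can be fit using least squares
  • the most direct way is to start off with a basis for a cubic polynomial, $x, x^2, x^3$,
  • and then add one 「truncated power basis」 function per knot $\xi$ (pronounced "ksi"):

$$h(x, \xi) = (x - \xi)_+^3 = \begin{cases} (x - \xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases}$$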

In order to fit a cubic spline to a data set with $K$ knots, we perform least squares regression with an intercept and 「$3 + K$ predictors」:

$$X, X^2, X^3, h(X, \xi_1), h(X, \xi_2), \ldots, h(X, \xi_K)$$

  • 「Degrees of Freedom = $K + 4$」
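The recipe above (an intercept, the three polynomial terms, and one truncated power function per knot) can be sketched with NumPy as follows. The function names, the knot locations, and the synthetic data are assumptions for illustration. In practice a numerically better-conditioned basis such as B-splines is usually preferred, so treat this as a didactic version only.

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Columns x, x^2, x^3, then h(x, xi) = (x - xi)^3_+ for each knot xi."""
    x = np.asarray(x, dtype=float)
    cols = [x, x**2, x**3]
    cols += [np.clip(x - xi, 0.0, None) ** 3 for xi in knots]
    return np.column_stack(cols)

def fit_cubic_spline(x, y, knots):
    """Least squares with an intercept and 3 + K predictors."""
    design = np.column_stack([np.ones(len(x)), truncated_power_basis(x, knots)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_cubic_spline(x_new, beta, knots):
    """Evaluate the fitted spline at new points."""
    x_new = np.asarray(x_new, dtype=float)
    design = np.column_stack([np.ones(len(x_new)), truncated_power_basis(x_new, knots)])
    return design @ beta

# Synthetic data and hypothetical knot locations (assumptions for illustration).
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 10.0, 300))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
knots = [2.5, 5.0, 7.5]

beta = fit_cubic_spline(x, y, knots)          # K + 4 = 7 coefficients
y_hat = predict_cubic_spline(x, beta, knots)
```

Because every basis function is continuous with continuous first and second derivatives, the fitted curve satisfies the spline constraints automatically, with no explicit constraint handling in the fit.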

「Disadvantages:」
Splines can have 「high variance」 at the boundary, when $X$ takes very small or very large values
  • The boundary is the region where $X$ is smaller than the smallest knot or larger than the largest knot


A Natural Spline is a regression spline with additional boundary constraints:

  • extrapolate linearly beyond the boundary knots
  • the function is required to be linear at the boundary
  • which produces more stable estimates at the boundaries (CIs are narrower)
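One way to see the boundary constraints in action is to build a natural cubic spline basis directly. The sketch below uses the construction given in The Elements of Statistical Learning (basis functions 1, X, and differences of scaled truncated cubics), which is not the formula used in these notes. In this construction the knot list includes the two boundary knots, and the knot values are hypothetical.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis with K columns for K knots
    (boundary knots included): N_1 = 1, N_2 = x, N_{k+2} = d_k - d_{K-1}."""
    x = np.asarray(x, dtype=float)
    xi = np.asarray(knots, dtype=float)
    K = xi.size

    def d(k):
        # d_k(x) = [(x - xi_k)^3_+ - (x - xi_K)^3_+] / (xi_K - xi_k), 0-based k
        num = np.clip(x - xi[k], 0.0, None) ** 3 - np.clip(x - xi[K - 1], 0.0, None) ** 3
        return num / (xi[K - 1] - xi[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

knots = [1.0, 4.0, 7.0, 10.0]  # hypothetical knots; 1.0 and 10.0 are the boundary knots

# Evaluate on an evenly spaced grid beyond the largest knot.
x_beyond = np.linspace(11.0, 20.0, 10)
B_beyond = natural_cubic_basis(x_beyond, knots)

# Second differences on an even grid vanish for a linear function.
second_diff = np.diff(B_beyond, n=2, axis=0)
```

Every column of `B_beyond` is linear in `x_beyond` (the entries of `second_diff` are zero up to rounding), so any least squares fit on this basis extrapolates linearly beyond the boundary knots.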

2.4 Knots Placement & Selection

「Placement」
In theory, place more knots (more flexibility) over regions where the function seems to be changing rapidly,

  • and place fewer knots where the function appears more stable

In practice it is common to place knots in a 「uniform fashion」

  • by specifying the desired degrees of freedom first
  • and letting the software automatically place the corresponding number of knots at uniform quantiles of the data


In ISLR's example of a natural cubic spline fit to the Wage data, the 3 knot locations were chosen automatically as the 25th, 50th, and 75th percentiles of age

  • by requesting 3+1 = 4 degrees of freedom
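Placing knots at uniform quantiles is easy to reproduce by hand. The sketch below is an assumption about how such placement can be done with `numpy.quantile`, not the code of any particular spline package. The helper name `quantile_knots` and the toy data are hypothetical.

```python
import numpy as np

def quantile_knots(x, n_knots):
    """Place n_knots interior knots at evenly spaced quantiles of x,
    e.g. the 25th, 50th, and 75th percentiles for n_knots = 3."""
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)
    return np.quantile(x, probs)

x = np.arange(1, 102)          # toy data: the integers 1..101
knots = quantile_knots(x, 3)   # knots at the quartiles: 26, 51, 76
```

Because the knots follow the quantiles, regions with many observations receive more knots than sparse regions.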

「Use CV to select the best DF」
Remove a portion of the data (say 10%), fit a spline with a certain number of knots to the remaining data,

  • and then use this spline to make predictions for the held-out portion
  • Repeat until each observation has been left out once, then compute the overall cross-validated RSS


The procedure can be repeated for different numbers of knots $K$.
  • Then the value of $K$ giving the smallest cross-validated RSS is chosen
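The cross-validation procedure can be sketched as follows, under several assumptions: 10 folds, a cubic spline built from the truncated power basis, knots placed at uniform quantiles of the training portion, and synthetic data. The names `spline_design` and `cv_rss` are hypothetical helpers, not functions from a library.

```python
import numpy as np

def spline_design(x, knots):
    """Design matrix: intercept, x, x^2, x^3, and (x - xi)^3_+ per knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - xi, 0.0, None) ** 3 for xi in knots]
    return np.column_stack(cols)

def cv_rss(x, y, n_knots, n_folds=10, seed=0):
    """Cross-validated RSS for a cubic spline with n_knots quantile knots."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)
    rss = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        knots = np.quantile(x[train], probs)      # knots from training data only
        beta, *_ = np.linalg.lstsq(spline_design(x[train], knots), y[train], rcond=None)
        resid = y[held_out] - spline_design(x[held_out], knots) @ beta
        rss += float(resid @ resid)               # each observation is held out once
    return rss

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 400)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

candidates = range(1, 9)
scores = {k: cv_rss(x, y, k) for k in candidates}
best_k = min(scores, key=scores.get)  # number of knots with the smallest CV RSS
```

Using the same `seed` for every candidate keeps the folds identical across values of $K$, so the RSS values are compared on the same splits.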

2.5 Comparison to Polynomial Regression

Regression Splines are often better than polynomial regression.

  • polynomials must use a high degree to produce flexible fits
  • splines introduce flexibility by increasing knots but keeping the degree fixed
    • more stable estimates


3. Reference

An Introduction to Statistical Learning, with applications in R (Springer, 2013)
