An Introduction to Statistical Learning | Reading Notes 12 | Basis Functions & Regression Splines


weixin_39629075


ISLR 7.3 & 7.4 - Basis Functions & Regression Splines

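Key points:
1. Basis functions
2. Regression splines
-- Piecewise polynomials
-- Constraints and splines
-- The spline basis representation
-- Choosing the number and locations of the knots
-- Comparison to polynomial regression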

1. Basis Functions

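Basis functions are a family of functions or transformations $b_1(X), b_2(X), \dots, b_K(X)$ that can be applied to a variable $X$ in order to fit a non-linear model (polynomial and piecewise-constant regression are special cases):

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_K b_K(x_i) + \epsilon_i$$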

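The basis functions are fixed and known (chosen ahead of time, not estimated from the data):

polynomial regression: $b_j(x_i) = x_i^j$
piecewise-constant regression: $b_j(x_i) = I(c_j \le x_i < c_{j+1})$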

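Because the model is linear in the coefficients, we can still use least squares to estimate the unknown regression coefficients $\beta_0, \dots, \beta_K$, and all of the inference tools for linear models are available, such as standard errors for the coefficient estimates and F-statistics for the model's overall significance.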
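As a minimal sketch (assuming NumPy; `polynomial_basis`, `step_basis` and `fit_basis_regression` are hypothetical helper names), a basis-function model is ordinary least squares on a design matrix whose columns are the $b_j(x_i)$:

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns b_j(x) = x**j for j = 1..degree (the intercept is added separately)."""
    return np.column_stack([x**j for j in range(1, degree + 1)])

def step_basis(x, cutpoints):
    """Columns b_j(x) = I(c_j <= x < c_{j+1}), with c_{K+1} = +inf.

    The region x < c_1 is the baseline absorbed by the intercept.
    """
    edges = list(cutpoints) + [np.inf]
    cols = [(x >= lo) & (x < hi) for lo, hi in zip(edges[:-1], edges[1:])]
    return np.column_stack(cols).astype(float)

def fit_basis_regression(x, y, basis):
    """Ordinary least squares on an intercept column plus the basis columns."""
    X = np.column_stack([np.ones_like(x), basis])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Swapping `polynomial_basis` for `step_basis` changes the model but not the fitting code, which is the point of the basis-function view.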

2. Regression Splines

Regression splines extend polynomial and piecewise-constant regression; they form a flexible class of basis functions.

2.1 Piecewise Polynomials

「Idea」: fit separate low-degree polynomials over different regions of $X$, instead of fitting a single high-degree polynomial over the entire range of $X$.

「Knots」: the points where the coefficients change

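Using more knots leads to a more flexible piecewise polynomial:
$K$ knots split the range of $X$ into $K+1$ regions, so we end up fitting $K+1$ different polynomials
Degrees of Freedom = $(K+1)(d+1)$ for pieces of degree $d$ (each region has its own $d+1$ coefficients), e.g. $4(K+1)$ for piecewise cubics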

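Example with a single knot at $c$ (piecewise cubic, $K = 1$, $d = 3$):

$$y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c \end{cases}$$

which uses $2 \times 4 = 8$ degrees of freedom.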
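A minimal sketch of the unconstrained piecewise cubic, assuming NumPy; `fit_piecewise_cubic` and `predict_piecewise_cubic` are hypothetical names. Each region gets its own least-squares cubic, so nothing forces the two pieces to meet at $c$:

```python
import numpy as np

P = np.polynomial.polynomial  # coefficient arrays are ordered lowest degree first

def fit_piecewise_cubic(x, y, c):
    """Fit one unconstrained cubic for x < c and another for x >= c.

    Returns two coefficient vectors (beta_0, beta_1, beta_2, beta_3),
    i.e. 2 * 4 = 8 parameters in total.
    """
    left = x < c
    coef_left = P.polyfit(x[left], y[left], 3)
    coef_right = P.polyfit(x[~left], y[~left], 3)
    return coef_left, coef_right

def predict_piecewise_cubic(x, c, coef_left, coef_right):
    """Evaluate whichever cubic owns each x; the curve may jump at x = c."""
    return np.where(x < c, P.polyval(x, coef_left), P.polyval(x, coef_right))
```

With data that follow $x^3$ left of 0.5 and $1 + x$ right of it, the fit reproduces both pieces and jumps from about 0.125 to 1.5 at the knot, which is the kind of behaviour the continuity constraints rule out.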

2.2 Constraints and Splines

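Requiring the fitted curve to be continuous at each knot (and, further, its first and second derivatives to be continuous there) imposes constraints on the piecewise polynomial. Each constraint effectively frees up one degree of freedom by reducing the complexity of the resulting piecewise polynomial fit.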

「Definition of a Degree-d Spline」: a piecewise degree-d polynomial that is continuous, and has continuous derivatives up to order 「d-1」, at each knot.

A cubic spline requires the function and both its 1st and 2nd derivatives to be continuous at each knot
A cubic spline with $K$ knots uses a total of $4 + K$ degrees of freedom
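The $4 + K$ count can be checked by subtracting constraints from coefficients: $K$ knots give $K+1$ cubics with 4 coefficients each, and each knot imposes 3 constraints (continuity of the function, its 1st and its 2nd derivative). For a general degree $d$, each knot imposes $d$ constraints:

```latex
\underbrace{(K+1)(d+1)}_{\text{coefficients}} \;-\; \underbrace{K\,d}_{\text{constraints}} \;=\; K + d + 1,
\qquad d = 3:\quad 4(K+1) - 3K = K + 4 .
```

With a single knot ($K = 1$) this gives $8 - 3 = 5$ degrees of freedom, compared with 8 for the unconstrained piecewise cubic.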

2.3 The Spline Basis Representation

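A cubic spline with $K$ knots can be modeled as:

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i$$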

for an appropriate choice of basis functions $b_1, b_2, \dots, b_{K+3}$. The basis functions are chosen ahead of time, so the model can be fit using least squares.

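The most direct way is to start off with a basis for a cubic polynomial, namely $x, x^2, x^3$, and then add one 「truncated power basis」 function per knot $\xi$ (pronounced /ksi/):

$$h(x, \xi) = (x - \xi)_+^3 = \begin{cases} (x - \xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases}$$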

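In order to fit a cubic spline to a data set with $K$ knots, we perform least squares regression with an intercept and 「$3 + K$ predictors」 of the form $X, X^2, X^3, h(X, \xi_1), h(X, \xi_2), \dots, h(X, \xi_K)$. This amounts to estimating $K + 4$ regression coefficients, i.e. 「$K + 4$ Degrees of Freedom」.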
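A minimal sketch of this least-squares fit with the truncated power basis, assuming NumPy (function names are hypothetical). The design matrix has the intercept, $X, X^2, X^3$ and one $h(X, \xi_k)$ column per knot, so $K + 4$ columns:

```python
import numpy as np

def truncated_power(x, knot):
    """h(x, xi) = (x - xi)^3 if x > xi, else 0."""
    return np.where(x > knot, (x - knot) ** 3, 0.0)

def cubic_spline_design(x, knots):
    """Intercept, x, x^2, x^3, then one truncated power column per knot: K + 4 columns."""
    cols = [np.ones_like(x), x, x**2, x**3] + [truncated_power(x, k) for k in knots]
    return np.column_stack(cols)

def fit_cubic_spline(x, y, knots):
    """Least-squares estimates of the K + 4 spline coefficients."""
    coef, *_ = np.linalg.lstsq(cubic_spline_design(x, knots), y, rcond=None)
    return coef

def predict_cubic_spline(x, knots, coef):
    return cubic_spline_design(x, knots) @ coef
```

Every column is continuous with continuous 1st and 2nd derivatives, so any linear combination is a cubic spline. This basis can become numerically ill-conditioned when knots are close together or $X$ is large, which is why software usually works with an equivalent, better-conditioned basis such as B-splines.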

「Disadvantages:」
Splines can have 「high variance」 at the outer range of the predictor, i.e. when $X$ takes very small or very large values
The boundary is the region where $X$ is smaller than the smallest knot or larger than the largest knot

A Natural Spline is a regression spline with additional boundary constraints:
the function is required to be linear at the boundary, so it extrapolates linearly beyond the boundary knots
this produces more stable estimates at the boundaries (the confidence intervals there are narrower)
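The note does not give a basis for natural splines. Below is a minimal sketch of one standard construction, taken from *The Elements of Statistical Learning* (Hastie, Tibshirani & Friedman) rather than from ISLR, assuming NumPy and treating `knots` as including the two boundary knots $\xi_1$ and $\xi_K$; the function name is hypothetical:

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis N_1(x), ..., N_K(x).

    Assumption: `knots` holds all K knots, boundary knots included.
    N_1 = 1, N_2 = x, N_{k+2} = d_k(x) - d_{K-1}(x) for k = 1..K-2, where
    d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k).
    Every column is linear below xi_1 and above xi_K.
    """
    x = np.asarray(x, dtype=float)
    xi = np.sort(np.asarray(knots, dtype=float))
    K = len(xi)

    def pos_cube(t):
        return np.maximum(t, 0.0) ** 3

    def d(k):  # 0-based index: d(k) here is d_{k+1} in the formula above
        return (pos_cube(x - xi[k]) - pos_cube(x - xi[-1])) / (xi[-1] - xi[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)
```

Fitting is again plain least squares, e.g. `np.linalg.lstsq(natural_cubic_basis(x, knots), y, rcond=None)`. The $K$ columns (constant included) match the $K + 4$ of a cubic spline minus 4 boundary constraints, 2 at each end.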

2.4 Knots Placement & Selection

「Placement」
In theory, place more knots (more flexibility) over regions where the function seems to be changing rapidly, and fewer knots where it appears more stable

In practice it is common to place knots in a 「uniform fashion」: specify the desired degrees of freedom first, and let the software automatically place the corresponding number of knots at uniform quantiles of the data

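For example, for a natural cubic spline, requesting 4 degrees of freedom led the software to choose 3 knot locations automatically at the 25th, 50th and 75th percentiles of the data (3 knots + 1 = 4 degrees of freedom, with the intercept not counted)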
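A minimal sketch of uniform-quantile knot placement, assuming NumPy's `np.quantile`. Both helpers are hypothetical, and the mapping from requested degrees of freedom to a knot count depends on the spline type; the second helper encodes the natural-cubic-spline count from the example above ($K = \text{df} - 1$ knots):

```python
import numpy as np

def quantile_knots(x, n_knots):
    """Knots at the 1/(n+1), 2/(n+1), ..., n/(n+1) quantiles of x."""
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)
    return np.quantile(x, probs)

def natural_spline_knots_for_df(x, df):
    """Hypothetical mapping: a natural cubic spline with df degrees of freedom
    (intercept excluded) uses df - 1 knots."""
    return quantile_knots(x, df - 1)
```

For 3 knots the probabilities are 0.25, 0.5 and 0.75, i.e. the quartiles.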

「Use CV to select the best DF」
Remove a portion of the data (say 10%), fit a spline with a certain number of knots to the remaining data, and then use this spline to make predictions for the held-out portion
Repeat until each observation has been left out once, then compute the overall cross-validated RSS
Repeat the whole procedure for different numbers of knots $K$
Choose the value of $K$ that gives the smallest cross-validated RSS
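A minimal sketch of this procedure as 10-fold cross-validation, assuming NumPy, a truncated power cubic spline basis, and knots placed at uniform quantiles of each training portion; all function names are hypothetical:

```python
import numpy as np

def spline_design(x, knots):
    # Truncated power basis: 1, x, x^2, x^3, (x - xi)_+^3 per knot (K + 4 columns).
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)

def cv_rss(x, y, n_knots, n_folds=10, seed=0):
    """Total held-out RSS of a cubic spline with n_knots quantile knots."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    rss = 0.0
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False
        # Knots come from the training portion only, so the held-out fold stays unseen.
        knots = np.quantile(x[train], np.arange(1, n_knots + 1) / (n_knots + 1))
        coef, *_ = np.linalg.lstsq(spline_design(x[train], knots), y[train], rcond=None)
        resid = y[test_idx] - spline_design(x[test_idx], knots) @ coef
        rss += float(np.sum(resid**2))
    return rss

def select_n_knots(x, y, candidates, n_folds=10, seed=0):
    """Return the candidate K with the smallest cross-validated RSS, plus all scores."""
    scores = {k: cv_rss(x, y, k, n_folds, seed) for k in candidates}
    return min(scores, key=scores.get), scores
```

The same `seed` is reused for every candidate, so all values of $K$ are scored on identical folds.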

2.5 Comparison to Polynomial Regression

Regression splines often give better results than polynomial regression:
polynomials must use a high degree (e.g. $X^{15}$) to produce flexible fits
splines introduce flexibility by increasing the number of knots while keeping the degree fixed
- which generally produces more stable estimates, especially near the boundaries
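A minimal sketch for trying this comparison on simulated data, assuming NumPy. The true curve, noise level and sample size are arbitrary illustrative choices, and the function names are hypothetical. Both models use 16 coefficients: a degree-15 polynomial versus a cubic spline with 12 quantile knots. The function only reports the maximum error against the true curve for each fit; it makes no claim about which wins on a given seed:

```python
import numpy as np
from numpy.polynomial import Polynomial

def spline_design(x, knots):
    # Truncated power cubic basis: 1, x, x^2, x^3, (x - xi)_+^3 per knot.
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)

def compare_poly_vs_spline(seed=0, n=100, degree=15):
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0, 1, n))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

    # Polynomial of the given degree: degree + 1 coefficients.
    poly = Polynomial.fit(x, y, deg=degree)

    # Cubic spline with K = degree - 3 knots: K + 4 = degree + 1 coefficients.
    n_knots = degree - 3
    knots = np.quantile(x, np.arange(1, n_knots + 1) / (n_knots + 1))
    coef, *_ = np.linalg.lstsq(spline_design(x, knots), y, rcond=None)

    # Evaluate over the observed range only, so neither fit extrapolates.
    grid = np.linspace(x.min(), x.max(), 201)
    truth = np.sin(2 * np.pi * grid)
    poly_err = np.max(np.abs(poly(grid) - truth))
    spline_err = np.max(np.abs(spline_design(grid, knots) @ coef - truth))
    return poly_err, spline_err
```

ISLR's own comparison of this kind (a natural cubic spline against a degree-15 polynomial on real data) shows the polynomial behaving erratically near the boundaries; results on simulated data will vary with the seed.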