Chapter 6 (Orthogonality and Least Squares): Applications of Inner Product Spaces (Weighted Least-Squares, Trend Analysis, Fourier Series, Rayleigh Quotient)

These are reading notes for *Linear Algebra and Its Applications*.

Weighted Least-Squares

  • Let $\boldsymbol y$ be a vector of $n$ observations, $y_1, ..., y_n$, and suppose we wish to approximate $\boldsymbol y$ by a vector $\hat{\boldsymbol y}$ that belongs to some specified subspace of $\mathbb R^n$. Denote the entries in $\hat{\boldsymbol y}$ by $\hat y_1, ..., \hat y_n$. Then the sum of the squares for error, or $SS(E)$, in approximating $\boldsymbol y$ by $\hat{\boldsymbol y}$ is

$$SS(E) = (y_1 - \hat y_1)^2 + \cdots + (y_n - \hat y_n)^2 \tag{1}$$

This is simply $\left\|\boldsymbol y - \hat{\boldsymbol y}\right\|^2$, using the standard length in $\mathbb R^n$.
  • Now suppose the measurements that produced the entries in $\boldsymbol y$ are not equally reliable. Then it becomes appropriate to weight the squared errors in (1) in such a way that more importance is assigned to the more reliable measurements. If the weights are denoted by $w_1^2, ..., w_n^2$, then the weighted sum of the squares for error is

$$\text{Weighted } SS(E) = w_1^2(y_1 - \hat y_1)^2 + \cdots + w_n^2(y_n - \hat y_n)^2 \tag{2}$$

This is the square of the length of $\boldsymbol y - \hat{\boldsymbol y}$, where the length is derived from an inner product analogous to that in Example 1 in Section 6.7, namely,

$$\langle \boldsymbol x, \boldsymbol y \rangle = w_1^2x_1y_1 + \cdots + w_n^2x_ny_n$$

Note: Suppose the errors in measuring the $y_i$ are independent random variables with means equal to zero and variances $\sigma_1^2, ..., \sigma_n^2$. Then the appropriate weights in (2) are $w_i^2 = 1/\sigma_i^2$. The larger the variance of the error, the smaller the weight.
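As a quick numerical illustration (a minimal numpy sketch with made-up values, not from the book), the weighted $SS(E)$ in (2) coincides with the ordinary squared length of $W(\boldsymbol y - \hat{\boldsymbol y})$, where $W$ is the diagonal matrix of weights introduced in the next subsection:

```python
import numpy as np

# Minimal sketch (illustrative values): weighted SS(E) from (2).
y     = np.array([3.0, 5.0, 5.0, 4.0, 3.0])   # observations y_1, ..., y_n
y_hat = np.array([4.0, 4.0, 4.0, 4.0, 4.0])   # some approximation of y
w     = np.array([2.0, 2.0, 2.0, 1.0, 1.0])   # weights w_1, ..., w_n

weighted_sse = np.sum(w**2 * (y - y_hat)**2)  # w_1^2(y_1 - yhat_1)^2 + ...

# Same number as the ordinary squared length of W(y - y_hat),
# where W = diag(w_1, ..., w_n).
W = np.diag(w)
assert np.isclose(weighted_sse, np.linalg.norm(W @ (y - y_hat))**2)
```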


Transform a weighted least-squares problem into an equivalent ordinary least-squares problem

  • Let $W$ be the diagonal matrix with (positive) $w_1, ..., w_n$ on its diagonal, so that

$$W\boldsymbol y = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} w_1y_1 \\ w_2y_2 \\ \vdots \\ w_ny_n \end{bmatrix}$$

Observe that the $j$th term of the weighted $SS(E)$ in (2) can be written as $w_j^2(y_j - \hat y_j)^2 = (w_jy_j - w_j\hat y_j)^2$. It follows that the weighted $SS(E)$ is $\left\|W\boldsymbol y - W\hat{\boldsymbol y}\right\|^2$, the ordinary squared length in $\mathbb R^n$.
  • Now suppose the approximating vector $\hat{\boldsymbol y}$ is to be constructed from the columns of a matrix $A$. Then we seek an $\hat{\boldsymbol x}$ that makes $A\hat{\boldsymbol x} = \hat{\boldsymbol y}$ as close to $\boldsymbol y$ as possible. However, the measure of closeness is the weighted error,

$$\left\|W\boldsymbol y - WA\hat{\boldsymbol x}\right\|^2$$

Thus $\hat{\boldsymbol x}$ is the (ordinary) least-squares solution of the equation

$$WA\boldsymbol x = W\boldsymbol y$$

The normal equation for the least-squares solution is

$$(WA)^TWA\boldsymbol x = (WA)^TW\boldsymbol y$$
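This reduction translates directly into code. Below is a minimal sketch (the helper name `weighted_least_squares` is my own, not from the book) that forms $WA$ and $W\boldsymbol y$ and then solves the ordinary normal equation:

```python
import numpy as np

def weighted_least_squares(A, y, w):
    """Minimize ||Wy - WAx||^2 by solving the ordinary normal equation
    (WA)^T WA x = (WA)^T W y.  Assumes WA has linearly independent
    columns, so the normal equation has a unique solution."""
    W = np.diag(w)
    WA, Wy = W @ A, W @ y
    return np.linalg.solve(WA.T @ WA, WA.T @ Wy)
```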

EXAMPLE 1

Find the least-squares line $y = \beta_0 + \beta_1x$ that best fits the data $(-2, 3), (-1, 5), (0, 5), (1, 4)$, and $(2, 3)$. Suppose the errors in measuring the $y$-values of the last two data points are greater than for the other points. Weight these data half as much as the rest of the data.

SOLUTION

  • As in Section 6.6, write $X$ for the matrix $A$ and $\boldsymbol\beta$ for the vector $\boldsymbol x$, and obtain

$$X = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix},\quad \boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},\quad \boldsymbol y = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix}$$

  • For a weighting matrix, choose $W$ with diagonal entries $2, 2, 2, 1$, and $1$. Left-multiplication by $W$ scales the rows of $X$ and $\boldsymbol y$:

$$WX = \begin{bmatrix} 2 & -4 \\ 2 & -2 \\ 2 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix},\quad W\boldsymbol y = \begin{bmatrix} 6 \\ 10 \\ 10 \\ 4 \\ 3 \end{bmatrix}$$

For the normal equation, compute

$$(WX)^TWX = \begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix}\quad\text{and}\quad (WX)^TW\boldsymbol y = \begin{bmatrix} 59 \\ -34 \end{bmatrix}$$

and solve

$$\begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 59 \\ -34 \end{bmatrix},\quad \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \approx \begin{bmatrix} 4.3 \\ 0.20 \end{bmatrix}$$
  • The desired line is

$$y = 4.3 + 0.20x$$

In contrast, the ordinary least-squares line for these data is

$$y = 4.0 - 0.10x$$

Both lines are displayed in Figure 1.
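For a quick check, the sketch below reproduces both lines numerically with numpy (illustrative code, not part of the book's text):

```python
import numpy as np

X = np.array([[1, -2], [1, -1], [1, 0], [1, 1], [1, 2]], dtype=float)
y = np.array([3, 5, 5, 4, 3], dtype=float)
W = np.diag([2.0, 2.0, 2.0, 1.0, 1.0])

# Weighted fit: ordinary normal equation applied to WX and Wy.
WX, Wy = W @ X, W @ y
beta_weighted = np.linalg.solve(WX.T @ WX, WX.T @ Wy)
beta_ordinary = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_weighted)  # [4.3457... 0.2044...] -> y = 4.3 + 0.20x (rounded)
print(beta_ordinary)  # [ 4.  -0.1]           -> y = 4.0 - 0.10x
```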

Trend Analysis of Data

  • Let $f$ represent an unknown function whose values are known (perhaps only approximately) at $t_0, ..., t_n$. If there is a “linear trend” in the data $f(t_0), ..., f(t_n)$, then we might expect to approximate the values of $f$ by a function of the form $\beta_0 + \beta_1t$. If there is a “quadratic trend” to the data, then we would try a function of the form $\beta_0 + \beta_1t + \beta_2t^2$.
  • In some statistical problems, it is important to be able to separate the linear trend from the quadratic trend (and possibly cubic or higher-order trends).
    • For instance, suppose engineers are analyzing the performance of a new car, and $f(t)$ represents the distance between the car at time $t$ and some reference point. If the car is traveling at constant velocity, then the graph of $f(t)$ should be a straight line whose slope is the car’s velocity. If the gas pedal is suddenly pressed to the floor, the graph of $f(t)$ will change to include a quadratic term and possibly a cubic term (due to the acceleration). To analyze the ability of the car to pass another car, for example, engineers may want to separate the quadratic and cubic components from the linear term.

Trend analysis

  • If the function is approximated by a curve of the form $y = \beta_0 + \beta_1t + \beta_2t^2$, the coefficient $\beta_2$ may not give the desired information about the quadratic trend in the data, because it may not be “independent” in a statistical sense from the other $\beta_i$.
  • To make what is known as a trend analysis of the data, we introduce an inner product on the space $\mathbb P^n$ analogous to that given in Example 2 in Section 6.7. For $p, q$ in $\mathbb P^n$, define

$$\langle p, q \rangle = p(t_0)q(t_0) + \cdots + p(t_n)q(t_n)$$
    • In practice, statisticians seldom need to consider trends in data of degree higher than cubic or quartic. So let $p_0, p_1, p_2, p_3$ denote an orthogonal basis of the subspace $\mathbb P^3$ of $\mathbb P^n$, obtained by applying the Gram–Schmidt process to the polynomials $1, t, t^2$, and $t^3$.
    • By Example 4 in Section 2.1, there is a polynomial $g$ in $\mathbb P^n$ whose values at $t_0, ..., t_n$ coincide with those of the unknown function $f$. Let $\hat g$ be the orthogonal projection (with respect to the given inner product) of $g$ onto $\mathbb P^3$, say,

$$\hat g = c_0p_0 + c_1p_1 + c_2p_2 + c_3p_3$$

Then $\hat g$ is called a cubic trend function, and $c_0, ..., c_3$ are the trend coefficients of the data.
      • The coefficient $c_1$ measures the linear trend, $c_2$ the quadratic trend, and $c_3$ the cubic trend.
      • It turns out that if the data have certain properties, these coefficients are statistically independent.

Note: Since $p_0, ..., p_3$ are orthogonal, the trend coefficients may be computed one at a time, independently of one another. (Recall that $c_i = \frac{\langle g, p_i\rangle}{\langle p_i, p_i\rangle}$.)
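A small sketch of the whole procedure (the function name and interface are my own): since the inner product above is just the dot product of value vectors, Gram–Schmidt can run on the vectors of values of $1, t, ..., t^3$ at the sample points, after which each $c_i = \langle g, p_i\rangle/\langle p_i, p_i\rangle$ is read off directly:

```python
import numpy as np

def trend_coefficients(t, g, degree=3):
    """Trend coefficients c_0, ..., c_degree of the data g at points t.
    <p, q> = p(t_0)q(t_0) + ... + p(t_n)q(t_n) is the dot product of
    value vectors, so Gram-Schmidt runs on the columns (the values of
    1, t, ..., t^degree) of a Vandermonde matrix."""
    t, g = np.asarray(t, dtype=float), np.asarray(g, dtype=float)
    V = np.vander(t, degree + 1, increasing=True)
    orthos = []
    for j in range(degree + 1):
        p = V[:, j].copy()
        for q in orthos:                  # subtract the projections onto
            p -= (p @ q) / (q @ q) * q    # the p_i already constructed
        orthos.append(p)
    return [(g @ p) / (p @ p) for p in orthos]

# The data of Example 2 below gives c_0 = 4, c_1 = -0.1, c_2 = -0.5.
print(trend_coefficients([-2, -1, 0, 1, 2], [3, 5, 5, 4, 3], degree=2))
```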


EXAMPLE 2

The simplest and most common use of trend analysis occurs when the points $t_0, ..., t_n$ can be adjusted so that they are evenly spaced and sum to zero. Fit a quadratic trend function to the data $(-2, 3), (-1, 5), (0, 5), (1, 4)$, and $(2, 3)$.

SOLUTION

  • The $t$-coordinates are suitably scaled to use the orthogonal polynomials found in Example 5 of Section 6.7, namely $p_0(t) = 1$, $p_1(t) = t$, and $p_2(t) = t^2 - 2$. The vectors of values of $p_0$, $p_1$, and $p_2$ at $t = -2, -1, 0, 1, 2$, together with the data vector $g$, are

$$p_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},\quad p_1 = \begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix},\quad p_2 = \begin{bmatrix} 2 \\ -1 \\ -2 \\ -1 \\ 2 \end{bmatrix},\quad g = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix}$$

  • The best approximation to the data by polynomials in $\mathbb P^2$ is the orthogonal projection given by

$$\hat p = \frac{\langle g, p_0\rangle}{\langle p_0, p_0\rangle}p_0 + \frac{\langle g, p_1\rangle}{\langle p_1, p_1\rangle}p_1 + \frac{\langle g, p_2\rangle}{\langle p_2, p_2\rangle}p_2 = 4p_0 - 0.1p_1 - 0.5p_2$$

and

$$\hat p(t) = 5 - 0.1t - 0.5t^2$$

  • Since the coefficient of $p_2$ is not extremely small, it would be reasonable to conclude that the trend is at least quadratic. This is confirmed by the graph in Figure 2.
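The projection coefficients are easy to verify numerically (an illustrative check using the value vectors listed above):

```python
import numpy as np

t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
g = np.array([3.0, 5.0, 5.0, 4.0, 3.0])
p0, p1, p2 = np.ones_like(t), t, t**2 - 2   # orthogonal polynomial values

c0 = (g @ p0) / (p0 @ p0)   #  4.0
c1 = (g @ p1) / (p1 @ p1)   # -0.1
c2 = (g @ p2) / (p2 @ p2)   # -0.5
# p_hat(t) = 4 - 0.1 t - 0.5 (t^2 - 2) = 5 - 0.1 t - 0.5 t^2
print(c0, c1, c2)
```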

Fourier Series

Rayleigh Quotient

  • Consider the problem of finding an eigenvalue of an $n\times n$ matrix $A$ when an approximate eigenvector $\boldsymbol v$ is known. Since $\boldsymbol v$ is not exactly correct, the equation

$$A\boldsymbol v = \lambda\boldsymbol v \tag{3}$$

will probably not have a solution. However, $\lambda$ can be estimated by a least-squares solution when (3) is viewed properly.
  • Think of $\boldsymbol v$ as an $n\times 1$ matrix $V$, think of $\lambda$ as a vector in $\mathbb R^1$, and denote the vector $A\boldsymbol v$ by the symbol $\boldsymbol b$. Then (3) becomes $\boldsymbol b = \lambda V$, which may also be written as $V\lambda = \boldsymbol b$. The least-squares solution of this system of $n$ equations in the one unknown $\lambda$ is an estimate for $\lambda$, called a Rayleigh quotient. Solving the normal equation $V^TV\lambda = V^T\boldsymbol b$ gives

$$\lambda = \frac{V^T\boldsymbol b}{V^TV} = \frac{\boldsymbol v^TA\boldsymbol v}{\boldsymbol v^T\boldsymbol v}$$
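In code this is a one-liner (a minimal sketch; the matrix and vector below are made-up illustrations):

```python
import numpy as np

def rayleigh_quotient(A, v):
    """Least-squares estimate of lambda in A v = lambda v:
    lambda = v^T A v / v^T v."""
    return float(v @ (A @ v)) / float(v @ v)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.1])          # approximation to the eigenvector [1, 1]
print(rayleigh_quotient(A, v))    # ~2.9955, near the true eigenvalue 3
```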