Chapter 6 (Orthogonality and Least Squares): Applications of Inner Product Spaces (Weighted Least-Squares, Trend Analysis, Fourier Series, Rayleigh Quotient)

These are reading notes for *Linear Algebra and Its Applications*.

Weighted Least-Squares

  • Let $\boldsymbol y$ be a vector of $n$ observations, $y_1, ..., y_n$, and suppose we wish to approximate $\boldsymbol y$ by a vector $\hat{\boldsymbol y}$ that belongs to some specified subspace of $\mathbb R^n$. Denote the entries in $\hat{\boldsymbol y}$ by $\hat y_1, ..., \hat y_n$. Then the sum of the squares for error, or $SS(E)$, in approximating $\boldsymbol y$ by $\hat{\boldsymbol y}$ is

$$SS(E) = (y_1 - \hat y_1)^2 + \cdots + (y_n - \hat y_n)^2 \tag{1}$$

This is simply $\left\|\boldsymbol y - \hat{\boldsymbol y}\right\|^2$, using the standard length in $\mathbb R^n$.
  • Now suppose the measurements that produced the entries in $\boldsymbol y$ are not equally reliable. Then it becomes appropriate to weight the squared errors in (1) in such a way that more importance is assigned to the more reliable measurements. If the weights are denoted by $w_1^2, ..., w_n^2$, then the weighted sum of the squares for error is

$$\text{Weighted } SS(E) = w_1^2(y_1 - \hat y_1)^2 + \cdots + w_n^2(y_n - \hat y_n)^2 \tag{2}$$

This is the square of the length of $\boldsymbol y - \hat{\boldsymbol y}$, where the length is derived from an inner product analogous to that in Example 1 in Section 6.7, namely,

$$\langle \boldsymbol x, \boldsymbol y \rangle = w_1^2x_1y_1 + \cdots + w_n^2x_ny_n$$

Note: Suppose the errors in measuring the $y_i$ are independent random variables with means equal to zero and variances $\sigma_1^2, ..., \sigma_n^2$. Then the appropriate weights in (2) are $w_i^2 = 1/\sigma_i^2$. The larger the variance of the error, the smaller the weight.
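As a quick numerical illustration (a minimal numpy sketch with made-up values, not from the book), the weighted $SS(E)$ in (2) coincides with the ordinary squared length of $W(\boldsymbol y - \hat{\boldsymbol y})$, where $W$ is the diagonal matrix of weights introduced in the next subsection:

```python
import numpy as np

# Minimal sketch (illustrative values): weighted SS(E) from (2).
y     = np.array([3.0, 5.0, 5.0, 4.0, 3.0])   # observations y_1, ..., y_n
y_hat = np.array([4.0, 4.0, 4.0, 4.0, 4.0])   # some approximation of y
w     = np.array([2.0, 2.0, 2.0, 1.0, 1.0])   # weights w_1, ..., w_n

weighted_sse = np.sum(w**2 * (y - y_hat)**2)  # w_1^2(y_1 - yhat_1)^2 + ...

# Same number as the ordinary squared length of W(y - y_hat),
# where W = diag(w_1, ..., w_n).
W = np.diag(w)
assert np.isclose(weighted_sse, np.linalg.norm(W @ (y - y_hat))**2)
```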


Transform a weighted least-squares problem into an equivalent ordinary least-squares problem

  • Let $W$ be the diagonal matrix with (positive) $w_1, ..., w_n$ on its diagonal, so that

$$W\boldsymbol y = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} w_1y_1 \\ w_2y_2 \\ \vdots \\ w_ny_n \end{bmatrix}$$

Observe that the $j$th term of the weighted $SS(E)$ in (2) can be written as $w_j^2(y_j - \hat y_j)^2 = (w_jy_j - w_j\hat y_j)^2$. It follows that the weighted $SS(E)$ is $\left\|W\boldsymbol y - W\hat{\boldsymbol y}\right\|^2$, the ordinary squared length in $\mathbb R^n$.
  • Now suppose the approximating vector $\hat{\boldsymbol y}$ is to be constructed from the columns of a matrix $A$. Then we seek an $\hat{\boldsymbol x}$ that makes $A\hat{\boldsymbol x} = \hat{\boldsymbol y}$ as close to $\boldsymbol y$ as possible. However, the measure of closeness is the weighted error,

$$\left\|W\boldsymbol y - WA\hat{\boldsymbol x}\right\|^2$$

Thus $\hat{\boldsymbol x}$ is the (ordinary) least-squares solution of the equation

$$WA\boldsymbol x = W\boldsymbol y$$

The normal equation for the least-squares solution is

$$(WA)^TWA\boldsymbol x = (WA)^TW\boldsymbol y$$
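This reduction translates directly into code. Below is a minimal sketch (the helper name `weighted_least_squares` is my own, not from the book) that forms $WA$ and $W\boldsymbol y$ and then solves the ordinary normal equation:

```python
import numpy as np

def weighted_least_squares(A, y, w):
    """Minimize ||Wy - WAx||^2 by solving the ordinary normal equation
    (WA)^T WA x = (WA)^T W y.  Assumes WA has linearly independent
    columns, so the normal equation has a unique solution."""
    W = np.diag(w)
    WA, Wy = W @ A, W @ y
    return np.linalg.solve(WA.T @ WA, WA.T @ Wy)
```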

EXAMPLE 1

Find the least-squares line $y = \beta_0 + \beta_1x$ that best fits the data $(-2, 3), (-1, 5), (0, 5), (1, 4)$, and $(2, 3)$. Suppose the errors in measuring the $y$-values of the last two data points are greater than for the other points. Weight these data half as much as the rest of the data.

SOLUTION

  • As in Section 6.6, write $X$ for the matrix $A$ and $\boldsymbol\beta$ for the vector $\boldsymbol x$, and obtain

$$X = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix},\quad \boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},\quad \boldsymbol y = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix}$$

  • For a weighting matrix, choose $W$ with diagonal entries $2, 2, 2, 1$, and $1$. Left-multiplication by $W$ scales the rows of $X$ and $\boldsymbol y$:

$$WX = \begin{bmatrix} 2 & -4 \\ 2 & -2 \\ 2 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix},\quad W\boldsymbol y = \begin{bmatrix} 6 \\ 10 \\ 10 \\ 4 \\ 3 \end{bmatrix}$$

For the normal equation, compute

$$(WX)^TWX = \begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix}\quad\text{and}\quad (WX)^TW\boldsymbol y = \begin{bmatrix} 59 \\ -34 \end{bmatrix}$$

and solve

$$\begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 59 \\ -34 \end{bmatrix},\quad \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \approx \begin{bmatrix} 4.3 \\ 0.20 \end{bmatrix}$$
  • The desired line is

$$y = 4.3 + 0.20x$$

In contrast, the ordinary least-squares line for these data is

$$y = 4.0 - 0.10x$$

Both lines are displayed in Figure 1.
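For a quick check, the sketch below reproduces both lines numerically with numpy (illustrative code, not part of the book's text):

```python
import numpy as np

X = np.array([[1, -2], [1, -1], [1, 0], [1, 1], [1, 2]], dtype=float)
y = np.array([3, 5, 5, 4, 3], dtype=float)
W = np.diag([2.0, 2.0, 2.0, 1.0, 1.0])

# Weighted fit: ordinary normal equation applied to WX and Wy.
WX, Wy = W @ X, W @ y
beta_weighted = np.linalg.solve(WX.T @ WX, WX.T @ Wy)
beta_ordinary = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_weighted)  # [4.3457... 0.2044...] -> y = 4.3 + 0.20x (rounded)
print(beta_ordinary)  # [ 4.  -0.1]           -> y = 4.0 - 0.10x
```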

Trend Analysis of Data

  • Let $f$ represent an unknown function whose values are known (perhaps only approximately) at $t_0, ..., t_n$. If there is a “linear trend” in the data $f(t_0), ..., f(t_n)$, then we might expect to approximate the values of $f$ by a function of the form $\beta_0 + \beta_1t$. If there is a “quadratic trend” to the data, then we would try a function of the form $\beta_0 + \beta_1t + \beta_2t^2$.
  • In some statistical problems, it is important to be able to separate the linear trend from the quadratic trend (and possibly cubic or higher-order trends).
    • For instance, suppose engineers are analyzing the performance of a new car, and $f(t)$ represents the distance between the car at time $t$ and some reference point. If the car is traveling at constant velocity, then the graph of $f(t)$ should be a straight line whose slope is the car’s velocity. If the gas pedal is suddenly pressed to the floor, the graph of $f(t)$ will change to include a quadratic term and possibly a cubic term (due to the acceleration). To analyze the ability of the car to pass another car, for example, engineers may want to separate the quadratic and cubic components from the linear term.

Trend analysis

  • If the function is approximated by a curve of the form $y = \beta_0 + \beta_1t + \beta_2t^2$, the coefficient $\beta_2$ may not give the desired information about the quadratic trend in the data, because it may not be “independent” in a statistical sense from the other $\beta_i$.
  • To make what is known as a trend analysis of the data, we introduce an inner product on the space $\mathbb P^n$ analogous to that given in Example 2 in Section 6.7. For $p, q$ in $\mathbb P^n$, define

$$\langle p, q \rangle = p(t_0)q(t_0) + \cdots + p(t_n)q(t_n)$$
    • In practice, statisticians seldom need to consider trends in data of degree higher than cubic or quartic. So let $p_0, p_1, p_2, p_3$ denote an orthogonal basis of the subspace $\mathbb P^3$ of $\mathbb P^n$, obtained by applying the Gram–Schmidt process to the polynomials $1, t, t^2$, and $t^3$.
    • By Example 4 in Section 2.1, there is a polynomial $g$ in $\mathbb P^n$ whose values at $t_0, ..., t_n$ coincide with those of the unknown function $f$. Let $\hat g$ be the orthogonal projection (with respect to the given inner product) of $g$ onto $\mathbb P^3$, say,

$$\hat g = c_0p_0 + c_1p_1 + c_2p_2 + c_3p_3$$

Then $\hat g$ is called a cubic trend function, and $c_0, ..., c_3$ are the trend coefficients of the data.
      • The coefficient $c_1$ measures the linear trend, $c_2$ the quadratic trend, and $c_3$ the cubic trend.
      • It turns out that if the data have certain properties, these coefficients are statistically independent.

Note: Since $p_0, ..., p_3$ are orthogonal, the trend coefficients may be computed one at a time, independently of one another. (Recall that $c_i = \frac{\langle g, p_i\rangle}{\langle p_i, p_i\rangle}$.)
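A small sketch of the whole procedure (the function name and interface are my own): since the inner product above is just the dot product of value vectors, Gram–Schmidt can run on the vectors of values of $1, t, ..., t^3$ at the sample points, after which each $c_i = \langle g, p_i\rangle/\langle p_i, p_i\rangle$ is read off directly:

```python
import numpy as np

def trend_coefficients(t, g, degree=3):
    """Trend coefficients c_0, ..., c_degree of the data g at points t.
    <p, q> = p(t_0)q(t_0) + ... + p(t_n)q(t_n) is the dot product of
    value vectors, so Gram-Schmidt runs on the columns (the values of
    1, t, ..., t^degree) of a Vandermonde matrix."""
    t, g = np.asarray(t, dtype=float), np.asarray(g, dtype=float)
    V = np.vander(t, degree + 1, increasing=True)
    orthos = []
    for j in range(degree + 1):
        p = V[:, j].copy()
        for q in orthos:                  # subtract the projections onto
            p -= (p @ q) / (q @ q) * q    # the p_i already constructed
        orthos.append(p)
    return [(g @ p) / (p @ p) for p in orthos]

# The data of Example 2 below gives c_0 = 4, c_1 = -0.1, c_2 = -0.5.
print(trend_coefficients([-2, -1, 0, 1, 2], [3, 5, 5, 4, 3], degree=2))
```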


EXAMPLE 2

The simplest and most common use of trend analysis occurs when the points $t_0, ..., t_n$ can be adjusted so that they are evenly spaced and sum to zero. Fit a quadratic trend function to the data $(-2, 3), (-1, 5), (0, 5), (1, 4)$, and $(2, 3)$.

SOLUTION

  • The $t$-coordinates are suitably scaled to use the orthogonal polynomials found in Example 5 of Section 6.7, namely $p_0(t) = 1$, $p_1(t) = t$, and $p_2(t) = t^2 - 2$. The vectors of values of $p_0$, $p_1$, and $p_2$ at $t = -2, -1, 0, 1, 2$, together with the data vector $g$, are

$$p_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},\quad p_1 = \begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix},\quad p_2 = \begin{bmatrix} 2 \\ -1 \\ -2 \\ -1 \\ 2 \end{bmatrix},\quad g = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix}$$

  • The best approximation to the data by polynomials in $\mathbb P^2$ is the orthogonal projection given by

$$\hat p = \frac{\langle g, p_0\rangle}{\langle p_0, p_0\rangle}p_0 + \frac{\langle g, p_1\rangle}{\langle p_1, p_1\rangle}p_1 + \frac{\langle g, p_2\rangle}{\langle p_2, p_2\rangle}p_2 = 4p_0 - 0.1p_1 - 0.5p_2$$

and

$$\hat p(t) = 5 - 0.1t - 0.5t^2$$

  • Since the coefficient of $p_2$ is not extremely small, it would be reasonable to conclude that the trend is at least quadratic. This is confirmed by the graph in Figure 2.
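The projection coefficients are easy to verify numerically (an illustrative check using the value vectors listed above):

```python
import numpy as np

t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
g = np.array([3.0, 5.0, 5.0, 4.0, 3.0])
p0, p1, p2 = np.ones_like(t), t, t**2 - 2   # orthogonal polynomial values

c0 = (g @ p0) / (p0 @ p0)   #  4.0
c1 = (g @ p1) / (p1 @ p1)   # -0.1
c2 = (g @ p2) / (p2 @ p2)   # -0.5
# p_hat(t) = 4 - 0.1 t - 0.5 (t^2 - 2) = 5 - 0.1 t - 0.5 t^2
print(c0, c1, c2)
```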

Fourier Series

Rayleigh Quotient

  • Consider the problem of finding an eigenvalue of an $n\times n$ matrix $A$ when an approximate eigenvector $\boldsymbol v$ is known. Since $\boldsymbol v$ is not exactly correct, the equation

$$A\boldsymbol v = \lambda\boldsymbol v \tag{3}$$

will probably not have a solution. However, $\lambda$ can be estimated by a least-squares solution when (3) is viewed properly.
  • Think of $\boldsymbol v$ as an $n\times 1$ matrix $V$, think of $\lambda$ as a vector in $\mathbb R^1$, and denote the vector $A\boldsymbol v$ by the symbol $\boldsymbol b$. Then (3) becomes $\boldsymbol b = \lambda V$, which may also be written as $V\lambda = \boldsymbol b$. The least-squares solution of this system of $n$ equations in the one unknown $\lambda$ is an estimate for $\lambda$, called a Rayleigh quotient. Solving the normal equation $V^TV\lambda = V^T\boldsymbol b$ gives

$$\lambda = \frac{V^T\boldsymbol b}{V^TV} = \frac{\boldsymbol v^TA\boldsymbol v}{\boldsymbol v^T\boldsymbol v}$$
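In code this is a one-liner (a minimal sketch; the matrix and vector below are made-up illustrations):

```python
import numpy as np

def rayleigh_quotient(A, v):
    """Least-squares estimate of lambda in A v = lambda v:
    lambda = v^T A v / v^T v."""
    return float(v @ (A @ v)) / float(v @ v)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.1])          # approximation to the eigenvector [1, 1]
print(rayleigh_quotient(A, v))    # ~2.9955, near the true eigenvalue 3
```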