This post is a set of reading notes on *Linear Algebra and Its Applications*.
The Gram–Schmidt process
- The Gram–Schmidt process is a simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of $\mathbb R^n$.
EXAMPLE 1
Let $W = \mathrm{Span}\{\boldsymbol x_1, \boldsymbol x_2\}$, where $\boldsymbol x_1 =\begin{bmatrix}3\\6\\0\end{bmatrix}$ and $\boldsymbol x_2 =\begin{bmatrix}1\\2\\3\end{bmatrix}$. Construct an orthogonal basis $\{\boldsymbol v_1, \boldsymbol v_2\}$ for $W$.
SOLUTION
- Let $\boldsymbol p = \mathrm{proj}_{\boldsymbol x_1}\boldsymbol x_2 = \dfrac{\boldsymbol x_2 \cdot \boldsymbol x_1}{\boldsymbol x_1 \cdot \boldsymbol x_1}\boldsymbol x_1$. The component of $\boldsymbol x_2$ orthogonal to $\boldsymbol x_1$ is $\boldsymbol x_2 - \boldsymbol p$, which is in $W$ because it is formed from $\boldsymbol x_2$ and a multiple of $\boldsymbol x_1$. Let $\boldsymbol v_1 = \boldsymbol x_1$ and
$$\boldsymbol v_2 = \boldsymbol x_2 - \boldsymbol p = \begin{bmatrix}1\\2\\3\end{bmatrix} - \frac{15}{45}\begin{bmatrix}3\\6\\0\end{bmatrix} = \begin{bmatrix}0\\0\\3\end{bmatrix}$$
Then $\{\boldsymbol v_1, \boldsymbol v_2\}$ is an orthogonal set of nonzero vectors in $W$, and since $\dim W = 2$, it is an orthogonal basis for $W$.
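- As a quick numerical check of Example 1, here is a minimal numpy sketch (my own code, not from the book):

```python
import numpy as np

# Numerical check of Example 1.
x1 = np.array([3.0, 6.0, 0.0])
x2 = np.array([1.0, 2.0, 3.0])

v1 = x1
p = (x2 @ x1) / (x1 @ x1) * x1   # projection of x2 onto x1
v2 = x2 - p                      # component of x2 orthogonal to x1

print(v2)        # [0. 0. 3.]
print(v1 @ v2)   # 0.0, so {v1, v2} is orthogonal
```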
- The next example fully illustrates the Gram–Schmidt process. Study it carefully.
EXAMPLE 2
Let $\boldsymbol x_1 =\begin{bmatrix}1\\1\\1\\1\end{bmatrix}$, $\boldsymbol x_2 =\begin{bmatrix}0\\1\\1\\1\end{bmatrix}$, and $\boldsymbol x_3 =\begin{bmatrix}0\\0\\1\\1\end{bmatrix}$. Then $\{\boldsymbol x_1, \boldsymbol x_2, \boldsymbol x_3\}$ is linearly independent and thus is a basis for a subspace $W$ of $\mathbb R^4$. Construct an orthogonal basis for $W$.
SOLUTION
- **Step 1.** Let $\boldsymbol v_1 = \boldsymbol x_1$ and $W_1 = \mathrm{Span}\{\boldsymbol x_1\} = \mathrm{Span}\{\boldsymbol v_1\}$.
- **Step 2.** Let $\boldsymbol v_2$ be the vector produced by subtracting from $\boldsymbol x_2$ its projection onto the subspace $W_1$. That is, let
$$\boldsymbol v_2 = \boldsymbol x_2 - \mathrm{proj}_{W_1}\boldsymbol x_2 = \boldsymbol x_2 - \frac{\boldsymbol x_2 \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix} - \frac{3}{4}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} = \begin{bmatrix}-3/4\\1/4\\1/4\\1/4\end{bmatrix}$$
Then $\{\boldsymbol v_1, \boldsymbol v_2\}$ is an orthogonal basis for the subspace $W_2$ spanned by $\boldsymbol x_1$ and $\boldsymbol x_2$.
- **Step 2′ (optional).** If appropriate, scale $\boldsymbol v_2$ to simplify later computations. Since $\boldsymbol v_2$ has fractional entries, it is convenient to scale it by a factor of 4 and replace it by
$$\boldsymbol v_2' = 4\boldsymbol v_2 = \begin{bmatrix}-3\\1\\1\\1\end{bmatrix}$$
- **Step 3.** Let $\boldsymbol v_3$ be the vector produced by subtracting from $\boldsymbol x_3$ its projection onto the subspace $W_2$. Use the orthogonal basis $\{\boldsymbol v_1, \boldsymbol v_2'\}$ to compute this projection onto $W_2$:
$$\mathrm{proj}_{W_2}\boldsymbol x_3 = \frac{\boldsymbol x_3 \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1 + \frac{\boldsymbol x_3 \cdot \boldsymbol v_2'}{\boldsymbol v_2' \cdot \boldsymbol v_2'}\boldsymbol v_2' = \frac{2}{4}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} + \frac{2}{12}\begin{bmatrix}-3\\1\\1\\1\end{bmatrix} = \begin{bmatrix}0\\2/3\\2/3\\2/3\end{bmatrix}$$
Then
$$\boldsymbol v_3 = \boldsymbol x_3 - \mathrm{proj}_{W_2}\boldsymbol x_3 = \begin{bmatrix}0\\0\\1\\1\end{bmatrix} - \begin{bmatrix}0\\2/3\\2/3\\2/3\end{bmatrix} = \begin{bmatrix}0\\-2/3\\1/3\\1/3\end{bmatrix}$$
Then $\{\boldsymbol v_1, \boldsymbol v_2', \boldsymbol v_3\}$ is an orthogonal basis for $W$.
- The proof of the next theorem shows that this strategy really works. Scaling of vectors is not mentioned there because scaling is used only to simplify hand calculations.
- **Theorem 11 (The Gram–Schmidt Process).** Given a basis $\{\boldsymbol x_1, \dots, \boldsymbol x_p\}$ for a nonzero subspace $W$ of $\mathbb R^n$, define
$$\begin{aligned}\boldsymbol v_1 &= \boldsymbol x_1\\ \boldsymbol v_2 &= \boldsymbol x_2 - \frac{\boldsymbol x_2 \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1\\ &\ \ \vdots\\ \boldsymbol v_p &= \boldsymbol x_p - \frac{\boldsymbol x_p \cdot \boldsymbol v_1}{\boldsymbol v_1 \cdot \boldsymbol v_1}\boldsymbol v_1 - \dots - \frac{\boldsymbol x_p \cdot \boldsymbol v_{p-1}}{\boldsymbol v_{p-1} \cdot \boldsymbol v_{p-1}}\boldsymbol v_{p-1}\end{aligned}$$
Then $\{\boldsymbol v_1, \dots, \boldsymbol v_p\}$ is an orthogonal basis for $W$, and $\mathrm{Span}\{\boldsymbol v_1, \dots, \boldsymbol v_k\} = \mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\}$ for $1 \leq k \leq p$. In particular, Theorem 11 shows that any nonzero subspace $W$ of $\mathbb R^n$ has an orthogonal basis.
PROOF
- For $1 \leq k \leq p$, let $W_k = \mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\}$.
- Set $\boldsymbol v_1 = \boldsymbol x_1$, so that $\mathrm{Span}\{\boldsymbol v_1\} = \mathrm{Span}\{\boldsymbol x_1\}$.
- Suppose, for some $k < p$, we have constructed $\boldsymbol v_1, \dots, \boldsymbol v_k$ so that $\{\boldsymbol v_1, \dots, \boldsymbol v_k\}$ is an orthogonal basis for $W_k$. Define
$$\boldsymbol v_{k+1} = \boldsymbol x_{k+1} - \mathrm{proj}_{W_k}\boldsymbol x_{k+1} \qquad (2)$$
By the Orthogonal Decomposition Theorem, $\boldsymbol v_{k+1}$ is orthogonal to $W_k$.
- Note that $\mathrm{proj}_{W_k}\boldsymbol x_{k+1}$ is in $W_k$ and hence also in $W_{k+1}$. Since $\boldsymbol x_{k+1}$ is in $W_{k+1}$, so is $\boldsymbol v_{k+1}$ (a subspace is closed under subtraction).
- Furthermore, $\boldsymbol v_{k+1} \neq \boldsymbol 0$ because $\boldsymbol x_{k+1}$ is not in $W_k$.
- Hence $\{\boldsymbol v_1, \dots, \boldsymbol v_{k+1}\}$ is an orthogonal set of $k+1$ nonzero vectors in the $(k+1)$-dimensional space $W_{k+1}$, so by the Basis Theorem this set is an orthogonal basis for $W_{k+1}$. Hence $W_{k+1} = \mathrm{Span}\{\boldsymbol v_1, \dots, \boldsymbol v_{k+1}\}$. When $k + 1 = p$, the process stops.
Orthonormal Bases
- An orthonormal basis is constructed easily from an orthogonal basis $\{\boldsymbol v_1, \dots, \boldsymbol v_p\}$: simply normalize (i.e., "scale") all the $\boldsymbol v_k$.
Tip: When working problems by hand, this is easier than normalizing each $\boldsymbol v_k$ as soon as it is found (because it avoids unnecessary writing of square roots).
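- In code, normalization is a single division of each column by its length; a small sketch (my own), normalizing the orthogonal basis $\{\boldsymbol v_1, \boldsymbol v_2', \boldsymbol v_3\}$ from Example 2:

```python
import numpy as np

# Columns are v1, v2', v3 from Example 2.
V = np.array([[1, -3,  0   ],
              [1,  1, -2/3 ],
              [1,  1,  1/3 ],
              [1,  1,  1/3 ]])
U = V / np.linalg.norm(V, axis=0)   # divide each column by its norm
print(np.round(U.T @ U, 10))        # identity: the columns are orthonormal
```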
QR Factorization of Matrices
- If an $m \times n$ matrix $A$ has linearly independent columns $\{\boldsymbol x_1, \dots, \boldsymbol x_n\}$, then applying the Gram–Schmidt process (with normalizations) to $\{\boldsymbol x_1, \dots, \boldsymbol x_n\}$ amounts to *factoring* $A$, as described in the next theorem.
- **Theorem 12 (The QR Factorization).** If $A$ is an $m \times n$ matrix with linearly independent columns, then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix whose columns form an orthonormal basis for $\mathrm{Col}\,A$ and $R$ is an $n \times n$ upper triangular invertible matrix with positive entries on its diagonal.
- This factorization is widely used in computer algorithms for various computations, such as solving equations (discussed in Section 6.5) and finding eigenvalues (mentioned in the exercises for Section 5.2).$^1$
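- For illustration, here is how a library QR routine is called and used (numpy's `np.linalg.qr` is based on Householder reflections via LAPACK rather than Gram–Schmidt, but it returns the same kind of factorization; the matrix below is the one from Example 2):

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)
Q, R = np.linalg.qr(A)           # "reduced" QR: Q is 4x3, R is 3x3
print(np.allclose(A, Q @ R))     # True
print(np.round(R, 3))            # upper triangular (numpy does not
                                 # guarantee positive diagonal entries)

# One use from Section 6.5: solving a least-squares problem Ax ~ b,
# since the normal equations reduce to the triangular system R x = Q^T b.
b = np.array([1.0, 2.0, 3.0, 4.0])
x = np.linalg.solve(R, Q.T @ b)
print(x)
```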
PROOF
- The columns of $A$ form a basis $\{\boldsymbol x_1, \dots, \boldsymbol x_n\}$ for $\mathrm{Col}\,A$. Construct an orthonormal basis $\{\boldsymbol u_1, \dots, \boldsymbol u_n\}$ for $W = \mathrm{Col}\,A$. This basis may be constructed by the Gram–Schmidt process or some other means. Let
$$Q = \begin{bmatrix} \boldsymbol u_1 & \boldsymbol u_2 & \cdots & \boldsymbol u_n \end{bmatrix}$$
For $k = 1, \dots, n$, $\boldsymbol x_k$ is in $\mathrm{Span}\{\boldsymbol x_1, \dots, \boldsymbol x_k\} = \mathrm{Span}\{\boldsymbol u_1, \dots, \boldsymbol u_k\}$. So there are constants $r_{1k}, \dots, r_{kk}$ such that
$$\boldsymbol x_k = r_{1k}\boldsymbol u_1 + \dots + r_{kk}\boldsymbol u_k + 0 \cdot \boldsymbol u_{k+1} + \dots + 0 \cdot \boldsymbol u_n$$
We may assume that $r_{kk} \geq 0$. (If $r_{kk} < 0$, multiply both $r_{kk}$ and $\boldsymbol u_k$ by $-1$.) This shows that $\boldsymbol x_k$ is a linear combination of the columns of $Q$ using as weights the entries in the vector
$$\boldsymbol r_k = \begin{bmatrix} r_{1k} & \cdots & r_{kk} & 0 & \cdots & 0 \end{bmatrix}^T$$
That is, $\boldsymbol x_k = Q\boldsymbol r_k$ for $k = 1, \dots, n$. Let $R = [\boldsymbol r_1\ \ \cdots\ \ \boldsymbol r_n]$. Then
$$A = [\boldsymbol x_1\ \ \cdots\ \ \boldsymbol x_n] = [Q\boldsymbol r_1\ \ \cdots\ \ Q\boldsymbol r_n] = QR$$
- The fact that $R$ is invertible follows easily from the fact that the columns of $A$ are linearly independent (see the exercise below). Since $R$ is clearly upper triangular, its nonnegative diagonal entries must be positive.
EXERCISES
Suppose $A = QR$, where $Q$ is $m \times n$ and $R$ is $n \times n$. Show that if the columns of $A$ are linearly independent, then $R$ must be invertible.
SOLUTION
- [Hint: Study the equation $R\boldsymbol x = \boldsymbol 0$ and use the fact that $A = QR$.]
- If $R\boldsymbol x = \boldsymbol 0$, then $A\boldsymbol x = QR\boldsymbol x = \boldsymbol 0$. Since the columns of $A$ are linearly independent, $\boldsymbol x$ must be $\boldsymbol 0$. So $R\boldsymbol x = \boldsymbol 0$ has only the trivial solution, and by the Invertible Matrix Theorem $R$ is invertible.
- $Q$ can be constructed by the Gram–Schmidt process, and $R$ can then be computed as $R = Q^TA$.
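- A small sketch of this recipe (my own code): build $Q$ column by column with Gram–Schmidt plus normalization, then recover $R$ as $Q^TA$, which equals $Q^T(QR) = R$ because $Q^TQ = I$:

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)

Q = np.zeros_like(A)
for k, x in enumerate(A.T):
    v = x - Q[:, :k] @ (Q[:, :k].T @ x)  # subtract projection onto earlier columns
    Q[:, k] = v / np.linalg.norm(v)      # normalize

R = Q.T @ A
print(np.allclose(A, Q @ R))   # True
print(np.round(R, 3))          # upper triangular with positive diagonal
```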
$^1$ See *Fundamentals of Matrix Computations*, by David S. Watkins (New York: John Wiley & Sons, 1991), pp. 167–180.
EXERCISE 23
Suppose $A = QR$ is a QR factorization of an $m \times n$ matrix $A$ (with linearly independent columns). Partition $A$ as $[A_1\ \ A_2]$, where $A_1$ has $p$ columns. Show how to obtain a QR factorization of $A_1$.
SOLUTION
$$\because A = QR\\
\therefore \begin{bmatrix} A_{1_{m\times p}} & A_{2_{m\times (n-p)}} \end{bmatrix} = \begin{bmatrix} Q_{1_{m\times p}} & Q_{2_{m\times (n-p)}} \end{bmatrix} \begin{bmatrix} R_{11_{p\times p}} & R_{12_{p\times (n-p)}} \\ 0_{(n-p)\times p} & R_{22_{(n-p)\times (n-p)}} \end{bmatrix}\\
\therefore A_{1_{m\times p}} = Q_{1_{m\times p}} R_{11_{p\times p}}$$
The columns of $Q_1$ are orthonormal (they are columns of $Q$), and $R_{11}$ is upper triangular with positive diagonal entries (it is the leading block of $R$), so $A_1 = Q_1R_{11}$ is a QR factorization of $A_1$. A numerical check follows below.
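- A quick numerical check of this partition argument (my own sketch, with $p = 2$ and the matrix from Example 2):

```python
import numpy as np

A = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)
Q, R = np.linalg.qr(A)

p = 2
A1 = A[:, :p]                      # A = [A1  A2], A1 has p columns
Q1, R11 = Q[:, :p], R[:p, :p]
print(np.allclose(A1, Q1 @ R11))   # True: A1 = Q1 R11
```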