Lecture 1: AX and the column space of A
How to look at a matrix times a vector $AX$:

$$
\begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
$$
- The standard way: take the dot product of each row of $A$ with $X$, getting one component at a time. This is the low-level way.
- The vector-wise way, which is the right way:
$$
\begin{bmatrix} 2 & 1 & 3 \\ 3 & 1 & 4 \\ 5 & 7 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ 1 \\ 7 \end{bmatrix} + x_3 \begin{bmatrix} 3 \\ 4 \\ 12 \end{bmatrix}
$$
Think of a matrix as a whole thing: not just a bunch of $m \times n$ numbers, but an object that multiplies a vector to give another vector. $AX$ is a combination of the columns of $A$.
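The two ways of computing $AX$ can be compared in a small sketch. The helper names `by_rows` and `by_columns` and the sample vector `x` are chosen here for illustration only; they do not come from the lecture.

```python
# The 3x3 matrix from the example above and an arbitrary sample vector.
A = [[2, 1, 3],
     [3, 1, 4],
     [5, 7, 12]]
x = [1, 2, 3]  # illustrative values, not from the lecture


def by_rows(A, x):
    """Row picture: each component is the dot product of one row of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def by_columns(A, x):
    """Column picture: the result is x[0]*(column 0) + x[1]*(column 1) + ..."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        column = [A[i][j] for i in range(m)]
        result = [r + x[j] * c for r, c in zip(result, column)]
    return result


print(by_rows(A, x))     # [13, 17, 55]
print(by_columns(A, x))  # [13, 17, 55]
```

Both functions return the same vector; the second one makes it visible that the output is a combination of the columns of $A$.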
Column Space:
A more general situation: take a matrix $A$, take all vectors $X$, and imagine all the outputs $AX$. What does that set look like? It is the column space of $A$, denoted $C(A)$, a space that depends on $A$.
As to $A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 3 & 8 \\ 1 & 3 & 8 \end{bmatrix}$, the column space

$$
C(A) = x_1 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} + x_3 \begin{bmatrix} 8 \\ 8 \\ 8 \end{bmatrix} \quad \text{for all } X
$$

is a line, and we can see that $\operatorname{rank}(A) = 1$.
The rank of a matrix $A$ is the dimension of its column space.
Simply put, all combinations of the columns of $A$ (a set of vectors) form a space, which is the column space of $A$. The columns span that space, and the independent columns among them form a basis for it.
The first $A$ (the one in the $AX$ example) has rank two, because its third column is a combination (multiply and add) of the others: column 3 = column 1 + column 2. The independent columns form a basis for the column space.
Rank-one matrices are the building blocks of linear algebra, which will be elaborated in more detail later. A special way to write a rank-one matrix is as a column times a row:
$$
\begin{bmatrix} 1 & 3 & 8 \\ 1 & 3 & 8 \\ 1 & 3 & 8 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 8 \end{bmatrix}
$$
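As a minimal sketch, the column-times-row product can be written out directly. The helper name `outer` is an assumption made for this example.

```python
def outer(u, v):
    """Column u times row v: entry (i, j) of the result is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


u = [1, 1, 1]  # the column
v = [1, 3, 8]  # the row

rank_one = outer(u, v)
for row in rank_one:
    print(row)
# [1, 3, 8]
# [1, 3, 8]
# [1, 3, 8]
```

Every row of the result is a multiple of `v` and every column is a multiple of `u`, which is why the matrix has rank one.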
Independent Columns:
Find a basis for the column space of $A$ and factor $A$ into $C$ times $R$.
Basis of a space:
- a set of independent columns (No column is a combination of others)
- their combinations have to fill the space
To find a basis for the column space of $A$, the goal is to create a matrix $C$ whose columns come directly from $A$, but which does not include any column that is a combination of previous columns. A natural construction of $C$:
- If column 1 of $A$ is not all zero, put it into the matrix $C$.
- If column 2 of $A$ is not a multiple of column 1, put it into $C$.
- If column 3 of $A$ is not a combination of columns 1 and 2, put it into $C$. Continue.

The columns of the final $C$ will be a basis for the column space of $A$.
Example: If $A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{bmatrix}$, then $C = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}$. Column 3 of $A$ is a combination of columns 1 and 2 (column 3 = 2 × column 1 + 2 × column 2), so it is dropped. The number of columns of $C$ is the rank of $A$, and also the rank of $C$. It counts the independent columns, as mentioned above.
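The construction of $C$ can be sketched in plain Python. This is an illustration under assumptions, not the lecture's own code: the function names `rank` and `independent_columns` are made up here, and "is a combination of the previous columns" is tested by checking whether adding the column raises the rank, computed by Gaussian elimination with exact fractions.

```python
from fractions import Fraction


def rank(columns):
    """Rank of the matrix with the given columns, by exact Gaussian elimination."""
    if not columns:
        return 0
    # Rebuild the matrix row by row from its list of columns.
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    r = 0  # number of pivots found so far
    for c in range(len(columns)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def independent_columns(A):
    """Keep each column of A that is not a combination of the columns kept so far."""
    m, n = len(A), len(A[0])
    kept = []
    for j in range(n):
        col = [A[i][j] for i in range(m)]
        if rank(kept + [col]) > len(kept):  # the new column adds a dimension
            kept.append(col)
    # Reassemble the kept columns into a matrix stored as a list of rows.
    return [[col[i] for col in kept] for i in range(m)]


A = [[1, 3, 8],
     [1, 2, 6],
     [0, 1, 2]]
C = independent_columns(A)
print(C)  # [[1, 3], [1, 2], [0, 1]]
```

The number of columns kept in `C` is the rank of `A`, which is 2 for this example.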
Again:
The column rank = The number of independent columns.
The rank of a matrix is the dimension of its column space.
The matrix $C$ connects to $A$ by a third matrix $R$: $A = CR$. It's an important **factorization** of $A$.
$$
A = \begin{bmatrix} 1 & 3 & 8 \\ 1 & 2 & 6 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \end{bmatrix} = CR
$$
We can see this using the right way of doing matrix times vector from the beginning:
- Column $i$ of $A$ is $C$ times column $i$ of $R$. Take column 1 for example: $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + 0 \cdot \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$
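The column view of $A = CR$ can be checked with a short sketch. The helper names `column` and `combine_columns` are assumptions introduced for this example.

```python
C = [[1, 3],
     [1, 2],
     [0, 1]]
R = [[1, 0, 2],
     [0, 1, 2]]


def column(M, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def combine_columns(C, weights):
    """Combination of the columns of C: weights[0]*(col 0) + weights[1]*(col 1) + ..."""
    return [sum(C[i][k] * weights[k] for k in range(len(weights)))
            for i in range(len(C))]


# Column j of A is C times column j of R.
A_columns = [combine_columns(C, column(R, j)) for j in range(len(R[0]))]
print(A_columns)  # [[1, 1, 0], [3, 2, 1], [8, 6, 2]]
```

Each entry of `A_columns` is one column of $A$; the third shows column 3 as 2 × column 1 + 2 × column 2 of $C$.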
First great theorem: The column rank = The row rank
Next, we are going to prove a great theorem in linear algebra:
The number of independent columns equals the number of independent rows.
The column rank = The row rank
The rank of $R$ equals the rank of $C$, which is the column rank of $A$. Next, we need to show that the rows of $R$ form a basis for the row space of $A$:
- check 1: the rows of $R$ are independent
- check 2: their combinations produce all three rows of $A$
Matrix multiplication in another way: by taking combinations of the rows of $R$, we get the rows of $A$. Row $i$ of $A$ is row $i$ of $C$ times $R$. Take row 1 for example:

$$
\begin{bmatrix} 1 & 3 & 8 \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 & 0 & 2 \end{bmatrix} + 3 \cdot \begin{bmatrix} 0 & 1 & 2 \end{bmatrix}.
$$
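The row view can be sketched the same way. The helper name `combine_rows` is an assumption made for this example; each row of $C$ supplies the weights for combining the rows of $R$.

```python
C = [[1, 3],
     [1, 2],
     [0, 1]]
R = [[1, 0, 2],
     [0, 1, 2]]


def combine_rows(weights, R):
    """Combination of the rows of R: weights[0]*(row 0) + weights[1]*(row 1) + ..."""
    return [sum(weights[k] * R[k][j] for k in range(len(weights)))
            for j in range(len(R[0]))]


# Row i of A is row i of C times R.
A_rows = [combine_rows(c_row, R) for c_row in C]
for row in A_rows:
    print(row)
# [1, 3, 8]
# [1, 2, 6]
# [0, 1, 2]
```

All three rows of $A$ come out as combinations of the two rows of $R$, which is check 2 of the proof.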
The wonderful thing about matrix multiplication is that you can do it in a lot of ways. It comes out the same every way, but each way tells you something different.
So the rows of $R$ form a basis for the row space of $A$. The column space and row space of $A$ both have dimension 2, each with 2 basis vectors: the columns of $C$ and the rows of $R$. Proof done.
In conclusion, the proof is exactly to look at the multiplication $CR$ in two ways. First, look at it as combinations of the columns of $C$ that give the columns of $A$. Second, look at it as combinations of the rows of $R$ that produce the rows of $A$. So the factorization $A = CR$ is the key idea.
$R = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 2 \end{bmatrix}$ is a famous matrix in linear algebra, called the reduced row echelon form of $A$ (with any zero rows removed). It contains an identity matrix alongside the other columns.
A big factorization for data science is the "SVD" of $A$, in which the first factor has $r$ orthogonal columns and the second factor has $r$ orthogonal rows. See also CUR, to be introduced in follow-up lectures.
How do we deal with a matrix of size $10^5$? It is hard to fit into fast memory, so we sample the matrix.
$ABCx$ is also in $C(A)$, because it is $A$ times something: $ABCx = A(BCx)$.
Matrix multiplication: $A$ is an $m$ by $n$ matrix and $B$ is an $n$ by $p$ matrix.
How to see $A$ times $B$:
- dot products of rows and columns: the low-level way, for beginners
- columns of $A$ times rows of $B$: the high-level way

$$
(m, n)(n, p) = (m, p) = \text{sum of } (m, 1)(1, p)
$$

Rank-one matrices are the building blocks:

$$
AB = (\text{col}_1 \text{ of } A)(\text{row}_1 \text{ of } B) + \cdots + (\text{col}_k \text{ of } A)(\text{row}_k \text{ of } B) + \cdots + (\text{col}_n \text{ of } A)(\text{row}_n \text{ of } B)
$$
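Both ways of multiplying can be compared in a short sketch. The function names `matmul_dot` and `matmul_outer` are assumptions made for this example, and the factors $C$ and $R$ from earlier are reused as the two matrices being multiplied.

```python
def matmul_dot(A, B):
    """Low-level way: entry (i, j) is the dot product of row i of A and column j of B."""
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]


def matmul_outer(A, B):
    """High-level way: sum over k of (column k of A) times (row k of B)."""
    m, n, p = len(A), len(B), len(B[0])
    total = [[0] * p for _ in range(m)]
    for k in range(n):
        col_k = [A[i][k] for i in range(m)]
        row_k = B[k]
        # Add the rank-one piece col_k * row_k to the running total.
        for i in range(m):
            for j in range(p):
                total[i][j] += col_k[i] * row_k[j]
    return total


A = [[1, 3], [1, 2], [0, 1]]   # 3 by 2 (the matrix C from earlier)
B = [[1, 0, 2], [0, 1, 2]]     # 2 by 3 (the matrix R from earlier)

print(matmul_dot(A, B))    # [[1, 3, 8], [1, 2, 6], [0, 1, 2]]
print(matmul_outer(A, B))  # [[1, 3, 8], [1, 2, 6], [0, 1, 2]]
```

Here the product is a sum of $n = 2$ rank-one pieces, one for each column-row pair.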
More in the next lecture.
Conclusion:
- The column space of $A$
- A factorization: $A = CR$
- The theorem: column rank = row rank