Machine Learning Assignment 1 (Linear Algebra)
Instructor: Beilun Wang  Name: Daiyang Luan  ID: 61518421
Problem 1
Let $a=(1,2,3)^{\mathrm{T}}$ and $b=(-8,1,2)^{\mathrm{T}}$ be two vectors. Answer the following questions:
(1) Compute the $\ell_{2}$ norm of $a$ and $b$.
(2) Calculate the Euclidean distance between $a$ and $b$.
(3) Are $a$ and $b$ orthogonal?
Solution:
(1) The $\ell_{2}$ norm of $a$ is $\|a\|_{2}=\sqrt{1^{2}+2^{2}+3^{2}}=\sqrt{14}$, and the $\ell_{2}$ norm of $b$ is $\|b\|_{2}=\sqrt{(-8)^{2}+1^{2}+2^{2}}=\sqrt{69}$.
(2) The Euclidean distance between $a$ and $b$ is $\|a-b\|_{2}=\sqrt{9^{2}+1^{2}+1^{2}}=\sqrt{83}$.
(3) As $a^{\mathrm{T}}b=1\times (-8)+2\times 1+3\times 2=0$, $a$ and $b$ are orthogonal.
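The three answers above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot` and `l2_norm` are introduced here for illustration and are not part of the assignment.

```python
import math

a = [1, 2, 3]
b = [-8, 1, 2]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def l2_norm(u):
    """Euclidean (l2) norm of a vector."""
    return math.sqrt(dot(u, u))

diff = [ai - bi for ai, bi in zip(a, b)]

norm_a_sq = dot(a, a)      # squared l2 norm of a
norm_b_sq = dot(b, b)      # squared l2 norm of b
dist_sq = dot(diff, diff)  # squared Euclidean distance
inner = dot(a, b)          # zero means a and b are orthogonal

print(l2_norm(a), l2_norm(b), l2_norm(diff), inner)
```

Working with the squared quantities keeps the comparison in exact integer arithmetic; the square roots are taken only for display.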
Problem 2
Suppose $A=\left[\begin{array}{ccc}{1} & {-3} & {3} \\ {3} & {-5} & {3} \\ {6} & {-6} & {4}\end{array}\right]$, answer the following questions:
(1) Calculate $A^{-1}$ and $\operatorname{det}(A)$.
(2) The rank of $A$ is?
(3) The trace of $A$ is?
(4) Calculate $A+A^{T}$.
(5) Is $A$ an orthogonal matrix? State your reason.
(6) Calculate all the eigenvalues $\lambda$ and corresponding eigenvectors of $A$.
(7) Diagonalize the matrix $A$.
(8) Calculate the $\ell_{2,1}$ norm $\|A\|_{2,1}$ and the Frobenius norm (i.e. $\ell_{2}$ norm) $\|A\|_{F}$.
(9) Calculate the nuclear norm $\|A\|_*$ and the spectral norm $\|A\|_{2}$.
Solution:
(1)
$$\left[\begin{array}{cc} A & I\end{array}\right]=\left[\begin{array}{ccc|ccc}1&-3&3&1&0&0\\3&-5&3&0&1&0 \\6&-6&4&0&0&1\end{array}\right]\stackrel{\text{row}}{\longrightarrow}\left[\begin{array}{ccc|ccc}1&0&0&-1/8&-3/8&3/8\\0&1&0&3/8&-7/8&3/8 \\0&0&1&3/4&-3/4&1/4\end{array}\right]=\left[\begin{array}{cc} I & A^{-1}\end{array}\right]$$
Hence,
$$A^{-1}=\left[\begin{array}{ccc}-1/8&-3/8&3/8\\3/8&-7/8&3/8 \\3/4&-3/4&1/4\end{array}\right]$$
$$\operatorname{det}(A)= \left|\begin{array}{ccc} 1 & -3 & 3 \\ 3 & -5 & 3\\ 6 & -6 & 4 \end{array}\right| =\left|\begin{array}{ccc} 1 & -3 & 3 \\ 0 & 4 & -6\\ 0 & 0 & 4 \end{array}\right|=16$$
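The row reduction of $[A\ I]$ can be reproduced with exact fractions. The sketch below is one possible Gauss-Jordan routine, not necessarily the sequence of row operations used by hand; the function name `inverse_and_det` is an assumption made for this example.

```python
from fractions import Fraction

def inverse_and_det(M):
    """Gauss-Jordan elimination on [M | I] with exact rational arithmetic.

    Returns (inverse, determinant). Raises ValueError if M is singular.
    """
    n = len(M)
    # Build the augmented matrix [M | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p  # the determinant is the signed product of the pivots
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det

A = [[1, -3, 3], [3, -5, 3], [6, -6, 4]]
A_inv, det_A = inverse_and_det(A)
for row in A_inv:
    print([str(x) for x in row])
print(det_A)
```

Using `Fraction` instead of floats means the entries come out as exact values such as `-1/8`, which can be compared directly with the hand computation.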
(2) As $\operatorname{det}(A)\neq 0$, $A$ is a full-rank matrix. Thus, the rank of $A$ is $3$.
(3) $\operatorname{tr}(A)=1+(-5)+4=0$. That is, the trace of $A$ is $0$.
(4) $A+A^{T}=\left[\begin{array}{ccc}1&-3&3\\3&-5&3\\6&-6&4\end{array}\right]+\left[\begin{array}{ccc}1&3&6\\-3&-5&-6\\3&3&4\end{array}\right]=\left[\begin{array}{ccc}2&0&9\\0&-10&-3\\9&-3&8\end{array}\right]$
(5) $A^{T}A=\left[\begin{array}{ccc}46&-54&36\\-54&70&-48\\36&-48&34\end{array}\right]\neq I$, so $A$ is not an orthogonal matrix.
(6) The characteristic determinant of $A$ is $\left|\begin{array}{ccc} \lambda-1 & 3 & -3 \\ -3 & \lambda+5 & -3\\ -6 & 6 & \lambda-4 \end{array}\right|=(\lambda+2)^{2}(\lambda-4).$ Thus, all the eigenvalues of $A$ are $\lambda_{1}=\lambda_{2}=-2,\lambda_{3}=4.$ Let $A\alpha_{i}=\lambda_{i}\alpha_{i},i=1,2,3$. Then we have $\alpha_{1}=\left[\begin{array}{c}1\\1\\0\end{array}\right],\alpha_{2}=\left[\begin{array}{c}0\\1\\1\end{array}\right],\alpha_{3}=\left[\begin{array}{c}1\\1\\2\end{array}\right]$. $\alpha_{i}\ (i=1,2,3)$ are the corresponding eigenvectors.
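(7) Taking the eigenvectors as the columns of $P=\left[\begin{array}{ccc}\alpha_{1} & \alpha_{2} & \alpha_{3}\end{array}\right]=\left[\begin{array}{ccc}1&0&1\\1&1&1\\0&1&2\end{array}\right]$, we have $P^{-1}AP=\left[\begin{array}{ccc} -2 & 0 & 0 \\ 0 & -2 & 0\\ 0 &0 & 4 \end{array}\right]$, which is the diagonal matrix corresponding to matrix $A$.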
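The eigenpairs can be verified directly from the definition $A\alpha=\lambda\alpha$. The sketch below also checks that the matrix whose columns are the three eigenvectors (called `P` here, a name chosen for this example) has nonzero determinant, which is what makes the diagonalization valid.

```python
A = [[1, -3, 3], [3, -5, 3], [6, -6, 4]]

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Eigenpairs (eigenvalue, eigenvector) as stated in the solution.
pairs = [(-2, [1, 1, 0]), (-2, [0, 1, 1]), (4, [1, 1, 2])]

# A v should equal lambda * v for every pair.
checks = [matvec(A, v) == [lam * x for x in v] for lam, v in pairs]

# P has the eigenvectors as columns: P[i][j] is component i of eigenvector j.
P = [[pairs[j][1][i] for j in range(3)] for i in range(3)]

# Determinant of P by cofactor expansion along the first row; a nonzero
# value means the eigenvectors are linearly independent, so P is invertible.
det_P = (P[0][0] * (P[1][1] * P[2][2] - P[1][2] * P[2][1])
         - P[0][1] * (P[1][0] * P[2][2] - P[1][2] * P[2][0])
         + P[0][2] * (P[1][0] * P[2][1] - P[1][1] * P[2][0]))

print(checks, det_P)
```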
(8) In order to calculate the $\ell_{2,1}$ norm $\|A\|_{2,1}$, we first calculate the 2-norm of each row: $\sqrt{19},\sqrt{43},2\sqrt{22}$. Thus, $\|A\|_{2,1}=\sqrt{19}+\sqrt{43}+2\sqrt{22}$.
$$\Vert A \Vert_F=\left(\sum\limits_{i=1}^{m}\sum\limits_{j=1}^n (a_{ij})^2\right)^{\frac{1}{2}}=\sqrt{1+9+9+9+25+9+36+36+16}=\sqrt{150}.$$
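Both norms reduce to sums of squared entries, so they are easy to check. The sketch below follows the row-wise convention for $\|A\|_{2,1}$ used in this solution; note that some references define the $\ell_{2,1}$ norm over columns instead, which would give a different value.

```python
import math

A = [[1, -3, 3], [3, -5, 3], [6, -6, 4]]

# Squared 2-norm of each row.
row_sq = [sum(x * x for x in row) for row in A]

# l_{2,1} norm under the row-wise convention: sum of the row 2-norms.
l21 = sum(math.sqrt(s) for s in row_sq)

# Frobenius norm: square root of the sum of all squared entries.
fro_sq = sum(row_sq)
fro = math.sqrt(fro_sq)

print(row_sq, l21, fro)
```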
(9) The nuclear norm $\|A\|_*$ is defined as the sum of all the singular values of matrix $A$, and the singular values of $A$ are the square roots of the eigenvalues of $A^{T}A$. As is calculated above, $A^{T}A=\left[\begin{array}{ccc}46&-54&36\\-54&70&-48\\36&-48&34\end{array}\right]$. Supposing the eigenvalues of $A^TA$ are $\lambda_i, i=1,2,3$, we have $|\lambda I-A^TA|=0$.
That is,
$$\left|\begin{array}{ccc} \lambda-46&54&-36\\ 54&\lambda-70&48\\ -36&48&\lambda-34 \end{array}\right|=0$$
Hence, we have
$$\lambda^3-150\lambda^2+648\lambda-256=0$$
The solutions of the equation are:
$$\lambda_1=4,\quad \lambda_2=73+9\sqrt{65},\quad \lambda_3=73-9\sqrt{65}$$
Thus, $\|A\|_*=\sqrt{\lambda_1}+\sqrt{\lambda_2}+\sqrt{\lambda_3}=2+\sqrt{73+9\sqrt{65}}+\sqrt{73-9\sqrt{65}}\approx14.7279$. Since $\left(\sqrt{\lambda_2}+\sqrt{\lambda_3}\right)^2=146+2\sqrt{64}=162$, this simplifies to $\|A\|_*=2+9\sqrt{2}$.
The spectral norm is the largest singular value:
$$\|A\|_2=\sqrt{\lambda_{\max}(A^TA)}=\sqrt{73+9\sqrt{65}}\approx 12.0648$$
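The closed-form eigenvalues can be substituted back into the cubic to confirm them, and the two norms then follow from their square roots. This is a minimal floating-point sketch; the function name `char_poly` is introduced here for illustration.

```python
import math

# Eigenvalues of A^T A as derived in the solution.
eigs = [4.0, 73 + 9 * math.sqrt(65), 73 - 9 * math.sqrt(65)]

def char_poly(lam):
    """Characteristic polynomial of A^T A from the solution."""
    return lam**3 - 150 * lam**2 + 648 * lam - 256

# Residuals should vanish up to floating-point rounding.
residuals = [char_poly(lam) for lam in eigs]

# Singular values of A are the square roots of these eigenvalues.
singular_values = sorted((math.sqrt(lam) for lam in eigs), reverse=True)
nuclear = sum(singular_values)   # nuclear norm: sum of singular values
spectral = singular_values[0]    # spectral norm: largest singular value

print(residuals, nuclear, spectral)
```

Because the residuals are computed in floating point, they should be compared against a small tolerance rather than against exact zero.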
Problem 3
Please give some proper steps to show how you get the answer. Let $x=\left(x_{1}, x_{2}, x_{3}\right)^{T}$ and
$$\left\{\begin{array}{l} 2 x_{1}+2 x_{2}+3 x_{3}=1 \\ x_{1}-x_{2}=-1 \\ -x_{1}+2 x_{2}+x_{3}=2 \end{array}\right.$$
Answer the following questions:
(1) Solve the linear equations.
(2) Write it into matrix form (i.e. $Ax=b$) and we will use the same $A$ and $b$ in the following questions.
(3) The rank of $A$ is?
(4) Calculate $A^{-1}$ and $\operatorname{det}(A)$.
(5) Use (4) to solve the linear equations.
(6) Calculate the inner product and outer product of $x$ and $b$ (i.e. $\langle x, b\rangle$ and $x \otimes b$).
(7) Calculate the $\ell_{1}, \ell_{2}$ and $\ell_{\infty}$ norm of $b$.
(8) Suppose $y=\left(y_{1}, y_{2}, y_{3}\right),$ calculate $y^{T} A y, \nabla_{y} y^{T} A y$.
(9) We add one linear equation $-x_{1}+2 x_{2}+x_{3}=2$ into linear equations above. Write it into matrix form (i.e. $A_{1} x=b$).
(10) The rank of $A_{1}$ is?
(11) Could these linear equations $A_{1} x=b$ be solved? State reasons.
Solution:
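(1) From the second equation, $x_1=x_2-1$. Substituting this into the third equation gives $x_2+x_3=1$, i.e. $x_3=1-x_2$. Substituting both into the first equation gives $2(x_2-1)+2x_2+3(1-x_2)=1$, i.e. $x_2=0$. Solving the linear equations, we therefore have: $x_1=-1, x_2=0, x_3=1$.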
(2) The linear equations can be written into matrix form $Ax=b$ where
$$A=\left[\begin{array}{ccc} 2&2&3 \\ 1&-1&0 \\ -1&2&1 \end{array}\right]$$
and
$$b=\left[\begin{array}{c} 1\\-1\\2 \end{array}\right]$$
(3) The rank of $A$ is 3, since $\operatorname{det}(A)=-1\neq 0$ (see (4)).
(4)
$$A^{-1}=\left[\begin{array}{ccc} 1&-4&-3 \\ 1&-5&-3 \\ -1&6&4 \end{array}\right]$$
$$\operatorname{det}(A)=-1.$$
(5)
$$x=A^{-1}b=\left[\begin{array}{ccc} 1&-4&-3 \\ 1&-5&-3 \\ -1&6&4 \end{array}\right]\left[\begin{array}{c} 1\\-1\\2 \end{array}\right]=\left[\begin{array}{c} -1\\0\\1 \end{array}\right]$$
That is, $x_1=-1, x_2=0, x_3=1$, which is consistent with the result of question (1).
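(6) $\langle x,b\rangle=x^Tb=(-1)\times 1+0\times(-1)+1\times 2=1$. The outer product is the matrix $x\otimes b=xb^T=\left[\begin{array}{ccc} -1&1&-2\\0&0&0\\1&-1&2 \end{array}\right]$. (The vector $\left[\begin{array}{ccc} 1&3&1 \end{array}\right]^T$ is the cross product $x\times b$, not the outer product.)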
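The notation $x\otimes b$ is usually read as the outer product $xb^T$, which is a matrix, whereas the cross product $x\times b$ is a vector. The sketch below computes the inner product and both of these for $x=(-1,0,1)^T$ and $b=(1,-1,2)^T$ so the three can be compared side by side.

```python
x = [-1, 0, 1]
b = [1, -1, 2]

# Inner product <x, b> = x^T b (a scalar).
inner = sum(xi * bi for xi, bi in zip(x, b))

# Outer product x b^T (a 3x3 matrix): entry (i, j) is x_i * b_j.
outer = [[xi * bj for bj in b] for xi in x]

# Cross product x × b (a vector), shown only for comparison.
cross = [x[1] * b[2] - x[2] * b[1],
         x[2] * b[0] - x[0] * b[2],
         x[0] * b[1] - x[1] * b[0]]

print(inner)
print(outer)
print(cross)
```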
(7) The $\ell_1$ norm of $b$ is $\|b\|_1=|1|+|-1|+|2|=4$.
The $\ell_2$ norm of $b$ is $\|b\|_2=\sqrt{1+1+4}=\sqrt{6}$.
The $\ell_\infty$ norm of $b$ is $\|b\|_\infty=\max(|1|,|-1|,|2|)=2$.
(8)
$$y^TAy=\left[\begin{array}{ccc} y_1&y_2&y_3 \end{array}\right]\left[\begin{array}{ccc} 2&2&3 \\ 1&-1&0 \\ -1&2&1 \end{array}\right]\left[\begin{array}{c} y_1\\y_2\\y_3 \end{array}\right]=2y_1^2-y_2^2+y_3^2+3y_1y_2+2y_2y_3+2y_1y_3$$
$$\nabla_yy^TAy=\left[\begin{array}{c} 4y_1+3y_2+2y_3\\3y_1-2y_2+2y_3\\2y_1+2y_2+2y_3 \end{array}\right]$$
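The gradient above agrees with the general identity $\nabla_y\, y^TAy=(A+A^T)y$. As a sanity check, the sketch below compares the closed form with central finite differences at an arbitrary test point; the point `y0`, the step `h`, and the function names are assumptions made for this example.

```python
A = [[2, 2, 3], [1, -1, 0], [-1, 2, 1]]

def quad(y):
    """Quadratic form y^T A y."""
    return sum(y[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

def grad_closed_form(y):
    """Gradient as written out in the solution."""
    y1, y2, y3 = y
    return [4 * y1 + 3 * y2 + 2 * y3,
            3 * y1 - 2 * y2 + 2 * y3,
            2 * y1 + 2 * y2 + 2 * y3]

def grad_numeric(y, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = []
    for k in range(3):
        up = [v + h * (i == k) for i, v in enumerate(y)]
        down = [v - h * (i == k) for i, v in enumerate(y)]
        g.append((quad(up) - quad(down)) / (2 * h))
    return g

y0 = [0.5, -1.0, 2.0]  # arbitrary test point (an assumption for illustration)
g_exact = grad_closed_form(y0)
g_approx = grad_numeric(y0)
print(g_exact, g_approx)
```

Central differences have no truncation error for a quadratic function, so the two gradients should differ only by floating-point rounding.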
(9) The new linear equations can be written into matrix form $A_1x=b_1$ where
$$A_1=\left[\begin{array}{ccc} 2&2&3 \\ 1&-1&0 \\ -1&2&1\\-1&2&1 \end{array}\right]$$
and
$$b_1=\left[\begin{array}{c} 1\\-1\\2\\2 \end{array}\right]$$
(10) The rank of $A_1$ is 3.
(11) Yes.
The fourth equation repeats the third, so row reduction of the augmented matrix $[A_1\ b_1]$ leaves one all-zero row, and $\operatorname{rank}(A_1)=\operatorname{rank}([A_1\ b_1])=3$; a solution therefore exists. Moreover, the number of variables is the same as the rank of the new matrix $A_1$, and thus there is no more than one solution to the non-homogeneous linear equations. After deleting the all-zero row, the remaining $3\times 3$ coefficient matrix is $A$, whose determinant is not zero, which confirms that the system has the unique solution $x=(-1,0,1)^T$.
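The solvability argument rests on comparing the rank of the coefficient matrix with the rank of the augmented matrix. The sketch below computes both with exact Gaussian elimination; the function name `rank` and the variable names are chosen for this example.

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

A1 = [[2, 2, 3], [1, -1, 0], [-1, 2, 1], [-1, 2, 1]]
b1 = [1, -1, 2, 2]
augmented = [row + [bi] for row, bi in zip(A1, b1)]

rank_A1 = rank(A1)
rank_aug = rank(augmented)
# Equal ranks mean the system is consistent; if the rank also equals the
# number of unknowns, the solution is unique.
solvable = rank_A1 == rank_aug
unique = solvable and rank_A1 == len(A1[0])
print(rank_A1, rank_aug, solvable, unique)
```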