Problem Set 1

1. Conditions for Normal Equation
Prove the following theorem: the matrix $A^TA$ is invertible if and only if the columns of $A$ are linearly independent.
1. First, we show that if the columns of $A$ are linearly independent, then $A^TA$ is invertible.

$A^TA$ is invertible if and only if the system $A^TAX = 0$ has the unique solution $X = 0$, so it suffices to show that linear independence of the columns of $A$ forces every solution of $A^TAX = 0$ to be $X = 0$.

Suppose $A^TAX = 0$. Left-multiplying by $X^T$ gives $X^TA^TAX = 0$, i.e. $(AX)^T(AX) = \|AX\|^2 = 0$, hence $AX = 0$.

Writing $A = [a_1, a_2, \dots, a_n]$ and $X = [x_1, x_2, \dots, x_n]^T$, this reads

$a_1x_1 + a_2x_2 + \dots + a_nx_n = 0. \quad (1)$

Since the columns of $A$ are linearly independent, (1) forces $[x_1, x_2, \dots, x_n] = [0, 0, \dots, 0]$, i.e. $X = 0$, which proves this direction.

Conversely, if the columns of $A$ are linearly dependent, then (1) has a nonzero solution $X$; left-multiplying $AX = 0$ by $A^T$ shows this nonzero $X$ also satisfies $A^TAX = 0$, so $A^TA$ is not invertible. Hence $A^TA$ is invertible if and only if the columns of $A$ are linearly independent.
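As a quick numerical sanity check of the theorem, we can compare ranks on two small example matrices (the matrices below are arbitrary illustrations, not part of the problem):

```python
import numpy as np

# Columns linearly independent -> A^T A has full rank (invertible).
A = np.array([[1., 0.],
              [1., 1.],
              [0., 2.]])
assert np.linalg.matrix_rank(A) == A.shape[1]        # independent columns
assert np.linalg.matrix_rank(A.T @ A) == A.shape[1]  # A^T A invertible

# Columns linearly dependent (second column = 2 * first) -> A^T A singular.
B = np.array([[1., 2.],
              [1., 2.],
              [0., 0.]])
assert np.linalg.matrix_rank(B) < B.shape[1]
assert abs(np.linalg.det(B.T @ B)) < 1e-10           # determinant ~ 0
```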
2. Newton's Method for Computing Least Squares
In this problem, we will prove that if we use Newton's method to solve the least squares optimization problem, then we only need one iteration to converge to the optimal parameter $\theta^*$.
(a) Find the Hessian of the cost function $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(\theta^Tx^{(i)} - y^{(i)})^2$.
(b) Show that the first iteration of Newton's method gives us $\theta^* = (X^TX)^{-1}X^T\vec{y}$, the solution to our least squares problem. ($\vec{y}$ denotes the vector of target values.)
2. (a) By the definition of the Hessian matrix:

$H_{i,j} = \frac{\partial^2 J(\theta)}{\partial\theta_i\,\partial\theta_j} = \frac{\partial}{\partial\theta_j}\left(\frac{\partial J(\theta)}{\partial\theta_i}\right) = \frac{\partial}{\partial\theta_j}\left(\sum_{k=1}^{m}(\theta^Tx^{(k)} - y^{(k)})x^{(k)}_i\right) = \sum_{k=1}^{m}x^{(k)}_ix^{(k)}_j$
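The closed form above can be cross-checked against a finite-difference estimate of the Hessian on random data (a sketch; the data shapes and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # 20 examples, 3 features
y = rng.normal(size=20)

def J(theta):
    # Least-squares cost J(theta) = 1/2 * sum_k (theta^T x^(k) - y^(k))^2
    r = X @ theta - y
    return 0.5 * r @ r

# Analytic Hessian from part (a): H_{i,j} = sum_k x_i^(k) x_j^(k), i.e. X^T X.
H_analytic = X.T @ X

# Central finite differences of J around an arbitrary point.
eps = 1e-4
theta0 = rng.normal(size=3)
H_numeric = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        e_i, e_j = np.eye(3)[i], np.eye(3)[j]
        H_numeric[i, j] = (J(theta0 + eps*e_i + eps*e_j)
                           - J(theta0 + eps*e_i - eps*e_j)
                           - J(theta0 - eps*e_i + eps*e_j)
                           + J(theta0 - eps*e_i - eps*e_j)) / (4 * eps**2)

# J is quadratic, so the two agree up to floating-point error.
assert np.allclose(H_analytic, H_numeric, atol=1e-3)
```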
(b) Let $X = [x^{(1)}\;x^{(2)}\;\dots\;x^{(m)}]^T$ and $\vec{y} = [y^{(1)}\;y^{(2)}\;\dots\;y^{(m)}]^T$. From part (a),

$H = X^TX.$

For the gradient $\nabla_\theta J(\theta)$, by definition

$\nabla_\theta J(\theta) = \left[\frac{\partial J(\theta)}{\partial\theta_1}\;\frac{\partial J(\theta)}{\partial\theta_2}\;\dots\;\frac{\partial J(\theta)}{\partial\theta_n}\right]^T,$

and componentwise

$\nabla_\theta J(\theta)_i = \sum_{k=1}^{m}(\theta^Tx^{(k)} - y^{(k)})x^{(k)}_i = \sum_{k=1}^{m}\theta^Tx^{(k)}x^{(k)}_i - \sum_{k=1}^{m}y^{(k)}x^{(k)}_i = (X^TX\theta - X^T\vec{y})_i,$

so

$\nabla_\theta J(\theta) = X^TX\theta - X^T\vec{y}.$

Newton's method iterates

$\theta := \theta - H^{-1}\nabla_\theta J(\theta),$

so after the first iteration

$\theta^* = \theta - (X^TX)^{-1}(X^TX\theta - X^T\vec{y}) = (X^TX)^{-1}X^T\vec{y},$

which is exactly the least squares solution, regardless of the starting point $\theta$.
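The one-step convergence can be verified on synthetic data (a sketch; the data and starting point are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)

theta = rng.normal(size=4)               # arbitrary starting point

# One Newton step: theta := theta - H^{-1} grad, with H = X^T X.
grad = X.T @ X @ theta - X.T @ y
theta = theta - np.linalg.solve(X.T @ X, grad)

# After a single iteration we land exactly on the least-squares solution.
theta_star = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(theta, theta_star)
```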
3. Prediction using Linear Regression
The sales of a company (in million dollars) for each year are shown in the table below.

x (year)  | 2005 | 2006 | 2007 | 2008 | 2009
y (sales) |  12  |  19  |  29  |  37  |  45

(a) Find the least squares regression line $y = ax + b$.
(b) Use the least squares regression line as a model to estimate the sales of the company in 2012.
3. (a) By the theorem in Problem 1, the least squares problem has the unique solution

$\theta = (X^TX)^{-1}X^TY.$

From the problem, set

$X = \begin{bmatrix} 1 & 2005 \\ 1 & 2006 \\ 1 & 2007 \\ 1 & 2008 \\ 1 & 2009 \end{bmatrix}, \quad Y = \begin{bmatrix} 12 \\ 19 \\ 29 \\ 37 \\ 45 \end{bmatrix}.$

Substituting gives $\theta = [-16830.4;\;8.40]$, so the least squares regression line is

$y = 8.40x - 16830.4.$

(b) The model estimates the company's sales in 2012 as $8.40 \times 2012 - 16830.4 = 70.4$ million dollars.
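The numbers in (a) and (b) can be reproduced with a short script (a sketch; `np.linalg.lstsq` is used here instead of forming $(X^TX)^{-1}$ explicitly, which is numerically safer):

```python
import numpy as np

years = np.array([2005, 2006, 2007, 2008, 2009], dtype=float)
sales = np.array([12, 19, 29, 37, 45], dtype=float)

# Design matrix with an intercept column: each row is [1, year].
X = np.column_stack([np.ones_like(years), years])
theta, *_ = np.linalg.lstsq(X, sales, rcond=None)
b, a = theta                                         # line y = a*x + b

print(f"slope a = {a:.2f}, intercept b = {b:.1f}")   # a = 8.40, b = -16830.4
print(f"2012 estimate: {a * 2012 + b:.1f} million")  # 70.4 million
```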
4. Logistic Regression
Consider the average empirical loss for logistic regression:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\log\left(1 + e^{-y^{(i)}\theta^Tx^{(i)}}\right) = -\frac{1}{m}\sum_{i=1}^{m}\log\left(h_\theta(y^{(i)}x^{(i)})\right),$

where $y^{(i)} \in \{-1, 1\}$, $h_\theta(x) = g(\theta^Tx)$ and $g(z) = 1/(1+e^{-z})$. Find the Hessian $H$ of this function, and show that for any vector $z$, it holds true that $z^THz \geq 0$.
4. By the definition of the Hessian matrix, first compute the gradient:

$\frac{\partial J(\theta)}{\partial\theta_i} = -\frac{1}{m}\sum_{k=1}^{m}\left(1 - g(\theta^Ty^{(k)}x^{(k)})\right)y^{(k)}x^{(k)}_i,$

then differentiate again, using $g'(z) = g(z)(1 - g(z))$:

$H_{i,j} = \frac{\partial^2 J(\theta)}{\partial\theta_i\,\partial\theta_j} = \frac{1}{m}\sum_{k=1}^{m}g(\theta^Ty^{(k)}x^{(k)})\left(1 - g(\theta^Ty^{(k)}x^{(k)})\right)(y^{(k)})^2x^{(k)}_ix^{(k)}_j.$

Note that $y^{(k)} \in \{-1, 1\}$ gives $(y^{(k)})^2 = 1$, and since $g(-z) = 1 - g(z)$ we have $g(-z)(1 - g(-z)) = g(z)(1 - g(z))$. Hence, with $h_\theta(x) = g(\theta^Tx)$,

$g(\theta^Ty^{(k)}x^{(k)})\left(1 - g(\theta^Ty^{(k)}x^{(k)})\right) = h_\theta(x^{(k)})\left(1 - h_\theta(x^{(k)})\right)$

regardless of the sign of $y^{(k)}$, so

$H_{i,j} = \frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1 - h_\theta(x^{(k)})\right)x^{(k)}_ix^{(k)}_j,$

or in matrix form

$H = \frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1 - h_\theta(x^{(k)})\right)x^{(k)}x^{(k)T}.$

For any vector $z$,

$z^THz = \frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1 - h_\theta(x^{(k)})\right)z^Tx^{(k)}x^{(k)T}z = \frac{1}{m}\sum_{k=1}^{m}h_\theta(x^{(k)})\left(1 - h_\theta(x^{(k)})\right)(z^Tx^{(k)})^2 \geq 0,$

since $h_\theta \in (0, 1)$ makes every weight $h_\theta(1 - h_\theta)$ positive. This completes the proof.
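A numerical sketch of the positive-semidefiniteness conclusion, on random synthetic data with labels in $\{-1, 1\}$ (sizes and seed are arbitrary assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
m, n = 100, 5
X = rng.normal(size=(m, n))
y = rng.choice([-1.0, 1.0], size=m)
theta = rng.normal(size=n)

# H = (1/m) * sum_k h(x^(k)) (1 - h(x^(k))) x^(k) x^(k)^T
h = sigmoid(X @ theta)
w = h * (1 - h)             # per-example weights, each in (0, 1/4]
H = (X.T * w) @ X / m       # weighted Gram matrix

# Every quadratic form z^T H z is non-negative: H is PSD.
for _ in range(10):
    z = rng.normal(size=n)
    assert z @ H @ z >= 0
assert np.linalg.eigvalsh(H).min() >= -1e-12
```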