The Normal Equation Method
Preliminaries
In what follows, uppercase letters denote matrices, lowercase letters with subscripts denote individual variables, and lowercase letters without subscripts denote vectors.
The derivative of a scalar-valued multivariate function with respect to a vector is defined as follows. Suppose the function is
f(x) = w_0x_0 + w_1x_1 + \dots + w_nx_n
Then
\frac{\partial f(x)}{\partial x} = \left[ \frac{\partial f(x)}{\partial x_0}, \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \dots, \frac{\partial f(x)}{\partial x_n} \right]^T
Conclusion 1:
\frac{\partial \beta^Tx}{\partial x} = \beta
Proof: this one is simple; writing out f(x) makes it obvious. With
f(x) = \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \dots + \beta_nx_n
the partial derivative with respect to x_i is just \beta_i.
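As a quick sanity check (an illustrative addition, not part of the original derivation), the following minimal NumPy sketch compares the analytic gradient \beta against a central finite-difference approximation of f(x) = \beta^Tx; the random seed, dimension and step size are arbitrary choices of mine.

import numpy as np

# Sanity check for Conclusion 1: d(beta^T x)/dx = beta
rng = np.random.default_rng(0)
n = 5
beta = rng.normal(size=n)
x = rng.normal(size=n)

f = lambda v: beta @ v          # f(x) = beta^T x

eps = 1e-6
numeric_grad = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(numeric_grad, beta))   # expected: True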
Conclusion 2:
\frac{\partial x^TAx}{\partial x} = (A + A^T)x
Proof: this one takes slightly more work, but it is still easy; again, just expand f(x):
f(x) = [x_1, x_2, \dots, x_n] \left[ \begin{matrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right] = \sum_{i=1}^{n} a_{ii}x_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} (a_{ij} + a_{ji})x_ix_j
For example, differentiating f(x) with respect to x_1 gives 2a_{11}x_1 + (a_{21} + a_{12})x_2 + \dots + (a_{n1} + a_{1n})x_n, which in matrix form is
\left[ \begin{matrix} 2a_{11} & (a_{12} + a_{21}) & \dots & (a_{1n} + a_{n1}) \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right]
Differentiating f(x) with respect to the whole vector x can therefore be written as
\left[ \begin{matrix} 2a_{11} & (a_{12} + a_{21}) & \dots & (a_{1n} + a_{n1}) \\ (a_{21} + a_{12}) & 2a_{22} & \dots & (a_{2n} + a_{n2}) \\ \vdots & \vdots & \ddots & \vdots \\ (a_{n1} + a_{1n}) & (a_{n2} + a_{2n}) & \dots & 2a_{nn} \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right]
Splitting the large matrix on the left into two parts gives exactly A + A^T, hence \frac{\partial x^TAx}{\partial x} = (A + A^T)x.
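Similarly, here is a minimal sketch (again my own addition, with a deliberately non-symmetric random A) that checks \frac{\partial x^TAx}{\partial x} = (A + A^T)x against finite differences.

import numpy as np

# Sanity check for Conclusion 2: d(x^T A x)/dx = (A + A^T) x
rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))      # deliberately non-symmetric
x = rng.normal(size=n)

f = lambda v: v @ A @ v          # f(x) = x^T A x

eps = 1e-6
numeric_grad = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(numeric_grad, (A + A.T) @ x))   # expected: True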
Principle
Gradient descent reaches the minimum through step-by-step iteration, whereas the normal equation method computes all the regression parameters directly in closed form.
Start from the cost function:
J(w_0, w_1, \dots, w_n) = \frac{1}{2m}\sum_{i=1}^{m}{(y^i - h_w(x^i))^2}
Written in matrix form, this is
J(w) = \frac{1}{2m}(y - Xw)^T(y - Xw)
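To make the two forms of the cost concrete, here is a small sketch (with made-up random data; the variable names are mine, not from the original post) showing that the element-wise sum and the matrix form produce the same value of J(w).

import numpy as np

# Compare the summation form and the matrix form of the cost J(w)
rng = np.random.default_rng(2)
m, n = 100, 2
X = np.column_stack([np.ones(m), rng.normal(size=(m, n - 1))])  # bias column + one feature
y = rng.normal(size=m)
w = rng.normal(size=n)

# Summation form: (1/2m) * sum_i (y_i - h_w(x_i))^2
J_sum = sum((y[i] - X[i] @ w) ** 2 for i in range(m)) / (2 * m)

# Matrix form: (1/2m) * (y - Xw)^T (y - Xw)
r = y - X @ w
J_mat = (r @ r) / (2 * m)

print(np.isclose(J_sum, J_mat))   # expected: True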
Since the partial derivative must be zero at the minimum,
\frac{\partial (y - Xw)^T(y - Xw)}{\partial w} = 0
Expanding the product:
\frac{\partial (y - Xw)^T(y - Xw)}{\partial w} = \frac{\partial y^Ty}{\partial w} - \frac{\partial y^TXw}{\partial w} - \frac{\partial w^TX^Ty}{\partial w} + \frac{\partial w^TX^TXw}{\partial w}
Clearly
\frac{\partial y^Ty}{\partial w} = 0
since y^Ty does not depend on w.
By Conclusion 1,
\frac{\partial y^TXw}{\partial w} = \frac{\partial (X^Ty)^Tw}{\partial w} = X^Ty
Because y^TXw is a scalar, y^TXw = (y^TXw)^T = w^TX^Ty, and therefore
\frac{\partial w^TX^Ty}{\partial w} = X^Ty
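A one-line numeric confirmation (just an illustrative sketch with random data of my own) that y^TXw and w^TX^Ty are indeed the same scalar:

import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
w = rng.normal(size=n)

# y^T X w is a scalar, so it equals its own transpose w^T X^T y
print(np.isclose(y @ X @ w, w @ X.T @ y))   # expected: True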
By Conclusion 2, with A = X^TX, which is symmetric so that A + A^T = 2A,
\frac{\partial w^TX^TXw}{\partial w} = 2X^TXw
Substituting these four results back in gives
0 - X^Ty - X^Ty + 2X^TXw = 0
X^TXw = X^Ty
w = (X^TX)^{-1}X^Ty
This solves for the weight vector w directly.
Note that X^TX must be invertible; if it is singular, the normal equation in this form cannot be applied.
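Before turning to the full example, here is a small sketch (synthetic data and variable names of my own) that computes w = (X^TX)^{-1}X^Ty and checks it against np.linalg.lstsq, which solves the same least-squares problem:

import numpy as np

rng = np.random.default_rng(4)
m = 50
x = rng.uniform(0, 10, size=m)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=m)   # noisy line

X = np.column_stack([np.ones(m), x])                # bias column + feature

# Normal equation: w = (X^T X)^{-1} X^T y
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference solution from NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))   # expected: True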
Implementing simple linear regression with the normal equation
# encoding: utf-8
import numpy as np
import matplotlib.pyplot as plt

# Load the data
data = np.genfromtxt("../data/data.csv", delimiter=',')
x_data = data[:, 0, np.newaxis]
y_data = data[:, 1, np.newaxis]

# Add a bias (intercept) column of ones to the samples
X_data = np.concatenate((np.ones((len(x_data), 1)), x_data), axis=1)

# Compute the regression parameters with the normal equation
def weights(xArray, yArray):
    xMat = np.mat(xArray)
    yMat = np.mat(yArray)
    xTx = xMat.T * xMat
    # If X^T X is singular, the normal equation cannot be applied
    if np.linalg.det(xTx) == 0.0:
        print("X^T X is singular; the normal equation cannot be used")
        return
    return xTx.I * xMat.T * yMat  # w = (X^T X)^{-1} X^T y

ws = weights(X_data, y_data)

# Plot the data points and the fitted line through x = 20 and x = 80
x = np.array([[20], [80]])
plt.plot(x_data, y_data, 'b.')
y = ws[0, 0] + x * ws[1, 0]
plt.plot(x, y, 'r')
plt.show()
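If X^TX is singular (for example, when features are linearly dependent), the determinant check above simply gives up. One common workaround, shown in the following sketch (my own addition, not from the original post), is to use the Moore-Penrose pseudo-inverse instead:

import numpy as np

def weights_pinv(xArray, yArray):
    # Pseudo-inverse version of the normal equation:
    # usable even when X^T X is singular or ill-conditioned
    X = np.asarray(xArray)
    y = np.asarray(yArray)
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# ws_pinv = weights_pinv(X_data, y_data)   # same usage as weights() above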