Formula 1:
$$
w_1^*=\frac{\sum_i y_i\left(x_i-\frac{1}{n}\sum_i x_i\right)}{\sum_i x_i^2-\frac{1}{n}\left(\sum_i x_i\right)^2}
$$
To derive the formula above, start from the following two conditions:
$$
\begin{cases}
w_0^*=\left(\frac{1}{n}\sum_i y_i\right)-w_1^*\left(\frac{1}{n}\sum_i x_i\right)\\
w_1^*=-\sum_i x_i\left(w_0^*-y_i\right)\Big/\sum_i x_i^2
\end{cases}
$$
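For context, the two conditions can be obtained as stationarity conditions of a squared-error loss. The loss below is an assumption (the usual least-squares objective for a line $y \approx w_1 x + w_0$); the text states only the resulting conditions.

$$
\begin{aligned}
L(w_0,w_1) &= \sum_i \left(y_i-w_1x_i-w_0\right)^2\\
\frac{\partial L}{\partial w_0} &= -2\sum_i\left(y_i-w_1x_i-w_0\right)=0
\;\Longrightarrow\; w_0^*=\frac{1}{n}\sum_i y_i-w_1^*\,\frac{1}{n}\sum_i x_i\\
\frac{\partial L}{\partial w_1} &= -2\sum_i x_i\left(y_i-w_1x_i-w_0\right)=0
\;\Longrightarrow\; w_1^*=-\frac{\sum_i x_i\left(w_0^*-y_i\right)}{\sum_i x_i^2}
\end{aligned}
$$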
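This amounts to solving a system of two equations: it suffices to eliminate $w_0^*$. First, write the second condition as a single fraction: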
$$
w_1^*=\frac{-w_0^*\sum_i x_i+\sum_i x_i y_i}{\sum_i x_i^2} \tag{1}
$$
Next, multiply the first condition by $\frac{\sum_i x_i}{\sum_i x_i^2}$:

$$
\begin{aligned}
\left(\frac{\sum_i x_i}{\sum_i x_i^2}\right)w_0^*
&=\left(\frac{\sum_i x_i}{\sum_i x_i^2}\right)\left(\frac{1}{n}\sum_i y_i-w_1^*\,\frac{1}{n}\sum_i x_i\right)\\
&=\frac{\frac{1}{n}\sum_i x_i\sum_i y_i}{\sum_i x_i^2}-\frac{w_1^*\,\frac{1}{n}\left(\sum_i x_i\right)^2}{\sum_i x_i^2}
\end{aligned} \tag{2}
$$
Combining (1) and (2), that is, substituting (2) for the $w_0^*$ term in (1), eliminates $w_0^*$:

$$
\begin{aligned}
& w_1^*-\frac{w_1^*\,\frac{1}{n}\left(\sum_i x_i\right)^2}{\sum_i x_i^2}+\frac{\frac{1}{n}\sum_i x_i\sum_i y_i}{\sum_i x_i^2}=\frac{\sum_i x_i y_i}{\sum_i x_i^2}\\
\Longrightarrow\;& w_1^*\left(\frac{\sum_i x_i^2-\frac{1}{n}\left(\sum_i x_i\right)^2}{\sum_i x_i^2}\right)=\frac{\sum_i x_i y_i-\frac{1}{n}\sum_i x_i\sum_i y_i}{\sum_i x_i^2}\\
\Longrightarrow\;& w_1^*=\frac{\sum_i y_i\left(x_i-\frac{1}{n}\sum_i x_i\right)}{\sum_i x_i^2-\frac{1}{n}\left(\sum_i x_i\right)^2}=\frac{\sum_i x_i\left(y_i-\frac{1}{n}\sum_i y_i\right)}{\sum_i x_i^2-\frac{1}{n}\left(\sum_i x_i\right)^2}
\end{aligned}
$$
This completes the derivation.
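As a quick numerical sanity check of Formula 1, the sketch below evaluates both equivalent numerators from the last line of the derivation and compares the result with `np.polyfit`. The helper name `slope_intercept`, the fixed seed, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def slope_intercept(x, y):
    """Closed-form simple linear regression (Formula 1); hypothetical helper."""
    n = x.shape[0]
    denom = np.sum(x ** 2) - np.sum(x) ** 2 / n
    # Two equivalent numerators from the last line of the derivation.
    num_a = np.sum(y * (x - np.sum(x) / n))
    num_b = np.sum(x * (y - np.sum(y) / n))
    assert np.isclose(num_a, num_b)
    w1 = num_a / denom
    # Intercept from the first condition: mean(y) - w1 * mean(x).
    w0 = np.mean(y) - w1 * np.mean(x)
    return w1, w0


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.uniform(0, 20, size=100)
y = 0.5 * x + rng.normal(size=100)

w1, w0 = slope_intercept(x, y)
ref_w1, ref_w0 = np.polyfit(x, y, deg=1)  # highest-degree coefficient first
print(np.allclose([w1, w0], [ref_w1, ref_w0]))
```

If the two results agree, the closed form and the library fit describe the same least-squares line.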
Formula 2:
$$
\mathrm{\widehat{w}}=(\mathrm{X}^{\mathrm{T}}\mathrm{X})^{-1}\mathrm{X}^{\mathrm{T}}\mathrm{Y}
$$
To derive the formula above, start from the following minimization problem:
$$
\operatorname*{arg\,min}_{w_0,w_1} L(\mathrm{\widehat{w}}),\qquad L(\mathrm{\widehat{w}})=\|\mathrm{Y}-\mathrm{X\widehat{w}}\|^2
$$
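The text does not spell out the shapes of $\mathrm{X}$, $\mathrm{\widehat{w}}$ and $\mathrm{Y}$. One layout consistent with the Python code later in this post (a column of ones appended after the inputs) is sketched below; treat it as an assumption rather than the author's stated definition.

$$
\mathrm{X}=\begin{pmatrix} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_n & 1\end{pmatrix},\quad
\mathrm{\widehat{w}}=\begin{pmatrix} w_1\\ w_0\end{pmatrix},\quad
\mathrm{Y}=\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix},\quad
\|\mathrm{Y}-\mathrm{X\widehat{w}}\|^2=\sum_i\left(y_i-w_1x_i-w_0\right)^2
$$

With this layout, the matrix objective is the same sum of squared residuals that Formula 1 minimizes.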
First, expand the squared norm:
$$
\begin{aligned}
\|\mathrm{X\widehat{w}}-\mathrm{Y}\|^2
&=(\mathrm{X\widehat{w}}-\mathrm{Y})^{\mathrm{T}}(\mathrm{X\widehat{w}}-\mathrm{Y})\\
&=(\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}-\mathrm{Y}^{\mathrm{T}})(\mathrm{X\widehat{w}}-\mathrm{Y})\\
&=\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}-\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}\mathrm{Y}-\mathrm{Y}^{\mathrm{T}}\mathrm{X\widehat{w}}+\mathrm{Y}^{\mathrm{T}}\mathrm{Y}
\end{aligned}
$$
Differentiate with respect to $\mathrm{\widehat{w}}$ and set the result to zero:
$$
\frac{\partial\left(\|\mathrm{X\widehat{w}}-\mathrm{Y}\|^2\right)}{\partial\mathrm{\widehat{w}}}
=\frac{\partial\left(\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}-\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}\mathrm{Y}-\mathrm{Y}^{\mathrm{T}}\mathrm{X\widehat{w}}+\mathrm{Y}^{\mathrm{T}}\mathrm{Y}\right)}{\partial\mathrm{\widehat{w}}}=0
$$
The following matrix-derivative identities are used:
$$
\frac{\partial\left(\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}\right)}{\partial\mathrm{\widehat{w}}}=2\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}
$$

$$
\frac{\partial\left(\mathrm{\widehat{w}}^{\mathrm{T}}\mathrm{X}^{\mathrm{T}}\mathrm{Y}\right)}{\partial\mathrm{\widehat{w}}}=\mathrm{X}^{\mathrm{T}}\mathrm{Y}
$$

$$
\frac{\partial\left(\mathrm{Y}^{\mathrm{T}}\mathrm{X}\mathrm{\widehat{w}}\right)}{\partial\mathrm{\widehat{w}}}=\mathrm{X}^{\mathrm{T}}\mathrm{Y}
$$

$$
\frac{\partial\left(\mathrm{Y}^{\mathrm{T}}\mathrm{Y}\right)}{\partial\mathrm{\widehat{w}}}=0
$$
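These identities combine into the gradient $2\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}-2\mathrm{X}^{\mathrm{T}}\mathrm{Y}$. A minimal way to convince yourself of this is to compare it with central finite differences on random data. The function names `loss`, `analytic_grad` and `numeric_grad`, the seed, and the matrix sizes below are all illustrative assumptions.

```python
import numpy as np


def loss(w, X, Y):
    """Squared-error loss ||Xw - Y||^2."""
    r = X @ w - Y
    return float(r @ r)


def analytic_grad(w, X, Y):
    """Gradient assembled from the identities: 2 X^T X w - 2 X^T Y."""
    return 2 * X.T @ X @ w - 2 * X.T @ Y


def numeric_grad(w, X, Y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for k in range(w.size):
        step = np.zeros_like(w)
        step[k] = eps
        g[k] = (loss(w + step, X, Y) - loss(w - step, X, Y)) / (2 * eps)
    return g


rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
X = rng.normal(size=(50, 2))
Y = rng.normal(size=50)
w = rng.normal(size=2)

print(np.allclose(analytic_grad(w, X, Y), numeric_grad(w, X, Y),
                  rtol=1e-4, atol=1e-4))
```

Because the loss is quadratic in `w`, central differences agree with the analytic gradient up to floating-point rounding, so a loose tolerance is sufficient.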
Derivation:
$$
\begin{aligned}
& 2\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}-\mathrm{X}^{\mathrm{T}}\mathrm{Y}-\mathrm{X}^{\mathrm{T}}\mathrm{Y}=0\\
\Longrightarrow\;& 2(\mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}-\mathrm{X}^{\mathrm{T}}\mathrm{Y})=0\\
\Longrightarrow\;& \mathrm{X}^{\mathrm{T}}\mathrm{X\widehat{w}}-\mathrm{X}^{\mathrm{T}}\mathrm{Y}=0\\
\Longrightarrow\;& (\mathrm{X}^{\mathrm{T}}\mathrm{X})\mathrm{\widehat{w}}=\mathrm{X}^{\mathrm{T}}\mathrm{Y}\\
\Longrightarrow\;& \mathrm{\widehat{w}}=(\mathrm{X}^{\mathrm{T}}\mathrm{X})^{-1}\mathrm{X}^{\mathrm{T}}\mathrm{Y}
\end{aligned}
$$
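The last step requires $\mathrm{X}^{\mathrm{T}}\mathrm{X}$ to be invertible. In code, the same normal equations can also be solved without forming the inverse explicitly. The sketch below, on synthetic data with an arbitrary seed (both assumptions for illustration), compares the literal formula with `np.linalg.solve` and `np.linalg.lstsq`. It is one possible way to evaluate Formula 2, not a replacement for the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.uniform(0, 20, size=100)
Y = 0.5 * x + rng.normal(size=100)

# Design matrix with a column of ones, so w_hat = (w1, w0).
X = np.column_stack((x, np.ones_like(x)))

# Formula 2 exactly as derived: explicit inverse of X^T X.
w_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# Same normal equations (X^T X) w = X^T Y, solved without forming the inverse.
w_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Least-squares solver applied directly to the system X w = Y.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(w_inv, w_solve), np.allclose(w_inv, w_lstsq))
```

On a well-conditioned problem like this one, all three give the same weights. The solver-based variants are generally preferred numerically when $\mathrm{X}^{\mathrm{T}}\mathrm{X}$ is close to singular.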
Code implementation (Python)
import time

import matplotlib.pyplot as plt
import numpy as np


def get_fake_data(n_samples):
    """Generate noisy samples of y = 0.5 * x and draw them as a scatter plot."""
    X = np.random.rand(n_samples) * 20
    noise = np.random.randn(n_samples)
    y = 0.5 * X + noise
    plt.scatter(X, y)
    return X, y


def equation1(X_train, y_train):
    """Formula 1: closed-form slope w1 and intercept w0 computed from sums."""
    start_time = time.time()
    num_instances = X_train.shape[0]
    w1 = np.sum(y_train * (X_train - np.sum(X_train) / num_instances)) / \
        (np.sum(X_train ** 2) - (np.sum(X_train) ** 2 / num_instances))
    w0 = np.sum(y_train) / num_instances - w1 * (np.sum(X_train) / num_instances)
    end_time = time.time()
    W = np.array((w1, w0))
    return W, (end_time - start_time)


def equation2(X_train, y_train):
    """Formula 2: normal equation w = (X^T X)^{-1} X^T y."""
    start_time = time.time()
    ones = np.ones(X_train.shape[0])
    # Append a column of ones so that W = (w1, w0) includes the intercept.
    X = np.column_stack((X_train.reshape(X_train.shape[0], 1), ones))
    W = (np.linalg.inv((X.T).dot(X)).dot(X.T)).dot(y_train)
    end_time = time.time()
    return W, (end_time - start_time)


def equation1_test(X, y):
    print("Formula 1 : ")
    num_instances = X.shape[0]
    train_test_split = int(num_instances * 0.7)
    X_train, y_train = X[:train_test_split], y[:train_test_split]
    X_test, y_test = X[train_test_split:], y[train_test_split:]
    W, spend_time = equation1(X_train, y_train)
    # Plot the fitted line over the test inputs.
    ones = np.ones(X_test.shape[0])
    X_test_aug = np.column_stack((X_test.reshape(X_test.shape[0], 1), ones))
    y_predict = X_test_aug.dot(W)  # matrix product, not element-wise `*`
    plt.plot(X_test, y_predict, color='#3479f7')
    # Report the results.
    print("Weights : ", W)
    print("Elapsed time : ", spend_time)


def equation2_test(X, y):
    print("Formula 2 : ")
    num_instances = X.shape[0]
    train_test_split = int(num_instances * 0.7)
    X_train, y_train = X[:train_test_split], y[:train_test_split]
    X_test, y_test = X[train_test_split:], y[train_test_split:]
    W, spend_time = equation2(X_train, y_train)
    # Plot the fitted line over the test inputs.
    ones = np.ones(X_test.shape[0])
    X_test_aug = np.column_stack((X_test.reshape(X_test.shape[0], 1), ones))
    y_predict = X_test_aug.dot(W)  # matrix product, not element-wise `*`
    plt.plot(X_test, y_predict, color='#9b59b6')
    # Report the results.
    print("Weights : ", W)
    print("Elapsed time : ", spend_time)


if __name__ == '__main__':
    X, y = get_fake_data(100)
    equation1_test(X, y)
    equation2_test(X, y)
    plt.show()
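Output (one run; the data is random, so the values vary between runs):
Formula 1 :
Weights : [0.50045109 0.06868346]
Elapsed time : 0.0
Formula 2 :
Weights : [0.50045109 0.06868346]
Elapsed time : 0.37624287605285645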