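Earlier posts covered univariate linear regression with batch size 1 and batch size N. This post looks at multivariate linear regression. The label $y$ is a scalar, and each sample has $M$ attributes, denoted $\{x_1,\cdots,x_i\}$, where the last index $i$ equals $M$. There are therefore also $M$ weights, denoted $w_1,\cdots,w_i$. The model has the form: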
$$y=b+w_1x_1+\cdots+w_ix_i$$
To write this more compactly in vector form, treat the bias $b$ as $w_0$. The expression becomes:
$$y=w_0+w_1x_1+\cdots+w_ix_i$$
In vector form, let $\boldsymbol{w}=\{w_0,\cdots,w_i\}^T$ and $\boldsymbol{x}=\{1,x_1,\cdots,x_i\}^T$. Then $y=\boldsymbol{w}^T\boldsymbol{x}$.
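As a quick numerical check of this trick, the sketch below uses made-up values for $b$, $w_1..w_3$ and $x_1..x_3$. It evaluates the model once in scalar form. It then evaluates it as $\boldsymbol{w}^T\boldsymbol{x}$, with $b$ folded into $w_0$ and a constant 1 prepended to $\boldsymbol{x}$. Both forms should give the same value.

```python
import numpy as np

# Hypothetical example values: bias b, weights w_1..w_3 and one sample x_1..x_3
b = 0.5
w_rest = np.array([2.0, -1.0, 3.0])
x_rest = np.array([1.0, 4.0, 2.0])

# Scalar form: y = b + w_1 x_1 + ... + w_i x_i
y_scalar = b + np.sum(w_rest * x_rest)

# Vector form: fold b into w_0 and prepend a constant 1 to x
w = np.concatenate(([b], w_rest))    # w = {w_0, w_1, ..., w_i}^T
x = np.concatenate(([1.0], x_rest))  # x = {1, x_1, ..., x_i}^T
y_vector = w @ x                     # y = w^T x

print(y_scalar, y_vector)
```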
For simplicity, start with batch size 1. With the least-squares loss, the loss $L$ for one training sample $(\boldsymbol{x}^*, y^*)$ is:
$$\begin{aligned} L&=\frac{1}{2}(y-y^*)^2\\ &=\frac{1}{2}(w_0+w_1x_1^*+\cdots+w_ix_i^*-y^*)^2 \end{aligned}$$
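Below is a minimal numpy sketch of this single-sample loss. It assumes `x_star` already contains the leading constant 1. The function name `squared_loss` and the sample values are illustrative only.

```python
import numpy as np

def squared_loss(w, x_star, y_star):
    """Least-squares loss for one sample: L = 1/2 (w^T x* - y*)^2.
    x_star must already include the leading constant 1."""
    residual = w @ x_star - y_star
    return 0.5 * residual ** 2

w = np.array([1.0, 2.0, -1.0])      # hypothetical w_0, w_1, w_2
x_star = np.array([1.0, 3.0, 2.0])  # {1, x_1*, x_2*}
y_star = 4.0
print(squared_loss(w, x_star, y_star))
```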
Take the partial derivative of $L$ with respect to each component of $\boldsymbol{w}$:
$$\frac{\partial L}{\partial w_0}=(w_0+w_1x_1^*+\cdots+w_ix_i^*-y^*)\cdot 1$$
$$\frac{\partial L}{\partial w_1}=(w_0+w_1x_1^*+\cdots+w_ix_i^*-y^*)x_1^*$$
$$\vdots$$
$$\frac{\partial L}{\partial w_i}=(w_0+w_1x_1^*+\cdots+w_ix_i^*-y^*)x_i^*$$
The gradient of the loss with respect to $\boldsymbol{w}$ is therefore:
$$\begin{aligned} \nabla L&=\left\{\frac{\partial L}{\partial w_0},\cdots,\frac{\partial L}{\partial w_i}\right\}^T\\ &=(\boldsymbol{w}^T\boldsymbol{x}^*-y^*)\boldsymbol{x}^* \end{aligned}$$
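One way to sanity-check the gradient formula is to compare it with central finite differences. The sketch below does this for random values. `numeric_grad` and the difference step `h` exist only for this check and are not part of the derivation.

```python
import numpy as np

def loss(w, x_star, y_star):
    return 0.5 * (w @ x_star - y_star) ** 2

def grad(w, x_star, y_star):
    # Analytic gradient derived above: (w^T x* - y*) x*
    return (w @ x_star - y_star) * x_star

def numeric_grad(f, w, h=1e-6):
    # Central differences, one component of w at a time
    g = np.zeros_like(w)
    for k in range(len(w)):
        d = np.zeros_like(w)
        d[k] = h
        g[k] = (f(w + d) - f(w - d)) / (2 * h)
    return g

rng = np.random.default_rng(0)
w = rng.normal(size=4)
x_star = np.concatenate(([1.0], rng.normal(size=3)))  # leading 1 for w_0
y_star = 0.7

analytic = grad(w, x_star, y_star)
numeric = numeric_grad(lambda v: loss(v, x_star, y_star), w)
print(np.allclose(analytic, numeric, atol=1e-6))
```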
Choose a step size $step$. The parameters are then updated as follows:
$$\boldsymbol{w}_{new}=\boldsymbol{w}-step\cdot\nabla L$$
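With batch size 1, the update rule is applied once per sample. The sketch below assumes hypothetical noise-free data generated from $y=1+2x_1-3x_2$ and a step of 0.1. Under these assumptions the weights should approach $\{1,2,-3\}^T$.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2 x_1 - 3 x_2 (no noise)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
X_aug = np.hstack([np.ones((len(X), 1)), X])  # prepend the constant 1 to each sample

w = np.zeros(3)
step = 0.1
for epoch in range(200):
    for x_star, y_star in zip(X_aug, y):
        grad = (w @ x_star - y_star) * x_star  # gradient for one sample
        w = w - step * grad                     # w_new = w - step * grad L
print(np.round(w, 3))
```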
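Now consider batch size $N$. The loss $L$ is summed over the $N$ samples of the batch. In the equations below, the superscript $j$ marks sample $j$, $\boldsymbol{y^*}$ is the column vector of the $N$ labels, and $A$ is the batch input matrix defined below: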
$$\begin{aligned} L&=\sum_{j=1}^{N}\frac{1}{2}(y^j-y^{j*})^2\\ &=\sum_{j=1}^{N}\frac{1}{2}(w_0+w_1x_1^{j*}+\cdots+w_ix_i^{j*}-y^{j*})^2\\ &=\sum_{j=1}^{N}\frac{1}{2}(\boldsymbol{x}^{j*T}\boldsymbol{w}-y^{j*})^2\\ &=\frac{1}{2}(A\boldsymbol{w}-\boldsymbol{y^*})^T(A\boldsymbol{w}-\boldsymbol{y^*}) \end{aligned}$$
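The last two lines of the derivation can be checked numerically. The sketch below uses a hypothetical random batch with $N=8$ and three attributes. It computes the loss once as an explicit sum over samples and once in the matrix form $\frac{1}{2}(A\boldsymbol{w}-\boldsymbol{y^*})^T(A\boldsymbol{w}-\boldsymbol{y^*})$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
X = rng.normal(size=(N, 3))          # N samples, i = 3 attributes (hypothetical)
A = np.hstack([np.ones((N, 1)), X])  # prepend the all-ones column e
y_star = rng.normal(size=(N, 1))     # labels as an N x 1 column vector
w = rng.normal(size=(4, 1))

# Sum form: sum_j 1/2 (x^{j*T} w - y^{j*})^2
loss_sum = sum(0.5 * (A[j] @ w - y_star[j]).item() ** 2 for j in range(N))

# Matrix form: 1/2 (A w - y*)^T (A w - y*)
r = A @ w - y_star
loss_matrix = (0.5 * r.T @ r).item()

print(loss_sum, loss_matrix)
```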
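Take the partial derivative of $L$ with respect to each component of $\boldsymbol{w}$. Here $\boldsymbol{e}$ is the all-ones column vector of length $N$. The vector $\boldsymbol{x_k^*}=\{x_k^{1*},\cdots,x_k^{N*}\}^T$ collects attribute $k$ of every sample in the batch: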
$$\begin{aligned} \frac{\partial L}{\partial w_0}&=\sum_{j=1}^{N}(w_0+w_1x_1^{j*}+\cdots+w_ix_i^{j*}-y^{j*})\cdot 1\\ &=\sum_{j=1}^{N}w_0+\sum_{j=1}^{N}w_1x_1^{j*}+\cdots+\sum_{j=1}^{N}w_ix_i^{j*}-\sum_{j=1}^{N}y^{j*}\\ &=w_0\boldsymbol{e}^T\boldsymbol{e}+w_1\boldsymbol{e}^T\boldsymbol{x_1^*}+\cdots+w_i\boldsymbol{e}^T\boldsymbol{x_i^*}-\boldsymbol{e}^T\boldsymbol{y^*}\\ &=\boldsymbol{e}^T(w_0\boldsymbol{e}+w_1\boldsymbol{x_1^*}+\cdots+w_i\boldsymbol{x_i^*}-\boldsymbol{y^*})\\ &=\boldsymbol{e}^T([\boldsymbol{e},\boldsymbol{x_1^*},\cdots,\boldsymbol{x_i^*}]\boldsymbol{w}-\boldsymbol{y^*})\\ &=\boldsymbol{e}^T(A\boldsymbol{w}-\boldsymbol{y^*}) \end{aligned}$$
$$\begin{aligned} \frac{\partial L}{\partial w_1}&=\sum_{j=1}^{N}(w_0+w_1x_1^{j*}+\cdots+w_ix_i^{j*}-y^{j*})x_1^{j*}\\ &=\sum_{j=1}^{N}w_0x_1^{j*}+\sum_{j=1}^{N}w_1x_1^{j*}x_1^{j*}+\cdots+\sum_{j=1}^{N}w_ix_i^{j*}x_1^{j*}-\sum_{j=1}^{N}y^{j*}x_1^{j*}\\ &=w_0\boldsymbol{x_1^*}^T\boldsymbol{e}+w_1\boldsymbol{x_1^*}^T\boldsymbol{x_1^*}+\cdots+w_i\boldsymbol{x_1^*}^T\boldsymbol{x_i^*}-\boldsymbol{x_1^*}^T\boldsymbol{y^*}\\ &=\boldsymbol{x_1^*}^T(w_0\boldsymbol{e}+w_1\boldsymbol{x_1^*}+\cdots+w_i\boldsymbol{x_i^*}-\boldsymbol{y^*})\\ &=\boldsymbol{x_1^*}^T([\boldsymbol{e},\boldsymbol{x_1^*},\cdots,\boldsymbol{x_i^*}]\boldsymbol{w}-\boldsymbol{y^*})\\ &=\boldsymbol{x_1^*}^T(A\boldsymbol{w}-\boldsymbol{y^*}) \end{aligned}$$
The partial derivatives for the remaining components follow in the same way, up to the last one:
$$\frac{\partial L}{\partial w_i}=\boldsymbol{x_i^*}^T(A\boldsymbol{w}-\boldsymbol{y^*})$$
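The componentwise result can also be checked in code. The sketch below uses a hypothetical random batch with two attributes. It computes each partial derivative both as the scalar sum over samples and as the column product $\boldsymbol{x_k^*}^T(A\boldsymbol{w}-\boldsymbol{y^*})$. Column 0 of $A$ plays the role of $\boldsymbol{e}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
# Columns of A: e, x_1*, x_2* (hypothetical random batch with i = 2 attributes)
A = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 2))])
y_star = rng.normal(size=N)
w = rng.normal(size=3)

r = A @ w - y_star  # residual vector A w - y*

checks = []
for k in range(A.shape[1]):
    # Scalar sum form: sum_j (w_0 + w_1 x_1^{j*} + ... - y^{j*}) times entry k of sample j
    scalar_sum = sum((w @ A[j] - y_star[j]) * A[j, k] for j in range(N))
    # Column form: x_k^{*T} (A w - y*); for k = 0 the column is e
    column_form = A[:, k] @ r
    checks.append(np.isclose(scalar_sum, column_form))

# For k = 0, e^T (A w - y*) is simply the sum of the residuals
checks.append(np.isclose(A[:, 0] @ r, r.sum()))
print(checks)
```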
Here $A=[\boldsymbol{e},\boldsymbol{x_1^*},\cdots,\boldsymbol{x_i^*}]$ is the $N\times(i+1)$ matrix of $x$ values for the batch: the attribute columns, with a column of ones added as the first column. The gradient of the loss with respect to $\boldsymbol{w}$ is:
$$\begin{aligned} \nabla L&=\left\{\frac{\partial L}{\partial w_0},\cdots,\frac{\partial L}{\partial w_i}\right\}^T\\ &=[\boldsymbol{e},\boldsymbol{x_1^*},\cdots,\boldsymbol{x_i^*}]^T(A\boldsymbol{w}-\boldsymbol{y^*})\\ &=A^T(A\boldsymbol{w}-\boldsymbol{y^*}) \end{aligned}$$
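The matrix gradient also equals the sum of the single-sample gradients $(\boldsymbol{w}^T\boldsymbol{x}^{j*}-y^{j*})\boldsymbol{x}^{j*}$ from the batch-size-1 case. This holds because the rows of $A$ are exactly the samples $\boldsymbol{x}^{j*T}$. Here is a small numerical sketch with a random, hypothetical batch:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10
# Rows of A are the samples x^{j*} = {1, x_1^{j*}, ..., x_i^{j*}}
A = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 3))])
y_star = rng.normal(size=N)
w = rng.normal(size=4)

# Batch gradient in matrix form: A^T (A w - y*)
batch_grad = A.T @ (A @ w - y_star)

# Sum of the single-sample gradients (w^T x^{j*} - y^{j*}) x^{j*}
summed = sum((w @ A[j] - y_star[j]) * A[j] for j in range(N))

print(np.allclose(batch_grad, summed))
```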
Choose a step size $step$. The parameters are updated as before:
$$\boldsymbol{w}_{new}=\boldsymbol{w}-step\cdot\nabla L$$
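The sketch below wraps this batch update in a function and applies it repeatedly to a small hypothetical batch. It assumes a step of $1/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $A^TA$. For this quadratic loss, any step below $2/\lambda_{max}$ keeps gradient descent stable. A step of at most $1/\lambda_{max}$ also keeps the loss from increasing between steps. The names `gd_step` and `batch_loss` are illustrative.

```python
import numpy as np

def gd_step(w, A, y_star, step):
    """One batch update: w_new = w - step * A^T (A w - y*)."""
    return w - step * (A.T @ (A @ w - y_star))

def batch_loss(w, A, y_star):
    r = A @ w - y_star
    return 0.5 * (r @ r)

# Hypothetical batch: N = 5 samples, i = 2 attributes, column e prepended
A = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 4.0, 1.0]])
y_star = np.array([2.0, 3.0, 5.0, 8.0, 9.0])

# Step of 1 / (largest eigenvalue of A^T A)
step = 1.0 / np.linalg.eigvalsh(A.T @ A).max()
w = np.zeros(3)
losses = [batch_loss(w, A, y_star)]
for _ in range(100):
    w = gd_step(w, A, y_star, step)
    losses.append(batch_loss(w, A, y_star))
print(losses[0], losses[-1])
```

Here $L$ is a sum over the batch rather than a mean, so $\lambda_{max}$ grows with $N$ and with the scale of the inputs. The step therefore has to shrink accordingly.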
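With these matrix and vector forms, multivariate linear regression is straightforward to implement in numpy. First, prepare the data: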
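```python
import numpy as np
import matplotlib.pyplot as plt
# 40 values: the first 20 are attribute x_1 and the last 20 are x_2 (the two halves are identical here)
x = np.array([0.1,1.2,2.1,3.8,4.1,5.4,6.2,7.1,8.2,9.3,10.4,11.2,12.3,13.8,14.9,15.5,16.2,17.1,18.5,19.2,0.1,1.2,2.1,3.8,4.1,5.4,6.2,7.1,8.2,9.3,10.4,11.2,12.3,13.8,14.9,15.5,16.2,17.1,18.5,19.2])
y = np.array([5.7,8.8,10.8,11.4,13.1,16.6,17.3,19.4,21.8,23.1,25.1,29.2,29.9,31.8,32.3,36.5,39.1,38.4,44.2,43.4])
# (20, 2): one row per sample; then prepend a column of ones so w_0 acts as b, giving A with shape (20, 3)
x = x.reshape(2,int(len(x)/2)).T
x = np.insert(arr=x,values=[1],obj=0,axis=1)
# labels as a (20, 1) column vector y*
y = y.reshape(1,len(y)).T
```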
The regression loop:
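```python
# step size
step=0.001
# loss of every iteration
loss_list=[]
# number of iterations; each iteration processes one batch
epoch=500
# batch size
batch_size=12
# all-ones column vector e (not used below: x already contains the column of ones)
e=np.ones(batch_size).reshape(batch_size,1)
# initialize w = {w_0, w_1, w_2}^T; w_0 plays the role of b
w=np.zeros(3).reshape(3,1)
# gradient descent
for i in range(epoch):
    # batch index: i modulo the number of full batches. With 20 samples and
    # batch_size=12 there is only one full batch, so index is always 0 and
    # only the first 12 samples are ever used
    index = i % int(len(x)/batch_size)
    # inputs of the current batch (the matrix A)
    cx=x[index*batch_size:(index+1)*batch_size]
    # labels of the current batch (the vector y*)
    cy=y[index*batch_size:(index+1)*batch_size]
    # current loss 1/2 (Aw - y*)^T (Aw - y*); .item() extracts the scalar from the 1x1 result
    r = cx.dot(w)-cy
    loss_list.append((1/2*r.T.dot(r)).item())
    # gradient A^T (Aw - y*)
    grad_w = cx.T.dot(r)
    # update w
    w -= step*grad_w
print(loss_list)
plt.plot(loss_list)
plt.show()
print(w)
```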
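As a cross-check, setting $\nabla L=A^T(A\boldsymbol{w}-\boldsymbol{y^*})=\boldsymbol{0}$ gives the least-squares solution directly, and numpy can compute it with `np.linalg.lstsq`. The sketch below rebuilds the same data. The loop above only ever uses the first 12 samples, so the sketch solves for those rows only. The two attribute columns are identical, which makes $A$ rank-deficient. Passing `rcond=1e-10` (a choice made for this sketch) zeroes the resulting zero singular value and returns the minimum-norm solution, in which $w_1=w_2$.

```python
import numpy as np

# Same data preparation as above
x = np.array([0.1,1.2,2.1,3.8,4.1,5.4,6.2,7.1,8.2,9.3,10.4,11.2,12.3,13.8,14.9,15.5,16.2,17.1,18.5,19.2,0.1,1.2,2.1,3.8,4.1,5.4,6.2,7.1,8.2,9.3,10.4,11.2,12.3,13.8,14.9,15.5,16.2,17.1,18.5,19.2])
y = np.array([5.7,8.8,10.8,11.4,13.1,16.6,17.3,19.4,21.8,23.1,25.1,29.2,29.9,31.8,32.3,36.5,39.1,38.4,44.2,43.4])
x = x.reshape(2, len(x) // 2).T                  # (20, 2); the two columns are identical
x = np.insert(arr=x, values=[1], obj=0, axis=1)  # prepend the ones column: A, shape (20, 3)
y = y.reshape(-1, 1)                             # labels as a (20, 1) column vector

# The training loop only ever uses rows 0..11 (its batch index is always 0)
A, y_star = x[:12], y[:12]

# Least-squares solution of A w = y*; rcond=1e-10 zeroes the singular value caused by the duplicate column
w_ls, *_ = np.linalg.lstsq(A, y_star, rcond=1e-10)

# The gradient A^T (A w - y*) derived above should vanish at this solution
grad_at_opt = A.T @ (A @ w_ls - y_star)
print(w_ls.ravel())
print(np.abs(grad_at_opt).max())
```

Comparing the printed `w_ls` with the `w` learned by the loop shows how close the 500 gradient-descent iterations got. With `step=0.001` they may not have fully converged yet, especially in $w_0$.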