How to rewrite the gradient computation using matrix multiplication
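The hardest part to follow is how the summation sign turns into a matrix multiplication.
The exercise handout for Andrew Ng's course already lists every sub-term:
The right-hand side of the equation still leaves many details to pin down: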
Consider one of these terms, $x_0^{(i)}$:
$x_0^{(i)}$ is a single number (a scalar), say 6. Once the summation sign $\sum$ is applied, $x_0^{(i)}$ stands for a whole sequence of numbers, "6, 3, 2, …, $x_0^{(m)}$": $m$ values of $x_0$ in total, one per sample. The exercise has 5000 of them, so $m = 5000$;
$x_1^{(i)}$ is a single number (a scalar), say 2. Once the summation sign $\sum$ is applied, $x_1^{(i)}$ stands for a whole sequence of numbers, "2, 5, 7, …, $x_1^{(m)}$": $m$ values of $x_1$ in total, one per sample. The exercise has 5000 of them, so $m = 5000$;
$x_2^{(i)}$ is a single number (a scalar), say 8. Once the summation sign $\sum$ is applied, $x_2^{(i)}$ stands for a whole sequence of numbers, "8, 9, 1, …, $x_2^{(m)}$": $m$ values of $x_2$ in total, one per sample. The exercise has 5000 of them, so $m = 5000$;
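The superscript $(i)$ of $x_0^{(i)}$ identifies the sample; $i$ runs from 1 to 5000.
The subscripts 0, 1, 2, …, $n$ of $x_0^{(i)}$, $x_1^{(i)}$, $x_2^{(i)}$ identify the variable (input value) within the polynomial. The exercise has 400 inputs, so $n = 400$.
Each sample has 400 variables and there are 5000 samples, so together they form a 400×5000 matrix with one column per sample.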
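To make the shapes concrete, here is a minimal sketch in plain Python with a toy data set of 4 samples and 3 variables instead of 5000 and 400. The first three samples reuse the illustrative numbers from this post; the fourth sample and all variable names are assumptions made up for the example.

```python
# Toy data: each inner list is one sample x^(i) = [x_0, x_1, x_2].
X = [
    [6, 2, 8],  # x^(1)
    [3, 5, 9],  # x^(2)
    [2, 7, 1],  # x^(3)
    [4, 1, 5],  # x^(4), made-up values so that rows and columns differ in count
]

m = len(X)          # number of samples (5000 in the exercise)
n_vars = len(X[0])  # number of variables per sample

# Transposing puts one variable per row and one sample per column,
# which mirrors the 400x5000 layout described in the text.
X_T = [list(col) for col in zip(*X)]

print(m, n_vars)               # 4 3
print(len(X_T), len(X_T[0]))   # 3 4
print(X_T[0])                  # [6, 3, 2, 4], every sample's x_0
```

Storing one sample per row and transposing when needed is only one possible convention; what matters is that each variable's values across all samples end up in one line of the matrix.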
Simplifying from step one to step two:
In step one, every term contains the same sequence of factors $(h_\theta(x^{(i)})-y^{(i)})$, so merging the terms leaves this factor unchanged:
Written in vector form, $(h_\theta(x^{(i)})-y^{(i)})$ can be expressed as a 1×5000 vector.
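As a small sketch of this step, the snippet below builds the error vector for four hypothetical samples. The predictions `h` and labels `y` are made-up numbers, because the text does not fix a particular hypothesis function; only the "one error per sample" structure is the point.

```python
# Hypothetical predictions h_theta(x^(i)) and labels y^(i) for 4 samples.
h = [0.75, 0.25, 0.5, 1.0]
y = [1, 0, 0, 1]

# One error term per sample, collected into a single vector.
errors = [h_i - y_i for h_i, y_i in zip(h, y)]

print(errors)       # [-0.25, 0.25, 0.5, 0.0]
print(len(errors))  # 4, one entry per sample (5000 in the exercise)
```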
After merging, $x_0^{(i)}$, $x_1^{(i)}$, $x_2^{(i)}$, …, $x_n^{(i)}$ change from 400 separate scalars into a single vector $x^{(i)}$:
$x^{(i)}$ takes 5000 values, from $x^{(1)}$ to $x^{(5000)}$
$x^{(1)} = [x_0^{(1)}, x_1^{(1)}, x_2^{(1)}, \dots] = [6, 2, 8, \dots]$
$x^{(2)} = [x_0^{(2)}, x_1^{(2)}, x_2^{(2)}, \dots] = [3, 5, 9, \dots]$
$x^{(3)} = [x_0^{(3)}, x_1^{(3)}, x_2^{(3)}, \dots] = [2, 7, 1, \dots]$
$x^{(i)} = [x_0^{(i)}, x_1^{(i)}, x_2^{(i)}, \dots]$
Leaving the sample index aside, the general form is:
$x = [x_0, x_1, x_2, \dots]$
Within each term, the values $x_0^{(1)}$, $x_0^{(2)}$, $x_0^{(3)}$, $x_0^{(4)}$, $x_0^{(5)}$, …, $x_0^{(5000)}$ are multiplied one-to-one with the 5000 values $(h_\theta(x^{(1)})-y^{(1)})$, $(h_\theta(x^{(2)})-y^{(2)})$, $(h_\theta(x^{(3)})-y^{(3)})$, $(h_\theta(x^{(4)})-y^{(4)})$, $(h_\theta(x^{(5)})-y^{(5)})$, …, $(h_\theta(x^{(5000)})-y^{(5000)})$:
$$
\begin{aligned}
&x_0^{(1)}\times(h_\theta(x^{(1)})-y^{(1)})\\
+\;&x_0^{(2)}\times(h_\theta(x^{(2)})-y^{(2)})\\
+\;&x_0^{(3)}\times(h_\theta(x^{(3)})-y^{(3)})\\
+\;&x_0^{(4)}\times(h_\theta(x^{(4)})-y^{(4)})\\
+\;&x_0^{(5)}\times(h_\theta(x^{(5)})-y^{(5)})\\
+\;&\dots\\
+\;&x_0^{(5000)}\times(h_\theta(x^{(5000)})-y^{(5000)})
\end{aligned}
$$
In vector form:
$$
x_0= \left[ \begin{array}{c} x_0^{(1)}\\ x_0^{(2)}\\ x_0^{(3)}\\ x_0^{(4)}\\ x_0^{(5)}\\ \vdots\\ x_0^{(5000)} \end{array}\right]
$$
$$
\beta= \left[ \begin{array}{c} (h_\theta(x^{(1)})-y^{(1)})\\ (h_\theta(x^{(2)})-y^{(2)})\\ (h_\theta(x^{(3)})-y^{(3)})\\ (h_\theta(x^{(4)})-y^{(4)})\\ (h_\theta(x^{(5)})-y^{(5)})\\ \vdots\\ (h_\theta(x^{(5000)})-y^{(5000)}) \end{array}\right]
$$
The sum for this term, $\sum\limits_{i=1}^m\left((h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}\right)$, can be written as the vector product $(x_0)^T\beta$
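The equivalence between the explicit sum and the inner product can be checked numerically. The sketch below uses the first three $x_0$ values from the example in this post and a made-up error vector `beta`; both the numbers in `beta` and the variable names are assumptions for illustration.

```python
# x_0 for three samples (values from the text's example).
x0 = [6, 3, 2]
# Hypothetical error terms beta^(i) = h_theta(x^(i)) - y^(i).
beta = [0.5, -0.25, 0.75]

# Explicit summation over the samples, one product at a time.
loop_sum = 0.0
for i in range(len(x0)):
    loop_sum += beta[i] * x0[i]

# The same quantity written as the inner product (x_0)^T beta.
dot = sum(a * b for a, b in zip(x0, beta))

print(loop_sum, dot)  # 3.75 3.75
```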
Placing the columns $x_1$, $x_2$, $x_3$, …, $x_{400}$ side by side next to $x_0$ forms the matrix $X$, and all of the per-variable sums together become the matrix multiplication $X^T\beta$
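As a rough check of this step, the sketch below computes one sum per variable with explicit loops and compares the result with a hand-written matrix-vector product $X^T\beta$. The data reuse the three example samples from this post; `beta` and the helper `matvec` are hypothetical names and values introduced only for this illustration.

```python
# Three samples, three variables each (values from the text's example).
X = [
    [6, 2, 8],  # x^(1)
    [3, 5, 9],  # x^(2)
    [2, 7, 1],  # x^(3)
]
beta = [0.5, -0.25, 0.75]  # hypothetical error terms, one per sample

# One summation per variable j: sum over i of beta^(i) * x_j^(i).
per_variable = []
for j in range(len(X[0])):
    per_variable.append(sum(beta[i] * X[i][j] for i in range(len(X))))

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# The same numbers from a single product X^T beta.
X_T = [list(col) for col in zip(*X)]
product = matvec(X_T, beta)

print(per_variable)  # [3.75, 5.0, 2.5]
print(product)       # [3.75, 5.0, 2.5]
```

With a numerical library the whole loop collapses to one call, for example `X.T @ beta` in NumPy, which is the practical reason for rewriting the sums this way.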
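Andrew Ng's course takes a different route: it first gathers $x_1$, $x_2$, $x_3$, …, $x_{400}$ into the vector $x$, so that the sum of scalar × scalar products in each row becomes a single sum of scalar × vector products. What makes this conversion hard to follow is that it still carries a summation sign; once the summation sign is removed, it is easy to understand: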
If $i$ only runs from 1 to 2, there are just two terms, and the summation sign can be replaced by the addition sign +
The expression becomes:
$$
\frac1m(\beta^{(1)}x^{(1)}+\beta^{(2)}x^{(2)})
$$
That is:
$$
\frac1m\sum\limits_{i=1}^2\beta^{(i)}x^{(i)}
$$
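The two-sample case is small enough to verify by hand or in a few lines of Python. The sketch below evaluates the scalar × vector form term by term and compares it with the per-variable form $\frac1m X^T\beta$. The sample values come from the example in this post, while `beta`, `scale`, and `add` are hypothetical values and helpers introduced for the illustration.

```python
# Two samples only, so the sum over i has just two terms.
X = [
    [6, 2, 8],  # x^(1)
    [3, 5, 9],  # x^(2)
]
beta = [0.5, -0.25]  # hypothetical beta^(1), beta^(2)
m = len(X)

def scale(c, v):
    """Multiply every entry of vector v by the scalar c."""
    return [c * a for a in v]

def add(u, v):
    """Add two vectors entry by entry."""
    return [a + b for a, b in zip(u, v)]

# (1/m) * (beta^(1) x^(1) + beta^(2) x^(2)): scalar times vector, then add.
expanded = scale(1 / m, add(scale(beta[0], X[0]), scale(beta[1], X[1])))

# (1/m) * X^T beta: one inner product per variable.
matrix_form = [
    sum(b * row[j] for b, row in zip(beta, X)) / m
    for j in range(len(X[0]))
]

print(expanded)     # [1.125, -0.125, 0.875]
print(matrix_form)  # [1.125, -0.125, 0.875]
```

Both routes add up exactly the same products, only grouped differently: the first groups them by sample, the second by variable.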