Machine Learning | Vectorization

Preface

Below we vectorize the basic concepts of linear regression: the hypothesis, the cost function, and the gradient descent algorithm. Throughout, we work with the most general form of each concept; after all, that is what mathematicians like to do.

1. The Linear Regression Model

The most general form of the linear regression hypothesis is:

$$h_\theta(x)=\theta_0x_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n$$

$$\theta=\left[\begin{array}{c}\theta_0 \\ \theta_1\\\vdots\\\theta_n\end{array}\right],\qquad y=\left[\begin{array}{c}y^{(1)}\\y^{(2)}\\\vdots \\y^{(m)}\end{array}\right]$$

As for how to collect $x_1, x_2, \cdots, x_n$ into a vector, as far as I currently know there are two different conventions, discussed below:


The first convention:

$$X=\left[\begin{array}{c}x_0 \\ x_1\\\vdots\\x_n\end{array}\right]$$

$$h_\theta(x)=\theta^TX=X^T\theta$$

This choice of $X$ makes the expression for $h_\theta(x)$ a bit more compact.
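To make this concrete, here is a minimal NumPy sketch of the first convention for a single training example; the values and the variable names `theta` and `x` are made-up placeholders for illustration, not anything defined above.

```python
import numpy as np

# Hypothetical single training example with x0 = 1 as the bias term,
# so x = [x0, x1, ..., xn] and theta = [theta0, theta1, ..., thetan].
theta = np.array([0.5, 1.0, -2.0])   # (n+1,) parameter vector
x     = np.array([1.0, 3.0, 4.0])    # (n+1,) feature vector for one example

# h_theta(x) = theta^T X = X^T theta is just the dot product of the two vectors.
h = theta @ x                        # equivalently np.dot(theta, x)
print(h)                             # 0.5*1 + 1.0*3 + (-2.0)*4 = -4.5
```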


The second convention is more practical (for instance, when doing the programming assignments):

$$X=\left[ \begin{array}{cccc}x_0 & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0 & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0 & x_1^{(m)} & \cdots & x_n^{(m)} \end{array} \right]$$

With this convention, each column corresponds to a feature and each row corresponds to a training example.

Also, $h_\theta(x)$ now needs to be rewritten slightly:

$$h_\theta(x^{(i)})=\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\theta_2x_2^{(i)}+\cdots+\theta_nx_n^{(i)}$$

Then define

$$h_\theta(x)=\left[\begin{array}{c}h_\theta(x^{(1)})\\h_\theta(x^{(2)}) \\\vdots \\ h_\theta(x^{(m)}) \end{array}\right]$$

so that

$$h_\theta(x)=X\theta$$


In the vectorized derivations that follow, I will use the second convention throughout.
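Here is a minimal NumPy sketch of the second convention, assuming a made-up design matrix `X` (with the $x_0=1$ column already prepended) and a made-up parameter vector `theta`; all $m$ predictions come out of a single matrix-vector product.

```python
import numpy as np

# Hypothetical design matrix: m = 3 examples, n = 2 features,
# with the first column fixed to 1 (the x0 bias feature).
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])      # shape (m, n+1)
theta = np.array([0.1, 0.2, 0.3])    # shape (n+1,)

# h_theta(x) = X theta gives all m predictions at once, with no explicit loop.
h = X @ theta                        # shape (m,)
print(h)                             # [1.4 2.4 3.4]
```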

2. Cost Function

The general form of the cost function is:

$$J(\theta) = \frac{1}{2m}\sum\limits_{i=1}^m \left( h_{\theta}(x^{(i)})-y^{(i)} \right)^{2}$$

where

$$\theta=\left[\begin{array}{c}\theta_0 \\ \theta_1\\\vdots\\\theta_n\end{array}\right]$$

Vectorizing the cost function gives:

$$J(\theta)=\frac{1}{2m}\sum\limits_{i=1}^m \left( h_{\theta}(x^{(i)})-y^{(i)} \right)^{2}=\frac{1}{2m}(X\theta-y)\cdot(X\theta-y)$$

Note that the operation between $(X\theta-y)$ and $(X\theta-y)$ here is the dot product.

The detailed derivation is as follows:

$$X\theta-y=h_\theta(x)-y=\left[\begin{array}{c}h_\theta(x^{(1)})\\h_\theta(x^{(2)}) \\\vdots \\ h_\theta(x^{(m)}) \end{array}\right]-\left[\begin{array}{c}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{array}\right]= \left[\begin{array}{c}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)} \\\vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{array}\right]$$

Therefore,

$$(X\theta-y)\cdot(X\theta-y)= \left[\begin{array}{c}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)} \\\vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{array}\right] \cdot \left[\begin{array}{c}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)} \\\vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{array}\right]$$

$$= \left( h_{\theta}(x^{(1)})-y^{(1)} \right)^{2}+\left( h_{\theta}(x^{(2)})-y^{(2)} \right)^{2}+\cdots+\left( h_{\theta}(x^{(m)})-y^{(m)} \right)^{2}$$

$$=\sum\limits_{i=1}^m \left( h_{\theta}(x^{(i)})-y^{(i)} \right)^{2}$$

Summary:

$$J(\theta)=\frac{1}{2m}(X\theta-y)\cdot(X\theta-y)$$

…Do you get a faint sense of the beauty of mathematics here?
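As a quick sanity check, a minimal NumPy sketch of this vectorized cost might look like the following; the function name `compute_cost` and the sample data are assumptions made for illustration, not something from the derivation above.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Vectorized cost J(theta) = 1/(2m) * (X theta - y) . (X theta - y)."""
    m = len(y)
    residual = X @ theta - y                 # (m,) vector of h_theta(x^(i)) - y^(i)
    return (residual @ residual) / (2 * m)   # dot product of the residual with itself

# Hypothetical data: 3 examples, first column of X is the x0 = 1 bias feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(X, y, np.array([0.0, 1.0])))  # 0.0, since X @ [0, 1] == y
```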

3. Gradient Descent

The general form of the gradient descent update is:

$$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum^m_{i=1}\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}_j$$

where $0\leq j \leq n$.

Vectorizing the above gives:

$$\theta :=\theta-\alpha\frac{1}{m}X^T(X\theta-y)$$

where

$$\theta=\left[\begin{array}{c}\theta_0 \\ \theta_1\\\vdots\\\theta_n\end{array}\right],\qquad X=\left[ \begin{array}{cccc}x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{array} \right],\qquad y=\left[\begin{array}{c}y^{(1)}\\y^{(2)}\\\vdots \\y^{(m)} \end{array}\right]$$

The derivation is as follows:

Write the update as $\theta :=\theta-\alpha\frac{1}{m}\delta$; clearly,

$$\delta=\left[\begin{array}{c}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_0\\\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_1 \\\vdots \\ \sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_n \end{array}\right]$$

Then

$$\delta= \left[\begin{array}{c}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_0\\\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_1 \\\vdots \\ \sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_n \end{array}\right]$$

$$= \left[\begin{array}{c}\left( h_{\theta}(x^{(1)})-y^{(1)} \right)x^{(1)}_0+\left( h_{\theta}(x^{(2)})-y^{(2)} \right)x^{(2)}_0+\cdots+\left( h_{\theta}(x^{(m)})-y^{(m)} \right)x^{(m)}_0 \\ \left( h_{\theta}(x^{(1)})-y^{(1)} \right)x^{(1)}_1+\left( h_{\theta}(x^{(2)})-y^{(2)} \right)x^{(2)}_1+\cdots+\left( h_{\theta}(x^{(m)})-y^{(m)} \right)x^{(m)}_1 \\ \vdots \\ \left( h_{\theta}(x^{(1)})-y^{(1)} \right)x^{(1)}_n+\left( h_{\theta}(x^{(2)})-y^{(2)} \right)x^{(2)}_n+\cdots+\left( h_{\theta}(x^{(m)})-y^{(m)} \right)x^{(m)}_n \end{array}\right]$$

$$=\left[\begin{array}{cccc}x_0^{(1)} & x_0^{(2)} & \cdots & x_0^{(m)} \\ x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(m)} \\ \vdots & \vdots & & \vdots \\ x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(m)}\end{array}\right] \left[\begin{array}{c}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)} \\\vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{array}\right]$$

Clearly,

$$\left[\begin{array}{cccc}x_0^{(1)} & x_0^{(2)} & \cdots & x_0^{(m)} \\ x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(m)} \\ \vdots & \vdots & & \vdots \\ x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(m)}\end{array}\right]=X^T,$$

and since

$$X\theta-y= \left[\begin{array}{c}h_\theta(x^{(1)})-y^{(1)}\\h_\theta(x^{(2)})-y^{(2)} \\\vdots \\ h_\theta(x^{(m)})-y^{(m)} \end{array}\right],$$

it follows that

$$\delta=X^T(X\theta-y)$$

Summary:

$$\theta :=\theta-\alpha\frac{1}{m}X^T(X\theta-y)$$
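Putting the pieces together, a minimal NumPy sketch of batch gradient descent using this vectorized update could look like the following; the function name `gradient_descent`, the learning rate, and the iteration count are illustrative assumptions.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    """Repeat the vectorized update theta := theta - (alpha/m) * X^T (X theta - y)."""
    m = len(y)
    for _ in range(num_iters):
        residual = X @ theta - y                  # (m,) vector of prediction errors
        theta = theta - (alpha / m) * (X.T @ residual)
    return theta

# Hypothetical data generated from y = 1 + 2*x1, with a column of ones for x0.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = gradient_descent(X, y, np.zeros(2))
print(theta)                                      # approaches [1.0, 2.0]
```

Because the whole gradient is computed as one `X.T @ residual` product, there is no inner loop over the parameters $\theta_0, \dots, \theta_n$.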



(End)