Andrew Ng's Machine Learning, Week 2

Hypothesis:

h_\theta(x)=\theta_0x_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n=\theta^Tx \quad (x_0=1)
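As a minimal sketch (the numbers and the function name `hypothesis` are made up for illustration, not from the course), the hypothesis is just a dot product once the bias feature x_0 = 1 is prepended:

```python
import numpy as np

def hypothesis(theta, x):
    """Linear hypothesis h_theta(x) = theta^T x, assuming x already has x_0 = 1 prepended."""
    return theta @ x

# Hypothetical toy numbers: two features plus the bias term x_0 = 1.
theta = np.array([1.0, 2.0, 3.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 4.0, 5.0])       # x_0 = 1, x_1, x_2
print(hypothesis(theta, x))         # 1*1 + 2*4 + 3*5 = 24.0
```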

Parameters:

\theta_0,\theta_1,...,\theta_n

Cost Function:

J(\theta_0,\theta_1,...,\theta_n)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2
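A possible NumPy implementation of this cost, assuming a design matrix X whose first column is all ones (the helper name `compute_cost` and the toy data are assumptions, not from the original notes):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2.
    X is the m x (n+1) design matrix with a leading column of ones."""
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)

# Hypothetical toy data: m = 3 examples, one feature plus the bias column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(X, y, np.array([0.0, 1.0])))  # perfect fit -> 0.0
```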

Goal:

\mathop{minimize} \limits_{\theta_0,\theta_1,...,\theta_n} J(\theta_0,\theta_1,...,\theta_n)

Gradient Descent:

Repeat{

\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,...,\theta_n)

(simultaneously update \theta_j

for j=0, 1, ... , n)

}

\begin{aligned} \frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,...,\theta_j,...,\theta_n) &=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2 \\ &=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^{m}(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+...+\theta_jx_j^{(i)}+...+\theta_nx_n^{(i)}-y^{(i)})^2 \\ &=\frac{1}{2m}\sum_{i=1}^{m}\frac{\partial}{\partial\theta_j}(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+...+\theta_jx_j^{(i)}+...+\theta_nx_n^{(i)}-y^{(i)})^2 \\ &=\frac{1}{2m}\sum_{i=1}^{m}2(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+...+\theta_jx_j^{(i)}+...+\theta_nx_n^{(i)}-y^{(i)})\cdot x_j^{(i)} \\ &=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)} \end{aligned}
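Putting the update rule and this derivative together, a sketch of batch gradient descent could look like the following (the function name `gradient_descent` and the toy data are hypothetical):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression.
    Each iteration updates all theta_j simultaneously using
    theta_j := theta_j - alpha * (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)."""
    m = len(y)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m   # vector of all partial derivatives
        theta = theta - alpha * gradient       # simultaneous update of every theta_j
    return theta

# Hypothetical data following y = 2x, so theta should approach [0, 2].
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=2000)
print(theta)  # approximately [0., 2.]
```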

Feature Scaling:

x_n:=\frac{x_n-\mu_n}{s_n}

where \mu_n is the mean and s_n is the standard deviation (this centers and standardizes each feature).
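A small sketch of this mean normalization applied per feature column (the helper name `feature_scale` and the sample numbers are hypothetical):

```python
import numpy as np

def feature_scale(X):
    """Mean-normalize each feature column: (x - mu) / s,
    where mu is the column mean and s its standard deviation.
    The bias column of ones should be added after scaling."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Hypothetical raw features with very different ranges (e.g. size in sq. ft, number of bedrooms).
X_raw = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 4.0]])
X_norm, mu, sigma = feature_scale(X_raw)
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # roughly 0 and 1 per column
```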

Learning Rate:

Typical learning rates to try (each roughly 3x the previous one):

\alpha=0.01,0.03,0.1,0.3,1,3,10
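One way to test these candidates is to run a few iterations at each rate and check whether J decreases every step; the sweep below is a rough sketch with made-up data (the helper names `cost` and `run_gd` are assumptions):

```python
import numpy as np

def cost(X, y, theta):
    m = len(y)
    r = X @ theta - y
    return (r @ r) / (2 * m)

def run_gd(X, y, alpha, num_iters=50):
    """Run a few gradient-descent iterations and return the cost history."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * X.T @ (X @ theta - y) / m
        history.append(cost(X, y, theta))
    return history

# Hypothetical (already scaled) data; sweep the candidate rates and see which ones converge.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 2.0])
for alpha in [0.01, 0.03, 0.1, 0.3, 1, 3, 10]:
    hist = run_gd(X, y, alpha)
    print(alpha, "decreasing" if hist[-1] < hist[0] else "diverging", hist[-1])
```

If J increases (or oscillates) the rate is too large; if J decreases but very slowly, the rate is too small.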

Normal Equation:

Compared with gradient descent, the normal equation solves for \theta analytically in one step, with no learning rate \alpha and no iterations, but computing (X^TX)^{-1} costs roughly O(n^3), so gradient descent is preferred when the number of features n is large. Derivation:

\begin{aligned} \nabla_\theta J(\theta) &=\nabla_\theta\frac{1}{2m}(X\theta-y)^T(X\theta-y) \\ &=\nabla_\theta\frac{1}{2m}(\theta^TX^T-y^T)(X\theta-y) \\ &=\nabla_\theta\frac{1}{2m}(\theta^TX^TX\theta-\theta^TX^Ty-y^TX\theta+y^Ty) \\ &=\frac{1}{2m}(2X^TX\theta-X^Ty-X^Ty) \\ &\mathop{=} \limits^{set}\textbf{0} \\ X^TX\theta&=X^Ty \\ \theta&=(X^TX)^{-1}X^Ty \end{aligned}
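A minimal implementation of this closed-form solution (using np.linalg.solve rather than forming the inverse explicitly; the toy data are hypothetical):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^{-1} X^T y.
    np.linalg.solve is used instead of an explicit inverse for numerical stability;
    if X^T X is singular (redundant features, or m <= n), np.linalg.pinv can be used instead."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: same toy set as above; the result should match gradient descent.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(normal_equation(X, y))  # approximately [0., 2.]
```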

Vectorization:

  • Each training example x^{(i)} and the parameter vector \theta, written as (n+1)-dimensional column vectors (with x^{(i)}_0=1):

x^{(i)}=\begin{bmatrix} x^{(i)}_0 \\ x^{(i)}_1 \\ \vdots \\ x^{(i)}_n \end{bmatrix} \in \mathbb{R}^{(n+1)\times 1} \qquad \theta= \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{(n+1)\times 1}

  • The design matrix X stacks the transposed examples as rows:

X= \begin{bmatrix} \cdots & (x^{(1)})^T & \cdots \\ \cdots & (x^{(2)})^T & \cdots \\ & \vdots & \\ \cdots & (x^{(m)})^T & \cdots \\ \end{bmatrix} \in \mathbb{R}^{m\times (n+1)}

  • The predictions for all m examples come from a single matrix product:

X\theta= \begin{bmatrix} (x^{(1)})^T\theta \\ (x^{(2)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \\ \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) \\ h_\theta(x^{(2)}) \\ \vdots \\ h_\theta(x^{(m)}) \\ \end{bmatrix}\in \mathbb{R}^{m\times 1}

  • The label vector:

y= \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \\ \end{bmatrix} \in \mathbb{R}^{m\times 1}

  • The transpose of the design matrix puts the j-th feature of every example into one row:

X^T= \begin{bmatrix} \vdots & \vdots & & \vdots \\ x^{(1)}_j & x^{(2)}_j & \cdots & x^{(m)}_j \\ \vdots & \vdots & & \vdots \\ \end{bmatrix} \in \mathbb{R}^{(n+1)\times m}

  • Multiplying X^T by the residual vector (and dividing by m) gives every partial derivative at once:

\begin{aligned} \frac{1}{m}X^T(X\theta-y)&=\frac{1}{m}\begin{bmatrix} \vdots & \vdots & & \vdots \\ x^{(1)}_j & x^{(2)}_j & \cdots & x^{(m)}_j \\ \vdots & \vdots & & \vdots \\ \end{bmatrix}\begin{bmatrix} h_\theta(x^{(1)})-y^{(1)} \\ h_\theta(x^{(2)})-y^{(2)} \\ \vdots \\ h_\theta(x^{(m)})-y^{(m)} \\ \end{bmatrix}\\&=\begin{bmatrix} \vdots \\ \frac{\partial}{\partial\theta_j}J(\theta) \\ \vdots \\ \end{bmatrix} \in\mathbb{R}^{(n+1)\times 1} \end{aligned}
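To check the vectorized expression against the element-wise summation, a small sketch comparing the two (the function names and random data are assumptions made for illustration):

```python
import numpy as np

def gradient_loop(X, y, theta):
    """Gradient computed element by element, following the summation formula."""
    m, n_plus_1 = X.shape
    grad = np.zeros(n_plus_1)
    for j in range(n_plus_1):
        for i in range(m):
            grad[j] += (X[i] @ theta - y[i]) * X[i, j]
    return grad / m

def gradient_vectorized(X, y, theta):
    """The same quantity as a single matrix expression: (1/m) * X^T (X theta - y)."""
    return X.T @ (X @ theta - y) / len(y)

# Hypothetical random data to confirm the two computations agree.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
y = rng.normal(size=5)
theta = rng.normal(size=3)
print(np.allclose(gradient_loop(X, y, theta), gradient_vectorized(X, y, theta)))  # True
```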
