Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng / Week 1 - 2


博客园 link: Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng


Chapter 1 - Introduction

1.1 Definition

  • Arthur Samuel
    The field of study that gives computers the ability to learn without being explicitly programmed.
  • Tom Mitchell
    A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

1.2 Concepts

1.2.1 Classification of Machine Learning

  • Supervised Learning: given a labeled data set; we already know what a correct output/result should look like
    • Regression: continuous output
    • Classification: discrete output
  • Unsupervised Learning: given an unlabeled data set, or one in which all examples share the same label; we must group the data ourselves
    • Clustering: group the data into different clusters
    • Non-Clustering
  • Others: Reinforcement Learning, Recommender Systems…

1.2.2 Model Representation

  • Training Set

    $$\begin{matrix} x^{(1)}_1 & x^{(1)}_2 & \cdots & x^{(1)}_n && y^{(1)}\\ x^{(2)}_1 & x^{(2)}_2 & \cdots & x^{(2)}_n && y^{(2)}\\ \vdots & \vdots & \ddots & \vdots && \vdots\\ x^{(m)}_1 & x^{(m)}_2 & \cdots & x^{(m)}_n && y^{(m)} \end{matrix}$$

  • Notation
    $m=$ the number of training examples (rows)
    $n=$ the number of features (columns)
    $x=$ input variable/feature
    $y=$ output variable/target variable
    $(x^{(i)}_j,\ y^{(i)})$: the $j$-th feature of the $i$-th training example, where $i=1,\dots,m$ and $j=1,\dots,n$
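To make the notation concrete, here is a minimal NumPy sketch (the array values are invented for illustration):

```python
import numpy as np

# A hypothetical training set with m = 4 examples and n = 2 features.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])               # row i is training example x^(i)
y = np.array([400.0, 330.0, 369.0, 232.0])  # entry i is target y^(i)

m, n = X.shape                              # m = 4 rows, n = 2 columns
# x^(i)_j, the j-th feature of the i-th example
# (1-indexed in the notes, 0-indexed in NumPy):
i, j = 3, 2
print(X[i - 1, j - 1])                      # x^(3)_2 = 4.0
```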

1.2.3 Cost Function

See Sections 2.1 and 2.2 for the squared error cost used in linear regression.

1.2.4 Gradient Descent

See Section 2.3.1 for the algorithm and its update rule.

Chapter 2 - Linear Regression

$$\begin{matrix} x_0 & x^{(1)}_1 & x^{(1)}_2 & \cdots & x^{(1)}_n && y^{(1)}\\ x_0 & x^{(2)}_1 & x^{(2)}_2 & \cdots & x^{(2)}_n && y^{(2)}\\ \vdots & \vdots & \vdots & \ddots & \vdots && \vdots\\ x_0 & x^{(m)}_1 & x^{(m)}_2 & \cdots & x^{(m)}_n && y^{(m)}\\ \\ \theta_0 & \theta_1 & \theta_2 & \cdots & \theta_n & \end{matrix}$$

Here $x_0=1$ is an intercept feature prepended to every example, and $\theta_0,\dots,\theta_n$ are the parameters weighting the columns above them.

2.1 Linear Regression with One Variable

  • Hypothesis Function

    $h_\theta(x)=\theta_0+\theta_1x$

  • Cost Function - Squared Error Cost Function
    $J(\theta_0,\theta_1)=\dfrac{1}{2m}\displaystyle\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$

  • Goal

    $\min\limits_{\theta_0,\theta_1} J(\theta_0,\theta_1)$
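A minimal sketch of this cost function in Python/NumPy (the function name `compute_cost` and the data are my own, not from the course):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(y)
    h = theta0 + theta1 * x              # hypothesis at every training example
    return np.sum((h - y) ** 2) / (2 * m)

# Invented data lying roughly on y = 1 + 2x, so J should be small near (1, 2).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(compute_cost(1.0, 2.0, x, y))      # ~0.0125
print(compute_cost(0.0, 0.0, x, y))      # much larger
```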

2.2 Multivariate Linear Regression

  • Hypothesis Function

    $$\theta=\left[\begin{matrix}\theta_0\\\theta_1\\\vdots\\\theta_n\end{matrix}\right],\quad x=\left[\begin{matrix}x_0\\x_1\\\vdots\\x_n\end{matrix}\right],\quad x_0=1$$

    $$\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\&=\theta^Tx\end{aligned}$$

  • Cost Function

    $J(\theta)=\dfrac{1}{2m}\displaystyle\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$

  • Goal

    $\min\limits_{\theta} J(\theta)$
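With the examples stacked into a design matrix $X$ (one example per row, with the $x_0=1$ column prepended), the hypothesis and cost vectorize. A minimal sketch, with invented data:

```python
import numpy as np

def compute_cost(theta, X, y):
    """Vectorized squared error cost: J(theta) = ||X @ theta - y||^2 / (2m)."""
    m = len(y)
    residual = X @ theta - y             # h_theta(x^(i)) - y^(i) for all i at once
    return (residual @ residual) / (2 * m)

# m = 3 examples, n = 2 features, plus the x0 = 1 intercept column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 9.0],
              [1.0, 6.0, 5.0]])
theta_star = np.array([1.0, 2.0, 3.0])
y = X @ theta_star                       # targets generated exactly from theta_star
print(compute_cost(theta_star, X, y))    # 0.0: perfect fit
print(compute_cost(np.zeros(3), X, y))   # positive
```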

2.3 Optimization Algorithms

2.3.1 Gradient Descent

  • Algorithm (a runnable sketch follows this list)
    Repeat until convergence, simultaneously updating every $j=0,1,\dots,n$:
    $$\begin{aligned}\theta_j&:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\&:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}_j\end{aligned}$$
  • Feature Scaling
    For each feature $x_j$: $x_j:=\dfrac{x_j-\mu_j}{s_j}$
    where $\mu_j$ is the mean of feature $x_j$ over the $m$ examples, and $s_j$ is either its range (max minus min) or its standard deviation.
  • Learning Rate
    If $\alpha$ is too small, convergence is slow; if $\alpha$ is too large, $J(\theta)$ may fail to decrease on every iteration and may not converge.
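A minimal batch gradient descent sketch tying the three bullets together (NumPy; the function names and data are my own, not from the course):

```python
import numpy as np

def scale_features(F):
    """Mean normalization: (x_j - mu_j) / s_j, using the standard deviation as s_j."""
    mu, s = F.mean(axis=0), F.std(axis=0)
    return (F - mu) / s

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent on J(theta); X must already contain the x0 = 1 column."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum_i (h - y) * x^(i)_j for all j
        theta = theta - alpha * grad       # simultaneous update of every theta_j
    return theta

# Invented data from y = 4 + 3*x1 + 2*x2.
rng = np.random.default_rng(0)
F = rng.uniform(0, 10, size=(50, 2))           # raw features, m = 50, n = 2
y = 4 + F @ np.array([3.0, 2.0])
X = np.c_[np.ones(len(y)), scale_features(F)]  # scale, then prepend intercept
print(gradient_descent(X, y))                  # theta for the *scaled* features
```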

2.3.2 Normal Equation(s)


$$X=\left[\begin{matrix}1&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\1&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n\end{matrix}\right],\quad y=\left[\begin{matrix}y^{(1)}\\y^{(2)}\\\vdots\\y^{(m)}\end{matrix}\right]$$

where $X$ is an $m\times(n+1)$ matrix and $y$ is an $m$-dimensional column vector. Then

$\theta=(X^TX)^{-1}X^Ty$

If $X^TX$ is noninvertible, the likely causes are:

  1. Redundant features: two features are linearly dependent; delete one of them;
  2. Too many features, e.g. $m\leq n$: delete some features, or apply regularization.
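A minimal sketch of the normal equation in NumPy (invented data; `np.linalg.pinv` is used instead of a plain inverse so the noninvertible cases above still yield a solution):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y.
    The pseudoinverse handles a singular X^T X (e.g. redundant features)."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Invented data from y = 4 + 3*x1 + 2*x2; no feature scaling is needed here.
rng = np.random.default_rng(1)
F = rng.uniform(0, 10, size=(50, 2))
y = 4 + F @ np.array([3.0, 2.0])
X = np.c_[np.ones(len(y)), F]              # m x (n+1) design matrix
print(normal_equation(X, y))               # approximately [4, 3, 2]
```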

2.4 Polynomial Regression

If a linear $h_\theta(x)$ cannot fit the data well, we can change the behavior or curve of $h_\theta(x)$ by making it a quadratic, cubic, or square root function (or any other form).
e.g.

  • $h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_1^2$, with $x_2=x_1^2$

  • $h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3$, with $x_2=x_1^2,\ x_3=x_1^3$

  • $h_\theta(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1}$, with $x_2=\sqrt{x_1}$
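A minimal sketch of the cubic case: the powers of $x_1$ become new features $x_2$ and $x_3$, after which ordinary linear regression (here via the normal equation from 2.3.2) applies. Data and names are invented:

```python
import numpy as np

# Invented 1-D data from the cubic y = 1 + 2*x - 0.5*x^2 + 0.1*x^3.
x1 = np.linspace(1.0, 10.0, 30)
y = 1 + 2 * x1 - 0.5 * x1**2 + 0.1 * x1**3

# Treat x1^2 and x1^3 as extra features x2 and x3, then fit linearly.
X = np.c_[np.ones_like(x1), x1, x1**2, x1**3]
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)   # approximately [1, 2, -0.5, 0.1]

# Note: with gradient descent instead, feature scaling would matter,
# since x1, x1^2, and x1^3 have very different ranges.
```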
