[Machine Learning] Andrew Ng's Machine Learning Course Notes - Week 1

Machine Learning by Andrew Ng

💡 Study notes for Andrew Ng's Machine Learning course - Week 1
🐠 A compilation of all my study notes
✓ Course page: Stanford Machine Learning
🍭 Reference resources

Outline

Introduction

Machine Learning Definition

💡 ML Definition
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Types of Machine Learning Algorithms

  • Supervised Learning
  • Unsupervised Learning
  • Others: reinforcement learning (RL), recommender systems

Supervised Learning

Teach the machine to learn from examples where the right answers are given.

Types of supervised learning:

  • Regression
    predict a continuous value
  • Classification
    predict a discrete value

Regression

Classification

Unsupervised Learning

Ask the machine to automatically find the structure of an unlabeled data set.



The Cocktail Party Problem

An unsupervised learning algorithm can separate the overlapping voices at a party into their different sources.


Model and Cost Function

$(x^i, y^i)$ denotes a training example.

Model Representation
h stands for the hypothesis function. For univariate linear regression, the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$.


Cost Function

💡 Idea
Choose $\theta_0, \theta_1$ s.t. $h_\theta(x)$ is close to $y$ for our training examples.

Define the cost function (the squared error cost function) as

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(x^i) - y^i\right)^2$

Our goal is

$\underset{\theta_0,\, \theta_1}{\text{minimize}} \quad \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(x^i) - y^i\right)^2$
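The course's programming environment is Octave/MATLAB, but as a minimal sketch, the squared error cost can be written in NumPy like this (the toy data set below is made up for illustration):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared error cost J(theta0, theta1) for h(x) = theta0 + theta1*x."""
    m = len(y)                        # number of training examples
    h = theta0 + theta1 * x           # hypothesis evaluated on all examples at once
    return np.sum((h - y) ** 2) / (2 * m)

# Toy data generated by y = 1 + 2x, so (theta0, theta1) = (1, 2) gives zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(x, y, 1.0, 2.0))   # 0.0
print(compute_cost(x, y, 0.0, 0.0))   # 13.83..., a much worse fit
```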


Our goal is to minimize the cost function and find the global minimum

A contour plot (contour figure) can be used to visualize the cost function.

Parameter Learning

Gradient Descent

A general algorithm for minimizing a function. Repeat until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (simultaneously for $j = 0$ and $j = 1$), where $\alpha$ is the learning rate.

Intuition

  • Learning rate
  • Simultaneously update all parameters
    A common source of bugs: every parameter must be updated simultaneously, i.e. each partial derivative is evaluated at the old parameter values (see the sketch below).
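To make the pitfall concrete, here is a small NumPy sketch (the toy data and the helper `grads` are made up for illustration). The key point is that both gradients are computed from the old parameter values before either parameter is overwritten:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
alpha = 0.1
theta0, theta1 = 0.0, 0.0

def grads(theta0, theta1):
    """Partial derivatives of J with respect to theta0 and theta1."""
    err = (theta0 + theta1 * x) - y
    return err.mean(), (err * x).mean()

# Correct: evaluate BOTH gradients at the old parameters, then assign together.
g0, g1 = grads(theta0, theta1)
theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1

# Buggy version for contrast (do not do this):
#   theta0 = theta0 - alpha * grads(theta0, theta1)[0]
#   theta1 = theta1 - alpha * grads(theta0, theta1)[1]  # sees the NEW theta0
```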

Derivative

The (partial) derivative term: a positive slope decreases $\theta_1$, a negative slope increases it, so either way the update moves toward the minimum.

The derivative is the slope of the tangent line at the current point.

The Importance of the Learning Rate

  • A small learning rate can lead to slow convergence
  • A large learning rate can cause gradient descent to fail to converge, or even to diverge

If initialized at a local optimum, the derivative is zero, so gradient descent leaves the parameters unchanged.

A fixed learning rate is enough for the model to converge:

Gradient descent can converge to a local minimum even with a fixed learning rate, because the derivative term becomes smaller as it approaches the minimum, so the step size shrinks automatically.
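A one-dimensional sketch of this effect (my own toy example, not from the course): minimizing $f(x) = x^2$ with a fixed learning rate, the steps shrink on their own because the gradient $2x$ shrinks near the minimum at $x = 0$:

```python
x, alpha = 5.0, 0.1
for i in range(5):
    step = alpha * 2 * x          # derivative of x**2 is 2x
    x -= step
    print(f"iter {i}: step = {step:.4f}, x = {x:.4f}")
# Steps shrink (1.0, 0.8, 0.64, ...) even though alpha never changes.
```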


Gradient Descent for Linear Regression

Plugging the linear regression hypothesis into the update rule gives

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^i) - y^i\right)$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^i) - y^i\right) x^i$

Convex function: a bowl-shaped function

On a convex function, gradient descent with an appropriate learning rate always converges to the global minimum (there are no other local minima).

Following the trajectory, gradient descent reaches the global minimum.

The algorithm above is called Batch Gradient Descent:

each step of gradient descent uses all of the training examples (a NumPy sketch follows).
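Putting the pieces together, here is a minimal NumPy sketch of batch gradient descent for univariate linear regression (the toy data and hyperparameters are my own choices for illustration):

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.1, iters=1000):
    """Each update sums the error over ALL m training examples."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = (theta0 + theta1 * x) - y      # h_theta(x^i) - y^i for every i
        theta0 -= alpha * err.mean()         # alpha * (1/m) * sum(err)
        theta1 -= alpha * (err * x).mean()   # alpha * (1/m) * sum(err * x^i)
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])           # generated by y = 1 + 2x
print(batch_gradient_descent(x, y))          # approaches (1.0, 2.0)
```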

Linear Algebra Review

Matrix: a rectangular array of numbers. Its dimension is written (number of rows) × (number of columns).

Matrix elements (entries of a matrix): $A_{ij}$ is the entry in the $i$-th row and $j$-th column.

Vector: an n-by-1 matrix.
1-indexed vs. 0-indexed: two conventions for numbering entries; mathematical notation is usually 1-indexed.

Convention: uppercase letters (A, B, C) for matrices, lowercase letters (a, b, c) for vectors.
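As a quick NumPy illustration of these conventions (the arrays are arbitrary examples), note that NumPy itself is 0-indexed while the course's math notation is 1-indexed:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # a 3x2 matrix: 3 rows, 2 columns
print(A.shape)                  # (3, 2)

# Math notation is 1-indexed (A_11 = 1, A_32 = 6); NumPy is 0-indexed,
# so the entry A_ij lives at A[i-1, j-1]:
print(A[0, 0], A[2, 1])         # 1 6

v = np.array([[1], [2], [3]])   # a vector: an n-by-1 matrix
print(v.shape)                  # (3, 1)
```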

Addition and Scalar Multiplication
Matrices of the same dimensions are added element-wise; multiplying by a scalar multiplies every entry by that scalar.

Matrix-Vector Multiplication

An m×n matrix times an n×1 vector gives an m×1 vector: entry $i$ of the result is the dot product of row $i$ of the matrix with the vector.
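A short NumPy sketch (the house-size numbers and parameters below are illustrative, in the spirit of the course's housing example): one matrix-vector product computes the hypothesis for many inputs at once.

```python
import numpy as np

A = np.array([[1, 3],
              [4, 0],
              [2, 1]])           # 3x2 matrix
v = np.array([1, 5])             # length-2 vector

# Entry i of A @ v is the dot product of row i of A with v.
print(A @ v)                     # [16  4  7]

# Vectorized prediction: h(x) = theta0 + theta1*x for several house sizes
# in a single matrix-vector multiplication.
sizes = np.array([2104.0, 1416.0, 1534.0])
X = np.column_stack([np.ones_like(sizes), sizes])  # prepend a column of ones
theta = np.array([-40.0, 0.25])                    # illustrative parameters
print(X @ theta)                                   # one prediction per house
```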

Matrix-Matrix Multiplication

An m×n matrix times an n×p matrix gives an m×p matrix: column $j$ of the result is the first matrix times column $j$ of the second matrix.

Matrix Multiplication Properties

Not commutative:

$A * B \neq B * A$ (in general)

Associative:

$(A * B) * C = A * (B * C)$
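Both properties are easy to check numerically; a small NumPy demonstration with arbitrary matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])

# Not commutative: A @ B and B @ A generally differ.
print(A @ B)                                      # [[2 1] [4 3]]
print(B @ A)                                      # [[3 4] [1 2]]

# Associative: grouping does not matter.
print(np.array_equal((A @ B) @ C, A @ (B @ C)))   # True
```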

Inverse and Transpose

The inverse of $A$ is denoted $A^{-1}$ and satisfies $A A^{-1} = A^{-1} A = I$.

A non-square matrix does not have an inverse.

A square matrix that does not have an inverse is called singular or degenerate.

The transpose of $A$ is denoted $A^T$, where $(A^T)_{ij} = A_{ji}$.
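A NumPy sketch of both operations (the matrices are chosen arbitrarily); `np.linalg.inv` raises `LinAlgError` when given a singular matrix:

```python
import numpy as np

A = np.array([[4.0, 3.0],
              [1.0, 1.0]])

A_inv = np.linalg.inv(A)         # the inverse: A @ A_inv is the identity
print(A @ A_inv)                 # [[1. 0.] [0. 1.]] up to float rounding

print(A.T)                       # the transpose: (A^T)_ij = A_ji

# A singular (degenerate) matrix has no inverse.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])       # second row is 2x the first -> singular
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("no inverse:", err)
```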
