Machine Learning by Andrew Ng
💡 Study notes for Andrew Ng's Machine Learning course, Week 1
🐠 A compilation of all my study notes
✓ Course link: Stanford Machine Learning
🍭 References
Outline
Introduction
Machine Learning Definition
💡 ML Definition
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Types of Machine Learning Algorithms
- Supervised Learning
- Unsupervised Learning
- other: reinforcement learning (RL), recommender systems
Supervised Learning
Teach the machine to learn from data where the correct answers (labels) are given.
Types of supervised learning:
- Regression: predict a continuous value
- Classification: predict a discrete value
Regression
Classification
Unsupervised Learning
Ask the machine to find the structure of an unlabeled data set; it automatically discovers structure in the data.
The cocktail party problem
An unsupervised learning algorithm can separate the overlapping voices from different sources at a party.
Model and Cost Function
$(x^{(i)}, y^{(i)})$ denotes the $i$-th training example
Model Representation
$h$ stands for hypothesis: the function that maps inputs $x$ to predicted outputs
Univariate linear regression: $h_\theta(x) = \theta_0 + \theta_1 x$
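As a minimal sketch (the function name and example values are mine, not from the course), the univariate hypothesis can be written as:

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative parameter values
print(h(1.0, 2.0, 3.0))  # 1 + 2*3 = 7.0
```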
Cost Function
💡 Idea
Choose $\theta_0, \theta_1$ s.t. $h_\theta(x)$ is close to $y$
define the cost function as
Squared error cost function
our goal is
$$\min_{\theta_0, \theta_1} \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
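A minimal Python sketch of the squared error cost (function name and data are illustrative):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost: J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Data generated from y = 2x: a perfect fit gives zero cost
xs, ys = [1, 2, 3], [2, 4, 6]
print(compute_cost(0.0, 2.0, xs, ys))  # 0.0
print(compute_cost(0.0, 0.0, xs, ys))  # (4 + 16 + 36) / 6 ≈ 9.333
```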
Our goal is to minimize the cost function and find the global minimum
A contour plot/figure to visualize the cost function
Parameter Learning
Gradient descent 梯度下降法
a general algorithm for minimizing a function
Intuition
- Learning rate
- Simultaneously Update All Parameters
A common pitfall: all parameters must be updated simultaneously, not one after another.
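The simultaneous update can be sketched in Python (the parameter and gradient values here are made up for illustration; the gradients are not computed):

```python
alpha = 0.1
theta0, theta1 = 1.0, 2.0
d_theta0, d_theta1 = 0.5, -0.5   # illustrative gradient values, not computed here

# Correct: evaluate both update terms using the OLD parameter values,
# then assign together. Updating theta0 first and then using the new
# theta0 to update theta1 would be the common mistake.
temp0 = theta0 - alpha * d_theta0
temp1 = theta1 - alpha * d_theta1
theta0, theta1 = temp0, temp1
print(theta0, theta1)  # 0.95 2.05
```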
derivative
the (partial) derivative term
the slope of the tangent line
The importance of the learning rate:
- A small learning rate can lead to slow convergence
- A large learning rate can cause failure to converge, or even divergence
If initialized at a local optimum, the derivative is zero, so the parameters stay unchanged.
A fixed learning rate is sufficient for convergence:
Gradient descent can converge to a local minimum even with a fixed learning rate, because the derivative term becomes smaller as it approaches the local minimum, so the steps shrink automatically.
Gradient Descent for Linear Regression
convex function: a bowl-shaped function
With an appropriate learning rate, gradient descent on a convex function always converges to the global minimum (there are no local minima other than the global one).
following the trajectory, it reaches the global minimum
The above is called Batch Gradient Descent: each step of gradient descent uses all the training examples.
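A minimal Python sketch of batch gradient descent for univariate linear regression (function name, data, learning rate, and iteration count are illustrative, not from the course):

```python
def batch_gradient_descent(xs, ys, alpha=0.01, iters=1000):
    """Batch gradient descent for univariate linear regression.

    Each step uses ALL m training examples to compute the gradient of
    J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2).
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Data generated from y = 2x + 1; the fit should approach theta0 = 1, theta1 = 2
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
theta0, theta1 = batch_gradient_descent(xs, ys, alpha=0.05, iters=5000)
print(round(theta0, 3), round(theta1, 3))
```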
Linear Algebra Review
Matrix
Matrix elements (the entries of a matrix)
Vector: an n by 1 matrix
1-indexed vs 0-indexed (two notational conventions)
Uppercase letters for matrices: A, B, C
Lowercase letters for vectors: a, b, c
Addition and Scalar Multiplication
Matrix Vector Multiplication
Matrix Matrix Multiplication
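Assuming NumPy (my choice, not mentioned in the notes), matrix-vector and matrix-matrix multiplication look like:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # 3x2 matrix
v = np.array([7, 8])                     # 2-vector

# Matrix-vector: (3x2) @ (2,) -> (3,)
print(A @ v)          # [1*7+2*8, 3*7+4*8, 5*7+6*8] = [23 53 83]

B = np.array([[1, 0], [0, 1]])           # 2x2 identity
# Matrix-matrix: (3x2) @ (2x2) -> (3x2); multiplying by I leaves A unchanged
print(A @ B)
```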
Matrix Multiplication Properties
not commutative: $A B \neq B A$ in general
associative: $(A B) C = A (B C)$
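These two properties can be checked numerically; NumPy and the example matrices are my assumptions:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 2]])

# Not commutative: A @ B and B @ A generally differ
print(np.array_equal(A @ B, B @ A))            # False

# Associative: grouping does not matter
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True
```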
Inverse And Transpose
The inverse of $A$ is denoted $A^{-1}$.
A non-square matrix does not have an inverse.
A square matrix that has no inverse is called singular or degenerate.
The transpose of $A$ is denoted $A^T$.
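A NumPy sketch of inverse and transpose (NumPy and the example matrices are my assumptions); note that attempting to invert a singular matrix raises an error:

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])   # invertible: det(A) = 10

A_inv = np.linalg.inv(A)   # inverse: A @ A_inv is the identity
A_T = A.T                  # transpose: rows become columns

print(np.allclose(A @ A_inv, np.eye(2)))  # True

# A singular (non-invertible) matrix raises LinAlgError
singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # rows are linearly dependent
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError:
    print("singular matrix has no inverse")
```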