My GitHub link - course-related code:
0. Introduction
- Machine Learning: define a set of functions, define the goodness of a function, pick the best function
- Regression outputs a scalar; Classification outputs either (1) yes or no (Binary Classification) or (2) one of several classes (Multi-class Classification)
- Choosing a different function set means choosing a different model. The simplest model is the linear model; there are also many nonlinear models, such as deep learning, SVM, decision tree, kNN... All of the above are supervised learning, which requires collecting a lot of training data
- Semi-supervised Learning - some of the data has labels and some does not
- Transfer Learning - data not related to the task considered
- Unsupervised Learning
- Structured Learning - Beyond Classification (the output is a structured object)
- Reinforcement Learning - no supervised guidance, only a score that says how good or bad the result was (learning from critics)
- Color coding: blue: scenario; red: task (the problem to solve); green: method.
1. Regression
- output a scalar
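- Step1: Model - $w$ and $b$ are parameters ($w$: weight, $b$: bias)
- Linear Model: $y = b + \sum_i w_i x_i$
- Step2: Goodness of Function - Loss function $L$: input is a function, output is how bad it is
- first version of Loss Function: $L(f) = \sum_n \left(\hat{y}^n - f(x^n)\right)^2$, or $L(w, b) = \sum_n \left(\hat{y}^n - (b + w \cdot x^n)\right)^2$, where $\hat{y}^n$ is the true output for the $n$-th training example $x^n$
- Step3: Best Function - $f^* = \arg\min_f L(f)$, i.e. $w^*, b^* = \arg\min_{w,b} L(w, b)$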
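- Gradient Descent - usable whenever the loss function is differentiable with respect to its parameters; the model does not have to be linear
- For one parameter $w$: (1) Pick an initial value $w^0$; (2) Compute $\frac{dL}{dw}\big|_{w=w^0}$; (3) Update $w^1 \leftarrow w^0 - \eta \frac{dL}{dw}\big|_{w=w^0}$, where $\eta$ is the learning rate. Repeat this step until the gradient is equal to zero.
- For two parameters: $w^*, b^* = \arg\min_{w,b} L(w, b)$; (1) Pick initial values $w^0, b^0$; (2) Compute $\frac{\partial L}{\partial w}\big|_{w=w^0, b=b^0}$ and $\frac{\partial L}{\partial b}\big|_{w=w^0, b=b^0}$; (3) Update $w^1 \leftarrow w^0 - \eta \frac{\partial L}{\partial w}\big|_{w=w^0, b=b^0}$ and $b^1 \leftarrow b^0 - \eta \frac{\partial L}{\partial b}\big|_{w=w^0, b=b^0}$. Repeat this step until the gradient is equal to zero.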
- The result obtained with the method above satisfies:
- Drawback of gradient descent: it may get stuck at a saddle point or a local minimum
- For linear regression the loss function is convex, so this drawback does not apply.
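- Linear Regression - Gradient descent formula summary: for $L(w, b) = \sum_n \left(\hat{y}^n - (b + w \cdot x^n)\right)^2$, $\frac{\partial L}{\partial w} = \sum_n 2\left(\hat{y}^n - (b + w \cdot x^n)\right)(-x^n)$ and $\frac{\partial L}{\partial b} = \sum_n 2\left(\hat{y}^n - (b + w \cdot x^n)\right)(-1)$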
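- A more complex model does not necessarily perform better on test data; it may be overfitting
- Remedies for overfitting: 1. enlarge the input data set 2. regularization
- Regularization
- We want a function that not only has a small loss but is also smooth (regularization makes the function smoother because the weights $w$ are kept small) - a smoother function is more likely to be correct
- The larger $\lambda$ (the weight of the regularization term, e.g. $\lambda \sum_i w_i^2$ added to the loss), the smoother the function found; the smaller $\lambda$, the less smooth. As $\lambda$ goes from small to large, the function has to minimize the weights as well as the loss, so relatively less emphasis is placed on the training error than without regularization, and the training error therefore increases with $\lambda$. The test error first decreases and then increases.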
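The gradient descent steps and the regularization idea above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the notes: the data is synthetic, the penalty is taken to be `lam * w**2` on the weight only (the bias is left unregularized), and the learning rate `lr` and step count are arbitrary choices that happen to converge for this data.

```python
import numpy as np

def fit_linear_gd(x, y, lam=0.0, lr=0.01, steps=5000):
    """Fit y ~ b + w*x by gradient descent on
    sum_n (y_n - (b + w*x_n))**2 + lam * w**2 (assumed penalty form)."""
    w, b = 0.0, 0.0  # initial values w^0, b^0
    for _ in range(steps):
        residual = y - (b + w * x)
        # Partial derivatives of the (regularized) loss.
        grad_w = np.sum(2.0 * residual * (-x)) + 2.0 * lam * w
        grad_b = np.sum(2.0 * residual * (-1.0))  # bias is not regularized
        # Gradient descent update with learning rate lr.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def training_error(x, y, w, b):
    """Sum of squared errors on the training data (without the penalty)."""
    return float(np.sum((y - (b + w * x)) ** 2))

# Synthetic data (hypothetical): y = 3x + 0.5 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
y = 3.0 * x + 0.5 + rng.normal(0.0, 0.1, size=20)

for lam in (0.0, 1.0, 10.0):
    w, b = fit_linear_gd(x, y, lam=lam)
    print(f"lam={lam:5.1f}  w={w:.3f}  b={b:.3f}  "
          f"train_error={training_error(x, y, w, b):.3f}")
```

Running the loop with increasing `lam` should show the weight shrinking and the training error growing, which is the trend described above; the test-error curve is not reproduced here.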
2. Error
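- Bias: how far the average of the trained functions, $\bar{f} = E[f^*]$, is from the true function $\hat{f}$; Variance: how much each trained $f^*$ scatters around $\bar{f}$; want low bias & low variance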
- With low-degree (simple) models the variance is small, while complex models lead to large variance. A simple model is less affected by the particular data sampled
- Bias: if we average all the $f^*$, is the average close to $\hat{f}$? Here $f^*$ is the best function (model) found in each training run (each run uses a sample of many data points), and $\hat{f}$ is the true function (model).
- Simple models have larger bias & smaller variance, while complex models have smaller bias & larger variance.
- If the error comes mainly from large variance, the model is overfitting; if it comes mainly from large bias, the model is underfitting
- If the model cannot fit the training data, the bias is large; if the model fits the training data well but fits the test data poorly, the variance is large
- For large bias: add more features, or use a more complex model
- For large variance: get more data, or use regularization (all the fitted curves become smoother)
- Cross Validation: Training Set, Validation Set, Testing Set (Public, Private)
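- N-fold Cross Validation: first split the data into a training set and a validation set; the training set is used to train the models and the validation set is used to choose among them. In the N-fold version the split is rotated so that each of the N folds serves once as the validation set, and the validation errors are averaged. After a model is chosen, retrain its parameters on the whole data set (training set + validation set)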
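The model-selection procedure above can be sketched as follows. This is a minimal illustration, not the course's code: the data is synthetic, the candidate models are polynomials of a few arbitrary degrees, and `kfold_indices` and `cv_error` are hypothetical helper names introduced here.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def cv_error(x, y, degree, folds):
    """Average validation MSE of a polynomial of the given degree,
    using each fold once as the validation set."""
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data (hypothetical): a sine curve plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.1, size=60)

folds = kfold_indices(len(x), k=3, rng=rng)
degrees = [1, 3, 9]  # candidate models, from simple to complex
scores = {d: cv_error(x, y, d, folds) for d in degrees}
best_degree = min(scores, key=scores.get)

# After choosing the model, retrain it on the whole data set.
final_coeffs = np.polyfit(x, y, best_degree)
print(scores, best_degree)
```

The held-out test set is deliberately left out of this sketch; the point of the procedure is that it is only touched once, after the model has been chosen.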
3. Gradient Descent
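- $L$: loss function, $\theta$: parameters
- Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$; then:
- $\theta^0 = \begin{bmatrix} \theta_1^0 \\ \theta_2^0 \end{bmatrix}$; $\nabla L(\theta) = \begin{bmatrix} \partial L(\theta) / \partial \theta_1 \\ \partial L(\theta) / \partial \theta_2 \end{bmatrix}$; $\theta^1 = \theta^0 - \eta \nabla L(\theta^0)$, where $\eta$ is the learning rate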
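As a small worked example of the vector form of the update, the sketch below runs gradient descent on a hand-picked two-variable loss. The loss $L(\theta) = (\theta_1 - 3)^2 + 2(\theta_2 + 1)^2$, the learning rate `eta`, and the step count are all assumptions chosen for illustration; they do not come from the notes.

```python
import numpy as np

def loss(theta):
    """Hypothetical convex loss with its minimum at theta = (3, -1)."""
    return (theta[0] - 3.0) ** 2 + 2.0 * (theta[1] + 1.0) ** 2

def grad(theta):
    """Gradient vector [dL/dtheta_1, dL/dtheta_2] of the loss above."""
    return np.array([2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)])

def gradient_descent(theta0, eta=0.1, steps=200):
    """Repeat theta <- theta - eta * grad(theta), recording the loss."""
    theta = np.asarray(theta0, dtype=float)
    history = [loss(theta)]
    for _ in range(steps):
        theta = theta - eta * grad(theta)
        history.append(loss(theta))
    return theta, history

theta_star, history = gradient_descent([0.0, 0.0])
print(theta_star, history[0], history[-1])
```

With this loss and this `eta` the recorded loss shrinks at every step, but that is a property of the chosen values: with a learning rate that is too large, the loss can grow instead.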