Machine Learning Simple Notes
(Some basic notes…)
Basics
Machine Learning
- Model + Evaluation (criterion) + Optimization (algorithm) + Validation
- Use datasets D to learn a specific model G from the model space (hypothesis space) H,
so that G is close to the best model F in H.
ML Problem Classification
- output space
- Classification
- binary-class
- multi-class
- one-v-all: k binary classifiers
- one-v-one: train a binary classifier for each pair of classes,
k(k-1)/2 SVM binary classifiers
- softmax regression (multiclass logistic regression; see the sketch after this outline)
- structure(sentence classification)
- Regression
- Clustering
- data input space
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
- algorithms space
- Parametric
- Non-parametric
- K-nearest Neighbors
- Kernel Estimation
- Locally Weighted Linear Regression
- Semi-parametric
- learning with different protocols
- batch learning
- online learning
- active learning
- features input space
- concrete features
- raw features
- abstract features
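The softmax view referenced in the outline above, as a minimal NumPy sketch; the weight matrix W and input x are made-up illustrative values:

```python
import numpy as np
from itertools import combinations

def softmax(z):
    # Subtract the max before exponentiating, for numerical stability.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical example: 3 classes, 2 features, one input point.
W = np.array([[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]])  # one weight row per class
x = np.array([0.5, 1.5])
print(softmax(W @ x))                  # class probabilities summing to 1

# one-v-one needs k*(k-1)/2 binary classifiers: for k = 4, the 6 class pairs are
print(list(combinations(range(4), 2)))
```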
Theories
Describes learning feasibility and the learning process (PAC learning theory)
- finite hypothesis space H
- infinite hypothesis space H
- dichotomy
(a specific labeling of the N input samples), denoted
H(S1, S2, S3, …, SN) (a family of hypotheses; every hypothesis in the same family realizes the same dichotomy)
- growth function
(maximum number of dichotomies for a specific hypothesis space H and N samples), denoted mH(N)
- bound function
(maximum growth function for a specific break point k and N samples across all H),
denoted B(N,k); B(N,k) <= B(N-1,k) + B(N-1,k-1), which is polynomial in N of order N^(k-1) = N^(Dvc)
- Essentially, because the sample set is sparse and discrete, only some families of hypotheses in an
infinite hypothesis space separate the samples in distinct ways, and every hypothesis within one family
separates this sample set identically. In the Hoeffding inequality, the bad events |E(in,h)-E(out,h)| > epsilon
for hypotheses in the same family overlap (they are not independent): they happen simultaneously, so the
union of P(|E(in,h)-E(out,h)| > epsilon) over all hypotheses does not grow without bound. The M on the
right-hand side, 2*M*e^{-2*epsilon^2*N}, therefore degenerates to a finite number bounded by the sample
size N, which reduces the problem to the finite case. In the finite case we have the conclusion: given a
large enough N, learning is Probably Approximately Correct (PAC):
P(some h in H has |E(in,h)-E(out,h)| > epsilon) < 2*M*e^{-2*epsilon^2*N},
which theoretically establishes that learning is feasible and correct, i.e., the model really learns and improves.
- Empirical Risk Minimization
- VC bound
- for any h in H:
P(|E(in,h)-E(out,h)| > epsilon) <= 4*mH(2N)*exp(-(1/8)*epsilon^2*N) <= 4*(2N)^(k-1)*exp(-(1/8)*epsilon^2*N)
- VC dimension: the largest number of samples H can shatter (maximum non-break point = minimum break point - 1)
- Dvc is not infinite (a break point exists), so given a large enough N the Hoeffding inequality
holds under PAC conditions: within a given error, E(in) estimates E(out), i.e., the input
datasets can be used to estimate generalization.
- The hypothesis space H can shatter at most Dvc samples, i.e., realize every labeling of a sample
set of that size; this measures the learning complexity of the hypothesis space.
- Relation between the VC dimension and the number of input features (for the perceptron):
Dvc = d (number of features) + 1
- Proof idea: Dvc <= d+1 && Dvc >= d+1
- Dvc <= d+1 <=>
no set of d+2 inputs can be shattered <=>
for some labeling Y there is no W such that XW = Y <=>
since X is (d+2)x(d+1), any d+2 input vectors in d+1 dimensions are linearly dependent, so the
number of equations exceeds the number of free variables in W and some Y admits no solution W
- Dvc >= d+1 <=>
some set of d+1 inputs can be shattered <=>
for every labeling Y there exists W such that XW = Y <=>
since X is (d+1)x(d+1) and can be constructed to be invertible, W = inv(X)Y always exists
- Measures the learning complexity (degrees of freedom) of the hypothesis space, or, from another
angle, the learnability of the sample set
- Relation between the VC dimension, the sample size N, and the feature dimension
- with probability 1-a:
E(out,g) <= E(in,g) + sqrt((8/N)*ln[4*((2N)^Dvc)/a]) = E(in,g) + Omega(N,H,a) (model complexity)
- sample complexity N: theory suggests N ~ 10000*Dvc, but in practice N ~ 10*Dvc is often enough
(see the numeric sketch below)
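A small sketch (an illustration added here, not from the original notes) that tabulates the bound function B(N,k) via the recursion above; the recursion computed as an equality is known to be tight, and the printout compares it with the closed form sum over i < k of C(N,i) and the polynomial scale N^(k-1):

```python
from functools import lru_cache
from math import comb

# Bound function B(N, k): the maximum number of dichotomies on N points
# when k is a break point. Base cases: B(N, 1) = 1 and B(1, k) = 2 for k >= 2.
@lru_cache(maxsize=None)
def B(N, k):
    if k == 1:
        return 1
    if N == 1:
        return 2
    # Recursion from the notes: B(N, k) <= B(N-1, k) + B(N-1, k-1)
    # (computed here as an equality, which is known to be tight).
    return B(N - 1, k) + B(N - 1, k - 1)

for N in (5, 10, 20):
    for k in (2, 3, 4):
        closed = sum(comb(N, i) for i in range(k))  # closed form: sum_{i<k} C(N, i)
        print(N, k, B(N, k), closed, N ** (k - 1))  # B matches the closed form; poly scale N^(k-1)
```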
Bias and Variance(Underfitting and Overfitting) trade-off
- underfitting
- overfitting
- datasets too small
- noise too large (stochastic noise and deterministic noise [depends on H])
- model too complex (VC dimension too big)
- Regularization
- Essence: by constraining the feature weights w, regularization reduces the complexity (the
effective dimension) of the hypothesis space that must be searched, so a highly complex model
incurs less deterministic noise and a better trade-off is reached (see the ridge sketch below);
from a Bayesian view, it amounts to adding prior knowledge (a prior probability) and then
maximizing the posterior probability.
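A minimal sketch of this idea as L2 regularization (ridge regression) in closed form; the synthetic data and the penalty values lam are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y.
    # The penalty lam constrains the weights w, shrinking the effective hypothesis space.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in (0.0, 1.0, 100.0):
    print(lam, np.round(ridge(X, y, lam), 3))  # larger lam => smaller weights
```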
- Training Error(Risk)
- Model Selection
- Feature Selection
- Cross Validation
- Model Metrics
- accuracy, precision, recall
- accuracy = (TP+TN)/(TP+TN+FP+FN) ==> fraction of samples classified correctly
- precision = TP/(TP+FP) ==> measures how rarely negatives are misclassified as positive (fraud detection)
- recall = TP/(TP+FN) ==> measures the ability to find all positive samples in the dataset
- confusion_matrix
- Rows represent the predicted class, columns the actual class; A(i,j) is the number of samples
predicted as class i whose true class is j (conventions vary across libraries)
- f1-measure = 2*(precision*recall)/(precision+recall) [f-measure with a=1]
- f-measure = (a^2+1)*(precision*recall)/(a^2*precision+recall)
- jointly accounts for the effects of precision and recall
- ROC and AUC ==> imbalanced datasets
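A short sketch computing the metrics above from raw counts; the TP/TN/FP/FN values are hypothetical:

```python
# Hypothetical binary-classification counts.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # fraction of samples classified correctly
precision = TP / (TP + FP)                    # how rarely negatives are flagged as positive
recall    = TP / (TP + FN)                    # how many of the true positives were found

def f_measure(p, r, a=1.0):
    # General F-measure; a = 1 gives the usual F1 score.
    return (a**2 + 1) * p * r / (a**2 * p + r)

print(accuracy, precision, recall, f_measure(precision, recall))
```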
Algorithms(Models)
Supervised Learning
- 1.Least Squares Method(LSM)
- 2.Logistic Regression(LR)
- 3.Perceptron
- 4.Naive Bayes(NB)
- 5.Support Vector Machine(SVM)
- 6.Decision Tree
- 7.K-Nearest Neighbors(KNN)
- 8.Linear Discriminant Analysis(LDA)
- 9.Ensemble Methods
- (1)Boosting : Gradient Boosted Decision[Regression] Trees(GBRT[GBDT])
- (2)Bagging : Random Forests
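As a taste of the list above, a minimal Perceptron Learning Algorithm (PLA) sketch, in the spirit of the referenced course; the linearly separable data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 2 * X[:, 1] + 0.5)      # separable labels from a known line
X = np.hstack([np.ones((100, 1)), X])         # prepend the bias feature x0 = 1

w = np.zeros(3)
for _ in range(1000):                         # iteration cap as a safety net
    mistakes = np.where(np.sign(X @ w) != y)[0]
    if len(mistakes) == 0:
        break
    i = mistakes[0]
    w += y[i] * X[i]                          # PLA update: fix one misclassified point
print(w, "training error:", np.mean(np.sign(X @ w) != y))
```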
Unsupervised Learning
Clustering / Dimension Reduction / Density Estimation
- 1.KMeans Clustering
- 2.Hierarchical Clustering
- 3.Expectation Maximization(EM)
- 4.Gaussian Mixture Models(GMM)
- 5.Density-Based Spatial Clustering of Applications with Noise(DBSCAN)
- 6.Mean Shift
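A minimal NumPy sketch of item 1 (k-means); the two Gaussian blobs and k = 2 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])  # two made-up blobs

k = 2
centers = X[rng.choice(len(X), k, replace=False)]      # random initial centers
for _ in range(100):
    # Assignment step: each point joins its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points.
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers
print(centers)
```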
Others
- 1.Artificial Neural Network and Deep Learning
- 2.Dimension Reduction
- (1)PCA/Kernel PCA
- (2)Matrix Factorization/SVD
- 3.Gaussian Process
- 4.Bayesian Network and Graphical Models
- 5.LDA(Latent Dirichlet Allocation)
- 6.PageRank
- 7.Apriori
- 8.Empirical Risk Minimization(ERM)
Techniques
This section covers some theoretical aspects and techniques used in machine learning.
- Normalization
- Principal Components Analysis
- Singular Value Decomposition
- Matrix Factorization
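A short sketch tying Normalization, PCA, and SVD together: center the data, then read the principal components off the SVD; the anisotropic synthetic data is an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])  # stretched synthetic data

Xc = X - X.mean(axis=0)                  # normalization step: center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                        # project onto the top-2 principal components
explained = S**2 / np.sum(S**2)          # variance ratio captured per component
print(Z.shape, np.round(explained, 3))
```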
imbalanced datasets
- oversampling and undersampling
- different metrics such as AUC
- ensemble different algorithms
- cost matrix
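A minimal random-oversampling sketch for the first remedy in the list; the 10:1 class ratio is invented:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(110, 2))
y = np.array([0] * 100 + [1] * 10)       # 10:1 imbalanced labels

minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=90, replace=True)  # resample minority with replacement
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y), "->", np.bincount(y_bal))      # 100:10 becomes 100:100
```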
nonlinear transformation for linear models
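A sketch of the idea: map the inputs through a nonlinear feature transform phi, then fit an ordinary linear model in phi-space; the quadratic transform and synthetic target are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, size=100)
y = 1.0 - 2.0 * x**2 + 0.1 * rng.normal(size=100)  # nonlinear target

# Nonlinear transform phi(x) = (1, x, x^2); plain least squares in phi-space.
Phi = np.stack([np.ones_like(x), x, x**2], axis=1)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(w, 2))  # recovers roughly (1, 0, -2)
```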
The essence of Lagrange multipliers
- min{ f(x) }, s.t. g(x) <= 0
- L(x) = f(x) + C*g(x) with C > 0 (for an active constraint)
- L'(x) = f'(x) + C*g'(x) = 0 ==> suppose a level curve of f touches g at the minimizer x0;
then the normal vectors of f and g at x0 are collinear, i.e., the gradient of f has no component
left along the tangent of the constraint curve g, so gradient descent stops there
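A numeric sanity check of the picture above, a sketch only: the objective and constraint are made up, and scipy's SLSQP uses the convention fun(x) >= 0 for inequality constraints, hence the sign flip:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**2 + (x[1] - 2)**2   # objective to minimize
g = lambda x: x[0] + x[1] - 1                 # constraint g(x) <= 0

# SLSQP expects inequality constraints as fun(x) >= 0, so pass -g.
res = minimize(f, x0=[0.0, 0.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: -g(x)}])
xs = res.x
grad_f = 2 * (xs - 2)                         # gradient of f at the optimum (0.5, 0.5)
grad_g = np.array([1.0, 1.0])                 # gradient of g (constant)
print(xs, grad_f, grad_f / grad_g)            # grad_f = -C * grad_g with C = 3 > 0: collinear
```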
ref: Machine Learning Foundations course (机器学习基石, NTU)