Machine Learning Simple Notes

(some basic notes…)

Basics
Machine Learning
  • Model + Evaluation (criterion) + Optimization (algorithm) + Validation
  • Use a dataset D to learn a specific model G from the model space (hypothesis space) H,
    so that G is close to the best model F in H.
ML Problem Classification
  • output space
    • Classification
      • binary-class
      • multi-class
        • one-vs-all: k binary classifiers
        • one-vs-one: train a binary classifier for each pair of classes,
          k(k-1)/2 binary classifiers (e.g., SVMs)
        • softmax regression (multiclass logistic regression); see the
          sketch after this list
      • structured output (e.g., sentence classification)
    • Regression
    • Clustering
  • data input space
    • Supervised Learning
    • Unsupervised Learning
    • Semi-supervised Learning
    • Reinforcement Learning
  • algorithms space
    • Parametric
    • Non-parametric
      • K-nearest Neighbors
      • Kernel Estimation
      • Locally Weighted Linear Regression
    • Semi-parametric
  • learning with different protocols
    • batch learning
    • online learning
    • active learning
  • features input space
    • concrete features
    • raw features
    • abstract features
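To make the softmax reduction referenced above concrete, here is a minimal numpy sketch of the prediction step; W, b, and the toy sizes are illustrative, untrained values. One-vs-all would instead train k independent binary classifiers and predict with the highest-scoring one.

```python
import numpy as np

# Minimal sketch of softmax prediction: k linear scores are turned
# into class probabilities. W and b are illustrative, not trained.
def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
k, d = 3, 4                      # 3 classes, 4 features (toy sizes)
W, b = rng.normal(size=(k, d)), np.zeros(k)
x = rng.normal(size=d)

p = softmax(W @ x + b)           # one probability per class, sums to 1
print(p, p.argmax())             # predicted class = argmax of p
```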
Theories
  • Describes the feasibility of learning and the learning process (PAC learning theory)

    • finite hypothesis space H
    • infinite hypothesis space H
      • dichotomy
        a specific labeling pattern over the N input samples, denoted
        H(S1, S2, S3, …, SN); all hypotheses that label the sample set the
        same way belong to the same dichotomy (one equivalence class of
        hypothesis functions)
      • growth function
        the maximum number of dichotomies a hypothesis space H can realize
        on N samples, denoted mH(N)
      • bound function
        the maximum possible growth function on N samples over all H with
        break point k, denoted B(N,k); it satisfies
        B(N,k) <= B(N-1,k) + B(N-1,k-1), which bounds mH(N) by a polynomial
        of degree k-1 = Dvc in N, i.e., O(N^(k-1)) = O(N^Dvc)
      • Essentially, because the sample set is finite and discrete, only
        certain equivalence classes of hypotheses in an infinite hypothesis
        space separate the samples differently, and every hypothesis within
        one class separates this sample set identically. In the Hoeffding
        inequality, the bad events |E(in,h)-E(out,h)| > epsilon for
        hypotheses in the same class therefore coincide (they are not
        independent), so the union of bad events over all of H does not grow
        without bound: the factor M on the right-hand side of
        2*M*exp(-2*epsilon^2*N) degenerates to a finite count limited by the
        sample size N. This reduces the problem to the finite case, where,
        given a large enough N, learning is Probably Approximately Correct
        (PAC):
        P(exists h in H with |E(in,h)-E(out,h)| > epsilon) < 2*M*exp(-2*epsilon^2*N),
        which theoretically establishes that learning is feasible and
        correct, i.e., the model really has learned and improved.
    • Empirical Risk Minimization
    • VC bound
      • for any h in H:
        P(|E(in,h)-E(out,h)| > epsilon) <= 4*mH(2N)*exp(-(1/8)*epsilon^2*N)
        <= 4*(2N)^(k-1)*exp(-(1/8)*epsilon^2*N)
    • VC dimension: the largest number of samples H can shatter (i.e., the
      minimum break point - 1)
      • Dvc is finite (a break point exists), so given a large enough N the
        Hoeffding inequality holds under the PAC condition; i.e., within a
        given error, E(in) estimates E(out), so the input dataset can be
        used to estimate generalization
      • A hypothesis space H can shatter at most Dvc samples, i.e., realize
        every possible labeling of such a sample set, which measures the
        learning complexity of the hypothesis space
      • Relation between the VC dimension and the number of input features
        (for the linear perceptron): Dvc = d + 1, where d is the number of
        features
        • Proof idea: show Dvc <= d+1 and Dvc >= d+1
        • Dvc <= d+1 <=>
          no set of d+2 inputs can be shattered <=>
          for some labeling Y, no W exists such that WX = Y <=>
          since X is (d+2) x (d+1), the d+2 input vectors must be linearly
          dependent; the dependence fixes the label of one input once the
          others are chosen, so that labeling Y has no solution W
        • Dvc >= d+1 <=>
          some set of d+1 inputs can be shattered <=>
          for every labeling Y there exists W such that WX = Y <=>
          X can be constructed as an invertible (d+1) x (d+1) matrix, so
          W = inv(X)Y always exists
      • Measures the learning complexity (degrees of freedom) of the
        hypothesis space; equivalently, from the other side, the learning
        capacity available on the sample set
      • Relation between the VC dimension, the sample size N, and the
        feature dimension
      • with probability 1 - a:
        E(out,g) <= E(in,g) + sqrt((8/N)*ln(4*((2N)^Dvc)/a)) = E(in,g) + Omega(N,H,a)
        (Omega is the model-complexity penalty; see the numeric sketch
        after this list)
      • sample complexity: how large N must be before the bound becomes
        useful
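As referenced above, a small numeric sketch of the complexity penalty Omega(N, H, a); dvc = 3 and a = 0.1 are illustrative choices.

```python
import numpy as np

# Omega(N, H, a) = sqrt(8/N * ln(4 * (2N)^Dvc / a)), the penalty term
# in the VC bound quoted above; dvc = 3 and a = 0.1 are illustrative.
def vc_penalty(N, dvc, a):
    return np.sqrt(8.0 / N * np.log(4.0 * (2.0 * N) ** dvc / a))

for N in (100, 1_000, 10_000, 100_000):
    # the penalty shrinks roughly like sqrt(ln(N)/N) as N grows
    print(N, round(vc_penalty(N, dvc=3, a=0.1), 4))
```

Even for a small Dvc the penalty stays large until N reaches the tens of thousands, which is what the sample-complexity item refers to.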
  • Bias and Variance (Underfitting and Overfitting) trade-off

    • underfitting
    • overfitting
      • dataset too small
      • noise too large (stochastic noise and deterministic noise [the
        latter depends on H])
      • model too complex (VC dimension too large)
  • Regularization
    • Essence: by constraining the feature weights w, regularization reduces
      the complexity (effective dimension) of the hypothesis space that must
      be searched, so a nominally complex model incurs less deterministic
      noise and a better trade-off is obtained. From the Bayesian view, it
      is equivalent to adding prior knowledge (a prior distribution on w)
      and then maximizing the posterior probability; see the ridge
      regression sketch after this item.
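As a concrete instance of the constraint on w, here is a minimal ridge regression (L2) sketch; the toy data and lam values are illustrative.

```python
import numpy as np

# Ridge regression sketch: minimize ||Xw - y||^2 + lam*||w||^2.
# Closed form: w = (X^T X + lam*I)^{-1} X^T y. A larger lam shrinks w,
# i.e., restricts the hypothesis space; lam values are illustrative.
def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=50)

print(ridge(X, y, lam=0.0))    # ordinary least squares
print(ridge(X, y, lam=10.0))   # coefficients shrunk toward zero
```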
  • Training Error(Risk)
  • Model Selection
  • Feature Selection
  • Cross Validation
  • Model Metrics
    • accuracy,precision,recall
      • accuracy = (TP+TN)/(TP+TN+FP+FN) ==> fraction of samples classified
        correctly
      • precision = TP/(TP+FP) ==> measures how rarely negative samples are
        misclassified as positive (e.g., fraud detection)
      • recall = TP/(TP+FN) ==> measures the ability to recover all positive
        samples from the dataset
    • confusion_matrix
      • rows index the predicted class and columns the true class; A(i,j) is
        the number of samples predicted as class i whose true class is j
    • F1-measure = 2*(precision*recall)/(precision+recall) [the F-measure
      with a = 1]
      • F-measure = (a^2+1)*(precision*recall)/(a^2*precision + recall)
      • combines the effects of precision and recall in a single score
    • ROC and AUC ==> useful for imbalanced datasets; see the metrics
      sketch after this list
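A tiny sketch computing the metrics above from illustrative confusion-matrix counts.

```python
# Binary-classification metrics from illustrative confusion-matrix counts.
tp, fp, tn, fn = 40, 10, 45, 5

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # fraction classified correctly
precision = tp / (tp + fp)   # of predicted positives, how many are real
recall    = tp / (tp + fn)   # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)   # -> 0.85, 0.8, 0.888..., 0.842...
```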
Algorithms(Models)
Supervised Learning
  • 1.Least Squares Method (LSM)
  • 2.Logistic Regression(LR)
  • 3.Perceptron
  • 4.Naive Bayes(NB)
  • 5.Support Vector Machine(SVM)
  • 6.Decision Tree
  • 7.Neighbors
  • 8.Linear Discriminant Analysis (LDA)
  • 9.Ensemble Methods
    • (1)Boosting : Gradient Boosted Decision/Regression Trees (GBDT/GBRT)
    • (2)Bagging : Random Forests
Unsupervised Learning

Clustering / Dimension Reduction / Density Estimation
- 1.KMeans Clustering
- 2.Hierarchical Clustering
- 3.Expectation Maximization (EM)
- 4.Gaussian Mixture Models(GMM)
- 5.Density-Based Spatial Clustering of Applications with Noise(DBSCAN)
- 6.Mean Shift

Others
  • 1.Artificial Neural Network and Deep Learning
  • 2.Dimension Reduction
    • (1)PCA/Kernel PCA
    • (2)Matrix Factorization/SVD
  • 3.Gaussian Process
  • 4.Bayesian Network and Graphical Models
  • 5.LDA
  • 6.PageRank
  • 7.Apriori
  • 8.Empirical Risk Minimization(ERM)
Techniques

This section covers some theoretical aspects of, and techniques used in,
machine learning.
- Normalization
- Principal Component Analysis
- Singular Value Decomposition
- Matrix Factorization
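A minimal sketch tying three of these items together: PCA computed via SVD after mean normalization; the toy data and k = 2 are illustrative choices.

```python
import numpy as np

# PCA via SVD sketch: center the data (the normalization step), take the
# top-k right singular vectors as principal directions, then project.
def pca(X, k):
    Xc = X - X.mean(axis=0)                     # zero-mean each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                        # top-k coordinates

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(pca(X, k=2).shape)                        # (100, 2)
```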

  • imbalanced datasets (see the oversampling sketch after this list)

    • oversampling and undersampling
    • different metrics such as AUC
    • ensemble different algorithms
    • cost matrix
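As referenced above, a minimal random-oversampling sketch; the minority label and toy data are assumptions for illustration.

```python
import numpy as np

# Random oversampling sketch: duplicate minority-class rows until the two
# classes are balanced. The minority label and toy data are assumptions.
def oversample(X, y, minority=1, seed=0):
    idx = np.flatnonzero(y == minority)
    need = int((y != minority).sum() - idx.size)
    extra = np.random.default_rng(seed).choice(idx, size=need, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
Xb, yb = oversample(X, y)
print(np.bincount(yb))                          # [8 8]
```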
  • nonlinear transformation for linear models (see the sketch below)
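A minimal sketch of this idea, assuming a quadratic target for illustration: the transform makes the target linear in the new features.

```python
import numpy as np

# Nonlinear transform sketch: mapping x -> (x, x^2) lets a linear model
# fit a target that is quadratic in the original input. Toy data.
def quadratic_features(X):
    return np.hstack([X, X ** 2])

X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = X[:, 0] ** 2                   # nonlinear in x, linear in (x, x^2)
Z = np.hstack([np.ones((len(X), 1)), quadratic_features(X)])
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(np.round(w, 6))              # ~[0, 0, 1]: recovers y = x^2
```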

  • The essence of the Lagrange multiplier method

    • min{ f(x) } , s.t. g(x) <= 0
    • L(x) = f(x) + C*g(x) with C > 0
    • L'(x) = f'(x) + C*g'(x) = 0 ==> suppose a level curve of f touches the
      constraint boundary g(x) = 0 at the minimizer x0; then the normals
      (gradients) of f and g at x0 are collinear, i.e., the gradient of f
      has no component left along the tangent of the constraint curve, so
      gradient descent stops there (a worked check follows below)
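As promised above, a small numeric check of the collinearity condition on an illustrative problem.

```python
import numpy as np

# Worked check: minimize f(x,y) = x^2 + y^2 subject to
# g(x,y) = 1 - x - y <= 0 (an illustrative problem). At the minimizer the
# constraint is active and grad f = -C * grad g with C > 0.
x = y = 0.5                         # minimizer: closest point on x + y = 1
C = 2 * x                           # multiplier C = 1 > 0
grad_f = np.array([2 * x, 2 * y])   # gradient of f at (x0, y0)
grad_g = np.array([-1.0, -1.0])     # gradient of g (constant)
print(grad_f + C * grad_g)          # [0. 0.]: stationarity L' = 0 holds
```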

ref: Machine Learning Foundations (机器学习基石) course
