Unsupervised Learning

Motivation: previously we adopted two assumptions:
 1. a known functional form that maps the observed input to an output;
 2. a training dataset for fitting the parameters of that model.
Without these assumptions, we face two new problems:
 1. unknown functional form: non-parametric density estimation, where we only obtain probability values at queried points, without knowing the exact form of the PDF (probability density function);
 2. data without outputs (labels): data clustering.

Density Estimation

  • Histogram: discretize the feature space into bins and count how many samples fall into each bin (see the sketch after this list)
    • Pro:
      • with an infinite amount of data, any density can be approximated arbitrarily well >> the estimate approaches the continuous density
      • computationally simple
    • Con:
      • curse of dimensionality: the number of bins, and therefore the amount of data required, grows exponentially with the dimensionality of the feature space;
      • the bin size is hard to choose, and there is no single optimal size.
  • Kernel density estimation

    • given a query point x, we want to output its probability density Pr(x):
      $$\Pr(x) = \frac{K}{NV} = \frac{1}{N h^d} \sum_{i=1}^{N} k_h(x - x_i)$$
    • where:
      • $h^d$ is the volume V of the kernel window in the d-dimensional feature space;
      • N is the total number of points in the given dataset;
      • $k_h$ is the kernel function, where h stands for the kernel width;
    • Understanding:

      Intuitively, the kernel function $k_h(x - x_i)$ defines a weight that depends on the distance between each data point $x_i$ and the examined location x in feature space. The denominator NV is what turns this weighted count into a density: the sum (weighted-)counts the points near x, and dividing by the number of samples N and the volume V normalizes that count per unit volume.

    • Different kernel functions (see the sketch after this list):

      • Parzen window estimator: the kernel is a uniform hypercube of side length h centered at x, so the estimate simply counts the data points falling inside that window and divides by N times its volume.
  • Bias-variance trade-off: concerns the influence radius of a point
    • examples of this radius: the bin width of a histogram, the kernel width of a kernel function, the number of neighbors in kNN;
    • too large a radius results in an over-smoothed estimate >> large bias: e.g. a multimodal density is mistakenly fitted as a single-peak Gaussian-like bump;
    • too small a radius results in an overly wiggly estimate >> large variance: e.g. a single-peak Gaussian density is mistakenly estimated as a multimodal function.
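As a concrete illustration of the histogram estimator above, here is a minimal sketch in Python/NumPy; the bimodal sample data and the bin count are arbitrary choices made for this example, not part of the original notes.

```python
import numpy as np

def histogram_density(data, num_bins=20):
    """Histogram density estimate: discretize the range into bins and count.

    Returns the bin edges and the estimated density per bin
    (count / (N * bin_width)), so the bars integrate to 1.
    """
    data = np.asarray(data)
    counts, edges = np.histogram(data, bins=num_bins)
    bin_width = edges[1] - edges[0]
    density = counts / (len(data) * bin_width)
    return edges, density

# Example: 1-D samples drawn from a bimodal distribution (illustrative only).
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.5, 1.0, 500)])
edges, density = histogram_density(samples, num_bins=30)
print(density.sum() * (edges[1] - edges[0]))  # ~1.0: the estimate integrates to one
```

Dividing raw counts by N times the bin width is what turns the counts into a density rather than a frequency table.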
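Similarly, the kernel density estimator $\Pr(x) = \frac{1}{N h^d}\sum_i k_h(x - x_i)$ with the Parzen (uniform) window can be sketched in one dimension (d = 1). Looping over a few kernel widths h also illustrates the bias-variance trade-off from the last bullet; the data and the particular h values are my own illustrative choices.

```python
import numpy as np

def parzen_window_density(x_query, data, h):
    """Parzen window estimate: Pr(x) = K / (N * V) with V = h^d (here d = 1).

    K is the number of samples falling inside a window of width h centered at x_query.
    """
    data = np.asarray(data)
    n = len(data)
    # Uniform (hypercube) kernel: 1 if |x - x_i| <= h / 2, else 0.
    inside = np.abs(x_query - data) <= h / 2.0
    return inside.sum() / (n * h)

rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.5, 1.0, 500)])
xs = np.linspace(-5, 5, 201)

# Small h -> spiky estimate (high variance); large h -> over-smoothed (high bias).
for h in (0.05, 0.5, 5.0):
    density = np.array([parzen_window_density(x, samples, h) for x in xs])
    print(f"h = {h:4.2f}: max density = {density.max():.3f}")
```

A very small h gives a spiky, high-variance estimate that jumps between neighboring query points, while a very large h smears the two modes into one broad bump, i.e. high bias.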

Mixture Model

  • Parameters we want to estimate with Expectation Maximization (EM)
    • the mean $\mu_i$, the covariance matrix $\Sigma_i$, and the occurrence probability (also called the mixing coefficient of the i-th Gaussian component) $w_i$;
    • the update of $\mu_i$, $\Sigma_i$ and $w_i$ alternates between an E-step and an M-step (see the sketch below).
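The notes break off before spelling out the update rules, so the following is a sketch of the standard textbook EM updates for a Gaussian mixture (assumed here, not quoted from the original lecture): the E-step computes responsibilities $\gamma_{ij} \propto w_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$, and the M-step re-estimates each $\mu_j$, $\Sigma_j$, $w_j$ as the responsibility-weighted mean, covariance, and fraction of the data.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x | mean, cov) for a single point x."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def em_step(X, means, covs, weights):
    """One EM iteration for a Gaussian mixture model (standard textbook updates)."""
    n, _ = X.shape
    k = len(weights)

    # E-step: responsibility of component j for point i.
    resp = np.zeros((n, k))
    for i in range(n):
        for j in range(k):
            resp[i, j] = weights[j] * gaussian_pdf(X[i], means[j], covs[j])
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted re-estimation of each component.
    nk = resp.sum(axis=0)                         # effective number of points per component
    new_means = (resp.T @ X) / nk[:, None]        # weighted means
    new_covs = []
    for j in range(k):
        diff = X - new_means[j]
        new_covs.append((resp[:, j, None] * diff).T @ diff / nk[j])  # weighted covariances
    new_weights = nk / n                          # mixing coefficients
    return new_means, new_covs, new_weights

# Illustrative usage with arbitrary initial parameters (two components in 2-D).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
means = np.array([[-1.0, 0.0], [1.0, 0.0]])
covs = [np.eye(2), np.eye(2)]
weights = np.array([0.5, 0.5])
for _ in range(20):
    means, covs, weights = em_step(X, means, covs, weights)
print(weights)  # roughly [0.5, 0.5] for this symmetric toy data
```

Repeating em_step until the log-likelihood stops improving gives the usual EM fitting loop; the toy data is only there to show that one iteration runs as written.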