Lesson 10: VC Dimension, Model Selection, Bayesian Statistics

  • VC dimension
  • Model selection
    • cross validation
    • feature selection
  • Bayesian statistics
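1. VC dimension

Concept:
[Image not preserved]
The number of training examples m required for good generalization is roughly linear in the number of parameters of the hypothesis class H.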
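Since the image for this section is missing, here is the standard form of the VC bound that this statement usually rests on. It is a reconstruction under assumed notation, not a copy of the original figure: d = VC(H), ε(h) is the generalization error, ε̂(h) is the training error, and δ and γ are the usual confidence and accuracy parameters.

```latex
% Assumed notation (not in the surviving text):
%   d = VC(H), m = number of training examples,
%   \varepsilon(h) = generalization error, \hat{\varepsilon}(h) = training error.
\[
  \bigl|\varepsilon(h) - \hat{\varepsilon}(h)\bigr|
  \le O\!\left(\sqrt{\frac{d}{m}\log\frac{m}{d} + \frac{1}{m}\log\frac{1}{\delta}}\right)
  \quad \text{for all } h \in H,\ \text{with probability at least } 1-\delta .
\]
% To guarantee |\varepsilon(h) - \hat{\varepsilon}(h)| \le \gamma for all h \in H
% with probability at least 1-\delta, it therefore suffices that
\[
  m = O_{\gamma,\delta}(d).
\]
```

For most hypothesis classes with a reasonable parameterization, d is itself roughly linear in the number of parameters, which is where the "linear in the number of parameters" rule of thumb comes from.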

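2. Model selection
2.1 Cross validation

[Two images not preserved]
When the number of folds k equals the number of training examples m, k-fold cross validation is called leave-one-out cross validation.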
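Since the images describing the procedure are missing, here is a minimal sketch of k-fold cross validation. The helper names (`k_fold_cv_error`, `fit_least_squares`, `predict_linear`), the synthetic data and the squared-error metric are all assumptions made for illustration; they do not come from the original post.

```python
import numpy as np

def k_fold_cv_error(X, y, fit, predict, k, seed=0):
    """Average held-out squared error over k folds (requires k >= 2)."""
    m = len(y)
    rng = np.random.default_rng(seed)
    # Shuffle the indices once, then split them into k nearly equal folds.
    folds = np.array_split(rng.permutation(m), k)
    errors = []
    for j in range(k):
        test_idx = folds[j]
        train_idx = np.concatenate([folds[i] for i in range(k) if i != j])
        model = fit(X[train_idx], y[train_idx])      # train on k-1 folds
        pred = predict(model, X[test_idx])           # evaluate on the held-out fold
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Hypothetical model: ordinary least squares.
def fit_least_squares(X, y):
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def predict_linear(theta, X):
    return X @ theta

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = np.c_[np.ones(30), rng.normal(size=(30, 2))]
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=30)

err_5fold = k_fold_cv_error(X, y, fit_least_squares, predict_linear, k=5)
# k = m gives leave-one-out cross validation.
err_loo = k_fold_cv_error(X, y, fit_least_squares, predict_linear, k=len(y))
```

Leave-one-out trains m models, so it is usually reserved for small data sets; 5 or 10 folds is a common compromise.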

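2.2 Feature selection

In practice, when there are too many input features (the number of features n is much larger than the number of training examples m), overfitting is likely. One remedy is to keep only the useful features, for example with forward search:
[Image not preserved]
The stopping condition can be the number of features to keep. Similarly, starting from F = {1, 2, …, n} and removing features one at a time is called backward search.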
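Forward search can be sketched roughly as follows. This is an illustration under assumptions, not the algorithm from the missing image: the scoring function `holdout_error` (least squares with a simple 70/30 hold-out split) and the synthetic data are hypothetical choices; any cross validation score could be used in its place.

```python
import numpy as np

def holdout_error(X, y, features, split=0.7):
    """Fit least squares on the first `split` fraction of rows; return MSE on the rest."""
    n_train = int(split * len(y))
    cols = sorted(features)
    A = np.c_[np.ones(len(y)), X[:, cols]]           # intercept + chosen features
    theta, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    return float(np.mean((A[n_train:] @ theta - y[n_train:]) ** 2))

def forward_search(X, y, max_features):
    """Greedily add the feature that most reduces the held-out error."""
    selected = set()                                  # F starts empty
    remaining = set(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        best = min(remaining,
                   key=lambda i: holdout_error(X, y, selected | {i}))
        selected.add(best)
        remaining.remove(best)
    return sorted(selected)

# Synthetic data: only features 1 and 4 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

chosen = forward_search(X, y, max_features=2)
```

Each round retrains the model once per remaining feature, which is why this kind of search becomes expensive as n grows.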

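These search methods are computationally expensive. Filter feature selection is a cheaper heuristic: it computes the mutual information between each feature and the label, and uses that score to pick out the useful features.
[Image not preserved]
After ranking the features by mutual information, cross validation is typically used to choose the number of features k to keep.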
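As a rough illustration of the filter approach, the sketch below scores discrete features with the empirical mutual information MI(x_i, y) = Σ p(x_i, y) log[ p(x_i, y) / (p(x_i) p(y)) ], summed over the values of x_i and y. The function names and the synthetic binary data are assumptions made for this example, not taken from the original post.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))    # joint frequency
            if p_xy > 0:
                p_x = np.mean(x == xv)
                p_y = np.mean(y == yv)
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return float(mi)

def rank_features(X, y):
    """Return feature indices sorted by decreasing MI with y, plus the scores."""
    scores = np.array([mutual_information(X[:, i], y) for i in range(X.shape[1])])
    order = np.argsort(-scores)
    return order, scores

# Synthetic binary data: column 1 copies y with 10% of entries flipped,
# the other columns are independent coin flips.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
noise = rng.integers(0, 2, size=(500, 3))
flip = rng.random(500) < 0.1
informative = np.where(flip, 1 - y, y)
X = np.c_[noise[:, 0], informative, noise[:, 1], noise[:, 2]]

order, scores = rank_features(X, y)   # order[0] is the highest-scoring feature
```

Keeping the top k entries of `order` gives the selected feature set; k itself can then be chosen by cross validation.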

3. Bayesian statistics and regularization

In the Bayesian view, the parameter θ is a random variable. Under this assumption, the prediction for y is its expected value:
[Three images not preserved]
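The formulas in the missing images are most likely the standard Bayesian prediction equations, reproduced here as an assumption. S = {(x^{(i)}, y^{(i)})}, i = 1, …, m, denotes the training set and p(θ) the prior; this notation is assumed rather than taken from the post.

```latex
% Posterior over the parameters given the training set S
\[
  p(\theta \mid S)
  = \frac{\left(\prod_{i=1}^{m} p\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr)\right) p(\theta)}
         {\int_{\theta} \left(\prod_{i=1}^{m} p\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr)\right) p(\theta)\, d\theta}
\]
% Predictive distribution for a new input x
\[
  p(y \mid x, S) = \int_{\theta} p(y \mid x, \theta)\, p(\theta \mid S)\, d\theta
\]
% Prediction: the expected value of y
\[
  \mathbb{E}[y \mid x, S] = \int_{y} y\, p(y \mid x, S)\, dy
\]
```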
In practice, the integral over θ is hard to compute, so we replace it with a single point estimate of θ, which gives:
[Image not preserved]
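The point estimate referred to here is presumably the MAP (maximum a posteriori) estimate. The standard form is shown next to the maximum likelihood estimate for comparison; the notation is assumed rather than copied from the missing image.

```latex
% MAP estimate: maximize likelihood times prior
\[
  \theta_{\mathrm{MAP}}
  = \arg\max_{\theta} \prod_{i=1}^{m} p\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr)\, p(\theta)
\]
% Maximum likelihood estimate: the same objective without the prior p(\theta)
\[
  \theta_{\mathrm{ML}}
  = \arg\max_{\theta} \prod_{i=1}^{m} p\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr)
\]
```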
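This estimate closely resembles the maximum likelihood estimate; the only difference is the extra factor p(θ). Because of this term, when θ is assumed to follow a zero-mean Gaussian distribution, the resulting parameters have a smaller norm, so the method is less prone to overfitting than maximum likelihood.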
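The shrinkage effect can be checked numerically in the linear regression case. The sketch below assumes Gaussian noise with variance `sigma2` and a prior θ ~ N(0, `tau2`·I); under those assumptions the MAP estimate is the ridge regression solution with λ = `sigma2` / `tau2`. The function names and the data are illustrative only.

```python
import numpy as np

def ml_estimate(X, y):
    """Maximum likelihood under Gaussian noise = ordinary least squares."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def map_estimate(X, y, sigma2=1.0, tau2=1.0):
    """MAP with prior theta ~ N(0, tau2 * I): solve (X^T X + lam I) theta = X^T y."""
    lam = sigma2 / tau2
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Small synthetic problem: 20 examples, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = X @ rng.normal(size=10) + rng.normal(size=20)

theta_ml = ml_estimate(X, y)
theta_map = map_estimate(X, y, sigma2=1.0, tau2=1.0)

norm_ml = np.linalg.norm(theta_ml)
norm_map = np.linalg.norm(theta_map)   # smaller than norm_ml
```

A tighter prior (smaller `tau2`) increases λ and shrinks the parameters further toward zero.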

Bayesian model selection is a fundamental part of the Bayesian statistical modeling process. In principle, the Bayesian analysis is straightforward. Specifying the data sampling and prior distributions, a joint probability distribution is used to express the relationships between all the unknowns and the data information. Bayesian inference is implemented based on the posterior distribution, the conditional probability distribution of the unknowns given the data information. The results from the Bayesian posterior inference are then used for the decision making, forecasting, stochastic structure explorations and many other problems. However, the quality of these solutions usually depends on the quality of the constructed Bayesian models. This crucial issue has been realized by researchers and practitioners. Therefore, the Bayesian model selection problems have been extensively investigated. The Bayesian inference on a statistical model was previously complex. It is now possible to implement the various types of the Bayesian inference thanks to advances in computing technology and the use of new sampling methods, including Markov chain Monte Carlo (MCMC). Such developments together with the availability of statistical software have facilitated a rapid growth in the utilization of Bayesian statistical modeling through the computer simulations. Nonetheless, model selection is central to all Bayesian statistical modeling. There is a growing need for evaluating the Bayesian models constructed by the simulation methods.