2015-09-20
http://scikit-learn.org/stable/modules/ensemble.html
1. The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability/robustness over a single estimator. Two families of ensemble methods are usually distinguished:
- In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any single base estimator because its variance is reduced.
Examples: Bagging methods, Forests of randomized trees
- By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.
Examples: AdaBoost, Gradient Tree Boosting
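As a quick side-by-side sketch of the two families (my own toy example, not from the docs; the dataset and hyperparameters are arbitrary): bagging averages deep trees trained independently, while AdaBoost builds shallow trees sequentially.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Averaging: fully grown trees fit independently on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)
# Boosting: weak learners (depth-1 stumps) fit sequentially,
# each focusing on the mistakes of the previous ones.
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=50, random_state=0)

for name, clf in [("bagging", bagging), ("boosting", boosting)]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```

Both should do clearly better than chance on this toy problem; the point is the contrast between independent deep learners and sequential weak learners.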
2. Bagging meta-estimator
- Simple
- Reduces overfitting
- Bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods, which usually work best with weak models (e.g., shallow decision trees)
- Bagging variants
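To try the bagging meta-estimator itself, a minimal sketch (the base estimator and the 50% subsampling fractions are my own arbitrary choices; this mirrors the style of example in the linked page):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Each k-NN base model is trained on a random 50% of the samples
# and a random 50% of the features.
bagging = BaggingClassifier(
    KNeighborsClassifier(),
    max_samples=0.5,
    max_features=0.5,
    random_state=0,
)
bagging.fit(X, y)
print(bagging.score(X, y))
```

Drawing random subsets of samples, features, or both is what distinguishes the bagging variants (pasting, random subspaces, random patches).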
3. Forests of randomized trees
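A minimal sketch of the two forest estimators covered under this heading in the linked page (the toy dataset and parameters are my own choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
# Random forest: trees fit on bootstrap samples; each split picks the
# best threshold among a random subset of features.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Extra-trees: split thresholds are also drawn at random, trading a
# little more bias for lower variance.
et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.score(X, y), et.score(X, y))
```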
Done today:
1. Reviewed feature extraction algorithms
2. Reviewed the principles of random forests
3. Overfitting
4. Watched open courses on data mining and on English
5. Planks: do them after getting home
Questions:
1. Why did k-means give an accuracy of 1.0 on the test set?
2. After removing a certain feature, the results did get somewhat better; I still don't know whether this is a categorical data problem
3. K-means has many improved variants, and there are still many papers I haven't read
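On question 1: one possible pitfall to check (an assumption on my part; the notes don't say how accuracy was computed) is that k-means cluster labels are arbitrary permutations, so comparing them directly to class labels can give misleading scores. A permutation-invariant metric like the adjusted Rand index is a safer sanity check:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# ARI is invariant to relabeling of clusters: 1.0 means a perfect
# match, values near 0 mean chance-level agreement.
print(adjusted_rand_score(y, labels))
```

If the real accuracy was computed after somehow matching cluster labels to true labels, a perfect 1.0 could instead point to leakage between train and test sets.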
Plan for tomorrow:
1. Read papers: skim the main improved variants of k-means
2. Resolve the change in results after removing that feature
3. Normalization and trying various feature extraction methods (deal with this the day after tomorrow)