This is a memo sharing what I learnt in Machine Learning with Tree-Based Models (using Python), capturing the learning objectives as well as my personal notes. The course is taught by Elie Kawerk from DataCamp.
Decision trees are supervised learning models used for classification and regression problems.
I have learnt the following topics:
- Using Python to train decision trees and tree-based models.
- Decision-tree learning: applying the CART algorithm to train decision trees for classification and regression problems.
- Generalization error of a supervised learning model: diagnosing underfitting and overfitting using cross-validation.
- Ensembling, which can produce better results than individual decision trees.
- Advantages and disadvantages of trees.
- Bagging: applying randomization through bootstrapping to construct a diverse set of trees in an ensemble.
- Random forests, which introduce further randomization by sampling features at the level of each node in each tree of the ensemble.
- AdaBoost and gradient boosting, i.e., ensemble methods in which predictors are trained sequentially, each one trying to correct the errors made by its predecessors.
- Stochastic gradient boosting: subsampling instances and features can lead to better performance.
- Model hyperparameter tuning through grid search CV.
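As a first illustration of the topics above, here is a minimal sketch of training a CART-style decision tree with scikit-learn. The dataset is synthetic (`make_classification`) and the hyperparameters are illustrative, not the course's exact values.

```python
# Train a decision tree classifier (CART) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_depth limits how deep the tree can grow, a first defense against overfitting
tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)
```

Scikit-learn's `DecisionTreeClassifier` implements an optimized version of CART, growing the tree by greedily choosing the split that most reduces impurity at each node.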
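Cross-validation makes the bias/variance diagnosis concrete: if the CV error is much higher than the training error, the model overfits (high variance); if both are high, it underfits (high bias). A sketch on synthetic regression data, with an unconstrained tree as the deliberately overfitting model:

```python
# Diagnose overfitting by comparing training error with 5-fold CV error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

# An unconstrained tree (max_depth=None) can memorize the training set.
deep_tree = DecisionTreeRegressor(max_depth=None, random_state=0)
cv_mse = -cross_val_score(
    deep_tree, X, y, cv=5, scoring="neg_mean_squared_error"
).mean()

deep_tree.fit(X, y)
train_mse = np.mean((y - deep_tree.predict(X)) ** 2)

# train_mse is (near) zero while cv_mse stays large: a high-variance model.
```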
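Bagging can be sketched as follows: each tree is fit on a bootstrap sample (drawn with replacement), and the out-of-bag (OOB) instances a tree never saw provide a built-in validation estimate. The ensemble size and data here are illustrative.

```python
# Bagging: an ensemble of trees trained on bootstrap samples.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# oob_score=True evaluates each tree on the instances left out of its bootstrap
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=1),
    n_estimators=200,
    oob_score=True,
    random_state=1,
)
bag.fit(X_train, y_train)
oob_accuracy = bag.oob_score_
test_accuracy = bag.score(X_test, y_test)
```

The OOB accuracy typically tracks the held-out test accuracy closely, which is what makes it useful as a free validation estimate.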
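Random forests add the per-node feature sampling on top of bagging. A minimal sketch, again on synthetic data:

```python
# Random forest: bagging plus random feature sampling at each split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# max_features="sqrt": each node considers only sqrt(n_features) random features
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=2)
rf.fit(X_train, y_train)

accuracy = rf.score(X_test, y_test)
importances = rf.feature_importances_  # per-feature importances, sum to 1
```

Restricting each split to a random feature subset decorrelates the trees, which is why averaging them reduces variance more than plain bagging does.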
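The two boosting flavors can be sketched side by side: AdaBoost reweights misclassified instances for the next predictor, while gradient boosting fits each new tree to the current ensemble's residual errors. Setting `subsample` below 1.0 (and optionally `max_features`) turns gradient boosting into stochastic gradient boosting; the specific values here are illustrative.

```python
# AdaBoost and (stochastic) gradient boosting on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# AdaBoost: each new predictor focuses on the previously misclassified instances
ada = AdaBoostClassifier(n_estimators=100, random_state=3)
ada.fit(X_train, y_train)

# Stochastic gradient boosting: each tree sees a random subset of rows/features
sgb = GradientBoostingClassifier(
    n_estimators=100,
    subsample=0.8,      # 80% of instances per tree
    max_features=0.75,  # 75% of features per split
    random_state=3,
)
sgb.fit(X_train, y_train)

ada_acc = ada.score(X_test, y_test)
sgb_acc = sgb.score(X_test, y_test)
```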
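Finally, hyperparameter tuning with grid search CV exhaustively cross-validates every combination in a parameter grid. The grid values below are illustrative, not the course's exact settings.

```python
# Tune decision-tree hyperparameters with 5-fold grid search CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=4)

param_grid = {
    "max_depth": [2, 4, 8],
    "min_samples_leaf": [1, 5, 10],
}
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=4),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

best_params = grid.best_params_  # best combination found on the grid
best_score = grid.best_score_    # its mean cross-validated accuracy
```

After fitting, `grid.best_estimator_` is refit on the full data with the winning hyperparameters and can be used directly for prediction.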
Translated from: https://medium.com/ai-in-plain-english/machine-learning-with-tree-based-models-51261c4eaae6