Kaggle
pingzishinee
Learn: Overfitting and Underfitting, and One Mitigation: Setting max_leaf_nodes in a Decision Tree Regressor
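Overfitting and underfitting, illustrated with a decision tree. The dataset is partitioned into the tree's leaves. If the tree is too shallow, say the dataset is split into only 2 groups (a very coarse partition), each group necessarily contains a great many houses. If the tree is very deep, say the dataset is split into 1024 groups (a very fine partition), there are many leaves and each leaf holds very few houses. In short, a tree that is too deep tends to overfit, and a tree that is too shallow tends to underfit. Overf...
Original · 2018-11-28 21:16:12 · 4158 views · 1 comment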
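The trade-off described in this excerpt can be sketched by capping the number of leaves and comparing validation error. This is a minimal sketch, not the post's code: the synthetic data, the `get_mae` helper, and the candidate leaf counts are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (an assumption, not the post's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = 10 * np.sin(X[:, 0]) + rng.normal(0, 1, size=500)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree capped at max_leaf_nodes leaves and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Very few leaves risks underfitting; very many leaves risks overfitting.
results = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in (2, 5, 50, 500)}
for n, mae in results.items():
    print(f"max_leaf_nodes={n:<4d} validation MAE={mae:.3f}")

# Keep the leaf count with the lowest validation error.
best_size = min(results, key=results.get)
print("best max_leaf_nodes:", best_size)
```

Choosing the leaf count on validation data rather than training data matters here, because training error keeps falling as the tree grows.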
Housing Prices Competition
# Code you have previously used to load data
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import ...
Original · 2018-11-29 17:49:49 · 428 views · 0 comments
Learn: Build a Machine Learning Model, Using Melbourne House Price Prediction as an Example
Predicting Melbourne house prices. Supervised model: a decision tree regression model. Load the data and take a first look:
import pandas as pd
melbourne_file_path = r'G:\kaggle\melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
melbourne_data.columns
Index([u'Subu...
Original · 2018-11-27 20:58:37 · 460 views · 0 comments
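A minimal sketch of the fit-and-predict step for a decision tree regressor follows. The excerpt is truncated before the modelling code, so everything here is an assumption: the rows are made-up toy values standing in for `melb_data.csv`, and the feature list is only one plausible choice.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Toy rows standing in for melb_data.csv; the values are made up for illustration.
melbourne_data = pd.DataFrame({
    "Rooms": [2, 3, 4, 3, 2, 5],
    "Bathroom": [1, 2, 2, 1, 1, 3],
    "Landsize": [150, 300, 450, 280, 120, 600],
    "Price": [600000, 850000, 1200000, 800000, 550000, 1600000],
})

# Prediction target and the feature columns used to predict it (assumed choice).
y = melbourne_data["Price"]
features = ["Rooms", "Bathroom", "Landsize"]
X = melbourne_data[features]

# Fit the tree; random_state makes the result reproducible.
melbourne_model = DecisionTreeRegressor(random_state=1)
melbourne_model.fit(X, y)

# Predict for the first few rows of the same data the model was trained on.
predictions = melbourne_model.predict(X.head())
print(predictions)
```

Because the model is asked about rows it has already seen, these predictions say nothing about how well it generalizes.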
Learn: Model Validation
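Contents: Mean Absolute Error (MAE); Computing a model's MAE; The problem with "in-sample" scores; Solving the "in-sample" scores problem; Splitting the dataset into a training set and a validation set (train_test_split()); Training the model on the training set; Testing the model on the validation set.
Mean Absolute Error (MAE): There are many metrics fo...
Original · 2018-11-28 12:05:36 · 227 views · 0 comments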
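The steps listed in the contents can be sketched as follows. MAE is the mean of |actual − predicted|. The synthetic data and variable names are assumptions for illustration, not the post's own code.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption standing in for the housing data).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5, size=400)

# "In-sample" score: the model is evaluated on the rows it was trained on.
model = DecisionTreeRegressor(random_state=1)
model.fit(X, y)
in_sample_mae = mean_absolute_error(y, model.predict(X))

# Split into training and validation sets, train on one, test on the other.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
model = DecisionTreeRegressor(random_state=1)
model.fit(train_X, train_y)
validation_mae = mean_absolute_error(val_y, model.predict(val_X))

print(f"in-sample MAE:  {in_sample_mae:.3f}")
print(f"validation MAE: {validation_mae:.3f}")
```

An unconstrained tree can memorize its training rows, so the in-sample score looks far better than the score on held-out data.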
One-Hot Encoding
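One-hot encoding: The Standard Approach for Categorical Features. Categorical feature: for example, color of flowers: yellow, red, green. One-hot encoding: a coding scheme that uses as many bits as there are states (category values), with exactly one bit set to 1 and all others set to 0. Pandas offers ...
Original · 2018-12-13 10:32:58 · 1611 views · 0 comments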
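The flower-color example can be encoded with pandas. The excerpt is cut off after "Pandas offers", so using `pd.get_dummies` here is an assumption about what the post goes on to show; the toy DataFrame is also made up.

```python
import pandas as pd

# Toy categorical feature from the excerpt's example: color of flowers.
flowers = pd.DataFrame({"color": ["yellow", "red", "green", "red"]})

# One column per category value; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(flowers, columns=["color"], dtype=int)
print(encoded)
```

Each row has exactly one 1 across the three `color_*` columns, matching the "one bit per state" description.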
Partial Dependence Plots
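Partial dependence plots are a good way to extract insights from complex models. A partial dependence plot shows the dependence between the target response and a set of features, marginalizing over all the other features. Intuitively, partial dependence can be read as the expected target response as a function of the target features. Key code: from sklearn.ensemble.partial_dependence import partial_dependence, plot_p...
Original · 2018-12-19 16:46:33 · 7436 views · 5 comments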
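To make the definition concrete, partial dependence for one feature can be computed by hand: force that feature to a grid value in every row, predict, and average. This is a sketch under stated assumptions (synthetic data, a hypothetical `partial_dependence_1d` helper), not the post's code. Note that the import path in the excerpt belongs to older scikit-learn releases; newer releases expose `partial_dependence` from `sklearn.inspection`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: the target rises with feature 0 and oscillates with feature 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 3.0 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature in every row
        averaged.append(model.predict(X_mod).mean())  # average over other features
    return np.array(averaged)


grid = np.linspace(0.05, 0.95, 10)
pd_feature0 = partial_dependence_1d(model, X, feature=0, grid=grid)
for value, response in zip(grid, pd_feature0):
    print(f"feature0={value:.2f} -> expected response {response:.3f}")
```

Plotting `pd_feature0` against `grid` gives the partial dependence curve for that feature.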
XGBoost Model Training and Parameter Tuning
XGBoost: XGBoost is an implementation of gradient boosted decision trees (GBDT). GBDT model cycle. Example:
import pandas as pd
data = pd.read_csv(r'G:\kaggle\housePrice\train.csv')
data.head() ...
Original · 2018-12-14 16:50:40 · 3028 views · 0 comments
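The excerpt names the GBDT model cycle without showing it. One common form of that cycle, for squared error, is: compute the current errors, fit a small tree to those errors, and add the tree's scaled predictions to the ensemble. The sketch below uses plain scikit-learn trees rather than XGBoost itself, on synthetic data; `learning_rate` and `n_rounds` are illustrative names that play the roles of XGBoost's `learning_rate` and `n_estimators`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption, not the house price dataset from the post).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=300)

learning_rate = 0.1  # how much of each new tree's output is added
n_rounds = 50        # number of boosting cycles

# Start from a naive model: predict the mean for every row.
prediction = np.full_like(y, y.mean())
initial_mae = mean_absolute_error(y, prediction)
ensemble = []

for _ in range(n_rounds):
    residuals = y - prediction                      # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                          # new model predicts those errors
    prediction += learning_rate * tree.predict(X)   # add it to the ensemble
    ensemble.append(tree)

final_mae = mean_absolute_error(y, prediction)
print(f"training MAE before boosting: {initial_mae:.3f}")
print(f"training MAE after {n_rounds} rounds: {final_mae:.3f}")
```

These figures are training errors only; tuning the number of rounds and the learning rate should be done against held-out data.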
Missing Values
Getting a feel for the missing values
import pandas as pd
data = pd.read_csv(r'G:\kaggle\melb_data.csv')
# Count the missing values in each column
missing_val_count_by_column = data.isnull().sum()
missing_val_count_by_column  # a pandas Series
Suburb ...
Original · 2018-12-11 22:37:21 · 7241 views · 0 comments
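The same counting step can be tried without the CSV file. The small DataFrame below is a made-up stand-in for `melb_data.csv`, with column names chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for melb_data.csv; the values are made up.
data = pd.DataFrame({
    "Suburb": ["Abbotsford", "Richmond", "Carlton"],
    "Car": [1.0, np.nan, 2.0],
    "BuildingArea": [np.nan, np.nan, 120.0],
})

# isnull() marks each missing cell; sum() counts them per column (a pandas Series).
missing_val_count_by_column = data.isnull().sum()

# Show only the columns that actually have missing values.
print(missing_val_count_by_column[missing_val_count_by_column > 0])
```

Filtering to the nonzero counts keeps the output readable on wide tables.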