Data Mining
circle_yy
A little rookie working hard to become an expert
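Simple EDA + baseline for used-car price prediction (adapted from the Tianchi baseline)
First, load the training set and take a quick look at each column, focusing on the prediction target price. Its mean is around 5,900 and its standard deviation around 7,500, yet the maximum is 99,999, so something is clearly off: outliers are what regression problems fear most…
    import pandas as pd
    import numpy as np
    import warnings
    warnings.filterwarnings('ignore')
    pd.set_option('...
Original · 2020-03-24 21:56:29 · 823 views · 0 comments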
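A maximum far above the mean plus a few standard deviations is the usual sign of an outlier in the target. The following is a minimal sketch of one common check, the 1.5 × IQR rule, and is not necessarily what the post itself does. The price values are made-up toy numbers rather than the competition data, and `iqr_outliers` is a hypothetical helper name. It uses only the standard library so it runs without pandas.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Toy prices, invented for illustration only (not the real training set)
prices = [1200, 2500, 3100, 4000, 4800, 5200, 6100, 7500, 9000, 99999]

mean_price = statistics.mean(prices)      # pulled upward by the extreme value
median_price = statistics.median(prices)  # barely affected by it
outliers = iqr_outliers(prices)
```

Comparing `mean_price` with `median_price` shows how much a single extreme value can drag the mean, which is why it is worth deciding how to treat such rows (clip, transform, or drop) before fitting a regression model.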
Data analysis of discrete variables
    import pandas as pd
    pd.set_option('display.max_columns', 30)
    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set()
    from pylab import rcParams #...
Original · 2019-12-11 22:35:37 · 5148 views · 0 comments
GBDT feature importance visualization
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import GradientBoostingClassifier
    import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
    data = pd.read_csv(r"./data_trai...
Original · 2019-12-11 21:09:32 · 3651 views · 0 comments
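Data mining in practice (1): income prediction classification
    # Import third-party packages
    import pandas as pd
    import numpy as np
    import seaborn as sns
    # Read the data
    income = pd.read_excel(r'./income.xlsx')
    income.head()  # Get a rough sense of the data's structure by printing the first few rows
    age workclass ...
Original · 2019-12-11 14:20:48 · 1012 views · 1 comment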
Tuning a CatBoostClassifier with GridSearchCV
In practice:
    params = {'depth': [4, 6, 10],
              'learning_rate' : [0.05, 0.1, 0.15],
    #          'l2_leaf_reg': [1,4,9]
    #          'iterations': [1200],
    #          'early_stopping_rounds':[1000],
    #          ...
Original · 2019-09-20 16:48:07 · 7229 views · 1 comment
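A grid search evaluates every combination of the listed values, so the size of the grid determines the cost. The sketch below enumerates the combinations for the two uncommented keys in the excerpt (`depth` and `learning_rate`) using only the standard library. It does not call scikit-learn or CatBoost, and `expand_grid` is a hypothetical helper introduced here for illustration.

```python
from itertools import product

# The two active keys from the excerpt; the commented-out ones are left out
param_grid = {
    'depth': [4, 6, 10],
    'learning_rate': [0.05, 0.1, 0.15],
}

def expand_grid(grid):
    """Return one dict per combination of the grid's values."""
    keys = list(grid)
    return [dict(zip(keys, combo))
            for combo in product(*(grid[k] for k in keys))]

candidates = expand_grid(param_grid)
n_candidates = len(candidates)  # 3 depths x 3 learning rates
```

Each candidate is fitted once per cross-validation fold, so with 5 folds this grid would mean 9 × 5 = 45 fits, plus a final refit if one is requested. Uncommenting `l2_leaf_reg` with its three values would triple that, which is a reason to widen the grid one parameter at a time.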