A Summary of Commonly Used Code for Data Competitions

我当时害怕极了

Tags: machine learning, visualization, data analysis

Article link: https://blog.csdn.net/m0_46163065/article/details/105981630

A recommended approach to building a learning algorithm

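1. Start with a simple algorithm that can be implemented quickly. Implement it and test it on the cross-validation set.
2. Plot learning curves to decide whether to collect more data, add more features, or try something else.
3. Perform error analysis: manually inspect the cross-validation examples that the algorithm predicts incorrectly and check whether they share a systematic pattern.
Note: error analysis cannot always tell us which action to take. Sometimes we have to try several models and compare them. When comparing models, use a single numeric metric to decide which one is better; usually that metric is the error on the cross-validation set.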
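Step 2 above can be sketched with NumPy alone. The following is a minimal illustration, not the author's code: the synthetic data, the 150/50 split, and the helper names `add_bias` and `mse` are all assumptions made for the example. It fits a least-squares line on growing slices of the training set and records the training error and the cross-validation error for each size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (an assumption for illustration): y = 3x + 1 + noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 1 + rng.normal(scale=0.3, size=200)

# Keep the last 50 rows as the cross-validation set
X_train, y_train = X[:150], y[:150]
X_cv, y_cv = X[150:], y[150:]

def add_bias(X):
    """Prepend a column of ones so the model has an intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((add_bias(X) @ w - y) ** 2))

train_sizes = [5, 10, 20, 50, 100, 150]
curve = []  # rows of (training size, training MSE, cross-validation MSE)
for m in train_sizes:
    # Fit on the first m training rows only
    w, *_ = np.linalg.lstsq(add_bias(X_train[:m]), y_train[:m], rcond=None)
    curve.append((m, mse(X_train[:m], y_train[:m], w), mse(X_cv, y_cv, w)))

for m, train_err, cv_err in curve:
    print(f"m={m:3d}  train MSE={train_err:.3f}  cv MSE={cv_err:.3f}")
```

Reading the curve: if both errors are high and close together, the model is probably underfitting and more data is unlikely to help; a large gap between a low training error and a high cross-validation error suggests overfitting, where more data (or fewer features) tends to help.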

Box-plot outlier handling code

import pandas as pd

def outliers_proc(data, col_name, scale=3):
    """Drop rows whose `col_name` value lies outside the box-plot whiskers."""
    def box_plot_outliers(data_ser, box_scale):
        # Whisker length: box_scale times the interquartile range
        iqr = box_scale * (data_ser.quantile(0.75) - data_ser.quantile(0.25))
        val_low = data_ser.quantile(0.25) - iqr
        val_up = data_ser.quantile(0.75) + iqr
        rule_low = data_ser < val_low
        rule_up = data_ser > val_up
        return (rule_low, rule_up), (val_low, val_up)

    data_n = data.copy()
    data_series = data_n[col_name]
    rule, value = box_plot_outliers(data_series, box_scale=scale)
    # Summarize the low-side outliers before they are removed
    outliers = data_series[rule[0]]
    print(outliers.describe())
    # Drop every row flagged as a low-side or high-side outlier
    index = data_series.index[rule[0] | rule[1]]
    data_n = data_n.drop(index)
    data_n.reset_index(drop=True, inplace=True)
    return data_n
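To make the whisker rule concrete, here is a small self-contained sketch on made-up numbers. The series `s` and the variable names are assumptions for illustration only; the thresholds follow the same idea as the function above (first and third quartile, extended by `scale` times the interquartile range).

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 100])  # made-up values; 100 is the obvious outlier
scale = 3

q1, q3 = s.quantile(0.25), s.quantile(0.75)
whisker = scale * (q3 - q1)           # scale times the interquartile range
val_low, val_up = q1 - whisker, q3 + whisker

is_outlier = (s < val_low) | (s > val_up)
cleaned = s[~is_outlier].reset_index(drop=True)

print(val_low, val_up)
print(cleaned.tolist())
```

A conventional box plot uses a factor of 1.5; a factor of 3 is more conservative and flags only extreme values.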

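Some date values in the data are malformed. When converting them, pass errors="coerce" so that unparseable values become NaT instead of raising an error. If the errors follow a recognizable pattern, a custom function can repair them before conversion.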

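# Days between the two dates; unparseable dates become NaT and the result NaN.
# Note: %y is a two-digit year; use %Y if the dates are stored as YYYYMMDD.
data["use_time"] = (
    pd.to_datetime(data["creat_date"], format="%y%m%d", errors="coerce")
    - pd.to_datetime(data["regdate"], format="%y%m%d", errors="coerce")
).dt.days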
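The two options described above (coercing bad dates to NaT, or repairing them with a function first) might look like the following sketch. The sample strings, the YYYYMMDD layout, and the repair rule in `fix_zero_month` are hypothetical and only illustrate the pattern; the real data may contain different kinds of errors.

```python
import pandas as pd

# Hypothetical raw values: the third has month "00", the fourth is not a date at all
raw = pd.Series(["20160404", "20040402", "20070009", "unknown"])

# errors="coerce" turns anything unparseable into NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y%m%d", errors="coerce")

def fix_zero_month(s):
    """Hypothetical repair rule: treat month "00" as January."""
    if len(s) == 8 and s[4:6] == "00":
        return s[:4] + "01" + s[6:]
    return s

# Repair the known pattern first, then coerce whatever is still invalid
repaired = pd.to_datetime(raw.map(fix_zero_month), format="%Y%m%d", errors="coerce")

print(parsed.isna().tolist())
print(repaired.isna().tolist())
```

Repairing first keeps rows that would otherwise be lost to NaT, while errors="coerce" still guards against values the rule does not cover.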

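When binning with pd.cut, pass labels=False so that each value is converted to its integer bin index instead of an interval label.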

# Define the bin edges before using them (0, 10, ..., 290)
bins = [i * 10 for i in range(30)]
data["cut_powr"] = pd.cut(data["power"], bins, labels=False)
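A quick sketch of what labels=False changes, using made-up power values (the series below is an assumption for illustration). It also shows two edge cases worth knowing: intervals are right-closed by default, so a value equal to the lowest edge falls outside every bin, and values above the last edge become NaN.

```python
import pandas as pd

bins = [i * 10 for i in range(30)]            # edges 0, 10, ..., 290
power = pd.Series([5, 15, 25, 290, 295, 0])   # made-up values

as_intervals = pd.cut(power, bins)                # categorical intervals such as (0, 10]
as_codes = pd.cut(power, bins, labels=False)      # integer bin index, NaN outside the edges

print(as_intervals.tolist())
print(as_codes.tolist())
```

If zero should land in the first bin, `include_lowest=True` is the pd.cut option to look at.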

1. Correlation analysis in filter-style feature selection

method = "spearman"

data["power"].corr(data["price"],method="spearman")

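# Visualize the correlation matrix as a heatmap
import matplotlib.pyplot as plt
import seaborn as sns

data_numeric = data[num_features]  # num_features: assumed to be the list of numeric column names
correlation = data_numeric.corr()
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(correlation, square=True, vmax=0.8)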
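Why Spearman rather than the default Pearson? Spearman correlation is Pearson correlation computed on ranks, so it measures any monotonic relationship, not only a linear one. The sketch below uses a made-up frame in which price is the cube of power (an assumption chosen to make the difference visible).

```python
import pandas as pd

df = pd.DataFrame({"power": [10, 20, 30, 40, 50, 60]})
df["price"] = df["power"] ** 3   # monotonic but strongly non-linear

pearson = df.corr(method="pearson").loc["power", "price"]
spearman = df.corr(method="spearman").loc["power", "price"]

# Spearman by hand: rank both columns, then take the ordinary Pearson correlation
spearman_by_rank = df["power"].rank().corr(df["price"].rank())

print(f"pearson  = {pearson:.3f}")
print(f"spearman = {spearman:.3f}")
```

Here Spearman reports a perfect relationship while Pearson falls short of 1, which is why Spearman is often the safer filter when a feature may relate to the target non-linearly.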

Inspecting the distribution of the prediction target

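from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
# Histogram of price with a fitted normal curve (distplot is deprecated in seaborn >= 0.11)
plt.figure(1)
sns.distplot(train["price"], fit=stats.norm)
plt.figure(2)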
sns.distplot(train["price"...
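A numeric complement to plotting the target against a normal fit is to compute its skewness. The sketch below is an illustration under stated assumptions: the `train` frame is synthetic (log-normal prices), and the log transform is a common remedy for right-skewed targets rather than something the source prescribes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up, right-skewed "price" column (log-normal), purely for illustration
train = pd.DataFrame({"price": rng.lognormal(mean=8.0, sigma=1.0, size=2000)})

raw_skew = train["price"].skew()            # far from 0: long right tail
log_skew = np.log(train["price"]).skew()    # close to 0: roughly symmetric

print(f"skew(price)      = {raw_skew:.2f}")
print(f"skew(log(price)) = {log_skew:.2f}")
```

A skewness near zero after the transform suggests that modelling log(price) instead of price may suit models that assume roughly normal errors.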
