Python Data Analysis Case Study 1

import pandas as pd
import matplotlib.pyplot as plt

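# Directory containing the competition data files
data_dir = './data/'
train = pd.read_table(data_dir + 'train_20171215.txt', engine='python')
test_A = pd.read_table(data_dir + 'test_A_20171225.txt', engine='python')
# The sample file has no header row, so the column names are assigned explicitly
sample_A = pd.read_table(data_dir + 'sample_A_20171225.txt', engine='python', header=None)
sample_A.columns = ['date', 'day_of_week']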

# The first round of the competition only asks for cnt as a function of time,
# so the data can be aggregated by date and day_of_week
train = train.groupby(['date','day_of_week'],as_index=False).cnt.sum()
# print(train)
# plt.plot(train['day_of_week'],train['cnt'],'*')
# plt.show()

# Plot sales along the time axis, one subplot per day of the week
# for i in range(7):
#     tmp = train[train['day_of_week']==i+1]
#     plt.subplot(7, 1, i+1)
#     plt.plot(tmp['date'],tmp['cnt'],'*')
# plt.show()

# Split into training and test sets
xx_train = train[train['date']<=756]
xx_test = train[train['date']>756]
print('test shape',xx_test.shape)
print('train shape',xx_train.shape)

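# Approach zero: the mean baseline (validated on the raw data)
from sklearn.metrics import mean_squared_error
# Compute the offline, unweighted mean cnt for each day of the week
xx_train = xx_train.groupby(['day_of_week'],as_index=False).cnt.mean()
# After the merge, cnt_x is the actual value and cnt_y is the weekday-mean prediction
xx_result = pd.merge(xx_test,xx_train,on=['day_of_week'],how='left')
print('xx_result shape',xx_result.shape)
print(xx_result)
print(mean_squared_error(xx_result['cnt_x'],xx_result['cnt_y']))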
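
The weekday-mean baseline above can also be wrapped in a small helper and checked on toy data. This is only a sketch under stated assumptions: the function name `weekday_mean_baseline`, the `cnt_pred` column name, and the toy values are hypothetical and do not come from the competition data. The MSE is computed by hand here, so scikit-learn is not needed.

```python
import pandas as pd

def weekday_mean_baseline(daily, split_date):
    """Predict each held-out day by the mean cnt of its weekday in the history.

    `daily` is assumed to have one row per date with the columns
    date, day_of_week and cnt (hypothetical helper, for illustration only).
    """
    history = daily[daily['date'] <= split_date]
    holdout = daily[daily['date'] > split_date]
    # Unweighted mean per weekday, renamed to avoid the cnt_x / cnt_y suffixes
    weekday_mean = (history.groupby('day_of_week', as_index=False)['cnt']
                    .mean()
                    .rename(columns={'cnt': 'cnt_pred'}))
    result = holdout.merge(weekday_mean, on='day_of_week', how='left')
    # Mean squared error between actual and predicted values
    mse = ((result['cnt'] - result['cnt_pred']) ** 2).mean()
    return result, mse

# Made-up toy data: six days alternating between two weekdays
toy = pd.DataFrame({
    'date': [1, 2, 3, 4, 5, 6],
    'day_of_week': [1, 2, 1, 2, 1, 2],
    'cnt': [10, 20, 14, 24, 15, 19],
})
result, mse = weekday_mean_baseline(toy, split_date=4)
print(result)
print('MSE:', mse)
```

Renaming the prediction column before the merge keeps the actual value in `cnt` and the prediction in `cnt_pred`, which may be easier to read than the automatic `_x` and `_y` suffixes.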