FanRuan: displaying all zero sales as blank values_Predicting future product sales

Latest recommended article published 2023-09-02 19:30:22

weixin_36316494


Article link: https://blog.csdn.net/weixin_36316494/article/details/112476466


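This project uses the sales data that 1C Company published on Kaggle to predict item sales for November 2015. The data covers items, shops, and sales records. The work goes through data processing, feature engineering, and building an XGBoost model, and the result is evaluated with RMSD (root mean squared error). The emphasis is on data preprocessing: outlier handling, feature extraction, and lag-feature construction.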


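1. About the project

1.1 Background

This is a Kaggle competition project. The data is provided by 1C Company, one of the largest software companies in Russia. The dataset contains the item sales records of the company's shops from 1 January 2013 to 31 October 2015. The goal is to predict item sales for the following month, November 2015. Submissions are scored with RMSD (root mean squared deviation, also known as RMSE): the lower the score, the smaller the prediction error and the better the prediction.
Project link: https://www.kaggle.com/c/competitive-data-science-predict-future-sales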
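
To make the scoring rule concrete, here is a minimal sketch of the root mean squared error. The function name `rmse` and the sample numbers are illustrative assumptions, not part of the competition code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical monthly sales for four shop/item pairs
actual = [0, 2, 1, 5]
predicted = [0, 1, 1, 3]

# Errors are 0, 1, 0, 2, so the mean squared error is 1.25
score = rmse(actual, predicted)
print(score)
```

Because the errors are squared before averaging, a single large miss raises the score more than several small ones.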

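1.2 Dataset description

There are four data files. item_categories.csv is a lookup table that maps each item category name to its category id. items.csv maps each item name to its item id and category id. shops.csv maps each shop name to its shop id. sales_train.csv is the detailed record of the daily sales quantity and unit price for each shop and each item.

2. Data processing

Import the required libraries and read the files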

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBRegressor
from xgboost import plot_importance


# Paths are relative to the working directory; adjust them to where the CSV files are stored
items = pd.read_csv('sales/items.csv')
shops = pd.read_csv('sales/shops.csv')
cats = pd.read_csv('sales/item_categories.csv')
train = pd.read_csv('sales/sales_train.csv')
# Use ID as the index so the column does not have to be dropped later
test = pd.read_csv('sales/test.csv').set_index("ID")

Inspect the training set

train.head()

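Use box plots to check the rough distribution of price and sales quantity, then remove the outliers

# Plot the distributions to spot outliers
plt.figure(figsize=(10,4))
plt.xlim(-100,3000)
sns.boxplot(x=train.item_cnt_day)

plt.figure(figsize=(10,4))
plt.xlim(train.item_price.min(),train.item_price.max()*1.1)
sns.boxplot(x=train.item_price)

# Drop extreme values: keep only rows with a price below 100000 and a daily count below 1001
train = train[train.item_price<100000]
train = train[train.item_cnt_day<1001]

# Replace negative prices with the median valid price of the same shop (32), item (2973) and month (4)
median = train[(train.shop_id==32)&(train.item_id==2973)&(train.date_block_num==4)&(train.item_price>0)].item_price.median()
train.loc[train.item_price<0,"item_price"] = median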
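
The effect of these two steps is easier to see on a small frame. The sketch below applies the same thresholds and the same median fill to made-up rows; the frame `toy` and all of its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows: one negative price, one extreme price, one extreme count
toy = pd.DataFrame({
    "shop_id":        [32, 32, 32, 5, 5],
    "item_id":        [2973, 2973, 2973, 10, 11],
    "date_block_num": [4, 4, 4, 4, 4],
    "item_price":     [2499.0, 1249.0, -1.0, 300000.0, 50.0],
    "item_cnt_day":   [1, 1, 1, 1, 2000],
})

# Keep only rows below both thresholds; copy() avoids chained-assignment warnings
toy = toy[(toy.item_price < 100000) & (toy.item_cnt_day < 1001)].copy()

# Median of the valid prices for the same shop, item and month
same_group = (
    (toy.shop_id == 32) & (toy.item_id == 2973)
    & (toy.date_block_num == 4) & (toy.item_price > 0)
)
median = toy.loc[same_group, "item_price"].median()

# Overwrite the negative price with that median
toy.loc[toy.item_price < 0, "item_price"] = median
print(toy)
```

Filling from the same shop, item and month keeps the replacement price close to what that item actually sold for, rather than using a global median across unrelated items.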

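Looking at the shop name column, names generally start with the city name, so extract the city and label-encode it:

# The first space-separated token of the shop name is taken as the city
shops['city']=shops['shop_name'].str.split(' ').map(lambda x :x[0])
shops['city_code']=LabelEncoder().fit_transform(shops['city'])
shops = shops[['shop_id','city_code']]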
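
As a small illustration of what this produces, the sketch below runs the same split and encoding on three made-up shop names. The names and the frame `toy_shops` are hypothetical and do not come from shops.csv.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical shop names that start with a city name
toy_shops = pd.DataFrame({
    "shop_id": [0, 1, 2],
    "shop_name": ["Moscow Mall A", "Kazan Mall B", "Moscow Plaza C"],
})

# First space-separated token is treated as the city
toy_shops["city"] = toy_shops["shop_name"].str.split(" ").map(lambda x: x[0])

# LabelEncoder assigns integer codes in sorted order of the distinct labels
toy_shops["city_code"] = LabelEncoder().fit_transform(toy_shops["city"])
print(toy_shops[["shop_id", "city", "city_code"]])
```

Shops in the same city share a code, which lets a tree model group them without parsing text. Note that this approach assumes the city is always the first token, so a misspelled or prefixed shop name would produce its own spurious city.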

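Looking at the item category names, they generally follow a "top-level category - subcategory" pattern, so extract the two parts separately and label-encode each:

# Split the category name, take the top-level type and encode it
cats['split']=cats['item_category_name'].str.split('-')
cats['type']=cats['split'].map(lambda x :x[0].strip())
cats['type_code']=LabelEncoder().fit_transform(cats['type'])

# Subtype: use the second part, or fall back to the type when there is no hyphen
cats['subtype'] = cats['split'].map(lambda x: x[1].strip() if len(x) > 1 else x[0].strip())
cats['subtype_code'] = LabelEncoder().fit_transform(cats['subtype'])
cats = cats[['item_category_id','type_code', 'subtype_code']]
# Drop the item name column
items.drop(['item_name'], axis=1, inplace=True)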
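
The fallback for names without a hyphen is the part most easily overlooked, so here is a minimal sketch of the split on made-up category names. The frame `toy_cats` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical category names; the last one has no hyphen
toy_cats = pd.DataFrame({
    "item_category_id": [0, 1, 2],
    "item_category_name": ["Games - PS4", "Games - XBOX ONE", "Tickets"],
})

parts = toy_cats["item_category_name"].str.split("-")

# Top-level type: text before the first hyphen
toy_cats["type"] = parts.map(lambda x: x[0].strip())

# Subtype: text after the first hyphen, or the type itself when there is none
toy_cats["subtype"] = parts.map(
    lambda x: x[1].strip() if len(x) > 1 else x[0].strip()
)
print(toy_cats[["type", "subtype"]])
```

With this logic a name that contains more than one hyphen keeps only its second segment as the subtype, which may or may not be what is wanted for a given category list.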

The target to predict is the total monthly sales of each shop and item for November 2015
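
Since the training records are daily and the target is monthly, the daily rows have to be summed per month, shop and item at some point. The sketch below shows one way to do that with a pandas groupby. It is an assumption about how this step could look, not the author's code, and the column name `item_cnt_month` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales rows, all in the same month block
toy_sales = pd.DataFrame({
    "date_block_num": [33, 33, 33, 33],
    "shop_id":        [5, 5, 5, 6],
    "item_id":        [100, 100, 101, 100],
    "item_cnt_day":   [1, 2, 1, 4],
})

# Sum the daily counts for each (month, shop, item) combination
monthly = (
    toy_sales
    .groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})
)
print(monthly)
```

Combinations that never appear in the daily records are absent from the result, so a month with zero sales for a shop and item would need to be added explicitly if the model should see it.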
