Kaggle Competition: A Complete Record of a Sales Forecasting Project

Author: Elffer

Article link: https://blog.csdn.net/bmwlwg/article/details/105565926

M5 Forecasting - Accuracy

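Overall Approach

TODO LIST (to be finished by the Dragon Boat Festival; the basic materials are already in place, so no further material gathering is needed):

[DONE] 1. Complete the data analysis from five angles (my data-analysis baseline);
[DONE] 2. Complete the data engineering pipeline, based on a walkthrough of the LightGBM model;
[DONE] 3. Reduce the data volume by randomly sampling 1/10 of the original data;
[DONE] 4. Build the RMSE, MAPE, and WMAPE evaluation metrics;
[CLOSED] 5. Tune a single LightGBM model with grid search;
[CLOSED] 6. Ensemble multiple models: Prophet, RF, and LightGBM;
[DOING] 7. Write up the report;
8. Extension: apply the approach to forecasting for large warehouses and high-frequency SKUs;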
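
TODO item 4 names three metrics without defining them. Below is a minimal sketch of how they could be written with NumPy. The function names, the choice to skip zero-sales days in MAPE, and the toy numbers are my assumptions for illustration, not necessarily what the original notebook does.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error.

    Assumption: days with zero actual sales are skipped, because the
    percentage error is undefined there.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    if not mask.any():
        return float('nan')
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])))

def wmape(y_true, y_pred):
    """Weighted MAPE: total absolute error divided by total actual volume."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    total = np.abs(y_true).sum()
    if total == 0:
        return float('nan')
    return float(np.abs(y_true - y_pred).sum() / total)

# Toy numbers, made up for illustration only
actual = [0, 2, 4, 4]
forecast = [1, 2, 2, 6]
print(rmse(actual, forecast), mape(actual, forecast), wmape(actual, forecast))
```

WMAPE is usually the more stable of the two percentage metrics on sparse retail series, since a single day with tiny actual sales cannot dominate the score.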

Part I. Understanding the Problem

1.1 Objective:

How much camping gear will one store sell each month in a year?

In this competition, the fifth iteration, you will use hierarchical sales data from Walmart, the world’s largest company by revenue, to forecast daily sales for the next 28 days.

1.1.1 Input Data

The data covers stores in three US states (California, Texas, and Wisconsin) and includes item level, department, product categories, and store details. In addition, it has explanatory variables such as price, promotions, day of the week, and special events. Together, this robust dataset can be used to improve forecasting accuracy.

Time range: [2011-01-29 , 2016-06-19]

1969 days in total (both endpoints included).
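
The day count can be checked with a couple of lines of pandas. The split into 1913 training days plus two 28-day forecast windows reflects my understanding of the competition layout (sales columns d_1 through d_1913 in sales_train_validation.csv), so treat it as an assumption to verify against the files.

```python
import pandas as pd

start = pd.Timestamp('2011-01-29')
end = pd.Timestamp('2016-06-19')

# Inclusive day count: difference in days plus the first day itself
n_days = (end - start).days + 1
print(n_days)

# Assumed layout: 1913 training days + 28 validation days + 28 evaluation days
assert 1913 + 28 + 28 == n_days
```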

/kaggle/input/m5-forecasting-accuracy/sample_submission.csv
/kaggle/input/m5-forecasting-accuracy/sales_train_validation.csv
/kaggle/input/m5-forecasting-accuracy/sell_prices.csv
/kaggle/input/m5-forecasting-accuracy/calendar.csv

1.1.2 Output and Submission Format

Each row contains an id that is a concatenation of an item_id and a store_id, which is either validation (corresponding to the Public leaderboard), or evaluation (corresponding to the Private leaderboard).

In the challenge, you are predicting item sales at stores in various locations for two 28-day time periods.

id,F1,...F28
HOBBIES_1_001_CA_1_validation,0,...,2
HOBBIES_1_002_CA_1_validation,2,...,11
...
HOBBIES_1_001_CA_1_evaluation,3,...,7
HOBBIES_1_002_CA_1_evaluation,1,...,4
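
To make the layout concrete, here is a small sketch that assembles a frame in this shape from two prediction arrays. The helper name `make_submission` and the all-zero placeholder forecasts are hypothetical; only the `id` suffixes and the `F1`...`F28` columns come from the format above.

```python
import numpy as np
import pandas as pd

HORIZON = 28
f_cols = [f'F{i}' for i in range(1, HORIZON + 1)]

def make_submission(item_store_ids, preds_validation, preds_evaluation):
    """Stack validation and evaluation forecasts into one submission frame.

    item_store_ids: ids without suffix, e.g. 'HOBBIES_1_001_CA_1'
    preds_*: array-likes of shape (len(item_store_ids), 28)
    """
    parts = []
    for suffix, preds in (('validation', preds_validation),
                          ('evaluation', preds_evaluation)):
        part = pd.DataFrame(np.asarray(preds), columns=f_cols)
        part.insert(0, 'id', [f'{base}_{suffix}' for base in item_store_ids])
        parts.append(part)
    return pd.concat(parts, ignore_index=True)

# Placeholder forecasts (all zeros), for illustration only
ids = ['HOBBIES_1_001_CA_1', 'HOBBIES_1_002_CA_1']
zeros = np.zeros((len(ids), HORIZON))
submission = make_submission(ids, zeros, zeros)
print(submission.shape)
print(submission['id'].tolist())
```

Writing the result with `submission.to_csv('submission.csv', index=False)` would produce a file with the header row shown above.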

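1.1.3 Baseline model (see appendix)

1.1.4 Optimization methodology:

Data analysis: choose the model according to the distribution of sales quantities, for example whether there is a pattern to follow, whether volumes are very low, whether sales are stable, and whether there are outliers;
Use historical sales, combined with moving averages, as features to predict future sales;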
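
The second point (historical sales plus moving averages as features) can be sketched as follows on a toy long-format table. The column names `day` and `sold`, the lag of 3, and the window of 2 are all assumptions chosen to keep the example small; for a 28-day horizon a lag of at least 28 would usually be needed so that the feature is available at prediction time.

```python
import pandas as pd

# Toy long-format data: two series with 10 days each (values made up)
sales = pd.DataFrame({
    'id': ['A'] * 10 + ['B'] * 10,
    'day': list(range(1, 11)) * 2,
    'sold': list(range(10)) + [5] * 10,
})
sales = sales.sort_values(['id', 'day']).reset_index(drop=True)

LAG = 3      # hypothetical lag, in days
WINDOW = 2   # hypothetical moving-average window, in days

grouped = sales.groupby('id')['sold']

# Sales LAG days ago, computed per series so values never leak across ids
sales['lag'] = grouped.shift(LAG)

# Moving average of the lagged sales, again per series
sales['rolling_mean'] = grouped.transform(
    lambda s: s.shift(LAG).rolling(WINDOW).mean()
)

print(sales[sales['id'] == 'A'].tail(3))
```

Shifting before rolling matters: the moving average then only sees values that are at least `LAG` days old, which avoids leaking the target into its own features.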

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import cycle

# Show up to 50 columns when printing wide DataFrames
pd.set_option('display.max_columns', 50)
plt.style.use('bmh')
# Colors of the active style, plus an endless iterator over them
color_pal = plt.rcParams['axes.prop_cycle'].by_key()['color']
color_cycle = cycle(color_pal)

# Read in the data
INPUT_DIR = '../input/m5-forecasting-accuracy'
cal = pd.read_csv(f'{INPUT_DIR}/calendar.csv')                # calendar and events
stv = pd.read_csv(f'{INPUT_DIR}/sales_train_validation.csv')  # daily unit sales, one row per item and store
ss = pd.read_csv(f'{INPUT_DIR}/sample_submission.csv')        # submission template
sellp = pd.read_csv(f'{INPUT_DIR}/sell_prices.csv')           # selling prices
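
TODO item 3 mentions shrinking the data to a random 1/10 sample. One way this could be done is to sample whole rows of the wide sales table, so that every retained series keeps its full history. The sketch below uses a small stand-in frame rather than the real `stv`, and the seed and column values are made up; whether the original notebook samples rows in this way is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for the wide sales table: one row per series, one column per day
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'id': [f'ITEM_{i:03d}_CA_1_validation' for i in range(100)],
    'd_1': rng.integers(0, 5, size=100),
    'd_2': rng.integers(0, 5, size=100),
})

# Keep a random 10% of the series; random_state makes the sample reproducible
sampled = toy.sample(frac=0.1, random_state=42)
print(len(toy), '->', len(sampled))
```

On the real data the same call would be `stv.sample(frac=0.1, random_state=42)`.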

Part II. Data Analysis

2.1 Data Range Analysis

Visualizing the data for a single item

d_cols = [c for c in stv.columns if c.startswith('d_')]  # daily sales columns (d_1, d_2, ...)

# Below we are chaining the following steps in pandas:
# 1. Select the item.
# 2. Set the id as the index and keep only the sales data columns.
# 3. Transpose so the days become rows and the item becomes a single column.
# 4. Plot the data without a legend (there is only one series).
stv.loc[stv['id'] == 'FOODS_3_090_CA_3_validation'] \
    .set_index('id')[d_cols] \
    .T \
    .plot(figsize=(15, 5),
          title='FOODS_3_090_CA_3 sales by "d" number',
          color=next(color_cycle),
          legend=False)
plt.show()
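
The plot above uses the "d" number on the x-axis. To read it against real dates, the `d` labels can be mapped through the calendar table, which (as far as I know) carries a `d` column and a `date` column. The sketch below uses tiny stand-in frames with made-up sales values instead of the real `cal` and `stv`.

```python
import pandas as pd

# Stand-ins for cal and stv (sales values are made up for illustration)
cal_toy = pd.DataFrame({
    'd': ['d_1', 'd_2', 'd_3'],
    'date': ['2011-01-29', '2011-01-30', '2011-01-31'],
})
stv_toy = pd.DataFrame({
    'id': ['FOODS_3_090_CA_3_validation'],
    'd_1': [3], 'd_2': [0], 'd_3': [5],
})

d_cols_toy = [c for c in stv_toy.columns if c.startswith('d_')]

# One item's daily sales, indexed by d_1, d_2, ...
item = stv_toy.set_index('id').loc['FOODS_3_090_CA_3_validation', d_cols_toy]

# Lookup from "d" label to calendar date
d_to_date = pd.to_datetime(cal_toy.set_index('d')['date'])
dates = pd.DatetimeIndex(d_to_date.reindex(d_cols_toy))

# Same sales values, now indexed by date
by_date = pd.Series(item.to_numpy(dtype=float), index=dates, name='sold')
print(by_date)
```

Calling `by_date.plot(figsize=(15, 5))` would then give the same curve with dates on the x-axis.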
