Prophet模型预测商品销售量

最新推荐文章于 2024-08-08 16:30:11 发布

Goodsta

最新推荐文章于 2024-08-08 16:30:11 发布

阅读量6.3k

点赞数 1

文章标签：机器学习

本文链接：https://blog.csdn.net/wong2016/article/details/86674043

版权

本案例通过R语言的Prophet模型分析2013年至2017年商店商品销售数据，探讨销售量的年、月、日变化趋势，揭示商店间和商品间的销售差异，并预测未来3个月的销售量。数据源自Kaggle竞赛，Prophet模型能够捕捉季节性和趋势变化，但需要优化以减少过拟合和考虑假日效应。

摘要由CSDN通过智能技术生成

机器学习训练营（入群联系qq：2279055353）—— 机器学习案例详解的直播互动平台
下期直播案例预告：大数据预测商品的销售量波动趋势

案例介绍

像沃尔玛、家乐福这样的超市零售商，使用销售预报系统和工具补充商品。一套完善的预报系统有助于赢得其它的供应链渠道。如果你擅长预测商品未来一段时间的销量，你就能合理地安排你的库存盘点和分类。

该案例要求根据提供的2013年至2017年商店-商品的销售数据，使用时间序列技术，预测10家不同的商店的50种不同的商品在未来3个月的销售量。

数据来源：Kaggle 竞赛
代码实现：R 语言

数据文件

训练集train.csv与检验集test.csv. 请到以下百度网盘下载：

链接：https://pan.baidu.com/s/1j2FCvikgR2aPkfW2bDtLsA 密码：mmby

加载R包

首先，加载必需的R包。

rm(list=ls())

suppressMessages(library(data.table))
suppressMessages(library(DT))
suppressMessages(library(timeSeries))
suppressMessages(library(tidyverse))
suppressMessages(library(reshape))
suppressMessages(library(stringr))
suppressMessages(library(doBy))
suppressMessages(library(formattable))
suppressMessages(library(gridExtra))

suppressMessages(library(ggplot2))
suppressMessages(library(plotly))
suppressMessages(library(corrplot))
suppressMessages(library(wesanderson))
suppressMessages(library(RColorBrewer))
suppressMessages(library(gridExtra))
suppressMessages(library(zoo))

suppressMessages(library(forecast))
suppressMessages(library(prophet)) 

set.seed(2018)

加载数据集

train=fread("e:/kaggle_exercises/forecast/input/train.csv")
sprintf("The train data set has %d rows and %d columns", nrow(train), ncol(train) )
str(train)

test  <- fread("e:/kaggle_exercises/forecast/input/test.csv")
sprintf("The test data set has %d rows and %d columns", nrow(test), ncol(test) )
str(test)

print("the summary of train sales is:")
summary(train$sales)

在这里插入图片描述

从变量date里提取“年、月”变量

# Extraction of Year and Month of Year :
train$Year=year(train$date)        #returns the year from date i.e. 2013, 2014 etc.
train$Month=as.yearmon(train$date) #this yearmon() function is coming from zoo package returns the month of an year i.e Jan 2013, Feb 2015 etc
head(train)

缺失值检查

colSums(is.na(train))
# Function 1 : For ploting missing value
plot_missing <- function(data, title = NULL, ggtheme = theme_gray(), theme_config = list("legend.position" = c("bottom"))) {
  ## Declare variable first to pass R CMD check
  feature <- num_missing <- pct_missing <- group <- NULL
  ## Check if input is data.table
  is_data_table <- is.data.table(data)
  ## Detect input data class
  data_class <- class(data)
  ## Set data to data.table
  if (!is_data_table) data <- data.table(data)
  ## Extract missing value distribution
  missing_value <- data.table(
    "feature" = names(data),
    "num_missing" = sapply(data, function(x) {sum(is.na(x))})
  )
  missing_value[, feature := factor(feature, levels = feature[order(-rank(num_missing))])]
  missing_value[, pct_missing := num_missing / nrow(data)]
  missing_value[pct_missing < 0.05, group := "Good"]
  missing_value[pct_missing >= 0.05 & pct_missing < 0.4, group := "OK"]
  missing_value[pct_missing >= 0.4 & pct_missing < 0.8, group := "Bad"]
  missing_value[pct_missing >= 0.8, group := "Remove"][]
  ## Set data class back to original
  if (!is_data_table) class(missing_value) <- data_class
  ## Create ggplot object
  output <- ggplot(missing_value, aes_string(x = "feature", y = "num_missing", fill = "group")) +
    geom_bar(stat = "identity") +
    geom_text(aes(label = paste0(round(100 * pct_missing, 2), "%"))) +
    scale_fill_manual("Group", values = c("Good" = "#1a9641", "OK" = "#a6d96a", "Bad" = "#fdae61", "Remove" = "#d7191c"), breaks = c("Good", "OK", "Bad", "Remove")) +
    scale_y_continuous(labels = comma) +
    coord_flip() +
    xlab("Features") + ylab("Number of missing rows") +
    ggtitle(title) +
    ggtheme + theme_linedraw()+
    do.call(theme, theme_config)
  ## Print plot
  print(outpu