Passenger Car Sales Forecasting: Which Car Sales Forecasting Models Are There? - CSDN Blog

Building the Forecasting Model

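Using 24 consecutive months of sales data for 60 car models across 22 market segments (provinces), build a sales forecasting model;
then use that model to predict sales for the same car models and the same market segments over the following four consecutive months.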

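Data fields (10 dimensions):
Months: January to December
Province
Province code
Car model code
Body type
Year
Month
Sales volume
Search volume
Number of comments on news articles about the car model
Number of reviews of the car model
Preliminary round: 60 car models; second round: 82 car models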
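Goal
Predict the sales of each car model in each market segment for January through April of the following year.
Evaluation metric
The mean of the normalized root mean squared error (NRMSE).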
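The post names the metric but does not spell out its formula. The following is a minimal sketch, assuming that the NRMSE is computed per car model as the RMSE divided by the mean of the actual sales, and that the per-model values are then averaged. The function name `mean_nrmse` and the `forecast` column are hypothetical and do not come from the original code.

```python
import numpy as np
import pandas as pd

def mean_nrmse(df, group_col='model', true_col='salesVolume', pred_col='forecast'):
    """Average of per-group NRMSE, where NRMSE = RMSE / mean(actual).

    Assumption: normalization by the mean of the actual values per car model.
    """
    scores = []
    for _, g in df.groupby(group_col):
        err = g[pred_col].to_numpy(dtype=float) - g[true_col].to_numpy(dtype=float)
        rmse = np.sqrt(np.mean(err ** 2))
        scores.append(rmse / g[true_col].mean())
    return float(np.mean(scores))

# Tiny made-up example: model 'a' is off by 10 units, model 'b' is exact.
demo = pd.DataFrame({
    'model': ['a', 'a', 'b', 'b'],
    'salesVolume': [100, 200, 10, 30],
    'forecast': [110, 190, 10, 30],
})
score = mean_nrmse(demo)
```

Normalizing per model keeps high-volume models from dominating the score, which is presumably why a normalized metric was chosen.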

Feature Engineering

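The features include traditional time-series and statistical features, trend features, and holiday features.
Month-over-month sales change: the change in sales between the current month and the previous n months
Year-over-year sales change: the change in sales between the current month and the same period in earlier years
Historical sales: the sales volume in each of the previous n months
Summary statistics over the previous few months
Province, car model, and month encodings, plus province-level features, which characterize each sample well
First- and second-order differences: how much sales have changed from the past to the present
EMA (exponential moving average): this indicator worked noticeably well in the rule-based model
The hard months to predict are January and February.
Note: the Spring Festival fell in February, January, and February in 2016, 2017, and 2018 respectively.
The month containing the Spring Festival is therefore flagged, and a second feature records how many months separate the current month from the nearest Spring Festival.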
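The two Spring Festival features described above could be built roughly as follows. This is only a sketch: the function name and the output column names are hypothetical, and it assumes that `regYear` holds the calendar year and `regMonth` the calendar month (1 to 12).

```python
import pandas as pd

# Spring Festival month per year, taken from the note above.
SPRING_FESTIVAL_MONTH = {2016: 2, 2017: 1, 2018: 2}

def add_spring_festival_features(df, year_col='regYear', month_col='regMonth'):
    """Flag the Spring Festival month and measure the distance to the nearest one."""
    out = df.copy()
    # Turn (year, month) into a single running index so distances are easy to compute.
    festival_idx = [y * 12 + m for y, m in SPRING_FESTIVAL_MONTH.items()]
    month_idx = out[year_col] * 12 + out[month_col]
    out['is_spring_festival'] = month_idx.isin(festival_idx).astype(int)
    out['months_to_spring_festival'] = month_idx.apply(
        lambda i: min(abs(i - f) for f in festival_idx))
    return out

demo = pd.DataFrame({
    'regYear': [2016, 2016, 2017, 2017, 2018],
    'regMonth': [2, 8, 1, 12, 4],
})
demo = add_spring_festival_features(demo)
```

Using an unsigned distance treats the months before and after the holiday the same; a signed distance would be an equally plausible choice.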

def extract_feature(data, history, gap=1):
    # Note: `gap` is not used in the part of the listing that survives.
    dataset = data.copy()
    hist = history.copy()
    dataset.reset_index(drop=True, inplace=True)
    hist.reset_index(drop=True, inplace=True)

    print('Before data shape: ', dataset.shape)
    print('-' * 30)

    # Historical features.
    # regMonth is assumed to be a running month index (1, 2, ..., 24, ...),
    # so regMonth - 12 is the same month of the previous year.
    def lookup(province, model, month, col):
        mask = (hist['province'] == province) & (hist['model'] == model) & (hist['regMonth'] == month)
        # Raises IndexError if the history has no matching row.
        return hist.loc[mask, col].values[0]

    last_mt_sales, last_mt_pops = [], []
    prev_mt_sales = {k: [] for k in range(1, 12)}  # sales 1 to 11 months back

    for row in dataset.itertuples():
        province, model, regMonth = row.province, row.model, row.regMonth
        # Same month last year
        last_mt_sales.append(lookup(province, model, regMonth - 12, 'salesVolume'))
        last_mt_pops.append(lookup(province, model, regMonth - 12, 'popularity'))
        # Sales in each of the previous months
        for k in range(1, 12):
            prev_mt_sales[k].append(lookup(province, model, regMonth - k, 'salesVolume'))

    # The original listing breaks off here (the 12-month sales lag and the
    # popularity lags are missing). Attaching the collected values as columns
    # and returning the frame is an assumed completion; the column names are
    # taken from the list names in the original listing.
    dataset['last_mt_sales'] = last_mt_sales
    dataset['last_mt_pops'] = last_mt_pops
    for k in range(1, 12):
        dataset['prev_mt%d_sales' % k] = prev_mt_sales[k]
    return dataset
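The row-by-row lookups above scan the whole history once per feature per sample, which gets slow on larger data. As an alternative, here is a sketch of how several of the listed features (lagged sales, month-over-month change, first- and second-order differences, EMA) might be computed with grouped shifts. It is not the author's code. It assumes a hypothetical `month_idx` column that holds a running month index, the way `regMonth` appears to be used in the listing, and it assumes every (province, model) series has no missing months.

```python
import pandas as pd

KEYS = ['province', 'model']

def add_history_features(df, ema_span=3):
    """Lag, change, difference and EMA features that use only past months."""
    out = df.sort_values(KEYS + ['month_idx']).reset_index(drop=True)
    sales = out.groupby(KEYS)['salesVolume']
    for k in (1, 2, 3, 12):
        out['lag%d' % k] = sales.shift(k)          # sales k months earlier
    # Month-over-month change of the latest known month (inf if lag2 is 0).
    out['mom'] = out['lag1'] / out['lag2'] - 1
    out['diff1'] = out['lag1'] - out['lag2']       # first-order difference
    out['diff2'] = out['diff1'] - (out['lag2'] - out['lag3'])  # second-order
    # Exponential moving average over the already-lagged series.
    out['ema'] = out.groupby(KEYS)['lag1'].transform(
        lambda s: s.ewm(span=ema_span, adjust=False).mean())
    return out

demo = pd.DataFrame({
    'province': ['P'] * 4, 'model': ['M'] * 4,
    'month_idx': [1, 2, 3, 4], 'salesVolume': [10, 20, 40, 30],
})
feats = add_history_features(demo)
```

Every feature is derived from `lag1` or older values, so a row never sees its own month's sales.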

Multi-Model Strategy

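The solution combines rules, machine learning (LightGBM), and deep learning (LSTM).
Several LightGBM models and several LSTM models were built, each on a different feature subset.
Their predictions were then blended with a per-month linear weighting to produce the final result.
To avoid error propagation, each forecast month uses its own label offset: to predict January, the sliding window labels each sample with the next month's sales; to predict February, the label is shifted one more month;
to predict March, two more months; and to predict April, three more months.
Because no prediction is ever fed back in as an input, errors cannot propagate from one forecast month to the next.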
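The labeling scheme described above amounts to one label column per forecast horizon. Below is a minimal sketch of that idea, not the author's implementation. The function name, the `label_h*` columns, and the running month index `month_idx` are hypothetical, and the series are assumed to have no missing months.

```python
import pandas as pd

def make_direct_labels(df, horizons=(1, 2, 3, 4)):
    """Attach one label per horizon: the sales h months after each row."""
    out = df.sort_values(['province', 'model', 'month_idx']).reset_index(drop=True)
    sales = out.groupby(['province', 'model'])['salesVolume']
    for h in horizons:
        # shift(-h) pulls the value from h months ahead within each series.
        out['label_h%d' % h] = sales.shift(-h)
    return out

demo = pd.DataFrame({
    'province': ['P'] * 6, 'model': ['M'] * 6,
    'month_idx': [1, 2, 3, 4, 5, 6],
    'salesVolume': [1, 2, 3, 4, 5, 6],
})
labeled = make_direct_labels(demo)
```

A separate model would then be trained per horizon on the rows whose label is not missing. All four models take their inputs from the last observed month, so none of them consumes another model's prediction.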
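The neural network uses the sales of the previous n months to predict the current month's sales. For example, months 1-9 predict month 10; the window then shifts forward by one month, so that months 2-10 predict month 11. The evaluation metric was RMSE. Predictions were extremely sensitive to the network architecture and to the network weights, and the model overfit very easily, so in addition to Dropout and early stopping we also tried SWA (stochastic weight averaging), a model optimization method that averages weights in weight space.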
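Two of the ideas in this paragraph are easy to show with plain NumPy: the sliding window that turns a sales series into (input, target) pairs, and the weight averaging at the core of SWA. Both functions are hypothetical sketches and not the author's code. The averaging shown is the simplest equal-weight form, applied to weight lists of the kind returned by Keras `model.get_weights()`; a full SWA setup also involves a learning-rate schedule that is not covered here.

```python
import numpy as np

def make_windows(series, n_steps):
    """Slide a window of n_steps months over the series; the target is the next month."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    # Recurrent layers expect input shaped (samples, time steps, features).
    return X[..., np.newaxis], y

def average_weights(snapshots):
    """Equal-weight average of several weight snapshots, layer by layer."""
    return [np.mean(layer, axis=0) for layer in zip(*snapshots)]

# Months 1-9 predict month 10, months 2-10 predict month 11, and so on.
X, y = make_windows(range(1, 13), n_steps=9)

# Two made-up snapshots of a one-layer model.
snap_a = [np.array([0.0, 2.0])]
snap_b = [np.array([2.0, 4.0])]
swa_weights = average_weights([snap_a, snap_b])
```

The averaged list could be loaded back into a Keras model with `model.set_weights(swa_weights)`.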

import os
import random
import warnings
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from keras import backend as K
from keras.utils import plot_model
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.preprocessing import MinMaxScaler
from keras.layers import LSTM, RNN, GRU, SimpleRNN
from sklearn.model_selection import train_test_split
from keras.initializers import glorot_uniform

seed = 2019
random.seed(seed)
tf.set_random_seed(seed)  # TensorFlow 1.x API; TensorFlow 2.x uses tf.random.set_seed(seed)
np.random.seed(seed)
warnings.filterwarnings('ignore')

Per-Month Model Fusion

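Different models perform differently across months. The LSTM is more accurate for March and April, so blending it in brings a particularly large gain, and LGB2 + LSTMs scores higher still, close to 0.7. The final ensemble therefore uses a per-month fusion strategy.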

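# w11 ... w44 are the fusion weights, named w<model><month>. They must be defined
# at module level before this function is called; the post does not give their values.
def model_fusion_function(value_1, value_2, value_3, value_4, month):
    # value_1 ... value_4 are the four models' predictions for the same sample.
    # The fourth model only contributes to months 3 and 4.
    if month == 1:
        return value_1 * w11 + value_2 * w21 + value_3 * w31
    elif month == 2:
        return value_1 * w12 + value_2 * w22 + value_3 * w32
    elif month == 3:
        return value_1 * w13 + value_2 * w23 + value_3 * w33 + w43 * value_4
    elif month == 4:
        return value_1 * w14 + value_2 * w24 + value_3 * w34 + w44 * value_4
    # Fail loudly instead of silently returning None for an unexpected month.
    raise ValueError('month must be 1, 2, 3 or 4, got %r' % (month,))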
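The function above relies on sixteen separately named weights. One way to keep them in a single place is a per-month table, sketched below. The model names and every weight value are made up for illustration; the post does not give the real ones.

```python
# Hypothetical weights: month -> {model name -> weight}. Values are illustrative only.
FUSION_WEIGHTS = {
    1: {'rule': 0.5, 'lgb': 0.5},
    2: {'rule': 0.5, 'lgb': 0.5},
    3: {'rule': 0.25, 'lgb': 0.25, 'lstm': 0.5},
    4: {'rule': 0.25, 'lgb': 0.25, 'lstm': 0.5},
}

def fuse_by_month(preds, month, weights=FUSION_WEIGHTS):
    """Linear blend of the model predictions using the weights for this month."""
    if month not in weights:
        raise ValueError('no fusion weights for month %r' % (month,))
    month_weights = weights[month]
    # Models without a weight for this month are simply left out of the blend.
    return sum(w * preds[name] for name, w in month_weights.items())

preds = {'rule': 100.0, 'lgb': 200.0, 'lstm': 400.0}
january = fuse_by_month(preds, 1)   # the LSTM is not used in January here
march = fuse_by_month(preds, 3)
```

Keeping each month's weights summing to 1 keeps the blended forecast on the same scale as the individual forecasts.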