[BI Study Assignment 10: Time Series in Practice]


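1. Programming exercises

1.1 Traffic flow forecasting

Task: forecasting passenger counts for the JetRail high-speed rail
Dataset: jetrail.csv. Using roughly two years of past data (August 2012 to September 2014), forecast the passenger count for the following 7 months, with the dataset aggregated at the daily level.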

import pandas as pd
import matplotlib.pyplot as plt
 
# Subsetting the dataset
# Index 11856 marks the end of year 2013
df = pd.read_csv('train.csv', nrows=11856)

# Creating train and test set
# Index 10392 marks the end of October 2013
# .copy() so that adding columns below does not trigger SettingWithCopyWarning
train = df[0:10392].copy()
test = df[10392:].copy()

# Aggregating the dataset at daily level
# %Y matches a 4-digit year, %y a 2-digit year
df['Timestamp'] = pd.to_datetime(df['Datetime'], format='%d-%m-%Y %H:%M')
df.index = df['Timestamp']
# Keep only the numeric Count column: recent pandas versions raise an error
# when mean() meets the string column Datetime
df = df[['Count']].resample('D').mean()  # resample by day and take the mean

train['Timestamp'] = pd.to_datetime(train['Datetime'], format='%d-%m-%Y %H:%M')
train.index = train['Timestamp']
train = train[['Count']].resample('D').mean()

test['Timestamp'] = pd.to_datetime(test['Datetime'], format='%d-%m-%Y %H:%M')
test.index = test['Timestamp']
test = test[['Count']].resample('D').mean()
 
#Plotting data
train.Count.plot(figsize=(15,8), title= 'Daily Ridership', fontsize=14)
test.Count.plot(figsize=(15,8), title= 'Daily Ridership', fontsize=14)
plt.show()
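
The format string and the daily averaging used above can be tried out without the dataset. The following is a minimal sketch using only the standard library; the three hourly rows are made-up values, not taken from train.csv.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly rows in the same "day-month-year hour:minute" layout.
rows = [("25-08-2012 00:00", 8), ("25-08-2012 01:00", 2), ("26-08-2012 00:00", 4)]

# %Y expects a 4-digit year; %y would expect a 2-digit year such as "12".
parsed = [(datetime.strptime(ts, "%d-%m-%Y %H:%M"), count) for ts, count in rows]
two_digit = datetime.strptime("25-08-12 00:00", "%d-%m-%y %H:%M")

# Group the hourly counts by calendar day and average them,
# which is what resample('D').mean() does for the Count column.
by_day = defaultdict(list)
for ts, count in parsed:
    by_day[ts.date()].append(count)
daily_mean = {day: sum(values) / len(values) for day, values in by_day.items()}

for day, mean in sorted(daily_mean.items()):
    print(day, mean)
```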


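1.2 Capital inflow and outflow forecasting

Dataset: https://tianchi.aliyun.com/competition/entrance/231573/information

  • The dataset consists of 4 tables: user profile data, user purchase and redemption data, the yield table, and the interbank lending rate table
  • 28,000 users, 2.84 million behavior records, 294 days of interbank lending rates, 427 days of yields
  • Data covers 2013-07-01 to 2014-08-31; the goal is to forecast purchases and redemptions for September 2014
1.2.1 Dataset information

User profile table, user_profile_table
About 30,000 users were randomly sampled in total. The table mainly holds each user's gender, city and zodiac sign. Some users appear for the first time in September 2014, and these users exist only in the test data.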

Column         Type    Meaning                           Example
user_id        bigint  User ID                           1234
Sex            bigint  User gender (1: male, 0: female)  0
City           bigint  City                              6081949
constellation  string  Zodiac sign                       射手座 (Sagittarius)

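User purchase and redemption table, user_balance_table

  • The data covers purchase and redemption records from 20130701 to 20140831. The fields include the time of the user operation and the operation record, where the record has a purchase part and a redemption part
  • Amounts are in fen, i.e. 0.01 yuan
  • If a user's total consumption for the day is 0 (consume_amt = 0), the four category fields are empty
  • The data has been anonymized while guaranteeing that:
    today's balance = yesterday's balance + today's purchase - today's redemption, with no negative values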
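Column               Type    Meaning                                                Example
user_id              bigint  User ID                                                1234
report_date          string  Date                                                   20140407
tBalance             bigint  Today's balance                                        109004
total_purchase_amt   bigint  Today's total purchase = direct purchase + earnings    21876
direct_purchase_amt  bigint  Today's direct purchase                                21863
purchase_bal_amt     bigint  Today's purchase from Alipay balance                   0
purchase_bank_amt    bigint  Today's purchase from bank card                        21863
total_redeem_amt     bigint  Today's total redemption = consumption + transfer out  10261
consume_amt          bigint  Today's total consumption                              0
transfer_amt         bigint  Today's total transfer out                             10261
tftobal_amt          bigint  Today's transfer out to Alipay balance                 0
tftocard_amt         bigint  Today's transfer out to bank card                      10261
share_amt            bigint  Today's earnings                                       13
category1            bigint  Today's consumption in category 1                      0
category2            bigint  Today's consumption in category 2                      0
category3            bigint  Today's consumption in category 3                      0
category4            bigint  Today's consumption in category 4                      0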
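
The identities stated for user_balance_table can be checked directly. The sketch below first verifies the two column identities on the example row from the table, then checks the balance identity on a small made-up history. The numbers in that history are hypothetical, and the check assumes every user has one row per consecutive day, which the source does not state.

```python
import pandas as pd

# Column identities on the example row from the table above.
row = {"total_purchase_amt": 21876, "direct_purchase_amt": 21863, "share_amt": 13,
       "total_redeem_amt": 10261, "consume_amt": 0, "transfer_amt": 10261}
purchase_ok = row["total_purchase_amt"] == row["direct_purchase_amt"] + row["share_amt"]
redeem_ok = row["total_redeem_amt"] == row["consume_amt"] + row["transfer_amt"]

# Hypothetical two-user history (values in fen) for the balance identity.
ubt = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "report_date": pd.to_datetime(
        ["2014-04-05", "2014-04-06", "2014-04-07", "2014-04-06", "2014-04-07"]),
    "tBalance": [100, 150, 120, 500, 500],
    "total_purchase_amt": [100, 50, 0, 500, 0],
    "total_redeem_amt": [0, 0, 30, 0, 0],
})
ubt = ubt.sort_values(["user_id", "report_date"])

# Yesterday's balance per user; the first row of each user has none (NaN).
prev = ubt.groupby("user_id")["tBalance"].shift(1)
expected = prev + ubt["total_purchase_amt"] - ubt["total_redeem_amt"]

# Rows that have a previous day but break the identity.
violations = ubt[prev.notna() & (ubt["tBalance"] != expected)]
print(purchase_ok, redeem_ok, len(violations))
```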

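Yield table, mfd_day_share_interest

  • The yield table holds Yu'e Bao's yields over the 14 months
Column            Type    Meaning                                                      Example
mfd_date          string  Date                                                         20140102
mfd_daily_yield   double  Earnings per 10,000 units, i.e. the earnings on 10,000 yuan  1.5787
mfd_7daily_yield  double  Seven-day annualized yield (%)                               6.307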

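How earnings are calculated

  • The calculation is based mainly on the actual Yu'e Bao earnings calculation, with simplifications

1) Earnings are computed per calendar day rather than per accounting day, with midnight as the boundary (before 00:00 counts as yesterday, after 00:00 counts as today)

2) Display time of earnings, i.e. when the first earnings are actually credited to the user's account. Taking "deposit on Monday, displayed on Wednesday" as an example: if a user deposits 10,000 yuan (1,000,000 fen) on Monday, the amount is confirmed on Monday, starts earning on Tuesday, and the earnings produced on Tuesday are credited to the account on Wednesday, at which point the account shows 1,000,110 fen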

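Deposit day  First day earnings are displayed
Monday       Wednesday
Tuesday      Thursday
Wednesday    Friday
Thursday     Saturday
Friday       Tuesday of the next week
Saturday     Wednesday of the next week
Sunday       Wednesday of the next week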
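
The display schedule and the worked example above can be expressed in a few lines. This is a sketch only: the function names are hypothetical, the day offsets are read off the table, and the rate of 1.1 yuan per 10,000 yuan is inferred from the example (110 fen earned on 1,000,000 fen) rather than stated in the source.

```python
from datetime import date, timedelta

# Days from deposit to first displayed earnings, indexed by date.weekday()
# (Monday=0 ... Sunday=6), read off the table above.
DISPLAY_OFFSET_DAYS = [2, 2, 2, 2, 4, 4, 3]


def first_display_date(deposit: date) -> date:
    """Date on which the first earnings of a deposit are displayed."""
    return deposit + timedelta(days=DISPLAY_OFFSET_DAYS[deposit.weekday()])


def daily_earnings_fen(balance_fen: int, yield_per_10k_yuan: float) -> int:
    """Earnings for one day in fen, given the earnings per 10,000 yuan."""
    # 10,000 yuan equals 1,000,000 fen; 1 yuan equals 100 fen.
    return round(balance_fen / 1_000_000 * yield_per_10k_yuan * 100)


monday = date(2014, 9, 1)  # 1 September 2014 was a Monday
print(first_display_date(monday))          # 2014-09-03, a Wednesday
print(daily_earnings_fen(1_000_000, 1.1))  # 110
```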

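Submission table, tc_comp_predict_table

Field        Type    Meaning                  Example
report_date  bigint  Date                     20140901
purchase     bigint  Total purchase amount    40000000
redeem       bigint  Total redemption amount  30000000

Each row is the forecast of the total purchase and total redemption for one day. Output a forecast for every day of September 2014, 30 rows in total. purchase and redeem are both amounts, precise to the fen.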

Sample output:

20140901,40000000,30000000
20140902,40000000,30000000
20140903,40000000,30000000
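
A file in this layout can be produced with the standard library alone. The sketch below assumes comma-separated values without a header row (which matches what the to_csv call in the full code writes); build_submission is a hypothetical helper, and the forecasts are placeholders that repeat the sample values.

```python
import csv
import io
from datetime import date, timedelta


def build_submission(purchase, redeem, start=date(2014, 9, 1)):
    """Return header-less CSV text with one report_date,purchase,redeem row per day."""
    if len(purchase) != len(redeem):
        raise ValueError("purchase and redeem must have the same length")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for i, (p, r) in enumerate(zip(purchase, redeem)):
        day = start + timedelta(days=i)
        # Amounts are in fen, so they are written as whole numbers.
        writer.writerow([day.strftime("%Y%m%d"), int(round(p)), int(round(r))])
    return buf.getvalue()


# Placeholder forecasts: 30 days with the values from the sample output.
text = build_submission([40000000] * 30, [30000000] * 30)
lines = text.splitlines()
print(lines[0])
print(len(lines))
```

Writing `text` to a file gives the submission; rounding to integers is a choice made here because the table declares both amounts as bigint.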
1.2.2 Data exploration

Import the required modules

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from fbprophet import Prophet  # from version 1.0 onward the package is named prophet
from sklearn.metrics import mean_squared_error, mean_absolute_error
plt.style.use('fivethirtyeight') # For plots

# PATH is the directory that holds the competition CSV files; it must be defined beforehand
ubt = pd.read_csv(PATH + 'user_balance_table.csv', parse_dates=['report_date'])

Inspect the dataset

ubt.head()


plt.rcParams['figure.figsize'] = (25, 4.0)  # set figure size

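# Sum over all users per day; the raw table has one row per user per day
daily = ubt.groupby('report_date')[['total_purchase_amt', 'total_redeem_amt']].sum()
daily.plot()
plt.grid(True, linestyle="-", color="green", linewidth=0.5)
plt.legend()
plt.title('Daily total purchase and redeem')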

plt.gca().spines["top"].set_alpha(0.0)
plt.gca().spines["bottom"].set_alpha(0.3)
plt.gca().spines["right"].set_alpha(0.0)
plt.gca().spines["left"].set_alpha(0.3)

plt.show()

Full code

import math
import numpy
import pandas
from keras.layers import LSTM, RNN, GRU, SimpleRNN
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
from keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler
import os

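numpy.random.seed(2019)

# RNNModel reads a DataFrame named df_tmp that this listing never defines.
# Assumption: it holds the daily totals over all users, with purchase in
# column 0 and redeem in column 1. PATH is the data directory used in 1.2.2.
ubt = pandas.read_csv(PATH + 'user_balance_table.csv', parse_dates=['report_date'])
df_tmp = ubt.groupby('report_date')[['total_purchase_amt', 'total_redeem_amt']].sum()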


class RNNModel(object):
    def __init__(self, look_back=1, epochs_purchase=20, epochs_redeem=40, batch_size=1, verbose=2, patience=10, store_result=False):
        self.look_back = look_back
        self.epochs_purchase = epochs_purchase
        self.epochs_redeem = epochs_redeem
        self.batch_size = batch_size
        self.verbose = verbose
        self.store_result = store_result
        self.patience = patience
        self.purchase = df_tmp.values[:, 0:1]
        self.redeem = df_tmp.values[:, 1:2]
        
    def access_data(self, data_frame):
        # load the data set
        data_set = data_frame
        data_set = data_set.astype('float32')

        # LSTMs are sensitive to the scale of the input data, specifically when the
        # sigmoid (default) or tanh activation functions are used, so the data is
        # rescaled to the range 0 to 1 (normalization).
        scaler = MinMaxScaler(feature_range=(0, 1))
        data_set = scaler.fit_transform(data_set)

        # build X = the previous look_back values and Y = the following 30 values
        train_x, train_y, test = self.create_data_set(data_set)

        # reshape input to be [samples, time steps, features]
        train_x = numpy.reshape(train_x, (train_x.shape[0], 1, train_x.shape[1]))
        return train_x, train_y, test, scaler

    # convert an array of values into a data set matrix
    def create_data_set(self, data_set):
        data_x, data_y = [], []
        for i in range(len(data_set)-self.look_back - 30):
            a = data_set[i:(i + self.look_back), 0]
            data_x.append(a)
            data_y.append(list(data_set[i + self.look_back: i + self.look_back + 30, 0]))
        # print(numpy.array(data_y).shape)
        return numpy.array(data_x), numpy.array(data_y), data_set[-self.look_back:, 0].reshape(1, 1, self.look_back)

    def rnn_model(self, train_x, train_y, epochs):
        model = Sequential()
        model.add(LSTM(64, input_shape=(1, self.look_back), return_sequences=True))
        model.add(LSTM(32, return_sequences=False))
        model.add(Dense(32))
        model.add(Dense(30))
        model.compile(loss='mean_squared_error', optimizer='adam')
        model.summary()
        early_stopping = EarlyStopping('loss', patience=self.patience)
        history = model.fit(train_x, train_y, epochs=epochs, batch_size=self.batch_size, verbose=self.verbose, callbacks=[early_stopping])
        return model

    def predict(self, model, data):
        prediction = model.predict(data)
        return prediction

    def plot_show(self, predict):
        predict = predict[['purchase', 'redeem']]
        predict.plot()
        plt.show()

    def run(self):
        purchase_train_x, purchase_train_y, purchase_test, purchase_scaler = self.access_data(self.purchase)
        redeem_train_x, redeem_train_y, redeem_test, redeem_scaler = self.access_data(self.redeem)

        purchase_model = self.rnn_model(purchase_train_x, purchase_train_y, self.epochs_purchase)
        redeem_model = self.rnn_model(redeem_train_x, redeem_train_y, self.epochs_redeem)

        purchase_predict = self.predict(purchase_model, purchase_test)
        redeem_predict = self.predict(redeem_model, redeem_test)

        test_user = pandas.DataFrame({'report_date': [20140900 + i for i in range(1, 31)]})

        # reshape to one column first: each scaler was fitted on a single feature
        purchase = purchase_scaler.inverse_transform(purchase_predict.reshape(-1, 1)).ravel()
        redeem = redeem_scaler.inverse_transform(redeem_predict.reshape(-1, 1)).ravel()

        test_user['purchase'] = purchase
        test_user['redeem'] = redeem
        print(test_user)

        # Store the submission file (no index column, no header row)
        if self.store_result:
            test_user.to_csv('submit_lstm.csv', encoding='utf-8', index=False, header=False)

        # Plot the predictions
        self.plot_show(test_user)
        
if __name__ == '__main__':
    initiation = RNNModel(look_back=40, epochs_purchase=150, epochs_redeem=230, batch_size=16, verbose=2, patience=50, store_result=True)
    initiation.run()
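
The windowing done by create_data_set in the listing above is easier to follow on a toy series. The sketch below mirrors its loop bound and the reshape from access_data; make_windows is a hypothetical stand-alone helper, and the series 0..99 is made up so that the window contents are easy to read.

```python
import numpy as np


def make_windows(series, look_back, horizon=30):
    """Pair each look_back-long input window with the horizon values after it."""
    x, y = [], []
    # Same loop bound as create_data_set in the listing above.
    for i in range(len(series) - look_back - horizon):
        x.append(series[i:i + look_back])
        y.append(series[i + look_back:i + look_back + horizon])
    return np.array(x), np.array(y)


series = np.arange(100, dtype="float32")  # toy series 0, 1, ..., 99
x, y = make_windows(series, look_back=40)

# LSTM input layout used in the listing: [samples, time steps, features]
x = x.reshape(x.shape[0], 1, x.shape[1])
print(x.shape, y.shape)  # (30, 1, 40) (30, 30)
```

With 100 points, a look-back of 40 and a 30-day horizon, 30 training pairs come out; the first input window holds values 0 to 39 and its target holds values 40 to 69.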

Score obtained after submitting the results: (shown as a screenshot in the original post)
