资金流入流出预测—数据探索与分析

数据探索与分析

背景介绍

  • 蚂蚁金服拥有上亿会员,每天都涉及大量的资金流入和流出,资金管理压力会非常大
  • 在既保证资金流动性风险最小,又满足日常业务运转的情况下,精确地预测资金的流入流出情况变得尤为重要

赛题分析

赛题链接资金流入流出预测
赛题数据主要包含四个部分,分别为用户基本信息数据、用户申购赎回数据、收益率表和银行间拆借利率表

  • 用户信息表:user_profile_table

    总共随机抽取了约 3 万用户,其中部分用户在 2014 年 9 月份第一次出现,这部分用户用作测试数据。训练集是约 2.8 万 个用户的基本数据,主要包含了用户的性别、城市和星座。具体的字段如下表:

列名类型含义示例
user_idbigint用户 ID1234
Sexbigint用户性别( 1 :男, 0 :女 )0
Citybigint所在城市6081949
constellationstring星座射手座
  • 用户申购赎回数据表:user_balance_table

    储存了 20130701 至 20140831 的申购和赎回信息、以及所有的子类目信息。数据主要包括用户操作时间和操作记录,其中操作记录包括申购和赎回两个部分。金额的单位是分,即 0.01 元人民币。 如果用户今日消费总量为0,即consume_amt=0,则四个字类目为空。

列名类型含义示例
user_idbigint用户 ID1234
report_datestring日期20140407
tBalancebigint今日余额109004
yBalancebigint昨日余额97389
total_purchase_amtbigint今日总购买量 = 直接购买 + 收益2187
direct_purchase_amtbigint今日直接购买量21863
purchase_bal_amtbigint今日支付宝余额购买量0
purchase_bank_amtbigint今日银行卡购买量21863
total_redeem_amtbigint今日总赎回量 = 消费 + 转出10261
consume_amtbigint今日消费总量0
transfer_amtbigint今日转出总量10261
tftobal_amtbigint今日转出到支付宝余额总量0
tftocard_amtbigint今日转出到银行卡总量10261
share_amtbigint今日收益13
category 1-4bigint今日类目1-4 消费总额0
  • 收益率表: mfd_day_share_interest
列名类型含义示例
mfd_datestring日期20140102
mfd_daily_yielddouble万份收益,即 1 万块钱的收益1.5787
mfd_7daily_yielddouble七日年化收益率( % )6.307
  • 上海银行间同业拆放利率(Shibor)表:mfd_bank_shibor
列名类型含义示例
mfd_dateString日期20140102
Interest_O_NDouble隔夜利率(%)2.8
Interest_1_WDouble1周利率(%)4.25
Interest_2_WDouble2周利率(%)4.9
Interest_1_MDouble1个月利率(%)5.04
Interest_3_MDouble3个月利率(%)4.91
Interest_6_MDouble6个月利率(%)4.79
Interest_9_MDouble9个月利率(%)4.76
Interest_1_YDouble1年利率(%)4.78

收益计算方式

首先,资金的转入、转出是以0点为分隔,0 点之后转入或转出的算作当天的,0 点之前转入或转出的算作前一天的。收益的计算如下表所示:

列名类型
转入时间首次显示收益时间
周一周三
周二周四
周三周五
周四周六
周五下周二
周六下周三
周天下周三

需要提交的结果表: tc_comp_predict_table

对 2014 年 9 月 的 purchase 和 redeem 进行预测,每行数据是一天对申购、赎回总额的预测值。 Purchase 和 redeem 都是金额数据,精确到分,而不是精确到元。

列名类型含义示例
report_datebigint日期20140901
purchasebigint申购总额40000000
redeembigint赎回总额30000000

评估指标

选用积分式的计算方法:每天的误差选用相对误差来计算,然后根据预测的申购和赎回的相对误差,通过得分函数映射得到一个每天预测结果的得分,将 30 天内的得分汇总,然后结合实际业务的倾向,对申购赎回总量预测的得分情况进行加权求和,得到最终评分。具体的操作如下:

  1. 计算所有用户在测试集上每天的申购及赎回总额与实际情况总额的相对误差:
    每日申购相对误差(真实值 z i z_i zi,预测值 z ^ i \hat{z}_i z^i
    P u r c h a s e i = ∣ z i − z ^ i ∣ z i , Purchase_i = \frac{|z_i-\hat{z}_i|}{z_i}, Purchasei=ziziz^i,
    每日赎回相对误差(真实值 y i y_i yi,预测值 y ^ i \hat{y}_i y^i
    R e d e e m i = ∣ y i − y ^ i ∣ y i . Redeem_i = \frac{|y_i-\hat{y}_i|}{y_i}. Redeemi=yiyiy^i.
  2. 申购预测得分与 $Purchase_i $ 相关,赎回预测得分与 R e d e e m i Redeem_i Redeemi 相关 , 误差与得分之间的计算公式不公布,但保证该计算公式为单调递减的,即误差越小,得分越高,误差与大,得分越低。当第 i i i 天的申购误差 P u r c h a s e i = 0 Purchase_i = 0 Purchasei=0,这一天的得分为 10 10 10 分;当 P u r c h a s e i > 0.3 Purchase_i >0.3 Purchasei>0.3,其得分为 0 0 0.
  3. 最后公布总积分 = 申购预测得分 ∗ 45 % *45\% 45% + 赎回预测得分 ∗ 55 % . *55\%. 55%.

其他常用的评价指标还有 MSE(均方误差)、MAE(平均绝对误差)、MAPE(平均绝对相对误差)等。利用scikit-learn求常用误差的代码如下:

from sklearn.metrics import mean_squared_error # 均方误差
from sklearn.metrics import mean_absolute_error # 平均绝对误差
import numpy as np
MSE = mean_squared_error(y_test,y_predict)
MAE = mean_absolute_error(y_test,y_predict)
MAPE = np.mean((y_test-y_predict)/y_test) #平均绝对相对误差

数据探索

相关知识

关于Seaborn绘图:
Seaborn常见绘图总结
Python数据可视化—seaborn简介和实例
关于核密度函数:
核密度估计Kernel Density Estimation(KDE)
非参数估计:核密度估计KDE

导入库函数并读取数据

import pandas as  pd
import numpy as np
import warnings 
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import datetime 
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
dataset_path = 'Dataset/'
data_balance = pd.read_csv(dataset_path+"user_balance_table.csv")
bank = pd.read_csv(dataset_path+"mfd_bank_shibor.csv")
share = pd.read_csv(dataset_path+"mfd_day_share_interest.csv")
users = pd.read_csv(dataset_path+"user_profile_table.csv")

分析用户的申购赎回数据

统计申购总量和赎回总量
# 为数据集添加时间戳
data_balance['date'] = pd.to_datetime(data_balance['report_date'], format= "%Y%m%d") 
# 把report_date转换成时间戳
data_balance['day'] = data_balance['date'].dt.day # 日
data_balance['month'] = data_balance['date'].dt.month # 月
data_balance['year'] = data_balance['date'].dt.year # 年
data_balance['week'] = data_balance['date'].dt.week # 一年中的第几周
data_balance['weekday'] = data_balance['date'].dt.weekday # 一周里的第几天

# 根据date进行分组然后对total_purchase_amt和total_redeem_amt求和
total_balance = data_balance.groupby(['date'])['total_purchase_amt','total_redeem_amt'].sum().reset_index() 

得到的 total_balance 即为每天的申购总额及赎回总额:
在这里插入图片描述

利用时序图观察数据特点
# 画出每日总购买与赎回量的时间序列图
import matplotlib.pylab as plt
fig = plt.figure(figsize=(20,6))
plt.plot(total_balance['date'], total_balance['total_purchase_amt'],label='purchase')
plt.plot(total_balance['date'], total_balance['total_redeem_amt'],label='redeem')

plt.legend(loc='best')
plt.title("The lineplot of total amount of Purchase and Redeem from July.13 to Sep.14")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
从上图可以看出,申购总额与赎回总额的变化大体上呈一致的趋势,2013年7月至2013年9月申购总额与赎回总额几乎没有变化,从2013年10月起开始出现波动,从2013年11月到2014年1月申购总额与赎回总额逐渐增长,在2014年2月至2014年4月有较大波动,自2014年4月后,申购总额与赎回总额的变化趋势开始趋于稳定,并开始呈现较为规律的变化。

接下来对2014年4月份以后交易数据进行分析。

画出2014年4月份以后的时间序列图:

# 画出2014年4月份以后的时间序列图
total_balance_1 = total_balance[total_balance['date'] >= pd.to_datetime('2014-04-01')]
fig = plt.figure(figsize=(20,6))
plt.plot(total_balance_1['date'], total_balance_1['total_purchase_amt'])
plt.plot(total_balance_1['date'], total_balance_1['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])
plt.title("The lineplot of total amount of Purchase and Redeem from April.14 to Sep.14")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
分别画出2014年每个月的申购赎回总额的时间序列图:

# 分别画出14年每个月的购买赎回量的时间序列图
fig = plt.figure(figsize=(15,15))

plt.subplot(4,1,1)
plt.title("The time series of total amount of Purchase and Redeem for August, July, June, May respectively")

# 14年8月
total_balance_2 = total_balance[total_balance['date'] >= pd.to_datetime('2014-08-01')] 
plt.plot(total_balance_2['date'], total_balance_2['total_purchase_amt'])
plt.plot(total_balance_2['date'], total_balance_2['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])

# 14年7月
total_balance_3 = total_balance[(total_balance['date'] >= pd.to_datetime('2014-07-01')) & (total_balance['date'] < pd.to_datetime('2014-08-01'))]
plt.subplot(4,1,2)
plt.plot(total_balance_3['date'], total_balance_3['total_purchase_amt'])
plt.plot(total_balance_3['date'], total_balance_3['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])

# 14年6月
total_balance_4 = total_balance[(total_balance['date'] >= pd.to_datetime('2014-06-01')) & (total_balance['date'] < pd.to_datetime('2014-07-01'))]
plt.subplot(4,1,3)
plt.plot(total_balance_4['date'], total_balance_4['total_purchase_amt'])
plt.plot(total_balance_4['date'], total_balance_4['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])

# 14年5月
total_balance_5 = total_balance[(total_balance['date'] >= pd.to_datetime('2014-05-01')) & (total_balance['date'] < pd.to_datetime('2014-06-01'))]
plt.subplot(4,1,4)
plt.plot(total_balance_5['date'], total_balance_5['total_purchase_amt'])
plt.plot(total_balance_5['date'], total_balance_5['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])

plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
在这里插入图片描述
对比2013年8月与2013年9月每日购买赎回量的时序图:

# 分别画出13年8月与9月每日购买赎回量的时序图
fig = plt.figure(figsize=(15,9))

# 13年8月
total_balance_last8 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-08-01')) & (total_balance['date'] < pd.to_datetime('2013-09-01'))]
plt.subplot(2,1,1)
plt.plot(total_balance_last8['date'], total_balance_last8['total_purchase_amt'])
plt.plot(total_balance_last8['date'], total_balance_last8['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])

# 13年9月
total_balance_last9 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-09-01')) & (total_balance['date'] < pd.to_datetime('2013-10-01'))]
plt.subplot(2,1,2)
plt.plot(total_balance_last9['date'], total_balance_last9['total_purchase_amt'])
plt.plot(total_balance_last9['date'], total_balance_last9['total_redeem_amt'])
plt.legend(['total_purchase_amt','total_redeem_amt'])

plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述

  • 可以看出每个月的申购总额与赎回总额的波动趋势呈现一定的相似性,由此可以推测申购与赎回行为可能存在一定关联。
  • 申购总额与赎回总额的变化大体以周为周期,每月有4个波峰、4个波谷。
  • 月末时赎回总额常常高于申购总额,可能是因为月末时更需要钱。
分析周一到周日申购总额和赎回总额的差异

画出周一到周日申购总额和赎回总额的提琴图与分布图:

# 画出周一到周日申购总额和赎回总额的提琴图与分布图
a = plt.figure(figsize=(10,10))
scatter_para = {'marker':'.', 's':3, 'alpha':0.3}
line_kws = {'color':'k'}
plt.subplot(2,2,1)
plt.title('The distrubution of total purchase')
# 2014年4月以后的申购总额在周内的分布情况
sns.violinplot(x='weekday', y='total_purchase_amt', data = total_balance_1, scatter_kws=scatter_para, line_kws=line_kws)
plt.subplot(2,2,2)
plt.title('The distrubution of total purchase')
# 2014年4月以后的申购总额的分布直方图
sns.distplot(total_balance_1['total_purchase_amt'].dropna())
plt.subplot(2,2,3)
plt.title('The distrubution of total redeem')
# 2014年4月以后的赎回总额在周内的分布情况
sns.violinplot(x='weekday', y='total_redeem_amt', data = total_balance_1, scatter_kws=scatter_para, line_kws=line_kws)
plt.subplot(2,2,4)
plt.title('The distrubution of total redeem')
# 2014年4月以后的赎回总额的分布直方图
sns.distplot(total_balance_1['total_redeem_amt'].dropna())

在这里插入图片描述
画出周一到周日申购总额和赎回总额均值的直方图:

# 按周内每一天对数据聚合后取均值
week_sta = total_balance_1[['total_purchase_amt', 'total_redeem_amt', 'weekday']].groupby('weekday', as_index=False).mean()
# 画出周内每一天的均值
plt.figure(figsize=(12, 5))
ax = plt.subplot(1,2,1)
plt.title('The barplot of average total purchase with each weekday')
ax = sns.barplot(x="weekday", y="total_purchase_amt", data=week_sta, label='Purchase')
ax.legend()
ax = plt.subplot(1,2,2)
plt.title('The barplot of average total redeem with each weekday')
ax = sns.barplot(x="weekday", y="total_redeem_amt", data=week_sta, label='Redeem')
ax.legend()

在这里插入图片描述
画出周一到周日申购总额和赎回总额的箱线图:

# 画出周一到周日申购总额和赎回总额的箱线图
plt.figure(figsize=(12, 5))
ax = plt.subplot(1,2,1)
plt.title('The boxplot of total purchase with each weekday')
ax = sns.boxplot(x="weekday", y="total_purchase_amt", data=total_balance_1)
ax = plt.subplot(1,2,2)
plt.title('The boxplot of total redeem with each weekday')
ax = sns.boxplot(x="weekday", y="total_redeem_amt", data=total_balance_1)

在这里插入图片描述
分别画出申购总额和赎回总额与周一到周日的相关关系图:

# 使用OneHot方法将周一至周日转换成OneHot编码
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
total_balance = total_balance.reset_index() 
# OneHot编码,把表示星期一到星期天的标签转换为一个7*1的单位向量,1在向量中的位置表示该日是一周中的第几天
week_feature = encoder.fit_transform(np.array(total_balance['weekday']).reshape(-1, 1)).toarray()
week_feature = pd.DataFrame(week_feature,columns=['weekday_onehot']*len(week_feature[0])) 
feature = pd.concat([total_balance, week_feature], axis = 1)[['total_purchase_amt', 'total_redeem_amt','weekday_onehot','date']] # 横向连接
feature.columns = list(feature.columns[0:2]) + [x+str(i) for i,x in enumerate(feature.columns[2:-1])] + ['date'] # 重新设置columns

# 画出申购总额和赎回总额与周一到周日的相关关系图
f, ax = plt.subplots(figsize = (15, 8))
plt.subplot(1,2,1)
plt.title('The spearman coleration between total purchase and each weekday')
# 画出total_purchase_amt与weekday_onehot的相关图
sns.heatmap(feature[[x for x in feature.columns if x not in ['total_redeem_amt', 'date'] ]].corr('spearman'),linewidths = 0.1,annot=True)
plt.subplot(1,2,2)
plt.title('The spearman coleration between total redeem and each weekday')
# 画出total_redeem_amt与weekday_onehot的相关图
sns.heatmap(feature[[x for x in feature.columns if x not in ['total_purchase_amt', 'date'] ]].corr('spearman'),linewidths = 0.1,annot=True)

在这里插入图片描述

  • 可以看出,申购总额与赎回总额在周内的分布都比较集中,存在少量的极端值,并且存在以周为周期的变化趋势。周一到周四的申购总额与赎回总额基本上接近,到了周五申购总额与赎回总额有所下降,周六与周日的申购总额与赎回总额最低。
  • 通过相关关系图,可以看出申购总额与赎回总额与周一到周四有极弱的正相关关系,与周五几乎不相关,与周六周日有较弱的负相关关系。
分析每月申购总额与赎回总额的分布特点

画出每个月申购与赎回总额的核密度函数:

# 画出每个月申购总额的分布估计图(kdeplot) 
plt.figure(figsize=(15,10))
plt.title('The Probability Density of total purchase amount in Each Month')
plt.ylabel('Probability')
plt.xlabel('Amount')
for i in range(7, 12):
    s1 = '2013-'+str(i)+'-1'
    s2 = '2013-'+str(i+1)+'-1'
    # 画出13年7月至13年12月的每个月申购总额的核密度函数
    sns.kdeplot(total_balance[(total_balance['date'] >= pd.to_datetime(s1)) & (total_balance['date'] < pd.to_datetime(s2))]['total_purchase_amt'],label='13Y,'+str(i)+'M')
for i in range(1, 9):
    s1 = '2014-'+str(i)+'-1'
    s2 = '2014-'+str(i+1)+'-1'
    # 画出14年1月至14年9月的每个月申购总额的核密度函数
    sns.kdeplot(total_balance[(total_balance['date'] >= pd.to_datetime(s1)) & (total_balance['date'] < pd.to_datetime(s2))]['total_purchase_amt'],label='14Y,'+str(i)+'M')

在这里插入图片描述

# 画出每个月赎回总额的分布估计图(kdeplot)
plt.figure(figsize=(15,10))
plt.title('The Probability Density of total redeem amount in Each Month')
plt.ylabel('Probability')
plt.xlabel('Amount')
for i in range(7, 12):
    s1 = '2013-'+str(i)+'-1'
    s2 = '2013-'+str(i+1)+'-1'
    # 画出13年7月至13年12月的每个月赎回总额的核密度函数
    sns.kdeplot(total_balance[(total_balance['date'] >= pd.to_datetime(s1)) & (total_balance['date'] < pd.to_datetime(s2))]['total_redeem_amt'],label='13Y,'+str(i)+'M')
for i in range(1, 9):
    s1 = '2014-'+str(i)+'-1'
    s2 = '2014-'+str(i+1)+'-1'
    # 画出14年1月至14年9月的每个月申购总额的核密度函数
    sns.kdeplot(total_balance[(total_balance['date'] >= pd.to_datetime(s1)) & (total_balance['date'] < pd.to_datetime(s2))]['total_redeem_amt'],label='14Y,'+str(i)+'M')

在这里插入图片描述
画出2014年5月至8月申购与赎回总额的核密度函数(规律波动时期):

# 画出13年八月到九月份的分布估计图
total_balance_last_7 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-07-01')) & (total_balance['date'] < pd.to_datetime('2013-08-01'))]
total_balance_last_8 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-08-01')) & (total_balance['date'] < pd.to_datetime('2013-09-01'))]
total_balance_last_9 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-09-01')) & (total_balance['date'] < pd.to_datetime('2013-10-01'))]
total_balance_last_10 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-10-01')) & (total_balance['date'] < pd.to_datetime('2013-11-01'))]

plt.figure(figsize=(12,10))
ax = plt.subplot(2,1,1)
plt.title('The Probability Density of total purchase amount from Aug.13 to Sep.13')
plt.ylabel('Probability')
plt.xlabel('Amount')
ax = sns.kdeplot(total_balance_last_7['total_purchase_amt'],label='July')
ax = sns.kdeplot(total_balance_last_8['total_purchase_amt'],label='August')
ax = sns.kdeplot(total_balance_last_9['total_purchase_amt'],color='Red',label='September')
ax = sns.kdeplot(total_balance_last_10['total_redeem_amt'],color='Black',label='October')

ax = plt.subplot(2,1,2)
plt.title('The Probability Density of total redeem amount from Aug.13 to Sep.13')
plt.ylabel('Probability')
plt.xlabel('Amount')
ax = sns.kdeplot(total_balance_last_7['total_redeem_amt'],label='July')
ax = sns.kdeplot(total_balance_last_8['total_redeem_amt'],label='August')
ax = sns.kdeplot(total_balance_last_9['total_redeem_amt'],color='Red',label='September')
ax = sns.kdeplot(total_balance_last_10['total_redeem_amt'],color='Black',label='October')

在这里插入图片描述

画出2013年7月至10月申购与赎回总额的核密度函数(平缓增长时期):

# 画出13年7-10月份的分布估计图
total_balance_last_7 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-07-01')) & (total_balance['date'] < pd.to_datetime('2013-08-01'))]
total_balance_last_8 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-08-01')) & (total_balance['date'] < pd.to_datetime('2013-09-01'))]
total_balance_last_9 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-09-01')) & (total_balance['date'] < pd.to_datetime('2013-10-01'))]
total_balance_last_10 = total_balance[(total_balance['date'] >= pd.to_datetime('2013-10-01')) & (total_balance['date'] < pd.to_datetime('2013-11-01'))]

plt.figure(figsize=(12,10))
ax = plt.subplot(2,1,1)
plt.title('The Probability Density of total purchase amount from Aug.13 to Sep.13')
plt.ylabel('Probability')
plt.xlabel('Amount')
ax = sns.kdeplot(total_balance_last_7['total_purchase_amt'],label='July')
ax = sns.kdeplot(total_balance_last_8['total_purchase_amt'],label='August')
ax = sns.kdeplot(total_balance_last_9['total_purchase_amt'],color='Red',label='September')
ax = sns.kdeplot(total_balance_last_10['total_redeem_amt'],color='Black',label='October')

ax = plt.subplot(2,1,2)
plt.title('The Probability Density of total redeem amount from Aug.13 to Sep.13')
plt.ylabel('Probability')
plt.xlabel('Amount')
ax = sns.kdeplot(total_balance_last_7['total_redeem_amt'],label='July')
ax = sns.kdeplot(total_balance_last_8['total_redeem_amt'],label='August')
ax = sns.kdeplot(total_balance_last_9['total_redeem_amt'],color='Red',label='September')
ax = sns.kdeplot(total_balance_last_10['total_redeem_amt'],color='Black',label='October')

在这里插入图片描述

  • 从图中可以看出,2014年申购总额的分布从5月至8月逐渐集中,赎回总额的分布在5、7、8三个月较为集中,在6月变化较大。
  • 从2013年7月至10月申购与赎回总额在逐渐增长,并且可以看出申购与赎回总额的大体趋势都是由集中变得分散。
分析日期对申购总额与赎回总额造成的影响
日期特征分析

画出2014年5月至8月的申购赎回总额条形图(规律波动时期):

# 画出2014年5月至8月的申购赎回总额条形图(规律波动时期)
plt.figure(figsize=(15,20))
day_sta = total_balance_5[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,1)
plt.title("The total Purchase in May.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,2)
plt.title("The total Redeem in May.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

day_sta = total_balance_4[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,3)
plt.title("The total Purchase in June.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,4)
plt.title("The total Redeem in June.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

day_sta = total_balance_3[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,5)
plt.title("The total Purchase in July.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,6)
plt.title("The total Redeem in July.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

day_sta = total_balance_2[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,7)
plt.title("The total Purchase in August.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,8)
plt.title("The total Redeem in August.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

在这里插入图片描述
在这里插入图片描述
画出2013年7月至10月的申购赎回总额条形图(平缓增长时期):

# 画出2013年7月至10月的申购赎回总额条形图(平缓增长时期) 
plt.figure(figsize=(15,20))
day_sta = total_balance_5[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,1)
plt.title("The total Purchase in May.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,2)
plt.title("The total Redeem in May.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

day_sta = total_balance_4[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,3)
plt.title("The total Purchase in June.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,4)
plt.title("The total Redeem in June.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

day_sta = total_balance_3[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,5)
plt.title("The total Purchase in July.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,6)
plt.title("The total Redeem in July.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

day_sta = total_balance_2[['total_purchase_amt', 'total_redeem_amt', 'day']].groupby('day', as_index=False).mean()
plt.subplot(4,2,7)
plt.title("The total Purchase in August.14")
ax = sns.barplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=day_sta, label='Purchase')
plt.subplot(4,2,8)
plt.title("The total Redeem in August.14")
bx = sns.barplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx = sns.lineplot(x="day", y="total_redeem_amt", data=day_sta, label='Redeem')
bx.legend()

在这里插入图片描述
在这里插入图片描述

  • 从图中可以看出2014年5月到8月的申购总额与赎回总额的波动都较为规律,呈现一定的周期性,每周开始时用户倾向于申购余额宝,在周中期则倾向于赎回,周末的交易量普遍较小,这可能和余额宝的收益计算方式有关。
  • 2013年的9月10月都存在一些异常的数据,如9月19日(中秋节)与10月1日(国庆节),可能的原因是节假日造成的影响。
  • 此外还可以看出,赎回总额的波动普遍略大于申购总额的波动。

分别画出2014年4月份以后每天申购、赎回总额的热力图:

# 分别画出2014年4月份以后每天申购赎回总额的热力图

# total_balance_1:2014年4月份以后的数据
test = np.zeros((max(total_balance_1['week']) - min(total_balance_1['week']) + 1, 7)) # 周数 * 7 的矩阵
test[total_balance_1['week'] - min(total_balance_1['week']), total_balance_1['weekday']] = total_balance_1['total_purchase_amt']

f, ax = plt.subplots(figsize = (10, 4))
sns.heatmap(test,linewidths = 0.1, ax=ax)
ax.set_title("Purchase")
ax.set_xlabel('weekday')
ax.set_ylabel('week')

test = np.zeros((max(total_balance_1['week']) - min(total_balance_1['week']) + 1, 7))
test[total_balance_1['week'] - min(total_balance_1['week']), total_balance_1['weekday']] = total_balance_1['total_redeem_amt']

f, ax = plt.subplots(figsize = (10, 4))
sns.heatmap(test,linewidths = 0.1, ax=ax)
ax.set_title("Redeem")
ax.set_xlabel('weekday')
ax.set_ylabel('week')

在这里插入图片描述
从热力图的结果可以看出第4周的周日和第12周的周三附近的数据存在异常现象。分别查看异常数据点对应的日期:

# 对于热力图中异常点的数据分析.1
total_balance_1[(total_balance_1['week'] == 4 + min(total_balance_1['week'])) & (total_balance_1['weekday'] == 6)]

# 对于热力图中异常点的数据分析.2
total_balance_1[(total_balance_1['week'] == 12 + min(total_balance_1['week'])) & (total_balance_1['weekday'] == 2)]

在这里插入图片描述
在这里插入图片描述

  • 可以推测第4周的周日前后的数据异常是受到劳动节的影响。
  • 通过查询新闻可知,第12周的周三附近的数据异常可能是由于余额宝收益率不断下跌,并且在6月23日跌破4.5%造成的。
节假日分析

由上文的分析可以看出,节假日会造成用户行为的反常,因此有必要对节假日的数据进行讨论。

画出节假日期间的申购、赎回总额与平时均值的对比:

# 获取节假日的数据 
qingming = total_balance[(total_balance['date'] >= '2014-04-05') & (total_balance['date'] < '2014-04-08')]
labour = total_balance[(total_balance['date'] >= '2014-05-01') & (total_balance['date'] < '2014-05-04')]
duanwu = total_balance[(total_balance['date'] >= '2014-05-31') & (total_balance['date'] < '2014-06-03')]
data618 = total_balance[(total_balance['date'] >= '2014-06-10') & (total_balance['date'] < '2014-06-20')]
# 画出节假日与平时的均值 
fig = plt.figure()
index_list = ['QM','Labour','DW','618','Mean']
label_list = [np.mean(qingming['total_purchase_amt']), np.mean(labour['total_purchase_amt']),np.mean(duanwu['total_purchase_amt']),np.mean(data618['total_purchase_amt']),np.mean(total_balance_1['total_purchase_amt'])]
plt.bar(index_list, label_list, label="Purchase")

index_list = ['QM.','Labour.','DW.','618.','Mean.']
label_list = [np.mean(qingming['total_redeem_amt']), np.mean(labour['total_redeem_amt']),np.mean(duanwu['total_redeem_amt']),np.mean(data618['total_redeem_amt']),np.mean(total_balance_1['total_redeem_amt'])]
plt.bar(index_list, label_list, label="Redeem")
plt.title("The average of different holiday")
plt.ylabel("Amount")
plt.legend()
plt.show()

在这里插入图片描述
从图中可以看出清明节、劳动节、端午节交易量明显低于平时,618的交易量略高于平时。

对于节假日周边日期的分析

分别画出节假日与周边日期申购、赎回总额的时序图:

# 画出清明节与周边日期的时序图
qingming_around = total_balance[(total_balance['date'] >= '2014-04-01') & (total_balance['date'] < '2014-04-13')]
ax = sns.lineplot(x="date", y="total_purchase_amt", data=qingming_around, label='Purchase')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=qingming_around, label='Redeem', ax=ax)
ax = sns.scatterplot(x="date", y="total_purchase_amt", data=qingming, ax=ax)
ax = sns.scatterplot(x="date", y="total_redeem_amt", data=qingming, ax=ax)
plt.title("The data around Qingming Holiday")
ax.legend()

在这里插入图片描述

# 画出劳动节与周边日期的时序图
labour_around = total_balance[(total_balance['date'] >= '2014-04-25') & (total_balance['date'] < '2014-05-10')]
ax = sns.lineplot(x="date", y="total_purchase_amt", data=labour_around, label='Purchase')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=labour_around, label='Redeem', ax=ax)
ax = sns.scatterplot(x="date", y="total_purchase_amt", data=labour, ax=ax)
ax = sns.scatterplot(x="date", y="total_redeem_amt", data=labour, ax=ax)
plt.title("The data around Labour holiday")
ax.legend()

在这里插入图片描述

# 画出端午节与周边日期的时序图
duanwu_around = total_balance[(total_balance['date'] >= '2014-05-25') & (total_balance['date'] < '2014-06-07')]
ax = sns.lineplot(x="date", y="total_purchase_amt", data=duanwu_around, label='Purchase')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=duanwu_around, label='Redeem', ax=ax)
ax = sns.scatterplot(x="date", y="total_purchase_amt", data=duanwu, ax=ax)
ax = sns.scatterplot(x="date", y="total_redeem_amt", data=duanwu, ax=ax)
plt.title("The data around Duanwu Holiday")
ax.legend()

在这里插入图片描述

# 画出中秋与周边日期的时序图

zhongqiu = total_balance[(total_balance['date'] >= '2013-09-19') & (total_balance['date'] < '2013-09-22')]
zhongqiu_around = total_balance[(total_balance['date'] >= '2013-09-14') & (total_balance['date'] < '2013-09-28')]
ax = sns.lineplot(x="date", y="total_purchase_amt", data=zhongqiu_around, label='Purchase')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=zhongqiu_around, label='Redeem', ax=ax)
ax = sns.scatterplot(x="date", y="total_purchase_amt", data=zhongqiu, ax=ax)
ax = sns.scatterplot(x="date", y="total_redeem_amt", data=zhongqiu, ax=ax)
plt.title("The data around MiddleAutumn Holiday(in 2013)")
ax.legend()

在这里插入图片描述
可以看出在节假日周边日期存在节前交易量下降,节后交易量上升的趋势。

对大额交易与小额交易进行分析

画出用户申购情况的箱型图:

# 画出用户申购情况的箱型图
sns.boxplot(data_balance['total_purchase_amt'])
plt.title("The abnormal value of total purchase")

在这里插入图片描述
进一步对该用户的行为进行分析:

# 对于购买2e8的用户的交易行为分析 
id = data_balance[data_balance['total_purchase_amt'] > 2e8]['user_id'].values[0]
data_balance[data_balance['user_id'] == id].sort_values(by = 'total_purchase_amt',axis = 0,ascending = False).head()

在这里插入图片描述

# 画出单笔交易为2e8的那天的总交易量及附近几天的交易量
e2 = total_balance[(total_balance['date'] >= '2013-11-01') & (total_balance['date'] < '2013-11-10')]
ax = sns.barplot(x="day", y="total_purchase_amt", data=e2, label='2E')
ax = sns.lineplot(x="day", y="total_purchase_amt", data=e2, label='2E')
plt.title("The influence of the big deal with 200 million purchasing(Red Bar)")
ax.legend()

在这里插入图片描述
可以看出2亿的申购额占11月4日销售总量的2/3,体现了二八原则。

画出每天单笔最大交易额的时序图 :

# 画出每天单笔最大交易额的时序图 
plt.figure(figsize=(20, 6))
ax = sns.lineplot(x="date", y="total_purchase_amt", data=data_balance[['total_purchase_amt', 'date']].groupby('date', as_index=False).max(), label='MAX_PURCHASE')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=data_balance[['total_redeem_amt', 'date']].groupby('date', as_index=False).max(), label='MAX_REDEEM')
plt.title("The Biggest deal happend in each day")

在这里插入图片描述
画出每日单笔最大交易以及总交易额的时序图:

# 画出每日单笔最大交易以及总交易额的时序图
plt.figure(figsize=(20, 6))
ax = sns.lineplot(x="date", y="total_purchase_amt", data=data_balance[['total_purchase_amt', 'date']].groupby('date', as_index=False).max(), label='MAX_PURCHASE')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=data_balance[['total_redeem_amt', 'date']].groupby('date', as_index=False).max(), label='MAX_REDEEM')
ax = sns.lineplot(x="date", y="total_purchase_amt", data=data_balance[['total_purchase_amt', 'date']].groupby('date', as_index=False).sum(), label='TOTAL_PURCHASE')
ax = sns.lineplot(x="date", y="total_redeem_amt", data=data_balance[['total_redeem_amt', 'date']].groupby('date', as_index=False).sum(), label='TOTAL_REDEEM')

在这里插入图片描述
画出每个月大额交易的频次直方图:

# 画出每个月大额交易的频次直方图

plt.figure(figsize=(13, 6))
big_frequancy = data_balance[(data_balance['total_purchase_amt'] > 10000000) | (data_balance['total_redeem_amt'] > 10000000)][['month','year','user_id']].groupby(['year','month'], as_index=False).count()
big_frequancy['i'] = big_frequancy['year']  + big_frequancy['month'] / 100
ax = sns.barplot(x="i", y="user_id", data=big_frequancy) 
plt.title("The frequency of super big deal(larger than 100million) in each month")

在这里插入图片描述
可以看出2013年7月至2014年8月,大额交易的数目逐渐增长然后趋于稳定。

接下来将大额交易与小额交易的行为进行对比。

画出大额交易与小额交易的时序分布图:

# 对大额交易和小额交易按天进行求和
big_purchase = data_balance[data_balance['big_purchase'] == 1].groupby(['date'], as_index=False)['total_purchase_amt'].sum()
small_purchase = data_balance[data_balance['big_purchase'] == 0].groupby(['date'], as_index=False)['total_purchase_amt'].sum()
big_redeem = data_balance[data_balance['big_redeem'] == 1].groupby(['date'], as_index=False)['total_redeem_amt'].sum()
small_redeem = data_balance[data_balance['big_redeem'] == 0].groupby(['date'], as_index=False)['total_redeem_amt'].sum()

# 画出大额交易与小额交易的时序分布图
fig = plt.figure(figsize=(20,6))
plt.plot(big_purchase['date'], big_purchase['total_purchase_amt'],label='big_purchase')
plt.plot(big_redeem['date'], big_redeem['total_redeem_amt'],label='big_redeem')

plt.plot(small_purchase['date'], small_purchase['total_purchase_amt'],label='small_purchase')
plt.plot(small_redeem['date'], small_redeem['total_redeem_amt'],label='small_redeem')
plt.legend(loc='best')
plt.title("The time series of big deal of Purchase and Redeem from July.13 to Sep.14")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
画出2014年每个月大额交易与小额交易的核密度分布:

# 画出2014年每个月大额交易与小额交易的核密度分布
plt.figure(figsize=(12,10))

plt.subplot(2,2,1)
for i in range(4, 9):
    s1 = '2014-'+str(i)+'-1'
    s2 = '2014-'+str(i+1)+'-1'
    sns.kdeplot(big_purchase[(big_purchase['date'] >= s1) & (big_purchase['date'] < s2)]['total_purchase_amt'],label='14Y,'+str(i)+'M')
plt.title('BIG PURCHASE')

plt.subplot(2,2,2)
for i in range(4, 9):
    s1 = '2014-'+str(i)+'-1'
    s2 = '2014-'+str(i+1)+'-1'
    sns.kdeplot(small_purchase[(small_purchase['date'] >= s1) & (small_purchase['date'] < s2)]['total_purchase_amt'],label='14Y,'+str(i)+'M')
plt.title('SMALL PURCHASE')

plt.subplot(2,2,3)
for i in range(4, 9):
    s1 = '2014-'+str(i)+'-1'
    s2 = '2014-'+str(i+1)+'-1'
    sns.kdeplot(big_redeem[(big_redeem['date'] >= s1) & (big_redeem['date'] < s2)]['total_redeem_amt'],label='14Y,'+str(i)+'M')
plt.title('BIG REDEEM')

plt.subplot(2,2,4)
for i in range(4, 9):
    s1 = '2014-'+str(i)+'-1'
    s2 = '2014-'+str(i+1)+'-1'
    sns.kdeplot(small_redeem[(small_redeem['date'] >= s1) & (small_redeem['date'] < s2)]['total_redeem_amt'],label='14Y,'+str(i)+'M')
plt.title('SMALL REDEEM')

在这里插入图片描述
分析大额交易与小额交易在一周中的分布情况:

# 添加时间戳 
big_purchase['weekday'] = big_purchase['date'].dt.weekday
small_purchase['weekday'] = small_purchase['date'].dt.weekday
big_redeem['weekday'] = big_redeem['date'].dt.weekday
small_redeem['weekday'] = small_redeem['date'].dt.weekday

# 分析大额交易与小额交易在一周中的分布情况 
plt.figure(figsize=(12, 10))

ax = plt.subplot(2,2,1)
ax = sns.boxplot(x="weekday", y="total_purchase_amt", data=big_purchase[big_purchase['date'] >= '2014-04-01'])
plt.title('BIG PURCHASE')

ax = plt.subplot(2,2,2)
ax = sns.boxplot(x="weekday", y="total_redeem_amt", data=big_redeem[big_redeem['date'] >= '2014-04-01'])
plt.title('BIG REDEEM')

ax = plt.subplot(2,2,3)
ax = sns.boxplot(x="weekday", y="total_purchase_amt", data=small_purchase[small_purchase['date'] >= '2014-04-01'])
plt.title('SMALL PURCHASE')

ax = plt.subplot(2,2,4)
ax = sns.boxplot(x="weekday", y="total_redeem_amt", data=small_redeem[small_redeem['date'] >= '2014-04-01'])
plt.title('SMALL REDEEM')

在这里插入图片描述
从图中可以看出:

  • 小额交易的购买大于赎回。
  • 14年5月下旬后,大额交易的赎回大于购买,可能是由于该段时间余额宝利率不断下降造成的。
  • 大额交易更不愿在周五发生。
分析用户的交易频次

画出非0交易的分布图:

# 画出非0交易的分布图
frequency = data_balance[(data_balance['direct_purchase_amt'] != 0) | (data_balance['total_redeem_amt'] != 0)][['user_id','tBalance']].groupby('user_id', as_index=False).count()
sns.distplot(frequency['tBalance'])

在这里插入图片描述
可以看出大部分用户的交易频率都不是很频繁。

画出频繁用户与非频繁用户总申购赎回总额的时序图:

# 画出频繁用户与非频繁用户总申购赎回总额的时序图
fig = plt.figure(figsize=(20,8))
plt.plot(total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 1]['date'], total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 1]['total_purchase_amt'],label='hot_purchase')
plt.plot(total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 1]['date'], total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 1]['total_redeem_amt'],label='hot_redeem')

plt.plot(total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 0]['date'], total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 0]['total_purchase_amt'],label='cold_purchase')
plt.plot(total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 0]['date'], total_balance_hotNcold[total_balance_hotNcold['is_hot_users'] == 0]['total_redeem_amt'],label='cold_redeem')
plt.legend(loc='best')
plt.title("The time series of big and small user of Purchase and Redeem")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
分别画出回流用户与老用户交易的箱型图:

# 分别画出回流用户与老用户交易的箱型图
temp = data_balance[['year','month','user_id','total_purchase_amt','total_redeem_amt']].groupby(['year','month','user_id'], as_index=False).sum()
# 统计每个月每个用户的交易总额
user_old_set = set() 
plt.figure(figsize=(10,30))
for i in range(1, 9):
    newset = set(temp[(temp['year'] == 2014) & (temp['month'] == i)]['user_id']) # 新用户集合
    this_month = data_balance[(data_balance['year'] == 2014) & (data_balance['month'] == i)] # 当月的交易情况
    this_month['cold'] = 0 
    this_month.loc[this_month['user_id'].isin(newset - user_old_set), 'cold'] = 1 
    # 上个月没进行过交易,当月进行过交易的用户的冷却期为1
    plt.subplot(4,2,i)
    plt.title('This month : ' + str(i))
    sns.boxplot(x="cold", y="total_purchase_amt" , data=this_month[(this_month['direct_purchase_amt'] != 0) | (this_month['total_redeem_amt'] != 0)])
    # 
    user_old_set = user_old_set | newset # 到下一个月,本月进行过交易的用户就变成了老用户
plt.show()

在这里插入图片描述
在这里插入图片描述
可以看出老用户的购买量普遍高于回流用户。

分析用户交易纪录表中其他变量
# 截断数据集
data_balance_1 = data_balance[data_balance['date'] > datetime.datetime(2014,4,1)]

# 画出用户交易纪录表中其他变量与标签的相关系图 
feature = ['total_purchase_amt','total_redeem_amt', 'report_date', 'tBalance', 'yBalance', 
       'direct_purchase_amt', 'purchase_bal_amt', 'purchase_bank_amt',
        'consume_amt', 'transfer_amt', 'tftobal_amt',
       'tftocard_amt', 'share_amt']

sns.heatmap(data_balance_1[feature].corr(), linewidths = 0.05)   
plt.title("The coleration between each feature in User_Balance_Table")

在这里插入图片描述

对银行利率及支付宝利率进行分析

# 读取银行利率并添加时间戳
bank = pd.read_csv(dataset_path + "mfd_bank_shibor.csv")
bank = bank.rename(columns = {'mfd_date': 'date'})
bank_features = [x for x in bank.columns if x not in ['date']]
bank['date'] = pd.to_datetime(bank['date'], format= "%Y%m%d")
bank['day'] = bank['date'].dt.day
bank['month'] = bank['date'].dt.month
bank['year'] = bank['date'].dt.year
bank['week'] = bank['date'].dt.week
bank['weekday'] = bank['date'].dt.weekday

# 读取支付宝利率并添加时间戳 
share = pd.read_csv(dataset_path + 'mfd_day_share_interest.csv')
share = share.rename(columns = {'mfd_date': 'date'})
share_features = [x for x in share.columns if x not in ['date']]
share['date'] = pd.to_datetime(share['date'], format= "%Y%m%d")
share['day'] = share['date'].dt.day
share['month'] = share['date'].dt.month
share['year'] = share['date'].dt.year
share['week'] = share['date'].dt.week
share['weekday'] = share['date'].dt.weekday
share.head()

画出银行利率的时序图:

# 画出银行利率的时序图
plt.figure(figsize=(15,5))
for i in bank_features:
    plt.plot(bank['date'], bank[[i]] ,label=i)
plt.legend()
plt.title("The time series of bank rate")
plt.xlabel("Time")
plt.ylabel("Rate")

在这里插入图片描述
画出部分银行利率与购买量的时序图:

# 画出部分银行利率与购买量的时序图 
fig,ax1 = plt.subplots(figsize=(15,5))
plt.plot(bank['date'], bank['Interest_3_M'],'b',label="Interest_3_M")
plt.plot(bank['date'], bank['Interest_6_M'],'cyan',label="Interest_6_M")
plt.plot(bank['date'], bank['Interest_9_M'],'skyblue',label="Interest_9_M")

plt.legend()

ax2=ax1.twinx()
plt.plot(total_balance['date'], total_balance['total_purchase_amt'],'g',label="Total purchase")

plt.legend(loc=2)
plt.title("The time series of bank rate and purchase")
plt.xlabel("Time")
plt.ylabel("Rate & Amount")
plt.show()

在这里插入图片描述
画出部分银行利率与赎回量的时序图:

# 画出部分银行利率与赎回量的时序图 
fig,ax1 = plt.subplots(figsize=(15,5))
plt.plot(bank['date'], bank['Interest_3_M'],'b',label="Interest_3_M")
plt.plot(bank['date'], bank['Interest_6_M'],'cyan',label="Interest_6_M")
plt.plot(bank['date'], bank['Interest_9_M'],'skyblue',label="Interest_9_M")

plt.legend()

ax2=ax1.twinx()
plt.plot(total_balance['date'], total_balance['total_redeem_amt'],'g',label="Total redeem")

plt.legend(loc=2)
plt.title("The time series of bank rate and redeem")
plt.xlabel("Time")
plt.ylabel("Rate & Amount")
plt.show()

在这里插入图片描述
画出支付宝利率与购买量的时序图:

# 画出支付宝利率与购买量的时序图
fig,ax1 = plt.subplots(figsize=(15,5))
for i in share_features:
    plt.plot(share['date'], share[i],'b',label=i)
    break
plt.legend()
ax2=ax1.twinx()
plt.plot(total_balance['date'], total_balance['total_purchase_amt'],'g',label="Total purchase")
plt.legend(loc=2)
plt.show()

在这里插入图片描述
画出支付宝利率与赎回量的时序图:

# 画出支付宝利率与赎回量的时序图 
fig,ax1 = plt.subplots(figsize=(15,5))
for i in share_features:
    plt.plot(share['date'], share[i],'b',label=i)
    break
plt.legend()
ax2=ax1.twinx()
plt.plot(total_balance['date'], total_balance['total_redeem_amt'],'g',label="Total redeem")
plt.legend(loc=2)
plt.show()

在这里插入图片描述
画出银行利率与支付宝利率的时序图:

# 画出银行利率与支付宝利率的时序图 
fig,ax1 = plt.subplots(figsize=(15,5))
plt.plot(bank['date'], bank['Interest_3_M'],c='g',label= 'BANK')

plt.legend()
ax2=ax1.twinx()
plt.plot(share['date'], share['mfd_daily_yield'],label='SHARE')
plt.legend(loc=2)
plt.show()

在这里插入图片描述
可以看出:

  • 支付宝利率对用户申购行为的影响较大,银行利率对用户赎回行为的影响较大。
  • 支付宝利率对用户行为的影响生效较快,银行利率对用户的行为的影响生效较慢。

分析用户的其他属性

统计每个城市用户的日总交易额的区别并绘制其核密度函数:

# 添加城市、星座、性别
data_balance = pd.merge(data_balance, users, on='user_id')

# 统计每个城市用户的日总交易额的区别并绘制分布估计图
fig = plt.figure(figsize=(10,5))
for i in np.unique(data_balance['city']):
    temp = data_balance.groupby(['date','city'], as_index=False)['total_purchase_amt','total_redeem_amt'].sum()
    ax = sns.kdeplot( temp[temp['city'] == i]['total_purchase_amt'],label=i)
plt.legend(loc='best')
plt.title("The time series of different city of Purchase and Redeem")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

fig = plt.figure(figsize=(10,5))
for i in np.unique(data_balance['city']):
    temp = data_balance.groupby(['date','city'], as_index=False)['total_purchase_amt','total_redeem_amt'].sum()
    ax = sns.kdeplot( temp[temp['city'] == i]['total_redeem_amt'],label=i)
plt.legend(loc='best')
plt.title("The time series of different city of Purchase and Redeem")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
在这里插入图片描述
统计每个性别用户的日总交易额的区别,并绘制时序图:

# 统计每个性别用户的日总交易额的区别,并绘制时序图
temp = data_balance.groupby(['date','sex'], as_index=False)['total_purchase_amt','total_redeem_amt'].sum()

fig = plt.figure(figsize=(20,6))
plt.plot(temp[temp['sex'] == 1]['date'], temp[temp['sex'] == 1]['total_purchase_amt'],label='Male')
plt.plot(temp[temp['sex'] == 0]['date'], temp[temp['sex'] == 0]['total_purchase_amt'],label='Female')
plt.legend(loc='best')
plt.title("The time series of two gender user of Purchase")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

fig = plt.figure(figsize=(20,6))
plt.plot(temp[temp['sex'] == 1]['date'], temp[temp['sex'] == 1]['total_purchase_amt'],label='Male')
plt.plot(temp[temp['sex'] == 0]['date'], temp[temp['sex'] == 0]['total_redeem_amt'],label='Female')
plt.legend(loc='best')
plt.title("The time series of two gender user of Redeem")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
在这里插入图片描述
统计每个星座用户的日总交易额的差别并画出核密度函数:

# 统计每个星座用户的日总交易额的差别并画出核密度函数
fig = plt.figure(figsize=(10,5))
for i in np.unique(data_balance['constellation']):
    temp = data_balance.groupby(['date','constellation'], as_index=False)['total_purchase_amt','total_redeem_amt'].sum()
    ax = sns.kdeplot( temp[temp['constellation'] == i]['total_purchase_amt'],label=i)
plt.legend(loc='best')
plt.title("The time series of small deal of Purchase and Redeem")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

fig = plt.figure(figsize=(10,5))
for i in np.unique(data_balance['constellation']):
    temp = data_balance.groupby(['date','constellation'], as_index=False)['total_purchase_amt','total_redeem_amt'].sum()
    ax = sns.kdeplot( temp[temp['constellation'] == i]['total_redeem_amt'],label=i)
plt.legend(loc='best')
plt.title("The time series of small deal of Purchase and Redeem")
plt.xlabel("Time")
plt.ylabel("Amount")
plt.show()

在这里插入图片描述
在这里插入图片描述
可以看出不同城市、性别、星座的用户的交易行为存在一定的差别。

  • 2
    点赞
  • 34
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值