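JD.com Data Analysis with Python as the Main Tool
- Exploratory data analysis
- Modeling user purchase behavior
Basic information about the data
1. Data source
2. Data overview
3. Contents of this article
1. Exploratory analysis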
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
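%matplotlib inline
plt.style.use('ggplot')
# SimHei renders Chinese characters in plot labels; with that font, disable the
# Unicode minus sign so negative tick labels still display correctly
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False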
df = pd.read_csv('JDdata.csv')
df.head()
| | customer_id | product_id | action_date | action_id | type | age_range | gender | customer_register_date | customer_level | city_level | brand | shop_id | category | product_market_date | vender_id | fans_number | vip_number | shop_register_date | shop_category | shop_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1068082 | 5924 | 2018/4/1 | 3853455 | PageView | 6.0 | W | 2012/8/13 | 1 | 4.0 | Li-Ning | 8002 | Badminton | 2013/6/14 | 1457 | 213610 | 357580 | 2013/5/23 | Outdoor Sports | |
| 1 | 1068082 | 10723 | 2018/4/1 | 3853455 | PageView | 6.0 | W | 2012/8/13 | 1 | 4.0 | Li-Ning | 8002 | Badminton | 2011/7/19 | 1457 | 213610 | 357580 | 2013/5/23 | Outdoor Sports | |
| 2 | 735392 | 4559 | 2018/2/6 | 9434940 | PageView | 6.0 | M | 2009/9/15 | 7 | 5.0 | Other | 6944 | Cookie | 2015/11/16 | 355 | 63969 | 85597 | 2014/10/31 | Food | |
| 3 | 1516193 | 374293 | 2018/3/17 | 2105553 | PageView | 6.0 | M | 2016/5/25 | 1 | 4.0 | illuma | 3830 | Milk Power | 2017/12/25 | 25 | 11576 | 20927 | 2016/12/16 | Mother and Infant | |
| 5 | 1160556 | 285165 | 2018/2/4 | 8914730 | PageView | 6.0 | M | 2012/3/11 | 7 | 4.0 | Other | 744 | Coat | 2016/5/16 | 8627 | 180250 | 52606 | 2016/5/16 | Clothes | |
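As the sample rows show, dates such as `2018/4/1` are stored as text, and `pd.read_csv` leaves them as strings unless told otherwise. Here is a minimal sketch showing the difference. It assumes the same `YYYY/M/D` layout as the sample rows; the `csv_text` string is a hypothetical stand-in for `JDdata.csv`, reduced to a few columns. Passing `parse_dates` converts the columns at load time, an alternative to calling `pd.to_datetime` afterwards.

```python
import io
import pandas as pd

# Two sample rows from the head() output above, reduced to a few columns.
# csv_text is a hypothetical stand-in for the real JDdata.csv file.
csv_text = (
    "customer_id,product_id,action_date,type,customer_register_date\n"
    "1068082,5924,2018/4/1,PageView,2012/8/13\n"
    "735392,4559,2018/2/6,PageView,2009/9/15\n"
)

# Default read: the date columns stay as text, so no date arithmetic yet
raw = pd.read_csv(io.StringIO(csv_text))

# parse_dates converts the listed columns to datetime while reading
parsed = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["action_date", "customer_register_date"],
)

print(raw.dtypes)
print(parsed.dtypes)
```

Converting during `read_csv` keeps loading and typing in one step. Converting afterwards, as in the cleaning step, makes it easier to drop incomplete rows first.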
# Data cleaning and type conversion: drop incomplete rows, then convert the date columns to datetime
df=df.dropna(how='any',axis=0)
df['action_date']=pd.to_datetime(df.action_date)
df['customer_register_date']=pd.to_datetime(df.customer_register_date)
df['shop_register_date']=pd.to_datetime(df.shop_register_date)
df['product_market_date']=pd.to_datetime(df.product_market_date)
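The four `pd.to_datetime` calls can also be written as a loop over the date columns. Below is a minimal sketch on a hypothetical toy frame. Its values are borrowed from the sample rows, with one entry blanked out and one replaced by the hypothetical string `"unknown"`. It shows that `dropna` removes only true missing values. `errors="coerce"` then turns text that cannot be parsed into `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical toy frame shaped like df; values borrowed from the sample rows,
# with one date blanked (None) and one replaced by unparsable text ("unknown")
toy = pd.DataFrame({
    "customer_id": [1068082, 735392, 1516193, 1160556],
    "action_date": ["2018/4/1", "2018/2/6", "2018/3/17", "2018/2/4"],
    "shop_register_date": ["2013/5/23", "2014/10/31", None, "unknown"],
})

# Drop rows with any missing value, as in the cleaning step above
toy = toy.dropna(how="any", axis=0)

# Convert every date column in one loop; errors="coerce" maps text that
# cannot be parsed as a date to NaT instead of raising an exception
date_cols = ["action_date", "shop_register_date"]
for col in date_cols:
    toy[col] = pd.to_datetime(toy[col], errors="coerce")

print(toy)
print(toy.dtypes)
```

Rows that become `NaT` after coercion survive the earlier `dropna`, so a second `dropna` (or a check with `isna()`) may be needed after conversion.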
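# Select purchase-related actions (orders and comments); .copy() avoids SettingWithCopyWarning when adding columns
order_comment=df.query("type=='Order' | type=='Comment'").copy()
# sort_values returns a new DataFrame, so the result must be assigned
order_comment=order_comment.sort_values(by='action_date',ascending=True)
# Map each action date to the first day of its month
order_comment['month']=order_comment.action_date.dt.to_period('M').dt.to_timestamp()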
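The filtering and month-truncation steps can be checked on a small hypothetical event log. This sketch uses `isin` in place of the `query` string as an equivalent filter. It keeps the result of `sort_values` and maps each date to the first day of its month with `dt.to_period("M").dt.to_timestamp()`.

```python
import pandas as pd

# Hypothetical event log with the same columns used above
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "type": ["PageView", "Order", "Comment", "Order", "PageView"],
    "action_date": pd.to_datetime(
        ["2018-02-04", "2018-02-20", "2018-03-17", "2018-04-01", "2018-04-02"]
    ),
})

# Keep only purchase-related actions; .copy() gives an independent frame
orders = events[events["type"].isin(["Order", "Comment"])].copy()

# sort_values returns a new frame by default, so reassign it
orders = orders.sort_values("action_date")

# Truncate each timestamp to the first day of its month
orders["month"] = orders["action_date"].dt.to_period("M").dt.to_timestamp()

print(orders[["customer_id", "type", "month"]])
```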
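# Group by purchase month and count order/comment records per month (a customer with several records is counted once per record)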
order_month=order_comment.groupby(by='month')
order_count_by_month=order_month.customer_id.count()
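`groupby(...).customer_id.count()` counts records, so a customer with several orders or comments in the same month is counted several times. If the goal is the number of distinct purchasing customers per month, `nunique()` is the alternative. Here is a minimal comparison on hypothetical data:

```python
import pandas as pd

# Hypothetical purchase records after the month column has been added
orders = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 103, 103],
    "month": pd.to_datetime([
        "2018-02-01", "2018-02-01", "2018-02-01",
        "2018-03-01", "2018-03-01", "2018-03-01",
    ]),
})

by_month = orders.groupby("month")["customer_id"]
records_per_month = by_month.count()    # every row counts: 3 in Feb, 3 in Mar
buyers_per_month = by_month.nunique()   # distinct customers: 2 in Feb, 1 in Mar

print(records_per_month)
print(buyers_per_month)
```

Which measure to plot depends on the question: `count()` tracks purchase activity, while `nunique()` tracks how many customers were active.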
plt.figure(1,figsize=(10,4))    # create the figure, which holds two subplots
plt.subplot(1,2,1)
order_count_by_month.plot()     # subplot 1: order/comment records per month
plt.subplot(1,2,2)
order_count_by_month.