(Adventure项目)自行车业务数据分析报告(一)

(Adventure项目)自行车业务数据分析报告(一)

项目背景
  • Adventure Works Cycles是Adventure Works样本数据库所虚构的公司,这是一家大型跨国制造公司。该公司生产和销售自行车到北美,欧洲和亚洲的商业市场。虽然其基地业务位于华盛顿州博塞尔,拥有290名员工,但几个区域销售团队遍布整个市场。

2019年11月自行车业务分析报告

目录:

  • 一、自行车整体销售表现
  • 二、2019年11月自行车地域销售表现
  • 三、2019年11月自行车产品销售表现
  • 四、用户行为分析
  • 五、2019年11月热品销售分析

本文主要介绍第一部分:自行车整体销售表现,其他章节可访问本专栏Adventure自行车项目,建议订阅收藏

  • 自行车整体销售表现:pt_overall_sale_performance_1
  • 2019年11月自行车地域销售表现:pt_bicy_november_territory_2、pt_bicy_november_october_city_3
  • 2019年11月自行车产品销售表现:pt_bicycle_product_sales_month_4、pt_bicycle_product_sales_order_month_4、pt_bicycle_product_sales_order_month_11
  • 用户行为分析:pt_user_behavior_november
  • 2019年11月热品销售分析:pt_hot_products_november
#导入模块
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
import sqlalchemy

一、自行车整体销售表现

1.1、从数据库读取源数据:dw_customer_order

#读取源数据。不同城市,每天产品销售信息
#创建数据库引擎
engine = sqlalchemy.create_engine('mysql+pymysql://ID:*****@localhost:3306/database?charset=gbk')
sql_query='select * from dw_customer_order'
gather_customer_order=pd.read_sql_query(sql=sql_query,con=engine)
#查看源数据前5行,观察数据,判断数据是否正常识别
gather_customer_order.head()
create_dateproduct_namecpzl_zwcplb_zworder_numcustomer_numsum_amountis_current_yearis_last_yearis_yesterdayis_todayis_current_monthis_current_quarterchinese_provincechinese_citychinese_territory
02019-01-02AWC Logo Cap帽子服装118.99000000直辖市潍坊市华东
12019-01-02AWC Logo Cap帽子服装118.99000000福建省三明市华东
22019-01-02AWC Logo Cap帽子服装118.99000000福建省福州市华东
32019-01-02AWC Logo Cap帽子服装118.99000000辽宁省青岛市华东
42019-01-02AWC Logo Cap帽子服装118.99000000湖南省长沙市华中

1.1.2 查看源数据类型:dw_customer_order

gather_customer_order.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 518361 entries, 0 to 518360
Data columns (total 16 columns):
create_date           518361 non-null object
product_name          518361 non-null object
cpzl_zw               518361 non-null object
cplb_zw               518361 non-null object
order_num             518361 non-null int64
customer_num          518361 non-null int64
sum_amount            518361 non-null float64
is_current_year       518361 non-null object
is_last_year          518361 non-null object
is_yesterday          518361 non-null object
is_today              518361 non-null object
is_current_month      518361 non-null object
is_current_quarter    518361 non-null object
chinese_province      518361 non-null object
chinese_city          518361 non-null object
chinese_territory     518361 non-null object
dtypes: float64(1), int64(2), object(13)
memory usage: 63.3+ MB

1.1.3 利用create_date字段增加create_year_month月份字段

#增加create_year_month月份字段。按月维度分析时使用
gather_customer_order['create_year_month']=gather_customer_order.create_date.apply(lambda x:x.strftime('%Y-%m'))
gather_customer_order['create_year_month']
0         2019-01
1         2019-01
2         2019-01
3         2019-01
4         2019-01
           ...   
518356    2021-02
518357    2021-02
518358    2021-02
518359    2021-02
518360    2021-02
Name: create_year_month, Length: 518361, dtype: object

1.1.4 筛选产品类型cplb_zw中的自行车作为新gather_customer_order

#筛选产品类别为自行车的数据
gather_customer_order = gather_customer_order.loc[gather_customer_order['cplb_zw']=='自行车']
gather_customer_order.head()
create_dateproduct_namecpzl_zwcplb_zworder_numcustomer_numsum_amountis_current_yearis_last_yearis_yesterdayis_todayis_current_monthis_current_quarterchinese_provincechinese_citychinese_territorycreate_year_month
1522019-01-02Mountain-100 Silver山地自行车自行车113399.9900000000江苏省盐城市华东2019-01
1532019-01-02Mountain-200 Black山地自行车自行车112294.9900000000海南省焦作市华南2019-01
1542019-01-02Mountain-200 Black山地自行车自行车112294.9900000000陕西省阜阳市西北2019-01
1552019-01-02Mountain-200 Black山地自行车自行车112294.9900000000贵州省贵阳市西南2019-01
1562019-01-02Mountain-200 Black山地自行车自行车112049.0982000000贵州省铜仁市西南2019-01

1.2、自行车整体销售量表现

1.2.1 聚合每月订单数量和销售金额,具体groupby创建一个新的对象,需要将order_num、sum_amount求和,对日期降序排序,记得重置索引

#每月订单数量和销售金额,用groupby创建一个新的对象,需要将order_num、sum_amount求和
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num':sum,
                                                                                    'sum_amount':sum})\
.sort_values('create_year_month',ascending=False).reset_index()
#取消科学计数法
pd.set_option('display.float_format', lambda x: '%f' % x)
#按日期降序排序,方便计算环比
overall_sales_performance.head()
create_year_monthorder_numsum_amount
02021-02606111348244.679600
12021-01792514857125.340001
22020-12780714743275.933900
32020-11632211969145.536300
42020-101363226025173.289898

1.2.2 新增一列order_num_diff,此为每月自行车销售订单量环比,本月与上月对比,例如本期2019-02月销售额与上一期2019-01月销售额做对比

你需要一步步完成。

提示:

1.利用diff()函数

2.使用列表形式方便加工

3.加工形成同样长度的列表,转为Series或DataFrame与本身数据合并

overall_sales_performance.order_num-overall_sales_performance.order_num.shift()
0              nan
1      1864.000000
2      -118.000000
3     -1485.000000
4      7310.000000
5      -491.000000
6     -5451.000000
7     13155.000000
8    -12732.000000
9     -2599.000000
10    -2222.000000
11     -228.000000
12     -139.000000
13      232.000000
14     -123.000000
15      404.000000
16     -426.000000
17       20.000000
18      204.000000
19     -167.000000
20      -73.000000
21       90.000000
22     -106.000000
23      115.000000
24     -276.000000
25       85.000000
Name: order_num, dtype: float64
overall_sales_performance.order_num.diff()
0              nan
1      1864.000000
2      -118.000000
3     -1485.000000
4      7310.000000
5      -491.000000
6     -5451.000000
7     13155.000000
8    -12732.000000
9     -2599.000000
10    -2222.000000
11     -228.000000
12     -139.000000
13      232.000000
14     -123.000000
15      404.000000
16     -426.000000
17       20.000000
18      204.000000
19     -167.000000
20      -73.000000
21       90.000000
22     -106.000000
23      115.000000
24     -276.000000
25       85.000000
Name: order_num, dtype: float64
#求每月自行车销售订单量环比,观察最近一年数据变化趋势
#环比是本月与上月的对比,例如本期2019-02月销售额与上一期2019-01月销售额做对比
order_num_diff = list(-(overall_sales_performance.order_num.diff())/overall_sales_performance.order_num)
order_num_diff.pop(0) #删除列表中第一个元素
order_num_diff.append(0) #将0新增到列表末尾
order_num_diff
[-0.235205047318612,
 0.015114640707057769,
 0.23489402087946853,
 -0.5362382629107981,
 0.03736397534434213,
 0.7088426527958388,
 -0.6310865915087551,
 1.5693331689880439,
 0.47134566557852736,
 0.6749696233292831,
 0.07441253263707572,
 0.04752136752136752,
 -0.07348748812163447,
 0.04054054054054054,
 -0.11751018033740547,
 0.14143426294820718,
 -0.006596306068601583,
 -0.0630407911001236,
 0.054415118931247966,
 0.024365821094793058,
 -0.02916396629941672,
 0.03557046979865772,
 -0.03715670436187399,
 0.09790705924086555,
 -0.029269972451790634,
 0]
#将环比转化为DataFrame
overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame({'order_num_diff':order_num_diff})],axis=1)
overall_sales_performance
create_year_monthorder_numsum_amountorder_num_diff
02021-02606111348244.679600-0.235205
12021-01792514857125.3400010.015115
22020-12780714743275.9339000.234894
32020-11632211969145.536300-0.536238
42020-101363226025173.2898980.037364
52020-091314124648450.6711990.708843
62020-08769014111182.146000-0.631087
72020-072084538859851.6098011.569333
82020-06811315175107.9059010.471346
92020-05551410354835.8388000.674970
102020-0432926202452.0528000.074413
112020-0330645702518.2068000.047521
122020-0229255301279.160000-0.073487
132020-0131575965646.8076000.040541
142019-1230345735361.733800-0.117510
152019-1134386568254.9704000.141434
162019-1030125573722.429800-0.006596
172019-0930325688701.623900-0.063041
182019-0832366013788.6936000.054415
192019-0730695750549.2445000.024366
202019-0629965621673.571500-0.029164
212019-0530865734630.4628000.035570
222019-0429805489733.524800-0.037157
232019-0330955776688.5282000.097907
242019-0228195315683.034400-0.029270
252019-0129045441653.4585000.000000

1.2.3 新增一列sum_amount_diff,此为每月自行车销售金额环比,原理一样,但是所需字段不同,最后形成按照日期升序排列

#求每月自行车销售金额环比
sum_amount_diff = list(-(overall_sales_performance.sum_amount.diff())/overall_sales_performance.sum_amount)
sum_amount_diff.pop(0) #删除列表中第一个元素
sum_amount_diff.append(0) #将0新增到列表末尾
sum_amount_diff
#将环比转化为DataFrame
overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame({'sum_amount_diff':sum_amount_diff})],axis=1)
#销量环比字段名order_diff,销售金额环比字段名amount_diff
#按照日期排序,升序
overall_sales_performance = overall_sales_performance.rename(columns={'sum_amount_diff':'amount_diff'})\
.sort_values('create_year_month')
#查看每月自行车订单量、销售金额、环比、前5行
overall_sales_performance.head(5)
create_year_monthorder_numsum_amountorder_num_diffamount_diff
252019-0129045441653.4585000.0000000.000000
242019-0228195315683.034400-0.029270-0.023149
232019-0330955776688.5282000.0979070.086726
222019-0429805489733.524800-0.037157-0.049675
212019-0530865734630.4628000.0355700.044610

字段注释:

create_year_month:时间,order_num:本月累计销售数量,sum_amount:本月累计销售金额,order_diff:本月销售数量环比,

sum_amount_diff:本月销售金额环比,dw_customer_order:用户订单表

1.2.4 将最终的overall_sales_performance的DataFrame存入Mysql的pt_overall_sale_performance_1当中,请使用追加存储。

#将数据存入数据库
engine = sqlalchemy.create_engine('mysql://id:password@xxx.xx.xx.xxx;3306/db?charset=gbk')
overall_sales_performance.to_sql('name',con=engine,if_exists='replace',index=False)
  • 1
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值