20200915电商数据分析 - 偏逻辑思维(一)

这是一个电商数据分析题目,可以用excel做,可以用sql做,可以用python做

题目如下:
在这里插入图片描述
数据源如下:
链接:https://pan.baidu.com/s/1gM2NoS_1SxUCgPbf-9mgYg
提取码:9zrs

Excel篇:其他人用数据透析表做的
SQL篇:我遇到一些问题,重新用pyhton做
Python篇:如下

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('数据源.csv')
C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3057: DtypeWarning: Columns (1,2,3,4,7,8) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
df = data.copy()
data.dropna(how="all",inplace=True)  #删除全部是Null值的行
data.drop_duplicates(inplace=True)  #删除重复行
data.head()
idweekdateorder_dorder_equantityamountphoneaccount
01.0NaN2017/6/8GU20170608923396-1GU201706089233961.02750.01500026566penny.li@cn.gucci.com
12.0NaN2017/6/8GU20170608923428-1GU201706089234281.02250.016201251750vincent.qiu@gmail.com
23.0NaN2017/6/8GU20170608923430-1GU201706089234301.03900.015010529091412526292@qq.com
34.0NaN2017/6/8GU20170608923431-1GU201706089234311.03300.0126216016251941327726@qq.com
45.0NaN2017/6/8GU20170608923435-1GU201706089234351.02250.012917062526aliutina@163.com
data.date = pd.to_datetime(df.date,format='%Y-%m-%d')
data.phone = df.phone.astype('str')
data.describe()
idquantityamount
count293657.000000293657.000000293657.000000
mean146829.0000001.0080504533.377958
std84771.6183370.2597763904.373436
min1.0000001.000000350.000000
25%73415.0000001.0000001750.000000
50%146829.0000001.0000003650.000000
75%220243.0000001.0000006450.000000
max293657.00000062.00000071500.000000
len(data)
293657
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 293657 entries, 0 to 293656
Data columns (total 9 columns):
id          293657 non-null float64
week        5491 non-null object
date        293657 non-null datetime64[ns]
order_d     293657 non-null object
order_e     293657 non-null object
quantity    293657 non-null float64
amount      293657 non-null float64
phone       293657 non-null object
account     254716 non-null object
dtypes: datetime64[ns](1), float64(3), object(5)
memory usage: 22.4+ MB
data.isnull().sum()  #查看列缺失值
id               0
week        288166
date             0
order_d          0
order_e          0
quantity         0
amount           0
phone            0
account      38941
dtype: int64
  • 列account有缺失值,Week只有本周有数值,之前没有数值
len(data.account.value_counts()) #唯一account一共有199179
199179
len(data.phone.value_counts()) #唯一phone一共有234490,这里phone表示下单,account表示注册,下单用户大于注册用户正常
234490
df_week = data[data.week == 'Sep W36']   #Sep W36一共有5491条数据
len(df_week.phone.value_counts())  #Sep W36品牌周,一共有5054个用户购买,这里以一个手机号是一个用户来统计用户个数
5054
len(df_week)
5491
len(df_history)
288166
df_history = data[data.week != 'Sep W36']#品牌周之前的数据,一共288166条消费记录
len(df_history.phone.value_counts())
230260

第一部分:

  • 本周新客人数(开店至今本周第一次购买):4230人
  • 本周一共有5491条消费记录,用户5054人,新客4230,老客824人,新客消费4483条,老客消费1008条
  • 本周新客消费记录保存在df_week,本周老客消费记录保存在df_history
df2 = pd.merge(df_week,df_history,how='left',on='phone') #_x为week的数据,_y为history数据
df2.head()
id_xweek_xdate_xorder_d_xorder_e_xquantity_xamount_xphoneaccount_xid_yweek_ydate_yorder_d_yorder_e_yquantity_yamount_yaccount_y
0288167.0Sep W362020-08-30GU202008291446552-1GU2020082914465521.06900.01512000254215120002542NaNNaNNaTNaNNaNNaNNaNNaN
1288168.0Sep W362020-08-30GU202008291446404-1GU2020082914464041.010200.01594171656615714177539NaNNaNNaTNaNNaNNaNNaNNaN
2288169.0Sep W362020-08-30GU202008291447049-1GU2020082914470491.06900.01272126677912721236779NaNNaNNaTNaNNaNNaNNaNNaN
3288170.0Sep W362020-08-30GU202008291446576-1GU2020082914465761.02900.01609166642713091333427NaNNaNNaTNaNNaNNaNNaNNaN
4288171.0Sep W362020-08-30GU202008291446393-1GU2020082914463931.014300.0176276005591762760055995877.0NaN2019-08-26GU201908261110787-1GU2019082611107871.010500.017627600559
len(df2)
10519
df2.id_y.isnull().sum()
4483
  • 把于本周第一次消费的数据提取出来,保存在df_sep_first
  • 本周合计销售金额:25653800
  • 本周新客销售金额:20929940
  • 本周新客销售占比 81.59%
df_week.amount.sum()
25653800.0
df_sep_first = df2[df2.id_y.isnull()]
len(df_sep_first)
#df_sep_first = df_sep_first[['id_x','week_x','date_x','order_d_x','order_e_x','amount_x','phone','account_x']]
#df_sep_first.head()
4483
df_sep_first.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4483 entries, 0 to 10518
Data columns (total 17 columns):
id_x          4483 non-null float64
week_x        4483 non-null object
date_x        4483 non-null datetime64[ns]
order_d_x     4483 non-null object
order_e_x     4483 non-null object
quantity_x    4483 non-null float64
amount_x      4483 non-null float64
phone         4483 non-null object
account_x     3961 non-null object
id_y          0 non-null float64
week_y        0 non-null object
date_y        0 non-null datetime64[ns]
order_d_y     0 non-null object
order_e_y     0 non-null object
quantity_y    0 non-null float64
amount_y      0 non-null float64
account_y     0 non-null object
dtypes: datetime64[ns](2), float64(6), object(9)
memory usage: 630.4+ KB
df_sep_first.amount_x.sum()
20929940.0
  • 本周新客会员人数,即在df_sep_first中,account有多少个,3961
df_sep_first.account_x.count()
3961

第二部分:

  • 近一年累计消费金额≥¥68,000的总人数:202 #创建满足1的数据框df_oneyear2
  • 满足1条件的本周产生购买的新产生客人数:
  • 满足1条件的本周产生购买的原达标客人数:
  • 满足上述2个条件的人数消费总金额:
data.date.max()
Timestamp('2020-09-05 00:00:00')
df_oneyear1 = data[(data.date < '2019-09-05')]
(df_oneyear1.pivot_table(index='phone',values='amount',aggfunc='sum').amount > 68000).sum()
202
  • 下面把202条数据提取出来到df_oneyear2
df_oneyear2 = df_oneyear1.pivot_table(index='phone',values='amount',aggfunc='sum')
df_oneyear2 = df_oneyear2[df_oneyear2.amount > 68000]
len(df_oneyear2)
202
思路:
  • 难点在于,这里是聚合求和,满足近1年消费大于68000的聚合求和,已经存到df_oneyear2
  • 本周数据df_week,其中新客df_sep_first,减去之后即为本周消费的老客,这里我们先看本周之内全部满足1条件的人数
  • 先对df_week进行聚合求和,然后再和df_oneyear2相交,查到本周内全部累计消费》68000的用户
  • 这里易错点,拿df_week直接和df_oneyear2相交,结果不对,因为没有聚合求和
# df_week.date 结果是2020-08-30至2020-09-05,不需要考虑1年以内这个条件
df_week.pivot_table(index='phone',values='amount',aggfunc='sum').sort_values(by='amount',ascending=False).head()
amount
phone
1272126502767600.0
1261199624649000.0
1562645246442500.0
1677116216637000.0
1626150121234500.0
  • 结果发现本周之内累计消费最高67600,都不满足68000,所以第2和第3都是0人

第三部分

  • 近一年有多次消费,且至少有一笔消费金额≥¥18,000
  • 近一年仅一次消费,且该笔消费金额≥¥18,000
  • 近一年累计消费金额<¥68,000

  • 满足上述条件的总人数:
  • 满足1条件的本周产生购买的新产生客人数:
  • 满足1条件的本周产生购买的原达标客人数:
  • 满足上述2个条件的人数消费总金额:

思路:第一题的总人数,先求出3个条件每一个满足条件的人数,再求和

df_oneyear1.head()
idweekdateorder_dorder_equantityamountphoneaccount
01.0NaN2017-06-08GU20170608923396-1GU201706089233961.02750.01500026566penny.li@cn.gucci.com
12.0NaN2017-06-08GU20170608923428-1GU201706089234281.02250.016201251750vincent.qiu@gmail.com
23.0NaN2017-06-08GU20170608923430-1GU201706089234301.03900.015010529091412526292@qq.com
34.0NaN2017-06-08GU20170608923431-1GU201706089234311.03300.0126216016251941327726@qq.com
45.0NaN2017-06-08GU20170608923435-1GU201706089234351.02250.012917062526aliutina@163.com
a = df_oneyear1[df_oneyear1.amount > 18000]
a.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1438 entries, 51 to 98105
Data columns (total 9 columns):
id          1438 non-null float64
week        0 non-null object
date        1438 non-null datetime64[ns]
order_d     1438 non-null object
order_e     1438 non-null object
quantity    1438 non-null float64
amount      1438 non-null float64
phone       1438 non-null object
account     1096 non-null object
dtypes: datetime64[ns](1), float64(3), object(5)
memory usage: 112.3+ KB
a.phone.value_counts() > 1
#遇到问题,不知道怎么计算大于1的数量,待查
16262121224     True
17621541994     True
12647119726     True
16129000006     True
16652119121     True
12064295506     True
12520221500     True
16964522200     True
16521512766     True
16612052679     True
16996176222     True
12500226762     True
15222227475     True
16201967491     True
12405291559     True
16570712222     True
15267120909     True
15210924674     True
16606609226     True
16552005664     True
16552401027     True
16752720064     True
15625676110     True
16916976602     True
15210951271     True
12676106654     True
16916055229     True
16264169746     True
12610649665     True
12662560267     True
               ...  
16466210740    False
12600679612    False
12001729520    False
12420052226    False
15907952196    False
15625229997    False
16921950066    False
12696912090    False
16979201079    False
12100221766    False
16201609226    False
16261646692    False
16611225626    False
15927546424    False
15912621004    False
17600446922    False
12260064657    False
16727962992    False
16962049492    False
19921426110    False
16217566946    False
12210562700    False
16296267629    False
15144694469    False
16166279955    False
12970221916    False
16477045216    False
15292277991    False
15207910610    False
15157020292    False
Name: phone, Length: 1329, dtype: bool
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值