E-commerce User Behavior Data Analysis

Author: what colour

Article link: https://blog.csdn.net/qq_48201996/article/details/108312589

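Summary (auto-generated by CSDN): An analysis of behavior data from about one million Taobao users between November 25 and December 3, 2017 finds a repurchase rate of 66.01% and a bounce rate of 5.74%. The conversion funnel shows that 31.32% of users convert into buyers. The time-based analysis shows that users are most active in specific periods, such as 18:00-21:00. The RFM model shows that "important customers to retain" form the largest group. The product sales analysis shows that some items have high conversion rates while most categories have low purchase rates, so the recommendation is to tailor product pushes and campaign strategy to user profiles.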

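Project Background

This analysis uses all behavior records (clicks, purchases, add-to-cart, and favorites) of roughly one million randomly sampled users who were active on the Taobao app between November 25, 2017 and December 3, 2017, in order to explore the behavior patterns of Taobao users.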

Data Source

https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&userId=1

Analysis Goals

User behavior analysis
User repurchase analysis
Funnel analysis
User value segmentation
Sales category analysis

Data Loading and Display

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
from pyecharts.globals import CurrentConfig, NotebookType
from pyecharts import options as opts
from pyecharts.charts import Line
from pyecharts.charts import Bar
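
The loading step itself is not shown in the text. A minimal sketch of how `df` might be created, assuming the Tianchi file is a headerless CSV with five columns and that the column names match those used later in this article (`userid`, `itemid`, `categoryid`, `type`, `timestamp`). The helper `load_user_behavior` and the file name in the comment are assumptions, not the author's code.

```python
import io
import pandas as pd

# Column names as used in the rest of the article (assumed, not read from the file)
COLUMNS = ["userid", "itemid", "categoryid", "type", "timestamp"]

def load_user_behavior(source, nrows=None):
    """Read a headerless CSV of behavior records; nrows limits how many rows are loaded."""
    return pd.read_csv(source, header=None, names=COLUMNS, nrows=nrows)

# In-memory stand-in for the real file, e.g. load_user_behavior("UserBehavior.csv", nrows=10_000_000)
sample = io.StringIO(
    "1,100,10,pv,1511544070\n"
    "1,101,10,buy,1511561733\n"
    "2,100,10,cart,1511572885\n"
)
df_sample = load_user_behavior(sample)
print(df_sample.head())
```

Passing `nrows` keeps memory use bounded when only part of a large file is needed.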

Data Preprocessing

import datetime

# pd.to_datetime returns naive UTC times for Unix timestamps, so add 8 hours to get China Standard Time (UTC+8)
df['time'] = pd.to_datetime(df['timestamp'], unit='s') + datetime.timedelta(hours=8)

# Keep only records from 2017-11-25 through 2017-12-03
startTime = datetime.datetime.strptime("2017-11-25 00:00:00", "%Y-%m-%d %H:%M:%S")
endTime = datetime.datetime.strptime("2017-12-03 23:59:59", "%Y-%m-%d %H:%M:%S")

df = df[(df.time >= startTime) & (df.time <= endTime)]
df = df.reset_index(drop=True)

# Split the time into separate date and hour columns
df['date'] = df.time.dt.date
df['hour'] = df.time.dt.hour

# Drop the raw timestamp column to save memory
df.drop('timestamp', inplace=True, axis=1)
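
As a cross-check on the fixed 8-hour offset used above, the same conversion can be expressed with pandas' time zone support. This is only a sketch on two hypothetical timestamps, not the author's code. It assumes the `Asia/Shanghai` zone is the intended one.

```python
import pandas as pd

# Two hypothetical Unix timestamps (seconds) from the study period
timestamps = pd.Series([1511544070, 1511561733])

# Approach used in the article: naive UTC time plus a fixed 8-hour offset
fixed_offset = pd.to_datetime(timestamps, unit="s") + pd.Timedelta(hours=8)

# Alternative: parse as UTC, convert to Asia/Shanghai, then drop the tz info
tz_aware = (
    pd.to_datetime(timestamps, unit="s", utc=True)
    .dt.tz_convert("Asia/Shanghai")
    .dt.tz_localize(None)
)

# Both approaches should yield the same local wall-clock times
print((fixed_offset == tz_aware).all())
```

The time zone route states the intent explicitly, while the fixed offset is simpler and equivalent for a zone without daylight saving time in this period.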

Data Analysis and Visualization

User Behavior Analysis

Overall Analysis

# Totals over the 9 days: users, items acted on, and item categories
total_unique_users = df.userid.nunique()                      # unique visitors (UV)
total_unique_itemid = df.itemid.nunique()                     # items with at least one action
total_unique_categoryid = df.categoryid.nunique()             # categories with at least one action
user_bought_count = df[df['type']=='buy'].userid.nunique()    # paying users
user_nobought_count = total_unique_users - user_bought_count  # non-paying users
print(f"UV: {total_unique_users}")
print(f"Items: {total_unique_itemid}")
print(f"Categories: {total_unique_categoryid}")
print(f"Paying users: {user_bought_count}")
print(f"Paying user share: {user_bought_count/total_unique_users*100:.2f}%")

UV: 97810
Items: 1560525
Categories: 7966
Paying users: 66452
Paying user share: 67.94%

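For the nine days from November 25, 2017 to December 3, 2017, 10,000,000 rows were sampled from the dataset. They cover 97,810 unique users (UV) who interacted with 1,560,525 items across 7,966 categories.
Of these users, 66,452 made at least one purchase, which is 67.94% of all users.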
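
The paying-user share can be re-derived from the two printed counts. A short arithmetic check, using only the figures shown in the output above:

```python
# Counts taken from the printed output above
total_unique_users = 97810
user_bought_count = 66452

user_nobought_count = total_unique_users - user_bought_count
paying_share = user_bought_count / total_unique_users * 100

print(f"Non-paying users: {user_nobought_count}")
print(f"Paying user share: {paying_share:.2f}%")
```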

Action Type Analysis

# Distribution of user action types
type_series = df.type.value_counts()
plt.figure()
plt.pie(x=type_series,labels=type_series.index,autopct='%1.2f%%')
plt.show()

# Per action type over the 9 days: total count, daily average, and average per user
type_df = pd.DataFrame([type_series, type_series/9, type_series/total_unique_users],
                       index=['total', 'avg_day', 'avg_user'])

# Action counts for paying users (users with at least one 'buy' record)
type_df.loc['paying_user'] = df[df['userid'].isin(df[df['type']=='buy']['userid'])].type.value_counts()
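
To make the shape of this table concrete, here is a minimal sketch on hypothetical toy records. The data, `n_days`, and the `reindex` step are illustrative assumptions. Only the column names and row labels follow the article.

```python
import pandas as pd

# Hypothetical toy records; column names follow the article
toy = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "type":   ["pv", "pv", "buy", "pv", "cart", "pv", "fav", "buy", "buy"],
})
n_days = 9
n_users = toy["userid"].nunique()

# One row per statistic, one column per action type
type_series = toy["type"].value_counts()
type_df = pd.DataFrame(
    [type_series, type_series / n_days, type_series / n_users],
    index=["total", "avg_day", "avg_user"],
)

# Users with at least one purchase, then the counts of all their actions
buyers = toy.loc[toy["type"] == "buy", "userid"].unique()
paying_counts = toy.loc[toy["userid"].isin(buyers), "type"].value_counts()

# reindex so action types the buyers never performed show 0 instead of NaN
type_df.loc["paying_user"] = paying_counts.reindex(type_df.columns, fill_value=0)
print(type_df)
```

Without the `reindex`, an action type that no paying user performed would appear as NaN in the `paying_user` row.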

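Bounce Rate and Repurchase Rate

Bounce rate = users with only click (pv) actions / total users

Strictly speaking, the bounce rate is the number of visits that leave after viewing a single page divided by the total number of visits to that page.
The simplified definition is used here only to highlight users who have yet to be developed.

Repurchase rate = users with 2 or more purchases / total purchasing users
The repurchase rate can be computed per customer or per transaction; here it is computed per customer, over the 9-day period.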
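
The two definitions above can be sketched on a small hypothetical dataset. This is an illustration written from the stated formulas, not the author's code. It assumes each `buy` row counts as one purchase.

```python
import pandas as pd

# Hypothetical toy records; column names follow the article
toy = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    "type":   ["pv", "pv", "buy", "pv", "pv", "pv", "buy", "buy", "fav", "cart", "pv"],
})

# One row per user, one column per action type; missing combinations become 0
user_type = toy.groupby("userid")["type"].value_counts().unstack(fill_value=0)

# Users whose pv count equals their total action count did nothing but click
only_pv_users = user_type[user_type["pv"] == user_type.sum(axis=1)]
bounce_rate = len(only_pv_users) / len(user_type)

# Per-customer repurchase rate: buyers with 2+ purchases over all buyers
buyers = user_type[user_type["buy"] >= 1]
repeat_buyers = user_type[user_type["buy"] >= 2]
repurchase_rate = len(repeat_buyers) / len(buyers)

print(f"Bounce rate: {bounce_rate:.2%}")
print(f"Repurchase rate: {repurchase_rate:.2%}")
```

In the toy data only user 2 has nothing but clicks, and of the two buyers only user 3 bought twice.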

groupby_userid = df.groupby(by='userid')
user_type = groupby_userid.type.value_counts().unstack()  # unstack pivots the inner index level (action type) into columns, giving one row per user

# Bounce rate
# sum(axis=1) adds up each row of the DataFrame; if a userid's pv count equals its row total, that user has only click actions
only_pv_users = user_type[user_type['pv']==user_type.
