Website User Behavior Data Analysis ---- A/B Testing

Latest recommended article published on 2021-11-17 17:02:04

qq_32811823


Categories: Data Analysis, Machine Learning. Tags: A/B Testing

Article link: https://blog.csdn.net/qq_32811823/article/details/95109615


Analyzing A/B Test Results

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
user_id         294478 non-null int64
timestamp       294478 non-null object
group           294478 non-null object
landing_page    294478 non-null object
converted       294478 non-null int64
dtypes: int64(2), object(3)
memory usage: 11.2+ MB

df.shape

(294478, 5)

c. The number of unique users in the dataset.

df['user_id'].nunique()

d. The proportion of users who converted.

df['converted'].mean()

0.11965919355605512

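e. The number of times new_page and treatment do not line up.

# Count mismatches in both directions: treatment with old_page, or control with new_page
((df['group'] == 'treatment') != (df['landing_page'] == 'new_page')).sum()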
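A mismatch between group and landing page can occur in two directions: treatment users shown old_page, and control users shown new_page. As a minimal sketch on a made-up toy DataFrame (the `toy` data below is invented for illustration and is not the real dataset), `pd.crosstab` lays out all four combinations so both kinds of mismatch are visible at once:

```python
import pandas as pd

# Toy data, invented for illustration only (not the real dataset)
toy = pd.DataFrame({
    "group":        ["control", "control", "treatment", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "new_page", "old_page"],
})

# Rows are groups, columns are landing pages; each cell is a row count
table = pd.crosstab(toy["group"], toy["landing_page"])
print(table)

# The two "off-diagonal" cells are the mismatched combinations
n_mismatch = int(table.loc["control", "new_page"] + table.loc["treatment", "old_page"])
print(n_mismatch)
```

Reading the table first makes it harder to overlook one of the two mismatch directions when writing the filter.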

f. Do any rows have missing values?

df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64
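The per-column counts above answer the question indirectly. If a per-row answer is wanted, one possible approach is to reduce across columns with `any(axis=1)`. The sketch below uses a small invented DataFrame (`toy`) that deliberately contains missing values, since the real dataset has none:

```python
import numpy as np
import pandas as pd

# Toy data with deliberate gaps, invented for illustration only
toy = pd.DataFrame({
    "user_id":   [1, 2, 3],
    "group":     ["control", None, "treatment"],
    "converted": [0, 1, np.nan],
})

# True for each row that has at least one missing value
row_has_missing = toy.isnull().any(axis=1)

rows_with_missing = int(row_has_missing.sum())
has_missing = bool(row_has_missing.any())
print(rows_with_missing, has_missing)
```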

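2. For rows where treatment is not paired with new_page, or control is not paired with old_page, we cannot be sure whether the user actually received the new or the old page. Drop these ambiguous rows before continuing the analysis.

a. Store the data that remains after dropping the ambiguous rows in df2.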

df_nc = df[(df['landing_page'] == 'new_page') & (df['group'] == 'control')]

df_ot = df[(df['landing_page'] == 'old_page') & (df['group'] == 'treatment')]

df_notclear = pd.concat([df_nc, df_ot])

df2 = df.drop(index=df_notclear.index)

# Double Check all of the correct rows were removed - this should be 0
df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]
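The same filtering can also be written with a single boolean mask instead of concatenating two subsets and dropping by index. This is only an alternative sketch, shown on an invented toy DataFrame (`toy` and `toy2` are hypothetical names, not part of the original notebook):

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "user_id":      [1, 2, 3, 4, 5],
    "group":        ["control", "control", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "old_page"],
    "converted":    [0, 1, 0, 1, 1],
})

# Keep rows where "is treatment" agrees with "is new_page"
aligned = (toy["group"] == "treatment") == (toy["landing_page"] == "new_page")
toy2 = toy[aligned].copy()  # .copy() so later in-place edits do not touch toy

# Sanity check in the same spirit as the double check above: should be 0
remaining = int(((toy2["group"] == "treatment") != (toy2["landing_page"] == "new_page")).sum())
print(toy2["user_id"].tolist(), remaining)
```

Taking an explicit copy keeps a later in-place `drop` on the filtered frame from raising chained-assignment warnings.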

df2.head()

	user_id	timestamp	group	landing_page	converted
0	851104	2017-01-21 22:11:48.556739	control	old_page	0
1	804228	2017-01-12 08:01:45.159739	control	old_page	0
2	661590	2017-01-11 16:55:06.154213	treatment	new_page	0
3	853541	2017-01-08 18:28:03.143765	treatment	new_page	0
4	864975	2017-01-21 01:52:26.210827	control	old_page	1

3. Use df2 and the cells below to answer Quiz 3 in the classroom.

a. Are there 290584 unique user_id values in df2?

df2["user_id"].nunique()

b. df2 contains one duplicated user_id: 773192

dup_user_id = df2["user_id"][df2["user_id"].duplicated()].to_numpy()[0]

print("df2 contains one duplicated user_id: {}".format(dup_user_id))

df2 contains one duplicated user_id: 773192

c. Look at the rows for this duplicated user_id.

df2[df2["user_id"] == dup_user_id]

	user_id	timestamp	group	landing_page	converted
1899	773192	2017-01-09 05:37:58.781806	treatment	new_page	0
2893	773192	2017-01-14 02:55:59.590927	treatment	new_page	0

d. Remove one of the rows with the duplicated user_id, making sure your dataframe is still named df2.

df2.drop(index=df2[df2["user_id"] == dup_user_id].index[0], inplace=True)
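Dropping by an explicit index label works when there is exactly one known duplicate. A more general option, sketched here on invented toy data rather than the real df2, is `drop_duplicates` with `subset="user_id"`, which keeps one row per user no matter how many duplicates exist:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "user_id":   [10, 11, 11, 12],
    "timestamp": ["2017-01-09", "2017-01-09", "2017-01-14", "2017-01-10"],
    "converted": [0, 0, 0, 1],
})

# keep="first" retains the first-listed row for each user_id
deduped = toy.drop_duplicates(subset="user_id", keep="first")

print(len(deduped))
print(deduped["user_id"].is_unique)
```

Whether to keep the first or the last row is a judgment call. In the duplicated pair shown above both rows have the same group, page, and converted value, so the choice would not change the conversion rate.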

4.

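a. What is the probability of an individual user converting, regardless of the page they receive?

def tran_nub(data):
    # Conversion rate: the mean of the 0/1 converted column is the share of converted rows
    tran = data['converted'].mean()
    return tran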
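The original text stops at the definition of this helper, so how it is applied afterwards is not shown. As a hedged sketch only, a helper of this kind could be used on the whole frame and on each group separately. The `conversion_rate` function and the `toy` data below are hypothetical stand-ins invented for illustration:

```python
import pandas as pd

def conversion_rate(data):
    # Hypothetical stand-in for tran_nub: the mean of a 0/1 column is the share of 1s
    return data["converted"].mean()

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "group":     ["control", "control", "treatment", "treatment", "treatment", "control"],
    "converted": [0, 1, 0, 0, 1, 1],
})

# Overall conversion rate, regardless of page
overall = conversion_rate(toy)

# Conversion rate within each group
by_group = toy.groupby("group")["converted"].mean()

print(overall)
print(by_group)
```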
