Data Analysis: A/B Test

jackwang1780

Category: A/B Test. Tags: python, data analysis

Article link: https://blog.csdn.net/jackwang1780/article/details/107702295

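This post walks through an A/B test analysis. Data cleaning reduces the dataset from 294478 records to 290585. Counting unique user_id values gives 290584, so one user appears twice and the duplicate is dropped. The test yields a Z-value of 1.31 and a P-value of 0.905, so there is no significant evidence that the new version outperforms the old one; the recommendation is to stop the current experiment and refine the design.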

import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
# Set the seed to ensure you get the same answers on quizzes as we set up
random.seed(42)

df=pd.read_csv(r'D:\数据分析\ABTEST\ab_data.csv')
df.head()

# The line below selects the mismatched rows: group == "treatment" with landing_page == "old_page", or group == "control" with landing_page == "new_page"
num_error = df[((df.group == "treatment")!=(df.landing_page == "new_page"))]
num_error.head(3)
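
To see why comparing the two boolean Series with `!=` flags exactly the mismatched rows, here is a minimal sketch on a made-up four-row frame. The toy data and the names `toy`, `mismatch` and `counts` are assumptions for illustration, not rows from ab_data.csv. A cross-tabulation of group against landing_page shows the same information as counts.

```python
import pandas as pd

# Hypothetical toy data: one row per (group, landing_page) combination.
toy = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control"],
    "landing_page": ["new_page", "old_page", "old_page", "new_page"],
})

# True where exactly one of the two conditions holds, i.e. the row is inconsistent.
mismatch = (toy.group == "treatment") != (toy.landing_page == "new_page")
print(toy[mismatch])

# The control/new_page and treatment/old_page cells count the inconsistent rows.
counts = pd.crosstab(toy.group, toy.landing_page)
print(counts)
```

On the real data, the two inconsistent cells of such a cross-tab should add up to the number of rows in `num_error`.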

在这里插入图片描述

# Drop the mismatched rows, then check again whether any remain
print("Records before dropping mismatched rows:", df.shape[0])
# Drop rows where landing_page == "new_page" and group == "control" (1928 rows)
df2 = df[~((df.landing_page == "new_page") & (df.group == "control"))]
# print(df2)
# Drop rows where landing_page == "old_page" and group == "treatment" (1965 rows)
df3 = df2[~((df2.landing_page == "old_page") & (df2.group == "treatment"))]
# print(df3)
print("Records after dropping mismatched rows:", df3.shape[0])
print("Number of mismatched rows:", df.shape[0] - df3.shape[0])
# Count the mismatched rows remaining in the cleaned data
num_error2 = df3[((df3.group == "treatment") != (df3.landing_page == "new_page"))].shape[0]
num_error2

Records before dropping mismatched rows: 294478
Records after dropping mismatched rows: 290585
Number of mismatched rows: 3893
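
The two-step removal can also be written as a single filter that keeps only rows where the group and the page agree. The sketch below checks this on a small made-up frame; the toy data and the names `two_step`, `consistent` and `one_step` are illustrative assumptions, and the equivalence relies on group taking only the values "treatment"/"control" and landing_page only "new_page"/"old_page".

```python
import pandas as pd

# Hypothetical toy data containing two inconsistent rows (user_id 2 and 4).
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "group": ["treatment", "treatment", "control", "control", "control"],
    "landing_page": ["new_page", "old_page", "old_page", "new_page", "old_page"],
})

# Two-step removal, mirroring the approach above.
step1 = toy[~((toy.landing_page == "new_page") & (toy.group == "control"))]
two_step = step1[~((step1.landing_page == "old_page") & (step1.group == "treatment"))]

# One-step alternative: keep rows where the two conditions agree.
consistent = (toy.group == "treatment") == (toy.landing_page == "new_page")
one_step = toy[consistent]

# Both approaches should keep the same rows.
print(one_step.equals(two_step))
```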

# Check for duplicate rows
print("Number of records:", df3.user_id.shape[0])
print("Number of unique user_id values:", df3.user_id.nunique())

Number of records: 290585
Number of unique user_id values: 290584

# Show the duplicated rows
print(df3[df3.user_id.duplicated(keep=False)])
# Drop the duplicate, keeping the first occurrence of each user_id
df4 = df3.drop_duplicates(subset=["user_id"], keep="first")
df4.shape[0]
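
The difference between `duplicated(keep=False)` and `drop_duplicates(keep="first")` is easy to miss, so here is a small sketch on made-up data (the frame `toy` and its values are assumptions for illustration only). The first call marks every copy of a repeated user_id so they can be inspected side by side; the second keeps only the first copy.

```python
import pandas as pd

# Hypothetical toy data in which user_id 102 appears twice.
toy = pd.DataFrame({
    "user_id": [101, 102, 102, 103],
    "converted": [0, 1, 0, 1],
})

# keep=False marks every row of a duplicated user_id, so both copies are shown.
print(toy[toy.user_id.duplicated(keep=False)])

# keep="first" retains the first occurrence and drops the later ones.
deduped = toy.drop_duplicates(subset=["user_id"], keep="first")
print(deduped.shape[0], deduped.user_id.nunique())
```

Note that when the two copies disagree on `converted`, as in this toy frame, the choice of `keep` decides which value survives.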

# Conversion rate of the control group
control_converted = df4.query('group=="control"').converted.mean()
control_converted

0.1203863045004612

# Conversion rate of the treatment group
treatment_converted = df4.query('group=="treatment"').converted.mean()
treatment_converted

0.11880806551510564
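
Before running a formal test it helps to put the two rates side by side. The short sketch below only does arithmetic on the two values printed above; the variable names `absolute_diff` and `relative_diff` are introduced here for illustration.

```python
# Conversion rates as printed above.
control_rate = 0.1203863045004612
treatment_rate = 0.11880806551510564

# Difference in percentage-point terms (as a fraction) and relative to control.
absolute_diff = treatment_rate - control_rate
relative_diff = absolute_diff / control_rate

print(f"absolute difference: {absolute_diff:.6f}")
print(f"relative difference: {relative_diff:.2%}")
```

The treatment group's observed rate is slightly below the control group's, which is worth keeping in mind when reading the one-sided test that follows.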

# Run a hypothesis test for two independent samples (z-test for two proportions)
import statsmodels.stats.proportion as ssp

converted_old = df4[df4.landing_page == "old_page"].converted.sum()
print(converted_old)
converted_new = df4[df4.landing_page == "new_page"].converted.sum()
print(converted_new)
n_old = len(df4[df4.landing_page == "old_page"])
print(n_old)
n_new = len(df4[df4.landing_page == "new_page"])
print(n_new)
data = pd.DataFrame({"converted":[converted_old, converted_new],
                     "total":[n_old ,n_new]
                     })
display(data)

# With counts ordered [old, new], alternative="smaller" tests H1: p_old < p_new
z_score, p_value = ssp.proportions_ztest(count=data.converted, nobs=data.total, alternative="smaller")
print("Z-value:", z_score)
print("P-value:", p_value)

Z-value: 1.3109241984234394
P-value: 0.9050583127590245
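Result analysis: the P-value is about 0.905, far above 0.05, so we fail to reject the null hypothesis. Because the test is one-sided (alternative="smaller", with the old page listed first), this means the data give no evidence that the new page converts better than the old one; the observed treatment rate (about 11.88%) is in fact slightly lower than the control rate (about 12.04%). Failing to reject is not proof that the two versions are identical, but there is no case for rolling out the new page. The next step is to stop this experiment and keep refining the design.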
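
For readers who want to see what sits behind the test, here is a minimal standard-library sketch of a pooled two-proportion z-statistic, plus the P-values that the reported Z-value implies under each alternative. The function name `pooled_z` and the demo counts are hypothetical; the pooled standard error is, as far as I know, what `proportions_ztest` uses by default, but treat that as an assumption rather than a statement about the library.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z(count1, n1, count2, n2):
    """z-statistic for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = count1 / n1, count2 / n2
    p_pool = (count1 + count2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical counts, used only to exercise the function.
z_demo = pooled_z(50, 100, 40, 100)
print(z_demo)

# Z-value reported above (old page first, new page second).
z_reported = 1.3109241984234394
cdf = NormalDist().cdf

p_smaller = cdf(z_reported)                     # H1: p_old < p_new
p_larger = 1 - cdf(z_reported)                  # H1: p_old > p_new
p_two_sided = 2 * (1 - cdf(abs(z_reported)))    # H1: p_old != p_new
print(p_smaller, p_larger, p_two_sided)
```

The "smaller" value reproduces the reported P-value, and the two-sided value (about 0.19) is also above 0.05, so asking "is there any difference at all?" leads to the same decision.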
