[Paper Reading] Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework

I. Paper Information

Paper title: Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework

Publication year: 2021

Journal/conference: Journal of the American Statistical Association (Zone 1 in the Chinese Academy of Sciences SCI journal ranking; impact factor: 4.369)

Paper link: https://www.tandfonline.com/doi/full/10.1080/01621459.2022.2027776

Authors: Chengchun Shi, Xiaoyu Wang, Shikai Luo, Hongtu Zhu, Jieping Ye, Rui Song

II. Paper Content

Abstract

A/B testing, or online experiment is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments of two-sided marketplace platforms (e.g., Uber) where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework for carrying A/B testing in these experiments, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world data example obtained from a technological company to illustrate its advantage over the current practice. A Python implementation of our test is available at https://github.com/callmespring/CausalRL. Supplementary materials for this article are available online.

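Summary

A/B testing, also called online experimentation, is a standard business strategy for comparing a new product with an old one in the pharmaceutical, technology, and traditional industries. The main challenges arise in online experiments on two-sided marketplace platforms (for example, Uber), where there is only a single unit that receives a sequence of treatments over time. In these experiments, the treatment applied at a given time affects both the current outcome and future outcomes. The paper introduces a reinforcement learning framework for carrying out A/B testing in such experiments while characterizing the long-term treatment effects. The proposed testing procedure allows sequential monitoring and online updating, and it is generally applicable to a variety of treatment designs across industries. The paper also systematically studies the theoretical properties of the testing procedure, such as its size (type I error rate) and power. Finally, the framework is applied to simulated data and to a real-world dataset obtained from a technology company to illustrate its advantage over current practice.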
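To make the point about carryover concrete, here is a toy illustration in Python. It is not the paper's testing procedure and is not taken from the CausalRL repository. The linear state dynamics, the reward form, and every name and parameter value (`discounted_value`, `rho`, `carryover`, `direct`, `gamma`) are assumptions chosen for this sketch. It compares an "always treat" policy with a "never treat" policy and shows that the long-term (discounted) effect can be much larger than the effect obtained by counting only the immediate impact of each treatment.

```python
# Toy, noise-free example of carryover effects (all modeling choices are assumptions).
# State:  s_{t+1} = rho * s_t + carryover * a_t   (treatment changes the future state)
# Reward: r_t     = s_t + direct * a_t            (treatment also has an immediate effect)

def discounted_value(action, gamma=0.9, horizon=200,
                     rho=0.5, carryover=1.0, direct=1.0):
    """Discounted cumulative reward when the same action is applied at every step."""
    state = 0.0
    value = 0.0
    for t in range(horizon):
        reward = state + direct * action
        value += (gamma ** t) * reward
        state = rho * state + carryover * action
    return value

gamma, horizon, direct = 0.9, 200, 1.0

# Long-term effect: difference in discounted value between always-treat and never-treat.
long_term_effect = discounted_value(1) - discounted_value(0)

# Effect obtained by summing only the immediate (direct) impact of each treatment,
# i.e. ignoring how the treatment alters future states.
direct_only_effect = sum((gamma ** t) * direct for t in range(horizon))

print(f"long-term effect:   {long_term_effect:.3f}")
print(f"direct-only effect: {direct_only_effect:.3f}")
```

In this toy setting the direct-only figure misses the part of the effect that flows through the state, which is the kind of gap that motivates evaluating long-term treatment effects rather than per-period differences.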
