A/B Test Sensitivity Improvement by Using Post-Stratification

Article link: https://blog.csdn.net/ebay/article/details/46548665


Author: WeiMin, Jason Wang

Summary

Online controlled A/B testing is a common practice at companies like Microsoft, Amazon, Google and Yahoo for evaluating the effectiveness of feature improvements. eBay also uses it widely in Search Science, Merchandising, Shipping and other domains to infer the causal relationship between algorithm changes and financial gain. As the name implies, users are split into two equal-size groups: one group is assigned to version A, usually the existing algorithm (the control group), and the other is exposed to version B, the new algorithm (the treatment group), while all other variables are kept identical. A feature is launched if the new algorithm significantly increases mean Gross Merchandise Bought (GMB).
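A minimal sketch of that launch check, assuming a two-sample Welch t-test on per-user GMB and a normal approximation for the p-value (reasonable only for large samples). The function names and the simulated, long-tailed GMB data are hypothetical and do not represent eBay's implementation.

```python
import math
import random
import statistics


def welch_t_test(treatment, control):
    """Compare mean GMB of two groups with a Welch (unequal-variance) t statistic.

    Returns (delta, t_stat, p_value). The two-sided p-value uses a normal
    approximation to the t distribution, which is reasonable only for the
    large samples typical of online experiments.
    """
    mean_t = statistics.fmean(treatment)
    mean_c = statistics.fmean(control)
    # Standard error of the difference in means: sqrt(s_t^2/n_t + s_c^2/n_c)
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    delta = mean_t - mean_c
    t_stat = delta / se
    # Two-sided tail probability P(|Z| > |t|) for Z ~ N(0, 1)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return delta, t_stat, p_value


def simulate_gmb(n, purchase_rate, mu, sigma, rng):
    """Hypothetical long-tailed GMB: most users buy nothing, buyers' spend is lognormal."""
    return [rng.lognormvariate(mu, sigma) if rng.random() < purchase_rate else 0.0
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(7)
    control = simulate_gmb(50_000, 0.05, 3.0, 1.5, rng)
    treatment = simulate_gmb(50_000, 0.05, 3.05, 1.5, rng)  # small hypothetical lift
    delta, t_stat, p_value = welch_t_test(treatment, control)
    print(f"delta={delta:.3f}  t={t_stat:.2f}  p={p_value:.3f}")
```

Because a few heavy buyers dominate the variance of mean GMB, even a real lift like the simulated one can easily fail to reach significance. This is the sensitivity problem the rest of the article addresses.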

However, in the eBay marketplace a small number of users shop for high-end products, while a very large number of users purchase low-price products or make no purchase at all. This long-tailed distribution of GMB increases the magnitude of noise when detecting the treatment effect of an A/B test. To improve test sensitivity, variance reduction techniques such as capped mean and Toso-Tail had previously been applied to the estimation of mean GMB. This paper introduces another variance reduction technique, post-stratification, to further improve test sensitivity.
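The capped-mean idea can be sketched briefly. The code below caps per-user GMB at a high empirical quantile and compares the estimated variance of the mean before and after capping. The 99.9th-percentile cap, the nearest-rank quantile, and the simulated data are all hypothetical choices. Toso-Tail is not shown.

```python
import math
import random
import statistics


def upper_quantile(values, q):
    """Nearest-rank empirical quantile, q in (0, 1]."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def cap_values(values, cap):
    """Replace every value above `cap` with `cap` (one-sided winsorizing)."""
    return [min(v, cap) for v in values]


def variance_of_mean(values):
    """Estimated variance of the sample mean: s^2 / n."""
    return statistics.variance(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical long-tailed GMB: 5% buyers with lognormal spend, the rest zero.
    gmb = [rng.lognormvariate(3.0, 1.5) if rng.random() < 0.05 else 0.0
           for _ in range(100_000)]
    cap = upper_quantile(gmb, 0.999)  # hypothetical cap choice
    capped = cap_values(gmb, cap)
    print(f"cap={cap:.1f}")
    print(f"var(mean) raw={variance_of_mean(gmb):.5f} "
          f"capped={variance_of_mean(capped):.5f}")
```

Capping trades a small downward bias for lower variance. In an A/B test, the same cap would be applied to both groups, for instance one computed from the pooled data.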

Post-stratification is inspired by stratification in sampling theory. Stratified sampling outperforms simple random sampling when units within the same stratum are similar to each other with respect to the measurement of interest. Because users arrive over time in live-traffic experimentation, it is impossible to sample users from pre-formed strata. Nevertheless, hypothesis testing can still gain sensitivity by stratifying after data collection. This is called post-stratification adjustment. To adjust mean GMB during the experimentation period, users' GMB from the pre-experimentation period is collected and used to bucket users. The underlying assumption is that users' purchase behavior is predictable from their historical behavior. For example, a frequent high-end purchaser before the experiment period is also likely to be a heavy buyer during the experiment period. In the implementation, to increase the variance reduction further, GMB lift is decomposed into the combination of GMB-per-purchaser lift and the lift in the percentage of users who are purchasers, so that purchasers can be modeled separately. The reason is straightforward: non-purchasers are difficult to track if they do not sign in.
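Here is a minimal sketch of post-stratification on pre-experiment GMB. It assumes hypothetical bucket boundaries and simulated users whose spending level persists across periods. The estimator is the weighted mean of within-stratum means, $\sum_k w_k \bar{Y}_k$. For its variance, the code assumes the standard large-sample survey-sampling approximation $\sum_k w_k s_k^2/n + \sum_k (1-w_k) s_k^2/n^2$. None of the names or cut points come from eBay's implementation.

```python
import bisect
import random
import statistics
from collections import defaultdict

# Hypothetical bucket boundaries on pre-experiment GMB (not eBay's actual cut points):
# stratum 0: GMB <= 0, 1: (0, 50], 2: (50, 500], 3: > 500
BOUNDARIES = [0.0, 50.0, 500.0]


def stratum_of(pre_gmb):
    """Assign a user to a stratum from their pre-experiment GMB."""
    return bisect.bisect_left(BOUNDARIES, pre_gmb)


def stratum_weights(pre_values):
    """Share w_k of users falling in each stratum."""
    counts = defaultdict(int)
    for x in pre_values:
        counts[stratum_of(x)] += 1
    return {k: c / len(pre_values) for k, c in counts.items()}


def post_stratified_mean(pre, post, weights):
    """Post-stratified mean of `post` and its approximate variance.

    Estimate:  sum_k w_k * mean_k
    Variance:  sum_k w_k s_k^2 / n + sum_k (1 - w_k) s_k^2 / n^2
    """
    by_stratum = defaultdict(list)
    for x, y in zip(pre, post):
        by_stratum[stratum_of(x)].append(y)
    n = len(post)
    estimate, variance = 0.0, 0.0
    for k, w in weights.items():
        ys = by_stratum[k]
        if len(ys) < 2:
            raise ValueError(f"stratum {k} needs at least two users")
        s2 = statistics.variance(ys)
        estimate += w * statistics.fmean(ys)
        variance += w * s2 / n + (1 - w) * s2 / n ** 2
    return estimate, variance


def simulate_users(n, rng):
    """Hypothetical users whose spending level persists from pre-period to experiment."""
    pre, post = [], []
    for _ in range(n):
        level = rng.choices([0.0, 20.0, 200.0, 2000.0], weights=[70, 20, 8, 2])[0]
        pre.append(rng.expovariate(1 / level) if level else 0.0)
        post.append(rng.expovariate(1 / level) if level else 0.0)
    return pre, post


if __name__ == "__main__":
    rng = random.Random(42)
    pre, post = simulate_users(20_000, rng)
    weights = stratum_weights(pre)
    estimate, ps_var = post_stratified_mean(pre, post, weights)
    naive_var = statistics.variance(post) / len(post)
    print(f"mean={estimate:.2f}  naive var={naive_var:.4f}  "
          f"post-stratified var={ps_var:.4f}")
```

When the weights come from the group's own stratum counts, as in this usage, the point estimate equals the plain sample mean. The gain lies entirely in the variance, because between-stratum variation in GMB is removed. In an A/B test, one would typically compute the weights from pooled treatment and control users so that both groups are re-weighted to the same mix.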
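The lift decomposition can be written out explicitly. This assumes GMB per user is the product of the purchaser rate $p$ (purchasers out of users) and GMB per purchaser $g$, with lift $L$ defined as the relative change from control ($c$) to treatment ($t$). The article does not give its exact formula, so this is one consistent way to express it.

```latex
\frac{\text{GMB}}{\text{users}}
  = \underbrace{\frac{\text{purchasers}}{\text{users}}}_{p}
    \times \underbrace{\frac{\text{GMB}}{\text{purchasers}}}_{g},
\qquad
1 + L_{\text{GMB}} = \frac{p_t\, g_t}{p_c\, g_c} = (1 + L_p)(1 + L_g),
\qquad
L_{\text{GMB}} = L_p + L_g + L_p L_g .
```

For example, a 2% lift in purchaser rate combined with a 1% lift in GMB per purchaser gives a GMB lift of 1.02 × 1.01 − 1 = 3.02%.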

The post-stratified GMB metrics were rolled out in November 2014 on EP, the central experimentation platform. Since then, more experiments have moved from insignificant to significant results, and more new algorithms have become launchable. In sum, the post-stratification-adjusted metrics are a valuable improvement: they save experimentation resources, speed up the testing pace, and support launching more profitable new features on eBay.

Post-stratification Variance Reduction Technique

In this section, we show theoretically that post-stratification adjustment reduces variance. Let $Y$ denote the target metric, GMB in our case, and let $n$ be the sample size. $X$ is a known auxiliary variable. Subscripts $t$ and $c$ denote the treatment and control groups, respectively, and $\delta$ represents the treatment effect. The t-test based on sample