Author: WeiMin, Jason Wang
Summary
Online controlled A/B testing is a common practice at companies like Microsoft, Amazon, Google, and Yahoo for evaluating the effectiveness of feature improvements. The same approach is widely used at eBay in Search Science, Merchandising, Shipping, and other domains to infer the causal relationship between algorithm changes and financial gain. As the name implies, users are split into two groups of equal size: one group is assigned to version A, usually the existing algorithm (the control group), and the other is exposed to version B, the new algorithm (the treatment group), while all other variables are held identical. A feature is launched if the new algorithm significantly increases the mean of Gross Merchandise Bought (GMB).
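As a rough illustration of the launch criterion described above, the sketch below compares mean GMB between the two groups with a Welch-style test statistic. It is a minimal sketch and not eBay's implementation: the function name `welch_z_test`, the toy per-user GMB values, and the normal approximation used for the p-value (reasonable only for large samples) are all assumptions made for the example.

```python
import math
from statistics import NormalDist, mean, variance


def welch_z_test(treatment, control):
    """Return (difference in means, test statistic, two-sided p-value)."""
    diff = mean(treatment) - mean(control)
    # Standard error of the difference, allowing unequal group variances.
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    stat = diff / se
    # Assumption: normal approximation to the t distribution (large samples).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return diff, stat, p_value


# Hypothetical per-user GMB values, for illustration only.
control = [0.0, 0.0, 12.5, 0.0, 30.0, 8.0, 0.0, 15.0]
treatment = [0.0, 5.0, 14.0, 0.0, 42.0, 9.5, 0.0, 20.0]

diff, stat, p_value = welch_z_test(treatment, control)
print(f"lift in mean GMB: {diff:.3f}, statistic: {stat:.3f}, p-value: {p_value:.3f}")
```

With so few users and such skewed values, the standard error is large relative to the difference in means, which is the sensitivity problem that variance reduction is meant to address.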
However, in the eBay marketplace, a small number of users shop for high-end products, while a very large number of users purchase low-priced products or make no purchase at all. This long-tailed distribution of GMB increases the noise when detecting the treatment effect of an A/B test. To improve test sensitivity, variance reduction techniques such as capped mean and Toso-Tail had previously been applied to the estimation of mean GMB. This paper introduces another variance reduction technique, called post-stratification, to further improve test sensitivity.
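To make the effect of a long tail concrete, here is a minimal sketch of a capped mean, in which every value above a threshold is replaced by that threshold before averaging. The function name `capped_mean`, the cap of 100, and the toy GMB values are assumptions for illustration; the source does not say how the cap is chosen in practice.

```python
from statistics import mean, variance


def capped_mean(values, cap):
    """Mean after replacing every value above `cap` with `cap`."""
    return mean([min(v, cap) for v in values])


# Hypothetical long-tailed GMB sample: mostly non-purchasers, one very large buyer.
gmb = [0.0] * 6 + [10.0, 20.0, 30.0, 5000.0]
cap = 100.0  # assumed threshold, chosen only for this example

raw_var = variance(gmb)
capped_var = variance([min(v, cap) for v in gmb])

print("raw mean:", mean(gmb), "capped mean:", capped_mean(gmb, cap))
print("raw variance:", raw_var, "capped variance:", capped_var)
```

Capping trades a much smaller variance for some bias in the estimated mean, since the contribution of the largest purchases is truncated.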
Post-stratification is inspired by stratification in sampling theory. Stratified sampling outperforms simple random sampling when units from the same stratum are similar to each other with respect to the measurement of interest. Because users arrive over time in live-traffic experimentation, it is impossible to sample users from pre-formed strata; nevertheless, the sensitivity of hypothesis testing still benefits from stratification after data collection. This is called post-stratification adjustment. To adjust the mean of GMB during the experimentation period, users' GMB from the pre-experimentation period is collected and used to bucket users. The underlying assumption is that users' purchase behavior is predictable from their historical behavior: say, a frequent high-end purchaser before the experiment period is also likely to be a heavy buyer during the experiment period. In the implementation, to further increase the variance reduction, GMB lift is decomposed into the combination of GMB-per-purchaser lift and lift in the percentage of users who are purchasers, so that purchasers can be modeled separately. The reason is simple: it is difficult to track non-purchasers if they do not sign in.
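The bucketing idea described above can be sketched as a textbook post-stratified mean: users are assigned to strata by their pre-experiment GMB, the mean of in-experiment GMB is computed within each stratum, and the stratum means are combined using population-share weights. This is a sketch under stated assumptions and not the production method: the bucket boundaries, the pooled-sample weights, the function names, and the toy data are all hypothetical, and the purchaser/non-purchaser decomposition is not modeled here.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean


def assign_stratum(pre_gmb, edges):
    """Bucket index for a pre-experiment GMB value (edges are upper bounds)."""
    return bisect_left(edges, pre_gmb)


def stratum_weights(all_users, edges):
    """Share of users in each stratum, estimated from the pooled sample."""
    counts = defaultdict(int)
    for pre_gmb, _ in all_users:
        counts[assign_stratum(pre_gmb, edges)] += 1
    return {k: c / len(all_users) for k, c in counts.items()}


def post_stratified_mean(users, edges, weights):
    """Weighted average of per-stratum means of in-experiment GMB.

    Assumes every stratum with a weight is observed in `users`.
    """
    by_stratum = defaultdict(list)
    for pre_gmb, gmb in users:
        by_stratum[assign_stratum(pre_gmb, edges)].append(gmb)
    return sum(weights[k] * mean(ys) for k, ys in by_stratum.items())


# Hypothetical (pre-experiment GMB, in-experiment GMB) pairs per user.
control = [(0.0, 0.0), (0.0, 0.0), (50.0, 20.0), (50.0, 40.0), (500.0, 300.0)]
treatment = [(0.0, 0.0), (0.0, 10.0), (40.0, 30.0), (600.0, 400.0), (700.0, 500.0)]
edges = [0.0, 100.0]  # assumed buckets: no purchase, up to 100, above 100

weights = stratum_weights(control + treatment, edges)
raw_diff = mean(g for _, g in treatment) - mean(g for _, g in control)
ps_diff = (post_stratified_mean(treatment, edges, weights)
           - post_stratified_mean(control, edges, weights))
print("raw difference:", raw_diff, "post-stratified difference:", ps_diff)
```

In this toy data the treatment group happens to contain more historically heavy buyers, so the raw difference in means overstates the lift; weighting each stratum by its pooled share removes that imbalance.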
The post-stratified GMB metrics were rolled out on EP, the central experimentation platform, in Nov 2014. Since then, more experiments have gone from insignificant to significant, and more new algorithms have become launchable. In sum, the post-stratification adjusted metrics are a valuable improvement that saves experimentation resources, speeds up the pace of testing, and supports launching more profitable new features on eBay.
Post-stratification Variance Reduction Technique
In this section, we show theoretically that post-stratification adjustment reduces the variance. Let Y denote the target metric, GMB in our case, and let X be the auxiliary variable, which is known. The subscripts t and c denote the treatment group and the control group, respectively. Separate symbols denote the sample size and the treatment effect. T test based on sample