Medium: How to check the correctness of an A/B test?


Usually, the type I error is considered the more important one, so for the type II error we have to make a compromise:

The probability of Type II error can be adjusted to the desired value by changing the size of the groups or by reducing the variance in the data.

Rule: for a fixed effect size and significance level, the larger the group size and the lower the variance, the smaller the probability of a type II error.
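To see this rule in numbers, here is a minimal sketch using the normal approximation for a two-sided, two-sample test. The function name `type2_error_prob`, the metric's standard deviation `sigma` and the true difference `delta` are illustrative assumptions, not values from the original article; α = 0.1 matches the significance level used later.

```python
from scipy.stats import norm

def type2_error_prob(n, sigma, delta, alpha=0.1):
    """Approximate P(type II error) for a two-sided two-sample test.

    n: users per group, sigma: metric std (assumed equal in both groups),
    delta: true difference in means. Normal approximation; the far tail is ignored.
    """
    se = sigma * (2 / n) ** 0.5           # standard error of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)      # two-sided critical value
    return norm.cdf(z_crit - delta / se)  # probability of missing the true effect

# Larger groups -> smaller type II error
for n in (100, 400, 1600):
    print("n =", n, "beta =", round(type2_error_prob(n, sigma=10, delta=1), 3))

# Lower variance -> smaller type II error
for sigma in (20, 10, 5):
    print("sigma =", sigma, "beta =", round(type2_error_prob(400, sigma=sigma, delta=1), 3))
```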


Next come the concrete steps for checking correctness:

Estimate the required group size:
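A minimal sketch of this step, assuming a two-sided two-sample test and the standard normal-approximation formula n = 2 (z_{1-α/2} + z_{1-β})² σ² / δ² per group. The error levels α = 0.1 and β = 0.2 match the values discussed below; `sigma`, `delta` and the helper name `required_group_size` are hypothetical.

```python
import math
from scipy.stats import norm

def required_group_size(sigma, delta, alpha=0.1, beta=0.2):
    """Users per group for a two-sided two-sample test (normal approximation).

    n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(1 - beta)        # quantile matching the desired power 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                # round up so the error targets are still met

# Hypothetical metric: standard deviation 10, smallest effect worth detecting 1.0
print(required_group_size(sigma=10, delta=1))
```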

By running 1000 synthetic experiments and computing the proportion of type II errors, we obtain a point estimate of the probability of a type II error.
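A sketch of this point estimate, assuming normally distributed data, Student's t-test from SciPy, and hypothetical parameters (1237 users per group, mean 100, true effect 1.0, standard deviation 10, α = 0.1). A type II error is an experiment with a real effect where the test fails to reject H0.

```python
import numpy as np
from scipy.stats import ttest_ind

def estimate_type2_error(n_experiments=1000, n=1237, mu=100.0, delta=1.0,
                         sigma=10.0, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # One row per synthetic A/B experiment; the treatment mean is shifted by delta
    control = rng.normal(mu, sigma, size=(n_experiments, n))
    treatment = rng.normal(mu + delta, sigma, size=(n_experiments, n))
    pvalues = ttest_ind(control, treatment, axis=1).pvalue
    # Share of experiments where the real effect was missed (p >= alpha)
    return np.mean(pvalues >= alpha)

print("estimated type II error:", estimate_type2_error())
```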

Then, using synthetic A/A and A/B experiments, we estimate both error probabilities and construct confidence intervals for them.
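One way to sketch this, under the same hypothetical normal data as before: A/A experiments (no true difference) give the type I error rate, A/B experiments (true difference 1.0) give the type II error rate, and a Wilson score interval is used for each proportion. The choice of the Wilson interval is an assumption; the original does not say which interval it uses.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

def wilson_ci(successes, trials, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z / denom * np.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

def simulate_pvalues(effect, n_experiments=1000, n=1237, sigma=10.0, seed=0):
    """p-values of Student's t-test over many synthetic experiments."""
    rng = np.random.default_rng(seed)
    a = rng.normal(100.0, sigma, size=(n_experiments, n))
    b = rng.normal(100.0 + effect, sigma, size=(n_experiments, n))
    return ttest_ind(a, b, axis=1).pvalue

alpha = 0.1
p_aa = simulate_pvalues(effect=0.0, seed=1)  # A/A: H0 is true
p_ab = simulate_pvalues(effect=1.0, seed=2)  # A/B: H1 is true

type1 = int(np.sum(p_aa < alpha))   # false rejections
type2 = int(np.sum(p_ab >= alpha))  # missed effects
print("type I :", type1 / len(p_aa), wilson_ci(type1, len(p_aa)))
print("type II:", type2 / len(p_ab), wilson_ci(type2, len(p_ab)))
```

If the test is correct, the interval for the type I error should cover α and the interval for the type II error should cover the β the group size was designed for.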

According to the output, the estimated error probabilities are approximately 0.1 and 0.2, as they should be. Everything checks out: Student's t-test works correctly on this data.

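Next, let's look at another diagnostic: the distribution of p-values from the synthetic A/A tests. When the null hypothesis is true and the test is correct, p-values are uniformly distributed on [0, 1], so their empirical distribution function should lie close to the diagonal.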

Equivalently, for any significance level α, the share of A/A tests with a p-value below α should be approximately α.
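A sketch of this check without plotting, assuming hypothetical normal A/A data: it measures the largest gap between the empirical CDF of the p-values and the diagonal over a grid of α values, and also runs SciPy's Kolmogorov–Smirnov test against Uniform(0, 1). The helper names `aa_pvalues` and `ecdf_deviation` are illustrative.

```python
import numpy as np
from scipy.stats import kstest, ttest_ind

def aa_pvalues(n_experiments=1000, n=500, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.normal(100.0, 10.0, size=(n_experiments, n))
    b = rng.normal(100.0, 10.0, size=(n_experiments, n))  # same distribution: H0 is true
    return ttest_ind(a, b, axis=1).pvalue

def ecdf_deviation(pvalues, grid=np.linspace(0.01, 0.99, 99)):
    """Largest gap between the empirical CDF of p-values and the diagonal y = x."""
    ecdf = np.array([np.mean(pvalues < a) for a in grid])  # share rejected at each alpha
    return np.max(np.abs(ecdf - grid))

p = aa_pvalues()
print("max |ECDF(alpha) - alpha|:", ecdf_deviation(p))
print("KS test against Uniform(0, 1):", kstest(p, "uniform"))
```

A large deviation (or a tiny KS p-value) means the p-values are not uniform under H0, i.e. the test's type I error does not match α.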

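Now repeat the same checks on a second example, in which one user can make several purchases. Is Student's t-test still correct? Answer: no!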


The estimated probability of a type I error is about 0.25, much higher than the significance level of 0.1. The graph shows that the p-values from the synthetic A/A tests are not uniformly distributed: their distribution deviates from the diagonal. In this example, Student's t-test is incorrect because the observations are dependent: purchases made by the same person are correlated. Even if we had not spotted this dependence right away, estimating the error probabilities would have shown us that the test is incorrect.
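The effect of dependent data can be reproduced with a small simulation. This is a sketch under assumed parameters (200 users per group, 5 purchases each, a per-user spending level with standard deviation 5, purchase noise with standard deviation 10); it is not the dataset from the original article, so the exact inflated rate will differ from 0.25. Running the t-test on individual purchases treats them as independent, which inflates the A/A type I error; setting `sigma_user=0` removes the dependence for comparison.

```python
import numpy as np
from scipy.stats import ttest_ind

def dependent_group(rng, n_experiments, n_users, purchases_per_user, sigma_user, sigma_noise):
    # Each user has their own typical spend shared by all of their purchases,
    # so purchases by the same person are correlated (not independent).
    user_mean = rng.normal(100.0, sigma_user, size=(n_experiments, n_users, 1))
    noise = rng.normal(0.0, sigma_noise, size=(n_experiments, n_users, purchases_per_user))
    return (user_mean + noise).reshape(n_experiments, -1)  # all purchases of one experiment

def aa_type1_error_dependent(n_experiments=1000, n_users=200, purchases_per_user=5,
                             sigma_user=5.0, sigma_noise=10.0, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    a = dependent_group(rng, n_experiments, n_users, purchases_per_user, sigma_user, sigma_noise)
    b = dependent_group(rng, n_experiments, n_users, purchases_per_user, sigma_user, sigma_noise)
    pvalues = ttest_ind(a, b, axis=1).pvalue  # wrongly treats every purchase as independent
    return np.mean(pvalues < alpha)

print("type I error, dependent purchases:  ", aa_type1_error_dependent())
print("type I error, independent purchases:", aa_type1_error_dependent(sigma_user=0.0))
```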

Final summary (acceptable error probabilities and p-values):







