Types of Research & Two Sample t, Randomization, and Bootstrapping Tests

Intro

When we have collected data from our research and pose a research question with an accompanying hypothesis (often triggered by an initial analysis of the data), we want to use statistical tests to see whether the data support the hypothesis, so that we can make an inference (statistical or non-statistical) about the population of interest.
Reference: Zieffler, A. S., Harring, J. R., & Long, J. D. (2011). Comparing Groups: Randomization and Bootstrap Methods Using R. Wiley.

Types of Research

  • It is worth noting that not all research can generalize its sample to some population; this prevents the researchers from making any inference about a population.
  • Moreover, even if your research allows you to test a hypothesis pertaining to some population, you may not be able to make a statistical inference about the population of interest.
  • Below are two qualities we use to categorize research.

Random Sampling (RS)

  • If the research uses random sampling, then we can make statistical inferences about the population the sample is drawn from.

Random Assignment (RA)

  • Some experiments have to recruit volunteers to participate. Such situations, among others, automatically give the research a nonrandom sample.
  • In such an experiment, to neutralize the effect of all variables that are out of the researchers’ control, the researcher can randomly assign treatments (i.e. conditions) to the individuals in the voluntary sample. No matter what these “confounding variables” are, individuals in each condition group will then have, on average, equivalent attributes and characteristics apart from the treatment itself. If a study uses only random assignment, a causal (but not statistical) inference about the population (from which the sample is logically drawn) is still allowed.

The Four Types of Research

  • Depending on whether the research satisfies the two qualities above (RS and RA), we have four types of research:
  1. RS only: Generalizable Research
  2. RA only: Randomized Experimental Research
  3. Both: Generalizable, Randomized Experimental Research
  4. Neither: Nongeneralizable, Nonrandomized Experimental Research

The goal of this note is to give a brief summary of some of the test methodologies pertaining to the first two types of research mentioned above.

Four Methodologies of Hypothesis Test

Two Independent Sample t-test for Sample Mean

  • Also called unpaired t-test

Goal (When to use)

  • Applicable to both of the first two types of research, but it can be used only when you have two samples from two independent groups (such as the treated group vs. the control group in a randomized experiment, or two random samples each drawn from a distinct population, e.g. male and female; the two samples should be unrelated) and want to test whether there is a significant difference between the two populations’ parameter (so you have to choose a sample statistic by which you attempt to differentiate the two populations).
  • It is a hypothesis test, and the null hypothesis is usually that there is no difference in the mean values of the two populations (the hypothesis is always about the population of interest).

Key Value: p-value

  • A small p-value provides evidence against the null hypothesis and leads us to reject it.

Conclusion

  • Statistical in nature; it speaks to whether there is a statistically significant difference between the average values of the two groups.

Conditions

  • This test relies on the CLT (Central Limit Theorem), which states that the sampling distribution of the sample mean follows $N(\text{mean} = \mu,\ \text{sd} = \sigma/\sqrt{n})$ (where $\mu$ is the population mean and $\sigma$ is the population standard deviation) when the sample size is large enough. Therefore, it only works when we are interested in the difference in the mean statistic (or any statistic derived from the sample sum, like the sample proportion, which is essentially $\frac{\text{count of 1s or 0s}}{n} = \frac{\text{sample sum or } (n - \text{sample sum})}{n}$). This means that if you are interested in the difference in sample statistics like the variance or standard deviation, you cannot use this test.

Implementation in R

# This is not a rigorous example, just a snippet.
# R code for a two independent sample t-test.
# a is a vector containing the data from sample 1,
# b is the vector for sample 2.

# Assuming a and b are two independent samples
# drawn from populations A and B respectively,
# we are interested in whether the mean value
# of A differs from that of B.
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8)
# By default, t.test() performs Welch's t-test,
# which does not assume equal population variances.
t.test(x = a, y = b, paired = FALSE)

Randomization Test

Goal (When to use)

  • Applicable to Randomized Experimental Research. Here, we only consider experiments with two groups: treatment and control.
  • Typically, when your question is (or is equivalent to) “whether some treatment has a certain effect” (the “effect” is measured by the value of the attribute you collected data on), and random assignment is used in the experiment, you can use the randomization test.
  • If you do not have a random sample, this test will not allow you to make any statistical inference about a population, but you can still make a causal inference (which is non-statistical) about some population, given that the population is where, logically, the sample is drawn from.

Key Value: Empirical p-value

  • This value is obtained by computing the proportion of random permutations of the data that provide a result as extreme or more extreme than the one observed, and smaller p-values provide stronger evidence against the null hypothesis. (Zieffler, 123)
  • It represents the probability that, given the null hypothesis of no treatment effect, random assignment alone produces a result as extreme as or more extreme than what we observed.

Conclusion

  • The conclusion will be whether the treatment has the effect of interest, pertaining to some proper population. Again, this inference is not a statistical inference about the population of interest.

Conditions (explained)

  • It requires random assignment because it assumes exchangeability holds. Exchangeability means that, for the data you obtained, every possible permutation is equally likely.
  • The merit of the randomization test is that if the assignment of treatment is truly random (as it should be), then you can build a distribution of your statistic using only your experimental data, and this distribution simulates the sampling distribution under the null hypothesis. This way, you can test your observed statistic against this “null distribution”.
  • Why does this work? If the null hypothesis of no treatment effect is true, then under the assumption of exchangeability, your observed result is no more than one of the possible results generated by the random assignment process itself (i.e. the values in each group are what they appeared to be not because they were affected by the treatment’s presence or absence, but merely because of the random grouping; since there is no treatment effect, you would always have observed these values given that you picked these individuals as your sample). And since the assignment is random, we can group the same observed values into two groups in many other, equally likely ways.
  • We then permute the data and split them into treatment and control groups for each permutation (note that the sizes of the two groups must match those in your experiment). From each split, we compute the statistic of interest; we repeat this process many times (e.g. 10000 times) and form a distribution of the statistic.
  • Notice that “each permutation is equally likely” is equivalent to “each ‘composition’ of the two groups is equally likely”; if your observed data is $((1,2),(3,4))$, a possible distinct “composition” of the two groups is $((1,3),(2,4))$, which is equivalent to $((3,1),(4,2))$, $((1,3),(4,2))$, and $((3,1),(2,4))$. The random assignment process is the same as arranging $n$ distinct objects (your $n$ observed data points) into two groups: one with $k$ objects and the other with $n-k$. The total number of ways to arrange these $n$ objects is $C(n,k)\cdot k!\cdot C(n-k,n-k)\cdot(n-k)! = n!$, where the LHS counts all possible ‘compositions’ of the two groups and the RHS counts the permutations of the data. For small samples, the distinct compositions can be enumerated directly, as sketched below.
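For small samples, instead of drawing random permutations, we can enumerate every distinct composition and obtain the exact null distribution. Below is a minimal sketch; the data vector and group size are made up for illustration:

# Exact randomization distribution by enumerating all C(n, k)
# compositions of the data into a group of size k and its complement.
data_all <- c(1, 2, 3, 4)          # toy data: n = 4 observations
k <- 2                             # treatment group size
idx <- combn(length(data_all), k)  # each column is one composition
diffs <- apply(idx, 2, function(i)
  mean(data_all[i]) - mean(data_all[-i]))
length(diffs)   # choose(4, 2) = 6 distinct compositions
diffs           # the exact null distribution of the mean difference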

Implementation in R

  • Here we simulate the lady tasting tea experiment (R. A. Fisher). A lady claimed that a cup of tea tastes different depending on whether milk is added to the cup first. The null hypothesis is therefore: the lady does not have the ability to identify whether milk or tea was added first. We prepare 8 cups of tea, in 4 of which milk is added first and in the other 4 tea is added first, holding all other things equal. We present the 8 cups to the lady in a random order and tell her there are 4 cups in which milk was added first. The lady tastes the tea and identifies which 4 cups are “milk first”. We record the response as follows (the data are made up for illustration):
tea <- matrix(c(3,1,1,3), 
	nrow = 2, ncol = 2, byrow = TRUE)
dimnames(tea) <- list(c("identified as milk first", "identified as tea first"), 
	c("Milk added first", "Tea added first"))
tea
# each column is a group: treatment or control
> tea
                         Milk added first Tea added first
identified as milk first                3               1
identified as tea first                 1               3
  • First, we examine the random assignment condition: since we assigned the cups to the “milk first” and “tea first” groups randomly while keeping the other non-varying attributes of the cups identical, the condition for the randomization test is satisfied.
  • Notice that the null hypothesis is that in both groups (“Milk added first” and “Tea added first”), the proportion of “identified as milk first” is the same. The alternative hypothesis states that the proportion in the first group is larger (i.e. the lady tends to have the ability to identify which was added first).
# First, we prepare the observed data.
# Let 1 represent identifying as milk first
# and 0 represent identifying as tea first.
# The first 4 values are the "milk added first" cups,
# the last 4 are the "tea added first" cups.
lady_guess <- c(rep(1,3), rep(0,1), rep(1,1), rep(0,3))

# The observed difference in proportions between the two groups.
obs_diff <- mean(lady_guess[1:4]) - mean(lady_guess[5:8])

# Then, we set a seed and repeat the random
# permutation of the data 10000 times.
set.seed(1)
simulate <- rep(NA, 10000)
for(i in seq_along(simulate)){
  perm <- sample(lady_guess)
  milk_first_group <- perm[1:4]
  tea_first_group <- perm[5:8]
  simulate[i] <- mean(milk_first_group) - mean(tea_first_group)
}
# One-sided empirical p-value: the proportion of permutations
# at least as extreme as the observed difference.
empirical_p <- mean(simulate >= obs_diff)
empirical_p
> empirical_p
[1] 0.2393

The p-value is large, which is weak evidence against the null hypothesis, so we fail to reject it; the data do not provide evidence that the lady has the ability to identify which was added first.
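For a 2x2 table like this one, the exact randomization distribution is hypergeometric, which is what Fisher’s exact test computes. As a cross-check of the simulated p-value, we can run it on the tea matrix defined above:

# Exact counterpart of the simulation: under the null, the number of
# milk-first cups the lady labels "milk first" is hypergeometric.
fisher.test(tea, alternative = "greater")
# The one-sided exact p-value is 17/70, about 0.243,
# consistent with the simulated value of roughly 0.24.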

Interlude: Sampling Distribution (of a certain statistic)

Idea

  • This is the foundation of statistical inference. The idea is to draw infinitely many random samples (i.e. all possible samples) of the same size and thus obtain all the possible values of the sample statistic of interest; these values form a distribution.

Key use

  • Using such a distribution (if we can somehow obtain or simulate it), we can examine the center and variation of the statistic of interest.
  • We can also perform hypothesis tests, which essentially compute the probability, given the null hypothesis, that a random sampling process would produce a sample as extreme as or more extreme than the one observed; a small simulation is sketched below.
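To make the idea concrete, the following sketch approximates the sampling distribution of the sample mean by repeated sampling from a toy normal population (the population parameters and sample size here are assumptions for illustration):

# Approximate the sampling distribution of the sample mean by
# drawing many random samples from a known population.
set.seed(42)
n <- 30
sample_means <- replicate(10000, mean(rnorm(n, mean = 5, sd = 2)))
mean(sample_means)  # close to the population mean, 5
sd(sample_means)    # close to sigma / sqrt(n) = 2 / sqrt(30), about 0.365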

Limit

  • The CLT works only for the sample sum or statistics derived from it (as mentioned above for the t-test), so we can only assume a normal distribution when the statistic of interest is related to the sample sum. Other statistics rely on special distributions (for example, for a normal population the scaled sample variance follows a Chi-square distribution), and sometimes we are not familiar with them.
  • This limitation of the CLT motivates us to use bootstrapping to simulate the sampling distribution (rather than describing it with a closed-form mathematical formula).

Parametric Bootstrap Test

Goal (When to use)

  • Applicable when you want to carry out a hypothesis test but do not know which probability model describes the sampling distribution, yet you can make assumptions about the population distribution: its shape, mean, standard deviation, and restrictions on what values can appear in the population (integers, fractions, rounding, within what interval, etc.).
  • Since you are making assumptions about these parameters of the population, the process is called “parametric”.
  • A common research question will be “is population A different from population B (when being measured by some parameter)”. The parameter of interest can be mean, proportion, variance, etc.
  • Here, we take the parametric bootstrap test for group difference as an example.

Key Value: Empirical p-value

  • Similar to the p-value in the randomization test. If it is large, there is weak evidence against the null hypothesis that there is no difference between the two populations.

Conclusion

  • Based on the empirical p-value, you conclude whether to reject the null hypothesis. The inference you draw is statistical in this case since the research must satisfy random sampling.

Condition

  • Random sampling must be satisfied.
  • In a parametric bootstrap test, since the two observed samples need to be two random samples, we can safely assume that each sample is representative of some population.
  • We then parametrically describe a population distribution under the assumption that there is no difference between the two populations from which the samples were drawn (so the two samples can be treated as both drawn from this parametrized population).
  • Then, we draw many pairs of random samples (of the same sizes as our two observed samples) from this “null distribution”, calculate the statistic of interest for each sample in a pair, take the difference, and form a sampling distribution of these differences. (This is called the bootstrap distribution of the test statistic of interest, where the test statistic is the difference in the sample statistic of interest.)
  • Finally, we compare our observed difference against this “null distribution” to find the empirical p-value; see the sketch after this list. Again, random sampling must be satisfied, or we will not be able to make an inference about the two populations.
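As a minimal sketch of these steps, assuming a normal null population, toy samples, and the standard deviation as the statistic of interest (all illustrative choices, not part of the original example):

# Parametric bootstrap test for a difference in standard deviations.
# Null hypothesis: both samples come from one common normal population.
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8)
obs_diff <- sd(a) - sd(b)

# Parametrize the null population from the pooled data.
pooled <- c(a, b)
mu_hat <- mean(pooled)
sd_hat <- sd(pooled)

set.seed(1)
boot_diff <- replicate(10000, {
  x <- rnorm(length(a), mean = mu_hat, sd = sd_hat)
  y <- rnorm(length(b), mean = mu_hat, sd = sd_hat)
  sd(x) - sd(y)
})

# Two-sided empirical p-value.
mean(abs(boot_diff) >= abs(obs_diff))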

Non-Parametric Bootstrap Test

  • This method is used when it is hard to make theoretical assumptions about the population under the null hypothesis.
  • This test can also be used to examine the validity of the parametric assumptions by checking whether there is a large discrepancy between the two sets of results.
  • The only difference between the non-parametric bootstrap test and the parametric one is that in the former, no parametric assumption is made about the population under the null hypothesis. Instead, the observed data are treated as representative of the population under the null hypothesis, so we can sample the observed data with replacement to obtain another sample of the same size and treat it as just another possible sample drawn from the population.
  • This way, we can form the sampling distribution of the test statistic of interest and draw the conclusion (similar to the parametric version); a minimal sketch follows.
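Here is a minimal non-parametric counterpart of the parametric sketch above (same toy samples and statistic assumed): instead of drawing from a parametrized null population, we resample the pooled observed data with replacement.

# Non-parametric bootstrap test: the pooled observed data stand in
# for the null population; resample it with replacement.
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8)
obs_diff <- sd(a) - sd(b)
pooled <- c(a, b)

set.seed(1)
boot_diff <- replicate(10000, {
  x <- sample(pooled, length(a), replace = TRUE)
  y <- sample(pooled, length(b), replace = TRUE)
  sd(x) - sd(y)
})

# Two-sided empirical p-value.
mean(abs(boot_diff) >= abs(obs_diff))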