Chapter 6-7

$2^k$ Factorial Design

Such a design provides the smallest number of runs in which $k$ factors can be studied in a complete factorial design.

Assumptions

  • In each replicate of the design, the experimental runs are completely randomized
  • The factors are fixed
  • Residuals (as estimates of the random errors) are normally distributed with mean $=0$ and constant variance $\sigma^2$ (checked with a normal Q-Q plot ("Q-Q" refers to sample quantiles vs. theoretical quantiles), a residuals-vs-fitted-values plot, and a plot of the square root of the absolute standardized residuals vs. fitted values, respectively). The response is also assumed to be normally distributed.
  • Because there are only two levels per factor, assume the response is linear over the range of the chosen levels

Notations & Definitions

  • $\bf k$ denotes the number of (basic) factors.
  • $\bf n$ denotes the number of replicates for each treatment (equivalently, $n$ is how many times the same experimental design is run).
  • $\bf p$ denotes the number of generators.
  • $\bf{m=k+p}$
  • $\bf{A,B,C,\dots}$ (capital letters): the factorial effects of interest
  • $\bf{AB,ABC,\dots}$: interaction effects among factors; we include these columns in the design table merely for calculation and analysis purposes, and the design of the experiment does not explicitly consider them.
  • $\bf{(1)}$: the sum of treatment responses across all $n$ replicates of the treatment where all factors are at the low level
  • $\bf a$: the sum of treatment responses across all $n$ replicates of the treatment where factor $A$ is at the high level and all other factors are at the low level
  • $\bf{ab}$: the sum of treatment responses across all $n$ replicates of the treatment where factors $A$ and $B$ are at the high level and all other factors are at the low level
  • run size $=2^k$: the number of experimental runs in each complete replicate of the design; this is also the number of treatments in the design.
  • $\sigma^2$: the (assumed) constant variance of the response values, which is also the variance of the random errors.

Contrast

  • Also called the total effect (as opposed to the main effect, which is the average of the total effect).
  • Can be obtained from the table of contrast coefficients.
  • A contrast is a linear combination of parameters or statistics whose coefficients add up to zero, allowing comparison of different treatments. In $2^k$ factorial designs, the "statistic" is the sum of all $n$ replicates of the same treatment, and the coefficients are $\pm1$.
  • In general, $\text{contrast}_{AB\dots K}=(a\pm1)(b\pm1)\cdots(k\pm1)$, where we take the negative sign if the factor is included in the factorial effect $AB\dots K$ (otherwise take the positive sign). In the expansion, $1$ is replaced by $(1)$. For example, in an $A,B,C$ design, $\text{contrast}(AB)=(a-1)(b-1)(c+1)$.
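Reading a contrast off the table of signs can be sketched in a few lines. The treatment totals below are made-up numbers for a hypothetical $2^3$ example, not data from the text; the sign rule matches the $(a\pm1)(b\pm1)(c\pm1)$ expansion.

```python
# Hypothetical treatment totals for a 2^3 design with factors A, B, C,
# keyed by the standard labels (1), a, b, ab, c, ac, bc, abc.
totals = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
          "c": 11, "ac": 15, "bc": 10, "abc": 20}

def contrast(effect, totals):
    """Sum the treatment totals, each signed by the product over the
    effect's factors: +1 if that factor is at the high level in the
    treatment (its letter appears in the label), else -1."""
    out = 0
    for label, y in totals.items():
        sign = 1
        for f in effect.lower():
            sign *= 1 if f in label else -1
        out += sign * y
    return out

print(contrast("A", totals))    # matches expanding (a-1)(b+1)(c+1)
print(contrast("AB", totals))   # matches expanding (a-1)(b-1)(c+1)
```

Expanding $(a-1)(b+1)(c+1)$ by hand on the same totals gives the same number, which is a quick way to check the sign convention.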

Total Number of Factorial Effects

  • This is given by $\binom{k}{k}+\binom{k}{k-1}+\dots+\binom{k}{1}=2^k-1$ (from the binomial theorem, subtracting $\binom{k}{0}=1$).

Factorial effects

  • Factorial effect: main effects and interaction effects can all be called factorial effects. Any factorial effect is given by $\theta=\overline{\mu_+}-\overline{\mu_-}$ (for effect estimates: $\hat\theta=\overline{y_+}-\overline{y_-}$), where $\overline{\mu_+}$ is the average of the treatment means for the treatments with the factor at the high level and $\overline{\mu_-}$ is the corresponding average at the low level. This calculation matches how we interpret the effect.
  • Main effect (of $A$; this term applies only to non-interaction effects): the average effect of a factor ($A$) over all conditions of the other factors, interpreted as the change in the mean response caused by moving factor $A$ from the low level to the high level. $\hat\theta_A=\frac{1}{2^{k-1}\cdot n}\cdot\text{contrast}(A)$
  • Interaction effect (of $AB$; this term applies only to interactions): $\hat\theta_{AB}=\frac{1}{2^{k-1}\cdot n}\cdot\text{contrast}(AB)$. This can be interpreted as the average difference between $A$'s effect at $B+$ and $A$'s effect at $B-$.
  • For example, in a complete design with $k=3$ factors $A,B,C$: $\hat\theta_A=\frac{a+ab+ac+abc}{2^{3-1}\cdot n}-\frac{(1)+b+c+bc}{2^{3-1}\cdot n}$
    Notice that the $2^{k-1}$ in the denominator arises because there are $2^k$ treatments in a complete replicate, and for every factor in the design table half of the treatments carry the factor's $+$ sign and the other half the $-$ sign, so taking only one sign cuts the treatments in half. The $n$ in the denominator arises because the notations $a,b,c,\dots$ all represent treatment sums, and we must average each to get the treatment means.
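As a sanity check on the formula for $\hat\theta_A$, here is a small sketch with hypothetical $2^3$ treatment totals and $n=1$ (the numbers are invented for illustration): the estimate is the mean of the high-level totals minus the mean of the low-level ones.

```python
# Hypothetical 2^3 treatment totals (n = 1); labels follow the
# (1), a, b, ab, ... notation from the text.
k, n = 3, 1
totals = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
          "c": 11, "ac": 15, "bc": 10, "abc": 20}

# theta_hat_A = (a + ab + ac + abc)/(2^(k-1) n) - ((1) + b + c + bc)/(2^(k-1) n)
high = sum(y for label, y in totals.items() if "a" in label)
low = sum(y for label, y in totals.items() if "a" not in label)
theta_hat_A = (high - low) / (2 ** (k - 1) * n)
print(theta_hat_A)
```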

Linear Regression Model

  • We can use a linear regression model to obtain factorial effects
  • If you fit a linear model, the estimates of the slopes will be half of the estimates of the corresponding factorial effects, i.e. $\hat\beta=\hat\theta/2$.
  • The estimate of the intercept will be the grand average of the treatment responses, given by $\hat\beta_0=\frac{\sum\text{response}}{2^k\cdot n}$. Notice that the coefficient vector for the intercept contains only $1$'s (no $-1$'s), so it is in fact aliased with any quadratic term, say $x^2$, in the model, which also corresponds to a column of all $1$'s in the design table (i.e. the $I$ column).
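The halving relationship can be verified numerically. This sketch fits a least-squares model on a $2^2$ design with hypothetical responses (invented numbers, $n=1$) using $\pm1$ coding, and compares the slope for $A$ with half the effect estimate.

```python
import numpy as np

# +/-1 coded columns for a 2^2 design; run order (1), a, b, ab.
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([10.0, 14.0, 9.0, 17.0])        # hypothetical responses

# Model matrix: intercept, A, B, AB
X = np.column_stack([np.ones(4), A, B, A * B])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

theta_A = (14 + 17) / 2 - (10 + 9) / 2       # contrast(A) / (2^(k-1) n)
print(beta[0])                               # grand average of responses
print(beta[1], theta_A / 2)                  # slope = effect / 2
```

Because the $\pm1$ columns are orthogonal, each slope here is simply the column's dot product with $y$ divided by the run size, which is exactly half the effect formula.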

Calculating Sum of Squares

  • $SS_{fac}=\frac{1}{2^k\cdot n}\,\text{contrast}(fac)^2$, where "$fac$" can be $A,B,C,\dots$ or an interaction $AB,BC,\dots$
  • The total sum of squares is $SS_T=\sum_{i=1}^{2^k\cdot n}(y_i-\bar y)^2$, where $y_i$ is the $i$th run's response and $\bar y=\frac{\sum\text{response}}{2^k\cdot n}$ is the grand average.
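A quick numerical check of the two formulas, on hypothetical single-replicate $2^3$ totals (invented numbers): for $n=1$ the factorial-effect sums of squares add up to $SS_T$ exactly, since there is no error term.

```python
from itertools import combinations

k, n = 3, 1
totals = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
          "c": 11, "ac": 15, "bc": 10, "abc": 20}

def contrast(effect):
    """Signed sum of treatment totals for the given effect."""
    out = 0
    for label, y in totals.items():
        sign = 1
        for f in effect:
            sign *= 1 if f in label else -1
        out += sign * y
    return out

# SS for each of the 2^k - 1 = 7 factorial effects
effects = ["".join(c) for r in (1, 2, 3) for c in combinations("abc", r)]
ss = {e: contrast(e) ** 2 / (2 ** k * n) for e in effects}

# SS_T as the sum of squared deviations from the grand average
grand = sum(totals.values()) / (2 ** k * n)
ss_total = sum((y - grand) ** 2 for y in totals.values())
print(ss["a"], ss_total)
```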

Orthogonality

  • In any $2^k$ factorial design, due to how the treatments are arranged, the contrasts of all factorial effects (including all main effects and all interaction effects) are orthogonal to each other [i.e. the dot product of the coefficient vectors is $0$ for every pair of contrasts of factorial effects].
  • For example, in a $2^2$ design, the contrasts of the effects of $A,B,AB$ are pairwise orthogonal.
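The orthogonality claim is easy to verify by brute force. This sketch builds the $\pm1$ columns of a full $2^3$ design table and checks every pair of contrast coefficient vectors.

```python
from itertools import product, combinations
from math import prod

# All 8 runs of a 2^3 design: tuples of (A, B, C) levels in {-1, +1}
runs = list(product([-1, 1], repeat=3))

# Coefficient column of each of the 7 effects: elementwise product of
# the involved factors' level columns
effects = ["".join(e) for r in (1, 2, 3) for e in combinations("ABC", r)]
cols = {e: [prod(run["ABC".index(f)] for f in e) for run in runs]
        for e in effects}

for e1, e2 in combinations(effects, 2):
    dot = sum(x * y for x, y in zip(cols[e1], cols[e2]))
    assert dot == 0                          # every pair is orthogonal
print("all", len(effects), "contrast columns are pairwise orthogonal")
```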

Conclusion

  • To draw conclusions and/or give suggestions based on the analysis, we usually use main-effect or interaction plots.

Complete Designs: Two Cases in Total

  • Two cases: $n\ge2$, or $n=1$ (a single replicate)

The $n\ge2$ Case, with or without Blocks

ANOVA Approach
  • In the ANOVA table, there are two sources of variation: within-group variation (the Error) and between-group variation (the $2^k-1$ factorial effects $A,B,C,\dots$, etc.)
  • $df$: $=1$ for each of the factorial effects; $=2^k n-1$ for the total $SS$
  • $MS$: $SS/df$
  • $F$: $MS_{fac}/MS_{error}$
  • Pay attention to the Effect Hierarchy Principle: if you find an interaction effect significant, you must also include the main effects of all factors involved in that interaction in your final model.
Reduced Model
  • Based on the $F$ values, we delete the insignificant factorial effects from the model to obtain a reduced model (i.e. we treat the $SS$ of the insignificant effects as mere error). This model can help us verify the result of the ANOVA approach.
What If We Do Have Blocks?
  • We only discuss the case where the number of blocks equals the number of replicates, i.e. #blocks $=n$, with $n\ge2$, so each block contains a full replicate of $2^k$ runs (i.e. the block size equals the run size).
  • In this case, the only extra entry in the ANOVA table is the $SS$ for the blocking effect, calculated as $\sum_{i=1}^n\frac{B_i^2}{2^k}-\frac{G^2}{2^k\cdot n}$, where $B_i$ is the total of block $i$ and $G$ is the grand total. The degrees of freedom for the blocking effect are $n-1$, and we do not care about the blocking effect's $F$ value (so in any case, we don't include a blocking effect in the linear model).
  • If we "blocked" the experiment but did not actually account for the blocks when analyzing the data, $MS_{error}$ and/or the $MS$ of other factors will increase, potentially causing us to fail to identify significant effects.

The $n=1$ (Single Replicate) Case, with or without Blocks

  • In this case, there are no $df$ for the error term in the ANOVA table: with a single replicate, the $df$ for $SS_T$ is too small, so we cannot obtain an estimate of $\sigma^2$ (we would be dividing by $0$), where $\sigma^2$ is the population variance, i.e. the variance of the random errors.
3 Ways to Obtain Estimates for $\sigma^2$
  • Negligible Effects
    • In this method, we first choose the negligible effects, and then estimate $\tau^2$ by the variance of the estimates of these negligible factorial effects assuming a mean of $0$. Here $\tau^2=(se(\hat\theta))^2=\frac{4\hat\sigma^2}{N}$, where $N=2^k$.
    • The confidence interval is constructed using a $t$ statistic where the degrees of freedom equal the number of negligible effects (see hw4 problem 2; also chapter 6 p. 93 for using the combination formula to calculate $df$).
    • $t=\frac{\hat\theta_j-0}{\tau}$ (the denominator is $\tau$, not $\tau^2$; you need to take the square root!)
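The negligible-effects calculation can be sketched with made-up numbers: a few effect estimates are declared negligible, $\tau^2$ is their mean square about $0$, and each remaining effect is tested with $t=\hat\theta_j/\tau$ (all values below are hypothetical).

```python
from math import sqrt

# Hypothetical effect estimates judged negligible
negligible = [0.4, -0.6, 0.2, -0.3]

# tau^2: variance of the negligible estimates assuming mean 0
tau2 = sum(e ** 2 for e in negligible) / len(negligible)
tau = sqrt(tau2)                 # divide t by tau, NOT tau^2

theta_j = 6.5                    # a hypothetical effect we want to test
t = (theta_j - 0) / tau
print(tau2, t)
# Compare |t| with a t critical value whose df = number of negligible effects.
```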
  • Lenth’s Method
    • The assumption is based on the Sparsity of Effects Principle: most systems are dominated by a few of the main effects and low-order interactions, and most high-order interaction effects out of the $2^k-1$ total effects are not significant. As a result, most of the effect estimates satisfy $\hat\theta\overset{iid}{\sim}N(0,\tau^2)$.
    • The initial estimate: $\tilde\tau_0=1.5\cdot\text{median}(|\hat\theta_j|)$, $j=1,\dots,2^k-1$. This estimate is based on the assumption that all the factorial effects are negligible, i.e. $N(0,\tau^2)$.
    • Final estimate: $\tilde\tau=1.5\cdot\text{median}(|\hat\theta_j|:|\hat\theta_j|<2.5\tilde\tau_0)$, $j=1,\dots,2^k-1$. This is an attempt to remove the few nonzero $\hat\theta_j$'s.
    • Degrees of freedom of $t_0$: when conducting a $t$ test or constructing confidence intervals for effect estimates with Lenth's method, $df=\frac{2^k-1-a}{3}$, where $a$ is the number of factorial effects cut out by the inequality.
    • The median is used because it is robust against potential extreme values of the effect estimates.
    • $t=\frac{\hat\theta_j}{\tilde\tau}$. Assuming a two-tailed test, if $|t|>t_0$, we reject the null hypothesis and conclude that the factorial effect corresponding to $j$ is significant.
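The two-step estimate can be sketched directly from the formulas above, on made-up effect estimates for a $2^3$ design (7 effects; the numbers are hypothetical).

```python
from statistics import median

def lenth_pse(effects):
    """Lenth's estimate: tau0 = 1.5 * median|theta|, then re-take the
    median over the estimates with |theta| < 2.5 * tau0."""
    abs_e = [abs(e) for e in effects]
    tau0 = 1.5 * median(abs_e)
    kept = [a for a in abs_e if a < 2.5 * tau0]   # trim the large ones
    return 1.5 * median(kept)

theta = [6.5, -2.5, 1.5, 1.5, 0.5, -0.5, 0.5]    # hypothetical estimates
tau = lenth_pse(theta)
t_stats = [e / tau for e in theta]
print(tau, t_stats[0])
```

With these numbers the initial median keeps the large 6.5 out of the final median, which is exactly the robustness the trimming step is for.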
  • Normal Probability Plot
    • Factorial effects that fall along the straight line behave like noise and are not significant; effects that fall far from the line are judged significant.
Design Projection & Hidden Replication
  • This can be done after you have found out which factors are significant and which are not. By leaving out the inert factorial effects, you gain more replicates for the significant effects, and you can now estimate the error based on this "hidden replication".
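A minimal sketch of hidden replication, assuming factor $C$ and all of its interactions were judged inert in a single-replicate $2^3$ experiment (the responses below are invented): projecting onto $A$ and $B$ leaves 2 replicates per $(A,B)$ cell, and the within-cell variation becomes the error estimate.

```python
from collections import defaultdict

# Hypothetical single-replicate 2^3 responses
responses = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
             "c": 11, "ac": 15, "bc": 10, "abc": 20}

# Project onto (A, B): drop C from each treatment label
cells = defaultdict(list)
for label, y in responses.items():
    cells[("a" in label, "b" in label)].append(y)

# Pooled within-cell sum of squares = error from hidden replication
ss_error, df_error = 0.0, 0
for ys in cells.values():
    mean = sum(ys) / len(ys)
    ss_error += sum((y - mean) ** 2 for y in ys)
    df_error += len(ys) - 1
print(ss_error, df_error)
```

With these numbers the pooled within-cell $SS$ equals $SS_C+SS_{AC}+SS_{BC}+SS_{ABC}$, i.e. the error is exactly the variation carried by the effects that were dropped.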
What If Blocking?
  • Motivation: the number of experimental runs that can be completed each time (or within each block) is limited, i.e. each time we do the experiment, we cannot complete a full replicate.
  • Suppose we have $k$ factors, and we block the design into $2^p$ blocks, $p<k$, i.e. the block size is smaller than $2^k$.
  • Notice that $2^{\text{number of blocking factors}}=\text{number of blocks}$
Procedure
  • Pay attention to the effect hierarchy principle: you should choose the factorial effect(s) of the highest order(s) to be confounded with the blocking effect. For example, if you only need to split your experimental runs into two blocks, choose the factorial effect of the highest order.
  • Pick the value of $p$ depending on the available resources and construct the design table for the complete $2^k$ factorial design.
  • Choose the columns in the design table whose signs will be used to block the experimental runs. If $p=1$, we just choose one column.
  • Collect data and construct the ANOVA table as if we were analyzing a non-blocked single-replicate design. However, the factorial effects used to block the design are confounded with the blocking effect, i.e. the estimate $\hat\theta_{\text{chosen fac}}$ actually estimates $\beta_{\text{block}}+\theta_{\text{chosen fac}}$. Alternatively, $\text{Block Effect}=\bar y_{\text{block}1}-\bar y_{\text{block}2}$.
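The procedure above can be sketched for the simplest case, $p=1$: split a single replicate of a $2^3$ design into 2 blocks by the sign of the $ABC$ column (the highest-order interaction, per the hierarchy principle), so that $ABC$ is confounded with blocks.

```python
from itertools import product
from math import prod

# All 8 runs of a 2^3 design as (A, B, C) level tuples
runs = list(product([-1, 1], repeat=3))

# The ABC column is the elementwise product A*B*C; its sign assigns blocks
block_plus = [r for r in runs if prod(r) == 1]    # runs with ABC = +1
block_minus = [r for r in runs if prod(r) == -1]  # runs with ABC = -1
print(block_plus)
print(block_minus)
```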
$\ge4$ Blocks ($p\ge2$)
  • Be careful with how many factorial effects are confounded: the number of confounded factorial effects is $2^p-1$, where $p$ is the number of chosen blocking factors.
Checking whether a blocking strategy is correct
  • The way to check is to write out the full blocking strategy and see whether we get any column of all $1$'s. If so, the strategy is not correct. For a strategy to be correct, the chosen blocking factors need to be independent.
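The check can be automated: a blocking strategy is invalid if some product of the chosen effect columns is a column of all $1$'s (the generators are dependent). This sketch illustrates the idea on a $2^4$ design; the generator choices below are hypothetical examples, not from the text.

```python
from itertools import product, combinations
from math import prod

# All 16 runs of a 2^4 design as (A, B, C, D) level tuples
runs = list(product([-1, 1], repeat=4))

def column(effect, factors="ABCD"):
    """+/-1 column of an effect: elementwise product of its factors."""
    return [prod(r[factors.index(f)] for f in effect) for r in runs]

def independent(generators):
    """False if any nonempty subset of the generator columns multiplies
    elementwise to a column of all 1's (i.e. the identity column I)."""
    cols = [column(g) for g in generators]
    for r in range(1, len(cols) + 1):
        for subset in combinations(cols, r):
            if all(prod(vals) == 1 for vals in zip(*subset)):
                return False
    return True

print(independent(["ABC", "ABD"]))      # independent pair of generators
print(independent(["ABC", "AB", "C"]))  # ABC * AB * C = I, so dependent
```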