Chapter 6-7

$2^k$ Factorial Design

Such a design provides the smallest number of runs in which $k$ factors can be studied in a complete factorial design.

Assumptions

  • In each replicate of the design, the experimental runs are completely randomized
  • The factors are fixed
  • Residuals (as estimates of the random errors) are normally distributed with mean $=0$ and constant variance $\sigma^2$ (checked with a normal Q-Q plot ("Q-Q" refers to sample quantiles vs. theoretical quantiles), a residuals-vs-fitted-values plot, and a plot of the square root of the absolute standardized residuals vs. fitted values, respectively). The response is also assumed to be normally distributed.
  • Because there are only two levels per factor, assume the response is linear over the range of the chosen levels

Notations & Definitions

  • $\bf k$ denotes the number of (basic) factors.
  • $\bf n$ denotes the number of replicates for each treatment (equivalently, $n$ is how many times the same experimental design is run).
  • $\bf p$ denotes the number of generators.
  • $\bf{m=k+p}$
  • $\bf{A,B,C,\dots}$ (capital letters): the factorial effects of interest
  • $\bf{AB,ABC,\dots}$: interaction effects among factors; we include these columns in the design table merely for calculation and analysis purposes, and the design of the experiment does not explicitly consider them.
  • $\bf{(1)}$: the sum of treatment responses across all $n$ replicates of the treatment where all factors are at the low level
  • $\bf a$: the sum of treatment responses across all $n$ replicates of the treatment where factor $A$ is at the high level and all other factors are at the low level
  • $\bf{ab}$: the sum of treatment responses across all $n$ replicates of the treatment where factors $A$ and $B$ are at the high level and all other factors are at the low level
  • run size $=2^k$: the number of experimental runs in each complete replicate of the design; this is also the number of treatments in the design.
  • $\sigma^2$: the (assumed) constant variance of the response values, which is also the variance of the random errors.

Contrast

  • Also called the total effect (as opposed to the main effect, which is the average of the total effect).
  • Can be obtained from the table of contrast coefficients.
  • A contrast is a linear combination of parameters or statistics whose coefficients add up to zero, allowing comparison of different treatments. In $2^k$ factorial designs, the "statistic" is the sum of all $n$ replicates of the same treatment, and the coefficients are $\pm1$.
  • In general, $\text{contrast}_{AB\dots K}=(a\pm1)(b\pm1)\cdots(k\pm1)$, where we take the negative sign if the factor is included in the factorial effect $AB\dots K$ (otherwise take the positive sign). In the expansion, $1$ is replaced by $(1)$. For example, in an $A,B,C$ design, $\text{contrast}(AB)=(a-1)(b-1)(c+1)$.
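Reading a contrast off the table of signs can be sketched in a few lines. The treatment totals below are made-up numbers for a hypothetical $2^3$ example, not data from the text; the sign rule matches the $(a\pm1)(b\pm1)(c\pm1)$ expansion.

```python
# Hypothetical treatment totals for a 2^3 design with factors A, B, C,
# keyed by the standard labels (1), a, b, ab, c, ac, bc, abc.
totals = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
          "c": 11, "ac": 15, "bc": 10, "abc": 20}

def contrast(effect, totals):
    """Sum the treatment totals, each signed by the product over the
    effect's factors: +1 if that factor is at the high level in the
    treatment (its letter appears in the label), else -1."""
    out = 0
    for label, y in totals.items():
        sign = 1
        for f in effect.lower():
            sign *= 1 if f in label else -1
        out += sign * y
    return out

print(contrast("A", totals))    # matches expanding (a-1)(b+1)(c+1)
print(contrast("AB", totals))   # matches expanding (a-1)(b-1)(c+1)
```

Expanding $(a-1)(b+1)(c+1)$ by hand on the same totals gives the same number, which is a quick way to check the sign convention.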

Total Number of Factorial Effects

  • This is given by $\binom{k}{k}+\binom{k}{k-1}+\dots+\binom{k}{1}=2^k-1$ (from the binomial theorem, subtracting $\binom{k}{0}=1$).

Factorial effects

  • Factorial effect: main effects and interaction effects can all be called factorial effects. Any factorial effect is given by $\theta=\overline{\mu_+}-\overline{\mu_-}$ (for effect estimates: $\hat\theta=\overline{y_+}-\overline{y_-}$), where $\overline{\mu_+}$ is the average of the treatment means for the treatments with the factor at the high level and $\overline{\mu_-}$ is the corresponding average at the low level. This calculation matches how we interpret the effect.
  • Main effect (of $A$; this term applies only to non-interaction effects): the average effect of a factor ($A$) over all conditions of the other factors, interpreted as the change in the mean response caused by moving factor $A$ from the low level to the high level. $\hat\theta_A=\frac{1}{2^{k-1}\cdot n}\cdot\text{contrast}(A)$
  • Interaction effect (of $AB$; this term applies only to interactions): $\hat\theta_{AB}=\frac{1}{2^{k-1}\cdot n}\cdot\text{contrast}(AB)$. This can be interpreted as the average difference between $A$'s effect at $B+$ and $A$'s effect at $B-$.
  • For example, in a complete design with $k=3$ factors $A,B,C$: $\hat\theta_A=\frac{a+ab+ac+abc}{2^{3-1}\cdot n}-\frac{(1)+b+c+bc}{2^{3-1}\cdot n}$
    Notice that the $2^{k-1}$ in the denominator arises because there are $2^k$ treatments in a complete replicate, and for every factor in the design table half of the treatments carry the factor's $+$ sign and the other half the $-$ sign, so taking only one sign cuts the treatments in half. The $n$ in the denominator arises because the notations $a,b,c,\dots$ all represent treatment sums, and we must average each to get the treatment means.
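As a sanity check on the formula for $\hat\theta_A$, here is a small sketch with hypothetical $2^3$ treatment totals and $n=1$ (the numbers are invented for illustration): the estimate is the mean of the high-level totals minus the mean of the low-level ones.

```python
# Hypothetical 2^3 treatment totals (n = 1); labels follow the
# (1), a, b, ab, ... notation from the text.
k, n = 3, 1
totals = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
          "c": 11, "ac": 15, "bc": 10, "abc": 20}

# theta_hat_A = (a + ab + ac + abc)/(2^(k-1) n) - ((1) + b + c + bc)/(2^(k-1) n)
high = sum(y for label, y in totals.items() if "a" in label)
low = sum(y for label, y in totals.items() if "a" not in label)
theta_hat_A = (high - low) / (2 ** (k - 1) * n)
print(theta_hat_A)
```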

Linear Regression Model

  • We can use a linear regression model to obtain factorial effects
  • If you fit a linear model, the estimates of the slopes will be half of the estimates of the corresponding factorial effects, i.e. $\hat\beta=\hat\theta/2$.
  • The estimate of the intercept will be the grand average of the treatment responses, given by $\hat\beta_0=\frac{\sum\text{response}}{2^k\cdot n}$. Notice that the coefficient vector for the intercept contains only $1$'s (no $-1$'s), so it is in fact aliased with any quadratic term, say $x^2$, in the model, which also corresponds to a column of all $1$'s in the design table (i.e. the $I$ column).
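The halving relationship can be verified numerically. This sketch fits a least-squares model on a $2^2$ design with hypothetical responses (invented numbers, $n=1$) using $\pm1$ coding, and compares the slope for $A$ with half the effect estimate.

```python
import numpy as np

# +/-1 coded columns for a 2^2 design; run order (1), a, b, ab.
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([10.0, 14.0, 9.0, 17.0])        # hypothetical responses

# Model matrix: intercept, A, B, AB
X = np.column_stack([np.ones(4), A, B, A * B])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

theta_A = (14 + 17) / 2 - (10 + 9) / 2       # contrast(A) / (2^(k-1) n)
print(beta[0])                               # grand average of responses
print(beta[1], theta_A / 2)                  # slope = effect / 2
```

Because the $\pm1$ columns are orthogonal, each slope here is simply the column's dot product with $y$ divided by the run size, which is exactly half the effect formula.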

Calculating Sum of Squares

  • $SS_{fac}=\frac{1}{2^k\cdot n}\,\text{contrast}(fac)^2$, where "$fac$" can be $A,B,C,\dots$ or an interaction $AB,BC,\dots$
  • The total sum of squares is $SS_T=\sum_{i=1}^{2^k\cdot n}(y_i-\bar y)^2$, where $y_i$ is the $i$th run's response and $\bar y=\frac{\sum\text{response}}{2^k\cdot n}$ is the grand average.
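A quick numerical check of the two formulas, on hypothetical single-replicate $2^3$ totals (invented numbers): for $n=1$ the factorial-effect sums of squares add up to $SS_T$ exactly, since there is no error term.

```python
from itertools import combinations

k, n = 3, 1
totals = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
          "c": 11, "ac": 15, "bc": 10, "abc": 20}

def contrast(effect):
    """Signed sum of treatment totals for the given effect."""
    out = 0
    for label, y in totals.items():
        sign = 1
        for f in effect:
            sign *= 1 if f in label else -1
        out += sign * y
    return out

# SS for each of the 2^k - 1 = 7 factorial effects
effects = ["".join(c) for r in (1, 2, 3) for c in combinations("abc", r)]
ss = {e: contrast(e) ** 2 / (2 ** k * n) for e in effects}

# SS_T as the sum of squared deviations from the grand average
grand = sum(totals.values()) / (2 ** k * n)
ss_total = sum((y - grand) ** 2 for y in totals.values())
print(ss["a"], ss_total)
```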

Orthogonality

  • In any $2^k$ factorial design, due to how the treatments are arranged, the contrasts of all factorial effects (including all main effects and all interaction effects) are orthogonal to each other [i.e. the dot product of the coefficient vectors is $0$ for every pair of contrasts of factorial effects].
  • For example, in a $2^2$ design, the contrasts of the effects of $A,B,AB$ are pairwise orthogonal.
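The orthogonality claim is easy to verify by brute force. This sketch builds the $\pm1$ columns of a full $2^3$ design table and checks every pair of contrast coefficient vectors.

```python
from itertools import product, combinations
from math import prod

# All 8 runs of a 2^3 design: tuples of (A, B, C) levels in {-1, +1}
runs = list(product([-1, 1], repeat=3))

# Coefficient column of each of the 7 effects: elementwise product of
# the involved factors' level columns
effects = ["".join(e) for r in (1, 2, 3) for e in combinations("ABC", r)]
cols = {e: [prod(run["ABC".index(f)] for f in e) for run in runs]
        for e in effects}

for e1, e2 in combinations(effects, 2):
    dot = sum(x * y for x, y in zip(cols[e1], cols[e2]))
    assert dot == 0                          # every pair is orthogonal
print("all", len(effects), "contrast columns are pairwise orthogonal")
```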

Conclusion

  • To draw conclusions and/or give suggestions based on the analysis, we usually use main-effect or interaction plots.

Complete Designs: Two Cases in Total

  • Two cases: $n\ge2$, or $n=1$ (a single replicate)

The $n\ge2$ Case, with or without Blocks

ANOVA Approach
  • In the ANOVA table, there are two sources of variation: within-group variation (the Error) and between-group variation (the $2^k-1$ factorial effects $A,B,C,\dots$, etc.)
  • $df$: $=1$ for each of the factorial effects; $=2^k n-1$ for the total $SS$
  • $MS$: $SS/df$
  • $F$: $MS_{fac}/MS_{error}$
  • Pay attention to the Effect Hierarchy Principle: if you find an interaction effect significant, you must also include the main effects of all factors involved in that interaction in your final model.
Reduced Model
  • Based on the $F$ values, we delete the insignificant factorial effects from the model to obtain a reduced model (i.e. we treat the $SS$ of the insignificant effects as mere error). This model can help us verify the result of the ANOVA approach.
What If We Do Have Blocks?
  • We only discuss the case where the number of blocks equals the number of replicates, i.e. #blocks $=n$, with $n\ge2$, so each block contains a full replicate of $2^k$ runs (i.e. the block size equals the run size).
  • In this case, the only extra entry in the ANOVA table is the $SS$ for the blocking effect, calculated as $\sum_{i=1}^n\frac{B_i^2}{2^k}-\frac{G^2}{2^k\cdot n}$, where $B_i$ is the total of block $i$ and $G$ is the grand total. The degrees of freedom for the blocking effect are $n-1$, and we do not care about the blocking effect's $F$ value (so in any case, we don't include a blocking effect in the linear model).
  • If we "blocked" the experiment but did not actually account for the blocks when analyzing the data, $MS_{error}$ and/or the $MS$ of other factors will increase, potentially causing us to fail to identify significant effects.

The $n=1$ (Single Replicate) Case, with or without Blocks

  • In this case, there are no $df$ for the error term in the ANOVA table: with a single replicate, the $df$ for $SS_T$ is too small, so we cannot obtain an estimate of $\sigma^2$ (we would be dividing by $0$), where $\sigma^2$ is the population variance, i.e. the variance of the random errors.
3 Ways to Obtain Estimates for $\sigma^2$
  • Negligible Effects
    • In this method, we first choose the negligible effects, and then estimate $\tau^2$ by the variance of the estimates of these negligible factorial effects assuming a mean of $0$. Here $\tau^2=(se(\hat\theta))^2=\frac{4\hat\sigma^2}{N}$, where $N=2^k$.
    • The confidence interval is constructed using a $t$ statistic where the degrees of freedom equal the number of negligible effects (see hw4 problem 2; also chapter 6 p. 93 for using the combination formula to calculate $df$).
    • $t=\frac{\hat\theta_j-0}{\tau}$ (the denominator is $\tau$, not $\tau^2$; you need to take the square root!)
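The negligible-effects calculation can be sketched with made-up numbers: a few effect estimates are declared negligible, $\tau^2$ is their mean square about $0$, and each remaining effect is tested with $t=\hat\theta_j/\tau$ (all values below are hypothetical).

```python
from math import sqrt

# Hypothetical effect estimates judged negligible
negligible = [0.4, -0.6, 0.2, -0.3]

# tau^2: variance of the negligible estimates assuming mean 0
tau2 = sum(e ** 2 for e in negligible) / len(negligible)
tau = sqrt(tau2)                 # divide t by tau, NOT tau^2

theta_j = 6.5                    # a hypothetical effect we want to test
t = (theta_j - 0) / tau
print(tau2, t)
# Compare |t| with a t critical value whose df = number of negligible effects.
```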
  • Lenth’s Method
    • The assumption is based on the Sparsity of Effects Principle: most systems are dominated by a few of the main effects and low-order interactions, and most high-order interaction effects out of the $2^k-1$ total effects are not significant. As a result, most of the effect estimates satisfy $\hat\theta\overset{iid}{\sim}N(0,\tau^2)$.
    • The initial estimate: $\tilde\tau_0=1.5\cdot\text{median}(|\hat\theta_j|)$, $j=1,\dots,2^k-1$. This estimate is based on the assumption that all the factorial effects are negligible, i.e. $N(0,\tau^2)$.
    • Final estimate: $\tilde\tau=1.5\cdot\text{median}(|\hat\theta_j|:|\hat\theta_j|<2.5\tilde\tau_0)$, $j=1,\dots,2^k-1$. This is an attempt to remove the few nonzero $\hat\theta_j$'s.
    • Degrees of freedom of $t_0$: when conducting a $t$ test or constructing confidence intervals for effect estimates with Lenth's method, $df=\frac{2^k-1-a}{3}$, where $a$ is the number of factorial effects cut out by the inequality.
    • The median is used because it is robust against potential extreme values of the effect estimates.
    • $t=\frac{\hat\theta_j}{\tilde\tau}$. Assuming a two-tailed test, if $|t|>t_0$, we reject the null hypothesis and conclude that the factorial effect corresponding to $j$ is significant.
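The two-step estimate can be sketched directly from the formulas above, on made-up effect estimates for a $2^3$ design (7 effects; the numbers are hypothetical).

```python
from statistics import median

def lenth_pse(effects):
    """Lenth's estimate: tau0 = 1.5 * median|theta|, then re-take the
    median over the estimates with |theta| < 2.5 * tau0."""
    abs_e = [abs(e) for e in effects]
    tau0 = 1.5 * median(abs_e)
    kept = [a for a in abs_e if a < 2.5 * tau0]   # trim the large ones
    return 1.5 * median(kept)

theta = [6.5, -2.5, 1.5, 1.5, 0.5, -0.5, 0.5]    # hypothetical estimates
tau = lenth_pse(theta)
t_stats = [e / tau for e in theta]
print(tau, t_stats[0])
```

With these numbers the initial median keeps the large 6.5 out of the final median, which is exactly the robustness the trimming step is for.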
  • Normal Probability Plot
    • Factorial effects that fall along the straight line behave like noise and are not significant; effects that fall far from the line are judged significant.
Design Projection & Hidden Replication
  • This can be done after you have found out which factors are significant and which are not. By leaving out the inert factorial effects, you gain more replicates for the significant effects, and you can now estimate the error based on this "hidden replication".
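A minimal sketch of hidden replication, assuming factor $C$ and all of its interactions were judged inert in a single-replicate $2^3$ experiment (the responses below are invented): projecting onto $A$ and $B$ leaves 2 replicates per $(A,B)$ cell, and the within-cell variation becomes the error estimate.

```python
from collections import defaultdict

# Hypothetical single-replicate 2^3 responses
responses = {"(1)": 10, "a": 14, "b": 9, "ab": 17,
             "c": 11, "ac": 15, "bc": 10, "abc": 20}

# Project onto (A, B): drop C from each treatment label
cells = defaultdict(list)
for label, y in responses.items():
    cells[("a" in label, "b" in label)].append(y)

# Pooled within-cell sum of squares = error from hidden replication
ss_error, df_error = 0.0, 0
for ys in cells.values():
    mean = sum(ys) / len(ys)
    ss_error += sum((y - mean) ** 2 for y in ys)
    df_error += len(ys) - 1
print(ss_error, df_error)
```

With these numbers the pooled within-cell $SS$ equals $SS_C+SS_{AC}+SS_{BC}+SS_{ABC}$, i.e. the error is exactly the variation carried by the effects that were dropped.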
What If Blocking?
  • Motivation: the number of experimental runs that can be completed each time (or within each block) is limited, i.e. each time we do the experiment, we cannot complete a full replicate.
  • Suppose we have $k$ factors, and we block the design into $2^p$ blocks, $p<k$, i.e. the block size is smaller than $2^k$.
  • Notice that $2^{\text{number of blocking factors}}=\text{number of blocks}$
Procedure
  • Pay attention to the effect hierarchy principle: you should choose the factorial effect(s) of the highest order(s) to be confounded with the blocking effect. For example, if you only need to split your experimental runs into two blocks, choose the factorial effect of the highest order.
  • Pick the value of $p$ depending on the available resources and construct the design table for the complete $2^k$ factorial design.
  • Choose the columns in the design table whose signs will be used to block the experimental runs. If $p=1$, we just choose one column.
  • Collect data and construct the ANOVA table as if we were analyzing a non-blocked single-replicate design. However, the factorial effects used to block the design are confounded with the blocking effect, i.e. the estimate $\hat\theta_{\text{chosen fac}}$ actually estimates $\beta_{\text{block}}+\theta_{\text{chosen fac}}$. Alternatively, $\text{Block Effect}=\bar y_{\text{block}1}-\bar y_{\text{block}2}$.
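The procedure above can be sketched for the simplest case, $p=1$: split a single replicate of a $2^3$ design into 2 blocks by the sign of the $ABC$ column (the highest-order interaction, per the hierarchy principle), so that $ABC$ is confounded with blocks.

```python
from itertools import product
from math import prod

# All 8 runs of a 2^3 design as (A, B, C) level tuples
runs = list(product([-1, 1], repeat=3))

# The ABC column is the elementwise product A*B*C; its sign assigns blocks
block_plus = [r for r in runs if prod(r) == 1]    # runs with ABC = +1
block_minus = [r for r in runs if prod(r) == -1]  # runs with ABC = -1
print(block_plus)
print(block_minus)
```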
$\ge4$ Blocks ($p\ge2$)
  • Be careful with how many factorial effects are confounded: the number of confounded factorial effects is $2^p-1$, where $p$ is the number of chosen blocking factors.
Checking whether a blocking strategy is correct
  • The way to check is to write out the full blocking strategy and see whether we get any column of all $1$'s. If so, the strategy is not correct. For a strategy to be correct, the chosen blocking factors need to be independent.
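The check can be automated: a blocking strategy is invalid if some product of the chosen effect columns is a column of all $1$'s (the generators are dependent). This sketch illustrates the idea on a $2^4$ design; the generator choices below are hypothetical examples, not from the text.

```python
from itertools import product, combinations
from math import prod

# All 16 runs of a 2^4 design as (A, B, C, D) level tuples
runs = list(product([-1, 1], repeat=4))

def column(effect, factors="ABCD"):
    """+/-1 column of an effect: elementwise product of its factors."""
    return [prod(r[factors.index(f)] for f in effect) for r in runs]

def independent(generators):
    """False if any nonempty subset of the generator columns multiplies
    elementwise to a column of all 1's (i.e. the identity column I)."""
    cols = [column(g) for g in generators]
    for r in range(1, len(cols) + 1):
        for subset in combinations(cols, r):
            if all(prod(vals) == 1 for vals in zip(*subset)):
                return False
    return True

print(independent(["ABC", "ABD"]))      # independent pair of generators
print(independent(["ABC", "AB", "C"]))  # ABC * AB * C = I, so dependent
```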