(Section 10.2)
- $X$: pretreatment covariate
- $D$: binary treatment indicator
- $Y$: observed outcome, with two potential outcomes $Y(1)$ and $Y(0)$
- Samples: $\{X, D, Y(1), Y(0)\}$
Causal effects of interest
[All three of the following are target quantities.]
1. Average Causal Effect (ACE)
$$\tau = E\{Y(1) - Y(0)\}$$
2. Average Causal Effect on the treated units
$$\tau_{T} = E\{Y(1) - Y(0) \mid D=1\}$$
3. Average Causal Effect on the control units
$$\tau_{C} = E\{Y(1) - Y(0) \mid D=0\}$$
- Observable: $E(Y(1) \mid D=1)$ and $E(Y(0) \mid D=0)$
- Unobservable: $E(Y(0) \mid D=1)$ and $E(Y(1) \mid D=0)$; these are counterfactuals
[Because of the counterfactuals, none of the quantities above can be computed directly.]
[The most intuitive idea is to ignore the counterfactuals and use only the two observed group means.]
Prima facie causal effect
(easy to compute, but biased in general)
$$\begin{aligned} \tau_{PF} &= E\{Y \mid D=1\} - E\{Y \mid D=0\} \\ &= E(Y(1) \mid D=1) - E(Y(0) \mid D=0) \end{aligned}$$
Selection bias
$$\tau_{PF} - \tau_{T} = E(Y(0) \mid D=1) - E(Y(0) \mid D=0)$$
$$\tau_{PF} - \tau_{C} = E(Y(1) \mid D=1) - E(Y(1) \mid D=0)$$
They measure the differences in the means of the potential outcomes across the treatment and control groups.
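The selection-bias identity for $\tau_{PF} - \tau_{T}$ can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the data-generating process (a normal covariate, a constant unit-level effect of 1, and a logistic assignment rule) is an assumption chosen purely for illustration. Because both potential outcomes are simulated, the counterfactual mean $E(Y(0) \mid D=1)$ is available here, which it never is in real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (assumed for illustration only).
x = rng.normal(size=n)                  # pretreatment covariate X
y0 = x + rng.normal(size=n)             # potential outcome Y(0)
y1 = y0 + 1.0                           # potential outcome Y(1); every unit's effect is 1
# Units with larger X are more likely to be treated, so D is related to Y(0).
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = np.where(d == 1, y1, y0)            # observed outcome

tau_pf = y[d == 1].mean() - y[d == 0].mean()            # prima facie effect
tau_t = (y1 - y0)[d == 1].mean()                        # effect on the treated
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # E(Y(0)|D=1) - E(Y(0)|D=0)

print(f"tau_PF          = {tau_pf:.3f}")
print(f"tau_T           = {tau_t:.3f}")
print(f"tau_PF - tau_T  = {tau_pf - tau_t:.3f}")
print(f"selection bias  = {selection_bias:.3f}")
```

In this setup the prima facie effect overstates $\tau_T$, and the gap equals the difference in mean $Y(0)$ between the two groups, as the identity states.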
[In a randomized experiment, $D \perp \{Y(1), Y(0)\}$, so $\tau = \tau_{T} = \tau_{C} = \tau_{PF}$. No further adjustment is needed, and the problem becomes much easier.]
Benefit of randomized experiments: the fundamental benefit of randomization is to balance the distributions of the potential outcomes across the treatment and control groups, which is more important than to balance the distributions of the observed covariates.
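A small simulation can illustrate why randomization makes the four quantities coincide. This is a sketch under assumed settings (coin-flip assignment with probability 0.5, and unit-level effects that vary with the covariate); none of these specifics come from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical potential outcomes with heterogeneous unit-level effects (assumed).
x = rng.normal(size=n)
y0 = x + rng.normal(size=n)
y1 = y0 + 1.0 + 0.5 * x
# Coin-flip assignment: D is independent of (Y(1), Y(0)) by construction.
d = rng.binomial(1, 0.5, size=n)
y = np.where(d == 1, y1, y0)

tau = (y1 - y0).mean()                        # average causal effect
tau_t = (y1 - y0)[d == 1].mean()              # effect on the treated
tau_c = (y1 - y0)[d == 0].mean()              # effect on the controls
tau_pf = y[d == 1].mean() - y[d == 0].mean()  # prima facie effect

print(f"tau = {tau:.3f}, tau_T = {tau_t:.3f}, tau_C = {tau_c:.3f}, tau_PF = {tau_pf:.3f}")
```

The four numbers agree up to sampling noise, even though the covariate was never used in the comparison.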
A number of basic assumptions are needed, such as SUTVA.
Ignorability: $Y(d) \perp D \mid X$
Two simple methods, each with limitations:
- Stratification on discrete covariates
- Outcome regression
Outcome regression: run OLS with an additive model of the observed outcome on the treatment indicator and covariates:
$$E(Y \mid D, X) = \beta_0 + \beta_d D + \beta_x X$$
If the model above is correctly specified, then
$$\begin{aligned} \tau(X) &= E(Y \mid D=1, X) - E(Y \mid D=0, X) \\ &= (\beta_0 + \beta_d + \beta_x X) - (\beta_0 + \beta_x X) \\ &= \beta_d \end{aligned}$$
[The treatment effect does not depend on the covariates, i.e., it is homogeneous.]
Therefore, $\tau = E\{\tau(X)\} = \beta_d$.
[This approach is valid only under two strong assumptions: ignorability and the linear model.]
[Drawback: it ignores treatment effect heterogeneity induced by the covariates.]
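The additive outcome regression can be sketched with ordinary least squares in NumPy. The simulated data below are an assumption made for illustration: the outcome is generated from exactly the additive linear model with a homogeneous effect $\beta_d = 2$, and treatment depends on $X$ only, so both assumptions named above hold by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical data satisfying ignorability given X and the linear model (assumed).
x = rng.normal(size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))     # treatment depends on X only
beta_0, beta_d, beta_x = 1.0, 2.0, 3.0
y = beta_0 + beta_d * d + beta_x * x + rng.normal(size=n)

# OLS of Y on (1, D, X); the coefficient on D estimates beta_d.
design = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_d_hat = coef[1]

# Unadjusted difference in means, for comparison.
naive = y[d == 1].mean() - y[d == 0].mean()

print(f"OLS coefficient on D    = {beta_d_hat:.3f}")
print(f"unadjusted mean contrast = {naive:.3f}")
```

Here the coefficient on $D$ lands close to 2 while the unadjusted contrast is far from it. This says nothing about the case where the linear model is wrong, which is the limitation noted above.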
Another approach
One can also use
$$\tau(X) = \hat{f}_1(X) - \hat{f}_2(X)$$
where $\hat{f}_1$ and $\hat{f}_2$ are predictors fitted on the treated and control data, respectively.
After fitting $\hat{f}_1$ and $\hat{f}_2$, plug them in to compute $\tau$.
This is also called the outcome imputation estimator.
[Drawback: it is extremely sensitive to the choice of model.] (It is somewhat similar to the causal estimator used with instrumental variables.)
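A minimal sketch of the outcome imputation idea, assuming that $\hat{f}_1$ and $\hat{f}_2$ are separate linear fits. The notes leave the choice of predictor open, so the helper names `fit_ols` and `predict`, the linear form, and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y ~ a + b * x by least squares and return (a, b)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, x):
    """Evaluate the fitted line at x."""
    return coef[0] + coef[1] * x

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical data: the effect 1 + X varies with the covariate (assumed).
x = rng.normal(size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y0 = x + rng.normal(size=n)
y1 = 1.0 + 2.0 * x + rng.normal(size=n)
y = np.where(d == 1, y1, y0)

coef_treated = fit_ols(x[d == 1], y[d == 1])   # plays the role of f_1-hat
coef_control = fit_ols(x[d == 0], y[d == 0])   # plays the role of f_2-hat

# Impute both predictions for every unit, then average the difference.
tau_x_hat = predict(coef_treated, x) - predict(coef_control, x)
tau_hat = tau_x_hat.mean()

print(f"estimated tau = {tau_hat:.3f}")
```

The estimate is close to the true average effect of 1 here only because the two linear fits match how the data were generated. Swapping in a different model class for either predictor can change the answer, which is the sensitivity described above.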
The biggest problem of the above approach based on outcome regressions is its sensitivity to the specification of the outcome model. Depending on the incentive of empirical research and publications, people sometimes reported their favorable causal effects estimates after searching over a wide set of candidate models, without confessing this searching process. This is a major source of p-hacking in causal inference.