Causal Forest Theory

因果森林总结:基于树模型的异质因果效应估计

Uplift model with multiple treatments

1. Estimation and Inference of Heterogeneous Treatment Effects using Random Forest

二元干预情形下估计 τ ( x ) = E [ Y 1 − Y 0 ∣ X = x ] \tau(x)=E[Y^1-Y^0|X=x] τ(x)=E[Y1Y0X=x]

1.1 Asymptotic analysis

定理1

  • Under some condition,
    ( τ ^ ( x ) − τ ( x ) ) / Var ⁡ [ τ ^ ( x ) ] ⇒ N ( 0 , 1 ) (\hat{\tau}(x)-\tau(x)) / \sqrt{\operatorname{Var}[\hat{\tau}(x)]} \Rightarrow \mathcal{N}(0,1) (τ^(x)τ(x))/Var[τ^(x)] N(0,1)

  • Var ⁡ [ τ ^ ( x ) ] \operatorname{Var}[\hat{\tau}(x)] Var[τ^(x)]可以用infinitesimal jackknife估计 V ^ I J ( x ) / Var ⁡ [ τ ^ ( x ) ] → 1 \widehat{V}_{I J}(x) / \operatorname{Var}[\hat{\tau}(x)] \rightarrow 1 V IJ(x)/Var[τ^(x)]1
    V ^ I J ( x ) = n − 1 n ( n n − s ) 2 ∑ i = 1 n Cov ⁡ ∗ [ τ ^ b ∗ ( x ) , N i b ∗ ] 2 \widehat{V}_{I J}(x)=\frac{n-1}{n}\left(\frac{n}{n-s}\right)^2 \sum_{i=1}^n \operatorname{Cov}_*\left[\hat{\tau}_b^*(x), N_{i b}^*\right]^2 V IJ(x)=nn1(nsn)2i=1nCov[τ^b(x),Nib]2
    其中,系数项 ( n − 1 ) n / ( n − s ) 2 (n-1)n/(n-s)^2 (n1)n/(ns)2只能对无放回的子抽样做修正

证明过程分为两步:

  • 先证明偏差 E [ μ ^ n ( x ) − μ ( x ) ] E[\hat \mu_n(x)-\mu(x)] E[μ^n(x)μ(x)]的bound

在这里插入图片描述
在这里插入图片描述

  • 再证明 μ ^ n ( x ) − E [ μ ^ n ( x ) ] \hat \mu_n(x)-E[\hat \mu_n(x)] μ^n(x)E[μ^n(x)]近似正态

利用Hajek projection和k-PNN先证明T is ν-incremental
T ∘ = E [ T ] + ∑ i = 1 n ( E [ T ∣ Z i ] − E [ T ] ) \stackrel{\circ}{T}=\mathbb{E}[T]+\sum_{i=1}^n\left(\mathbb{E}\left[T \mid Z_i\right]-\mathbb{E}[T]\right) T=E[T]+i=1n(E[TZi]E[T])

在这里插入图片描述

1.2 Double-Sample Trees

算法

回归树T分裂准则为最小化MSE, μ ^ ( x ) = 1 ∣ { i : X i ∈ L ( x ) } ∣ ∑ { i : X i ∈ L ( x ) } Y i = Y ˉ L \hat{\mu}(x)=\frac{1}{\left|\left\{i: X_i \in L(x)\right\}\right|} \sum_{\left\{i: X_i \in L(x)\right\}} Y_i=\bar Y_L μ^(x)={ i:XiL(x)}1{ i:XiL(x)}Yi=YˉL

∑ i ∈ J ( μ ^ ( X i ) − Y i ) 2 = ∑ i ∈ J Y i 2 − ∑ i ∈ J μ ^ ( X i ) 2 \sum_{i \in \mathcal{J}}\left(\hat{\mu}\left(X_i\right)-Y_i\right)^2=\sum_{i \in \mathcal{J}} Y_i^2-\sum_{i \in \mathcal{J}} \hat{\mu}\left(X_i\right)^2 iJ(μ^(Xi)Yi)2=iJYi2iJμ^(Xi)2

考虑到 ∑ i ∈ J μ ^ ( X i ) = ∑ i ∈ J Y i \sum_{i \in \mathcal{J}} \hat{\mu}\left(X_i\right)=\sum_{i \in \mathcal{J}} Y_i iJμ^(Xi)=iJYi,上式等价于最大化 μ ^ ( X i ) \hat{\mu}(X_i) μ^(Xi)的方差

2. Generalized Random Forests

2.1 Algorithm

1. Forest-based local estimation

目的:给定 ( O i , X i ) (O_i, X_i) (Oi,Xi), 估计 θ ( ⋅ ) \theta(\cdot) θ(),如估计HTE时, O i = ( Y i , W i ) O_i=(Y_i, W_i) Oi=(Yi,Wi)
方法:求解方程 E [ ψ θ ( x ) , v ( x ) ( O i ) ∣ X i = x ] = 0 \mathbb{E}\left[\psi_{\theta(x), v(x)}\left(O_i\right) \mid X_i=x\right]=0 E[ψθ(x),v(x)(Oi)Xi=x]=0,其中, θ ( x ) , v ( x ) \theta(x), v(x) θ(x),v(x)分别是感兴趣的参数和无关参数

  • 权重估计阶段: α i ( x ) \alpha_i(x) αi(x)衡量 x i x_i xi x x x的相似程度,将同一叶子结点中的"“共现频率”"作为其权重
    α b i ( x ) = 1 ( { X i ∈ L b ( x ) } ) ∣ L b ( x ) ∣ , α i ( x ) = 1 B ∑ b = 1 B α b i ( x ) \alpha_{b i}(x)=\frac{\mathbf{1}\left(\left\{X_i \in L_b(x)\right\}\right)}{\left|L_b(x)\right|}, \quad \alpha_i(x)=\frac{1}{B} \sum_{b=1}^B \alpha_{b i}(x) αbi(x)=Lb(x)1({ XiLb(x)}),αi(x)=B1b=1Bαbi(x)
    其中 L b ( x ) L_b(x) Lb(x)为第b棵树 x x x所在叶子结点的所有数据
  • 加权求解
    ( θ ^ ( x ) , v ^ ( x ) ) ∈ argmin ⁡ θ , v { ∥ ∑ i = 1 n α i ( x ) ψ θ , v ( O i ) ∥ 2 } (\hat{\theta}(x), \hat{v}(x)) \in \underset{\theta, v}{\operatorname{argmin}}\left\{\left\|\sum_{i=1}^n \alpha_i(x) \psi_{\theta, v}\left(O_i\right)\right\|_2\right\} (θ^(x),v^(x))θ,vargmin{ i=1nαi(x)ψθ,v(Oi) 2}
    例子:求解 μ ( x ) = E [ Y i ∣ X i = x ] = 0 \mu(x)=\mathbb{E}\left[Y_i \mid X_i=x\right]=0 μ(x)=E[YiXi=x]=0,取 ψ u ( x ) ( Y i ) = Y i − μ ( x ) \psi_{u(x)}\left(Y_i\right)=Y_i-\mu(x) ψu(x)(Yi)=Yiμ(x),则有 ∑ i = 1 n 1 B ∑ α b i ( x ) ( Y i − μ ^ ( x ) ) = 0 \sum_{i=1}^n \frac{1}{B} \sum \alpha_{b i}(x)\left(Y_i-\hat{\mu}(x)\right)=0 i=1nB1αbi(
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值