Implementing PSM (Propensity Score Matching) in R (2)

weixin_49320263

Last modified 2023-10-05 22:24:36


Category: Common Methods. Tags: R language

First published 2023-10-05 19:52:47


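library(MatchIt)
help(package = "MatchIt")
data("lalonde", package = "MatchIt")
str(lalonde)
table(lalonde$treat)  # group sizes before matching
# Default: 1:1 nearest-neighbor matching on a logistic-regression propensity score
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde)
m.out1
summary(m.out1)
data <- match.data(m.out1)  # extract the matched data set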

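# Match on Mahalanobis distance instead of a propensity score
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  distance = "mahalanobis",  # distance measure: "glm", "gam", "mahalanobis", etc.
                  replace = TRUE,            # a control may be matched to more than one treated unit
                  exact = ~ married + race)  # exact matching on marital status and race
m.out2
summary(m.out2, un = TRUE)
data <- match.data(m.out2)
# Note: Table() is not part of MatchIt or base R; it is assumed to be a
# baseline-table helper that has been loaded beforehand.
Table(data = data, Factor = "age+educ+race+married",
      Group = "treat", file = NULL)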

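# Set a caliper (maximum allowed matching distance)
m.out3 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  distance = "glm", link = "probit",     # probit propensity score model
                  mahvars = ~ age + educ + re74 + re75,  # match on Mahalanobis distance of these variables within the caliper
                  caliper = 0.1,  # 0.1 standard deviations of the propensity score (std.caliper defaults to TRUE)
                  ratio = 2)      # up to 2 controls per treated unit
m.out3
summary(m.out3, un = TRUE)
data <- match.data(m.out3)
Table(data = data, Factor = "age+educ+race+married",
      Group = "treat", file = NULL)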

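# Full matching ("full" is a matching method, not a distance measure; it requires the optmatch package)
m.out4 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "full",   # matching method
                  estimand = "ATE",
                  caliper = c(.1, age = 2, educ = 1),   # separate calipers for the propensity score, age, and educ
                  std.caliper = c(TRUE, FALSE, FALSE))  # propensity score caliper in SD units; age and educ in raw units
m.out4
summary(m.out4, un = TRUE)
data <- match.data(m.out4)
Table(data = data, Factor = "age+educ+race+married",
      Group = "treat", file = NULL)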

# Propensity score subclassification
s.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "subclass",  # matching method: subclassification
                  distance = "glm",
                  discard = "control",  # drop control units whose propensity scores fall outside the range of the treated units
                  subclass = 20)        # number of subclasses (20 here)
s.out1
summary(s.out1, un = TRUE)
data <- match.data(s.out1)
Table(data = data, Factor = "age+educ+race+married",
      Group = "treat", file = NULL)

FAQ:

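1. Why do the control and treated groups still differ after matching?

Try setting a caliper, that is, a maximum allowed matching distance. Treated units with no control inside the caliper are left unmatched, so the matched sample gets smaller.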
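
To make the trade-off concrete, here is a minimal sketch that runs the same nearest-neighbor match with and without a caliper and compares the matched sample sizes. It assumes the lalonde data shipped with MatchIt; the object names (f, m_none, m_cal) are illustrative only.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Same covariates as in the examples above
f <- treat ~ age + educ + race + nodegree + married + re74 + re75

# 1:1 nearest-neighbor matching on the propensity score, no caliper
m_none <- matchit(f, data = lalonde, method = "nearest", distance = "glm")

# Same match, but pairs must lie within 0.1 SD of the propensity score
m_cal <- matchit(f, data = lalonde, method = "nearest", distance = "glm",
                 caliper = 0.1)

# match.data() keeps only matched units, so the row counts are the matched sample sizes
n_none <- nrow(match.data(m_none))
n_cal  <- nrow(match.data(m_cal))
cat("matched units without caliper:", n_none, "\n")
cat("matched units with caliper:   ", n_cal, "\n")
```

Comparing summary(m_none) and summary(m_cal) then shows whether the units given up bought better balance.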

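2. Which methods are available for computing the distance?

The distance argument accepts glm, gam, gbm, lasso, and ridge, among others, for estimating a propensity score, as well as mahalanobis (used above) for matching directly on the covariates.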
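
As a rough illustration of how the distance settings change the score being matched on, the sketch below fits the propensity score with the default logit link and with the probit link used earlier, then compares the two. It assumes the lalonde data from MatchIt; the object names are illustrative.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

f <- treat ~ age + educ + race + nodegree + married + re74 + re75

# distance = "glm" fits a logistic regression by default
m_logit  <- matchit(f, data = lalonde, distance = "glm")
# link = "probit" switches the same GLM to a probit model
m_probit <- matchit(f, data = lalonde, distance = "glm", link = "probit")

# The estimated propensity scores are stored in the distance component
ps_logit  <- m_logit$distance
ps_probit <- m_probit$distance

print(summary(ps_logit))
print(summary(ps_probit))
print(cor(ps_logit, ps_probit))  # how closely the two models agree
```

Other distance values such as gam, gbm, or lasso plug in the same way but need their supporting packages installed.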

3. Which matching methods are available?

nearest, optimal, full, exact, and subclass, among others.

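4. Covariates are often still imbalanced after matching. How can this be fixed?

(1) Try a different matching method or distance measure. (2) Not every imbalanced covariate has to go into the matching formula; changing which covariates are included can change the resulting balance considerably. (3) Keep tuning the caliper; a value of 0.1 usually gives good results.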
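
One way to compare these options is to fit several specifications and look at the largest absolute standardized mean difference after matching, next to the matched sample size. The sketch below is only an illustration under stated assumptions: max_smd is a hypothetical helper written for this example, not a MatchIt function, and the three specifications are arbitrary choices.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

f <- treat ~ age + educ + race + nodegree + married + re74 + re75

# Hypothetical helper: largest absolute standardized mean difference
# across all rows of the matched-sample balance table
max_smd <- function(m) {
  bal <- summary(m)$sum.matched
  max(abs(bal[, "Std. Mean Diff."]), na.rm = TRUE)
}

# A few candidate specifications (arbitrary, for illustration)
specs <- list(
  ps_nearest    = matchit(f, data = lalonde, method = "nearest", distance = "glm"),
  ps_caliper_01 = matchit(f, data = lalonde, method = "nearest", distance = "glm",
                          caliper = 0.1),
  mahalanobis   = matchit(f, data = lalonde, method = "nearest",
                          distance = "mahalanobis")
)

res <- sapply(specs, max_smd)                           # worst remaining imbalance
n   <- sapply(specs, function(m) nrow(match.data(m)))   # matched sample size
print(data.frame(max_abs_smd = round(res, 3), n_matched = n))
```

A smaller max_abs_smd means better balance, but it has to be weighed against how many units the specification keeps.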