Propensity Score Matching in R: Learning by Example

weixin_39831242

Article link: https://blog.csdn.net/weixin_39831242/article/details/112616166

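This article introduces the propensity score matching (PSM) algorithm in R. It first learns the method on randomly generated data, then analyzes the difference in grades between students attending church schools and public schools. PSM aims to adjust for confounders so that the samples being compared have similar characteristics, which makes it possible to estimate the effect of an intervention. The article walks through data preparation, matching, covariate balance checks, and treatment effect estimation.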

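Propensity score methods are used to adjust for confounders in a model. Here we first learn the algorithm on randomly generated data, and then apply it to a real analysis of the difference in grades between students attending church schools and public schools.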

Learning the method

According to Wikipedia, propensity score matching (PSM) is a “statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment”. In a broader sense, propensity score analysis assumes that an unbiased comparison between samples can only be made when the subjects of both samples have similar characteristics. Thus, PSM can be used not only as “an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible” (Thavaneswaran 2008), but also for comparing samples in epidemiological studies.
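As a minimal sketch of the idea (not code from the original tutorial): a propensity score is commonly estimated as the fitted probability of receiving treatment given the covariates, for instance from a logistic regression. The data below are simulated with base R, and the names `toy`, `treated`, and `pscore` are hypothetical, as is the assumed relationship between the covariates and treatment.

```r
# Illustrative only: simulated data, hypothetical variable names.
set.seed(42)
n <- 500
toy <- data.frame(
  age = sample(18:80, n, replace = TRUE),
  sex = factor(sample(c("Male", "Female"), n, replace = TRUE))
)

# Assumption for this sketch: older subjects and men are more likely to be treated.
p_true <- plogis(-3 + 0.04 * toy$age + 0.8 * (toy$sex == "Male"))
toy$treated <- rbinom(n, size = 1, prob = p_true)

# Logistic regression of treatment status on the covariates.
ps_model <- glm(treated ~ age + sex, data = toy, family = binomial())

# The propensity score is the fitted probability of treatment.
toy$pscore <- predict(ps_model, type = "response")

# Compare the average score of treated and untreated subjects.
aggregate(pscore ~ treated, data = toy, FUN = mean)
```

Matching then pairs treated and untreated subjects with similar values of `pscore`, so that the compared groups resemble each other on the covariates that went into the model.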

Creating two random data frames

Data frame #1:

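library(wakefield)
set.seed(1234)
# Simulate 250 patients: ages 30 to 78, about 70% male
df.patients <- r_data_frame(n = 250, 
                            age(x = 30:78, 
                                name = 'Age'), 
                            sex(x = c("Male", "Female"), 
                                prob = c(0.70, 0.30), 
                                name = "Sex"))
# Label every row as belonging to the patient sample
df.patients$Sample <- as.factor('Patients')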

Take a look at the descriptive statistics:

summary(df.patients)
##       Age           Sex           Sample   
##  Min.   :30.0   Male  :173   Patients:250  
##  1st Qu.:42.0   Female: 77                 
##  Median :54.0                              
##  Mean   :53.7                              
##  3rd Qu.:66.0                              
##  Max.   :78.0

The mean age of the patient sample is 53.7, and roughly 70% of the patients are male (69.2%).
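For readers without wakefield installed, a roughly comparable data frame can be drawn with base R's `sample()`. This is only a sketch: it assumes that ages are drawn uniformly from the given range with replacement, it will not reproduce the exact values shown above even with the same seed, and the name `df.sketch` is hypothetical.

```r
# Base R stand-in for the simulated patient sample (illustrative only).
set.seed(1234)
n <- 250
df.sketch <- data.frame(
  Age = sample(30:78, n, replace = TRUE),
  Sex = factor(sample(c("Male", "Female"), n, replace = TRUE,
                      prob = c(0.70, 0.30)),
               levels = c("Male", "Female")),
  Sample = factor("Patients")
)

summary(df.sketch)

# Share of each sex in percent, analogous to the 69.2% quoted above.
round(100 * prop.table(table(df.sketch$Sex)), 1)
```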

Data frame #2:

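set.seed(1234)
# Simulate 1000 people from the population: ages 18 to 80, balanced sex ratio
df.population <- r_data_frame(n = 1000, 
                              age(x = 18:80, 
                                  name = 'Age'), 
                              sex(x = c("Male", "Female"), 
                                  prob = c(0.50, 0.50), 
                                  name = "Sex"))
# Label every row as belonging to the population sample
df.population$Sample <- as.factor('Population')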

This data frame is used to simulate the general population.

summary(df.population)
##       Age           Sex             Sample    
##  Min.   :18.0   Male  :485   Population:1000  
##  1st Qu.:34.0   Female:515                    
##  Median :50.0                                 
##  Mean   :49.5                                 
##  3rd Qu.:65.0                                 
##  Max.   :80.0

Merging the data frames

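# Stack the two samples and flag the patients as the "treatment" group
mydata <- rbind(df.patients, df.population)
mydata$Group <- as.logical(mydata$Sample == 'Patients')
# Simulated distress score: men range over 0 to 42, women over 15 to 42
mydata$Distress <- ifelse(mydata$Sex == 'Male', age(nrow(mydata), x = 0:42, name = 'Distress'),
                                                age(nrow(mydata), x = 15:42, name = 'Distress'))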

Comparing age and sex between the two samples reveals differences:

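pacman::p_load(tableone)
# Summarize the covariates by sample
table1 <- CreateTableOne(vars = c('Age', 'Sex', 'Distress'), 
                         data = mydata, 
                         factorVars = 'Sex', 
                         strata = 'Sample')
# Convert the table to a character matrix without printing it
table1 <- print(table1, 
                printToggle = FALSE, 
                noSpaces = TRUE)
knitr::kable(table1[,1:3],  
      align = 'c', 
      caption = 'Comparison of unmatched samples')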

	Patients	Population	p
n	250	1000
Age (mean (sd))	53.71 (13.88)	49.46 (18.33)	0.001
Sex = Female (%)	77 (30.8)	515 (51.5)	<0.001
Distress (mean (sd))	22.86 (11.38)	25
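Besides p-values, the imbalance in such a table is often summarized with the standardized mean difference (SMD). The sketch below is not part of the original tutorial; it simply plugs the Age and Sex figures from the table above into the usual SMD formulas, and the variable names are hypothetical.

```r
# Summary statistics copied from the table above.
mean_pat <- 53.71; sd_pat <- 13.88   # Age, Patients
mean_pop <- 49.46; sd_pop <- 18.33   # Age, Population
p_pat <- 0.308                       # share Female, Patients
p_pop <- 0.515                       # share Female, Population

# SMD for a continuous variable: mean difference over the pooled SD.
smd_age <- (mean_pat - mean_pop) / sqrt((sd_pat^2 + sd_pop^2) / 2)

# SMD for a binary variable: difference in proportions over the pooled SD.
smd_sex <- (p_pat - p_pop) /
  sqrt((p_pat * (1 - p_pat) + p_pop * (1 - p_pop)) / 2)

round(c(Age = smd_age, Sex = smd_sex), 2)
```

This gives roughly 0.26 for Age and -0.43 for Sex. A commonly cited rule of thumb treats absolute values above 0.1 as a sign of imbalance, so both covariates would count as unbalanced before matching.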
