多维IRT模型的EM估计

多维IRT模型的EM估计

MIRT (Multidimensional Item Response Theory)多维项目反应理论。与一维项目反应理论的区别只是在于对于潜在变量的 θ i \theta_{i} θi 的建模,一个是unidimensional latent trait θ i \theta_{i} θi,一个是multidimensional θ i = c ( θ i 1 , θ i 2 , … , θ i K ) \boldsymbol{\theta_{i}}=c(\theta_{i1},\theta_{i2},…,\theta_{iK}) θi=c(θi1,θi2,,θiK),这里 K K K 就是MIRT 中所指代的维数。以下是M2PL(Multidimensional Two Parameter Logistic) 模型的似然函数。
-0. Notation 记号如下,
假设有 N N N 个被试参加了一个有 J J J 个题目的测试,获得对应的二值反应数据 Y = ( Y i j ) i = 1 , j = 1 N , J Y=(Y_{ij})_{i=1, j=1}^{N,J} Y=(Yij)i=1,j=1N,J,其中 Y i j = 1 Y_{ij}=1 Yij=1 表示第 i i i 个人答对了第 j j j 个题目,否则 Y i j = 0 Y_{ij}=0 Yij=0。假设这个测试中,每个题目 j j j 测试了 K K K 种能力,对应了 K K K 个区分度参数 a j ′ = c ( a j 1 , a j 2 , … , a j K ) \boldsymbol{a_{j}^{'}}=c(a_{j1},a_{j2},…,a_{jK}) aj=c(aj1,aj2,ajK),一个难度参数的等价参数 d j d_{j} dj,相对应的,每个被试 i i i 在作答题目的时候就对应了 K K K 种能力,也就是 θ i = c ( θ i 1 , θ i 2 , … , θ i K ) \boldsymbol{\theta_{i}}=c(\theta_{i1},\theta_{i2},…,\theta_{iK}) θi=c(θi1,θi2,,θiK)。记项目参数为 A = ( a j ′ ) j = 1 J A=(\boldsymbol{a_j^{'}})_{j=1}^{J} A=(aj)j=1J, d = ( d j ) j = 1 J \boldsymbol{d}=(d_{j})_{j=1}^{J} d=(dj)j=1J, Θ = ( θ i ′ ) i = 1 N \varTheta=(\boldsymbol{\theta_{i}^{'}})_{i=1}^{N} Θ=(θi)i=1N.

-1. Likelihood function of M2PLM
L ( Y ; A , d , Θ ) = ∏ i = 1 N ∏ j = 1 J e x p ( Y i j ( a j ′ θ i + d j ) ) 1 + e x p ( a j ′ θ i + d j ) . L(Y;A,\boldsymbol{d},\varTheta)=\prod_{i=1}^{N}\prod_{j=1}^{J}\frac{exp(Y_{ij}(\boldsymbol{a_j^{'}}\boldsymbol{\theta_{i}}+d_{j}))}{1+exp(\boldsymbol{a_j^{'}}\boldsymbol{\theta_{i}}+d_{j})}. L(Y;A,d,Θ)=i=1Nj=1J1+exp(ajθi+dj)exp(Yij(ajθi+dj)).

-2. Identification for MIRT
在IRT模型中,对于不同的人,一般假设答题独立,并且能力 θ \theta θ 服从相同的或者一元或者多元正态分布。在一维IRT模型中,一般为了模型可识别,只需要限制人的能力 θ i \theta_{i} θi 服从均值和方差都已知的正态分布。一般使用 N ( 0 , 1 ) N(0,1) N(0,1)。在MIRT模型中也需要限制人的多维度能力 θ i \boldsymbol{\theta_{i}} θi 服从均值向量和方差协方差矩阵都是已知的多元正态分布, N ( μ , Σ ) N(\boldsymbol{\mu},\boldsymbol{\Sigma}) N(μ,Σ)。除此之外,对于 ∑ k = 1 K a j k θ i k + d j \sum_{k=1}^{K}a_{jk}\theta_{ik}+d_{j} k=1Kajkθik+dj,考虑区分度参数 a j = ( a j 1 , a j 2 , … a j K ) ′ \boldsymbol{a_{j}}=(a_{j1},a_{j2},…a_{jK})^{'} aj=(aj1,aj2,ajK) 是可以旋转rotation,可以得到相同的量。因此在多维IRT模型中,为了模型的可识别,除了对于能力参数的限制,还要限制项目参数。
常用的限制有两种,Beguin and Glas (2001) 总结的有以下两种,

  • Fraser, 1988, 提出的在探索性分析中,设置能力参数均值为0,方差为单位矩阵。项目参数满足限制 α j k = 0 \alpha_{jk}=0 αjk=0,对于 j = 1 , 2 , … , K − 1 j=1,2,…,K-1 j=1,2,,K1 k = j + 1 , … , K k=j+1,…,K k=j+1,,K。也就是对于前K-1个题目的区分度参数,限制为下三角矩阵。
  • 将能力参数 θ \theta θ看成是未知的被估计量(estimands)。可识别性的限制是设置前 K K K个题的 d j = 0 d_{j}=0 dj=0 j = 1 , 2 , … , K j=1,2,…,K j=1,2,,K。而且对于前 K K K个题的区分度参数 α j k \alpha_{jk} αjk j = 1 , 2 , … , K j=1,2,…,K j=1,2,,K,如果 j = k j=k j=k,则 α j k = 1 \alpha_{jk}=1 αjk=1,否则 α j k = 0 \alpha_{jk}=0 αjk=0。也就是说,前 K K K个题的区分度参数是单位矩阵,并且是难度参数等价参数也是0。

Beguin and Glas (2001) 也推算出这两种可识别性的方法实际上是等价的,相差一个矩阵乘积的rotation。以上两种方法具体可以参考,Beguin and Glas (2001); Fraser & McDonald (1988); Jorekog (1969); Bolt & Lall (2003); Cai (2010); Jianan Sun, Yunxiao Chen et. al. (2016)。

-3. EM algorithm for M2PL

  • Marginal Likelihood function is as follows,
    l o g ∫ Θ L ( Y ; A , d , Θ ) d Θ = ∑ i = 1 N l o g ∫ θ i L ( Y i . ; A , d , θ i ) g ( θ i ) d θ i , log\int_{\varTheta}L(Y;A,\boldsymbol{d},\varTheta)d\varTheta=\sum_{i=1}^{N}log\int_{\boldsymbol{\theta_{i}}}L(Y_{i.};A,\boldsymbol{d},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}}, logΘL(Y;A,d,Θ)dΘ=i=1NlogθiL(Yi.;A,d,θi)g(θi)dθi,
    其中, g ( θ i ) ∼ N ( 0 , I K × K ) g(\boldsymbol{\theta_{i}})\sim N(\boldsymbol{0},\boldsymbol{I}_{K\times K}) g(θi)N(0,IK×K) θ i \boldsymbol{\theta_{i}} θi 的先验函数。

  • Gradient of parameter a j ′ \boldsymbol{a_{j}^{'}} aj is as follows,
    ∂ ∂ a j ′ l o g ∫ Θ L ( Y ; A , d , Θ ) d Θ \frac{\partial}{\partial \boldsymbol{a_{j}^{'}}}log\int_{\varTheta}L(Y;A,\boldsymbol{d},\varTheta)d\varTheta ajlogΘL(Y;A,d,Θ)dΘ
    = ∑ i = 1 N ∫ θ i ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) ∫ θ i ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) d θ i ∂ ∂ a j ′ l o g ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) d θ i =\sum_{i=1}^{N}\int_{\boldsymbol{\theta_{i}}}\frac{\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})}{\int_{\boldsymbol{\theta_{i}}}\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}}}\frac{\partial}{\partial \boldsymbol{a_{j}^{'}}}log\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}} =i=1Nθiθij=1JL(Yij;aj,dj,θi)g(θi)dθij=1JL(Yij;aj,dj,θi)g(θi)ajlogj=1JL(Yij;aj,dj,θi)dθi
    因为 θ i \boldsymbol{\theta_{i}} θi 的后验概率 P ( θ i ∣ Y i . , A , d ) = ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) ∫ θ i ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) d θ i P(\boldsymbol{\theta_{i}}|Y_{i.},A,\boldsymbol{d})=\frac{\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})}{\int_{\boldsymbol{\theta_{i}}}\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}}} P(θiYi.,A,d)=θij=1JL(Yij;aj,dj,θi)g(θi)dθij=1JL(Yij;aj,dj,θi)g(θi),并且导数可求。从而上式就变为,
    = ∑ i = 1 N ∫ θ i P ( θ i ∣ Y i . , A , d ) ( Y i j θ i − e x p ( a j ′ θ i + d j ) 1 + e x p ( a j ′ θ i + d j ) θ i ) d θ i =\sum_{i=1}^{N}\int_{\boldsymbol{\theta_{i}}}P(\boldsymbol{\theta_{i}}|Y_{i.},A,\boldsymbol{d})(Y_{ij}\boldsymbol{\theta_{i}}-\frac{exp(\boldsymbol{a_{j}^{'}}\boldsymbol{\theta_{i}}+d_{j})}{1+exp(\boldsymbol{a_{j}^{'}}\boldsymbol{\theta_{i}}+d_{j})}\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}} =i=1NθiP(θiYi.,A,d)(Yijθi1+exp(ajθi+dj)exp(ajθi+dj)θi)dθi

  • Substitute for integration using nodes X i k 1 , X i k 2 , X i k 3 X_{ik_{1}},X_{ik_{2}},X_{ik_{3}} Xik1,Xik2,Xik3, for k 1 , k 2 , k 3 = 1 , 2 , … , q k_{1}, k_{2},k_{3}=1,2,…,q k1,k2,k3=1,2,q这里假设考察的能力 K = 3 K=3 K=3, 则上述积分变为,
    = ∑ i = 1 N ∑ k 1 = 1 q ∑ k 2 = 1 q ∑ k 3 = 1 q P ( θ i = c ( X i k 1 , X i k 2 , X i k 3 ) ∣ Y i . , A , d ) ( Y i j ∗ c ( X i k 1 , X i k 2 , X i k 3 ) ′ =\sum_{i=1}^{N}\sum_{k_{1}=1}^{q}\sum_{k_{2}=1}^{q}\sum_{k_{3}=1}^{q}P(\boldsymbol{\theta_{i}}=c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})|Y_{i.},A,\boldsymbol{d})(Y_{ij}*c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})^{'} =i=1Nk1=1qk2=1qk3=1qP(θi=c(Xik1,Xik2,Xik3)Yi.,A,d)(Yijc(Xik1,Xik2,Xik3)
    − e x p ( a j ′ c ( X i k 1 , X i k 2 , X i k 3 ) + d j ) 1 + e x p ( a j ′ c ( X i k 1 , X i k 2 , X i k 3 ) + d j ) ∗ c ( X i k 1 , X i k 2 , X i k 3 ) ′ ) -\frac{exp(\boldsymbol{a_{j}}^{'}c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})+d_{j})}{1+exp(\boldsymbol{a_{j}}^{'}c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})+d_{j})}*c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})^{'}) 1+exp(ajc(Xik1,Xik2,Xik3)+dj)exp(ajc(Xik1,Xik2,Xik3)+dj)c(Xik1,Xik2,Xik3))

-4. EM program for simulation data

  • Generate simulated data
    假设 N = 3000 N=3000 N=3000个人作答了 J = 20 J=20 J=20题,测试了 K = 3 K=3 K=3个能力,项目区分度参数 a j ∼ r u n i f ( K , 0 , 2 ) \boldsymbol{a_{j}}\sim runif(K, 0, 2) ajrunif(K,0,2), 难度参数等价参数 d j ∼ r u n i f ( 1 , − 1 , 1 ) d_{j}\sim runif(1,-1,1) djrunif(1,1,1), 能力参数 θ i 1 ∼ r n o r m ( 1 , 0 , 1 ) \theta_{i1}\sim rnorm(1,0,1) θi1rnorm(1,0,1), θ i 2 ∼ r n o r m ( 1 , 0 , 1 ) \theta_{i2}\sim rnorm(1,0,1) θi2rnorm(1,0,1), θ i 3 ∼ r n o r m ( 1 , 0 , 1 ) \theta_{i3}\sim rnorm(1,0,1) θi3rnorm(1,0,1)。根据M2PL生成数据,得到的前5个人的反应如下,
    在这里插入图片描述
    生成代码如下,
####generate simulated data####
N <- 3000;J <- 20;K <- 3
theta_1 <- rnorm(N);theta_2 <- rnorm(N);theta_3 <- rnorm(N);theta_true <- cbind(rep(1,N),theta_1,theta_2,theta_3)
A_true <- matrix(0,J,K);A_true[1:10,1] <- runif(10,1,2);A_true[11:20,1] <- A_true[1:10,1]*1.5;A_true[1:10,2] <- runif(10,1.5,2);A_true[11:20,2] <- runif(10,0,0.5);A_true[1:10,3] <- runif(10,0,0.5);A_true[11:20,3] <- runif(10,1.5,2);
d_true <- runif(J,0,1)
D_true <- cbind(d_true,A_true)
P_true <- 1/(1+exp(-theta_true%*%t(D_true)))
response <- matrix(rbinom(N*J,1,P_true),nrow=N,ncol=J);#response
response[1:5,]
  • Estimating the parameters in M2PL model
    – Using the function “mirt” from R package “mirt” as follows,
###mirt analysis###
library(mirt)
m2pl_mod <- 'F1 = 1-20
F2 = 1-19
F3= 1-18
COV = F1 * F2*F3'
m2pl_fit <- mirt(data = response, model = m2pl_mod,
                 itemtype = "2PL", method = "EM",
                 SE = TRUE)
m2pl_params <- coef(m2pl_fit, simplify = TRUE)
bb <- cbind(D_true,m2pl_params$items[,4],m2pl_params$items[,1:3])
colnames(bb)<-c("d_true","a1_true","a2_true","a3_true","d_estimator","a1_estimator","a2_estimator","a3_estimator")

以下是真值和估计的结果,MAE是0.2794661, d \boldsymbol{d} d 的MAE是0.05971843, a . 1 \boldsymbol{a}_{.1} a.1 的MAE是0.5641373, a . 2 \boldsymbol{a}_{.2} a.2 的MAE是0.3748544, a . 3 \boldsymbol{a}_{.3} a.3 的MAE是0.1191544。
在这里插入图片描述

– 以下是我用GD算法估计的R code 和结果。MAE是0.1221731, d \boldsymbol{d} d 的MAE是0.05969582, a . 1 \boldsymbol{a}_{.1} a.1 的MAE是0.1148704, a . 2 \boldsymbol{a}_{.2} a.2 的MAE是0.1575343, a . 3 \boldsymbol{a}_{.3} a.3 的MAE是0.1565919。结果大致一样。

###my code###
Q <- matrix(1,J,K);Q[J,2:3] <- 0;Q[(J-1),3] <- 0;
##initial value###
A_initial <- matrix(0,J,K);A_initial[,1] <- runif(J,1,2);A_initial[,2] <- runif(J,1,2);A_initial[,3] <- runif(J,1,2);
A_initial <- A_initial*Q;
d_initial <- rnorm(J,0,1);D_initial <- cbind(d_initial,A_initial);
KK <- 20;theta_min <- -4;theta_max <- 4;mm1 <- seq(theta_min,theta_max,(theta_max-theta_min)/KK);mm <- mm1[-1]
THETA_tuta <- matrix(0,nrow=KK*KK*KK,ncol=3);THETA_tuta[,3] <- rep(mm,KK*KK);
THETA_tuta[,2] <-rep(c(rep(1,KK)%*%t(mm)),KK);THETA_tuta[,1] <-c(rep(1,KK*KK)%*%t(mm))#针对K <- 3的theta分块,获取theta的分块
THETA_tuta <- cbind(rep(1,nrow(THETA_tuta)),THETA_tuta)

theta_tmp <- rowSums(THETA_tuta[,2:4]*THETA_tuta[,2:4])/2
response <- t(response);A_0 <- t(D_initial)
cc <- exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod);
theta_post <- sweep(cc, 2, colSums(cc), "/") 
log_lik_0 <- sum(log(colSums(cc)))
disA <- 0.5;m <- A_0
timstart <- Sys.time()
for(j in 1:J){
  A_grad <- c(c(theta_post%*%response[j,]-rowSums(theta_post*c(1/(1+exp(-THETA_tuta%*%A_0[,j])))))%*%THETA_tuta)
  step <- 1/50
  A_tuta <- A_0[,j]+step*A_grad;
  aa <- A_0;aa[,j] <- A_tuta;
  log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
  while(log_lik_1<log_lik_0+step*sum(A_grad*A_grad)*0.2){
    step <- step*disA;
    A_tuta <- A_0[,j]+step*A_grad;
    aa <- A_0;aa[,j] <- A_tuta;
    log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
  }
  m[,j] <- A_tuta
}
A_0 <- m*t(cbind(rep(1,J),Q))
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod))));
eps <- log_lik_1-log_lik_0
while(eps>0.001){
  log_lik_0 <- log_lik_1;
  cc <- exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod);
  theta_post <- sweep(cc, 2, colSums(cc), "/") 
  for(j in 1:J){
    A_grad <- c(c(theta_post%*%response[j,]-rowSums(theta_post*c(1/(1+exp(-THETA_tuta%*%A_0[,j])))))%*%THETA_tuta)
    step <- 1/50
    A_tuta <- A_0[,j]+step*A_grad;
    aa <- A_0;aa[,j] <- A_tuta;
    log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
    while(log_lik_1<log_lik_0+step*sum(A_grad*A_grad)*0.2){
      step <- step*disA;
      A_tuta <- A_0[,j]+step*A_grad;
      aa <- A_0;aa[,j] <- A_tuta;
      log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
    }
    m[,j] <- A_tuta
  }
  A_0 <- m*t(cbind(rep(1,J),Q))
  log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod))));
  eps <- log_lik_1-log_lik_0
}
timend <- Sys.time()
tt <- timend-timstart
tt

在这里插入图片描述

[1]:Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541-561.
[2]:Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate behavioral research, 23(2), 267-269.
[3]:Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183-202.
[4]:Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27(6), 395-414.
[5]:Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33-57.
[6]:Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent Variable Selection for Multidimensional Item Response Theory Models via L1 Regularization. psychometrika, 81(4), 921-939.

  • 5
    点赞
  • 29
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值