多维IRT模型的EM估计
MIRT (Multidimensional Item Response Theory)多维项目反应理论。与一维项目反应理论的区别只是在于对于潜在变量的
θ
i
\theta_{i}
θi 的建模,一个是unidimensional latent trait
θ
i
\theta_{i}
θi,一个是multidimensional
θ
i
=
c
(
θ
i
1
,
θ
i
2
,
…
,
θ
i
K
)
\boldsymbol{\theta_{i}}=c(\theta_{i1},\theta_{i2},…,\theta_{iK})
θi=c(θi1,θi2,…,θiK),这里
K
K
K 就是MIRT 中所指代的维数。以下是M2PL(Multidimensional Two Parameter Logistic) 模型的似然函数。
-0. Notation 记号如下,
假设有
N
N
N 个被试参加了一个有
J
J
J 个题目的测试,获得对应的二值反应数据
Y
=
(
Y
i
j
)
i
=
1
,
j
=
1
N
,
J
Y=(Y_{ij})_{i=1, j=1}^{N,J}
Y=(Yij)i=1,j=1N,J,其中
Y
i
j
=
1
Y_{ij}=1
Yij=1 表示第
i
i
i 个人答对了第
j
j
j 个题目,否则
Y
i
j
=
0
Y_{ij}=0
Yij=0。假设这个测试中,每个题目
j
j
j 测试了
K
K
K 种能力,对应了
K
K
K 个区分度参数
a
j
′
=
c
(
a
j
1
,
a
j
2
,
…
,
a
j
K
)
\boldsymbol{a_{j}^{'}}=c(a_{j1},a_{j2},…,a_{jK})
aj′=c(aj1,aj2,…,ajK),一个难度参数的等价参数
d
j
d_{j}
dj,相对应的,每个被试
i
i
i 在作答题目的时候就对应了
K
K
K 种能力,也就是
θ
i
=
c
(
θ
i
1
,
θ
i
2
,
…
,
θ
i
K
)
\boldsymbol{\theta_{i}}=c(\theta_{i1},\theta_{i2},…,\theta_{iK})
θi=c(θi1,θi2,…,θiK)。记项目参数为
A
=
(
a
j
′
)
j
=
1
J
A=(\boldsymbol{a_j^{'}})_{j=1}^{J}
A=(aj′)j=1J,
d
=
(
d
j
)
j
=
1
J
\boldsymbol{d}=(d_{j})_{j=1}^{J}
d=(dj)j=1J,
Θ
=
(
θ
i
′
)
i
=
1
N
\varTheta=(\boldsymbol{\theta_{i}^{'}})_{i=1}^{N}
Θ=(θi′)i=1N.
-1. Likelihood function of M2PLM
L
(
Y
;
A
,
d
,
Θ
)
=
∏
i
=
1
N
∏
j
=
1
J
e
x
p
(
Y
i
j
(
a
j
′
θ
i
+
d
j
)
)
1
+
e
x
p
(
a
j
′
θ
i
+
d
j
)
.
L(Y;A,\boldsymbol{d},\varTheta)=\prod_{i=1}^{N}\prod_{j=1}^{J}\frac{exp(Y_{ij}(\boldsymbol{a_j^{'}}\boldsymbol{\theta_{i}}+d_{j}))}{1+exp(\boldsymbol{a_j^{'}}\boldsymbol{\theta_{i}}+d_{j})}.
L(Y;A,d,Θ)=i=1∏Nj=1∏J1+exp(aj′θi+dj)exp(Yij(aj′θi+dj)).
-2. Identification for MIRT
在IRT模型中,对于不同的人,一般假设答题独立,并且能力
θ
\theta
θ 服从相同的或者一元或者多元正态分布。在一维IRT模型中,一般为了模型可识别,只需要限制人的能力
θ
i
\theta_{i}
θi 服从均值和方差都已知的正态分布。一般使用
N
(
0
,
1
)
N(0,1)
N(0,1)。在MIRT模型中也需要限制人的多维度能力
θ
i
\boldsymbol{\theta_{i}}
θi 服从均值向量和方差协方差矩阵都是已知的多元正态分布,
N
(
μ
,
Σ
)
N(\boldsymbol{\mu},\boldsymbol{\Sigma})
N(μ,Σ)。除此之外,对于
∑
k
=
1
K
a
j
k
θ
i
k
+
d
j
\sum_{k=1}^{K}a_{jk}\theta_{ik}+d_{j}
∑k=1Kajkθik+dj,考虑区分度参数
a
j
=
(
a
j
1
,
a
j
2
,
…
a
j
K
)
′
\boldsymbol{a_{j}}=(a_{j1},a_{j2},…a_{jK})^{'}
aj=(aj1,aj2,…ajK)′ 是可以旋转rotation,可以得到相同的量。因此在多维IRT模型中,为了模型的可识别,除了对于能力参数的限制,还要限制项目参数。
常用的限制有两种,Beguin and Glas (2001) 总结的有以下两种,
- Fraser, 1988, 提出的在探索性分析中,设置能力参数均值为0,方差为单位矩阵。项目参数满足限制 α j k = 0 \alpha_{jk}=0 αjk=0,对于 j = 1 , 2 , … , K − 1 j=1,2,…,K-1 j=1,2,…,K−1, k = j + 1 , … , K k=j+1,…,K k=j+1,…,K。也就是对于前K-1个题目的区分度参数,限制为下三角矩阵。
- 将能力参数 θ \theta θ看成是未知的被估计量(estimands)。可识别性的限制是设置前 K K K个题的 d j = 0 d_{j}=0 dj=0, j = 1 , 2 , … , K j=1,2,…,K j=1,2,…,K。而且对于前 K K K个题的区分度参数 α j k \alpha_{jk} αjk, j = 1 , 2 , … , K j=1,2,…,K j=1,2,…,K,如果 j = k j=k j=k,则 α j k = 1 \alpha_{jk}=1 αjk=1,否则 α j k = 0 \alpha_{jk}=0 αjk=0。也就是说,前 K K K个题的区分度参数是单位矩阵,并且是难度参数等价参数也是0。
Beguin and Glas (2001) 也推算出这两种可识别性的方法实际上是等价的,相差一个矩阵乘积的rotation。以上两种方法具体可以参考,Beguin and Glas (2001); Fraser & McDonald (1988); Jorekog (1969); Bolt & Lall (2003); Cai (2010); Jianan Sun, Yunxiao Chen et. al. (2016)。
-3. EM algorithm for M2PL
-
Marginal Likelihood function is as follows,
l o g ∫ Θ L ( Y ; A , d , Θ ) d Θ = ∑ i = 1 N l o g ∫ θ i L ( Y i . ; A , d , θ i ) g ( θ i ) d θ i , log\int_{\varTheta}L(Y;A,\boldsymbol{d},\varTheta)d\varTheta=\sum_{i=1}^{N}log\int_{\boldsymbol{\theta_{i}}}L(Y_{i.};A,\boldsymbol{d},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}}, log∫ΘL(Y;A,d,Θ)dΘ=i=1∑Nlog∫θiL(Yi.;A,d,θi)g(θi)dθi,
其中, g ( θ i ) ∼ N ( 0 , I K × K ) g(\boldsymbol{\theta_{i}})\sim N(\boldsymbol{0},\boldsymbol{I}_{K\times K}) g(θi)∼N(0,IK×K) 是 θ i \boldsymbol{\theta_{i}} θi 的先验函数。 -
Gradient of parameter a j ′ \boldsymbol{a_{j}^{'}} aj′ is as follows,
∂ ∂ a j ′ l o g ∫ Θ L ( Y ; A , d , Θ ) d Θ \frac{\partial}{\partial \boldsymbol{a_{j}^{'}}}log\int_{\varTheta}L(Y;A,\boldsymbol{d},\varTheta)d\varTheta ∂aj′∂log∫ΘL(Y;A,d,Θ)dΘ
= ∑ i = 1 N ∫ θ i ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) ∫ θ i ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) d θ i ∂ ∂ a j ′ l o g ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) d θ i =\sum_{i=1}^{N}\int_{\boldsymbol{\theta_{i}}}\frac{\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})}{\int_{\boldsymbol{\theta_{i}}}\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}}}\frac{\partial}{\partial \boldsymbol{a_{j}^{'}}}log\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}} =i=1∑N∫θi∫θi∏j=1JL(Yij;aj,dj,θi)g(θi)dθi∏j=1JL(Yij;aj,dj,θi)g(θi)∂aj′∂logj=1∏JL(Yij;aj,dj,θi)dθi
因为 θ i \boldsymbol{\theta_{i}} θi 的后验概率 P ( θ i ∣ Y i . , A , d ) = ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) ∫ θ i ∏ j = 1 J L ( Y i j ; a j , d j , θ i ) g ( θ i ) d θ i P(\boldsymbol{\theta_{i}}|Y_{i.},A,\boldsymbol{d})=\frac{\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})}{\int_{\boldsymbol{\theta_{i}}}\prod_{j=1}^{J}L(Y_{ij};\boldsymbol{a_{j}},d_{j},\boldsymbol{\theta_{i}})g(\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}}} P(θi∣Yi.,A,d)=∫θi∏j=1JL(Yij;aj,dj,θi)g(θi)dθi∏j=1JL(Yij;aj,dj,θi)g(θi),并且导数可求。从而上式就变为,
= ∑ i = 1 N ∫ θ i P ( θ i ∣ Y i . , A , d ) ( Y i j θ i − e x p ( a j ′ θ i + d j ) 1 + e x p ( a j ′ θ i + d j ) θ i ) d θ i =\sum_{i=1}^{N}\int_{\boldsymbol{\theta_{i}}}P(\boldsymbol{\theta_{i}}|Y_{i.},A,\boldsymbol{d})(Y_{ij}\boldsymbol{\theta_{i}}-\frac{exp(\boldsymbol{a_{j}^{'}}\boldsymbol{\theta_{i}}+d_{j})}{1+exp(\boldsymbol{a_{j}^{'}}\boldsymbol{\theta_{i}}+d_{j})}\boldsymbol{\theta_{i}})d\boldsymbol{\theta_{i}} =i=1∑N∫θiP(θi∣Yi.,A,d)(Yijθi−1+exp(aj′θi+dj)exp(aj′θi+dj)θi)dθi -
Substitute for integration using nodes X i k 1 , X i k 2 , X i k 3 X_{ik_{1}},X_{ik_{2}},X_{ik_{3}} Xik1,Xik2,Xik3, for k 1 , k 2 , k 3 = 1 , 2 , … , q k_{1}, k_{2},k_{3}=1,2,…,q k1,k2,k3=1,2,…,q这里假设考察的能力 K = 3 K=3 K=3, 则上述积分变为,
= ∑ i = 1 N ∑ k 1 = 1 q ∑ k 2 = 1 q ∑ k 3 = 1 q P ( θ i = c ( X i k 1 , X i k 2 , X i k 3 ) ∣ Y i . , A , d ) ( Y i j ∗ c ( X i k 1 , X i k 2 , X i k 3 ) ′ =\sum_{i=1}^{N}\sum_{k_{1}=1}^{q}\sum_{k_{2}=1}^{q}\sum_{k_{3}=1}^{q}P(\boldsymbol{\theta_{i}}=c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})|Y_{i.},A,\boldsymbol{d})(Y_{ij}*c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})^{'} =i=1∑Nk1=1∑qk2=1∑qk3=1∑qP(θi=c(Xik1,Xik2,Xik3)∣Yi.,A,d)(Yij∗c(Xik1,Xik2,Xik3)′
− e x p ( a j ′ c ( X i k 1 , X i k 2 , X i k 3 ) + d j ) 1 + e x p ( a j ′ c ( X i k 1 , X i k 2 , X i k 3 ) + d j ) ∗ c ( X i k 1 , X i k 2 , X i k 3 ) ′ ) -\frac{exp(\boldsymbol{a_{j}}^{'}c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})+d_{j})}{1+exp(\boldsymbol{a_{j}}^{'}c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})+d_{j})}*c(X_{ik_{1}},X_{ik_{2}},X_{ik_{3}})^{'}) −1+exp(aj′c(Xik1,Xik2,Xik3)+dj)exp(aj′c(Xik1,Xik2,Xik3)+dj)∗c(Xik1,Xik2,Xik3)′)
-4. EM program for simulation data
- Generate simulated data
假设 N = 3000 N=3000 N=3000个人作答了 J = 20 J=20 J=20题,测试了 K = 3 K=3 K=3个能力,项目区分度参数 a j ∼ r u n i f ( K , 0 , 2 ) \boldsymbol{a_{j}}\sim runif(K, 0, 2) aj∼runif(K,0,2), 难度参数等价参数 d j ∼ r u n i f ( 1 , − 1 , 1 ) d_{j}\sim runif(1,-1,1) dj∼runif(1,−1,1), 能力参数 θ i 1 ∼ r n o r m ( 1 , 0 , 1 ) \theta_{i1}\sim rnorm(1,0,1) θi1∼rnorm(1,0,1), θ i 2 ∼ r n o r m ( 1 , 0 , 1 ) \theta_{i2}\sim rnorm(1,0,1) θi2∼rnorm(1,0,1), θ i 3 ∼ r n o r m ( 1 , 0 , 1 ) \theta_{i3}\sim rnorm(1,0,1) θi3∼rnorm(1,0,1)。根据M2PL生成数据,得到的前5个人的反应如下,
生成代码如下,
####generate simulated data####
N <- 3000;J <- 20;K <- 3
theta_1 <- rnorm(N);theta_2 <- rnorm(N);theta_3 <- rnorm(N);theta_true <- cbind(rep(1,N),theta_1,theta_2,theta_3)
A_true <- matrix(0,J,K);A_true[1:10,1] <- runif(10,1,2);A_true[11:20,1] <- A_true[1:10,1]*1.5;A_true[1:10,2] <- runif(10,1.5,2);A_true[11:20,2] <- runif(10,0,0.5);A_true[1:10,3] <- runif(10,0,0.5);A_true[11:20,3] <- runif(10,1.5,2);
d_true <- runif(J,0,1)
D_true <- cbind(d_true,A_true)
P_true <- 1/(1+exp(-theta_true%*%t(D_true)))
response <- matrix(rbinom(N*J,1,P_true),nrow=N,ncol=J);#response
response[1:5,]
- Estimating the parameters in M2PL model
– Using the function “mirt” from R package “mirt” as follows,
###mirt analysis###
library(mirt)
m2pl_mod <- 'F1 = 1-20
F2 = 1-19
F3= 1-18
COV = F1 * F2*F3'
m2pl_fit <- mirt(data = response, model = m2pl_mod,
itemtype = "2PL", method = "EM",
SE = TRUE)
m2pl_params <- coef(m2pl_fit, simplify = TRUE)
bb <- cbind(D_true,m2pl_params$items[,4],m2pl_params$items[,1:3])
colnames(bb)<-c("d_true","a1_true","a2_true","a3_true","d_estimator","a1_estimator","a2_estimator","a3_estimator")
以下是真值和估计的结果,MAE是0.2794661,
d
\boldsymbol{d}
d 的MAE是0.05971843,
a
.
1
\boldsymbol{a}_{.1}
a.1 的MAE是0.5641373,
a
.
2
\boldsymbol{a}_{.2}
a.2 的MAE是0.3748544,
a
.
3
\boldsymbol{a}_{.3}
a.3 的MAE是0.1191544。
– 以下是我用GD算法估计的R code 和结果。MAE是0.1221731, d \boldsymbol{d} d 的MAE是0.05969582, a . 1 \boldsymbol{a}_{.1} a.1 的MAE是0.1148704, a . 2 \boldsymbol{a}_{.2} a.2 的MAE是0.1575343, a . 3 \boldsymbol{a}_{.3} a.3 的MAE是0.1565919。结果大致一样。
###my code###
Q <- matrix(1,J,K);Q[J,2:3] <- 0;Q[(J-1),3] <- 0;
##initial value###
A_initial <- matrix(0,J,K);A_initial[,1] <- runif(J,1,2);A_initial[,2] <- runif(J,1,2);A_initial[,3] <- runif(J,1,2);
A_initial <- A_initial*Q;
d_initial <- rnorm(J,0,1);D_initial <- cbind(d_initial,A_initial);
KK <- 20;theta_min <- -4;theta_max <- 4;mm1 <- seq(theta_min,theta_max,(theta_max-theta_min)/KK);mm <- mm1[-1]
THETA_tuta <- matrix(0,nrow=KK*KK*KK,ncol=3);THETA_tuta[,3] <- rep(mm,KK*KK);
THETA_tuta[,2] <-rep(c(rep(1,KK)%*%t(mm)),KK);THETA_tuta[,1] <-c(rep(1,KK*KK)%*%t(mm))#针对K <- 3的theta分块,获取theta的分块
THETA_tuta <- cbind(rep(1,nrow(THETA_tuta)),THETA_tuta)
theta_tmp <- rowSums(THETA_tuta[,2:4]*THETA_tuta[,2:4])/2
response <- t(response);A_0 <- t(D_initial)
cc <- exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod);
theta_post <- sweep(cc, 2, colSums(cc), "/")
log_lik_0 <- sum(log(colSums(cc)))
disA <- 0.5;m <- A_0
timstart <- Sys.time()
for(j in 1:J){
A_grad <- c(c(theta_post%*%response[j,]-rowSums(theta_post*c(1/(1+exp(-THETA_tuta%*%A_0[,j])))))%*%THETA_tuta)
step <- 1/50
A_tuta <- A_0[,j]+step*A_grad;
aa <- A_0;aa[,j] <- A_tuta;
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
while(log_lik_1<log_lik_0+step*sum(A_grad*A_grad)*0.2){
step <- step*disA;
A_tuta <- A_0[,j]+step*A_grad;
aa <- A_0;aa[,j] <- A_tuta;
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
}
m[,j] <- A_tuta
}
A_0 <- m*t(cbind(rep(1,J),Q))
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod))));
eps <- log_lik_1-log_lik_0
while(eps>0.001){
log_lik_0 <- log_lik_1;
cc <- exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod);
theta_post <- sweep(cc, 2, colSums(cc), "/")
for(j in 1:J){
A_grad <- c(c(theta_post%*%response[j,]-rowSums(theta_post*c(1/(1+exp(-THETA_tuta%*%A_0[,j])))))%*%THETA_tuta)
step <- 1/50
A_tuta <- A_0[,j]+step*A_grad;
aa <- A_0;aa[,j] <- A_tuta;
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
while(log_lik_1<log_lik_0+step*sum(A_grad*A_grad)*0.2){
step <- step*disA;
A_tuta <- A_0[,j]+step*A_grad;
aa <- A_0;aa[,j] <- A_tuta;
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%aa%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%aa)),1,prod))));
}
m[,j] <- A_tuta
}
A_0 <- m*t(cbind(rep(1,J),Q))
log_lik_1 <- sum(log(colSums(exp(THETA_tuta%*%A_0%*%response-theta_tmp)/apply(1+exp((THETA_tuta%*%A_0)),1,prod))));
eps <- log_lik_1-log_lik_0
}
timend <- Sys.time()
tt <- timend-timstart
tt
[1]:Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541-561.
[2]:Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate behavioral research, 23(2), 267-269.
[3]:Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183-202.
[4]:Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27(6), 395-414.
[5]:Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33-57.
[6]:Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent Variable Selection for Multidimensional Item Response Theory Models via L1 Regularization. psychometrika, 81(4), 921-939.