0. Abstract
Generate samples from a log-normal distribution, write down the likelihood as a function of the parameters, maximize it to obtain the maximum-likelihood estimates, and compare the MSE and Bias across different R optimization functions, sample sizes, and replication counts.
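The reduction the whole report relies on: if $X \sim \mathrm{LogNormal}(\mu,\sigma)$, then $\log X \sim N(\mu,\sigma^2)$, so the log-normal log-likelihood differs from the normal log-likelihood of the log-data only by a term free of the parameters:

```latex
\ell(\mu,\sigma)
  = \sum_{i=1}^{n}\log f_{\mathrm{LN}}(x_i;\mu,\sigma)
  = -n\log\sigma \;-\; \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\bigl(\log x_i-\mu\bigr)^{2}
    \;-\; \sum_{i=1}^{n}\log x_i \;-\; \frac{n}{2}\log 2\pi .
```

Since $-\sum_i \log x_i$ does not involve $(\mu,\sigma)$, maximizing $\ell$ is equivalent to fitting a normal model to $\log x_i$, which is why the code below works with the log-transformed sample throughout.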
1. Generate 100 random samples
set.seed(100)
lnormdata1 <- rlnorm(100, 2, 3)
2. Compute the maximum-likelihood estimates with different methods
Step 1: Set the initial values
theta0 <- c(1, 2)
data0 <- log(lnormdata1)
Step 2: Define the negative log-likelihood function
# Negative log-likelihood of N(params[1], params[2]) for the log-data;
# dnorm(..., log = TRUE) is numerically safer than log(dnorm(...)).
NormHood <- function(data, params){
  like <- sum(dnorm(data, params[1], params[2], log = TRUE))
  return(-like)
}
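As a sanity check for the numerical optimizers, the normal MLE also has a closed form: $\hat{\mu}$ is the sample mean of the log-data and $\hat{\sigma}$ is the root of the $1/n$ sample variance. A sketch (the helper name `closed_form_mle` is my own, not from the original):

```r
set.seed(100)
data0 <- log(rlnorm(100, 2, 3))   # same log-data as in the report

# Closed-form MLE for a normal sample: mu_hat is the mean,
# sigma_hat uses the biased (1/n) variance, which is the MLE.
closed_form_mle <- function(data) {
  mu_hat <- mean(data)
  sigma_hat <- sqrt(mean((data - mu_hat)^2))
  c(mu = mu_hat, sigma = sigma_hat)
}
closed_form_mle(data0)
```

The four optimizers below should all land on (essentially) these values.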
Step 3: MLE via four R optimization functions
fit1 <- constrOptim(theta = theta0, data = data0, f = NormHood, grad = NULL,
                    ui = diag(2), ci = c(0, 1), outer.iterations = 1000)
# constraint ui %*% theta >= ci, i.e. mu >= 0 and sigma >= 1
fit2 <- nlminb(start = theta0, objective = NormHood, data = data0)
fit3 <- optim(par = theta0, fn = NormHood, data = data0)
fit4 <- nlm(NormHood, p = theta0, data = data0)
Step 4: Report the results
library(knitr)  # for kable()
result1 <- data.frame(constrOptim_result = fit1$par, nlminb_result = fit2$par,
                      optim_result = fit3$par, nlm_result = fit4$estimate,
                      row.names = c("μ", "σ"))
kable(result1)
|   | constrOptim_result | nlminb_result | optim_result | nlm_result |
|---|---|---|---|---|
| μ | 2.008688 | 2.008739 | 2.008813 | 2.008739 |
| σ | 3.047036 | 3.046783 | 3.046608 | 3.046780 |
The table shows that constrOptim gives the μ estimate closest to the true value (2), while optim gives the σ estimate closest to the true value (3).
3. Compare Bias and MSE across methods, sample sizes, and replication counts, and draw boxplots
1) Initialize the output structure
snum <- c(500, 1000, 2000)
etimes <- c(800, 1500, 3000)
result2 <- vector(mode = "list", length = length(etimes))
for (i in 1 : length(etimes)){
  result2[[i]] <- array(data = NA, dim = c(etimes[i], 3, 2, 2),
                        dimnames = list(c(1 : etimes[i]),
                                        c("s500", "s1000", "s2000"),
                                        c("constrOptim", "nlminb"),
                                        c("μ", "σ")))
}
names(result2) <- c("t800", "t1500", "t3000")
result2 is the output list. Its three elements, one per replication count (800, 1500, 3000), are four-dimensional arrays: the first dimension indexes the individual replications, and the remaining three index the sample size (500, 1000, 2000), the estimation method (constrOptim, nlminb), and the estimated parameter ($\hat{\mu}$, $\hat{\sigma}$).
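Because every dimension of these arrays carries names, slices can later be extracted by label instead of by position. A small sketch on a stand-in array with the same dimnames (not part of the original code):

```r
# stand-in array shaped like result2$t800, filled with random values
a <- array(rnorm(800 * 3 * 2 * 2), dim = c(800, 3, 2, 2),
           dimnames = list(1:800, c("s500", "s1000", "s2000"),
                           c("constrOptim", "nlminb"), c("μ", "σ")))
# all 800 mu estimates from constrOptim at sample size 500:
mu_slice <- a[, "s500", "constrOptim", "μ"]
length(mu_slice)  # 800
```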
2) Run the estimation loop
for (i in 1 : length(snum)){
  for (j in 1 : length(etimes)){
    for (k in 1 : etimes[j]){
      lnormdata2 <- rlnorm(snum[i], 2, 3)  # one dataset per replication,
      logdata2 <- log(lnormdata2)          # shared by both optimizers
      temp1 <- constrOptim(theta = c(3, 3), data = logdata2, f = NormHood,
                           grad = NULL, ui = diag(2), ci = c(0, 1),
                           outer.iterations = 1000)
      temp2 <- nlminb(start = c(1, 1), objective = NormHood, data = logdata2)
      result2[[j]][k, i, 1, ] <- temp1$par  # constrOptim: mu, sigma
      result2[[j]][k, i, 2, ] <- temp2$par  # nlminb: mu, sigma
    }
  }
}
3) Compute Bias and MSE
Here I define an integ_off function that strips the integer part of a real number. The reason for it: $\hat{\mu}$ and $\hat{\sigma}$ share one dimension of result2, so the matching array of true values (a result2_mean) would be fiddly to build, and result2 - result2_mean is therefore awkward. But since the true values 2 and 3 are integers and the bias is certainly smaller than 0.5 in absolute value, mean - round(mean) recovers the bias directly, so integ_off does the job.
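For reference, the integer-stripping trick can also be bypassed: since the true values $(\mu, \sigma) = (2, 3)$ are known, the bias is just the Monte Carlo mean minus the truth along the parameter dimension. A direct alternative using `sweep` (a sketch; `bias_direct` is my own name, and the report keeps the integ_off route below):

```r
true_vals <- c(2, 3)  # true (mu, sigma)
# mean over replications (dim 1), then subtract the truth along dim 3
# of the reduced (sample size x method x parameter) array
bias_direct <- function(a) sweep(apply(a, c(2, 3, 4), mean), 3, true_vals)

# tiny demo: 5 replications x 1 sample size x 1 method x 2 parameters
demo <- array(rep(c(2.1, 3.2), each = 5), dim = c(5, 1, 1, 2))
bias_direct(demo)  # approximately 0.1 and 0.2
```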
integ_off <- function(x){return(x - round(x))}  # fractional deviation from the nearest integer
BIAS <- list(t800  = integ_off(apply(result2[[1]], c(2, 3, 4), mean)),
             t1500 = integ_off(apply(result2[[2]], c(2, 3, 4), mean)),
             t3000 = integ_off(apply(result2[[3]], c(2, 3, 4), mean)))
MSE  <- list(t800  = BIAS$t800  ^ 2 + apply(result2[[1]], c(2, 3, 4), var),
             t1500 = BIAS$t1500 ^ 2 + apply(result2[[2]], c(2, 3, 4), var),
             t3000 = BIAS$t3000 ^ 2 + apply(result2[[3]], c(2, 3, 4), var))
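The construction above is the standard decomposition of mean squared error, with expectation and variance taken over the Monte Carlo replications:

```latex
\mathrm{MSE}(\hat\theta)
  = \mathbb{E}\bigl[(\hat\theta-\theta)^{2}\bigr]
  = \bigl(\mathbb{E}[\hat\theta]-\theta\bigr)^{2} + \mathrm{Var}(\hat\theta)
  = \mathrm{Bias}(\hat\theta)^{2} + \mathrm{Var}(\hat\theta).
```

One small caveat: `apply(..., var)` uses the $1/(t-1)$ sample variance rather than the $1/t$ plug-in, a negligible difference at 800–3000 replications.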
4) Compare the Bias and MSE across replication counts and sample sizes
kable(BIAS)
kable(MSE)
The two tables above show, from top to bottom, the Bias and MSE of the estimates for 800, 1500, and 3000 replications.
From the tables we can draw the following conclusions:
a. Broadly, the more replications, the smaller the Bias and MSE, with occasional local exceptions.
b. Broadly, the larger the sample size, the smaller the MSE and Bias, with occasional local exceptions.
5) Draw boxplots
a. Comparing the two methods
boxplot(result2[[1]][,1,,1],ylab = "μ", main = "different MLE functions when t800 and s500")
boxplot(result2[[2]][,2,,2], ylab = "σ", main = "different MLE functions when t1500 and s1000")
boxplot(result2[[3]][,3,,1],ylab = "μ", main = "different MLE functions when t3000 and s2000")
The plots show no obvious difference between the two optimization functions in any of these settings, but whether there is truly no difference would require a formal test.
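One way to make that check concrete (a sketch, not part of the original analysis): because each replication feeds the same dataset to both optimizers, the two columns of estimates are paired, so a paired t-test on the μ estimates is a natural follow-up. A self-contained mini version with only 50 replications, redefining the helper so the block runs alone:

```r
set.seed(1)
NormHood <- function(data, params) -sum(dnorm(data, params[1], params[2], log = TRUE))
mu_co <- mu_nl <- numeric(50)
for (k in 1:50) {
  d <- log(rlnorm(500, 2, 3))  # both optimizers see the same data
  mu_co[k] <- constrOptim(theta = c(3, 3), f = NormHood, grad = NULL,
                          ui = diag(2), ci = c(0, 1), data = d)$par[1]
  mu_nl[k] <- nlminb(start = c(1, 1), objective = NormHood, data = d)$par[1]
}
t.test(mu_co, mu_nl, paired = TRUE)  # paired, since the datasets are shared
```

A large p-value here would support the visual impression that the optimizers agree; the differences are numerical-convergence noise rather than statistical error.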
b. Comparing sample sizes
boxplot(result2[[1]][,,1,1], ylab = "μ", main = "different sample size when t800 and constrOptim")
boxplot(result2[[2]][,,2,2], ylab = "σ", main = "different sample size when t1500 and nlminb")
boxplot(result2[[3]][,,1,1], ylab = "μ", main = "different sample size when t3000 and constrOptim")
The plots show that, in every setting, the larger the sample size, the more tightly the estimates concentrate around the true value.
c. Comparing replication counts
boxplot(list(t800 = result2[[1]][,1,1,1], t1500 = result2[[2]][,1,1,1], t3000 = result2[[3]][,1,1,1]), ylab = "μ", main = "different estimate times when s500 and constrOptim")
boxplot(list(t800 = result2[[1]][,2,2,2], t1500 = result2[[2]][,2,2,2], t3000 = result2[[3]][,2,2,2]), ylab = "σ", main = "different estimate times when s1000 and nlminb")
boxplot(list(t800 = result2[[1]][,3,2,1], t1500 = result2[[2]][,3,2,1], t3000 = result2[[3]][,3,2,1]), ylab = "μ", main = "different estimate times when s2000 and nlminb")
The plots show no clear trend in the spread of the estimates as the number of replications grows.
6) Which method is better overall?
Since the tables and plots above do not show me which method is better, I try a global comparison along two dimensions: first, take the absolute values of the entries of BIAS, sum them for each method, and take the difference; second, among the 6 (sample size × parameter) cells, count how many are larger for the first method (constrOptim) than for the second (nlminb).
sapply(BIAS, function(x){
  a <- sum(abs(x[, 1, ]) - abs(x[, 2, ]))  # signed gap: constrOptim minus nlminb
  b <- sum(abs(x[, 1, ]) > abs(x[, 2, ]))  # count of the 6 cells where constrOptim is larger
  return(c(a, b))
})
t800 t1500 t3000
[1,] -9.650769e-05 -0.0001977652 -0.0002651535
[2,] 2.000000e+00 1.0000000000 0.0000000000
sapply(MSE, function(x){
  a <- sum(abs(x[, 1, ]) - abs(x[, 2, ]))
  b <- sum(abs(x[, 1, ]) > abs(x[, 2, ]))
  return(c(a, b))
})
t800 t1500 t3000
[1,] 1.010099e-06 -3.027061e-08 2.872811e-06
[2,] 4.000000e+00 3.000000e+00 6.000000e+00
From these results, constrOptim is smaller than nlminb in overall Bias but larger in overall MSE. These two criteria contradict each other, and together with the earlier tables and plots I conclude that, in this scenario, I cannot rank one method above the other.