2020-10-22 Paper Reading: EIN

Paper reading - Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices

1. Summary

1) For what problem?

Many current DRAM chips use on-die ECC, and the test data that researchers can observe has already been corrected by ECC. This hides the distribution of the errors that originally occurred.

Unfortunately, recent DRAM technology scaling issues are forcing manufacturers to adopt on-die error-correction codes (ECC), which pose a significant challenge for DRAM error characterization studies by obfuscating raw error distributions using undocumented, proprietary, and opaque error-correction hardware. As we show in this work, errors observed in devices with on-die ECC no longer follow expected, well-studied distributions (e.g., lognormal retention times) but rather depend on the particular ECC scheme used.

2) Key idea?

Our approach is based on the key idea that even though ECC obfuscates the exact locations of the pre-correction errors, we can leverage known statistical properties of pre-correction error distributions (e.g., uniform-randomness [5, 57, 98, 112]) in order to disambiguate the effects of different ECC schemes (Section 4).

3) Mechanism?

EIN uses maximum a posteriori (MAP) estimation over statistical models that we develop to represent ECC operation to: i) reverse-engineer the ECC scheme and ii) infer the pre-correction error rates given only the post-correction errors. We design and publicly release EINSim, a flexible open-source simulator that can apply EIN to a wide variety of DRAM devices and standards.

Given all the known Cj', the probability of any possible w' can be computed.

Because there are too many possible w' to enumerate (2^64), they are grouped into a small number of classes Wn', n ∈ (0, N).

All environmental factors (e.g., chip structure) are folded into θ. (In effect, is (Fi, θ) equivalent to Fi'?)

Now the task is: for an unknown ECC scheme (F unknown), infer the most likely F from the observations O.

By Bayes' rule:

The denominator P[O] is dropped because it does not affect where the maximum occurs.
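The Bayes step can be written out as follows. This is a sketch in the symbols used in these notes (Fi for a candidate ECC scheme, O for the observations); the exact form and notation in the paper may differ.

```latex
F_{\text{unknown}}
  = \arg\max_{F_i} P[F_i \mid O]
  = \arg\max_{F_i} \frac{P[O \mid F_i]\, P[F_i]}{P[O]}
  = \arg\max_{F_i} P[O \mid F_i]\, P[F_i]
```

The last equality holds because P[O] is the same for every candidate Fi, so it rescales all candidates equally.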

With j running from 0 to jmax as time increases, we can observe each n-j for the first j bursts, and the actual sequence must match N. (For example, if the total number of error bits over time is [0, 0, 0, 1, 2, 2, 2, ...], then new error bits appeared at times 4 and 5.)
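The cumulative-count example can be checked with a few lines of Python. This is only an illustration of the bookkeeping; the function name `new_error_times` is made up for this sketch, and time steps are assumed to be counted from 1, as in the note.

```python
def new_error_times(cumulative):
    """Return the 1-indexed time steps at which the cumulative count increases."""
    times = []
    previous = 0
    for step, total in enumerate(cumulative, start=1):
        if total > previous:
            # The running total grew, so at least one new error bit appeared here.
            times.append(step)
        previous = total
    return times


print(new_error_times([0, 0, 0, 1, 2, 2, 2]))  # [4, 5]
```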

Because θ is unknown, we first assume that, for each Fi, the experiment was carried out under the conditions that maximize P[O|Fi].

This yields F unknown. Once F has been found, the θ of the experimental environment can be derived as well.
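The MAP procedure described above can be sketched on a toy problem. This is not EINSim and not the paper's model: it assumes 4-bit datawords, three hypothetical candidate schemes (no ECC, 3x repetition, Hamming(7,4)), a uniform prior, and independent symmetric bit flips with raw error rate θ. The observation O is a histogram of how many data bits are wrong per word after correction.

```python
from itertools import product
from math import comb, log

K = 4                      # data bits per word (toy size, an assumption)
DATA_POS = (3, 5, 6, 7)    # 1-indexed data positions in Hamming(7,4)


def pmf_no_ecc(theta):
    # Without ECC, the post-correction error count is Binomial(K, theta).
    return [comb(K, n) * theta**n * (1 - theta)**(K - n) for n in range(K + 1)]


def pmf_rep3(theta):
    # 3x repetition with majority vote: a data bit fails if >= 2 of 3 copies flip.
    p = 3 * theta**2 * (1 - theta) + theta**3
    return pmf_no_ecc(p)


def pmf_hamming74(theta):
    # Exact enumeration of all 2^7 raw error patterns through a syndrome decoder.
    pmf = [0.0] * (K + 1)
    for bits in product((0, 1), repeat=7):
        err = {i + 1 for i, b in enumerate(bits) if b}
        syndrome = 0
        for pos in err:
            syndrome ^= pos
        if syndrome:
            err ^= {syndrome}          # decoder flips the bit the syndrome names
        n = sum(1 for pos in DATA_POS if pos in err)
        w = sum(bits)                  # weight of the raw (pre-correction) pattern
        pmf[n] += theta**w * (1 - theta)**(7 - w)
    return pmf


SCHEMES = {"none": pmf_no_ecc, "rep3": pmf_rep3, "hamming74": pmf_hamming74}
THETAS = [i / 100 for i in range(1, 21)]   # candidate raw bit error rates


def log_likelihood(counts, pmf):
    # Multinomial log-likelihood up to a constant shared by all candidates.
    return sum(c * log(p) for c, p in zip(counts, pmf) if c > 0)


def map_estimate(counts):
    prior = {name: 1 / len(SCHEMES) for name in SCHEMES}
    best = None
    for name, pmf_fn in SCHEMES.items():
        # For each Fi, use the theta that maximizes P[O | Fi, theta].
        ll, theta = max((log_likelihood(counts, pmf_fn(t)), t) for t in THETAS)
        score = ll + log(prior[name])
        if best is None or score > best[0]:
            best = (score, name, theta)
    return best[1], best[2]


# Synthetic observation: expected counts for 10000 words under Hamming(7,4).
observed = [round(10000 * p) for p in pmf_hamming74(0.05)]
scheme, theta = map_estimate(observed)
print(scheme, theta)
```

The candidates are distinguishable here because a single-error-correcting code miscorrects double-bit errors, which produces a different shape of post-correction error counts than any plain binomial. This matches the note's point that the observed errors depend on the ECC scheme.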

4) Result?

we show that EIN enables: i) reverse-engineering the on-die ECC scheme, which we find to be a single-error correction Hamming code with (n = 136, k = 128, d = 3), ii) inferring pre-correction error rates given only post-correction errors, and iii) recovering the well-studied pre-correction error distributions that on-die ECC obfuscates.
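The (n = 136, k = 128) parameters are consistent with the usual sizing rule for a single-error-correcting Hamming code: r parity bits suffice when 2^r >= k + r + 1. A quick check, with `min_parity_bits` being a helper name invented for this sketch:

```python
def min_parity_bits(k):
    """Smallest r such that 2**r >= k + r + 1 (single-error-correction bound)."""
    r = 1
    while 2**r < k + r + 1:
        r += 1
    return r


k = 128
r = min_parity_bits(k)
print(r, k + r)  # 8 136
```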

2. Strengths

Can recover the original ECC algorithm and the pre-ECC error rates; the pre-ECC error pattern is essential for developing, testing, and evaluating new ECC techniques. (This really works, as the retention-time example shows.)

Also works for retention times at different temperatures.

 

3. Weaknesses

Cannot determine with full confidence exactly which ECC algorithm is used. (But with some prior knowledge, the user can get a basic idea of the algorithm.)

Cannot determine exactly where the errors occur. (It is unlikely that such a simulator could really be implemented...)

 

4. Takeaway

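According to previous studies, at a fixed temperature ECC shows no regular pattern over time -> in practice this indicates that error bits occur following a uniform distribution.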
