Source: http://www.cnmaizi.com/tech/driver/nand-flash-bad-block-handling-understanding/#comment-342
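After several days studying NAND datasheets, I found that controlling NAND Flash properly is quite involved. The first problem is handling bad blocks (Invalid Blocks). Initially (when the chip leaves the factory), bad blocks are marked at column 2048 of Page 0 or Page 1 of each block (on a 2 KB-page device, column 2048 is the first byte of the spare area). If the byte at that address is 0xFF in both pages, the block was good when it shipped (note: only at shipping time; the number of bad blocks grows as the Flash is used); otherwise it is a bad block. Bad blocks must not be identified after the chip has been erased, because the erase can wipe the bad block marks (the datasheet is very explicit about this). Once the bad block information for the whole Flash has been recorded, a bad block mapping table has to be built.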
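A minimal C sketch of the factory check, assuming a 2 KB-page device whose Page 0 and Page 1 have already been read raw (2048 data bytes plus a 64-byte spare area) into memory. The macro names and `is_factory_bad_block` are hypothetical; a real driver reads these bytes through its NAND controller, once, before anything is erased.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_DATA_SIZE  2048u                             /* assumed main area per page  */
#define PAGE_SPARE_SIZE 64u                               /* assumed spare area per page */
#define RAW_PAGE_SIZE   (PAGE_DATA_SIZE + PAGE_SPARE_SIZE)
#define BBM_OFFSET      PAGE_DATA_SIZE                    /* column 2048: first spare byte */

/* Returns true if the factory bad-block mark is present, i.e. the byte at
 * column 2048 of Page 0 or Page 1 is not 0xFF. Only meaningful on the pages
 * as shipped, never after the block has been erased. */
bool is_factory_bad_block(const uint8_t page0[RAW_PAGE_SIZE],
                          const uint8_t page1[RAW_PAGE_SIZE])
{
    assert(page0 != NULL && page1 != NULL);
    return page0[BBM_OFFSET] != 0xFF || page1[BBM_OFFSET] != 0xFF;
}
```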
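The approach I collected works as follows. The Flash is divided into three areas: the first holds usage information (the bad block table, the number of blocks in use, and so on; 10 blocks are reserved for it), the second is the data area, and the third is the bad block replacement area (used only to replace bad blocks; it is reserved and cannot be written to directly; 128 blocks).

At startup, read Page 0 of Block 0 in one go. If bytes 0 and 1 of Page 0 are not 'O' and 'K' (a self-defined "in use" header flag), this is the first use, so run the initial bad block scan: record the number of bad blocks and their addresses, assign each one an address in the replacement area (in order), and save the bad blocks and their replacement blocks in two arrays. When the scan finishes you have the number of bad blocks, the bad block mapping, and the last replacement block used (which decides where the next replacement goes). Format this information and write it to the usage information area (Blocks 0-9). If programming Block 0 fails, erase it and move on to the next information block, until the write succeeds or all of Blocks 0-9 have failed (in which case the whole Flash is marked invalid; the chance of all 10 consecutive blocks failing within the device's lifetime is so small that it can basically be ignored). At this point the bad block information, replacement information and usage status have all been recorded.

After the initial information has been written, on each subsequent boot the system checks Blocks 0-9 for the header flag and, once it finds it, loads all bad block and usage information into RAM. During normal operation, the number of the block about to be written is looked up in the bad block array; if it is there, the data goes to its replacement block instead. If that replacement block also turns out to be bad, read the last-replacement-block value, write it into the replacement table, and increment the last replacement block, repeating until a usable replacement block is found. Then copy all valid data from the old block into the new replacement block, continue writing the new data, and finally update the bad block table and bad block information and write them back to the usage information block. This completes the bad block replacement.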
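The table handling can be sketched as follows. This is a simplified illustration, not the original author's code: it assumes a hypothetical 1024-block device whose last 128 blocks form the replacement area, keeps the table in RAM, and leaves out the NAND reads and writes (loading the table from, and saving it to, Blocks 0-9). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOTAL_BLOCKS   1024u  /* assumed device size (illustration only)   */
#define INFO_BLOCKS    10u    /* Blocks 0-9: usage information area         */
#define RESERVE_BLOCKS 128u   /* replacement area, reserved for bad blocks  */
#define RESERVE_START  (TOTAL_BLOCKS - RESERVE_BLOCKS)
#define BBT_NO_SPARE   0xFFFFu

typedef struct {
    uint8_t  magic[2];             /* 'O','K' once the table has been built */
    uint16_t count;                /* number of remapped bad blocks         */
    uint16_t bad[RESERVE_BLOCKS];  /* bad block numbers                     */
    uint16_t repl[RESERVE_BLOCKS]; /* replacement block for bad[i]          */
    uint16_t next_repl;            /* next unused block in reserve area     */
} bbt_t;

/* Start an empty table and stamp the 'O','K' header. */
void bbt_init(bbt_t *t)
{
    assert(t != NULL);
    t->magic[0] = 'O';
    t->magic[1] = 'K';
    t->count = 0;
    t->next_repl = RESERVE_START;
}

bool bbt_is_initialised(const bbt_t *t)
{
    return t->magic[0] == 'O' && t->magic[1] == 'K';
}

/* Physical block to use for `block`: its replacement if remapped, else itself. */
uint16_t bbt_lookup(const bbt_t *t, uint16_t block)
{
    for (uint16_t i = 0; i < t->count; i++)
        if (t->bad[i] == block)
            return t->repl[i];
    return block;
}

/* Give `block` the next free reserve block. If `block` is already remapped
 * (its replacement went bad too), its entry is updated in place.
 * Returns the new replacement block, or BBT_NO_SPARE if the reserve is used up. */
uint16_t bbt_remap(bbt_t *t, uint16_t block)
{
    if (t->next_repl >= TOTAL_BLOCKS)
        return BBT_NO_SPARE;
    uint16_t repl = t->next_repl++;
    for (uint16_t i = 0; i < t->count; i++) {
        if (t->bad[i] == block) {
            t->repl[i] = repl;
            return repl;
        }
    }
    assert(t->count < RESERVE_BLOCKS); /* each entry consumed a reserve block */
    t->bad[t->count] = block;
    t->repl[t->count] = repl;
    t->count++;
    return repl;
}
```

When a replacement block itself fails, calling `bbt_remap` again for the same block moves the entry to the next reserve block, which matches the "take the last replacement block and add 1" rule above; the updated table would then be written back to the information area.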
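The second problem is bit flipping. Flash stores information as electric charge, and when a bit is not stored correctly (for example because of manufacturing process limitations), a bit flip occurs and has to be corrected with a dedicated algorithm. The typical choice is ECC, which here is a Hamming code that corrects a single flipped bit (and detects two flipped bits) in each 512-byte chunk. The computation consists mainly of shifts and XOR operations.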
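To see why shifts and XORs are enough, here is a deliberately simplified single-error-correcting code, not Samsung's bit layout: it XORs together the bit addresses (byte index * 8 + bit) of every 1 bit and keeps the overall parity. Bit k of that XOR is the parity of all bits whose address has bit k set, which is the same information the P1, P2, P4, ..., P2048 bits of the 512-byte Hamming code below carry. Unlike the real code, this sketch does not protect the ECC bytes themselves. Function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t syndrome; /* XOR of the bit addresses of all 1 bits */
    uint8_t  parity;   /* overall parity of all bits             */
} simple_ecc_t;

simple_ecc_t simple_ecc_calc(const uint8_t *data, size_t len)
{
    simple_ecc_t e = {0, 0};
    for (size_t i = 0; i < len; i++)
        for (unsigned b = 0; b < 8; b++)
            if ((data[i] >> b) & 1u) {
                e.syndrome ^= (uint32_t)(i * 8 + b);
                e.parity ^= 1u;
            }
    return e;
}

/* Returns 0 if nothing changed, 1 if one flipped bit was corrected in place,
 * -1 if the error is not correctable (e.g. two flipped bits). */
int simple_ecc_correct(uint8_t *data, size_t len, simple_ecc_t stored)
{
    simple_ecc_t now = simple_ecc_calc(data, len);
    uint32_t diff = now.syndrome ^ stored.syndrome;
    if (now.parity == stored.parity)
        return diff == 0 ? 0 : -1;   /* even number of flips: none, or two */
    if (diff >= len * 8)
        return -1;                   /* address out of range */
    data[diff / 8] ^= (uint8_t)(1u << (diff % 8)); /* diff is the bad bit's address */
    return 1;
}
```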
Samsung's reference document on ECC for its NAND Flash chips:
http://www.samsung.com/global/business/semiconductor/products/flash/downloads/applicationnote/eccalgo_040624.pdf
The ECC code (Samsung's sample implementation) is as follows:
/* */
/* PROJECT : SAMSUNG ECC */
/* FILE : SAMSUNG_ECC.c */
/* PURPOSE : This file implements core ECC algorithms adopted */
/* Hamming Error Correction and Detection Algorithm */
/* */
/*---------------------------------------------------------------------------*/
/* */
/* COPYRIGHT 2000-2004, SAMSUNG ELECTRONICS CO., LTD. */
/* ALL RIGHTS RESERVED */
/* */
/* Permission is hereby granted to licensees of Samsung Electronics */
/* Co., Ltd. products to use or abstract this computer program for the */
/* sole purpose of implementing a product based on Samsung */
/* Electronics Co., Ltd. products. No other rights to reproduce, use, */
/* or disseminate this computer program, whether in part or in whole, */
/* are granted. */
/* */
/* Samsung Electronics Co., Ltd. makes no representation or warranties */
/* with respect to the performance of this computer program, and */
/* specifically disclaims any responsibility for any damages, */
/* special or consequential, connected with the use of this program. */
/* */
/*---------------------------------------------------------------------------*/
/* */
/* REVISION HISTORY */
/* */
/* 13-NOV-2003 [Chang JongBaek] : first writing */
/* 03-MAR-2004 [ Kim YoungGon ] : Second writing */
/* 03-MAR-2004 [ Lee JaeBum ] : Third writing */
/*---------------------------------------------------------------------------*/
/* */
/* NOTES */
/* */
/* - Make ECC parity code of 512bytes(256words) and 3 bytes are represented */
/* And ECC compare & Correction code is also represented */
/* */
/*****************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
/* Samsung's accompanying "ecc.h" is not part of this listing; everything */
/* used below is declared in this file. */
#define XMODE 8
/*****************************************************************************/
/* Address Types */
/*****************************************************************************/
typedef unsigned char * address_t; /* address (pointer) */
typedef unsigned long address_value_t; /* address (for calculation) */
/*****************************************************************************/
/* Integer Types */
/*****************************************************************************/
/* uint32_t, int32_t, uint16_t, int16_t, uint8_t and int8_t come from <stdint.h>. */
/* Do not typedef uint32_t as unsigned long: that type is 8 bytes on LP64 */
/* systems (64-bit Linux/macOS), and make_ecc_512 relies on 32-bit words. */
typedef enum {
ECC_NO_ERROR = 0, /* no error */
ECC_CORRECTABLE_ERROR = 1, /* one bit data error */
ECC_ECC_ERROR = 2, /* one bit ECC error */
ECC_UNCORRECTABLE_ERROR = 3 /* uncorrectable error */
} eccdiff_t;
/*****************************************************************************/
/* */
/* NAME */
/* make_ecc_512 */
/* DESCRIPTION */
/* This function generates 3 byte ECC for 512 byte data. */
/* (Software ECC) */
/* PARAMETERS */
/* ecc_buf the location where ECC should be stored */
/* data_buf given data */
/* RETURN VALUES */
/* none */
/* */
/*****************************************************************************/
#if (XMODE == 8)
void make_ecc_512(uint8_t * ecc_buf, uint8_t * data_buf)
#else
void make_ecc_512(uint16_t * ecc_buf, uint16_t * data_buf)
#endif
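{
uint32_t i;
uint32_t tmp;
uint32_t uiparity = 0;
uint32_t parityCol, ecc = 0;
uint32_t parityCol4321 = 0, parityCol4343 = 0, parityCol4242 = 0, parityColTot = 0;
const uint32_t *Data;
uint32_t Xorbit=0;
/* data_buf must be 4-byte aligned: it is read as 128 32-bit words. */
/* The byte-position parities (p8, p16) assume a little-endian CPU. */
Data = (const uint32_t *)data_buf;
/* 16 groups of 8 words = 512 bytes. Word k (0..7) of each group feeds */
/* parityCol4242 if bit 0 of k is set (p32), parityCol4343 if bit 1 is set */
/* (p64) and parityCol4321 if bit 2 is set (p128). */
for( i = 0; i < 16; i++)
{
parityCol = *Data++;
tmp = *Data++; parityCol ^= tmp; parityCol4242 ^= tmp;
tmp = *Data++; parityCol ^= tmp; parityCol4343 ^= tmp;
tmp = *Data++; parityCol ^= tmp; parityCol4343 ^= tmp; parityCol4242 ^= tmp;
tmp = *Data++; parityCol ^= tmp; parityCol4321 ^= tmp;
tmp = *Data++; parityCol ^= tmp; parityCol4242 ^= tmp; parityCol4321 ^= tmp;
tmp = *Data++; parityCol ^= tmp; parityCol4343 ^= tmp; parityCol4321 ^= tmp;
tmp = *Data++; parityCol ^= tmp; parityCol4242 ^= tmp; parityCol4343 ^= tmp; parityCol4321 ^= tmp;
parityColTot ^= parityCol;
/* Fold parityCol down to one parity bit; a group with odd parity flips the */
/* overall data parity (Xorbit) and adds its index i to uiparity (p256..p2048). */
tmp = (parityCol >> 16) ^ parityCol;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = ((tmp >> 2) ^ tmp) & 0x03;
if ((tmp == 0x01) || (tmp == 0x02))
{
uiparity ^= i;
Xorbit ^= 0x01;
}
}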
#if (XMODE == 8)
tmp = (parityCol4321 >> 16) ^ parityCol4321;
tmp = (tmp << 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x200; // p128
#else
tmp = (parityCol4321 >> 16) ^ parityCol4321;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp << 4) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x80; // p128
#endif
#if (XMODE == 8)
tmp = (parityCol4343 >> 16) ^ parityCol4343;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp << 4) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x80; // p64
#else
tmp = (parityCol4343 >> 16) ^ parityCol4343;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp << 4) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x20; // p64
#endif
#if (XMODE == 8)
tmp = (parityCol4242 >> 16) ^ parityCol4242;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp << 4) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x20; // p32
#else
tmp = (parityCol4242 >> 16) ^ parityCol4242;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x08; // p32
#endif
#if (XMODE == 8)
tmp = parityColTot & 0xFFFF0000;
tmp = tmp >> 16;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x08; // p16
#else
tmp = parityColTot & 0xFFFF0000;
tmp = tmp >> 16;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x02; // p16
#endif
#if (XMODE == 8)
tmp = parityColTot & 0xFF00FF00;
tmp = (tmp >> 16) ^ tmp;
tmp = (tmp >> 8);
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x02; // p8
#else
tmp = parityColTot & 0xFF00FF00;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8);
tmp = (tmp << 4) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x800000; // p8
#endif
#if (XMODE == 8)
tmp = parityColTot & 0xF0F0F0F0 ;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x800000; // p4
#else
tmp = parityColTot & 0xF0F0F0F0 ;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x200000; // p4
#endif
#if (XMODE == 8)
tmp = parityColTot & 0xCCCCCCCC ;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp << 4) ^ tmp;
tmp = (tmp >> 2);
ecc |= ((tmp << 1) ^ tmp) & 0x200000; // p2
#else
tmp = parityColTot & 0xCCCCCCCC ;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
ecc |= ((tmp << 1) ^ tmp) & 0x80000; // p2
#endif
#if (XMODE == 8)
tmp = parityColTot & 0xAAAAAAAA ;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp << 2) ^ tmp;
ecc |= (tmp & 0x80000); // p1
#else
tmp = parityColTot & 0xAAAAAAAA ;
tmp = (tmp << 16) ^ tmp;
tmp = (tmp >> 8) ^ tmp;
tmp = (tmp >> 4) ^ tmp;
tmp = (tmp >> 2) ^ tmp;
ecc |= (tmp & 0x20000); // p1
#endif
#if (XMODE == 8)
ecc |= (uiparity & 0x01) <<11;
ecc |= (uiparity & 0x02) <<12;
ecc |= (uiparity & 0x04) <<13;
ecc |= (uiparity & 0x08) <<14;
#else
ecc |= (uiparity & 0x01) <<9;
ecc |= (uiparity & 0x02) <<10;
ecc |= (uiparity & 0x04) <<11;
ecc |= (uiparity & 0x08) <<12;
#endif
if (Xorbit)
{
ecc |= (ecc ^ 0x00AAAAAA)>>1;
}
else
{
ecc |= (ecc >> 1);
}
#if (XMODE == 8)
ecc = ~ecc;
*(ecc_buf + 2) = (uint8_t) (ecc >> 16);
*(ecc_buf + 1) = (uint8_t) (ecc >> 8);
*(ecc_buf + 0) = (uint8_t) (ecc);
#else // X16
ecc = ( ~ecc ) | 0xFF000000;
*(ecc_buf + 1) = (uint16_t) (ecc >> 16);
*(ecc_buf + 0) = (uint16_t) (ecc);
#endif
}
/*****************************************************************************/
/* */
/* NAME */
/* compare_ecc_512 */
/* DESCRIPTION */
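/* This function compares two ECCs and indicates if there is an error.*/
/* PARAMETERS */
/* iEccdata1 one ECC to be compared (e.g. read from the spare area) */
/* iEccdata2 the other ECC to be compared (e.g. recomputed from data) */
/* pPagedata content of data page */
/* pOffset out: byte (x8) or word (x16) offset of the bad bit, or NULL */
/* pCorrected out: corrected value of that byte/word, or NULL */
/* RETURN VALUES */
/* ECC_NO_ERROR, ECC_CORRECTABLE_ERROR (the caller writes *pCorrected */
/* back at *pOffset), ECC_ECC_ERROR or ECC_UNCORRECTABLE_ERROR. */
/* */
/*****************************************************************************/
#if (XMODE == 8)
eccdiff_t compare_ecc_512( uint8_t * iEccdata1 , uint8_t * iEccdata2 ,
uint8_t * pPagedata , int32_t * pOffset , uint8_t * pCorrected)
#else // X16
eccdiff_t compare_ecc_512( uint16_t * iEccdata1 , uint16_t * iEccdata2 ,
uint16_t * pPagedata , int32_t * pOffset , uint16_t * pCorrected)
#endif
{
uint32_t iCompecc = 0, iEccsum = 0;
uint32_t iFindbyte = 0;
uint32_t iIndex;
uint32_t nT1 = 0, nT2 =0;
#if (XMODE == 8)
uint8_t iNewvalue;
uint8_t iFindbit = 0;
uint8_t *pEcc1 = (uint8_t *)iEccdata1;
uint8_t *pEcc2 = (uint8_t *)iEccdata2;
/* bits 0 and 1 of the first ECC byte are P8' and P8; their XOR is the data parity */
for ( iIndex = 0; iIndex <2; iIndex++)
{
nT1 ^= (((*pEcc1) >> iIndex) & 0x01);
nT2 ^= (((*pEcc2) >> iIndex) & 0x01);
}
/* ~a ^ ~b == a ^ b, so the inversion applied by make_ecc_512 cancels out */
for (iIndex = 0; iIndex < 3; iIndex++)
iCompecc |= (uint32_t)(uint8_t)(*pEcc1++ ^ *pEcc2++) << (iIndex * 8);
for(iIndex = 0; iIndex < 24; iIndex++) {
iEccsum += ((iCompecc >> iIndex) & 0x01);
}
#else // X16
uint16_t iNewvalue;
uint16_t iFindbit = 0;
uint16_t *pEcc1 = (uint16_t *)iEccdata1;
uint16_t *pEcc2 = (uint16_t *)iEccdata2;
for ( iIndex = 0; iIndex <2; iIndex++)
{
nT1 ^= (((*pEcc1) >> iIndex) & 0x01);
nT2 ^= (((*pEcc2) >> iIndex) & 0x01);
}
for (iIndex = 0; iIndex < 2; iIndex++) // 2 words of ECC data
iCompecc |= (uint32_t)(uint16_t)(*pEcc1++ ^ *pEcc2++) << (iIndex * 16);
for(iIndex = 0; iIndex < 24; iIndex++) {
iEccsum += ((iCompecc >> iIndex) & 0x01);
}
#endif
switch (iEccsum) {
case 0 :
printf("RESULT : no error\n");
return ECC_NO_ERROR;
case 1 :
printf("RESULT : ECC code 1 bit fail\n");
return ECC_ECC_ERROR;
case 12 :
if (nT1 != nT2)
{
/* one data bit flipped: the odd-numbered difference bits hold its address */
#if (XMODE == 8)
iFindbyte = (( iCompecc >> 17 & 1) << 8) + (( iCompecc >> 15 & 1) << 7) + (( iCompecc >> 13 & 1) << 6)
+ (( iCompecc >> 11 & 1) << 5) + (( iCompecc >> 9 & 1) << 4) + (( iCompecc >> 7 & 1) << 3)
+ (( iCompecc >> 5 & 1) << 2) + (( iCompecc >> 3 & 1) << 1) + ( iCompecc >> 1 & 1);
iFindbit = (uint8_t)((( iCompecc >> 23 & 1) << 2) + (( iCompecc >> 21 & 1) << 1) + ( iCompecc >> 19 & 1));
iNewvalue = (uint8_t)( pPagedata [ iFindbyte ] ^ ( 1u << iFindbit));
#else // X16: bit positions follow the x16 layout of make_ecc_512 (word address in bits 1-15, bit-in-word in bits 17-23)
iFindbyte = (( iCompecc >> 15 & 1) << 7) + (( iCompecc >> 13 & 1) << 6) + (( iCompecc >> 11 & 1) << 5)
+ (( iCompecc >> 9 & 1) << 4) + (( iCompecc >> 7 & 1) << 3) + (( iCompecc >> 5 & 1) << 2)
+ (( iCompecc >> 3 & 1) << 1) + ( iCompecc >> 1 & 1);
iFindbit = (uint16_t)((( iCompecc >> 23 & 1) << 3) + (( iCompecc >> 21 & 1) << 2) + (( iCompecc >> 19 & 1) << 1) + ( iCompecc >> 17 & 1));
iNewvalue = (uint16_t)( pPagedata [ iFindbyte ] ^ ( 1u << iFindbit));
#endif
printf("RESULT : one bit error\r\n");
printf("byte = %u, bit = %u\r\n", (unsigned)iFindbyte, (unsigned)iFindbit);
printf("corrupted = %x, corrected = %x\r\n", (unsigned)pPagedata[iFindbyte], (unsigned)iNewvalue);
if (pOffset != NULL) {
*pOffset = (int32_t)iFindbyte;
}
if (pCorrected != NULL) {
*pCorrected = iNewvalue;
}
return ECC_CORRECTABLE_ERROR;
}
else
return ECC_UNCORRECTABLE_ERROR;
default :
printf("RESULT : unrecoverable error\n");
return ECC_UNCORRECTABLE_ERROR;
}
}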
One comment
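The biggest problem with NAND is bad blocks. Because bad blocks are unpredictable, they make programming and accessing NAND considerably harder. NAND can only be used once the bad block problem is solved, and handling bad blocks well is what gets the most out of NAND access performance.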
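What is a bad block? It is a block containing one or more bits whose state cannot be programmed reliably, so the block cannot be used. Even if only one bit in a block (128 KB) is bad, the whole block is abandoned. That sounds wasteful; perhaps the physics means the stability of the whole block can no longer be guaranteed, or there are other considerations. But since Samsung requires it, for the sake of system stability we should not begrudge a few hundred KB of capacity.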
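There are two kinds of bad blocks: initial bad blocks and runtime bad blocks. Initial bad blocks are those that are already bad when the chip leaves Samsung's factory. Why would a chip ship with bad blocks? This is normal: NAND simply has bad blocks, far more often than an LCD has dead pixels. The reassuring part is that Samsung tests every NAND before shipping, erases all blocks (so their contents are all 0xFF), and marks the bad blocks; these are the initial bad blocks. For each NAND capacity the maximum allowed number of bad blocks is specified, and parts with fewer valid blocks than that limit allows are not shipped. A device with fewer than 98% usable blocks is generally considered unstable.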
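As a sketch of the acceptance rule, assuming the maximum bad block count `max_bad` is taken from the datasheet of the specific part (the function name is hypothetical), the 98% rule of thumb can be checked with integer arithmetic so no rounding is involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the bad block count is within the datasheet limit and at least
 * 98 % of the blocks are still valid. */
bool nand_block_budget_ok(uint32_t total_blocks, uint32_t bad_blocks, uint32_t max_bad)
{
    assert(bad_blocks <= total_blocks);
    if (bad_blocks > max_bad)
        return false;
    uint64_t valid = (uint64_t)total_blocks - bad_blocks;
    /* valid / total >= 98 %  <=>  100 * valid >= 98 * total */
    return 100u * valid >= 98u * (uint64_t)total_blocks;
}
```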
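Samsung guarantees that in an initial bad block, the first byte of the spare area in the first or second page is not 0xFF. In other words, a block is an initial good block only if the byte at address 2048 (counting from 0, as everywhere in this post), which is the first byte of the spare area, is 0xFF in both of the first two pages. Note that an erase also turns the bad block mark into 0xFF, so a bad block may later be mistaken for a good one. Therefore, always check whether a block is bad before erasing it; otherwise the mark is wiped, the block will be treated as good from then on, and data may be lost.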
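A sketch of an erase routine that honours this rule, using an in-memory array as a stand-in for the chip. The simulated device, its geometry (64 pages of 2 KB plus 64 spare bytes per block, i.e. the 128 KB block mentioned above) and the function names are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_DATA_SIZE  2048u
#define RAW_PAGE_SIZE   (PAGE_DATA_SIZE + 64u)
#define PAGES_PER_BLOCK 64u   /* 64 x 2 KB = 128 KB block */
#define SIM_BLOCKS      4u    /* tiny simulated device     */

/* In-memory stand-in for the chip; a real driver talks to the controller. */
static uint8_t sim_nand[SIM_BLOCKS][PAGES_PER_BLOCK][RAW_PAGE_SIZE];

static bool block_marked_bad(uint32_t blk)
{
    return sim_nand[blk][0][PAGE_DATA_SIZE] != 0xFF ||
           sim_nand[blk][1][PAGE_DATA_SIZE] != 0xFF;
}

/* Erase `blk` only if it is not marked bad. Returns false and leaves the
 * block (and its mark) untouched if it is bad. */
bool safe_erase_block(uint32_t blk)
{
    assert(blk < SIM_BLOCKS);
    if (block_marked_bad(blk))
        return false;
    memset(sim_nand[blk], 0xFF, sizeof sim_nand[blk]); /* erase sets all bits to 1 */
    return true;
}
```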
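Runtime bad blocks are those that appear during use. Whether a block has gone bad can be determined from the pass/fail result of an erase or program operation; if it has, mark it. The marking convention is usually the same as for initial bad blocks: write a non-0xFF value, usually 0, to the first byte of the spare area of the first two pages.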
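Marking a runtime bad block can be sketched the same way. Because NAND programming can only change bits from 1 to 0, writing 0x00 over the erased 0xFF spare byte needs no erase; the simulation models programming as a bitwise AND. The simulated pages and function names are hypothetical, and how the pass/fail status is obtained depends on the chip and controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_DATA_SIZE 2048u                  /* assumed 2 KB page       */
#define RAW_PAGE_SIZE  (PAGE_DATA_SIZE + 64u) /* plus 64-byte spare area */
#define SIM_BLOCKS     4u                     /* tiny simulated device   */

/* Pages 0 and 1 of each simulated block (where the mark lives). */
static uint8_t sim_pages[SIM_BLOCKS][2][RAW_PAGE_SIZE];

/* NAND programming can only clear bits (1 -> 0), so model it as an AND. */
static void sim_program_byte(uint32_t blk, uint32_t page, uint32_t col, uint8_t val)
{
    sim_pages[blk][page][col] &= val;
}

/* Write the runtime bad-block mark: 0x00 at column 2048 of Pages 0 and 1. */
void mark_bad_block(uint32_t blk)
{
    assert(blk < SIM_BLOCKS);
    sim_program_byte(blk, 0, PAGE_DATA_SIZE, 0x00);
    sim_program_byte(blk, 1, PAGE_DATA_SIZE, 0x00);
}

/* Call with the pass/fail result of an erase or program on `blk`.
 * Returns true if the block was retired (marked bad). */
bool retire_block_on_failure(uint32_t blk, bool op_failed)
{
    if (!op_failed)
        return false;
    mark_bad_block(blk);
    return true;
}
```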
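How bad blocks are handled is a driver-layer matter, and doing it well is complex; later I will introduce dedicated NAND drivers such as Samsung's PocketStore II, which provide high reliability and high performance. The simplest approach is to skip forward past bad blocks, but it is also the least efficient: to access block n you need to know how many bad blocks there are among blocks 0 to n-1 to work out the offset, and scanning n-1 blocks is clearly slow when n is large.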
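The skip-forward approach amounts to the following mapping, sketched under the assumption that a bad block bitmap `is_bad` has already been built by a scan (the function name is hypothetical). The loop makes the cost visible: translating logical block n walks every physical block before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Logical block `logical` is the logical-th good physical block.
 * Returns the physical block number, or -1 if there are not enough good blocks. */
int32_t skip_bad_map(const bool *is_bad, uint32_t nblocks, uint32_t logical)
{
    assert(is_bad != NULL);
    uint32_t good_seen = 0;
    for (uint32_t phys = 0; phys < nblocks; phys++) {
        if (is_bad[phys])
            continue;                 /* skip forward past a bad block */
        if (good_seen == logical)
            return (int32_t)phys;
        good_seen++;
    }
    return -1;
}
```

A remapping table like the one described in the first part of this post avoids that linear walk, at the cost of reserving replacement blocks.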