UBI - Unsorted Block Images (14-18)

How a UBI flasher should work

The following is a list of what a UBI flasher program has to do when erasing the flash or when writing UBI images.

First, scan the flash and collect the erase counters: read the EC header from each PEB, check the CRC-32 checksum of the header, and save the erase counter in RAM. It is not necessary to read the VID headers. Bad PEBs should be skipped.
Next, calculate the average erase counter. It will be used for PEBs with corrupted or missing EC headers. Such PEBs may occur due to unexpected reboots, but there should not be many of them.
If the intention is just to erase the flash, then each PEB has to be erased and a proper EC header has to be written at the beginning of the PEB. The EC header should contain the updated erase counter. Bad PEBs should be skipped. For NAND flashes, in the case of I/O errors while erasing or writing, the PEB should be marked as bad (see the "Marking eraseblocks as bad" section for how UBI marks PEBs as bad).
If the intention is to flash a UBI image, then the flasher should do the following for each non-bad PEB.
- Read the contents of this PEB from the UBI image (PEB size bytes) into a buffer.
- Strip minimum I/O units full of 0xFF bytes from the end of the buffer (the details are given below).
- Erase the PEB.
- Change the EC header in the buffer - put the new erase counter value there and re-calculate the CRC-32 checksum.
- Write the buffer to the physical eraseblock.
As always, bad PEBs should be skipped, and for NAND flashes, in the case of I/O errors while erasing or writing, the PEB should be marked as bad.
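
The scanning and header-update steps above can be sketched as follows. This is a minimal illustration, not the code of any real flasher. The helper names are hypothetical. The byte offsets of the erase counter (8) and of the CRC (60) inside the 64-byte EC header, the big-endian encoding, and the CRC-32 variant (initial value 0xFFFFFFFF, no final inversion) are assumptions that should be checked against the UBI header definitions in the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed EC header layout; verify against the kernel's UBI definitions. */
enum {
    EC_OFF_EC  = 8,   /* assumed offset of the 64-bit erase counter */
    EC_OFF_CRC = 60   /* assumed offset of the 32-bit header CRC */
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no final inversion. */
static uint32_t crc32_no_final_xor(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Mean of the erase counters that were read successfully.
 * valid[i] is nonzero when PEB i had a good EC header. Returns 0 if none. */
static uint64_t mean_erase_counter(const uint64_t *ec, const uint8_t *valid,
                                   size_t n)
{
    uint64_t sum = 0, count = 0;

    for (size_t i = 0; i < n; i++) {
        if (valid[i]) {
            sum += ec[i];
            count++;
        }
    }
    return count ? sum / count : 0;
}

/* Counter to write after erasing: old value plus one, or the mean plus one
 * when the old header was corrupted or missing. */
static uint64_t next_erase_counter(int valid, uint64_t old_ec, uint64_t mean)
{
    return (valid ? old_ec : mean) + 1;
}

/* Store the new erase counter big-endian and refresh the header CRC. */
static void patch_ec_header(uint8_t *hdr, uint64_t new_ec)
{
    assert(hdr != NULL);
    for (int i = 0; i < 8; i++)
        hdr[EC_OFF_EC + i] = (uint8_t)(new_ec >> (56 - 8 * i));

    uint32_t crc = crc32_no_final_xor(0xFFFFFFFFu, hdr, EC_OFF_CRC);
    for (int i = 0; i < 4; i++)
        hdr[EC_OFF_CRC + i] = (uint8_t)(crc >> (24 - 8 * i));
}
```

A flasher would call mean_erase_counter() once after the scan, then next_erase_counter() and patch_ec_header() for each PEB just before writing the buffer.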

In practice the input UBI image is usually shorter than the flash, so the flasher has to flash the PEBs covered by the image and erase the remaining, unused PEBs as described above.

Note that when writing a UBI image, it does not matter where eraseblocks from the input UBI image are written. For example, the first input eraseblock may be written to the first PEB, or to the second one, or to the last one.

Also note that if you create a flasher to write UBI images at production time (i.e., to new flash, only once), the flasher does not have to change the EC headers of the input UBI image, because the flash is new and each PEB has a zero erase counter anyway. This means the production-line flasher may be simpler.

If your UBI image contains a UBIFS file system and your flash is NAND, you may have to drop 0xFF bytes from the end of your input PEB data. This is very important, although not required for all NAND flashes. Failing to do this may sometimes result in very unpleasant problems which can be difficult to debug later on, so we recommend always doing it.

The reason for this is that UBIFS treats NAND pages which contain only 0xFF bytes (let's refer to them as empty NAND pages) as free. For example, suppose the first NAND page of a PEB has some data, the second one is empty, the third one also has some data, and the fourth one and the rest of the NAND pages are empty as well. In this case UBIFS will treat all NAND pages starting from the fourth one as free, and will write data there. If the flasher program has already written 0xFF's to these pages, then any new UBIFS data will cause a second write. However, many NAND flashes require NAND pages to be written only once, even if the data contains only 0xFF bytes.

To put it differently, writing 0xFF bytes may have side effects. What the flasher has to do is drop the empty NAND pages from the end of the PEB buffer before writing it. It is not necessary to drop every empty NAND page, just the trailing ones. This means that the flasher does not have to scan the whole buffer for 0xFF's. It is enough to scan the buffer from the end and stop at the first non-0xFF byte, which is much faster. Here is the code from UBI which does the right thing:

/**
 * ubi_calc_data_len - calculate how much real data is stored in a buffer.
 * @ubi: UBI device description object
 * @buf: a buffer with the contents of the physical eraseblock
 * @length: the buffer length
 *
 * This function calculates how much "real data" is stored in @buf and returns
 * the length. Continuous 0xFF bytes at the end of the buffer are not
 * considered as "real data".
 */
int ubi_calc_data_len(const struct ubi_device *ubi, const void *buf,
                      int length)
{
        int i;

        for (i = length - 1; i >= 0; i--)
                if (((const uint8_t *)buf)[i] != 0xFF)
                        break;

        /* The resulting length must be aligned to the minimum flash I/O size */
        length = ALIGN(i + 1, ubi->min_io_size);
        return length;
}

This function is called before writing the buf buffer to the PEB. Its purpose is to drop 0xFF's from the end and prevent the situation described above. The ubi->min_io_size value is the minimal I/O unit size, which is equivalent to the NAND page size.
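
For a flasher that runs outside the kernel, the same idea can be sketched without the kernel's ALIGN macro or struct ubi_device. The function name and signature below are hypothetical. The sketch assumes that the buffer length is a multiple of min_io_size, as it is for a whole PEB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes of buf that should actually be written: trailing 0xFF
 * bytes are dropped and the result is rounded up to a whole number of
 * minimum I/O units (NAND pages). */
static size_t data_len_to_write(const uint8_t *buf, size_t length,
                                size_t min_io_size)
{
    assert(min_io_size > 0);

    size_t used = length;

    /* Scan from the end and stop at the first non-0xFF byte. */
    while (used > 0 && buf[used - 1] == 0xFF)
        used--;

    /* Round up to the minimum I/O unit size. */
    return (used + min_io_size - 1) / min_io_size * min_io_size;
}
```

Only the trailing empty pages are dropped. An empty page that sits between two pages with data stays in the written range, which matches the description above.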

By the way, we experienced similar problems with JFFS2. The JFFS2 images generated by the mkfs.jffs2 program were padded to the physical eraseblock size and were later flashed to our NAND. The flasher did not bother to skip empty NAND pages. When JFFS2 was mounted, it wrote to those NAND pages, and the writes did not fail. But later we observed weird ECC errors, and it took a while to track down the cause. In other words, this is also relevant to JFFS2 images.

An alternative to this approach is to enable the "free space fixup" option when generating the UBIFS file system with mkfs.ubifs. Your flasher then does not have to worry about 0xFF bytes at the end of PEBs, which is particularly useful if you need to use an industrial flash programmer to write a UBI image. More information is available in the UBIFS documentation.

Marking eraseblocks as bad

This section is relevant for NAND flashes as well as other flashes which exhibit bad eraseblocks. UBI marks physical eraseblocks as bad in the following two scenarios:

- an eraseblock write operation failed, in which case UBI moves the data from this PEB to some other PEB (data recovery) and schedules this PEB for torturing;
- the erase operation failed with an EIO error, in which case the eraseblock is marked as bad immediately.

The torturing is done in the background to detect whether the physical eraseblock is actually bad. The write failure could have occurred for one of many reasons, including bugs in the driver or in upper-level software such as the file system (e.g., the FS mistakenly writes many times to the same NAND page). During the torturing UBI does the following:

- erase the eraseblock;
- read it back and make sure it contains only 0xFF bytes;
- write test pattern bytes;
- read the eraseblock back and check the pattern;
- and so on for several patterns (0xA5, 0x5A, 0x00).

The eraseblock is not marked as bad if it survives the torture test. However, a bit-flip during the torture test is a good reason to mark the eraseblock as bad. Please refer to the torture_peb() function for detailed information.
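
The torture sequence can be illustrated with a small simulation. This is not the kernel's torture_peb(). It is a sketch against a hypothetical in-memory eraseblock model, in which the stuck_mask field imitates worn-out cells in the first byte that can no longer hold a 1 bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SIM_PEB_SIZE = 64 };   /* tiny eraseblock, for illustration only */

/* Hypothetical model of one eraseblock. */
struct sim_peb {
    uint8_t data[SIM_PEB_SIZE];
    uint8_t stuck_mask;       /* bits of byte 0 that always read as 0 */
};

static void sim_erase(struct sim_peb *p)
{
    memset(p->data, 0xFF, SIM_PEB_SIZE);
    p->data[0] &= (uint8_t)~p->stuck_mask;
}

static void sim_write(struct sim_peb *p, uint8_t pattern)
{
    /* Programming flash can only clear bits, never set them. */
    for (size_t i = 0; i < SIM_PEB_SIZE; i++)
        p->data[i] &= pattern;
}

static int sim_all_equal(const struct sim_peb *p, uint8_t value)
{
    for (size_t i = 0; i < SIM_PEB_SIZE; i++)
        if (p->data[i] != value)
            return 0;
    return 1;
}

/* Returns 1 if the block survives every pattern, 0 if it should be
 * marked as bad. */
static int torture_sim_peb(struct sim_peb *p)
{
    static const uint8_t patterns[] = { 0xA5, 0x5A, 0x00 };

    for (size_t k = 0; k < sizeof(patterns); k++) {
        sim_erase(p);
        if (!sim_all_equal(p, 0xFF))
            return 0;         /* erase did not yield all 0xFF */
        sim_write(p, patterns[k]);
        if (!sim_all_equal(p, patterns[k]))
            return 0;         /* read-back differs from the pattern */
    }
    return 1;
}
```

A real implementation would go through the MTD erase, read and write operations, and would also have to decide how to treat corrected bit-flips reported by the ECC layer.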

Scalability issues

Unfortunately, UBI scales linearly with flash size: its initialization time is directly proportional to the number of physical eraseblocks on the flash. This means that the larger the flash, the more time it takes for UBI to initialize (i.e., to attach the MTD device). Note: starting with Linux v3.7, UBI offers an optional and experimental feature called "fastmap", which allows attaching in nearly constant time; see the Fastmap section. The initialization time depends on the flash I/O speed and (slightly) on the CPU speed, because:

- UBI scans the MTD device when attaching - it reads the EC and VID headers from every single PEB; the headers are small (64 bytes each), so this means reading 128 bytes from each PEB in the case of NOR flash, or one or two NAND pages in the case of NAND flash (depending on whether the NAND flash supports sub-pages); in any case this is far less data than JFFS2 has to read when it mounts an MTD device, so UBI attaches MTD devices many times faster than JFFS2 would mount a file system on the same MTD device;
- UBI calculates the CRC-32 checksum of each EC and VID header, which consumes CPU time, although this is usually minor compared to the flash I/O overhead.
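
To make the linear scaling concrete, the amount of data read during a scan can be estimated from the flash geometry. The helper below is a hypothetical back-of-the-envelope calculation. It only counts bytes and says nothing about actual attach times, which depend on the flash and the controller.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes read while scanning: each PEB contributes bytes_per_peb, which is
 * 128 for NOR, or one or two NAND pages for NAND. */
static uint64_t scan_bytes(uint64_t flash_size, uint64_t peb_size,
                           uint64_t bytes_per_peb)
{
    assert(peb_size > 0);
    return flash_size / peb_size * bytes_per_peb;
}
```

For example, a 256MiB flash with 128KiB PEBs has 2048 PEBs. Reading one 2KiB page from each gives 4MiB of scan I/O, and doubling the flash size doubles that figure.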

Here are some figures:

- a 256MiB OneNAND flash found in Nokia N800 devices attaches in less than 1 second; the flash supports sub-pages, so UBI only has to read the first 2KiB NAND page of each PEB while scanning;
- a 1GiB NAND flash found in OLPC XO-1 devices attaches in about 2 seconds; the flash is an SLC NAND and supports sub-pages, but the Cafe controller used in the laptop does not allow sub-page writes, so UBI has to read two 2KiB NAND pages from each PEB.

Unfortunately we do not have more data; readers are welcome to send theirs to us via the MTD mailing list.

Implementation details

In general, UBI needs three tables to operate:

- the volume table, which contains per-volume information such as volume size, type, etc.;
- the eraseblock association (EBA) table, which contains the logical-to-physical eraseblock mapping information; for example, when reading an LEB, UBI first looks up the table to find the corresponding PEB number, then reads from this PEB;
- the erase counters (EC) table, which contains the erase counter value for each physical eraseblock; the UBI wear-leveling sub-system uses this table when it needs to find, for example, a highly worn-out PEB.

The volume table is maintained on-flash. It changes only when UBI volumes are created, deleted, or re-sized. These operations are rare and not time-critical, so UBI can afford slow and simple volume table management.

The EBA and EC tables change every time an LEB is mapped to a PEB or a PEB is erased, which happens quite often, so the table management methods have to be fast and efficient.

UBI could maintain the EBA and EC tables on the flash media, but this would inevitably involve journaling, journal replay, journal commit, etc. In other words, it would introduce a lot of complexity, although UBI would be logarithmically scalable in this case.

One of the UBI requirements was simplicity of the on-flash format, because the UBI authors had to read UBI volumes from the boot-loader and had very tight constraints on the boot-loader code size. It was basically impossible to add complex journal scanning and replay code to the boot-loader.

Therefore UBI does not maintain the EBA and EC tables on the flash media. Instead, it builds them in RAM each time it attaches the MTD device. This means that UBI has to scan the entire flash and read the EC and VID headers from each PEB in order to build the in-RAM EC and EBA tables.
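
Building the two in-RAM tables from the scan results can be sketched as below. The sketch is heavily simplified and makes several assumptions that are not in the text above: it handles a single volume, the struct and function names are hypothetical, and when two PEBs claim the same LEB it keeps the one with the higher sequence number read from the VID header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNMAPPED (-1)

/* Hypothetical summary of what scanning one PEB yields. */
struct scanned_peb {
    uint64_t ec;       /* erase counter from the EC header */
    int      has_vid;  /* nonzero if a valid VID header was found */
    uint32_t lnum;     /* logical eraseblock number from the VID header */
    uint64_t sqnum;    /* sequence number from the VID header (assumed) */
};

/* Fill eba[lnum] with the PEB number holding each LEB (or UNMAPPED) and
 * ec[pnum] with the erase counter of each PEB. */
static void build_tables(const struct scanned_peb *scan, size_t pebs,
                         int *eba, size_t lebs, uint64_t *ec)
{
    for (size_t l = 0; l < lebs; l++)
        eba[l] = UNMAPPED;

    for (size_t p = 0; p < pebs; p++) {
        ec[p] = scan[p].ec;

        if (!scan[p].has_vid || scan[p].lnum >= lebs)
            continue;          /* free PEB or out-of-range LEB number */

        int cur = eba[scan[p].lnum];
        if (cur == UNMAPPED || scan[p].sqnum > scan[(size_t)cur].sqnum)
            eba[scan[p].lnum] = (int)p;   /* newer copy wins */
    }
}
```

The cost of this loop is one pass over all PEBs, which is where the linear attach time discussed in the scalability section comes from.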

The drawbacks of this design are poor scalability and relatively high overhead on NAND flashes (e.g., the overhead is 1.5%-3% of flash space for a NAND flash with 2KiB NAND pages and 128KiB eraseblocks). The advantages are a simple binary format and robustness.
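
The 1.5%-3% range can be reproduced with a short calculation, on the assumption that the two headers occupy one NAND page per PEB when sub-pages are available and two NAND pages otherwise. The function name is hypothetical, and the result is expressed in hundredths of a percent to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Header overhead in hundredths of a percent (e.g. 156 means 1.56%). */
static uint32_t header_overhead_bp(uint32_t peb_size, uint32_t page_size,
                                   uint32_t header_pages)
{
    assert(peb_size > 0);
    return (uint32_t)((uint64_t)header_pages * page_size * 10000u / peb_size);
}
```

With 2KiB pages and 128KiB eraseblocks there are 64 pages per PEB, so one header page costs about 1.56% and two header pages about 3.12%.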

Nonetheless, someday we might see a "UBI2" which would maintain the tables in separate flash areas. UBI2 would not be compatible with UBI because of completely different on-flash formats, but the user interfaces would stay the same, which would guarantee compatibility of all the software built on top of UBI.

Reserved blocks for bad block handling (only for NAND chips)

It is well known that NAND chips have some number of physical eraseblocks marked as bad by the manufacturer. During the lifetime of the NAND device, other bad blocks may appear. Nonetheless, manufacturers usually guarantee that the first few physical eraseblocks are not bad and that the total number of bad PEBs will not exceed a certain number. For example, a 256MiB (2048 128KiB PEBs) Samsung OneNAND chip is guaranteed to have no more than 40 bad 128KiB PEBs during its endurance lifetime. This is a very common value for NAND devices: 20 bad PEBs per 1024 PEBs, which is about 2% of flash size.

This ratio of 20/1024 is the default number of blocks that UBI reserves for a UBI device. This means that if there are 2 UBI devices on a 4096 PEB NAND, 80 PEBs will be reserved for each UBI device. This may appear to be a waste of space, but, given that bad blocks can appear anywhere on the NAND flash and are not evenly distributed over the whole device, it is the safer way. So instead of using several UBI devices on a NAND flash, it is more space-efficient to use only one UBI device which contains several UBI volumes.
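
The arithmetic behind these numbers can be sketched as follows. The function name is hypothetical, and rounding up is an assumption made for this illustration. The key point, taken from the example above, is that the count is derived from the number of PEBs on the whole NAND chip rather than from the size of each UBI device.

```c
#include <assert.h>
#include <stdint.h>

/* PEBs reserved for bad block handling, given the PEB count of the whole
 * NAND chip and the "bad PEBs per 1024" ratio (20 by default). The result
 * is rounded up (assumption). */
static uint32_t reserved_pebs(uint32_t chip_pebs, uint32_t per1024)
{
    return (uint32_t)(((uint64_t)chip_pebs * per1024 + 1023u) / 1024u);
}
```

For the 4096 PEB chip this gives 80. Two UBI devices on that chip reserve 80 PEBs each, 160 in total, whereas a single UBI device with several volumes reserves 80 only once.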

The default value of 20 reserved PEBs per 1024 PEBs is a kernel config option. For each UBI device, this value can be adjusted via a kernel parameter or a ubiattach parameter (since kernel 3.7). Translator's note (lcy): the -b option of ubiattach adjusts the number of reserved blocks.

Volume auto-resize

When a UBI image is to be flashed during production, one should specify exact sizes for all volumes (the sizes are stored in the UBI volume table). However, in practice, in the embedded world, we like to have one read-only volume for the root file system and one read/write volume for however much space is left (logs, user data, etc.). Even if the size of the root file system is fixed, the size of the second volume can vary from one product to another (given different flash sizes).

This is the purpose of the auto-resize flag. If a volume has the auto-resize flag enabled, its size will expand to fill the remaining unused space when UBI is run for the first time. After the volume size is adjusted, UBI removes the auto-resize flag and the volume is not re-sized anymore. The auto-resize flag is stored in the volume table, and only one volume may be marked as auto-resize.
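
The behaviour described above can be sketched as a one-shot operation on a simplified volume table. The struct, the flag value and the function name are hypothetical and chosen for this illustration only; the real volume table records and flag definitions live in the UBI sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VOL_FLAG_AUTORESIZE 0x01u   /* hypothetical flag bit */

/* Hypothetical, simplified volume table record. */
struct vol_record {
    uint32_t reserved_pebs;  /* current volume size in PEBs */
    uint8_t  flags;
};

/* Grow the flagged volume by free_pebs and clear its flag so that the
 * resize happens only once. Returns the index of the resized volume, or
 * -1 if no volume carries the flag. */
static int apply_autoresize(struct vol_record *vols, size_t n,
                            uint32_t free_pebs)
{
    for (size_t i = 0; i < n; i++) {
        if (vols[i].flags & VOL_FLAG_AUTORESIZE) {
            vols[i].reserved_pebs += free_pebs;
            vols[i].flags &= (uint8_t)~VOL_FLAG_AUTORESIZE;
            return (int)i;
        }
    }
    return -1;
}
```

Because the flag is cleared after the first run, calling the function again on the same table changes nothing, which mirrors the "not re-sized anymore" behaviour.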