UBI - Unsorted Block Images

Table of contents

  1. Big red note
  2. Overview
  3. Power-cuts tolerance
  4. Kernel source code
  5. Mailing list
  6. User-space tools
  7. UBI headers
  8. UBI volume table
  9. Minimum flash input/output unit
  10. NAND flash sub-pages
  11. UBI headers position
  12. Flash space overhead
  13. Saving erase counters
  14. How UBI flasher should work
  15. Marking eraseblocks as bad
  16. Scalability issues
  17. Reserved blocks for bad block handling (only for NAND chips)
  18. Volume auto-resize
  19. UBI operations
    1. LEB un-map
    2. LEB map
    3. Volume update
    4. Atomic LEB change
  20. Fastmap
  21. R/O block devices on top of UBI volumes
  22. UBI stress testing
  23. More documentation

LEB un-map

The LEB un-map operation is implemented by the ubi_leb_unmap() UBI kernel API function. Starting from kernel version 2.6.29, the un-map operation is also available to user-space programs via the UBI_IOCEBUNMAP ioctl command, which must be issued on a UBI volume character device.

The LEB un-map operation:


  • first un-maps the LEB from the corresponding PEB;
  • then schedules the PEB for erasure and returns; it does not wait for the erasure to finish, because the PEB is erased later by the UBI background thread.

UBI returns all 0xFF bytes when an un-mapped LEB is read, so the un-map operation may be considered a very fast erase operation. But there is one aspect of which programmers using UBI have to be aware:


Suppose you un-map LEB L which is mapped to PEB P. Since P is not synchronously erased, but just scheduled for erasure, there might be "surprises" in the case of unclean reboots: if a reboot happens before P has been physically erased, L will be mapped to P again when UBI attaches the MTD device at the next bootup. Indeed, UBI will scan the MTD device and find the P which refers to L, and it will add this mapping information to the EBA table.

However, once you write any data to L, or map it using the LEB map operation, it gets mapped to a new PEB and the old contents are gone forever, because even in the case of an unclean reboot UBI would pick the newer mapping for L.


Implementation details


This section describes how UBI distinguishes between older and newer versions of an LEB in the case of an unclean reboot. Suppose we un-map LEB L which is mapped to PEB P1, which means UBI schedules P1 for erasure. Then we write some data to L, which means that UBI finds another PEB P2, maps L to P2, and writes the data to P2. If an unclean reboot happens before P1 is physically erased, but after the write operation, we end up with 2 PEBs (P1 and P2) mapped to the same LEB L.

To handle situations like this, UBI maintains a global 64-bit sequence number variable. The sequence number variable is incremented each time a PEB is mapped to a LEB and its value is stored in the VID header of the PEB. So each VID header has a unique sequence number, and the larger the sequence number, the "younger" the VID header. When UBI attaches MTD devices, it initializes the global sequence number variable to the highest value found in the existing VID headers plus one.


In the above situation, UBI simply selects a PEB with the highest sequence number (P2) and drops the PEB with the lower sequence number (P1).
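
The selection rule above can be sketched in a few lines of C. This is only an illustration, not UBI's actual scanning code: struct vid_info and both function names are made up for the example, and the structure holds just the three values the text talks about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of what a scan learns from one VID header. */
struct vid_info {
    int      pnum;   /* physical eraseblock number */
    int      lnum;   /* logical eraseblock the VID header refers to */
    uint64_t sqnum;  /* sequence number stored in the VID header */
};

/*
 * Return the index of the entry with the highest sequence number among
 * those that refer to LEB `lnum`, or -1 if no entry refers to it.
 */
int newest_peb_for_leb(const struct vid_info *v, size_t n, int lnum)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (v[i].lnum != lnum)
            continue;
        if (best < 0 || v[i].sqnum > v[best].sqnum)
            best = (int)i;
    }
    return best;
}

/*
 * Value the global counter starts from after the scan: highest value
 * found plus one (1 if no VID headers were found at all).
 */
uint64_t next_sqnum(const struct vid_info *v, size_t n)
{
    uint64_t max = 0;

    for (size_t i = 0; i < n; i++)
        if (v[i].sqnum > max)
            max = v[i].sqnum;
    return max + 1;
}
```

With two entries for the same LEB, the one written later wins, and the other PEB is the one to drop and erase.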


Note that the situation is more difficult if an unclean reboot happens while UBI moves the contents of one PEB to another for wear-leveling purposes, or during an atomic LEB change operation. In these cases it is not enough to pick the newer PEB; it is also necessary to make sure the data actually reached the new PEB.


LEB map

The LEB map operation maps a previously un-mapped logical eraseblock (LEB) to a physical eraseblock (PEB). For example, if the operation is run for LEB A, UBI will find an appropriate PEB, write a VID header to the PEB, and amend the in-memory EBA table. The VID header will now refer to LEB A. After this operation all I/O to LEB A will actually go to the mapped PEB.

The LEB map operation is available via the ubi_leb_map() UBI kernel API function, or via the UBI_IOCEBMAP volume character device ioctl command. However, this ioctl interface is available only starting from kernel version 2.6.29.


One of the functions of the LEB map operation is to make sure old LEB contents are removed. As explained in the LEB un-map section, when an LEB is un-mapped, the corresponding PEB is not erased immediately. If an unclean reboot happens, the LEB may become mapped to the same PEB again after UBI attaches the MTD device. So, if you map the LEB immediately after un-mapping it, you are guaranteed that the old LEB contents are gone. In other words, the LEB is guaranteed to contain only 0xFF bytes after the map operation returns, even in case of an unclean reboot.
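
From user space, the "un-map, then map" sequence might look like the sketch below. The function name ubi_leb_wipe is invented for this example. It assumes the definitions from mtd/ubi-user.h, namely that UBI_IOCEBUNMAP takes a pointer to a 32-bit LEB number and UBI_IOCEBMAP takes a pointer to struct ubi_map_req, and it assumes a kernel in which the remaining fields of that structure may be left zero. Check the header on your system before relying on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <mtd/ubi-user.h>

/*
 * Un-map LEB `lnum` and immediately map it again, so that the old contents
 * cannot reappear after an unclean reboot. `vol_fd` is an open descriptor
 * of a UBI volume character device. Returns 0 on success and -1 on error
 * (errno is set by ioctl()).
 */
int ubi_leb_wipe(int vol_fd, int32_t lnum)
{
    struct ubi_map_req req;

    if (ioctl(vol_fd, UBI_IOCEBUNMAP, &lnum) < 0)
        return -1;

    memset(&req, 0, sizeof(req));   /* assumption: other fields may stay 0 */
    req.lnum = lnum;
    if (ioctl(vol_fd, UBI_IOCEBMAP, &req) < 0)
        return -1;

    return 0;
}
```

The descriptor would normally come from opening the volume node for writing, and the LEB number must be valid for that volume.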


Please use the LEB map operation sparingly. Do not use it unless it is really needed, because mapped LEBs put more overhead on the UBI wear-leveling sub-system than un-mapped LEBs. Indeed, if an LEB is un-mapped, there is no PEB which contains this LEB's data, and the wear-leveling sub-system does not have to move any data to maintain wear-leveling. Conversely, if the LEB is mapped to a PEB, there is one more PEB for the wear-leveling sub-system to care about, and one more LEB to re-map to another PEB if the erase counter of the current PEB becomes too low (then the LEB is re-mapped to a PEB with a higher erase counter and the old PEB is used for other operations).


Volume update

The volume update operation is useful for device software updates. It replaces the contents of the whole UBI volume with new contents. If it gets interrupted in the middle of the update, the volume goes into the "corrupted" state and further I/O on the volume fails with an EBADF error. The only way to get the volume back to the normal state is to start a new volume update operation and finish it.


Because an interrupted update leaves the volume marked as "corrupted", software can detect the interruption and re-start the update with the help of, for example, a "mirror" volume with the same contents, or by showing a dialog window which informs the user about the problem and requests re-flashing. In contrast, it is difficult to detect interrupted updates when using raw MTD partitions.


The volume update operation is available via the user-space UBI interface and not available via the UBI kernel API. To update a volume, you first have to call the UBI_IOCVOLUP ioctl on the corresponding UBI volume character device node and pass it a pointer to a 64-bit value containing the length of the new volume contents in bytes. Then this number of bytes has to be written to the volume character device node. Once the last byte has been sent to the character device node, the update operation is finished. Conceptually, the sequence (in pseudo-code) is:

fd = open("/dev/my_volume", O_RDWR);
ioctl(fd, UBI_IOCVOLUP, &image_size);   /* image_size is a 64-bit integer */
write(fd, buf, image_size);

See include/mtd/ubi-user.h for more details. Bear in mind, the old contents of the volume are not preserved if the update is interrupted. Also, you do not have to write all the new data in one go. It is OK to call the write() function an arbitrary number of times and pass arbitrary amounts of data each time. The operation will be finished after all the data has been written. If the last write operation contains more bytes than UBI expects, the extra is ignored.
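
A slightly more complete version of the sequence is sketched below. The helper names write_all and ubi_update_volume are invented for the example; the only UBI-specific call is the UBI_IOCVOLUP ioctl from mtd/ubi-user.h, which is given a pointer to a 64-bit byte count as described above. The write loop shows that the data may be delivered in several write() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/ubi-user.h>

/* Write the whole buffer, continuing after short writes and EINTR. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Replace the contents of the volume behind `vol_fd` with `image`.
 * Returns 0 on success and -1 on error. On error the update is left
 * unfinished, so the volume stays in the "corrupted" state until a
 * later update completes.
 */
int ubi_update_volume(int vol_fd, const void *image, size_t size)
{
    int64_t bytes = (int64_t)size;

    if (ioctl(vol_fd, UBI_IOCVOLUP, &bytes) < 0)
        return -1;
    return write_all(vol_fd, image, size);
}
```

A real updater would usually stream the image from a file in chunks instead of holding it in memory; the loop structure stays the same.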

A special case of the volume update operation is what we call volume truncation, which is done by the same ioctl command when the data length is zero. In this case the volume is wiped out and will contain all 0xFF bytes (all LEBs will be un-mapped).


Note, the /sys/class/ubi/ubiX_X/corrupted sysfs file reflects the "corrupted" state of the volume: it contains ASCII "0\n" if the volume is OK and "1\n" if it is corrupted (i.e. if a volume update was started but was not completed).
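
A program could check this file before using a volume. The sketch below is one way to do it; the function names are invented for the example, and the caller supplies the path of the sysfs file for the volume in question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Interpret the contents of a volume's "corrupted" sysfs file.
 * Returns 0 for "0\n" (volume is OK), 1 for "1\n" (an update was started
 * but not completed), and -1 for anything else.
 */
int parse_corrupted(const char *text)
{
    if (strcmp(text, "0\n") == 0)
        return 0;
    if (strcmp(text, "1\n") == 0)
        return 1;
    return -1;
}

/* Read and interpret the file at `path`; -1 means it could not be read. */
int volume_is_corrupted(const char *path)
{
    char buf[8] = "";
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return parse_corrupted(buf);
}
```

A result of 1 tells the caller that the volume has to be updated again before it can be used.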

The volume update operation is not atomic: the previous contents of the volume are not preserved if the update is interrupted. However, UBI does provide atomic volume updates by means of the volume re-name operation.


Volume updates are implemented with the help of update markers. Once the user has issued the UBI_IOCVOLUP ioctl, UBI sets the update marker flag for the volume in the corresponding record of the UBI volume table. At this point the volume is wiped, and UBI waits for the user to send the data. The update marker is cleared only when all the data has been sent and successfully written to the flash. If the update is interrupted (e.g., unclean reboot, crash of the update application, etc.), the update marker stays set and the volume is treated as "corrupted" until a later update operation completes successfully.
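
The life cycle of the update marker can be modelled with a few lines of C. This is a toy model written for illustration, not UBI code: struct vol_model and the model_* functions are invented, and zero-length updates (truncation) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one volume table record; not UBI's real data structure. */
struct vol_model {
    bool    upd_marker;  /* set while an update is in progress */
    int64_t expected;    /* bytes announced when the update started */
    int64_t received;    /* bytes accepted so far */
};

/* Models the UBI_IOCVOLUP step: the marker is set and the volume wiped. */
void model_start_update(struct vol_model *v, int64_t bytes)
{
    v->upd_marker = true;
    v->expected = bytes;
    v->received = 0;
}

/* Models write(): accept data, ignore excess, clear the marker when done. */
void model_write(struct vol_model *v, int64_t len)
{
    if (!v->upd_marker)
        return;
    if (len > v->expected - v->received)
        len = v->expected - v->received;  /* extra bytes are ignored */
    v->received += len;
    if (v->received == v->expected)
        v->upd_marker = false;            /* update completed */
}

/* A volume whose marker is still set is treated as corrupted. */
bool model_corrupted(const struct vol_model *v)
{
    return v->upd_marker;
}
```

Since the marker lives in the volume table on flash, a reboot in the middle of the sequence finds it still set, which is how the interruption is noticed later.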

Atomic LEB change

The atomic LEB change operation changes the contents of an LEB atomically, so that the old contents are preserved should the operation be interrupted. In other words, the LEB will always contain either the old contents or the new contents. This functionality is available via the ubi_leb_change() kernel API call.


The user-space interface for this operation was added in kernel version 2.6.25. Its functionality is available to user-space via the UBI_IOCEBCH ioctl command. You have to pass a pointer to a properly-filled request object of struct ubi_leb_change_req type. This object stores the LEB number to change and the length of the new contents. Then you have to write the specified number of bytes to the volume character device. Note the similarity to the volume update operation. Conceptually, the sequence (in pseudo-code) is:

struct ubi_leb_change_req req;

req.lnum = lnum_to_change;
req.bytes = data_len;   /* the length field of the request is named "bytes" */
fd = open("/dev/my_volume", O_RDWR);
ioctl(fd, UBI_IOCEBCH, &req);
write(fd, data_buf, data_len);

If, for some reason, the user does not write the specified number of bytes to the file descriptor before closing the file, the operation is cancelled and the old contents of the LEB are preserved.


Similarly to the volume update operation, it does not matter how many times the write() function is called and how much data it passes to the UBI volume each time. The atomic LEB change operation finishes only once the last data byte has arrived.
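
As a compilable sketch, the sequence could be wrapped as follows. The function name ubi_leb_change_atomic is invented for the example. It assumes that struct ubi_leb_change_req in mtd/ubi-user.h has the fields lnum and bytes and that any remaining fields may be left zero; verify this against the header you build with.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/ubi-user.h>

/*
 * Atomically replace the contents of LEB `lnum` with `len` bytes from
 * `data`. `vol_fd` is an open descriptor of a UBI volume character device.
 * Returns 0 on success and -1 on error. If the data is not written
 * completely, the old contents of the LEB are preserved.
 */
int ubi_leb_change_atomic(int vol_fd, int32_t lnum,
                          const void *data, int32_t len)
{
    struct ubi_leb_change_req req;
    const char *p = data;
    int32_t left = len;

    memset(&req, 0, sizeof(req));   /* assumption: other fields may stay 0 */
    req.lnum = lnum;
    req.bytes = len;
    if (ioctl(vol_fd, UBI_IOCEBCH, &req) < 0)
        return -1;

    /* The data may arrive in any number of write() calls. */
    while (left > 0) {
        ssize_t n = write(vol_fd, p, (size_t)left);

        if (n <= 0) {
            if (n < 0 && errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (int32_t)n;
    }
    return 0;
}
```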


The atomic LEB change operation can be very useful for file-systems; for example, UBIFS uses it when it commits the file-system index. It could also be used to create an FTL layer on top of UBI.


Keep in mind that the atomic LEB change operation calculates the CRC-32 checksum of the new data, so it has some overhead compared to the "LEB erase" + "LEB write" sequence. The volume update operation does not calculate the data's CRC-32 checksum, so it is faster to update the volume than it is to atomically change all its eraseblocks. Keep this overhead in mind and be sure to only use this operation if/when atomicity is really needed.


Implementation details


Suppose UBI has to change a logical eraseblock L which is mapped to a physical eraseblock P1. UBI always has one free PEB reserved for the atomic LEB change operation; call it P2. Before the operation, P1 stores the current contents of the LEB L and P2 is free (it contains only the EC header and 0xFF bytes). The new data is written to P2, not to P1, so should anything go wrong, the old contents of the LEB are maintained.


When the operation finishes, UBI un-maps L from P1, maps it to P2, and schedules P1 for erasure. If the operation is interrupted, L continues to be mapped to P1 and P2 is scheduled for erasure.


If an unclean reboot happens half way through the atomic LEB change operation, UBI has to preserve the L -> P1 mapping and erase P2 when it attaches the MTD device on the next boot. But if an unclean reboot happens just after the operation finishes, before P1 is physically erased, UBI has to preserve the L -> P2 mapping and erase P1. In both cases UBI finds two PEBs that refer to L, and it has to tell the two situations apart.


To resolve situations like this, UBI calculates the CRC-32 checksum of the new contents of the LEB before they are written to the flash, and stores it in the VID header (together with the data length). When UBI finds 2 PEBs P1 and P2 mapped to the same LEB L during initialization, it selects the one with the higher sequence number (P2) only if the data CRC-32 checksum is correct (which means that all data has been written to the flash media); otherwise it selects the PEB with the lower sequence number (P1). Of course, UBI has to read the LEB contents in order to verify the CRC-32 checksum.
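
The decision can be illustrated with the following sketch. It is not UBI's implementation: struct peb_copy and pick_copy are invented for the example, and crc32_calc is a plain bitwise CRC-32 that stands in for whatever CRC-32 variant and parameters UBI really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t crc32_calc(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical summary of one PEB found for an LEB during attach. */
struct peb_copy {
    uint64_t       sqnum;     /* sequence number from the VID header */
    uint32_t       data_crc;  /* CRC-32 recorded in the VID header */
    size_t         data_len;  /* data length recorded in the VID header */
    const uint8_t *data;      /* what is actually stored on the flash */
};

/* Return the copy to keep: the newer one only if its data is intact. */
const struct peb_copy *pick_copy(const struct peb_copy *a,
                                 const struct peb_copy *b)
{
    const struct peb_copy *newer = a->sqnum > b->sqnum ? a : b;
    const struct peb_copy *older = newer == a ? b : a;

    if (crc32_calc(newer->data, newer->data_len) == newer->data_crc)
        return newer;
    return older;
}
```

If the write to the newer PEB was cut short, the stored checksum does not match the data on the flash and the older copy is kept.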



Fastmap

Fastmap is an experimental and optional UBI feature, which can be enabled by setting CONFIG_MTD_UBI_FASTMAP to 'y'. Once it is enabled, UBI evaluates the module parameter "fm_autoconvert". If it is set to 1 (the default is 0), UBI automatically enables fastmap for any attached image. This means UBI creates a new internal volume with the fastmap data, so that the next time the image is attached, the fast attach mode can be used.

In the default configuration UBI will use the information stored in this fastmap volume to accelerate the attach procedure. If you want to test fastmap, set fm_autoconvert to 1 and attach a volume.


The following settings are possible:

  • CONFIG_MTD_UBI_FASTMAP=n: fastmap is completely disabled.
  • CONFIG_MTD_UBI_FASTMAP=y and fm_autoconvert=0: UBI will use the fastmap data if it exists on an image, but will not install a fastmap on images that don't already have it.
  • CONFIG_MTD_UBI_FASTMAP=y and fm_autoconvert=1: UBI will use the fastmap data if it exists on an image, and a fastmap is automatically created on all attached images.


Backwards compatibility


The fastmap on-disk data structure makes use of delete-compatible volumes, therefore fastmap-enabled images are fully backwards compatible with UBI implementations which do not support fastmap. Such a kernel will remove the fastmap volumes and continue with scanning. This applies not only to kernel versions v3.6 and older, but also to v3.7 and newer with this option disabled.


Technical design


An on-disk fastmap contains all the information required to attach the whole image, including all erase counter values, a list of all PEBs and their state, a list of all volumes and their current EBA, etc. To avoid too many writes of the fastmap, it also contains a list of PEBs which may have changed and need a full scan while attaching. This list is called the "fastmap pool" and has a fixed size of 5% of the total number of PEBs. By design, UBI needs to write the fastmap data only when the pool contains no free PEBs. Without the pool, it would have to write the fastmap each time the EBA of a volume changed.
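
The effect of the pool on the number of fastmap writes can be seen in a small model. It is a simplification made up for this text: the function names are invented, the pool is sized at exactly 5% as stated above (a real implementation may bound it differently), and every EBA change is assumed to consume one PEB from the pool.

```c
#include <assert.h>

/* Pool size as described in the text: 5% of all PEBs. */
int fm_pool_size(int peb_count)
{
    return peb_count * 5 / 100;
}

/*
 * Count how many fastmap writes `allocations` PEB allocations cause in
 * this model: PEBs are handed out from the pool, and the fastmap is
 * rewritten only when the pool has run empty and must be refilled.
 */
int fm_writes_needed(int peb_count, int allocations)
{
    int pool = fm_pool_size(peb_count);
    int free_in_pool = pool;
    int writes = 0;

    if (pool <= 0)
        return allocations;       /* no pool: every change needs a write */

    for (int i = 0; i < allocations; i++) {
        if (free_in_pool == 0) {
            free_in_pool = pool;  /* refill the pool ... */
            writes++;             /* ... and write a new fastmap */
        }
        free_in_pool--;
    }
    return writes;
}
```

In this model a device with 1000 PEBs has a pool of 50 PEBs, so 120 allocations cost 2 fastmap writes instead of 120.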

A fastmap consists of a super-block (also known as an anchor PEB) and payload data which can live on any PEB. The anchor PEB has to be located within the first 64 PEBs on the MTD device. It contains pointers to the remaining PEBs which carry the actual fastmap data. On modern NAND chips the whole fastmap fits into a single PEB. Hence, the anchor PEB points to itself. After loading the fastmap data, the UBI attach information structure is created from it.

The attach process works as follows:


  1. UBI tries to find the fastmap anchor PEB; if no anchor PEB is found, UBI performs a traditional full scan.
  2. It follows the pointers stored in the anchor PEB and reads the fastmap payload data.
  3. Then it performs a traditional scan only on the PEBs in the pool instead of all PEBs.

If UBI detects that the fastmap data is invalid or corrupt it automatically falls back to scanning mode and performs a full scan. Using a CRC32 checksum and consistency checks of the internal UBI structures UBI is able to detect whether the fastmap data is invalid or not.
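
The attach decision, including the fallback, can be summarised in a short sketch. struct image_model and pebs_to_scan are invented for the example; the model only answers how many PEBs need a traditional scan and ignores the cost of looking for the anchor and reading the fastmap itself.

```c
#include <assert.h>
#include <stdbool.h>

#define FM_ANCHOR_SEARCH_LIMIT 64  /* anchor must be within the first 64 PEBs */

/* Hypothetical description of an image, as far as this sketch cares. */
struct image_model {
    int  peb_count;      /* total number of PEBs on the MTD device */
    int  anchor_pnum;    /* PEB holding the fastmap anchor, or -1 if none */
    bool fastmap_valid;  /* CRC32 and consistency checks passed */
    int  pool_size;      /* number of PEBs in the fastmap pool */
};

/* Return how many PEBs have to be scanned to attach the image. */
int pebs_to_scan(const struct image_model *img)
{
    bool have_anchor = img->anchor_pnum >= 0 &&
                       img->anchor_pnum < FM_ANCHOR_SEARCH_LIMIT;

    if (!have_anchor || !img->fastmap_valid)
        return img->peb_count;  /* fall back to a full scan */
    return img->pool_size;      /* scan only the PEBs in the pool */
}
```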


The fastmap data is written to the device each time the fastmap pool becomes full (i.e. no free PEBs are available), the volume layout changes, or the image is detached. The fastmap has to be written at detach time because otherwise all erase counter modifications since the last fastmap write would be lost.




A fastmap-enabled UBI will reserve enough PEBs to carry two complete fastmaps. In practice on modern NAND chips two PEBs are reserved for fastmap.


There is also some runtime overhead. In order to guarantee that the new fastmap is valid and consistent, UBI needs to make sure that all I/O which would cause EBA changes is blocked while attaching. Depending on the specific flash chips, this can take up to one second. Therefore, fastmap only makes sense on fast and large flash devices where a full scan would otherwise take too long. For example, on 4GiB NAND chips a full scan takes several seconds, whereas a fast attach needs less than one second.



Enabling fastmap does not guarantee that every attach process will be done in optimal time. In some situations a full scan is still needed. This can happen in two cases: (i) an unexpected reboot occurs while a fastmap is being written to the flash, or (ii) UBI runs out of PEBs while writing the fastmap. The latter case can happen if a large number of I/O errors occur while writing and UBI cannot find enough usable PEBs.


R/O block devices on top of UBI volumes


UBI allows the creation of block devices on top of UBI volumes with the following limitations:


  • Read-only operation.
  • Serialized I/O operation, but keep in mind the NAND driver core already serializes all I/O too.

Despite these limitations, a block device is still very useful for mounting read-only, regular file systems on top of UBI volumes. Take, for example, squashfs, which can be used as a lightweight read-only rootfs on top of a NAND device. In this case, the UBI layer will take care of low-level details such as bit-flip handling and wear-leveling.




Creating and destroying block devices on a UBI volume is somewhat similar to attaching MTD devices to UBI. You can either use the block UBI module parameter or use the "ubiblock" user-space tool.

In order to create a block device at bootup time (e.g. to mount the rootfs on such a block device) you can specify the block parameter as a kernel boot argument:


ubi.mtd=5 ubi.block=0,0 root=/dev/ubiblock0_0

There are several ways of specifying a volume:


  • Using the UBI volume path:

  • Using the UBI device, and the volume name:

  • Using both the UBI device number and the UBI volume number:


If you've built UBI as a module you can use the following parameters at module load time:


$ modprobe ubi mtd=/dev/mtd5 block=/dev/ubi0_0

A block device can also be created/removed dynamically at runtime, using the ubiblock user-space tool:


$ ubiblock --create /dev/ubi0_0
$ ubiblock --remove /dev/ubi0_0

UBI stress testing


If enabled when configuring (right before building the code), mtd-utils includes user-space tools that can be used to stress test the UBI stack. This is useful if you want to test the stability and correctness of your particular UBI stack implementation.


Example: running various UBI tests:


$ flash_erase /dev/mtd3 0 0
$ ubiattach --mtdn 3
$ /usr/libexec/mtd-utils/runubitests.sh /dev/ubi0

More documentation

Unfortunately, no complete, up-to-date design documents exist for UBI. But there is an old UBI design document which has some out-of-date information which might still be of limited use: ubidesign.pdf.

There is also a PowerPoint UBI presentation available: ubi.ppt. Note, this document contains a lot of animations, so be sure to view it in "slide show" mode (F5 key) so that the animations will be played.

More information may be found in the FAQ section.

And of course just reading the UBI interface C header files (which are well commented) may help: include/mtd/ubi-user.h contains the user-space interface definition (namely, it defines UBI ioctl commands and the associated data structures), include/linux/mtd/ubi.h defines the kernel API, and drivers/mtd/ubi/kapi.c contains comments for each kernel API function (just above the body of the function).





