UBI Notes

UBI Structure

UBI uses an abstract model of flash. In short, from UBI’s point of view the flash (or MTD device) consists of eraseblocks, which may be good or bad. Each good eraseblock may be read from, written to, or erased. Good eraseblocks may also be marked as bad.

Important Concepts

  • UBI Header (EC Header & VID Header)
  • PEB & LEB (Physical EraseBlock & Logical EraseBlock)
  • UBI Volume (A set of consecutive LEBs. Each LEB may be mapped to any PEB)
  • Volume Table (A special-purpose UBI volume which contains information about each volume on this UBI device)
  • Min Flash I/O Unit

UBI Headers

Header’s position

The EC header always resides at offset 0 and takes 64 bytes. The VID header resides at the next available min. I/O unit or sub-page boundary and also takes 64 bytes. For example:

  • in case of NOR flash which has 1 byte min. I/O unit, the VID header resides at offset 64;
  • in case of NAND flash which does not have sub-pages, the VID header resides at the second NAND page;
  • in case of NAND flash which has sub-pages, the VID header resides at the second sub-page.
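The three cases above all follow from rounding the 64-byte EC header up to the unit in which headers are written. Here is a minimal sketch of that arithmetic. The helper names `align_up`, `vid_hdr_offset` and `data_offset` are made up for illustration, and the rule used for the data offset (round the end of the VID header up to the min. I/O unit) is an assumption that the text above does not state.

```c
#include <assert.h>
#include <stdint.h>

#define UBI_HDR_SIZE 64u /* both the EC and the VID header take 64 bytes */

/* Round x up to the next multiple of unit (unit must be non-zero). */
uint32_t align_up(uint32_t x, uint32_t unit)
{
	return (x + unit - 1) / unit * unit;
}

/*
 * Offset of the VID header inside a PEB. hdr_io_unit is the sub-page size
 * if the flash has sub-pages, otherwise the min. I/O unit.
 */
uint32_t vid_hdr_offset(uint32_t hdr_io_unit)
{
	return align_up(UBI_HDR_SIZE, hdr_io_unit);
}

/*
 * ASSUMPTION (not stated in the text): user data starts at the first
 * min. I/O unit boundary after the VID header.
 */
uint32_t data_offset(uint32_t vid_offset, uint32_t min_io_unit)
{
	return align_up(vid_offset + UBI_HDR_SIZE, min_io_unit);
}
```

For NOR flash, `vid_hdr_offset(1)` gives 64. For NAND with 2048-byte pages and no sub-pages, `vid_hdr_offset(2048)` gives 2048, the second page. With 512-byte sub-pages, `vid_hdr_offset(512)` gives 512, the second sub-page.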

EC Header (Erase Counter Header)

struct ubi_ec_hdr {
	__be32  magic;          /* erase counter header magic number (%UBI_EC_HDR_MAGIC = "UBI#") */
	__u8    version;        /* version of UBI implementation which is supposed to accept this UBI image */
	__u8    padding1[3];    /* reserved for future, zeroes */
	__be64  ec;             /* the erase counter. Warning: the current limit is 31-bit anyway! */
	__be32  vid_hdr_offset; /* where the VID header starts */
	__be32  data_offset;    /* where the user data start */
	__be32  image_seq;      /* image sequence number */
	__u8    padding2[32];   /* reserved for future, zeroes */
	__be32  hdr_crc;        /* erase counter header CRC checksum */
} __packed;
/**
 * struct ubi_ec_hdr - UBI erase counter header.
 *
 * The erase counter header takes 64 bytes and has plenty of unused space for
 * future usage. The unused fields are zeroed. The @version field is used to
 * indicate the version of UBI implementation which is supposed to be able to
 * work with this UBI image. If @version is greater than the current UBI
 * version, the image is rejected. This may be useful in future if something
 * is changed radically. This field is duplicated in the volume identifier
 * header.
 *
 * The @vid_hdr_offset and @data_offset fields contain the offset of the
 * volume identifier header and user data, relative to the beginning of the
 * physical eraseblock. These values have to be the same for all physical
 * eraseblocks.
 *
 * The @image_seq field is used to validate a UBI image that has been prepared
 * for a UBI device. The @image_seq value can be any value, but it must be the
 * same on all eraseblocks. UBI will ensure that all new erase counter headers
 * also contain this value, and will check the value when attaching the flash.
 * One way to make use of @image_seq is to increase its value by one every time
 * an image is flashed over an existing image, then, if the flashing does not
 * complete, UBI will detect the error when attaching the media.
 */

VID Header (Volume Identifier Header)

struct ubi_vid_hdr {
	__be32  magic;        /* volume identifier header magic number (%UBI_VID_HDR_MAGIC = "UBI!") */
	__u8    version;      /* UBI implementation version which is supposed to accept this UBI image */
	__u8    vol_type;     /* volume type (%UBI_VID_DYNAMIC or %UBI_VID_STATIC) */
	__u8    copy_flag;    /* if this logical eraseblock was copied from another physical eraseblock (for wear-leveling reasons) */
	__u8    compat;       /* compatibility of this volume (%0, %UBI_COMPAT_DELETE, %UBI_COMPAT_IGNORE, %UBI_COMPAT_PRESERVE, or %UBI_COMPAT_REJECT) */
	__be32  vol_id;       /* ID of this volume */
	__be32  lnum;         /* logical eraseblock number */
	__u8    padding1[4];  /* reserved for future, zeroes */
	__be32  data_size;    /* how many bytes of data this logical eraseblock contains */
	__be32  used_ebs;     /* total number of used logical eraseblocks in this volume */
	__be32  data_pad;     /* how many bytes at the end of this physical eraseblock are not used */
	__be32  data_crc;     /* CRC checksum of the data stored in this logical eraseblock */
	__u8    padding2[4];  /* reserved for future, zeroes */
	__be64  sqnum;        /* sequence number */
	__u8    padding3[12]; /* reserved for future, zeroes */
	__be32  hdr_crc;      /* volume identifier header CRC checksum */
} __packed;
/**
 * struct ubi_vid_hdr - on-flash UBI volume identifier header.
 *
 * The @sqnum is the value of the global sequence counter at the time when this
 * VID header was created. The global sequence counter is incremented each time
 * UBI writes a new VID header to the flash, i.e. when it maps a logical
 * eraseblock to a new physical eraseblock. The global sequence counter is an
 * unsigned 64-bit integer and we assume it never overflows. The @sqnum
 * (sequence number) is used to distinguish between older and newer versions of
 * logical eraseblocks.
 *
 * There are 2 situations when there may be more than one physical eraseblock
 * corresponding to the same logical eraseblock, i.e., having the same @vol_id
 * and @lnum values in the volume identifier header. Suppose we have a logical
 * eraseblock L and it is mapped to the physical eraseblock P.
 *
 * 1. Because UBI may erase physical eraseblocks asynchronously, the following
 * situation is possible: L is asynchronously erased, so P is scheduled for
 * erasure, then L is written to, i.e. mapped to another physical eraseblock P1,
 * so P1 is written to, then an unclean reboot happens. Result - there are 2
 * physical eraseblocks P and P1 corresponding to the same logical eraseblock
 * L. But P1 has greater sequence number, so UBI picks P1 when it attaches the
 * flash.
 *
 * 2. From time to time UBI moves logical eraseblocks to other physical
 * eraseblocks for wear-leveling reasons. If, for example, UBI moves L from P
 * to P1, and an unclean reboot happens before P is physically erased, there
 * are two physical eraseblocks P and P1 corresponding to L and UBI has to
 * select one of them when the flash is attached. The @sqnum field says which
 * PEB is the original (obviously P will have lower @sqnum) and the copy. But
 * it is not enough to select the physical eraseblock with the higher sequence
 * number, because the unclean reboot could have happened in the middle of the
 * copying process, so the data in P is corrupted. It is also not enough to
 * just select the physical eraseblock with lower sequence number, because the
 * data there may be old (consider a case if more data was added to P1 after
 * the copying). Moreover, the unclean reboot may happen when the erasure of P
 * was just started, so it results in unstable P, which is "mostly" OK, but
 * still has unstable bits.
 *
 * UBI uses the @copy_flag field to indicate that this logical eraseblock is a
 * copy. UBI also calculates data CRC when the data is moved and stores it at
 * the @data_crc field of the copy (P1). So when UBI needs to pick one physical
 * eraseblock of two (P or P1), the @copy_flag of the newer one (P1) is
 * examined. If it is cleared, the situation is simple and the newer one is
 * picked. If it is set, the data CRC of the copy (P1) is examined. If the CRC
 * checksum is correct, this physical eraseblock is selected (P1). Otherwise
 * the older one (P) is selected.
 *
 * There are 2 sorts of volumes in UBI: user volumes and internal volumes.
 * Internal volumes are not seen from outside and are used for various internal
 * UBI purposes. In this implementation there is only one internal volume - the
 * layout volume. Internal volumes are the main mechanism of UBI extensions.
 * For example, in future one may introduce a journal internal volume. Internal
 * volumes have their own reserved range of IDs.
 *
 * The @compat field is only used for internal volumes and contains the "degree
 * of their compatibility". It is always zero for user volumes. This field
 * provides a mechanism to introduce UBI extensions and to be still compatible
 * with older UBI binaries. For example, if someone introduced a journal in
 * future, he would probably use %UBI_COMPAT_DELETE compatibility for the
 * journal volume.  And in this case, older UBI binaries, which know nothing
 * about the journal volume, would just delete this volume and work perfectly
 * fine. This is similar to what Ext2fs does when it is fed by an Ext3fs image
 * - it just ignores the Ext3fs journal.
 *
 * The @data_crc field contains the CRC checksum of the contents of the logical
 * eraseblock if this is a static volume. In case of dynamic volumes, it does
 * not contain the CRC checksum as a rule. The only exception is when the
 * data of the physical eraseblock was moved by the wear-leveling sub-system,
 * then the wear-leveling sub-system calculates the data CRC and stores it in
 * the @data_crc field. And of course, the @copy_flag is %1 in this case.
 *
 * The @data_size field is used only for static volumes because UBI has to know
 * how many bytes of data are stored in this eraseblock. For dynamic volumes,
 * this field usually contains zero. The only exception is when the data of the
 * physical eraseblock was moved to another physical eraseblock for
 * wear-leveling reasons. In this case, UBI calculates CRC checksum of the
 * contents and uses both @data_crc and @data_size fields. In this case, the
 * @data_size field contains data size.
 *
 * The @used_ebs field is used only for static volumes and indicates how many
 * eraseblocks the data of the volume takes. For dynamic volumes this field is
 * not used and always contains zero.
 *
 * The @data_pad is calculated when volumes are created using the alignment
 * parameter. So, effectively, the @data_pad field reduces the size of logical
 * eraseblocks of this volume. This is very handy when one uses block-oriented
 * software (say, cramfs) on top of the UBI volume.
 */
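The rule for choosing between two PEBs that claim the same logical eraseblock (compare @sqnum, then look at @copy_flag and @data_crc of the newer one) can be sketched as follows. The `peb_info` structure and `pick_peb` function are hypothetical names for illustration, not the kernel's own data structures, and the data CRC check is reduced to a precomputed flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-RAM summary of one scanned PEB (illustrative names). */
struct peb_info {
	uint64_t sqnum;       /* @sqnum from the VID header */
	int      copy_flag;   /* @copy_flag from the VID header */
	int      data_crc_ok; /* 1 if the stored data matches @data_crc */
};

/* Given two PEBs with the same @vol_id and @lnum, return the one to keep. */
const struct peb_info *pick_peb(const struct peb_info *a,
				const struct peb_info *b)
{
	const struct peb_info *newer = a->sqnum > b->sqnum ? a : b;
	const struct peb_info *older = newer == a ? b : a;

	/* Not a wear-leveling copy: the newer PEB simply wins. */
	if (!newer->copy_flag)
		return newer;

	/* A copy is trusted only if the copying finished, i.e. the CRC matches. */
	return newer->data_crc_ok ? newer : older;
}
```

With P at sequence number 10 and P1 at 11, P1 is picked unless it is a copy whose data CRC does not match, in which case the function falls back to P.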

When UBI attaches an MTD device, it has to scan it, read all headers, check the CRC-32 checksums, and store erase counters and the logical-to-physical eraseblock mapping information in RAM.
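As a rough illustration of the per-header check done during scanning, the sketch below validates a raw 64-byte EC header: it compares the magic bytes and recomputes the CRC over everything except the trailing @hdr_crc field. The function names are made up for this example. The CRC convention used here (reflected polynomial 0xEDB88320, seed 0xFFFFFFFF, no final inversion) is an assumption about how UBI uses the kernel CRC-32 routine and should be checked against the kernel sources before being relied on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EC_HDR_SIZE     64u
#define EC_HDR_SIZE_CRC (EC_HDR_SIZE - 4u) /* everything except hdr_crc */

/*
 * Bitwise CRC-32, seeded with 0xFFFFFFFF, no final inversion.
 * ASSUMPTION: this matches the convention UBI uses for its header CRCs.
 */
uint32_t crc32_no_final_xor(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return crc;
}

/* Read a big-endian 32-bit value, as stored in the __be32 header fields. */
uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit value in big-endian byte order. */
void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Return 1 if the buffer holds an EC header with valid magic and CRC. */
int ec_hdr_ok(const uint8_t hdr[EC_HDR_SIZE])
{
	if (memcmp(hdr, "UBI#", 4) != 0)
		return 0;
	return get_be32(hdr + EC_HDR_SIZE_CRC) ==
	       crc32_no_final_xor(hdr, EC_HDR_SIZE_CRC);
}
```

A VID header could be checked the same way with the magic "UBI!", since it is also 64 bytes long and ends with @hdr_crc.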

UBI maps logical eraseblocks to physical eraseblocks. Besides the mapping, UBI also implements global wear-leveling and transparent I/O error handling.

After UBI has erased a PEB, it writes the EC header with an increased erase counter value. This means that PEBs always have the EC header, except for the short period of time after the erasure and before the EC header is written. Should an unclean reboot happen during this short period of time, the EC header is lost or becomes corrupted. In this case UBI writes a new EC header with an average erase counter just after the MTD device scanning is done.
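The recovery step for lost erase counters can be pictured with a small sketch. It assumes the scan result is an array of erase counters in which a sentinel marks PEBs whose EC header was lost. The sentinel `EC_UNKNOWN`, the function name and the use of a plain integer mean are choices made for this example and are not taken from the UBI sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "EC header lost or corrupted". */
#define EC_UNKNOWN UINT64_MAX

/*
 * Replace every unknown erase counter with the mean of the known ones.
 * Returns the mean that was used (0 if no counter was known).
 */
uint64_t fill_missing_ec(uint64_t *ec, size_t n)
{
	uint64_t sum = 0, known = 0, mean;

	for (size_t i = 0; i < n; i++) {
		if (ec[i] != EC_UNKNOWN) {
			sum += ec[i];
			known++;
		}
	}
	mean = known ? sum / known : 0;

	for (size_t i = 0; i < n; i++) {
		if (ec[i] == EC_UNKNOWN)
			ec[i] = mean;
	}
	return mean;
}
```

Using the average keeps a PEB with unknown history from looking either brand new or heavily worn to the wear-leveling logic.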

UBI maintains two per-PEB headers because it needs to write different information on flash at different moments of time:

  • after a PEB is erased, the EC header is written straight away, which minimizes the probability of losing the erase counter due to unclean reboots;
  • when UBI associates a PEB with an LEB, the VID header is written to the PEB.

When the EC header is written to a PEB, UBI does not yet know the volume ID and LEB number this PEB will be associated with. This is why UBI needs to do two separate write operations and to have two separate headers.

Volume Table

The volume table is an on-flash data structure which contains information about each volume on this UBI device. It is an array of volume table records. Each record contains the following information.

struct ubi_vtbl_record {
	__be32  reserved_pebs;            /* how many physical eraseblocks are reserved for this volume */
	__be32  alignment;                /* volume alignment */
	__be32  data_pad;                 /* how many bytes are unused at the end of each physical eraseblock to satisfy the requested alignment */
	__u8    vol_type;                 /* volume type (%UBI_DYNAMIC_VOLUME or %UBI_STATIC_VOLUME) */
	__u8    upd_marker;               /* if volume update was started but not finished */
	__be16  name_len;                 /* volume name length */
	__u8    name[UBI_VOL_NAME_MAX+1]; /* volume name */
	__u8    flags;                    /* volume flags (%UBI_VTBL_AUTORESIZE_FLG) */
	__u8    padding[23];              /* reserved, zeroes */
	__be32  crc;                      /* a CRC32 checksum of the record */
} __packed; /* sizeof(struct ubi_vtbl_record) = 172 bytes */
/**
 * struct ubi_vtbl_record - a record in the volume table.
 *
 * If the size of the logical eraseblock is large enough to fit
 * %UBI_MAX_VOLUMES records, the volume table contains %UBI_MAX_VOLUMES
 * records. Otherwise, it contains as many records as it can fit (i.e., size of
 * logical eraseblock divided by sizeof(struct ubi_vtbl_record)).
 *
 * The @upd_marker flag is used to implement volume update. It is set to %1
 * before update and set to %0 after the update. So if the update operation was
 * interrupted, UBI knows that the volume is corrupted.
 *
 * The @alignment field is specified when the volume is created and cannot be
 * later changed. It may be useful, for example, when a block-oriented file
 * system works on top of UBI. The @data_pad field is calculated using the
 * logical eraseblock size and @alignment. The alignment must be a multiple of the
 * minimal flash I/O unit. If @alignment is 1, all the available space of
 * the physical eraseblocks is used.
 *
 * Empty records contain all zeroes and the CRC checksum of those zeroes.
 */

Each record describes one UBI volume, and the record index in the volume table array corresponds to the volume ID, i.e., UBI volume 0 is described by record 0 in the volume table, and so on. The number of records in the volume table is limited by the LEB size but cannot be greater than 128, which means that a UBI device cannot have more than 128 volumes. If the LEB is large enough to hold 128 volume table records, the table holds all 128; otherwise it holds as many records as fit.
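The limit on the number of records is a one-line calculation: the LEB size divided by the 172-byte record size, capped at 128. A small sketch, with macro and function names chosen for this example:

```c
#include <assert.h>
#include <stdint.h>

#define VTBL_RECORD_SIZE 172u /* sizeof(struct ubi_vtbl_record) */
#define MAX_VOLUMES      128u /* upper bound on the number of volumes */

/* Number of volume table records that one LEB of the given size holds. */
uint32_t vtbl_slots(uint32_t leb_size)
{
	uint32_t fit = leb_size / VTBL_RECORD_SIZE;

	return fit < MAX_VOLUMES ? fit : MAX_VOLUMES;
}
```

For example, a 126976-byte LEB could fit 738 records and is capped at 128, while a 15872-byte LEB holds only 92.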

Volume Update

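During a volume update, UBI first sets the upd_marker of the corresponding volume in the volume table and clears it once the update has completed. If power is lost before the update finishes, UBI can tell at the next power-up that a volume update was left incomplete.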

When a volume update starts, UBI sets the update marker flag for the volume in the corresponding record of the UBI volume table. Then the volume is wiped out and UBI waits for the user to pass the data. Once all the data have arrived and have been written to the flash, the update marker is cleared. But in case of an interruption (e.g., unclean reboot, crash of the update application, etc.), the update marker is not cleared and the volume is treated as “corrupted”. Only a new successful update operation may clear the update marker.
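The life cycle of the update marker can be modelled in a few lines. The sketch below is a simplified in-memory model with hypothetical names (`vol_update`, `update_start`, `update_write`). It does not touch flash, and it only tracks how many bytes were announced and how many have arrived.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one volume being updated. */
struct vol_update {
	int      upd_marker; /* mirrors upd_marker in the volume table record */
	uint64_t expected;   /* bytes announced at the start of the update */
	uint64_t received;   /* bytes written so far */
};

/* Start an update: set the marker before any data is accepted. */
void update_start(struct vol_update *v, uint64_t total_bytes)
{
	v->expected = total_bytes;
	v->received = 0;
	/* With nothing to wait for, the update is complete immediately. */
	v->upd_marker = total_bytes != 0;
}

/* Accept a chunk of data; clear the marker once everything has arrived. */
void update_write(struct vol_update *v, uint64_t nbytes)
{
	v->received += nbytes;
	if (v->received >= v->expected)
		v->upd_marker = 0;
}

/* A volume whose marker is still set is treated as corrupted. */
int volume_is_corrupted(const struct vol_update *v)
{
	return v->upd_marker;
}
```

If the process is interrupted between `update_start` and the last `update_write`, the marker stays set, which is exactly the state a later attach can detect.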

Volume Table Update

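Every time a UBI volume is created, removed, re-sized, re-named or updated, the corresponding volume table record is changed. In our use case the first four operations are rare; volume updates are what we perform during upgrades. Before a volume is updated, the volume table is modified to set the upd_marker of the corresponding volume.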

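A sudden power cut may occur while the volume table is being updated. To keep the operation safe and avoid corrupting the volume table data, UBI maintains two copies of the volume table for reliability and power-cut tolerance reasons. UBI uses the following algorithm to update the volume table: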

  • Prepare in-memory buffer with the new volume table contents.
  • Un-map LEB0 of the layout volume.
  • Write the new volume table to LEB0.
  • Un-map LEB1 of the layout volume.
  • Write the new volume table to LEB1.
  • Flush the UBI work queue to make sure the PEBs corresponding to the un-mapped LEBs are erased.
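To see why this ordering tolerates a power cut, the steps that change what is on flash can be modelled as a tiny state machine. This is a simplified sketch with hypothetical names. It treats an un-mapped LEB and a partially written LEB alike, as an invalid copy, and it ignores the final work-queue flush.

```c
#include <assert.h>

enum copy_state {
	COPY_OLD,     /* intact copy with the previous contents */
	COPY_INVALID, /* un-mapped, or write interrupted by a power cut */
	COPY_NEW      /* intact copy with the new contents */
};

struct vtbl_copies {
	enum copy_state leb0;
	enum copy_state leb1;
};

/*
 * Apply the first `steps` steps (0..4) of the update and stop, as if
 * power were lost right afterwards:
 *   1: un-map LEB0   2: write LEB0   3: un-map LEB1   4: write LEB1
 */
struct vtbl_copies update_until_power_cut(int steps)
{
	struct vtbl_copies c = { COPY_OLD, COPY_OLD };

	if (steps >= 1)
		c.leb0 = COPY_INVALID;
	if (steps >= 2)
		c.leb0 = COPY_NEW;
	if (steps >= 3)
		c.leb1 = COPY_INVALID;
	if (steps >= 4)
		c.leb1 = COPY_NEW;
	return c;
}

/* At least one intact copy must survive for the table to be recoverable. */
int has_valid_copy(struct vtbl_copies c)
{
	return c.leb0 != COPY_INVALID || c.leb1 != COPY_INVALID;
}
```

Whatever value of `steps` is chosen, `has_valid_copy` holds: LEB1 is not touched until LEB0 carries the complete new table.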

With this algorithm, no matter when power is lost, the two volume table copies can never both be incomplete: at least one of them is intact, or both are. This is because LEB0 is always updated first, and LEB1 is updated only after that has completed. The scheme works only if both copies are intact and identical before the update begins; to guarantee that, UBI makes the two copies identical every time the device is attached. Specifically: when attaching the MTD device, UBI makes sure that the 2 volume table copies are equivalent. If they are not equivalent, which may be caused by an unclean reboot, UBI picks the one from LEB0 and copies it to LEB1 of the layout volume (because it is newer). If one of the volume table copies is corrupted, UBI restores it from the other volume table copy.

When attaching the MTD device, the following cases are possible:

Case                  LEB0’s PEB                   LEB1’s PEB            Note / action
Case 1                OK (pick this one)           OK                    LEB0 == LEB1
Case 2                OK (pick this one, newer)    OK                    copy LEB0 to LEB1
Case 3                OK (pick this one)           NOK                   copy LEB0 to LEB1
Case 4                NOK                          OK (pick this one)    copy LEB1 to LEB0
Case 5 (impossible)   NOK                          NOK
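The cases listed above map directly onto a small decision function. In this sketch the names are hypothetical, and the inputs are reduced to three flags: whether each copy passed its integrity check, and whether the two copies have equal contents.

```c
#include <assert.h>

enum vtbl_action {
	VTBL_NOTHING,     /* case 1: both copies intact and equal */
	VTBL_COPY_0_TO_1, /* cases 2 and 3: LEB0 is the reference copy */
	VTBL_COPY_1_TO_0, /* case 4: only LEB1 is intact */
	VTBL_FAIL         /* case 5: no intact copy (should not happen) */
};

/* Decide how to make the two volume table copies equivalent at attach. */
enum vtbl_action vtbl_attach_action(int leb0_ok, int leb1_ok, int equal)
{
	if (leb0_ok && leb1_ok)
		return equal ? VTBL_NOTHING : VTBL_COPY_0_TO_1;
	if (leb0_ok)
		return VTBL_COPY_0_TO_1;
	if (leb1_ok)
		return VTBL_COPY_1_TO_0;
	return VTBL_FAIL;
}
```

The `equal` flag only matters when both copies are intact; in every other case the intact copy is the source.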

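Without the mechanism that makes the two volume table copies equal at power-up, the algorithm above could withstand only a single unexpected power cut. UBI performs this step when it attaches the device at power-up.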

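At boot we only scan and read the volume table; we do not modify it. During boot we need to scan the whole UBI device, read the volume table from the layout volume, and determine whether any volume update was interrupted (for example, by a sudden power cut). For instance, to boot the kernel we first have to read the kernel image from the UBI device. If the upd_marker of the volume that stores the kernel is set, the last kernel update failed and the image in that volume is unusable, so we have to read the kernel from the backup partition instead and try to boot from it.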
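The boot-time decision described above can be sketched as a tiny selection function. All names here are hypothetical. Checking the backup volume's upd_marker as well is an assumption added for this example, since the text only says that the backup is tried when the primary kernel volume is marked as interrupted.

```c
#include <assert.h>

enum boot_source {
	BOOT_PRIMARY, /* kernel volume is usable */
	BOOT_BACKUP,  /* primary update was interrupted, use the backup */
	BOOT_NONE     /* ASSUMPTION: the backup is marked as interrupted too */
};

/*
 * Choose where to load the kernel from, given the upd_marker values read
 * from the volume table records of the primary and backup kernel volumes.
 */
enum boot_source choose_kernel(int primary_upd_marker, int backup_upd_marker)
{
	if (!primary_upd_marker)
		return BOOT_PRIMARY;
	if (!backup_upd_marker)
		return BOOT_BACKUP;
	return BOOT_NONE;
}
```

The function only reads the markers, which matches the point that the boot path never modifies the volume table.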

[Figure: vtbl update]

UBI Operations
