An Overview of New Ext4 Features

Ext4

Reposted from: http://kernelnewbies.org/Ext4

Ext4 is part of the Linux 2.6.28 kernel; see that kernel's release notes for more details about the release.

  1. Introduction
  2. EXT4 features
    1. Compatibility
    2. Bigger filesystem/file sizes
    3. Sub directory scalability
    4. Extents
    5. Multiblock allocation
    6. Delayed allocation
    7. Fast fsck
    8. Journal checksumming
    9. "No Journaling" mode
    10. Online defragmentation
    11. Inode-related features
    12. Persistent preallocation
    13. Barriers on by default
  3. How to use Ext4
    1. Creating a new Ext4 filesystem from scratch
    2. Migrate existing Ext3 filesystems to Ext4
    3. Mount an existing Ext3 filesystem with Ext4 without changing the format

1. Introduction

Ext4 is the evolution of the most widely used Linux filesystem, Ext3. In many ways, Ext4 is a deeper improvement over Ext3 than Ext3 was over Ext2. Ext3 was mostly about adding journaling to Ext2, but Ext4 modifies important data structures of the filesystem, such as the ones used to store the file data. The result is a filesystem with an improved design, better performance, better reliability and more features.

2. EXT4 features

2.1. Compatibility

Any existing Ext3 filesystem can be migrated to Ext4 with an easy procedure that consists of running a couple of commands in read-only mode (described in the next section). This means that you can improve the performance, storage limits and features of your current filesystems without reformatting and/or reinstalling your OS and software environment. If you need the advantages of Ext4 on a production system, you can upgrade the filesystem. The procedure is safe and doesn't risk your data (obviously, a backup of critical data is recommended, even if you aren't updating your filesystem :). Ext4 will use the new data structures only for new data; the old structures will remain untouched, and it will still be possible to read and modify them when needed. This means, of course, that once you convert your filesystem to Ext4 you won't be able to go back to Ext3 again. (There is an alternative, described in the next section: you can mount an Ext3 filesystem with Ext4 without using the new disk format. You will then be able to mount it with Ext3 again, but you lose many of the advantages of Ext4.)

2.2. Bigger filesystem/file sizes

Currently, Ext3 supports a maximum filesystem size of 16 TB and a maximum file size of 2 TB. Ext4 adds 48-bit block addressing, so it will have a maximum filesystem size of 1 EB and a maximum file size of 16 TB. 1 EB = 1,048,576 TB (1 EB = 1024 PB, 1 PB = 1024 TB, 1 TB = 1024 GB). Why 48-bit and not 64-bit? There are some limitations that would need to be fixed before making Ext4 fully 64-bit capable, and they have not been addressed in Ext4. The Ext4 data structures have been designed with this in mind, so a future update to Ext4 will implement full 64-bit support at some point. 1 EB will be enough (really :)) until that happens. (Note: The code to create filesystems bigger than 16 TB is, at the time of writing this article, not in any stable release of e2fsprogs. It will be in future releases.)
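
The size limits above follow from simple arithmetic. The sketch below assumes 4 KiB blocks (the block size used in this article's later examples) and 32-bit block numbers for Ext3; the names are illustrative only.

```python
BLOCK_SIZE = 4 * 1024  # assumption: 4 KiB blocks

TIB = 2 ** 40
EIB = 2 ** 60


def max_bytes(address_bits, block_size=BLOCK_SIZE):
    """Bytes addressable with the given number of block-address bits."""
    return (2 ** address_bits) * block_size


ext3_fs_limit = max_bytes(32)  # 32-bit block numbers
ext4_fs_limit = max_bytes(48)  # 48-bit block numbers

print(ext3_fs_limit // TIB, "TiB")  # 16 TiB
print(ext4_fs_limit // EIB, "EiB")  # 1 EiB
print(EIB // TIB, "TiB per EiB")    # 1048576 TiB per EiB
```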

2.3. Sub directory scalability

Right now the maximum possible number of subdirectories contained in a single directory in Ext3 is 32000. Ext4 breaks that limit and allows an unlimited number of subdirectories.

2.4. Extents

Traditional Unix-derived filesystems such as Ext3 use an indirect block mapping scheme to keep track of each block that holds a file's data. This is inefficient for large files, especially for delete and truncate operations on large files, because the mapping keeps an entry for every single block, and big files have many blocks. The result is huge mappings that are slow to handle. Modern filesystems use a different approach called "extents". An extent is basically a run of contiguous physical blocks. It basically says "The data is in the next n blocks". For example, a 100 MB file can be allocated into a single extent of that size, instead of needing to create the indirect mapping for 25600 blocks (4 KB per block). Huge files are split into several extents. Extents improve performance and also help to reduce fragmentation, since an extent encourages contiguous layouts on the disk.
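
To make the difference concrete, here is a minimal sketch that collapses a per-block list into (start, length) extents. It is a toy model, not the Ext4 on-disk extent format, and the function name and block numbers are hypothetical.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB per block, as in the example above


def blocks_to_extents(blocks):
    """Collapse a file's physical block numbers (in file order) into (start, length) runs."""
    extents = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # Block continues the current run: extend the last extent.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # Block starts a new run.
            extents.append((b, 1))
    return extents


n_blocks = (100 * 1024 * 1024) // BLOCK_SIZE
print(n_blocks)  # 25600 entries in a per-block mapping

contiguous = list(range(5000, 5000 + n_blocks))
print(blocks_to_extents(contiguous))  # [(5000, 25600)]

fragmented = list(range(100, 104)) + list(range(900, 902))
print(blocks_to_extents(fragmented))  # [(100, 4), (900, 2)]
```

A contiguous 100 MB file needs one entry instead of 25600; a fragmented file needs one entry per contiguous run.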

2.5. Multiblock allocation

When Ext3 needs to write new data to the disk, there's a block allocator that decides which free blocks will be used to write the data. But the Ext3 block allocator only allocates one block (4 KB) at a time. That means that if the system needs to write the 100 MB of data mentioned in the previous point, it will need to call the block allocator 25600 times (and it was just 100 MB!). Not only is this inefficient, it also prevents the block allocator from optimizing the allocation policy, because it doesn't know how much data in total is being allocated; it only knows about a single block. Ext4 uses a "multiblock allocator" (mballoc), which allocates many blocks in a single call instead of a single block per call, avoiding a lot of overhead. This improves performance, and it's particularly useful with delayed allocation and extents. This feature doesn't affect the disk format. Also, note that the Ext4 block/inode allocator has other improvements, described in detail in a paper linked from the original article.

2.6. Delayed allocation

Delayed allocation is a performance feature (it doesn't change the disk format) found in a few modern filesystems such as XFS, ZFS, btrfs or Reiser 4. It consists of delaying the allocation of blocks as much as possible, contrary to what traditional filesystems (such as Ext3, reiser3, etc.) do: allocate the blocks as soon as possible. For example, if a process write()s, the filesystem code will immediately allocate the blocks where the data will be placed, even if the data is not being written to the disk right now and is going to be kept in the cache for some time. This approach has disadvantages. For example, when a process is writing continually to a file that grows, successive write()s allocate blocks for the data, but they don't know if the file will keep growing. Delayed allocation, on the other hand, does not allocate the blocks immediately when the process write()s. Instead, it delays the allocation of the blocks while the file is kept in cache, until it is really going to be written to the disk. This gives the block allocator the opportunity to optimize the allocation in situations where the old system couldn't. Delayed allocation plays very nicely with the two previous features, extents and multiblock allocation, because in many workloads, when the file is finally written to the disk, it will be allocated in extents whose block allocation is done with the mballoc allocator. The performance is much better, and fragmentation is much improved in some workloads.
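
The effect can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the `ToyDisk` allocator is hypothetical and simply hands out the next free block numbers, which is far simpler than what Ext3 or Ext4 actually do. Two files grow in turn, one block per write().

```python
class ToyDisk:
    """Hypothetical allocator that hands out the next free block numbers."""

    def __init__(self):
        self.next_free = 0
        self.calls = 0  # how many times the allocator was invoked

    def allocate(self, count):
        self.calls += 1
        start = self.next_free
        self.next_free += count
        return list(range(start, start + count))


def count_fragments(blocks):
    """Number of contiguous runs in a file's block list."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


writes = [("a", 1), ("b", 1)] * 4  # files "a" and "b" each receive four 1-block writes

# Immediate allocation: blocks are chosen at every write().
disk_now = ToyDisk()
immediate = {"a": [], "b": []}
for name, n in writes:
    immediate[name] += disk_now.allocate(n)

# Delayed allocation: writes only accumulate in the cache;
# blocks are chosen once per file when the data is flushed.
disk_later = ToyDisk()
pending = {"a": 0, "b": 0}
for name, n in writes:
    pending[name] += n
delayed = {name: disk_later.allocate(n) for name, n in pending.items()}

print(immediate["a"], count_fragments(immediate["a"]), disk_now.calls)  # [0, 2, 4, 6] 4 8
print(delayed["a"], count_fragments(delayed["a"]), disk_later.calls)    # [0, 1, 2, 3] 1 2
```

With immediate allocation the two files end up interleaved on disk; with delayed allocation each file gets one contiguous run, requested in a single multiblock call.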

2.7. Fast fsck

Fsck is a very slow operation, especially the first step: checking all the inodes in the filesystem. In Ext4, a list of unused inodes is stored at the end of each group's inode table (with a checksum, for safety), so fsck will not check those inodes. The result is that total fsck time improves by a factor of 2 to 20, depending on the number of used inodes (http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_Ext4). Note that it is fsck, and not Ext4, that builds the list of unused inodes. This means that you must run fsck to get the list of unused inodes built, and only the next fsck run will be faster (you need to run fsck in order to convert an Ext3 filesystem to Ext4 anyway). There's also another feature that contributes to this fsck speedup, "flexible block groups", which also speeds up filesystem operations.

2.8. Journal checksumming

The journal is the most used part of the disk, making the blocks that form part of it more prone to hardware failure. And recovering from a corrupted journal can lead to massive corruption. Ext4 checksums the journal data to know if the journal blocks are failing or corrupted. But journal checksumming has a bonus: it allows one to convert the two-phase commit system of Ext3's journaling to a single phase, speeding up filesystem operation by up to 20% in some cases, so reliability and performance are improved at the same time. (Note: the part of the feature that improves the performance, the asynchronous logging, is turned off by default for now, and will be enabled in future releases, when its reliability improves.)
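
The idea can be sketched as follows. This is a toy model and not the real journal format: the dictionary layout and function names are hypothetical, and CRC32 from Python's zlib merely stands in for whatever checksum the journal uses. The commit record carries a checksum over the transaction's blocks, so on recovery a transaction whose blocks did not all reach the disk intact can be detected and skipped.

```python
import zlib


def checksum(blocks):
    """CRC32 over all blocks of a transaction, chained block by block."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def commit(blocks):
    """Build a toy transaction: the blocks plus a commit record holding their checksum."""
    return {"blocks": list(blocks), "commit_crc": checksum(blocks)}


def is_replayable(txn):
    """On recovery, replay the transaction only if its blocks match the commit record."""
    return checksum(txn["blocks"]) == txn["commit_crc"]


txn = commit([b"inode table update", b"block bitmap update"])
ok_before = is_replayable(txn)
print(ok_before)  # True

txn["blocks"][1] = b"block bitmap upd\x00te"  # simulate a corrupted journal block
ok_after = is_replayable(txn)
print(ok_after)  # False
```

Because a mismatch exposes an incomplete transaction, the commit record no longer has to wait for the other journal blocks to be written first, which is what makes the single-phase commit possible.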

2.9. "No Journaling" mode

Journaling ensures the integrity of the filesystem by keeping a log of the ongoing disk changes. However, it is known to have a small overhead. Some people with special requirements and workloads can run without a journal and its integrity advantages. In Ext4 the journaling feature can be disabled, which provides a small performance improvement.

2.10. Online defragmentation

While delayed allocation, extents and multiblock allocation help to reduce fragmentation, filesystems can still fragment with use. For example: you write three files in a directory, contiguously on the disk. Some day you need to update the middle file, but the updated file has grown a bit, so there's not enough room for it. You have no option but to place the excess data in another part of the disk, which will cause a seek, or to allocate the updated file contiguously in another place, far from the other two files, resulting in seeks if an application needs to read all the files in a directory (say, a file manager generating thumbnails for a directory full of images). Besides, the filesystem can only take care of certain types of fragmentation. It can't know, for example, that it must keep all the boot-related files contiguous, because it doesn't know which files are boot-related. To solve this issue, Ext4 will support online defragmentation, and there's an e4defrag tool which can defragment individual files or the whole filesystem.

2.11. Inode-related features

Larger inodes, nanosecond timestamps, fast extended attributes, inode reservation...

  • Larger inodes: Ext3 supports configurable inode sizes (via the -I mkfs parameter), but the default inode size is 128 bytes. Ext4 will default to 256 bytes. This is needed to accommodate some extra fields (like nanosecond timestamps or inode versioning), and the remaining space of the inode will be used to store extended attributes that are small enough to fit in that space. This will make access to those attributes much faster, and improves the performance of applications that use extended attributes by a factor of 3 to 7.

  • Inode reservation consists of reserving several inodes when a directory is created, expecting that they will be used in the future. This improves performance, because when new files are created in that directory they'll be able to use the reserved inodes. File creation and deletion are hence more efficient.

  • Nanosecond timestamps mean that inode fields like "modified time" will be able to use nanosecond resolution instead of the one-second resolution of Ext3.
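
From an application's point of view, the timestamp resolution shows up in the stat() result. A small sketch, assuming Python 3 and a temporary file created only for illustration; whether the sub-second part is non-zero depends on the filesystem the temporary directory lives on.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile() as f:
    f.write(b"x")
    f.flush()
    st = os.stat(f.name)

mtime_ns = st.st_mtime_ns              # "modified time" as integer nanoseconds
sub_second = mtime_ns % 1_000_000_000  # 0 on filesystems with one-second resolution

print(mtime_ns, sub_second)
```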

2.12. Persistent preallocation

This feature, available in Ext3 in the latest kernel versions, and emulated by glibc on the filesystems that don't support it, allows applications to preallocate disk space. Applications tell the filesystem to preallocate the space, and the filesystem preallocates the necessary blocks and data structures, but there's no data in them until the application really needs to write the data in the future. This is what P2P applications do on their own when they "preallocate" the necessary space for a download that will last hours or days, but implemented much more efficiently by the filesystem and with a generic API. This has several uses: first, to avoid applications (like P2P apps) doing it themselves inefficiently by filling a file with zeros. Second, to reduce fragmentation, since the blocks will be allocated at one time, as contiguously as possible. Third, to ensure that applications always have the space they know they will need, which is important for RT-ish applications, since without preallocation the filesystem could get full in the middle of an important operation. The feature is available via the libc posix_fallocate() interface.
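
As a rough illustration, Python exposes the same interface as os.posix_fallocate() on Unix systems. The sketch below preallocates 1 MiB for a temporary file; the file and the size are arbitrary choices for the example.

```python
import os
import tempfile

size = 1024 * 1024  # preallocate 1 MiB (arbitrary size for the example)

fd, path = tempfile.mkstemp()
try:
    # Ask the filesystem to reserve `size` bytes starting at offset 0.
    os.posix_fallocate(fd, 0, size)
    st = os.fstat(fd)
    file_size = st.st_size           # the file now has the requested length
    allocated = st.st_blocks * 512   # bytes actually reserved, as reported by the filesystem
finally:
    os.close(fd)
    os.remove(path)

print(file_size, allocated)
```

No zeros are written by the application itself; the filesystem (or the libc emulation, where the filesystem lacks support) takes care of reserving the space.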

2.13. Barriers on by default

This is an option that improves the integrity of the filesystem at the cost of some performance (you can disable it with "mount -o barrier=0", which is worth trying if you're benchmarking). From an LWN article: "The filesystem code must, before writing the [journaling] commit record, be absolutely sure that all of the transaction's information has made it to the journal. Just doing the writes in the proper order is insufficient; contemporary drives maintain large internal caches and will reorder operations for better performance. So the filesystem must explicitly instruct the disk to get all of the journal data onto the media before writing the commit record; if the commit record gets written first, the journal may be corrupted. The kernel's block I/O subsystem makes this capability available through the use of barriers; in essence, a barrier forbids the writing of any blocks after the barrier until all blocks written before the barrier are committed to the media. By using barriers, filesystems can make sure that their on-disk structures remain consistent at all times."

3. How to use Ext4

At this time, all relevant distros support it. GRUB also supports Ext4. Just use it.

Switching to Ext4 is very easy. There are three different ways to switch:

3.1. Creating a new Ext4 filesystem from scratch

  • The easiest one, recommended for new installations. Just update your e2fsprogs package to a version that supports Ext4, and create the filesystem with mkfs.ext4.

3.2. Migrate existing Ext3 filesystems to Ext4

You need to use the tune2fs and fsck tools on the filesystem, and that filesystem needs to be unmounted. Run:

  • tune2fs -O extents,uninit_bg,dir_index /dev/yourfilesystem

After running this command you MUST run fsck. If you don't, Ext4 WILL NOT MOUNT your filesystem. This fsck run is needed to return the filesystem to a consistent state. It WILL tell you that it finds checksum errors in the group descriptors. This is expected, and it is exactly what needs to be rebuilt before the filesystem can be mounted as Ext4, so don't be surprised by them. Since fsck asks you what to do each time it finds one of those errors, always say YES. If you don't want to be asked, add the "-p" parameter to the fsck command; it means "automatic repair":

  • fsck -pDf /dev/yourfilesystem

There's another thing that must be mentioned. All your existing files will continue using the old indirect mapping to map all their blocks of data. The online defrag tool will be able to migrate each one of those files to the extent format (using an ioctl that tells the filesystem to rewrite the file with the extent format; you can use it safely while you're using the filesystem normally).

3.3. Mount an existing Ext3 filesystem with Ext4 without changing the format

You can mount an existing Ext3 filesystem with Ext4 but without using features that change the disk format. This means you will be able to mount your filesystem with Ext3 again. You can mount an existing Ext3 filesystem with "mount -t ext4 /dev/yourpartition /mnt". Doing this without having done the conversion process described in the previous point will force Ext4 to not use the features that change the disk format, such as extents. It will use only the features that don't change the disk format, such as mballoc or delayed allocation. You'll be able to mount your filesystem as Ext3 again. But obviously you'll be losing the advantages of the Ext4 features that don't get used...