

bl@d3:~/test/sparse_file$ ls -l fs.img
-rw-r--r-- 1 bl bl 1073741824 2012-02-17 05:09 fs.img
bl@d3:~/test/sparse_file$ du -sh fs.img
0 fs.img




  • Sparse files
  • The sizes reported by ls and du mean different things


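#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    /* O_CREAT requires a mode argument; 0644 yields rw-r--r-- before umask. */
    int fd = open("sparse.file", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Skip 1024 bytes without writing them (the hole), then write one byte. */
    if (lseek(fd, 1024, SEEK_CUR) < 0 || write(fd, "\0", 1) != 1) {
        perror("lseek/write");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}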


$ dd if=/dev/zero of=sparse_file.img bs=1M seek=1024 count=0
0+0 records in
0+0 records out


The advantage of sparse files is that storage is only allocated when actually needed: disk space is saved, and large files can be created even if there is insufficient free space on the file system.



The du command prints the space a file occupies on disk, while ls prints its apparent size.


bl@d3:~/test/sparse_file$ echo -n 1 > 1B.txt
bl@d3:~/test/sparse_file$ ls -l 1B.txt
-rw-r--r-- 1 bl bl 1 2012-02-19 05:17 1B.txt
bl@d3:~/test/sparse_file$ du -h 1B.txt
4.0K 1B.txt

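Here we first create a file, 1B.txt, that is one byte long. ls reports its size as 1 byte, whereas the figure du reports is computed from the N blocks that 1B.txt occupies on disk, multiplied by the size of each block. I write N rather than a concrete number because many details are hidden behind the scenes, such as the fragment size, which we will discuss later.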
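The two numbers can also be read directly from stat(2). The following is a minimal sketch, not how ls or du are actually implemented: the function names apparent_size, allocated_size and print_sizes are made up for illustration, and the code assumes that st_blocks counts 512-byte units, which holds on Linux but is not guaranteed by POSIX.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Apparent size in bytes (the number ls -l shows). Returns -1 on error. */
long long apparent_size(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Allocated size in bytes (roughly what du reports). Returns -1 on error.
 * Assumption: st_blocks is expressed in 512-byte units. */
long long allocated_size(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* Print both sizes for one file. Returns 0 on success, -1 on error. */
int print_sizes(const char *path)
{
    long long apparent = apparent_size(path);
    long long allocated = allocated_size(path);
    if (apparent < 0 || allocated < 0)
        return -1;
    printf("%s: apparent %lld bytes, allocated %lld bytes\n",
           path, apparent, allocated);
    return 0;
}
```

Calling print_sizes("fs.img") and print_sizes("1B.txt") from a main function should show the same contrast as the ls and du transcripts above, although the exact allocated figure depends on the file system.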

Of course, all of the above is only the default behavior of ls and du; each offers options to change it. For example, ls has the -s option (print the allocated size of each file, in blocks) and du has the --apparent-size option (print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in (`sparse') files, internal fragmentation, indirect blocks, and the like).


strace cp fs.img fs.img.copy >log 2>&1


stat("fs.img.copy", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("fs.img", {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
stat("fs.img.copy", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("fs.img", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
open("fs.img.copy", O_WRONLY|O_TRUNC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap(NULL, 532480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f90df965000
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR) = 524288
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR) = 1048576
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 524288) = 524288
lseek(4, 524288, SEEK_CUR) = 1572864


By default, sparse SOURCE files are detected by a crude heuristic and the corresponding DEST file is made sparse as well. That is the behavior selected by --sparse=auto. Specify --sparse=always to create a sparse DEST file whenever the SOURCE file contains a long enough sequence of zero bytes. Use --sparse=never to inhibit creation of sparse files.
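The read/lseek pattern in the strace log suggests the general idea: when a chunk read from the source is all zeros, seek forward in the destination instead of writing. The sketch below illustrates that idea only; it is not the actual cp source. The name copy_sparse and the 4096-byte chunk size are assumptions made for this example, and interrupted system calls are not retried.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK = 4096 };  /* assumed chunk size, chosen for illustration */

/* Return 1 if the first n bytes of buf are all zero, else 0. */
static int is_all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Copy src to dst, seeking over all-zero chunks instead of writing them.
 * Returns 0 on success, -1 on error. */
int copy_sparse(const char *src, const char *dst)
{
    char buf[CHUNK];
    off_t total = 0;
    ssize_t n;
    int rc = -1;

    assert(src != NULL && dst != NULL);
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (is_all_zero(buf, (size_t)n)) {
            /* Leave a hole: move the offset forward without writing. */
            if (lseek(out, n, SEEK_CUR) < 0)
                goto done;
        } else {
            ssize_t off = 0;
            while (off < n) {
                ssize_t w = write(out, buf + off, (size_t)(n - off));
                if (w <= 0)
                    goto done;
                off += w;
            }
        }
        total += n;
    }
    if (n < 0)
        goto done;
    /* Seeking alone does not extend a file, so fix the final length
     * in case the source ended with zeros. */
    if (ftruncate(out, total) == 0)
        rc = 0;
done:
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```

The final ftruncate call matters: without it, a source file that ends in zeros would produce a destination that is too short, because seeking past the end does not change the file size until something is written there.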



About Sparse Files

This document describes sparse files, exposure due to sparse files, and the effects of certain commands on sparse files. This document applies to all versions of AIX.

Creating a sparse file 
The effect of certain commands on sparse files 

Many applications, particularly databases, maintain data in sparse files. A sparse file is a file with empty space, or gaps, left open for future addition of data. If the empty spaces are filled with the ASCII null character and the spaces are large enough, the file will be sparse, and disk blocks will not be allocated to it.

This creates an exposure: a large file will be created, but the disk blocks will not be allocated. Then, as data is added to the file, the disk blocks will be allocated but there may not be enough free disk blocks in the file system. Then the file system will be full and writes to any file in the file system will fail.

You can prevent these problems by either assuring that you have no sparse files on your system or by planning to have enough free space in the file system for the future allocation of the blocks.
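As a rough way to plan for that future allocation, one can compare the bytes a file has not yet allocated with the free space left in its file system. The sketch below is only a heuristic under stated assumptions: hole_bytes, can_fill_holes and print_exposure are hypothetical names, st_blocks is assumed to count 512-byte units, and file systems that compress data or allocate lazily may report figures that make the estimate inaccurate.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <sys/types.h>

/* Bytes of the file's apparent size that have no blocks behind them.
 * Returns -1 on error. Assumption: st_blocks is in 512-byte units. */
long long hole_bytes(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    long long allocated = (long long)st.st_blocks * 512;
    long long apparent = (long long)st.st_size;
    return apparent > allocated ? apparent - allocated : 0;
}

/* Returns 1 if the file system holding path currently has enough free
 * space to back every hole in the file, 0 if not, -1 on error. */
int can_fill_holes(const char *path)
{
    struct statvfs vfs;
    long long need = hole_bytes(path);
    if (need < 0 || statvfs(path, &vfs) != 0)
        return -1;
    /* f_bavail: free blocks available to unprivileged users,
     * counted in units of f_frsize bytes. */
    unsigned long long avail =
        (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    return avail >= (unsigned long long)need ? 1 : 0;
}

/* Print a one-line summary of the exposure for one file. */
void print_exposure(const char *path)
{
    printf("%s: %lld unallocated bytes, fits in free space: %d\n",
           path, hole_bytes(path), can_fill_holes(path));
}
```

A result of 0 from can_fill_holes means the file could not be filled in completely right now, which is exactly the exposure described above; a result of 1 is only a snapshot, since other files compete for the same free blocks.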

You also need to be aware of how you manipulate sparse or potentially sparse files because you can easily change them from sparse to not sparse or vice-versa.

An example sparse file can be created fairly easily. To do this, open the file, seek to a large address, and write some data. This can be demonstrated with the dd command, as follows:

  1. First, create a regular file:
       date > notsparse
       ls -l

    The output of the ls command will be similar to:

       total 8
       -rw-r--r--   1 root     sys           29 Dec 21 08:12 notsparse
  2. Use the fileplace command to see how many allocated and unallocated blocks are included in the file notsparse.

    (NOTE: perfagent.tools must be installed to run the fileplace command at AIX 4.x and 5.x.)

       fileplace notsparse

    The output will look similar to:

        File: notsparse  Size: 29 bytes  Vol: /dev/lv03
        Blk Size: 4096  Frag size: 4096 Nfrags: 1 Compress: no
          Logical Fragment
          00716                   1 frags         4096 bytes,  100.0%

  3. The du command will also reflect how many 512-byte blocks a file occupies.
       du -rs *

    Example output:

       8 notsparse
  4. Now create a sparse file using the regular file notsparse as input:
       touch sparse.1
       dd if=notsparse of=sparse.1 seek=100

    Example output:

       dd: 0+1 records in.
       dd: 0+1 records out.

    The dd command takes the data from the regular file and places it 100 512-byte blocks into the sparse.1 file, that is, at byte offset 51200. Note that nothing is written to the initial 100 512-byte blocks. The following steps show the characteristics of the resulting file.

  5. The ls command reports the apparent size of the file in bytes, that is, the distance from the start of the file to the end of its last byte:
       ls -l

    Example output:

       total 16
       -rw-r--r--   1 root     sys           29 Dec 21 08:12 notsparse
       -rw-r--r--   1 root     sys        51229 Dec 21 08:13 sparse.1
  6. The fileplace command tells the story accurately - there are 12 unallocated 4K blocks and one allocated 4K block in the file:
       fileplace sparse.1

    Example output:

       File: sparse.1  Size: 51229 bytes  Vol: /dev/lv03
       Blk Size: 4096  Frag Size: 4096  Nfrags:  1   Compress: no
       Logical Fragment
       unallocated                     12 frags   49152 Bytes,  0.0%
       0000769                          1 frags    4096 Bytes, 100.0%
  7. The du command reports the number of allocated blocks the file takes:
       du -rs *

    Example output:

       8 notsparse
       8 sparse.1
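The numbers in steps 4 through 7 follow from simple arithmetic, sketched here in C. The struct and function names are hypothetical, and the model assumes that a file system block is allocated only if at least one byte of data falls inside it, which matches the fileplace output above.

```c
#include <assert.h>
#include <stdio.h>

/* Result of writing data_len bytes at a dd seek offset. */
struct layout {
    long long size;         /* apparent size in bytes (ls -l) */
    long long hole_blocks;  /* file system blocks lying entirely in the hole */
    long long data_blocks;  /* file system blocks that hold data */
};

/* dd seek=N skips N output blocks of 512 bytes (the default obs)
 * before writing data_len bytes. fs_block is the file system block size. */
struct layout dd_seek_layout(long long seek_blocks, long long data_len,
                             long long fs_block)
{
    struct layout l;
    assert(seek_blocks >= 0 && data_len > 0 && fs_block > 0);
    long long start = seek_blocks * 512;  /* first byte written */
    long long end = start + data_len;     /* one past the last byte */
    l.size = end;
    l.hole_blocks = start / fs_block;
    l.data_blocks = (end + fs_block - 1) / fs_block - l.hole_blocks;
    return l;
}

/* Print the layout, with allocated space in 512-byte units as du does. */
void print_layout(struct layout l, long long fs_block)
{
    printf("size %lld, unallocated blocks %lld, allocated blocks %lld, "
           "du units %lld\n",
           l.size, l.hole_blocks, l.data_blocks,
           l.data_blocks * fs_block / 512);
}
```

With seek_blocks = 100, data_len = 29 and fs_block = 4096, the data starts at byte 51200 and the file ends at byte 51229. Bytes 0 through 49151 form 12 unallocated 4K blocks, and the thirteenth block (bytes 49152 through 53247) contains the data, so one block is allocated, which du shows as 8 units of 512 bytes.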

backup/restore (by name and inode)

The restore command aggressively preserves sparseness. In fact, the restore command will unallocate any blocks filled with zeroes, thus making a file sparse.


The cp command does not preserve the sparseness of a file.


If you create a backup using the cpio command on sparse files, you will need to use the pax command to restore that data. Using the cpio command to restore the data will not preserve sparseness.


Using the dd command on the file itself does not preserve sparseness. However, using dd on the file system device does preserve the state of the individual files.

Example: Backing up a logical volume:

   dd if=/dev/datalv of=/dev/rmt0 ibs=4096 obs=1024 conv=sync


The mksysb command uses backup/restore. See the section on backup/restore.


NOTE: The pax command can read tar archives and can read cpio archives if the c flag was used.

The pax command aggressively preserves sparseness. In fact, the pax command will unallocate any blocks filled with zeroes, thus making a file sparse.


Sysback will use either backup by name or backup by inode to back up the data on the system. See the section on backup.


If you create a backup using the tar command on sparse files, you will have to use the pax command to restore that data. Using the tar command to restore the data will not preserve sparseness.

