1.7 Scanf/Print family of routines
Input and output support in RMIOS is provided by newlib, which offers
APIs and functionality similar to those of glibc.
In the RMIOS environment, the standard input, output and error streams are
mapped to UART device 0.
The functions provided by newlib are not synchronized, so attempts by
multiple instances of an RMIOS application to read and write using
newlib functions might result in garbled output and bad input. To
synchronize output, the RMIOS library provides the 'printk' macro, which
holds a spinlock while the output is written to the UART device. Please
consult the API guide for its usage and the header files to include.
WARNING: 'printk' does not use a recursive spinlock, so calling it again
while the lock is already held might lead to a deadlock.
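For example:
    int g(void);

    void f(void)
    {
        /* g() is called while this printk is being processed */
        printk("g() returns %d\n", g());
    }

    int g(void)
    {
        /* the nested printk waits forever for the spinlock */
        printk("this will not be printed due to deadlock\n");
        return 0;
    }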
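One way to avoid the deadlock shown above is to call g() first and pass only
the saved result to printk. The sketch below is a simulation, not the RMIOS
implementation: fake_lock(), fake_unlock() and fake_printk() are hypothetical
stand-ins, and the "spinlock" is a flag that counts re-entry instead of
spinning forever.

```c
#include <stdio.h>

/* Hypothetical stand-in for a non-recursive spinlock. A real spinlock
 * would spin forever on re-entry; this one just counts the violation. */
int lock_held = 0;
int reentry_count = 0;

void fake_lock(void)   { if (lock_held) reentry_count++; lock_held = 1; }
void fake_unlock(void) { lock_held = 0; }

/* Hypothetical stand-in for printk: take the lock, then evaluate and
 * print the arguments, then release the lock. */
#define fake_printk(...) \
    do { fake_lock(); printf(__VA_ARGS__); fake_unlock(); } while (0)

int g(void)
{
    fake_printk("inside g()\n");
    return 0;
}

/* Unsafe: g() runs while the lock is already held by the outer call. */
void f_unsafe(void)
{
    fake_printk("g() returns %d\n", g());
}

/* Safe: g() runs to completion first; only its result is printed. */
void f_safe(void)
{
    int ret = g();
    fake_printk("g() returns %d\n", ret);
}
```

Calling f_safe() never takes the lock twice, whereas f_unsafe() re-enters it
once per call.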
BUGS:
scanf() does not read floating point values precisely. For example,
a value of 4.58 is read as 4.5498. This will be investigated in a subsequent
release.
*********************************************
1.8 Ext2 File System
*********************************************
This file system is a RAM disk and uses the ext2rd library.
RMI OS EXT2 RAM DISK
The RMI OS ram disk is a library implementing an ext2 file system
to make file I/O functions available to PSB applications.
Although the current implementation supports only a ram disk
device, it can be applied to other devices as well.
Using the ext2rd library, PSB applications can use familiar file
I/O functions to store and retrieve temporary results in an
equally familiar hierarchical file system.
The diagram below illustrates the software layer architecture of
the RMI OS EXT2 ram disk library:
        +--------------------------------------+
        |           PSB Applications           |
        |                                      |
        +--------------------------------------+
        |          File I/O Functions          |
        |                                      |
        | fopen(), fclose(), fread(), fwrite() |
        | .................................... |
 -----  +--------------------------------------+
   E    |           EXT2 File System           |
   X    |                                      |
   T    |     File/Dir <---> Inode Mapping     |
   2    +--------------------------------------+
        |           RAM DISK DRIVER            |
   L    |                                      |
   I    | rd_init(), rd_read(), rd_write(),... |
   B    |                                      |
 -----  +--------------------------------------+
The ram disk driver provides the underlying support for the logical sector
addressing requested by the EXT2 file system module. The file I/O functions, in turn,
use the EXT2 file system API to access files and directories stored on the ram disk.
The ram disk is constructed from a contiguous memory region by
partitioning the region into fixed-size, linearly addressed sectors. The current
implementation supports more than one file system, so several file systems can be
mounted simultaneously.
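The sector addressing described above can be sketched as follows. This is a
minimal illustration, not the ramdisk.c code: the sketch_rd_* names, their
signatures and the 512-byte sector size are all assumptions made for the
example.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512u   /* assumed sector size */

struct sketch_ramdisk {
    unsigned char *base;          /* start of the contiguous memory region */
    size_t         nsectors;      /* number of whole sectors in the region */
};

/* Partition a memory region into fixed-size sectors. */
int sketch_rd_init(struct sketch_ramdisk *rd, void *mem, size_t bytes)
{
    if (mem == NULL || bytes < SKETCH_SECTOR_SIZE)
        return -1;
    rd->base = mem;
    rd->nsectors = bytes / SKETCH_SECTOR_SIZE;
    return 0;
}

/* Copy one logical sector out of the ram disk. */
int sketch_rd_read(const struct sketch_ramdisk *rd, size_t sector, void *buf)
{
    if (sector >= rd->nsectors)
        return -1;
    memcpy(buf, rd->base + sector * SKETCH_SECTOR_SIZE, SKETCH_SECTOR_SIZE);
    return 0;
}

/* Copy one logical sector into the ram disk. */
int sketch_rd_write(struct sketch_ramdisk *rd, size_t sector, const void *buf)
{
    if (sector >= rd->nsectors)
        return -1;
    memcpy(rd->base + sector * SKETCH_SECTOR_SIZE, buf, SKETCH_SECTOR_SIZE);
    return 0;
}
```

A logical sector number maps to the byte offset sector * sector size inside
the region, which is all a linear ram disk needs.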
The following list identifies the features and limitations of the PSBAPP ext2 FS
implementation:
1. Multiple file systems are supported. However, mounting a file system on a directory
of another file system is not yet supported. A mount point can be any string
in file system path notation, e.g. /, /mnt, /fs1, /fsN, /log/cpu1, ...
Note that if a file system is mounted as /mnt and contains a directory named 'dir',
you can still mount another file system under the name '/mnt/dir'.
In that case, listing '/mnt/dir' shows the contents of the root directory of
the latter file system instead of the 'dir' directory of the former FS.
2. The current implementation supports only one block group.
Support for multiple block groups can be added with some changes.
3. Access control is to be supported to prevent multiple application instances
from overwriting each other's files. However, such support requires the PSBAPP
operating environment to provide process/thread identifiers.
The getpid() and getuid() functions are currently defined in ext2fs.h only so that
the code compiles without error. They can be replaced by the real functions once
RMI OS supports the notion of user/process/thread IDs.
4. A system timer is also needed to provide timestamps for file creation, access and modification.
Currently, file timestamps are set to 0 (see the definition of time() in ext2fs.h).
5. Some improvements and error handling still need to be taken care of for a robust
implementation. These are documented in the source code with the prefix "TODO".
A grep for this string will reveal everything that needs to be improved.
6. The file I/O API is implemented extensively for maximum compatibility with Linux applications.
However, the functions are not yet integrated into newlib. To do this, simply remove the
"pp_" prefix from the functions in fileio.c.
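The mount point behaviour in item 1 is consistent with matching a path
against the mount point strings and preferring the longest match. The source
does not state how the library resolves paths, so the following is only a
sketch under that assumption; find_mount() is a hypothetical helper, not
part of the ext2rd API.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the mount point that is the longest prefix of
 * 'path' on a path-component boundary, or -1 if none matches. */
int find_mount(const char *const mounts[], size_t n, const char *path)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(mounts[i]);

        if (strncmp(path, mounts[i], len) != 0)
            continue;
        /* "/mnt" must not match "/mntx": the match has to end at '/'
         * or at the end of the path (the root "/" ends in '/'). */
        int boundary = (len > 0 && mounts[i][len - 1] == '/') ||
                       path[len] == '\0' || path[len] == '/';
        if (boundary && (best < 0 || len > best_len)) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the mount points "/", "/mnt" and "/mnt/dir", a path below /mnt/dir
resolves to the third file system, which matches the listing behaviour
described in item 1.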
FILES:
The RMI OS EXT2 file system is implemented as a library, which can later be integrated
into the RMI OS core if so desired. Currently, to take advantage of the file I/O functions,
PSB applications link against the library. The source files for this library are in the
ext2rd_lib directory.
ext_io.c      implements the EXT2 file system API functions. These functions are used by
              the file I/O layer to access files and directories.
ramdisk.c     implements a simple ram disk driver.
ext2_inode.c  implements the inode-related functions of the EXT2 FS module.
ext2_block.c  implements the block-related functions of the EXT2 FS module.
ext2_dir.c    implements directory traversal and file/directory creation and deletion functions.
fileio.c      implements the file I/O API.
ext2fs.h      is the only EXT2-related include file used by the library source files.
Makefile      is the makefile used to build the library.
There is also a test program under the directory 'ext2rd'.
The Second Extended Filesystem
==============================
ext2 was originally released in January 1993. Written by Rémy Card,
Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
Extended Filesystem. It is currently still (April 2001) the predominant
filesystem in use by Linux. There are also implementations available
for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.
Options
=======
When mounting an ext2 filesystem, the following options are accepted.
Defaults are marked with (*).
bsddf                   (*)     Makes 'df' act like BSD.
minixdf                         Makes 'df' act like Minix.
check=none, nocheck     (*)     Don't do extra checking of bitmaps on mount
                                (check=normal and check=strict options removed)
debug                           Extra debugging information is sent to the
                                kernel syslog. Useful for developers.
errors=continue         (*)     Keep going on a filesystem error.
errors=remount-ro               Remount the filesystem read-only on an error.
errors=panic                    Panic and halt the machine if an error occurs.
grpid, bsdgroups                Give objects the same group ID as their parent.
nogrpid, sysvgroups     (*)     New objects have the group ID of their creator.
resuid=n                        The user ID which may use the reserved blocks.
resgid=n                        The group ID which may use the reserved blocks.
sb=n                            Use alternate superblock at this location.
grpquota,noquota,quota,usrquota Quota options are silently ignored by ext2.
Specification
=============
ext2 shares many properties with traditional Unix filesystems. It has
the concepts of blocks, inodes and directories. It has space in the
specification for Access Control Lists (ACLs), fragments, undeletion and
compression though these are not yet implemented (some are available as
separate patches). There is also a versioning mechanism to allow new
features (such as journalling) to be added in a maximally compatible
manner.
Blocks
------
The space in the device or file is split up into blocks. These are
a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
which is decided when the filesystem is created. Smaller blocks mean
less wasted space per file, but require slightly more accounting overhead,
and also impose other limits on the size of files and the filesystem.
Block Groups
------------
Blocks are clustered into block groups in order to reduce fragmentation
and minimise the amount of head seeking when reading a large amount
of consecutive data. Information about each block group is kept in a
descriptor table stored in the block(s) immediately after the superblock.
Two blocks near the start of each group are reserved for the block usage
bitmap and the inode usage bitmap which show which blocks and inodes
are in use. Since each bitmap is limited to a single block and holds one
bit per block, the maximum size of a block group, counted in blocks, is
8 times the size of a block in bytes.
The block(s) following the bitmaps in each block group are designated
as the inode table for that block group and the remainder are the data
blocks. The block allocation algorithm attempts to allocate data blocks
in the same block group as the inode which contains them.
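The bitmap limit on the block group size can be worked out with a few lines
of arithmetic. The helper names below are illustrative only; the calculation
assumes one bitmap bit per block, as described above.

```c
#include <stdint.h>

/* A one-block bitmap has 8 bits per byte, one bit per tracked block. */
uint32_t max_blocks_per_group(uint32_t block_size)
{
    return 8u * block_size;
}

/* Largest block group in bytes for a given block size. */
uint64_t max_group_bytes(uint32_t block_size)
{
    return (uint64_t)max_blocks_per_group(block_size) * block_size;
}
```

For 1024-byte blocks this gives 8192 blocks (8 MiB) per group, and for
4096-byte blocks 32768 blocks (128 MiB) per group.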
The Superblock
--------------
The superblock contains all the information about the configuration of
the filing system. The primary copy of the superblock is stored at an
offset of 1024 bytes from the start of the device, and it is essential
to mounting the filesystem. Since it is so important, backup copies of
the superblock are stored in block groups throughout the filesystem.
The first version of ext2 (revision 0) stores a copy at the start of
every block group, along with backups of the group descriptor block(s).
Because this can consume a considerable amount of space for large
filesystems, later revisions can optionally reduce the number of backup
copies by only putting backups in specific groups (this is the sparse
superblock feature). The groups chosen are 0, 1 and powers of 3, 5 and 7.
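The group selection rule for the sparse superblock feature can be sketched
as a small predicate. The function names are chosen for this example and
are not taken from the kernel or the ext2rd sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n == base^k for some k >= 1. */
static bool is_power_of(uint32_t n, uint32_t base)
{
    uint64_t p = base;

    while (p < n)
        p *= base;
    return p == n;
}

/* True if the block group holds a superblock backup when the sparse
 * superblock feature is in use: groups 0, 1 and powers of 3, 5 and 7. */
bool group_has_sb_backup(uint32_t group)
{
    if (group <= 1)
        return true;
    return is_power_of(group, 3) ||
           is_power_of(group, 5) ||
           is_power_of(group, 7);
}
```

The first groups selected are therefore 0, 1, 3, 5, 7, 9, 25, 27, 49, 81,
and so on.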
The information in the superblock contains fields such as the total
number of inodes and blocks in the filesystem and how many are free,
how many inodes and blocks are in each block group, when the filesystem
was mounted (and if it was cleanly unmounted), when it was modified,
what version of the filesystem it is (see the Revisions section below)
and which OS created it.
If the filesystem is revision 1 or higher, then there are extra fields,
such as a volume name, a unique identification number, the inode size,
and space for optional filesystem features to store configuration info.
All fields in the superblock (as in all other ext2 structures) are stored
on the disc in little endian format, so a filesystem is portable between
machines without having to know what machine it was created on.
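Because all fields are little endian, a reader can decode them byte by byte
regardless of the host CPU. The sketch below locates the primary superblock
at offset 1024 and extracts a few fields. The field offsets (inode count at
0, block count at 4, log block size at 24, magic number 0xEF53 at 56) come
from the standard ext2 on-disk layout rather than from this text, and
parse_superblock() and struct sb_summary are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET 1024u    /* primary superblock offset on the device */
#define SB_SIZE   1024u
#define SB_MAGIC  0xEF53u  /* ext2 magic number */

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct sb_summary {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t block_size;   /* in bytes */
};

/* Decode a few superblock fields from a raw device image.
 * Returns 0 on success, -1 if the image is too short or not ext2. */
int parse_superblock(const unsigned char *image, size_t image_len,
                     struct sb_summary *out)
{
    const unsigned char *sb;
    uint32_t log_block_size;

    if (image_len < SB_OFFSET + SB_SIZE)
        return -1;
    sb = image + SB_OFFSET;
    if (le16(sb + 56) != SB_MAGIC)
        return -1;
    log_block_size = le32(sb + 24);
    if (log_block_size > 6)          /* reject implausible block sizes */
        return -1;
    out->inodes_count = le32(sb + 0);
    out->blocks_count = le32(sb + 4);
    out->block_size   = 1024u << log_block_size;
    return 0;
}
```

The same decoding works unchanged on big endian and little endian hosts,
which is the portability property the paragraph above describes.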
Inodes
------
The inode (index node) is a fundamental concept in the ext2 filesystem.
Each object in the filesystem is represented by an inode. The inode
structure contains pointers to the filesystem blocks which contain the
data held in the object and all of the metadata about an object except
its name. The metadata about an object includes the permissions, owner,
group, flags, size, number of blocks used, access time, change time,
modification time, deletion time, number of links, fragments, version
(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).
There are some reserved fields which are currently unused in the inode
structure and several which are overloaded. One field is reserved for the
directory ACL if the inode is a directory and alternately for the top 32
bits of the file size if the inode is a regular file (allowing file sizes
larger than 2GB). The translator field is unused under Linux, but is used
by the HURD to reference the inode of a program which will be used to
interpret this object. Most of the remaining reserved fields have been
used up for both Linux and the HURD for larger owner and group fields.
The HURD also has a larger mode field so it uses another of the remaining
fields to store the extra mode bits.
There are pointers to the first 12 blocks which contain the file’s data
in the inode. There is a pointer to an indirect block (which contains
pointers to the next set of blocks), a pointer to a doubly-indirect
block (which contains pointers to indirect blocks) and a pointer to a
trebly-indirect block (which contains pointers to doubly-indirect blocks).
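The pointer scheme above determines how many blocks a single inode can
address. The following sketch computes that number, assuming 32-bit (4-byte)
block pointers as used by the standard ext2 layout; the function names are
illustrative.

```c
#include <stdint.h>

/* Number of data blocks reachable through 12 direct pointers plus the
 * indirect, doubly-indirect and trebly-indirect blocks. */
uint64_t addressable_blocks(uint32_t block_size)
{
    uint64_t n = block_size / 4;   /* block pointers per block */

    return 12 + n + n * n + n * n * n;
}

/* The same limit expressed in bytes. */
uint64_t addressable_bytes(uint32_t block_size)
{
    return addressable_blocks(block_size) * block_size;
}
```

With 1kB blocks this yields 16843020 blocks, or roughly 16GB, and with 2kB
blocks roughly 256GB. For larger block sizes other limits apply before the
pointer scheme is exhausted.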
The flags field contains some ext2-specific flags which aren’t catered
for by the standard chmod flags. These flags can be listed with lsattr
and changed with the chattr command, and allow specific filesystem
behaviour on a per-file basis. There are flags for secure deletion,
undeletable, compression, synchronous updates, immutability, append-only,
dumpable, no-atime, indexed directories, and data-journaling. Not all
of these are supported yet.
Directories
-----------
A directory is a filesystem object and has an inode just like a file.
It is a specially formatted file containing records which associate
each name with an inode number. Later revisions of the filesystem also
encode the type of the object (file, directory, symlink, device, fifo,
socket) to avoid the need to check the inode itself for this information
(support for taking advantage of this feature does not yet exist in
Glibc 2.2).
The inode allocation code tries to assign inodes which are in the same
block group as the directory in which they are first created.
The current implementation of ext2 uses a singly-linked list to store
the filenames in the directory; a pending enhancement uses hashing of the
filenames to allow lookup without the need to scan the entire directory.
The current implementation never removes empty directory blocks once they
have been allocated to hold more files.
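A linear directory lookup can be sketched as a walk over the records in one
directory block. The record layout assumed here (32-bit inode number, 16-bit
record length, 8-bit name length, 8-bit file type, then the name) is the
later-revision ext2 directory entry format and is not spelled out in this
text; dir_lookup() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan one directory block for 'name'. Returns the inode number,
 * or 0 if the name is not present (inode 0 marks an unused record). */
uint32_t dir_lookup(const unsigned char *block, size_t block_size,
                    const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 8 <= block_size) {
        const unsigned char *e = block + off;
        uint32_t inode    = rd_le32(e);
        uint16_t rec_len  = rd_le16(e + 4);
        uint8_t  name_len = e[6];

        /* rec_len links each record to the next; stop on corruption */
        if (rec_len < 8 || off + rec_len > block_size)
            break;
        if (inode != 0 && name_len == want &&
            8u + name_len <= rec_len &&
            memcmp(e + 8, name, want) == 0)
            return inode;
        off += rec_len;
    }
    return 0;
}
```

Every lookup may have to visit all records in the block, which is why large
directories become slow without a hashed index.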
Special files
-------------
Symbolic links are also filesystem objects with inodes. They deserve
special mention because the data for them is stored within the inode
itself if the symlink is less than 60 bytes long. It uses the fields
which would normally be used to store the pointers to data blocks.
This is a worthwhile optimisation as it avoids allocating a full
block for the symlink, and most symlinks are less than 60 characters long.
Character and block special devices never have data blocks assigned to
them. Instead, their device number is stored in the inode, again reusing
the fields which would be used to point to the data blocks.
Reserved Space
--------------
In ext2, there is a mechanism for reserving a certain number of blocks
for a particular user (normally the super-user). This is intended to
allow for the system to continue functioning even if non-privileged users
fill up all the space available to them (this is independent of filesystem
quotas). It also keeps the filesystem from filling up entirely which
helps combat fragmentation.
Filesystem check
----------------
At boot time, most systems run a consistency check (e2fsck) on their
filesystems. The superblock of the ext2 filesystem contains several
fields which indicate whether fsck should actually run (since checking
the filesystem at boot can take a long time if it is large). fsck will
run if the filesystem was not cleanly unmounted, if the maximum mount
count has been exceeded or if the maximum time between checks has been
exceeded.
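The three conditions can be expressed as a single predicate. This is a
sketch only: the structure and field names are invented for the example, and
treating a limit of 0 as "no limit" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the superblock fields that control checking. */
struct check_fields {
    bool     cleanly_unmounted;
    uint32_t mount_count;       /* mounts since the last check */
    uint32_t max_mount_count;   /* 0 = no limit (assumption) */
    uint32_t last_check;        /* time of last check, in seconds */
    uint32_t check_interval;    /* 0 = no limit (assumption) */
};

/* Decide whether a consistency check should run at time 'now'. */
bool fsck_should_run(const struct check_fields *f, uint32_t now)
{
    if (!f->cleanly_unmounted)
        return true;
    if (f->max_mount_count != 0 && f->mount_count > f->max_mount_count)
        return true;
    if (f->check_interval != 0 && now > f->last_check &&
        now - f->last_check > f->check_interval)
        return true;
    return false;
}
```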
Feature Compatibility
---------------------
The compatibility feature mechanism used in ext2 is sophisticated.
It safely allows features to be added to the filesystem, without
unnecessarily sacrificing compatibility with older versions of the
filesystem code. The feature compatibility mechanism is not supported by
the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
revision 1. There are three 32-bit fields, one for compatible features
(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
incompatible (INCOMPAT) features.
These feature flags have specific meanings for the kernel as follows:
A COMPAT flag indicates that a feature is present in the filesystem,
but the on-disk format is 100% compatible with older on-disk formats, so
a kernel which didn’t know anything about this feature could read/write
the filesystem without any chance of corrupting the filesystem (or even
making it inconsistent). This is essentially just a flag which says
"this filesystem has a (hidden) feature" that the kernel or e2fsck may
want to be aware of (more on e2fsck and feature flags later). The ext3
HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
a regular file with data blocks in it so the kernel does not need to
take any special notice of it if it doesn’t understand ext3 journaling.
An RO_COMPAT flag indicates that the on-disk format is 100% compatible
with older on-disk formats for reading (i.e. the feature does not change
the visible on-disk format). However, an old kernel writing to such a
filesystem would/could corrupt the filesystem, so this is prevented. The
most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
sparse groups allow file data blocks where superblock/group descriptor
backups used to live, and ext2_free_blocks() refuses to free these blocks,
which would lead to inconsistent bitmaps. An old kernel would also
get an error if it tried to free a series of blocks which crossed a group
boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.
An INCOMPAT flag indicates the on-disk format has changed in some
way that makes it unreadable by older kernels, or would otherwise
cause a problem if an old kernel tried to mount it. FILETYPE is an
INCOMPAT flag because older kernels would think a filename was longer
than 256 characters, which would lead to corrupt directory listings.
The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
doesn’t understand compression, you would just get garbage back from
read() instead of it automatically decompressing your data. The ext3
RECOVER flag is needed to prevent a kernel which does not understand the
ext3 journal from mounting the filesystem without replaying the journal.
e2fsck needs to be stricter in its handling of these
flags than the kernel. If it doesn't understand ANY of the COMPAT,
RO_COMPAT, or INCOMPAT flags it will refuse to check the filesystem,
because it has no way of verifying whether a given feature is valid
or not. Allowing e2fsck to succeed on a filesystem with an unknown
feature would give the user a false sense of security. Refusing to check
a filesystem with unknown features is a good incentive for the user to
update to the latest e2fsck. This also means that anyone adding feature
flags to ext2 also needs to update e2fsck to verify these features.
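The rules for the three feature fields can be summarised in code. The sketch
below is an illustration of the decision logic described above, not the
kernel or e2fsck source; the type and function names are made up for the
example, and each field is treated as an opaque 32-bit mask.

```c
#include <stdint.h>

enum mount_mode { MOUNT_REFUSE, MOUNT_READ_ONLY, MOUNT_READ_WRITE };

struct feature_sets {
    uint32_t compat;
    uint32_t ro_compat;
    uint32_t incompat;
};

/* Kernel view: unknown COMPAT bits are ignored, unknown RO_COMPAT bits
 * allow read-only access, unknown INCOMPAT bits prevent mounting. */
enum mount_mode kernel_mount_mode(struct feature_sets fs,
                                  struct feature_sets known)
{
    if (fs.incompat & ~known.incompat)
        return MOUNT_REFUSE;
    if (fs.ro_compat & ~known.ro_compat)
        return MOUNT_READ_ONLY;
    return MOUNT_READ_WRITE;
}

/* e2fsck view: any unknown bit in any field means "refuse to check". */
int fsck_can_check(struct feature_sets fs, struct feature_sets known)
{
    return !((fs.compat & ~known.compat) ||
             (fs.ro_compat & ~known.ro_compat) ||
             (fs.incompat & ~known.incompat));
}
```

A filesystem with an unknown COMPAT bit is thus mountable read-write by the
kernel but rejected by a checker that does not know the feature.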
Metadata
--------
It is frequently claimed that the ext2 implementation of writing
asynchronous metadata is faster than the ffs synchronous metadata
scheme but less reliable. Both methods are equally resolvable by their
respective fsck programs.
If you’re exceptionally paranoid, there are 3 ways of making metadata
writes synchronous on ext2:
    per-file if you have the program source: use the O_SYNC flag to open()
    per-file if you don't have the source: use "chattr +S" on the file
    per-filesystem: add the "sync" option to mount (or in /etc/fstab)
The first and last are not ext2 specific but do force the metadata to
be written synchronously. See also Journaling below.
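The first of these options might look as follows on a POSIX system such as
Linux. This is a minimal sketch; write_synchronously() is a name chosen for
the example, and the file path in the usage note is a placeholder.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a buffer to 'path' with O_SYNC, so that each write() returns
 * only after the data and required metadata have reached the device.
 * Returns 0 on success, -1 on error. */
int write_synchronously(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);

    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0) {            /* treat a zero-length write as an error */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

For example, write_synchronously("/tmp/example.log", "hello\n", 6) creates
the file and does not return until the six bytes have been written through.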
Limitations
-----------
There are various limits imposed by the on-disk layout of ext2. Other
limits are imposed by the current implementation of the kernel code.
Many of the limits are determined at the time the filesystem is first
created, and depend upon the block size chosen. The ratio of inodes to
data blocks is fixed at filesystem creation time, so the only way to
increase the number of inodes is to increase the size of the filesystem.
No tools currently exist which can change the ratio of inodes to blocks.
Most of these limits could be overcome with slight changes in the on-disk
format and using a compatibility flag to signal the format change (at
the expense of some compatibility).
Filesystem block size:      1kB        2kB        4kB        8kB
File size limit:           16GB      256GB     2048GB     2048GB
Filesystem size limit:   2047GB     8192GB    16384GB    32768GB
There is a 2.4 kernel limit of 2048GB for a single block device, so no
filesystem larger than that can be created at this time. There is also
an upper limit on the block size imposed by the page size of the kernel,
so 8kB blocks are only allowed on Alpha systems (and other architectures
which support larger pages).
There is an upper limit of 32768 subdirectories in a single directory.
There is a "soft" upper limit of about 10-15k files in a single directory
with the current linear linked-list directory implementation. This limit
stems from performance problems when creating and deleting (and also
finding) files in such large directories. Using a hashed directory index
(under development) allows 100k-1M+ files in a single directory without
performance problems (although RAM size becomes an issue at this point).
The (meaningless) absolute upper limit of files in a single directory
(imposed by the file size, the realistic limit is obviously much less)
is over 130 trillion files. It would be higher except there are not
enough 4-character names to make up unique directory entries, so they
have to be 8 character filenames, even then we are fairly close to
running out of unique filenames.
Journaling
----------
A journaling extension to the ext2 code has been developed by Stephen
Tweedie. It avoids the risks of metadata corruption and the need to
wait for e2fsck to complete after a crash, without requiring a change
to the on-disk ext2 layout. In a nutshell, the journal is a regular
file which stores whole metadata (and optionally data) blocks that have
been modified, prior to writing them into the filesystem. This means
it is possible to add a journal to an existing ext2 filesystem without
the need for data conversion.
When changes are made to the filesystem (e.g. a file is renamed), they are stored in
a transaction in the journal and can either be complete or incomplete at
the time of a crash. If a transaction is complete at the time of a crash
(or in the normal case where the system does not crash), then any blocks
in that transaction are guaranteed to represent a valid filesystem state,
and are copied into the filesystem. If a transaction is incomplete at
the time of the crash, then there is no guarantee of consistency for
the blocks in that transaction so they are discarded (which means any
filesystem changes they represent are also lost).
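The recovery rule can be illustrated with a toy model: complete transactions
are copied into the filesystem, incomplete ones are skipped. Everything in
this sketch (the structure, the 16-byte block size, the function name) is
invented for illustration and does not reflect the real ext3 journal format.

```c
#include <stddef.h>
#include <string.h>

#define JBLOCK_SIZE    16   /* tiny blocks keep the sketch readable */
#define MAX_TXN_BLOCKS 4

/* Toy transaction: a set of whole blocks and a completion marker. */
struct journal_txn {
    int           complete;                    /* fully written to journal? */
    size_t        nblocks;
    size_t        block_no[MAX_TXN_BLOCKS];    /* target filesystem blocks */
    unsigned char data[MAX_TXN_BLOCKS][JBLOCK_SIZE];
};

/* Copy the blocks of every complete transaction into the filesystem and
 * discard incomplete ones. Returns the number of transactions applied. */
size_t journal_replay(unsigned char fs[][JBLOCK_SIZE], size_t fs_blocks,
                      const struct journal_txn *txns, size_t ntxns)
{
    size_t applied = 0;

    for (size_t t = 0; t < ntxns; t++) {
        if (!txns[t].complete)
            continue;                          /* changes are lost */
        for (size_t b = 0; b < txns[t].nblocks; b++) {
            size_t target = txns[t].block_no[b];

            if (target < fs_blocks)
                memcpy(fs[target], txns[t].data[b], JBLOCK_SIZE);
        }
        applied++;
    }
    return applied;
}
```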
The ext3 code is currently (Apr 2001) available for 2.2 kernels only,
and not yet available for 2.4 kernels.
References
==========
The kernel source       file:/usr/src/linux/fs/ext2/
e2fsprogs (e2fsck)      http://e2fsprogs.sourceforge.net/
Design & Implementation http://e2fsprogs.sourceforge.net/ext2intro.html
Journaling (ext3)       ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
Hashed Directories      http://kernelnewbies.org/~phillips/htree/
Filesystem Resizing     http://ext2resize.sourceforge.net/
Extended Attributes &
Access Control Lists    http://acl.bestbits.at/
Compression (*)         http://www.netspace.net.au/~reiter/e2compr/
Implementations for:
Windows 95/98/NT/2000   http://uranus.it.swin.edu.au/~jn/linux/Explore2fs.htm
Windows 95 (*)          http://www.yipton.demon.co.uk/content.html#FSDEXT2
DOS client (*)          ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
OS/2                    http://perso.wanadoo.fr/matthieu.willm/ext2-os2/
RISC OS client          ftp://ftp.barnet.ac.uk/pub/acorn/armlinux/iscafs/
(*) no longer actively developed/supported (as of Apr 2001)