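File Read/Write Case Study

1. Key Points

This article analyzes a piece of file read/write code, discusses its problems, and presents the final fix. It mainly involves the following functions:

  • open: opens a file
  • read: reads from a file
  • write: writes to a file
  • lseek: repositions the file offset
  • ftruncate: truncates a file to a specified length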


2. Code

The code is as follows:

void test(const char *filename)
{
    char *buf="34";
    char read_buf[2];
    int fd = open(filename, O_WRONLY | O_TRUNC);
    if (fd != -1) {  
        read(fd, read_buf, 2);
        printf(" -----before write: %s = (%c,%c)\n",filename,read_buf[0],read_buf[1]);
        write(fd, buf, strlen(buf));
        read(fd, read_buf, 2);
        printf(" -----after write: %s = (%c,%c)\n",filename,read_buf[0],read_buf[1]);
    } else {
        printf(" -----%s open failed", filename);
    }
    close(fd);
} 

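The background of this code is:

  • There is a file containing two bytes of data.
  • The file now needs to be modified so that its content is reset to "34".
  • To make the run easier to observe, log output was added (printf is used here as an example).
  • The code first reads the file's original content, i.e. 2 bytes; it then writes the target data, reads it back after the write, and prints it with printf.
  • Because the original data is to be replaced, the O_TRUNC flag is used to discard the file's existing content.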

3. Problem Analysis

3.1 O_TRUNC

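This flag was used in order to discard the file's original content. The problem is that once the file has been opened with this flag, the content is already gone, so the read() that follows can no longer retrieve the original content. This is the first error.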
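
The effect of O_TRUNC can be checked in isolation. The following is a minimal sketch, not code from the original project: the function name read_after_trunc_open() and the file path passed to it are assumptions made for illustration. It creates a two-byte file, reopens it with O_TRUNC, and returns what read() reports.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with the two bytes "12", reopen it with
 * O_RDWR | O_TRUNC, and return the result of a 2-byte read().
 * Returns -1 if any setup step fails. */
ssize_t read_after_trunc_open(const char *path)
{
    assert(path != NULL);

    /* Step 1: make sure the file holds exactly "12". */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, "12", 2);
    close(fd);
    if (n != 2)
        return -1;

    /* Step 2: reopen with O_TRUNC; the old content is discarded by open(). */
    fd = open(path, O_RDWR | O_TRUNC);
    if (fd == -1)
        return -1;

    /* Step 3: the file is empty now, so read() reports end-of-file (0). */
    char read_buf[2];
    n = read(fd, read_buf, sizeof read_buf);
    close(fd);
    return n;
}
```

A return value of 0 means end-of-file: there was nothing left to read, and read_buf was never filled in.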

3.2 O_WRONLY

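The main purpose here is to write new data, so O_WRONLY was used. But even if the file's content had not been discarded, read() would still return no data, because open() was not given an access mode that permits reading (O_RDONLY or O_RDWR); read() on a write-only descriptor fails with EBADF. This is the second error.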
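
A small sketch makes the failure visible. The function name read_errno_on_wronly() and the path are hypothetical; the sketch only assumes standard POSIX behavior of read() on a descriptor that is not open for reading.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path` write-only, attempt a read(), and return
 * the errno that read() set. Returns 0 if read() unexpectedly succeeded and
 * -1 if the file could not be opened at all. */
int read_errno_on_wronly(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    char c;
    errno = 0;
    ssize_t n = read(fd, &c, 1);        /* descriptor is not open for reading */
    int err = (n == -1) ? errno : 0;

    close(fd);
    return err;
}
```

Because the original test() never checks the return value of read(), this failure goes unnoticed and printf prints whatever happens to be in read_buf.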

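3.3 File Offset

In the code above, read() is followed by write(), which is followed by another read(). Each of these operations is actually meant to start at the beginning of the file, i.e. offset 0 relative to SEEK_SET. But lseek() is never called, so each call continues from wherever the previous one left the file offset. This is the third error.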
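
The following sketch contrasts reading back with and without rewinding. The function read_back_twice() and its parameters are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write "34" to a fresh file, then read twice:
 * once without rewinding, once after lseek(fd, 0, SEEK_SET).
 * The two read() return values are stored in *n_no_seek and *n_after_seek;
 * the bytes from the second read go to out[]. Returns 0, or -1 on setup failure. */
int read_back_twice(const char *path, ssize_t *n_no_seek,
                    ssize_t *n_after_seek, char out[2])
{
    assert(path != NULL && n_no_seek != NULL && n_after_seek != NULL && out != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (write(fd, "34", 2) != 2) {
        close(fd);
        return -1;
    }

    /* The offset now sits right after the written bytes, i.e. at end-of-file. */
    char scratch[2];
    *n_no_seek = read(fd, scratch, sizeof scratch);

    /* Rewind to the start of the file, then read again. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    *n_after_seek = read(fd, out, 2);

    close(fd);
    return 0;
}
```

The first read() returns 0 because the offset is already at end-of-file; only the read() after lseek() returns the two bytes just written.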

3.4 close()

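After open(), the return value is compared against -1 to determine whether the file was opened, and an if-else handles the two cases differently. The close() call, however, sits outside the if-else block, so close() is called regardless of whether open() succeeded. When open() failed, this amounts to close(-1), which itself fails with EBADF. This is the fourth error.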
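
One common way to avoid this is to return early when open() fails, so that close() is only reached with a valid descriptor. The sketch below uses two hypothetical function names, close_errno_for_invalid_fd() and open_use_close(), and is only one possible arrangement.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: show what close() does when handed the -1 that a
 * failed open() returns. Returns the errno set by close(), or 0. */
int close_errno_for_invalid_fd(void)
{
    errno = 0;
    return (close(-1) == -1) ? errno : 0;
}

/* Hypothetical helper: the early-return pattern. close() is reached only
 * when open() actually produced a descriptor. */
int open_use_close(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;              /* nothing was opened, so nothing to close */
    }

    /* ... work with fd here ... */

    return close(fd);           /* 0 on success, -1 on failure */
}
```

In a single-threaded test program the stray close(-1) is harmless apart from the ignored error, but keeping open() and close() paired makes the control flow easier to review.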

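3.5 The Importance of Code Walkthroughs

Admittedly, all of the errors above can be fixed one by one through debugging. One more piece of background: this code is to be used in a very large system. In that situation, a coder who is in the habit of unit testing will probably extract this snippet into a small file to debug and test it, and the cost of fixing the problems above is relatively low. Without the habit of testing modules in isolation, the code goes straight into the whole system, and once it compiles and links, the problems have to be located by running the system, which is bound to be very expensive.

So walking through the code first is essential. Of course, how effective the walkthrough is depends on whether the coder is familiar with the knowledge behind the error points above.

Either way, once a piece of code is written, it is worth first reviewing it carefully yourself, and then extracting it into a small file for unit testing. Good code is always highly testable, that is, easy to extract and test on its own.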

4. Refactored Code

As follows:

#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* read(), write(), lseek(), close() */
#include <errno.h>

/* Create the test file and write the initial content "12" into it. */
void create_file(const char* filename)
{
    assert(filename != NULL);

    int fd = open(filename, O_WRONLY | O_CREAT, 0777);
    if (fd == -1) {
        printf("create_file() failed: %s\n", strerror(errno));
        return;
    }

    const char *buf = "12";
    write(fd, buf, strlen(buf));
    close(fd);
}

void read_write_read(const char *filename)
{
    assert(filename != NULL);

    const char *buf = "34";
    char read_buf[2];

    /* First pass: open read-only and print the original content. */
    int fd = open(filename, O_RDONLY);
    if (fd != -1) {
        read(fd, read_buf, 2);
        printf(" -----before write: %s = (%c,%c)\n", filename, read_buf[0], read_buf[1]);
        close(fd);
    } else {
        printf(" -----while reading, %s open failed: %s\n", filename, strerror(errno));
    }

    /* Second pass: O_TRUNC discards the old content, O_RDWR allows reading back. */
    fd = open(filename, O_RDWR | O_TRUNC);
    if (fd != -1) {
        write(fd, buf, strlen(buf));
        lseek(fd, 0, SEEK_SET);   /* rewind before reading back */
        read(fd, read_buf, 2);
        printf(" -----after write: %s = (%c,%c)\n", filename, read_buf[0], read_buf[1]);
        close(fd);
    } else {
        printf(" -----%s open failed: %s\n", filename, strerror(errno));
    }
}

int main()
{
    const char* filename = "./test.bin";
    create_file(filename);
    read_write_read(filename);
    
    return 0;
}

Here, read_write_read() was not placed in a separate .c/.h pair and built into a static or dynamic library; it was put directly into a test program.

Output:

flying-bird@flyingbird:~/examples/cpp/read_write_read$ gcc test.c
flying-bird@flyingbird:~/examples/cpp/read_write_read$ ./a.out 
 -----before write: ./test.bin = (1,2)
 -----after write: ./test.bin = (3,4)
flying-bird@flyingbird:~/examples/cpp/read_write_read$ hexdump test.bin
0000000 3433                                   
0000002
flying-bird@flyingbird:~/examples/cpp/read_write_read$ 

5. Further Refactoring

In the code above the file is opened twice, so it can be refactored further as follows:

#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>   /* read(), write(), lseek(), ftruncate(), close() */
#include <errno.h>

/* Create the test file with 10 bytes, so that truncation to 2 bytes is observable. */
void create_file(const char* filename)
{
    assert(filename != NULL);

    int fd = open(filename, O_WRONLY | O_CREAT, 0777);
    if (fd == -1) {
        printf("create_file() failed: %s\n", strerror(errno));
        return;
    }

    const char *buf = "1234567890";
    write(fd, buf, strlen(buf));
    close(fd);
}

void read_write_read(const char *filename)
{
    assert(filename != NULL);

    const char *buf = "34";
    char read_buf[2];

    /* Open once for both reading and writing; no O_TRUNC, so the old content survives. */
    int fd = open(filename, O_RDWR);
    if (fd == -1) {
        printf(" -----%s open failed: %s\n", filename, strerror(errno));
        return;
    }

    read(fd, read_buf, 2);
    printf(" -----before write: %s = (%c,%c)\n", filename, read_buf[0], read_buf[1]);

    lseek(fd, 0, SEEK_SET);   /* rewind before overwriting */
    write(fd, buf, strlen(buf));

    lseek(fd, 0, SEEK_SET);   /* rewind before reading back */
    read(fd, read_buf, 2);
    printf(" -----after write: %s = (%c,%c)\n", filename, read_buf[0], read_buf[1]);

    ftruncate(fd, 2);         /* drop everything after the first 2 bytes */

    close(fd);
}

int main()
{
    const char* filename = "./test.bin";
    create_file(filename);
    read_write_read(filename);
    
    return 0;
}

Here the test file is initialized with more than 2 bytes of data in order to verify that ftruncate() works. The output is as follows:

flying-bird@flyingbird:~/examples/cpp/read_write_read$ gcc test.c
flying-bird@flyingbird:~/examples/cpp/read_write_read$ ./a.out 
 -----before write: ./test.bin = (1,2)
 -----after write: ./test.bin = (3,4)
flying-bird@flyingbird:~/examples/cpp/read_write_read$ ll test.bin
-rwxrwxr-x 1 flying-bird flying-bird 2  6月 18 19:03 test.bin*
flying-bird@flyingbird:~/examples/cpp/read_write_read$ hexdump test.bin
0000000 3433                                   
0000002
flying-bird@flyingbird:~/examples/cpp/read_write_read$ 
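
The refactored code still ignores the return values of read(), write(), lseek() and ftruncate(). As a hedged sketch of how the write path might be hardened, the following uses two hypothetical helpers, replace_contents() and file_size(), which are not part of the original program. It treats a short write as a failure, which is a simplifying assumption.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes ftruncate() under strict -std=c99/c11 */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace the whole content of an existing file with
 * `len` bytes from `data`. Returns 0 on success, -1 if any call fails. */
int replace_contents(const char *filename, const char *data, size_t len)
{
    assert(filename != NULL && data != NULL);

    int fd = open(filename, O_RDWR);
    if (fd == -1)
        return -1;                              /* nothing to close */

    /* Each step runs only if the previous one succeeded. */
    int ok = lseek(fd, 0, SEEK_SET) != (off_t)-1
          && write(fd, data, len) == (ssize_t)len
          && ftruncate(fd, (off_t)len) == 0;    /* cut off any old tail */

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper for checking the result: file size in bytes, or -1. */
long file_size(const char *filename)
{
    struct stat st;
    return stat(filename, &st) == 0 ? (long)st.st_size : -1;
}
```

With a file that initially holds "1234567890", calling replace_contents(filename, "34", 2) leaves a 2-byte file, matching the ll output above.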

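As for ftruncate(), this was also the author's first time using it; its functionality and usage are described in section 4.13 of APUE and in man 2 ftruncate, reproduced below: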

TRUNCATE(2)                                                   Linux Programmer's Manual                                                  TRUNCATE(2)



NAME
       truncate, ftruncate - truncate a file to a specified length

SYNOPSIS
       #include <unistd.h>
       #include <sys/types.h>

       int truncate(const char *path, off_t length);
       int ftruncate(int fd, off_t length);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       truncate():
           _BSD_SOURCE || _XOPEN_SOURCE >= 500 || _XOPEN_SOURCE && _XOPEN_SOURCE_EXTENDED
           || /* Since glibc 2.12: */ _POSIX_C_SOURCE >= 200809L

       ftruncate():
           _BSD_SOURCE || _XOPEN_SOURCE >= 500 || _XOPEN_SOURCE && _XOPEN_SOURCE_EXTENDED
           || /* Since glibc 2.3.5: */ _POSIX_C_SOURCE >= 200112L

DESCRIPTION
       The  truncate()  and  ftruncate()  functions  cause the regular file named by path or referenced by fd to be truncated to a size of precisely
       length bytes.

       If the file previously was larger than this size, the extra data is lost.  If the file previously  was  shorter,  it  is  extended,  and  the
       extended part reads as null bytes ('\0').

       The file offset is not changed.

       If  the  size  changed,  then  the  st_ctime and st_mtime fields (respectively, time of last status change and time of last modification; see
       stat(2)) for the file are updated, and the set-user-ID and set-group-ID permission bits may be cleared.

       With ftruncate(), the file must be open for writing; with truncate(), the file must be writable.

RETURN VALUE
       On success, zero is returned.  On error, -1 is returned, and errno is set appropriately.

ERRORS
       For truncate():

       EACCES Search permission is denied for a component of the path prefix, or the named file is not writable by the user.  (See also
              path_resolution(7).)

       EFAULT Path points outside the process's allocated address space.

       EFBIG  The argument length is larger than the maximum file size. (XSI)

       EINTR  While blocked waiting to complete, the call was interrupted by a signal handler; see fcntl(2) and signal(7).

       EINVAL The argument length is negative or larger than the maximum file size.

       EIO    An I/O error occurred updating the inode.

       EISDIR The named file is a directory.

       ELOOP  Too many symbolic links were encountered in translating the pathname.

       ENAMETOOLONG
              A component of a pathname exceeded 255 characters, or an entire pathname exceeded 1023 characters.

       ENOENT The named file does not exist.

       ENOTDIR
              A component of the path prefix is not a directory.

       EPERM  The underlying filesystem does not support extending a file beyond its current size.

       EROFS  The named file resides on a read-only filesystem.

       ETXTBSY
              The file is a pure procedure (shared text) file that is being executed.

       For  ftruncate() the same errors apply, but instead of things that can be wrong with path, we now have things that can be wrong with the file
       descriptor, fd:

       EBADF  fd is not a valid descriptor.

       EBADF or EINVAL
              fd is not open for writing.

       EINVAL fd does not reference a regular file.

CONFORMING TO
       4.4BSD, SVr4, POSIX.1-2001 (these calls first appeared in 4.2BSD).

NOTES
       The details in DESCRIPTION are for XSI-compliant systems.  For non-XSI-compliant systems, the POSIX standard allows two behaviors for
       ftruncate()  when length exceeds the file length (note that truncate() is not specified at all in such an environment): either returning an error,
       or extending the file.  Like most UNIX implementations, Linux follows the XSI requirement when dealing  with  native  filesystems.   However,
       some  nonnative filesystems do not permit truncate() and ftruncate() to be used to extend a file beyond its current length: a notable example
       on Linux is VFAT.

       The original Linux truncate() and ftruncate() system calls were not designed to handle large file offsets.   Consequently,  Linux  2.4  added
       truncate64()  and  ftruncate64()  system  calls  that handle large files.  However, these details can be ignored by applications using glibc,
       whose wrapper functions transparently employ the more recent system calls where they are available.

       On some 32-bit architectures, the calling signature for these system calls differ, for the reasons described in syscall(2).

BUGS
       A header file bug in glibc 2.12 meant that the minimum value of _POSIX_C_SOURCE required to expose the declaration of ftruncate() was 200809L
       instead of 200112L.  This has been fixed in later glibc versions.

SEE ALSO
       open(2), stat(2), path_resolution(7)

COLOPHON
       This page is part of release 3.54 of the Linux man-pages project.  A description of the project, and information about reporting bugs, can be
       found at http://www.kernel.org/doc/man-pages/.



Linux                                                                2013-04-01                                                          TRUNCATE(2)
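
Three statements from the DESCRIPTION above (extra data is lost when shrinking, the extended part reads as null bytes, and the file offset is not changed) can be observed with a small sketch. The struct truncate_report and the function ftruncate_report() are hypothetical names, and the sketch assumes a native Linux filesystem that permits extending a file with ftruncate().

```c
#define _POSIX_C_SOURCE 200809L   /* exposes ftruncate() under strict -std=c99/c11 */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result record for the experiment below. */
struct truncate_report {
    long size_after_shrink;     /* file size after ftruncate(fd, 2) */
    long offset_after_shrink;   /* file offset after ftruncate(fd, 2) */
    long size_after_extend;     /* file size after ftruncate(fd, 4) */
    char padded_byte;           /* byte at offset 2 after extending */
};

/* Hypothetical helper: write 10 bytes, shrink to 2, extend to 4, and record
 * what the file looks like at each step. Returns 0 on success, -1 on error. */
int ftruncate_report(const char *path, struct truncate_report *r)
{
    assert(path != NULL && r != NULL);

    struct stat st;
    int rc = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    if (write(fd, "1234567890", 10) != 10)
        goto out;

    /* Shrink: bytes 2..9 are lost, but the offset stays where write() left it. */
    if (ftruncate(fd, 2) == -1 || fstat(fd, &st) == -1)
        goto out;
    r->size_after_shrink = (long)st.st_size;
    r->offset_after_shrink = (long)lseek(fd, 0, SEEK_CUR);

    /* Extend: the new bytes at offsets 2 and 3 read as '\0'. */
    if (ftruncate(fd, 4) == -1 || fstat(fd, &st) == -1)
        goto out;
    r->size_after_extend = (long)st.st_size;
    if (lseek(fd, 2, SEEK_SET) == (off_t)-1 || read(fd, &r->padded_byte, 1) != 1)
        goto out;

    rc = 0;
out:
    close(fd);
    return rc;
}
```

The unchanged offset is why read_write_read() calls lseek() explicitly; ftruncate() alone never moves the read/write position.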

