C Programming Tutorial on Random Access File Handling

cumao2792

Original article: https://www.thoughtco.com/random-access-file-handling-958450

Apart from the simplest applications, most programs have to read or write files, whether that means reading a config file, parsing text, or something more sophisticated. This tutorial focuses on using random access files in C.

Programming Random Access File I/O in C

The basic file operations are:

fopen - open a file, specifying how it's opened (read/write) and its type (binary/text)
fclose - close an opened file
fread - read from a file
fwrite - write to a file
fseek/fsetpos - move a file pointer to somewhere in a file
ftell/fgetpos - tell you where the file pointer is located
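
As a rough illustration of how these six calls fit together, here is a minimal sketch. The helper name byte_at and the file contents (the bytes 0 to 9) are assumptions made for this example and are not part of the original tutorial.

```c
#include <stdio.h>

/* Hypothetical helper: writes the bytes 0..9 to `path`, reopens the file,
   seeks to `offset` and returns the byte stored there, or -1 on failure. */
int byte_at(const char *path, long offset)
{
    unsigned char data[10];
    unsigned char value;
    FILE *f;
    int result = -1;
    int i;

    for (i = 0; i < 10; i++)
        data[i] = (unsigned char)i;

    f = fopen(path, "wb");                 /* create or truncate, binary */
    if (f == NULL)
        return -1;
    if (fwrite(data, 1, sizeof data, f) != sizeof data) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");                 /* reopen for reading */
    if (f == NULL)
        return -1;
    if (fseek(f, offset, SEEK_SET) == 0 && /* move the file pointer    */
        ftell(f) == offset &&              /* confirm where it is now  */
        fread(&value, 1, 1, f) == 1)       /* read one byte from there */
        result = value;
    fclose(f);
    return result;
}
```

Calling byte_at("demo.bin", 4) should return 4, because the byte at offset 4 was written with the value 4. An offset past the end of the file makes fread return 0, so the helper reports -1.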

The two fundamental file types are text and binary. Of these two, binary files are usually simpler to deal with. For that reason, and because random access on a text file isn't something you need to do often, this tutorial is limited to binary files. The first four operations listed above apply to both text and random access files. The last two are just for random access.

Random access means you can move to any part of a file and read or write data there without having to read through the entire file. Years ago, data was stored on large reels of computer tape. The only way to get to a point on the tape was to read through the tape up to that point. Then disks came along, and now you can read any part of a file directly.

Programming With Binary Files

A binary file is a file of any length that holds bytes with values in the range 0 to 255. These bytes carry no special meaning, unlike in a text file, where a value of 13 means carriage return, 10 means line feed, and (on DOS and Windows) 26 can mean end of file. Software that reads text files has to deal with these special meanings.

A binary file is a stream of bytes, and modern languages tend to work with streams rather than files. The important part is the data stream rather than where it came from. In C, you can think about the data either as files or streams. With random access, you can read or write to any part of the file or stream. With sequential access, you have to loop through the file or stream from the start, like a big tape.

A simple first example opens a binary file for writing and writes a text string (char *) into it. Normally you see this done with a text file, but you can write text to a binary file too. The FILE * variable is returned from the fopen() call. If the call fails (the file might exist and be open or read-only, or there could be a fault with the filename), fopen() returns 0, a null pointer.

The fopen() call attempts to open the specified file. In this case, it's test.txt, looked up relative to the current working directory (usually the same folder as the application). If the filename includes a Windows path, every backslash in the string literal must be doubled, because a single backslash starts an escape sequence in C. "c:\folder\test.txt" is incorrect; you must use "c:\\folder\\test.txt".

As the file mode is "wb", this code is writing to a binary file. The file is created if it doesn't exist, and if it does, whatever was in it is deleted. If the call to fopen fails, perhaps because the file was open or the name contains invalid characters or an invalid path, fopen returns the value 0 (a null pointer).

Although you could just check that ft is non-zero (success), the example has a FileSuccess() function to do this explicitly. On Windows, it outputs the success or failure of the call and the filename. That is a little onerous if you are after performance, so you might limit it to debugging. On Windows, there is little overhead in outputting text to the system debugger.

The fwrite() call outputs the specified text. The second and third parameters are the size of each item (here, one character) and the number of items (here, the length of the string). Both are of type size_t, which is an unsigned integer type. The call writes count items of the specified size and returns the number of items actually written. Note that with binary files, even though you are writing a string (char *), no carriage return or line feed characters are appended. If you want those, you must explicitly include them in the string.
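
The steps described above (open with "wb", check the result, write with fwrite, close) can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the tutorial's original listing: the function name write_text and its return convention are made up for illustration, and only the variable name ft and the mode "wb" come from the text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `text` to `filename` in binary mode.
   Returns the number of bytes written, or -1 if the file could not be
   opened or closed cleanly. */
long write_text(const char *filename, const char *text)
{
    FILE *ft = fopen(filename, "wb");   /* create or truncate, binary */
    size_t written;

    if (ft == NULL)                     /* fopen failed */
        return -1;

    /* item size = sizeof(char), item count = strlen(text);
       no newline or terminating '\0' is written */
    written = fwrite(text, sizeof(char), strlen(text), ft);

    if (fclose(ft) != 0)                /* flushes buffered data to disk */
        return -1;
    return (long)written;
}
```

A call such as write_text("test.txt", "Once upon a time there were three bears.") should leave a file containing exactly the characters of the string, with no line ending added.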

File Modes for Reading and Writing Files

When you open a file, you specify how it is to be opened: whether to create it or overwrite it, whether it's text or binary, whether to read or write, and whether you want to append to it. This is done using one or more file mode specifiers: the single letters "r", "w", "a" and "b", and "+" in combination with the other letters.

r - Opens the file for reading. This fails if the file does not exist or cannot be found.
w - Opens the file as an empty file for writing. If the file exists, its contents are destroyed.
a - Opens the file for writing at the end of the file (appending) without removing the EOF marker before writing new data to the file. The file is created first if it doesn't exist.

Adding "+" to the file mode creates three new modes:

r+ - Opens the file for both reading and writing. (The file must exist.)
w+ - Opens the file as an empty file for both reading and writing. If the file exists, its contents are destroyed.
a+ - Opens the file for reading and appending. The appending operation includes the removal of the EOF marker before new data is written to the file, and the EOF marker is restored after writing is complete. The file is created first if it doesn't exist.
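
To see the difference between the modes in practice, the following sketch writes to the same file with "wb" and "ab" and checks the size after each step. The helper names put_bytes and file_size are hypothetical, and measuring the size with fseek(..., SEEK_END) followed by ftell is an assumption that holds on common desktop platforms rather than a guarantee of the C standard.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` with `mode`, writes `n` bytes, closes.
   Returns 0 on success, -1 on any failure. */
int put_bytes(const char *path, const char *mode, const char *bytes, size_t n)
{
    FILE *f = fopen(path, mode);
    int ok;

    if (f == NULL)
        return -1;
    ok = fwrite(bytes, 1, n, f) == n;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: returns the size of the file in bytes, or -1 if it
   cannot be opened for reading ("rb" fails when the file does not exist). */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size;

    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    return size;
}
```

Writing 6 bytes with "wb" gives a 6-byte file, a further 2 bytes with "ab" grows it to 8, and writing 1 byte with "wb" again shrinks it to 1 because the old contents are destroyed.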

File Mode Combinations

This table shows file mode combinations for both text and binary files. Generally, you either read from or write to a text file, but not both at the same time. With a binary file, you can both read and write to the same file. The table below shows what you can do with each combination.

r text - read
rb binary - read
r+ text - read, write
r+b binary - read, write
rb+ binary - read, write
w text - write, create, truncate
wb binary - write, create, truncate
w+ text - read, write, create, truncate
w+b binary - read, write, create, truncate
wb+ binary - read, write, create, truncate
a text - write, create
ab binary - write, create
a+ text - read, write, create
a+b binary - read, write, create
ab+ binary - read, write, create

Unless you are just creating a file (use "wb") or only reading one (use "rb"), you can get away with using "w+b". Bear in mind that "w+b" truncates an existing file, so use "r+b" when the existing contents must be kept.
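
As a small illustration of updating a file in place, the sketch below opens an existing file with "r+b" and overwrites a single byte. The function name patch_byte is an assumption made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: overwrites one byte at `offset` in an existing file,
   leaving the rest of the file intact. Returns 0 on success, -1 on failure. */
int patch_byte(const char *path, long offset, unsigned char value)
{
    FILE *f = fopen(path, "r+b");   /* file must exist; it is not truncated */
    int ok;

    if (f == NULL)
        return -1;
    ok = fseek(f, offset, SEEK_SET) == 0 &&
         fwrite(&value, 1, 1, f) == 1;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Opening the same file with "w+b" instead would also allow the seek and the write, but everything else in the file would already have been discarded by the time fopen returned.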

Some implementations also allow other letters. Microsoft, for example, allows:

t - text mode
c - commit
n - non-commit
S - optimizing caching for sequential access
R - caching non-sequential (random access)
T - temporary
D - delete/temporary, which deletes the file when it's closed

These aren't portable, so use them at your own peril.

Example of Random Access File Storage

The main reason for using binary files is the flexibility to read or write anywhere in the file. Text files are normally read or written sequentially. The prevalence of inexpensive or free databases such as SQLite and MySQL reduces the need to use random access on binary files. Random access to file records is a little old-fashioned, but it is still useful.

Examining an Example

The example uses an index file and a data file as a pair to store strings for random access. The strings are of different lengths and are indexed by position 0, 1 and so on.

There are two void functions: CreateFiles() and ShowRecord(int recnum). CreateFiles uses a char * buffer of size 1100 to hold a temporary string made up of the format string msg followed by n asterisks, where n varies from 5 to 1004. Two FILE * variables, ftindex and ftdata, are created, both opened with the wb file mode. After creation, these are used to manipulate the files. The two files are:

index.dat
data.dat

The index file holds 1000 records of type indextype, a struct with two members: pos (of type fpos_t) and size. The first part of the loop builds the temporary string from the format string msg and appends the asterisks, producing strings that end in 5 asterisks, then 6, and so on. The loop then populates the struct with the length of the string and the point in the data file where the string will be written.
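
The original listing for CreateFiles() is not reproduced here, so the following is a reconstruction from the description, not the author's code. The names indextype, pos, size, msg, ftindex, ftdata, index.dat and data.dat come from the text. The exact wording of the format string, the buffer name text, and the int return value (the text describes a void function) are assumptions made so the sketch can be checked.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    fpos_t pos;   /* where the string starts in the data file */
    int size;     /* length of the string in bytes */
} indextype;

/* Sketch of CreateFiles(): writes 1000 variable-length strings to data.dat
   and one index record per string to index.dat.
   Returns 0 on success, -1 on failure (assumption; the text says void). */
int CreateFiles(void)
{
    const char *msg = "This is string %d followed by %d asterisks: ";
    char text[1100];
    indextype index;
    FILE *ftindex, *ftdata;
    int i, j, ok = 1;

    ftindex = fopen("index.dat", "wb");
    if (ftindex == NULL)
        return -1;
    ftdata = fopen("data.dat", "wb");
    if (ftdata == NULL) {
        fclose(ftindex);
        return -1;
    }

    for (i = 0; i < 1000 && ok; i++) {
        /* build the record: formatted prefix plus i + 5 asterisks */
        int len = sprintf(text, msg, i, i + 5);
        for (j = 0; j < i + 5; j++)
            text[len++] = '*';
        text[len] = '\0';

        /* record the string's length and its position in the data file */
        memset(&index, 0, sizeof index);
        index.size = len;
        ok = fgetpos(ftdata, &index.pos) == 0 &&
             fwrite(&index, sizeof index, 1, ftindex) == 1 &&
             fwrite(text, 1, (size_t)len, ftdata) == (size_t)len;
    }

    if (fclose(ftindex) != 0)
        ok = 0;
    if (fclose(ftdata) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

One caveat on the design: an fpos_t value is only meaningful to the C library that produced it, so an index file written this way should be read back by a program built with the same compiler and runtime. Storing a long offset from ftell instead would be the more portable choice.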

At this point, both the index file struct and the data file string can be written to their respective files. Although these are binary files, they are written sequentially. In theory, you could write records to a position beyond the current end of file, but it's not a good technique to use and probably not at all portable.

The final part is to close both files. This ensures that the last part of each file is written to disk. During file writes, many of the writes don't go directly to disk but are held in fixed-size buffers. Once a write fills the buffer, the entire contents of the buffer are written to disk.

The fflush() function forces a flush, and you can also specify a buffering strategy with setvbuf(). Line buffering in particular is mainly useful for text files.
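
A minimal sketch of both ideas, using the standard fflush and setvbuf calls. The helper names open_buffered and append_and_flush are hypothetical, and the choice of full buffering (_IOFBF) is just one possible strategy.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` for binary writing with a fully
   buffered stream of `bufsize` bytes. Returns NULL on failure. */
FILE *open_buffered(const char *path, size_t bufsize)
{
    FILE *f = fopen(path, "wb");

    /* setvbuf must be called before any other operation on the stream */
    if (f != NULL && setvbuf(f, NULL, _IOFBF, bufsize) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}

/* Hypothetical helper: writes one record and pushes it out of the stdio
   buffer straight away instead of waiting for the buffer to fill. */
int append_and_flush(FILE *f, const void *rec, size_t size)
{
    if (fwrite(rec, size, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

After append_and_flush returns 0, the data has been handed to the operating system, so another reader of the same file can see it even though the writing stream is still open.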

ShowRecord Function

To test that any specified record from the data file can be retrieved, you need to know two things: where it starts in the data file and how big it is.

This is what the index file does. The ShowRecord function opens both files, seeks to the appropriate point in the index file (recnum * sizeof(indextype)) and fetches a number of bytes equal to sizeof(index).
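
The index lookup can be sketched as below. The struct layout follows the description of indextype; the function name read_index and its return convention are assumptions for this example.

```c
#include <stdio.h>

typedef struct {
    fpos_t pos;   /* where the string starts in the data file */
    int size;     /* length of the string in bytes */
} indextype;

/* Hypothetical helper: reads index record number `recnum` from an open
   index file into `out`. Returns 0 on success, -1 on failure. */
int read_index(FILE *ftindex, int recnum, indextype *out)
{
    /* fixed-size records: record n starts at byte n * sizeof(indextype) */
    long offset = (long)recnum * (long)sizeof(indextype);

    if (fseek(ftindex, offset, SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, ftindex) == 1 ? 0 : -1;
}
```

Because every index record has the same size, finding record n needs one multiplication and one seek, no matter how long the strings in the data file are.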

SEEK_SET is a constant that tells fseek where the offset is measured from. Two other constants are defined for this, giving three in all:

SEEK_SET - seek from the start of the file (an absolute position)
SEEK_CUR - seek relative to the current position
SEEK_END - seek relative to the end of the file

You could use SEEK_CUR to move the file pointer forward by sizeof(index), for example to skip over one index record.
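
Here is a short sketch that exercises all three origins. The helper names record_count and skip_record are hypothetical. Note that the C standard does not require binary streams to support SEEK_END meaningfully, although the common desktop implementations do.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many whole records of `recsize` bytes
   the file holds, restoring the original position. Returns -1 on error. */
long record_count(FILE *f, size_t recsize)
{
    long here = ftell(f);                   /* remember where we are */
    long end;

    if (here < 0 || recsize == 0)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0)        /* offset 0 from the end */
        return -1;
    end = ftell(f);
    if (fseek(f, here, SEEK_SET) != 0)      /* absolute: back to start point */
        return -1;
    return end < 0 ? -1 : end / (long)recsize;
}

/* Hypothetical helper: moves forward by one record from the current
   position. Returns 0 on success, like fseek. */
int skip_record(FILE *f, size_t recsize)
{
    return fseek(f, (long)recsize, SEEK_CUR);
}
```

For a file holding three 4-byte records, record_count(f, 4) should report 3, and one call to skip_record(f, 4) from the start leaves the file pointer at offset 4.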

Having obtained the size and position of the data, all that remains is to fetch it.

The data file is positioned with fsetpos() because index.pos is of type fpos_t. An alternative is to use ftell instead of fgetpos and fseek instead of fsetpos. The pair fseek and ftell work with long offsets, whereas fgetpos and fsetpos use fpos_t.

After reading the record into memory, a null character \0 is appended to turn it into a proper C string. Forget it and any code that treats the buffer as a string will read past the end of the data, which is undefined behavior and often a crash. As before, fclose is called on both files. Although you won't lose any data if you forget fclose (unlike with writes), you will leak the file handle and its buffer.
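
Putting the pieces together, the retrieval step might look like the sketch below. It is a reconstruction from the description, not the original ShowRecord listing: the name ReadRecord, the file name parameters, the caller-supplied buffer and the return value are all assumptions made so the function can be tested. It also relies on fsetpos accepting an fpos_t that was stored by an earlier run, which works on common implementations but is not guaranteed by the C standard.

```c
#include <stdio.h>

typedef struct {
    fpos_t pos;   /* where the string starts in the data file */
    int size;     /* length of the string in bytes */
} indextype;

/* Sketch of the ShowRecord logic: copies record `recnum` into `text`
   (capacity `cap` bytes) and null-terminates it.
   Returns the record length, or -1 on any failure. */
int ReadRecord(const char *indexname, const char *dataname,
               int recnum, char *text, size_t cap)
{
    indextype index;
    FILE *ftindex, *ftdata;
    int result = -1;

    ftindex = fopen(indexname, "rb");
    if (ftindex == NULL)
        return -1;
    ftdata = fopen(dataname, "rb");
    if (ftdata == NULL) {
        fclose(ftindex);
        return -1;
    }

    if (recnum >= 0 &&
        /* 1. locate and read the index record */
        fseek(ftindex, (long)recnum * (long)sizeof index, SEEK_SET) == 0 &&
        fread(&index, sizeof index, 1, ftindex) == 1 &&
        /* 2. make sure the string plus '\0' fits in the buffer */
        index.size >= 0 && (size_t)index.size < cap &&
        /* 3. jump to the string in the data file and read it */
        fsetpos(ftdata, &index.pos) == 0 &&
        fread(text, 1, (size_t)index.size, ftdata) == (size_t)index.size) {
        text[index.size] = '\0';    /* turn the bytes into a C string */
        result = index.size;
    }

    fclose(ftdata);                 /* release both handles on every path */
    fclose(ftindex);
    return result;
}
```

Passing the buffer capacity in and checking it before the read guards against an index record whose size is larger than the caller expected.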