Chapter 1. Streams and Files
In this chapter, we cover the Java application programming interfaces (APIs) for input and output.
You will learn how to access files and directories and how to read and write data in binary and text format.
This chapter also shows you the object serialization mechanism that lets you store objects as easily as you can store text or numeric data.
Next, we turn to several improvements that were made in the "new I/O" package java.nio, introduced in Java SE 1.4.
We finish the chapter with a discussion of regular expressions, even though they are not actually related to streams and files.
We couldn't find a better place to handle that topic, and apparently neither could the Java team—the regular expression API specification was attached to the specification request for the "new I/O" features of Java SE 1.4.
Streams
In the Java API, an object from which we can read a sequence of bytes is called an input stream.
An object to which we can write a sequence of bytes is called an output stream.
These sources and destinations of byte sequences can be—and often are—files, but they can also be network connections and even blocks of memory. The abstract classes InputStream and OutputStream form the basis for a hierarchy of input/output (I/O) classes.
Because byte-oriented streams are inconvenient for processing information stored in Unicode (recall that Unicode uses multiple bytes per character), there is a separate hierarchy of classes for processing Unicode characters that inherit from the abstract Reader and Writer classes. These classes have read and write operations that are based on two-byte Unicode code units rather than on single-byte characters.
Reading and Writing Bytes
The InputStream class has an abstract method:
abstract int read()
This method reads one byte and returns the byte that was read, or -1 if it encounters the end of the input source. The designer of a concrete input stream class overrides this method to provide useful functionality. For example, in the FileInputStream class, this method reads one byte from a file. System.in is a predefined object of a subclass of InputStream that allows you to read information from the keyboard.
The InputStream class also has nonabstract methods to read an array of bytes or to skip a number of bytes. These methods call the abstract read method, so subclasses need to override only one method.
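To make the contract of read concrete, here is a minimal sketch that reads a stream one byte at a time until read returns -1. The class name ReadBytesDemo and the method sumBytes are made up for illustration, and an in-memory ByteArrayInputStream stands in for a file or the keyboard. Note that a byte with all bits set comes back as 255, not -1, so the end-of-input marker cannot be confused with data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of the Java API.
class ReadBytesDemo {
    // Reads the stream one byte at a time until read() signals the end
    // of input with -1, and returns the sum of the byte values.
    static long sumBytes(InputStream in) throws IOException {
        long sum = 0;
        int b;
        while ((b = in.read()) != -1) {
            sum += b; // b is in the range 0..255 here, never negative
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory source stands in for a file or network connection.
        byte[] data = {1, 2, (byte) 0xFF};
        System.out.println(sumBytes(new ByteArrayInputStream(data))); // prints 258
    }
}
```

Because sumBytes is written against InputStream, the same loop would work unchanged with a FileInputStream or System.in.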
Similarly, the OutputStream class defines the abstract method
abstract void write(int b)
which writes one byte to an output location.
Both the read and write methods block until the bytes are actually read or written. This means that if the stream cannot immediately be accessed (usually because of a busy network connection), the current thread blocks. This gives other threads the chance to do useful work while the method is waiting for the stream to again become available.
The available method lets you check the number of bytes that are currently available for reading. This means a fragment like the following is unlikely to block:
int bytesAvailable = in.available();
if (bytesAvailable > 0)
{
   byte[] data = new byte[bytesAvailable];
   in.read(data);
}
When you have finished reading or writing to a stream, close it by calling the close method. This call frees up operating system resources that are in limited supply. If an application opens too many streams without closing them, system resources can become depleted. Closing an output stream also flushes the buffer used for the output stream: any bytes that were temporarily placed in a buffer so that they could be delivered as a larger packet are sent off. In particular, if you do not close a file, the last packet of bytes might never be delivered. You can also manually flush the output with the flush method.
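The effect of flush and close can be observed with a small experiment. The following sketch is only an illustration: the class FlushDemo and its method are hypothetical, and it assumes a BufferedOutputStream (a class discussed later in this chapter) wrapped around an in-memory ByteArrayOutputStream so that we can count how many bytes have actually been delivered.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class, not part of the Java API.
class FlushDemo {
    // Returns the number of bytes that reached the destination
    // before flush, after flush, and after close.
    static int[] bytesDelivered() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = new BufferedOutputStream(sink);
        int[] seen = new int[3];
        try {
            out.write(1);
            out.write(2);
            out.write(3);
            seen[0] = sink.size(); // bytes are still held in the buffer
            out.flush();
            seen[1] = sink.size(); // flush pushed them to the sink
            out.write(4);
        } finally {
            out.close();           // close flushes the remaining byte
        }
        seen[2] = sink.size();
        return seen;
    }

    public static void main(String[] args) throws IOException {
        int[] seen = bytesDelivered();
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]); // prints 0 3 4
    }
}
```

Placing close in a finally clause ensures that the stream is closed, and the last bytes are delivered, even if one of the write calls throws an exception.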
Even if a stream class provides concrete methods to work with the raw read and write functions, application programmers rarely use them. The data that you are interested in probably contain numbers, strings, and objects, not raw bytes.
Java gives you many stream classes derived from the basic InputStream and OutputStream classes that let you work with data in the forms that you usually use rather than at the byte level.
The Complete Stream Zoo
Unlike C, which gets by just fine with a single type FILE*, Java has a whole zoo of more than 60 (!) different stream types (see Figures 1-1 and 1-2).
Let us divide the animals in the stream class zoo by how they are used. There are separate hierarchies for classes that process bytes and characters. As you saw, the InputStream and OutputStream classes let you read and write individual bytes and arrays of bytes. These classes form the basis of the hierarchy shown in Figure 1-1. To read and write strings and numbers, you need more capable subclasses. For example, DataInputStream and DataOutputStream let you read and write all the primitive Java types in binary format. Finally, there are streams that do useful stuff; for example, the ZipInputStream and ZipOutputStream that let you read and write files in the familiar ZIP compression format.
For Unicode text, on the other hand, you use subclasses of the abstract classes Reader and Writer (see Figure 1-2). The basic methods of the Reader and Writer classes are similar to the ones for InputStream and OutputStream.
abstract int read()
abstract void write(int c)
The read method returns either a Unicode code unit (as an integer between 0 and 65535) or -1 when you have reached the end of the file. The write method is called with a Unicode code unit. (See Volume I, Chapter 3 for a discussion of Unicode code units.)
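The following sketch shows what "code unit" means in practice. It assumes a StringReader as the character source, and the class CodeUnitDemo is a name invented for this example. The test string contains the letter A followed by the character U+1D546, which lies outside the basic multilingual plane and therefore occupies two code units.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the Java API.
class CodeUnitDemo {
    // Collects every value returned by read() until the end marker -1.
    static List<Integer> readCodeUnits(Reader in) throws IOException {
        List<Integer> units = new ArrayList<Integer>();
        int c;
        while ((c = in.read()) != -1) {
            units.add(c); // each value is a code unit between 0 and 65535
        }
        return units;
    }

    public static void main(String[] args) throws IOException {
        // "A" followed by U+1D546, written as a surrogate pair
        String text = "A\uD835\uDD46";
        System.out.println(readCodeUnits(new StringReader(text)));
        // prints [65, 55349, 56646]
    }
}
```

The reader delivers three values for two characters, so code that counts calls to read is counting code units, not characters.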
Java SE 5.0 introduced four additional interfaces: Closeable, Flushable, Readable, and Appendable (see Figure 1-3). The first two interfaces are very simple, with methods
void close() throws IOException
and
void flush() throws IOException
respectively. The classes InputStream, OutputStream, Reader, and Writer all implement the Closeable interface. OutputStream and Writer implement the Flushable interface.
The Readable interface has a single method
int read(CharBuffer cb)
The CharBuffer class has methods for sequential and random read/write access. It represents an in-memory buffer or a memory-mapped file. (See "The Buffer Data Structure" on page 72 for details.)
The Appendable interface has two methods for appending single characters and character sequences:
Appendable append(char c)
Appendable append(CharSequence s)
The CharSequence interface describes basic properties of a sequence of char values. It is implemented by String, CharBuffer, StringBuilder, and StringBuffer.
Of the stream zoo classes, only Writer and PrintStream implement Appendable.
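One benefit of the Appendable interface is that a method can produce text without caring where it ends up. The sketch below is an illustration only: the class AppendableDemo and the method appendPair are hypothetical names. The same method is called once with a StringBuilder and once with a StringWriter, both of which implement Appendable.

```java
import java.io.IOException;
import java.io.StringWriter;

// Hypothetical demo class, not part of the Java API.
class AppendableDemo {
    // Appends "name=value;" to any Appendable destination.
    // The append methods of the interface are declared to throw IOException.
    static void appendPair(Appendable out, String name, int value) throws IOException {
        out.append(name).append('=').append(Integer.toString(value)).append(';');
    }

    public static void main(String[] args) throws IOException {
        StringBuilder builder = new StringBuilder();
        appendPair(builder, "id", 7);

        StringWriter writer = new StringWriter();
        appendPair(writer, "id", 7);

        System.out.println(builder);           // prints id=7;
        System.out.println(writer.toString()); // prints id=7;
    }
}
```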
Combining Stream Filters
FileInputStream and FileOutputStream give you input and output streams attached to a disk file. You give the file name or full path name of the file in the constructor. For example,
FileInputStream fin = new FileInputStream("employee.dat");
looks in the user directory for a file named "employee.dat".
Tip
Because all the classes in java.io interpret relative path names as starting with the user's working directory, you may want to know this directory. You can get at this information by a call to System.getProperty("user.dir").
Like the abstract InputStream and OutputStream classes, these classes support only reading and writing on the byte level. That is, we can only read bytes and byte arrays from the object fin.
byte b = (byte) fin.read();
As you will see in the next section, if we just had a DataInputStream, then we could read numeric types:
DataInputStream din = . . .;
double s = din.readDouble();
But just as the FileInputStream has no methods to read numeric types, the DataInputStream has no method to get data from a file.
Java uses a clever mechanism to separate two kinds of responsibilities. Some streams (such as the FileInputStream and the input stream returned by the openStream method of the URL class) can retrieve bytes from files and other more exotic locations. Other streams (such as the DataInputStream and the PrintWriter) can assemble bytes into more useful data types. The Java programmer has to combine the two. For example, to be able to read numbers from a file, first create a FileInputStream and then pass it to the constructor of a DataInputStream.
FileInputStream fin = new FileInputStream("employee.dat");
DataInputStream din = new DataInputStream(fin);
double s = din.readDouble();
If you look at Figure 1-1 again, you can see the classes FilterInputStream and FilterOutputStream. The subclasses of these classes are used to add capabilities to raw byte streams.
You can add multiple capabilities by nesting the filters. For example, by default, streams are not buffered. That is, every call to read asks the operating system to dole out yet another byte. It is more efficient to request blocks of data instead and put them in a buffer. If you want buffering and the data input methods for a file, you need to use the following rather monstrous sequence of constructors:
DataInputStream din = new DataInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));
Notice that we put the DataInputStream last in the chain of constructors because we want to use the DataInputStream methods, and we want them to use the buffered read method.
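To see the whole chain at work, the following sketch writes a few numbers through a DataOutputStream, a BufferedOutputStream, and a FileOutputStream, and then reads them back through the matching input chain. The class DataRoundTrip, its method names, and the choice to store a count before the values are assumptions made for this example. A temporary file is used so that the program does not depend on an existing employee.dat.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class, not part of the Java API.
class DataRoundTrip {
    // Writes a count followed by the values, in binary format.
    static void writeSalaries(File file, double[] salaries) throws IOException {
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(
                new FileOutputStream(file)));
        try {
            out.writeInt(salaries.length);
            for (double s : salaries) out.writeDouble(s);
        } finally {
            out.close(); // flushes the buffer and closes the file
        }
    }

    // Reads the count, then that many values, through the buffered chain.
    static double[] readSalaries(File file) throws IOException {
        DataInputStream in = new DataInputStream(
            new BufferedInputStream(
                new FileInputStream(file)));
        try {
            double[] salaries = new double[in.readInt()];
            for (int i = 0; i < salaries.length; i++) salaries[i] = in.readDouble();
            return salaries;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("employee", ".dat");
        file.deleteOnExit();
        writeSalaries(file, new double[] {50000.0, 75000.5});
        System.out.println(Arrays.toString(readSalaries(file)));
    }
}
```

Closing the outermost stream is enough: each filter passes the close call on to the stream it wraps.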
Sometimes you'll need to keep track of the intermediate streams when chaining them together. For example, when reading input, you often need to peek at the next byte to see if it is the value that you expect. Java provides the PushbackInputStream for this purpose.
PushbackInputStream pbin = new PushbackInputStream(
   new BufferedInputStream(
      new FileInputStream("employee.dat")));
Now you can speculatively read the next byte
int b = pbin.read();
and throw it back if it isn't what you wanted.
if (b != '<') pbin.unread(b);
But reading and unreading are the only methods that apply to the pushback input stream. If you want to look ahead and also read numbers, then you need both a pushback input stream and a data input stream reference.
DataInputStream din = new DataInputStream(
   pbin = new PushbackInputStream(
      new BufferedInputStream(
         new FileInputStream("employee.dat"))));
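Here is a small sketch of how the two references cooperate. It assumes a made-up data format in which an integer may or may not be preceded by a '<' marker byte; the class PushbackDemo and the method readValue are hypothetical, and a ByteArrayInputStream replaces the file so that the example is self-contained. The lookahead goes through pbin, and the number is read through din, which sees any byte that was pushed back.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical demo class, not part of the Java API.
class PushbackDemo {
    // Skips an optional '<' marker, then reads a four-byte integer.
    static int readValue(byte[] data) throws IOException {
        PushbackInputStream pbin = new PushbackInputStream(
            new ByteArrayInputStream(data));
        DataInputStream din = new DataInputStream(pbin);
        try {
            int b = pbin.read();             // speculatively read one byte
            if (b != -1 && b != '<') {
                pbin.unread(b);              // not the marker: put it back
            }
            return din.readInt();            // reads from pbin, including the unread byte
        } finally {
            din.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readValue(new byte[] {'<', 0, 0, 0, 42})); // prints 42
        System.out.println(readValue(new byte[] {0, 0, 1, 0}));       // prints 256
    }
}
```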
Of course, in the stream libraries of other programming languages, niceties such as buffering and lookahead are automatically taken care of, so it is a bit of a hassle in Java that one has to resort to combining stream filters in these cases. But the ability to mix and match filter classes to construct truly useful sequences of streams does give you an immense amount of flexibility. For example, you can read numbers from a compressed ZIP file by using the following sequence of streams (see Figure 1-4):
ZipInputStream zin = new ZipInputStream(new FileInputStream("employee.zip"));
DataInputStream din = new DataInputStream(zin);
(See "ZIP Archives" on page 32 for more on Java's ability to handle ZIP files.)
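As a rough sketch of the ZIP combination, the following program compresses a few numbers into an in-memory archive and reads them back. The class ZipNumbersDemo, its methods, and the entry name are invented for this example, and byte arrays take the place of employee.zip. Note one detail that the two-line fragment above leaves out: you must call getNextEntry on the ZipInputStream to position it at an entry before you can read data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the Java API.
class ZipNumbersDemo {
    // Writes a count and the values into a single ZIP entry.
    static byte[] zipDoubles(String entryName, double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(bytes);
        DataOutputStream dout = new DataOutputStream(zout);
        try {
            zout.putNextEntry(new ZipEntry(entryName));
            dout.writeInt(values.length);
            for (double v : values) dout.writeDouble(v);
            dout.flush();      // make sure all data reach the entry
            zout.closeEntry();
        } finally {
            dout.close();      // also finishes the archive
        }
        return bytes.toByteArray();
    }

    // Reads the numbers back from the first entry of the archive.
    static double[] unzipDoubles(byte[] archive) throws IOException {
        ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive));
        DataInputStream din = new DataInputStream(zin);
        try {
            if (zin.getNextEntry() == null) throw new IOException("empty archive");
            double[] values = new double[din.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = din.readDouble();
            return values;
        } finally {
            din.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zipDoubles("salaries.dat", new double[] {1.5, 2.25});
        System.out.println(Arrays.toString(unzipDoubles(archive))); // prints [1.5, 2.25]
    }
}
```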