java io理解_彻底理解Java IO

最新推荐文章于 2023-06-06 18:01:13 发布

aka卡贴人

最新推荐文章于 2023-06-06 18:01:13 发布

阅读量153

点赞数

文章标签： java io理解

本文链接：https://blog.csdn.net/weixin_32535825/article/details/114170316

版权

Java中的IO类库设计的比较繁琐，IO这一块知识又是基础必备的，而且工作学习中经常用到。这一块知识看起来不难，但是想深入全面掌握也还是要花点功夫的。不是有句玩笑话说吗，“欠下的技术债总要还的”，刚好最近我准备总结一波Okhttp，Okio，就先把Java IO 这一块知识先做个总结，算是给后面2篇总结打个铺垫吧。(JDK源码基于1.8.0)

流

谈到IO，我们会想到从磁盘读取的文件IO，网络请求的Socket IO，还有可能我们不怎么常用的跨进程通信的管道IO......

这些在Java中都被抽象为“流”，读取源就是输入流(InputStream)，输出目标就是输出流(OutputStream)。

字节与字符

要理解IO，首先要清楚我们IO操作的对象，主要有字节和字符二种。

二进制文件中存储的数据都是二进制形式，一个字节是8bit，Java中对应的类型是byte，比如数字255存储到二进制文件中的值就是0xFF(11111111)。

文本文件中存储的数据都是字符形式，具体一个字符占多大空间取决于使用的编码格式，比如我们重用的UTF-8编码，一个英文字母占1个字节，一个中文汉字占3个字节，Java中对应的类型是char，数字255存储到文本文件中就是三个字符序列'2'，'5'，'5'。

Java IO大概可以分为二大类的IO，字节IO和字符IO，它们有各自的一套继承链，字节IO的基类实现是InputStream和OutputStream，字符IO的基类实现是Reader和Writer，他们都是抽象类，我们开看一下JDK中的io继承体系图(图片来源网络)。

d049943a9cc5?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation

jdk_io 继承体系图(来源网络).jpeg

字节输入输出流

由于Java中IO的类比较多，篇幅关系，只会拎出来一些我觉得值得记下来的知识点记录下来，有一些不常用的类只需要了解个大概，知道与其他类的差异即可，使用细节可以再参考资料来指导使用。

InputStream

作为字节输入流的抽象基类，主要定义了二个read方法，

// 读取下一个byte，它的取值范围是(0-255)，如果返回-1，说明已经到结尾了

public abstract int read() throws IOException;

// 读取字节到目标byte数组b中，

public int read(byte b[]) throws IOException{

return read(b, 0, b.length);

}

通过这个方法的定义，可以看出，InputStream类中read方法的默认实现中，每次只能读取一个字节，这个知识点在看源码之前我是不清楚了(窃以为这样效率是不是有点低了)。读取byte数组的方法的实现也是基于read方法的，我们看一下具体实现。

public int read(byte b[], int off, int len) throws IOException {

// 做状态检查

if (b == null) {

throw new NullPointerException();

} else if (off < 0 || len < 0 || len > b.length - off) {

throw new IndexOutOfBoundsException();

} else if (len == 0) {

return 0;

}

// 先尝试读取一个字节，注意这里有可能抛出IOException

int c = read();

if (c == -1) {

return -1;

}

b[off] = (byte)c;

int i = 1;

try {

// 循环读取字节，填充到目标字节数组b，这里对异常进行了catch

for (; i < len ; i++) {

c = read();

if (c == -1) {

break;

}

b[off + i] = (byte)c;

}

} catch (IOException ee) {

}

return i;

}

原以为一次读取填充一个byte数组效率应该能高点，然而并没有，看源码知道，它的实现是通过循环调用read一个一个字节的读取来实现的。这里有一个比较有意思的地方，读取字节数组时，会先尝试读取第一个字节，如果失败或者异常了读取就终止了，如果成功了再循环读取后面的字节，之后如果出现异常不会抛出，而会将前面已经成功读取的字节数返回。

FilterInputStream

FilterInputStream内部持有另一个InputStream的引用，这是一个典型的装饰者模式应用，对传入的InputStream进行增强。这个类的实现比较简单。

protected volatile InputStream in;

protected FilterInputStream(InputStream in) {

this.in = in;

}

public int read() throws IOException {

return in.read();

}

BufferedInputStream

BufferedInputStream继承于FilterInputStream，所以它也能装饰一个InputStream，它增强的功能是增加一个缓冲buffer，读取的时候先将字节读入buffer中，buffer的默认大小是8192，这样通过空间换时间的方式提高读取效率，特别是在大文件的读写时提升很明显，后面会给出一个小实验做下对比。

BufferedInputStream的核心方法是fill方法，将字节读入buffer缓冲数组中，看下代码实现。

private void fill() throws IOException {

// 获取buffer数组

byte[] buffer = getBufIfOpen();

// markpos 表示当前有没有设置pos标记，可以通过reset来还原到标记的位置开始读

if (markpos < 0)

pos = 0; /* no mark: throw away the buffer */

else if (pos >= buffer.length) /* no room left in buffer */

// 如果buffer空间不足，会想办法给buffer腾出空间

if (markpos > 0) { /* can throw away early part of the buffer */

// 如果有设置过markpos，那么markpos之前的空间是可以腾出来的

int sz = pos - markpos;

// 把markpos到pos这一段的buffer前移markpos个位置

System.arraycopy(buffer, markpos, buffer, 0, sz);

pos = sz;

markpos = 0;

} else if (buffer.length >= marklimit) {

// 如果buffer的大小超过marklimit，将标记还原

markpos = -1; /* buffer got too big, invalidate mark */

pos = 0; /* drop buffer contents */

} else if (buffer.length >= MAX_BUFFER_SIZE) {

throw new OutOfMemoryError("Required array size too large");

} else { /* grow buffer */

// 对buffer 进行扩容扩容后的大小是当前pos * 2

int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?

pos * 2 : MAX_BUFFER_SIZE;

if (nsz > marklimit)

nsz = marklimit;

byte nbuf[] = new byte[nsz];

System.arraycopy(buffer, 0, nbuf, 0, pos);

// CAS来更新buffer，

if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {

// Can't replace buf if there was an async close.

// Note: This would need to be changed if fill()

// is ever made accessible to multiple threads.

// But for now, the only way CAS can fail is via close.

// assert buf == null;

throw new IOException("Stream closed");

}

buffer = nbuf;

}

count = pos;

// 从装饰的inputstream中读满整个buffer

int n = getInIfOpen().read(buffer, pos, buffer.length - pos);

if (n > 0)

count = n + pos;

}

再来看下read方法的实现, 注意BufferedInputStream中的read都加锁了，所以是线程安全的。

public synchronized int read() throws IOException {

if (pos >= count) {

// 如果读取的位置查过buffer的位置，拉取字节到buffer

fill();

if (pos >= count)

return -1;

}

// 直接从buffer数组中读取byte，这里还与Oxff进行了一次与，暂时没看来有何意义。

return getBufIfOpen()[pos++] & 0xff;

}

再看读取字节数组的实现

private int read1(byte[] b, int off, int len) throws IOException {

int avail = count - pos;

// buffer已经读到尾部了

if (avail <= 0) {

/* If the requested length is at least as large as the buffer, and

if there is no mark/reset activity, do not bother to copy the

bytes into the local buffer. In this way buffered streams will

cascade harmlessly. */

// 如果需要读取的长度大于buffer的长度，直接降级为从源inputstream中读取，避免buffer急剧增大

if (len >= getBufIfOpen().length && markpos < 0) {

return getInIfOpen().read(b, off, len);

}

// 填充buffer

fill();

avail = count - pos;

// 如果可用字节还是0，说明已经到尾部了，返回-1

if (avail <= 0) return -1;

}

// 取当前可用的字节与期望读取的字节数的较小值

int cnt = (avail < len) ? avail : len;

// 拷贝可用的字节数到目标byte数组，所以我们在使用的时候出入固定的len数也没有问题，

// 只会将可用的byte写入数组。

System.arraycopy(getBufIfOpen(), pos, b, off, cnt);

pos += cnt;

return cnt;

}

BufferedInputStream的默认缓冲区大小是8192个字节，Jdk选择这样一个默认大小实现肯定是有原因的，所以我们在项目中考虑是否需要使用BufferInputStream来装饰时需要注意二点。

目标输入流字节数是否比较大，如果比较大考虑使用Buffered。

我们在调用read方法传入的byte[]的大小最好能被8192整除，比如我们经常使用的1024或者2048，这样刚好8次和4次刚好将缓冲区buffer清空，触发下一次fill，提高读取效率。

FileInputStream

FileInputStream 继承于 InputStream，它会在构造器中通过文件的path打开一个文件, 最终会调用open0 这个 native方法。

public FileInputStream(File file) throws FileNotFoundException {

String name = (file != null ? file.getPath() : null);

SecurityManager security = System.getSecurityManager();

if (security != null) {

security.checkRead(name);

}

if (name == null) {

throw new NullPointerException();

}

if (file.isInvalid()) {

throw new FileNotFoundException("Invalid file path");

}

fd = new FileDescriptor();

fd.attach(this);

path = name;

open(name);

}

private void open(String name) throws FileNotFoundException {

open0(name);

}

private native void open0(String name) throws FileNotFoundException;

FileInputStream用native重写了read方法

private native int read0() throws IOException;

private native int readBytes(byte b[], int off, int len) throws IOException;

由于FileInputStream底层支持一次读取多个字节，所以FileInputStream非常适合用上面的BufferedInputStream来装饰使用。

FileInputStream还有一个getChannel方法，这属于nio的知识，通过nio来进行文件拷贝简单高效，示例代码如下，文章结尾会给出一个文件拷贝的小测试，对比几种拷贝方式的耗时。

FileChannel fileChannelA = new FileInputStream(new File( "tempA.txt"))

.getChannel();

FileChannel fileChannelB = new FileInputStream(new File( "tempB.txt"))

.getChannel();

fileChannelA.transferTo(0, fileChannelA.size(), fileChannelB);

DataInputStream

DataInputStream也是继承于FilterInputStream，它主要装饰增强的功能是，帮我们封装了很有有用方法，比如

public final byte readByte() throws IOException {....}

public final int readInt() throws IOException {...}

public final double readDouble() throws IOException {...}

public final String readUTF() throws IOException {...}

//.... 更多的方法可以自行查看源码

这里我们看下readInt的实现，其他的方法实现大同小异，大家可以自行查阅源码理解。

public final int readInt() throws IOException {

int ch1 = in.read();

int ch2 = in.read();

int ch3 = in.read();

int ch4 = in.read();

if ((ch1 | ch2 | ch3 | ch4) < 0)

throw new EOFException();

return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));

}

我们知道int是4个字节，而我们每次读取是一个字节，所以我们需要读取4次，然后通过位运算将结果拼装起来(这里又使用了移位运算)。

关于字节输入流InputStream就说这些，水平有限，更详细深入的要各位自行深究了。

字节输出流 OutputStream

OutputStream 主要定义的方法就是 write(int b), write(byte b[])。有了前面InputStream的知识，很容易类比理解，一个写一个读。

与InputStream的继承结构基本是一一对应的，输出流也有很多子类实现，比如FilterOutputStream, BufferedOutputStream, FileOutputStream, DataOutputStream 等，由于实现跟输入流类似，这里就不赘述了，大家可以自行查阅源码来理解。

字符输入输出流

字节输入输出流操作的对象是byte，那么字符输入输出流操作的对象自然是char，字符输入流的基类是Reader。

Reader

看一下内部读取方法的实现

public int read() throws IOException {

char cb[] = new char[1];

if (read(cb, 0, 1) == -1)

return -1;

else

return cb[0];

}

public int read(char cbuf[]) throws IOException {

return read(cbuf, 0, cbuf.length);

}

abstract public int read(char cbuf[], int off, int len) throws IOException;

可以看到方法读取的返回是char或者char数组。Reader只是一个基础的基类，它同样有BufferedReader来增加缓冲区提高读取效率，还有LineNumberReader来提供按行读取的封装。

我们重点看下InputStreamReader，它能够将InputStream字节输入流转化为字符输入流

InputStreamReader

public InputStreamReader(InputStream in) {

super(in);

try {

sd = StreamDecoder.forInputStreamReader(in, this, (String)null); // ## check lock object

} catch (UnsupportedEncodingException e) {

// The default encoding should always be available

throw new Error(e);

}

它的转换能力是通过Nio里面的StreamDecoder对象来获得的。

public int read() throws IOException {

return sd.read();

}

public int read(char cbuf[], int offset, int length) throws IOException {

return sd.read(cbuf, offset, length);

}

可以看到读取操作完全是通过sd来完成的。关于StreamDecoder的具体实现，以后有机会再深入吧。

写在最后

其实IO 还有很多我没有说到的知识点，我们常用的FileReader类也是继承于InputStreamReader。JDK中还有很多的IO类，还有字符输出流，基类是Writer，也有跟Reader类似的继承链，

Java IO 的实现其实很是很难，但是很繁琐，初看起来眼花缭乱，但其实我们也不需要死记硬背去把每个类的实现都记下来，我们只需要心中有个全局把握，掌握关于Java IO 的蓝图，使用的时候也会更有自信。

下一篇是关于Okio的理解，可以跟Java IO的实现做个对比，看大神们如何另辟蹊径，简化IO操作的。

附录

4种copy文件的方式对比测试：

public static void main(String[] args) throws IOException {

//parpare

long time = System.currentTimeMillis();

copyWithFile();

Log.d("FileInputStream take time:" + (System.currentTimeMillis() - time));

time = System.currentTimeMillis();

copyWithBuffer();

Log.d("BufferedInputStream take time:" + (System.currentTimeMillis() - time));

time = System.currentTimeMillis();

copyWithNIO();

Log.d("nio take time:" + (System.currentTimeMillis() - time));

time = System.currentTimeMillis();

copyWithNioDirect();

Log.d("nio direct take time:" + (System.currentTimeMillis() - time));

}

public static void copyWithFile() throws IOException {

FileInputStream fileInputStream = new FileInputStream(new File(BASE_PATH + "WPS2019.dmg"));

FileOutputStream fileOutputStream = new FileOutputStream(

new File(BASE_PATH + "wps.copy1")

);

byte[] bytes = new byte[1024];

int read = 0;

while ((read = fileInputStream.read(bytes)) != -1) {

fileOutputStream.write(read);

}

fileOutputStream.flush();

fileInputStream.close();

fileOutputStream.close();

}

public static void copyWithBuffer() throws IOException{

BufferedInputStream fileInputStream = new BufferedInputStream(new FileInputStream(new File(BASE_PATH + "WPS2019.dmg")));

BufferedOutputStream fileOutputStream = new BufferedOutputStream(new FileOutputStream(

new File(BASE_PATH + "wps.copy2")

));

byte[] bytes = new byte[1024];

int read = 0;

while ((read = fileInputStream.read(bytes)) != -1) {

fileOutputStream.write(read);

}

fileOutputStream.flush();

fileInputStream.close();

fileOutputStream.close();

}

public static void copyWithNIO() throws IOException{

FileChannel fileChannel = new FileInputStream(new File(BASE_PATH + "WPS2019.dmg"))

.getChannel();

ByteBuffer buffer = ByteBuffer.allocate(1024);

FileChannel fileChannel2 = new FileOutputStream(

new File(BASE_PATH + "wps.copy3")

).getChannel();

while ((fileChannel.read(buffer)) != -1) {

buffer.flip();

fileChannel2.write(buffer);

buffer.clear();

}

public static void copyWithNioDirect() throws IOException{

FileChannel fileChannel = new FileInputStream(new File(BASE_PATH + "WPS2019.dmg"))

.getChannel();

FileChannel fileChannel2 = new FileOutputStream(

new File(BASE_PATH + "wps.copy4")

).getChannel();

fileChannel.transferTo(0, fileChannel.size(), fileChannel2);

}

wps文件大小是190M，在我自己电脑上面的测试结果：

FileInputStream take time:1369

BufferedInputStream take time:114

nio take time:1745

nio direct take time:214

可以看到BufferedInputStream 速度最快，nio direct方式次之，另外二种速度比较慢。

aka卡贴人

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
java io理解_彻底理解Java IO

Java中的IO类库设计的比较繁琐，IO这一块知识又是基础必备的，而且工作学习中经常用到。这一块知识看起来不难，但是想深入全面掌握也还是要花点功夫的。不是有句玩笑话说吗，“欠下的技术债总要还的”，刚好最近我准备总结一波Okhttp，Okio，就先把Java IO 这一块知识先做个总结，算是给后面2篇总结打个铺垫吧。(JDK源码基于1.8.0)流谈到IO，我们会想到从磁盘读取的文件IO，网络请求的S...
复制链接

扫一扫