JAVA Blocking I/O

david_huang_84

Category: JAVA. Tags: JAVA, I/O, BIO

Article link: https://blog.csdn.net/u011497622/article/details/81280850

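Background

JAVA I/O (Input/Output) is one of the most important components of JAVA, because reading and writing files and byte arrays comes up constantly: loading settings from a configuration file, dumping in-memory content to a data file, reading and parsing image or video files, and network I/O. A working understanding of the JAVA I/O classes is therefore essential. JAVA I/O falls into three families: BIO (Blocking I/O), NIO (Non-blocking I/O), and AIO (Asynchronous I/O). This article, one of a series on JAVA I/O, covers BIO.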

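Byte Streams and Character Streams

Before explaining what byte streams and character streams are, we first need to be clear about what a "stream" is.

JAVA abstracts input and output as streams: think of a pipe, except that what flows through it is data. Streams come in two kinds, distinguished by their smallest unit of data. A byte stream uses the byte (8 bits) as its unit; a character stream uses the character, which occupies a different number of bytes depending on the encoding.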
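The difference between the two units can be seen by encoding a short string. The following is a minimal sketch; the class name ByteVsCharSketch and its helper methods are made up for illustration, and the sample text is an arbitrary choice.

```java
import java.nio.charset.StandardCharsets;

final class ByteVsCharSketch {
    /** Number of bytes needed to encode the text as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes needed to encode the text as UTF-16BE (no byte order mark). */
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A' followed by the CJK character U+4E2D: two characters.
        String text = "A\u4e2d";
        System.out.println(text.length() + " characters");
        System.out.println(utf8Length(text) + " bytes in UTF-8");
        System.out.println(utf16Length(text) + " bytes in UTF-16BE");
    }
}
```

The same two characters take one byte plus three bytes in UTF-8, but two bytes each in UTF-16BE, which is why a character stream needs to know the encoding while a byte stream does not.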

Byte Streams

Byte streams divide into InputStream (input) and OutputStream (output).

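Focusing on the commonly used byte-stream classes and interfaces gives the class inheritance diagram below.

[Figure: class inheritance diagram of the byte-stream classes]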

In my own work, the InputStream classes I meet most often are FileInputStream, BufferedInputStream, and ByteArrayInputStream; the OutputStream classes are FileOutputStream, BufferedOutputStream, and ByteArrayOutputStream. The sections below cover each in turn.

FileInputStream

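FileInputStream is used mainly for reading files.

Start with the constructors. FileInputStream has three:

1. From a file path: delegates internally to constructor 2.
2. From a File object: checks read permission and the validity of the file path, creates a FileDescriptor, and opens the file through a JNI call into the operating system API.
3. From a FileDescriptor: checks that the FD (FileDescriptor) may be read, then attaches the stream to it.

As a subclass of InputStream, its read methods matter most. FileInputStream has three:

1. read(): reads a single byte from the stream.
2. read(byte[] b): reads up to b.length bytes from the stream into the array.
3. read(byte[] b, int off, int len): reads up to len bytes from the stream into the array, storing them starting at index off.

After reading the FileInputStream source, many readers will notice that every read goes through a JNI call into the operating system API, so it is bound to be slower than reading directly from memory. On the other hand, if the file is large, loading all of it into memory wastes memory and burdens the garbage collector. Is there a mechanism that "smartly" loads part of the file into memory on demand and so makes reads more efficient? The analysis of BufferedInputStream below answers this question more fully.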
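The read variants above can be tried out with a small file. This is only a sketch under stated assumptions: the class FileInputStreamReadSketch, its method names, the 4-byte chunk size, and the temporary file are all invented for the example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileInputStreamReadSketch {
    /** Reads the whole file with read(byte[], int, int) and returns how many bytes were seen. */
    static long countBytes(Path file) throws IOException {
        byte[] chunk = new byte[4]; // deliberately tiny so the loop runs several times
        long total = 0;
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            int n;
            // read returns the number of bytes stored in chunk, or -1 at end of file
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Returns the first byte of the file as a value in 0..255, or -1 if the file is empty. */
    static int firstByte(Path file) throws IOException {
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bio-sketch", ".bin");
        Files.write(file, new byte[] {10, 20, 30, 40, 50});
        System.out.println("first byte: " + firstByte(file));
        System.out.println("total bytes: " + countBytes(file));
        Files.delete(file);
    }
}
```

Note that the bulk read may return fewer bytes than requested, so callers should always use the returned count rather than assume the array was filled.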

ByteArrayInputStream

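Start with the constructors. Whereas FileInputStream reads data from a file, ByteArrayInputStream reads data from a byte array, which may hold data that other code has already read or processed. ByteArrayInputStream has two constructors:

1. ByteArrayInputStream(byte[] buf): assigns the given byte array to the stream's internal byte array.
2. ByteArrayInputStream(byte[] buf, int offset, int len): in addition to what 1 does, sets the current position to offset and limits the readable range to offset + len (capped at the array length).

ByteArrayInputStream has two main read methods:

1. read(): reads a single byte from the input stream (the byte array).
2. read(byte[] b, int off, int len): reads data from the stream into the given array. ByteArrayInputStream stores its data in the internal buf array, so this method copies up to len bytes from buf, starting at the current position, into b starting at index off. The copy is done with System.arraycopy, a native method that copies data between arrays.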
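A short sketch of the offset constructor and both read methods follows. The class name ByteArrayInputStreamSketch and the sample values are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;

final class ByteArrayInputStreamSketch {
    /** Returns {first byte, bytes read by the bulk call, dest[1], dest[2], result at end of stream}. */
    static int[] demo() {
        byte[] data = {1, 2, 3, 4, 5, 6};
        // Start at index 2 and expose 3 bytes: the stream yields 3, 4, 5.
        ByteArrayInputStream in = new ByteArrayInputStream(data, 2, 3);

        int first = in.read();              // single-byte read

        byte[] dest = new byte[4];
        int n = in.read(dest, 1, 3);        // asks for 3 bytes, stores them from dest[1] on

        int end = in.read();                // nothing left in the exposed range
        return new int[] {first, n, dest[1], dest[2], end};
    }

    public static void main(String[] args) {
        for (int value : demo()) {
            System.out.println(value);
        }
    }
}
```

Only two bytes remain when the bulk read is issued, so it returns 2 even though 3 were requested, and the next read reports end of stream with -1.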

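BufferedInputStream

As noted in the FileInputStream section, fetching file content through a JNI call into the operating system API on every read is inefficient compared with prefetching the content into an in-memory cache. BufferedInputStream exists for exactly this case. It holds an internal byte array, buf[], and manages it through fields such as count, pos, and markpos.

Start with the constructors. BufferedInputStream has two:

1. BufferedInputStream(InputStream in): wraps the given InputStream with a buffer of 8192 bytes.
2. BufferedInputStream(InputStream in, int size): like 1, but the buffer has the given size. Compared with 1, the buffer size can be chosen to suit the workload, so memory is managed more efficiently.

Now the read methods. First, the no-argument read():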

    /**
     * See
     * the general contract of the <code>read</code>
     * method of <code>InputStream</code>.
     *
     * @return     the next byte of data, or <code>-1</code> if the end of the
     *             stream is reached.
     * @exception  IOException  if this input stream has been closed by
     *                          invoking its {@link #close()} method,
     *                          or an I/O error occurs.
     * @see        java.io.FilterInputStream#in
     */
    public synchronized int read() throws IOException {
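        // Check whether pos has reached count; initially both are 0.
        // On later calls, while the buffer still holds unread data, this block is skipped
        // and the byte is served straight from the buffer, which is much faster.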
        if (pos >= count) {
            // Refill the buffer; see the fill() method below
            fill();
            if (pos >= count)
                return -1;
        }
        // With data in the buffer, return the byte at pos and advance pos by 1
        return getBufIfOpen()[pos++] & 0xff;
    }
    /**
     * Fills the buffer with more data, taking into account
     * shuffling and other tricks for dealing with marks.
     * Assumes that it is being called by a synchronized method.
     * This method also assumes that all data has already been read in,
     * hence pos > count.
     */
    private void fill() throws IOException {
        byte[] buffer = getBufIfOpen();
        // markpos is -1 until mark() is called, which sets it to the value of pos at that moment
        if (markpos < 0)
            pos = 0;            /* no mark: throw away the buffer */
        // A mark is set, so markpos holds the pos recorded by mark().
        // If pos has reached the end of the buffer, room must be made before refilling.
        else if (pos >= buffer.length)  /* no room left in buffer */
            if (markpos > 0) {  /* can throw away early part of the buffer */
                int sz = pos - markpos;
                System.arraycopy(buffer, markpos, buffer, 0, sz);
                pos = sz;
                markpos = 0;
            } else if (buffer.length >= marklimit) {
                // The buffer is already at least marklimit (set by mark()) long:
                // invalidate the mark and refill from index 0
                markpos = -1;   /* buffer got too big, invalidate mark */
                pos = 0;        /* drop buffer contents */
            } else if (buffer.length >= MAX_BUFFER_SIZE) {
                // The buffer cannot grow any further; throw an Error
                throw new OutOfMemoryError("Required array size too large");
            } else {            /* grow buffer */
                // Grow the buffer
                int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?
                        pos * 2 : MAX_BUFFER_SIZE;
                if (nsz > marklimit)
                    nsz = marklimit;
                // Allocate a larger buffer and copy the contents of the old one into it
                byte nbuf[] = new byte[nsz];
                System.arraycopy(buffer, 0, nbuf, 0, pos);
                if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                    // Can't replace buf if there was an async close.
                    // Note: This would need to be changed if fill()
                    // is ever made accessible to multiple threads.
                    // But for now, the only way CAS can fail is via close.
                    // assert buf == null;
                    throw new IOException("Stream closed");
                }
                buffer = nbuf;
            }
        count = pos;
        // Read from the underlying stream into the free part of the buffer
        int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
        if (n > 0)
            // Update count to cover the newly read bytes
            count = n + pos;
    }

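The overall flow is:

1. If pos is greater than or equal to count, then this is the first call (both start at 0), or repeated reads have consumed everything in the buffer, or the requested position already lies beyond count. The buffer must be refilled, so fill() is called to read from the wrapped InputStream into the buffer.
2. If pos is still within the buffered data, the byte is taken directly from the buffer and the pos cursor advances by 1.

With this mechanism, the underlying InputStream is read only on the first call or once the buffered data has been exhausted; every other call is served from the buffer. Because the buffer lives in memory, reads become much faster.

Next, the read method that fetches data in bulk given an offset and a length. Its implementation is: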
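One way to observe this effect is to count how often the wrapped stream is actually asked for data. The sketch below is an illustration only: CountingInputStream is a hypothetical helper written for this example (it is not a JDK class), and the data and buffer sizes are arbitrary.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferedReadCountSketch {
    /** Hypothetical helper: counts how often the wrapped stream is asked for data. */
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads every byte one at a time through a buffer and returns the number of underlying reads. */
    static int underlyingCalls(int dataSize, int bufferSize) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[dataSize]));
        try (InputStream in = new BufferedInputStream(source, bufferSize)) {
            while (in.read() != -1) {
                // consume the stream one byte at a time
            }
        }
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("underlying reads: " + underlyingCalls(100, 10));
    }
}
```

With 100 bytes and a 10-byte buffer, the caller issues 101 single-byte reads, but the wrapped stream sees only ten refills plus one final call that reports end of stream.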

    /**
     * Reads bytes from this byte-input stream into the specified byte array,
     * starting at the given offset.
     *
     * <p> This method implements the general contract of the corresponding
     * <code>{@link InputStream#read(byte[], int, int) read}</code> method of
     * the <code>{@link InputStream}</code> class.  As an additional
     * convenience, it attempts to read as many bytes as possible by repeatedly
     * invoking the <code>read</code> method of the underlying stream.  This
     * iterated <code>read</code> continues until one of the following
     * conditions becomes true: <ul>
     *
     *   <li> The specified number of bytes have been read,
     *
     *   <li> The <code>read</code> method of the underlying stream returns
     *   <code>-1</code>, indicating end-of-file, or
     *
     *   <li> The <code>available</code> method of the underlying stream
     *   returns zero, indicating that further input requests would block.
     *
     * </ul> If the first <code>read</code> on the underlying stream returns
     * <code>-1</code> to indicate end-of-file then this method returns
     * <code>-1</code>.  Otherwise this method returns the number of bytes
     * actually read.
     *
     * <p> Subclasses of this class are encouraged, but not required, to
     * attempt to read as many bytes as possible in the same fashion.
     *
     * @param      b     destination buffer.
     * @param      off   offset at which to start storing bytes.
     * @param      len   maximum number of bytes to read.
     * @return     the number of bytes read, or <code>-1</code> if the end of
     *             the stream has been reached.
     * @exception  IOException  if this input stream has been closed by
     *                          invoking its {@link #close()} method,
     *                          or an I/O error occurs.
     */
    public synchronized int read(byte b[], int off, int len)
        throws IOException
    {
        getBufIfOpen(); // Check for closed stream
        if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }

        int n = 0;
        for (;;) {
            // Delegate to read1
            int nread = read1(b, off + n, len - n);
            if (nread <= 0)
                return (n == 0) ? nread : n;
            n += nread;
            // If the bytes read so far have reached len, everything requested has been read; return
            if (n >= len)
                return n;
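            // Otherwise keep going only while the underlying stream reports bytes available
            // without blocking; if available() returns 0 or less, return what has been read so far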
            // if not closed but no bytes available, return
            InputStream input = in;
            if (input != null && input.available() <= 0)
                return n;
        }
    }
    /**
     * Read characters into a portion of an array, reading from the underlying
     * stream at most once if necessary.
     */
    private int read1(byte[] b, int off, int len) throws IOException {
        // Number of bytes still unread in the buffer
        int avail = count - pos;
        if (avail <= 0) {
            /* If the requested length is at least as large as the buffer, and
               if there is no mark/reset activity, do not bother to copy the
               bytes into the local buffer.  In this way buffered streams will
               cascade harmlessly. */
            if (len >= getBufIfOpen().length && markpos < 0) {
                return getInIfOpen().read(b, off, len);
            }
            // Refill the buffer
            fill();
            avail = count - pos;
            if (avail <= 0) return -1;
        }
        int cnt = (avail < len) ? avail : len;
        // Copy data from the buffer into the caller's byte array
        System.arraycopy(getBufIfOpen(), pos, b, off, cnt);
        pos += cnt;
        return cnt;
    }

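As the code shows, this works much like the previous method: it first checks whether the buffered data has been fully consumed; if so, it reads from the InputStream and refills the buffer, otherwise it copies data out of the buffer. There is one shortcut: when the buffer is empty, the requested length is at least the buffer size, and no mark is set, read1 reads straight from the underlying stream into the caller's array and skips the buffer.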
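From the caller's side, the bulk read is normally used in a loop, because a single call may return fewer bytes than requested. A minimal sketch of that loop, assuming a made-up class BulkReadSketch and an arbitrary chunk size chosen by the caller:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BulkReadSketch {
    /** Drains the source through a BufferedInputStream and returns everything that was read. */
    static byte[] readAll(InputStream source, int chunkSize) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        try (BufferedInputStream in = new BufferedInputStream(source)) {
            int n;
            // n is the number of bytes actually stored in chunk; -1 signals end of stream
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                collected.write(chunk, 0, n);
            }
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        byte[] copy = readAll(new ByteArrayInputStream(data), 1000);
        System.out.println("read " + copy.length + " bytes");
    }
}
```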

FileOutputStream

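The counterpart of FileInputStream for writing is FileOutputStream, whose main use is writing content to a file.

Start with the constructors. Whether given a file name or a File object, they determine the file to write, check that it is valid, and open it through a JNI call into the operating system API. An append flag selects whether writes are appended to the file or overwrite it.

Next, the write methods, which put content into the file. All of them ultimately write the data through a JNI call into the operating system API.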
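The effect of the append flag can be checked by writing to the same file twice. The following is a sketch only; the class FileOutputStreamSketch, the method name, and the byte values are invented for the example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileOutputStreamSketch {
    /** Writes 3 bytes, reopens the file with the given append flag, writes 2 more, returns the file size. */
    static long sizeAfterSecondWrite(Path file, boolean append) throws IOException {
        // First write: the single-argument constructor truncates any existing content.
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(new byte[] {1, 2, 3});
        }
        // Second write: append decides whether the first 3 bytes survive.
        try (FileOutputStream out = new FileOutputStream(file.toFile(), append)) {
            out.write(new byte[] {4, 5});
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bio-sketch", ".bin");
        System.out.println("append:    " + sizeAfterSecondWrite(file, true) + " bytes");
        System.out.println("overwrite: " + sizeAfterSecondWrite(file, false) + " bytes");
        Files.delete(file);
    }
}
```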

ByteArrayOutputStream

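The counterpart of ByteArrayInputStream, for writing data into a byte array, is ByteArrayOutputStream.

Start with the constructors. Their job is to create the byte array that receives the written data. The array size can be chosen to suit the workload; the default is 32.

Next, the write methods. They all follow the same flow: check whether the array has enough capacity for the data, grow the array if it does not, and then write the data into the array.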
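A small sketch of this behavior, assuming a made-up class ByteArrayOutputStreamSketch and a deliberately small initial capacity so that the second write forces the internal array to grow:

```java
import java.io.ByteArrayOutputStream;

final class ByteArrayOutputStreamSketch {
    /** Writes five bytes' worth of input into a stream that starts with room for only two. */
    static byte[] collect() {
        ByteArrayOutputStream out = new ByteArrayOutputStream(2); // initial capacity of 2 bytes
        out.write(1);                                  // single byte
        out.write(new byte[] {2, 3, 4, 5}, 1, 3);      // 3 bytes starting at index 1: 3, 4, 5
        return out.toByteArray();                      // copy of the bytes written so far
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(collect()));
    }
}
```

The caller never deals with capacity directly; toByteArray returns only the bytes written, regardless of how large the internal array has become.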

BufferedOutputStream

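The FileOutputStream write methods above reveal a problem: if every write, even of a single byte, goes through a JNI call into the operating system API, performance suffers. The standard remedy in computer science is simple: batch the writes to the file as much as possible. That is the purpose of BufferedOutputStream.

Its write method works roughly as follows:

1. It keeps an internal buffer array, buf. When BufferedOutputStream's write method is called, it first checks whether buf is full. If so, it calls flushBuffer, which writes the buffered bytes to the wrapped OutputStream; the new data is then stored in the emptied buffer.
2. If buf is not full, the data is added to buf and no real write takes place.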
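The batching can be made visible by counting the writes that reach the wrapped stream. This is a sketch under stated assumptions: CountingOutputStream is a hypothetical helper written for this example (not a JDK class), and the byte count and buffer size are arbitrary.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BufferedWriteCountSketch {
    /** Hypothetical helper: counts how many write calls reach the wrapped stream. */
    static final class CountingOutputStream extends FilterOutputStream {
        int calls;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            calls++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            calls++;
            out.write(b, off, len); // pass the whole range on in one call
        }
    }

    /** Writes byteCount single bytes through a buffer and returns the number of underlying writes. */
    static int underlyingWrites(int byteCount, int bufferSize) throws IOException {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < byteCount; i++) {
                out.write(i);
            }
        } // close() flushes whatever is still in the buffer
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("underlying writes: " + underlyingWrites(100, 10));
    }
}
```

With 100 single-byte writes and a 10-byte buffer, the wrapped stream receives ten batched writes: nine triggered by a full buffer and one by the flush on close.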

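Character Streams

Before covering character streams, let us review what a character is and how bytes and characters relate.

Bytes are independent of character encoding; characters are not. Take sending a telegram as an illustration: a byte stream is like the Morse code, which anyone can receive, while a character stream is like the decoded telegram. If the wrong character encoding is used to encode or decode, the result is garbled text.

Common character encodings include ASCII, UTF-8, UTF-16, and UTF-32.

Like byte streams, character streams divide into input character streams (Reader) and output character streams (Writer).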
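The garbling described above is easy to reproduce by decoding bytes with a different charset from the one that produced them. A minimal sketch, with a made-up class name EncodingMismatchSketch and ISO-8859-1 picked arbitrarily as the "wrong" charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchSketch {
    /** Encodes the text with one charset and decodes the resulting bytes with another. */
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // a single CJK character
        String same = transcode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String garbled = transcode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("matching charsets round-trip: " + same.equals(original));
        System.out.println("mismatched charsets round-trip: " + garbled.equals(original));
        System.out.println("garbled length in chars: " + garbled.length());
    }
}
```

The three UTF-8 bytes of the character are each interpreted as a separate ISO-8859-1 character, so one character turns into three unrelated ones.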

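Focusing on the main character input and output streams gives the class diagram below.

[Figure: class diagram of the main character-stream classes]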

FileReader

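FileReader extends InputStreamReader, which holds a StreamDecoder that decodes bytes into characters.

In the FileReader constructors, whether a file name or a File object is passed in, FileReader creates a FileInputStream for the file.

When a read method is called, whether the no-argument read (a single character) or the read that takes an offset and a length, the work is done by the internal StreamDecoder, which decodes bytes into characters.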
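The layering described here can be written out explicitly with InputStreamReader, the superclass of FileReader, wrapped around a FileInputStream. This is only a sketch: the class FileReaderSketch, the 16-char chunk, and the choice of UTF-8 are assumptions made for the example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileReaderSketch {
    /** Reads a UTF-8 text file into a String by decoding bytes from a FileInputStream. */
    static String readText(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            char[] chunk = new char[16];
            int n;
            // read returns the number of chars stored in chunk, or -1 at end of stream
            while ((n = reader.read(chunk, 0, chunk.length)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bio-sketch", ".txt");
        Files.write(file, "caf\u00e9 \u4e2d".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(file).length() + " characters read");
        Files.delete(file);
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which is one common source of the garbled text mentioned earlier.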

BufferedReader

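BufferedReader is a Reader that holds an internal char buffer and wraps another Reader.

Its constructors let you specify the size of the char buffer; if none is given, the default of 8192 is used. For efficient memory use, though, it is advisable to choose the buffer size to suit the workload.

The focus is on its read methods. BufferedReader has three:

1. read(): reads a single character;
2. read with an offset and a length: reads a number of characters in bulk;
3. readLine(): reads one line at a time.

The read methods all follow roughly the same logic: first check whether the position to read has gone beyond the data held in the internal char buffer; if so, read a batch of data from the wrapped Reader into the buffer, ready for the reads that follow.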
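A minimal readLine sketch, using a StringReader as the wrapped Reader so that no file is needed; the class name BufferedReaderSketch, the 64-char buffer size, and the sample text are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class BufferedReaderSketch {
    /** Splits the text into lines using readLine, which strips the line terminators. */
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text), 64)) {
            String line;
            // readLine returns null once the end of the stream is reached
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lines("a=1\nb=2\r\nc=3"));
    }
}
```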

FileWriter

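FileWriter extends OutputStreamWriter, which holds a StreamEncoder that encodes characters into bytes so that they can be written to an OutputStream (OutputStream being a byte stream).

In the FileWriter constructors, whether a file name or a File object is passed in, FileWriter creates a FileOutputStream for the file.

When a write method is called, whether it writes a single character or a range given by an offset and a length, the internal StreamEncoder converts the characters into bytes, which are then written to the OutputStream.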
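The character-to-byte conversion can be observed by pointing an OutputStreamWriter, the superclass of FileWriter, at an in-memory byte stream instead of a file. A sketch, with a made-up class name WriterEncodingSketch and UTF-8 chosen for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WriterEncodingSketch {
    /** Writes the text through an OutputStreamWriter and returns the bytes that came out. */
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text, 0, text.length());
        } // closing the writer flushes the encoder into the byte stream
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encode("A", StandardCharsets.UTF_8).length + " byte(s) for 'A'");
        System.out.println(encode("\u4e2d", StandardCharsets.UTF_8).length + " byte(s) for U+4E2D");
    }
}
```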

BufferedWriter

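BufferedWriter is a Writer that holds an internal char buffer and wraps another Writer.

The focus is on its write methods. BufferedWriter has three commonly used ones:

1. write(int c): writes a single character to the output stream;
2. write(String s, int off, int len): writes a string to the output stream;
3. write(char cbuf[], int off, int len): writes a char array to the output stream in bulk.

Their logic is much the same: first check whether the position to write has gone beyond the internal char buffer; if so, call flushBuffer to write the buffered chars to the output stream in one go; otherwise, store the data in the buffer. This achieves batched writes effectively.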
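The buffering can be observed by inspecting the wrapped Writer before and after a flush. This is a sketch only; the class BufferedWriterSketch, the StringWriter sink, and the 16-char buffer size are assumptions made for the example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

final class BufferedWriterSketch {
    /** Returns what the wrapped writer has received before and after an explicit flush. */
    static String[] demo() throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter writer = new BufferedWriter(sink, 16);

        writer.write("hello");                 // 5 chars, fits in the 16-char buffer
        String beforeFlush = sink.toString();  // the sink has not been written to yet

        writer.flush();                        // push the buffered chars to the sink
        String afterFlush = sink.toString();

        writer.close();
        return new String[] {beforeFlush, afterFlush};
    }

    public static void main(String[] args) throws IOException {
        String[] result = demo();
        System.out.println("before flush: \"" + result[0] + "\"");
        System.out.println("after flush:  \"" + result[1] + "\"");
    }
}
```

A practical consequence is that data written through a BufferedWriter is not guaranteed to reach its destination until flush or close is called.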

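Limitations of JAVA BIO

All the input and output streams covered in this article belong to JAVA BIO (JAVA blocking I/O), which has certain performance limitations. For example, the read and write methods of BufferedInputStream and BufferedOutputStream are marked synchronized, so at the method level they cannot be executed by multiple threads concurrently. As a result, BIO sees limited use in today's mainstream systems that must handle heavy traffic, high concurrency, and low latency. Still, it is simple and gives developers a first grasp of how I/O is implemented, so it keeps its place where performance matters little, for example reading a small configuration file on a single thread when a container starts, or a monitoring program dumping a running trace on a single thread. Later articles will cover JAVA NIO and AIO, which address BIO's limited performance and concurrency, and go on to multithreaded network communication frameworks.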
