Using read() and read(byte b[]) in FileInputStream


仰子瞻

Tags: FileInputStream

Article link: https://blog.csdn.net/u010389391/article/details/95508722


Before looking at the methods of FileInputStream, here is a brief review of character encodings.

I. Encoding

1. ANSI encoding

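ANSI stands for the American National Standards Institute. When Notepad or other software uses the "ANSI" encoding from the Windows code pages, the actual encoding depends on the system locale. On Simplified Chinese Windows, "ANSI" means GBK.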

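2. ASCII encoding

ASCII stands for the American Standard Code for Information Interchange. It can represent only symbols, letters, digits and control characters, and cannot represent Chinese characters. See an ASCII table for details.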

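3. UTF-8 encoding

UTF stands for Unicode Transformation Format. UTF-8 is a variable-length encoding of Unicode whose code unit is 8 bits. Variable-length means that different characters take different numbers of bytes, from 1 to 4: English letters take 1 byte, and Chinese characters take 3 or 4 bytes (the 4-byte ones are a small minority, for example 𪚥). In the usual case, one Chinese character needs three bytes in UTF-8 and two bytes in GBK.
Note: when saving as UTF-8, the Notepad bundled with Windows prepends three extra bytes (the byte order mark, EF BB BF) to the file. Notepad is therefore not recommended here; use Notepad++ instead.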
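The byte counts above are easy to check from Java. The following is a minimal sketch; the class name EncodingLengths and both helper methods are made up for this illustration, and the GBK check is guarded because that charset is not guaranteed to exist in every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {
    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the data starts with the three-byte UTF-8 byte order mark EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        String ni = "\u4f60";                                 // the character 你
        String rare = new String(Character.toChars(0x2A6A5)); // 𪚥, outside the Basic Multilingual Plane

        System.out.println(utf8Length("A"));  // 1
        System.out.println(utf8Length(ni));   // 3
        System.out.println(utf8Length(rare)); // 4

        // GBK is an optional charset, so only use it when the runtime supports it.
        if (Charset.isSupported("GBK")) {
            System.out.println(ni.getBytes(Charset.forName("GBK")).length); // 2
        }

        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'A'};
        System.out.println(startsWithUtf8Bom(withBom)); // true
    }
}
```

A file that starts with the byte order mark yields three extra leading bytes when read with FileInputStream, which is why the examples in this article assume a file saved without it.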

II. FileInputStream

1. read()

/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public int read() throws IOException {
    return read0();
}

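The comment says: read one byte (8 bits) of data from this input stream. The return value is that byte as an unsigned integer in the range 0 to 255, or -1 once the end of the file has been reached. Each call continues right after the point where the previous one stopped, because the stream keeps a current file position that advances with every byte read.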
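One way to watch that position move is through the FileChannel associated with the stream, whose position is incremented as bytes are read from the stream. This is only a sketch: the class ReadPositionDemo and its helper are invented for the illustration, and it writes its own temporary file rather than using the f:\test file from the article.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadPositionDemo {
    /** Writes content to a temporary file and records the file position before and after each read(). */
    static List<Long> positionsWhileReading(byte[] content) {
        try {
            Path tmp = Files.createTempFile("read-position", ".bin");
            try {
                Files.write(tmp, content);
                List<Long> positions = new ArrayList<>();
                try (FileInputStream in = new FileInputStream(tmp.toFile())) {
                    positions.add(in.getChannel().position()); // 0 before anything is read
                    while (in.read() != -1) {
                        positions.add(in.getChannel().position());
                    }
                }
                return positions;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ni = {(byte) 228, (byte) 189, (byte) 160}; // UTF-8 bytes of 你
        System.out.println(positionsWhileReading(ni));    // [0, 1, 2, 3]
    }
}
```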

A short piece of code demonstrates this.
1) Open Notepad++, write the single character "你", and set the encoding to UTF-8.
2) Run the following code:

public static void main(String[] args) throws Exception {
    // try-with-resources closes the stream even if read() throws
    try (FileInputStream inputStream = new FileInputStream("f:\\test")) {
        int b;
        while ((b = inputStream.read()) != -1) {
            System.out.print(b + " "); // Output: 228 189 160
        }
    }
}

2. read(byte b[])

/**
 * Reads up to <code>b.length</code> bytes of data from this input
 * stream into an array of bytes. This method blocks until some input
 * is available.
 *
 * @param      b   the buffer into which the data is read.
 * @return     the total number of bytes read into the buffer, or
 *             <code>-1</code> if there is no more data because the end of
 *             the file has been reached.
 * @exception  IOException  if an I/O error occurs.
 */
public int read(byte b[]) throws IOException {
    return readBytes(b, 0, b.length);
}

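This method reads up to b.length bytes from the input stream and stores them in the byte array b. It returns the total number of bytes stored in b (at most b.length), or -1 when the end of the file has been reached. The readBytes method it calls is native, so its implementation cannot be inspected in the Java source: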

/**
 * Reads a subarray as a sequence of bytes.
 * @param b the data to be written
 * @param off the start offset in the data
 * @param len the number of bytes that are written
 * @exception IOException If an I/O error has occurred.
 */
private native int readBytes(byte b[], int off, int len) throws IOException;

We can, however, look at the default read(byte b[]) implementation in InputStream, the superclass, which follows the same contract:

public int read(byte b[]) throws IOException {
    return read(b, 0, b.length);
}

public int read(byte b[], int off, int len) throws IOException {
    if (b == null) {
        throw new NullPointerException();
    } else if (off < 0 || len < 0 || len > b.length - off) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return 0;
    }

    int c = read();
    if (c == -1) {
        return -1;
    }
    b[off] = (byte)c;

    int i = 1;
    try {
        for (; i < len ; i++) {
            c = read();
            if (c == -1) {
                break;
            }
            b[off + i] = (byte)c;
        }
    } catch (IOException ee) {
    }
    return i;
}

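Look at the second method:
1. It first calls read(), which returns the value of one byte, or -1 at the end of the file.
2. If that value is -1, the whole method returns -1 immediately.
3. Otherwise the value is stored in b[off]. Here off is 0, so the array b is filled from the start.
4. The for loop then begins. Each iteration also calls read(); a return value of -1 ends the loop, and any other value is stored in b[off + i].
5. Finally the method returns the number of bytes read in this call. This is the value of i, not b.length, and it is at least 1 because step 3 has already read one byte.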
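The steps above can be observed with a stream that implements only read() and inherits read(byte b[]) from InputStream. The sketch below is illustrative only: CountingStream is a hypothetical class written for this example, and it counts how many single-byte reads the inherited method triggers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DefaultReadDemo {
    /** Hypothetical stream that implements only read() and counts how often it is called. */
    static class CountingStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        int singleByteReads = 0;

        CountingStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            singleByteReads++;
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    /** Returns {first result, second result, third result, total number of read() calls}. */
    static int[] run(int dataLength, int bufferLength) {
        CountingStream in = new CountingStream(new byte[dataLength]);
        byte[] buffer = new byte[bufferLength];
        try {
            int first = in.read(buffer);  // inherited InputStream.read(byte[])
            int second = in.read(buffer);
            int third = in.read(buffer);
            return new int[] {first, second, third, in.singleByteReads};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 12 bytes of data, 10-byte buffer: 10 + 3 + 1 single-byte reads in total
        System.out.println(Arrays.toString(run(12, 10))); // [10, 2, -1, 14]
    }
}
```

The second call needs three single-byte reads: two that return data and one that returns -1 and ends the loop. The third call sees -1 on its very first read and returns -1 itself.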

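3. Summary

read() returns the value of one byte as an integer, or -1 when the end of the file has been reached.
read(byte b[]) stores the bytes it reads in b and returns how many bytes were read in this call, or -1 when the end of the file has been reached.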
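As a rough illustration of these return values, the sketch below reads a 12-byte temporary file through a 10-byte buffer and records what each read(byte b[]) call returns. The class ReadReturnValues and its helper are assumptions made for this example. The contract only promises "up to b.length" bytes per call, so the counts shown are what a local file typically produces, not a guarantee.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadReturnValues {
    /** Records the return value of every read(byte[]) call, including the final -1. */
    static List<Integer> readCounts(byte[] content, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        try {
            Path tmp = Files.createTempFile("read-counts", ".bin");
            try {
                Files.write(tmp, content);
                List<Integer> counts = new ArrayList<>();
                byte[] buffer = new byte[bufferSize];
                try (FileInputStream in = new FileInputStream(tmp.toFile())) {
                    int n;
                    do {
                        n = in.read(buffer);
                        counts.add(n);
                    } while (n != -1);
                }
                return counts;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For a small local file this typically prints [10, 2, -1]
        System.out.println(readCounts(new byte[12], 10));
    }
}
```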

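4. Reading text from a file correctly

Change the content of the file above to "你好世界", which is 12 bytes in UTF-8. Reading it with read() gives these decimal values: 228 189 160 229 165 189 228 184 150 231 149 140

1) Incorrect approach 1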
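The same twelve values can be obtained without a file by encoding the string directly. A small sketch follows; the class name Utf8ByteValues is made up, and the text is written with Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8ByteValues {
    /** Encodes the text as UTF-8 and returns each byte as an unsigned value (0 to 255). */
    static List<Integer> unsignedUtf8Bytes(String text) {
        List<Integer> values = new ArrayList<>();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            values.add(b & 0xFF); // mask off the sign extension, as read() does
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d\u4e16\u754c"; // 你好世界
        System.out.println(unsignedUtf8Bytes(text));
        // [228, 189, 160, 229, 165, 189, 228, 184, 150, 231, 149, 140]
    }
}
```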

public static void main(String[] args) throws Exception {
    FileInputStream inputStream = new FileInputStream("f:\\test");
    byte[] bytes = new byte[1024];
    while (inputStream.read(bytes) != -1) {
        String s = new String(bytes,"UTF-8");
        System.out.println(s);
    }
}

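In the current case this does appear to print correctly (the unused zero bytes at the end of the array decode to NUL characters, which consoles typically render as nothing). However, once the file holds more bytes than the array length of 1024, the output becomes wrong. To reproduce the problem with this small file, change the declaration to byte[] bytes = new byte[10]. The output is then: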
你好世�
��好世�
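The root cause is this line: String s = new String(bytes,"UTF-8"). It always decodes the whole array, no matter how many bytes the last read actually filled in.

Stepping through in a debugger makes this visible. (Note that each evaluation of inputStream.read(bytes) in the debugger may return a different result, because every call continues from where the previous read stopped.) After the first read in the while loop, bytes contains: -28 -67 -96 -27 -91 -67 -28 -72 -106 -25. A Java byte ranges from -128 to 127, so adding 256 to each negative value gives the unsigned form: 228 189 160 229 165 189 228 184 150 231. After the second read, bytes contains: -107 -116 -96 -27 -91 -67 -28 -72 -106 -25 (that is, 149 140 160 229 165 189 228 184 150 231).

Comparing the two, only the first two values differ. As the read(byte b[]) logic above shows, the second call read only the last two bytes of the file and stored them in the first two slots of bytes. The other eight slots still hold data from the first read, and decoding all ten bytes produces the garbled second line.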
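The buffer contents described above can be captured in code rather than in a debugger. The following sketch takes a copy of the buffer after every successful read; the class StaleBufferDemo and its helper are invented for this illustration, and it uses a temporary file with the same twelve bytes. It assumes the first read fills the whole 10-byte buffer, which is what a local file typically does.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleBufferDemo {
    /** Returns a copy of the whole buffer as it looks after each successful read(byte[]). */
    static List<byte[]> bufferSnapshots(byte[] content, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        try {
            Path tmp = Files.createTempFile("stale-buffer", ".bin");
            try {
                Files.write(tmp, content);
                List<byte[]> snapshots = new ArrayList<>();
                byte[] buffer = new byte[bufferSize];
                try (FileInputStream in = new FileInputStream(tmp.toFile())) {
                    while (in.read(buffer) != -1) {
                        snapshots.add(buffer.clone()); // copy, because the buffer is reused
                    }
                }
                return snapshots;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] text = "\u4f60\u597d\u4e16\u754c".getBytes(StandardCharsets.UTF_8); // 你好世界
        for (byte[] snapshot : bufferSnapshots(text, 10)) {
            System.out.println(Arrays.toString(snapshot));
        }
        // [-28, -67, -96, -27, -91, -67, -28, -72, -106, -25]
        // [-107, -116, -96, -27, -91, -67, -28, -72, -106, -25]
    }
}
```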

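2) Incorrect approach 2

What about the following change? The first iteration decodes 10 bytes into a string and the second decodes only 2 bytes. As the output shows, this does not solve the problem either.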

public static void main(String[] args) throws Exception {
    FileInputStream inputStream = new FileInputStream("f:\\test");
    byte[] bytes = new byte[10];
    int b;
    while ((b = inputStream.read(bytes)) != -1) {
        String s = new String(bytes, 0, b, "UTF-8");
        System.out.println(s);
    }
}
// Output:
你好世�
��
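The output above can be reproduced without a file by decoding the same bytes in pieces. This is a minimal sketch with an invented class name, ChunkedDecodeDemo. It relies on the String constructor replacing malformed or truncated UTF-8 input with the replacement character U+FFFD.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ChunkedDecodeDemo {
    /** Splits data into chunks of chunkSize bytes and decodes every chunk on its own. */
    static List<String> decodeInChunks(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            parts.add(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return parts;
    }

    public static void main(String[] args) {
        byte[] text = "\u4f60\u597d\u4e16\u754c".getBytes(StandardCharsets.UTF_8); // 你好世界, 12 bytes

        // Chunks of 10 cut the last character after its first byte:
        // the first part ends in U+FFFD and the second part is two U+FFFD characters.
        System.out.println(decodeInChunks(text, 10));

        // Chunks of 3 happen to line up with character boundaries, so nothing breaks.
        System.out.println(decodeInChunks(text, 3));
    }
}
```

A chunk size that happens to align with character boundaries hides the bug, which is why this kind of code can pass a quick test and still fail on real input.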

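The problem is that one continuous run of bytes is cut into several pieces that are decoded separately, and a cut can fall in the middle of a multi-byte character (here, between the first byte of "界" and its remaining two bytes). The fix is therefore to define an additional byte array, byte[] newBytes, and append bytes[0] through bytes[b-1] to it on every iteration of the while loop. newBytes must also grow automatically and copy its existing data over whenever it runs out of room.

3) ByteArrayOutputStream

The JDK in fact already provides such a class: ByteArrayOutputStream. With it the buffer length can be anything, and the text can be obtained at the end by calling toString(). Note that toString() with no argument decodes with the default charset, so passing the charset name, as in toString("UTF-8"), is safer for a file known to be UTF-8.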
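A hand-written version of that idea might look like the sketch below. The class GrowableBytes, its initial capacity of 16 and its doubling strategy are all choices made for this illustration, not something taken from the JDK.

```java
import java.util.Arrays;

public class GrowableBytes {
    private byte[] data = new byte[16]; // arbitrary starting capacity
    private int size = 0;               // number of bytes actually stored

    /** Appends src[0] through src[length - 1], growing the backing array when needed. */
    void append(byte[] src, int length) {
        if (size + length > data.length) {
            int newCapacity = Math.max(data.length * 2, size + length);
            data = Arrays.copyOf(data, newCapacity); // allocates and copies the old content
        }
        System.arraycopy(src, 0, data, size, length);
        size += length;
    }

    int size() {
        return size;
    }

    /** Returns a copy holding exactly the bytes appended so far. */
    byte[] toByteArray() {
        return Arrays.copyOf(data, size);
    }

    public static void main(String[] args) {
        GrowableBytes collected = new GrowableBytes();
        byte[] chunk = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        collected.append(chunk, 10); // fits in the initial 16 bytes
        collected.append(chunk, 2);  // only the first 2 bytes are valid this time
        collected.append(chunk, 10); // forces the array to grow
        System.out.println(collected.size()); // 22
    }
}
```

In a read loop, each iteration would call append(bytes, b) with the count returned by read(bytes), and the text would be decoded once from toByteArray() after the loop ends.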

public static void main(String[] args) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (FileInputStream inputStream = new FileInputStream("f:\\test")) {
        byte[] buffer = new byte[10];
        int n;
        while ((n = inputStream.read(buffer)) != -1) {
            // Append only the n bytes that this read actually filled
            out.write(buffer, 0, n);
        }
    }
    // Decode once, after all bytes are collected, with an explicit charset
    System.out.println(out.toString("UTF-8"));
}
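A different technique from the one above, shown here only as a possible alternative, is to wrap the FileInputStream in an InputStreamReader. The reader keeps the decoder state between reads, so a character whose bytes span two underlying reads still decodes correctly. The class ReaderDecodeDemo and both helper methods are assumptions made for this sketch, which writes its own temporary file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReaderDecodeDemo {
    /** Reads the whole file as UTF-8 text through a char buffer of the given size. */
    static String readUtf8(Path file, int charBufferSize) throws IOException {
        if (charBufferSize <= 0) {
            throw new IllegalArgumentException("charBufferSize must be positive");
        }
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            char[] buffer = new char[charBufferSize];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n); // n counts chars here, not bytes
            }
        }
        return text.toString();
    }

    /** Writes text to a temporary UTF-8 file and reads it back with readUtf8. */
    static String roundTrip(String text, int charBufferSize) {
        try {
            Path tmp = Files.createTempFile("reader-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp, charBufferSize);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d\u4e16\u754c"; // 你好世界
        System.out.println(original.equals(roundTrip(original, 3))); // true
    }
}
```

Compared with ByteArrayOutputStream, this avoids holding the raw bytes and the decoded text in memory at the same time, at the cost of working with char counts instead of byte counts.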