The Java IO Library: DataInputStream and DataOutputStream

weixin_34176694

Original link: https://my.oschina.net/zhangyq1991/blog/1860714

I. DataInputStream

1 - Introduction to DataInputStream

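DataInputStream is a data input stream that extends FilterInputStream. Following the decorator pattern and implementing the DataInput interface, it lets a program read Java's primitive data types from the wrapped underlying input stream in a machine-independent way. An application can use DataInputStream to read data previously written by DataOutputStream.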
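
As a minimal sketch of that write-then-read pairing, the example below writes three primitives into an in-memory buffer and reads them back in the same order. The class DataRoundTripDemo and its helpers write and read are hypothetical names chosen for illustration; only the java.io classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original article.
class DataRoundTripDemo {

    // Writes an int, a double and a boolean and returns the raw bytes.
    static byte[] write(int i, double d, boolean b) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(i);
            out.writeDouble(d);
            out.writeBoolean(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bos.toByteArray();
    }

    // Reads the three values back in the order they were written.
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            double d = in.readDouble();
            boolean b = in.readBoolean();
            return i + "," + d + "," + b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(42, 1.5, true);
        System.out.println(data.length); // 13 = 4 (int) + 8 (double) + 1 (boolean)
        System.out.println(read(data));  // 42,1.5,true
    }
}
```

The stream carries no type information, so the reader has to call the read methods in exactly the order the writer used.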

2 - DataInputStream source code analysis

1) Member fields

    /**
     * working arrays initialized on demand by readUTF
     */
    private byte bytearr[] = new byte[80];
    private char chararr[] = new char[80];

As the comment says, bytearr and chararr are used by readUTF. We skip them for now and explain their role when we reach that method.

2) Member methods

    // Constructor: binds the underlying input stream to be decorated
    public DataInputStream(InputStream in) {
        super(in);
    }

    // Reads some bytes from the input stream into the byte array b and returns the number actually read
    public final int read(byte b[]) throws IOException {
        return in.read(b, 0, b.length);
    }
    // Same as read(byte b[]); off is the index in b at which storing starts and len is the maximum number of bytes to read
    public final int read(byte b[], int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    // Reads bytes from the input stream into b, looping until b is completely filled;
    // throws EOFException if the stream ends before b.length bytes have been read
    public final void readFully(byte b[]) throws IOException {
        readFully(b, 0, b.length);
    }
    // Reads exactly len bytes into b starting at index off, looping until all len bytes have arrived;
    // throws EOFException if fewer than len bytes remain in the input stream in
    public final void readFully(byte b[], int off, int len) throws IOException {
        if (len < 0)
            throw new IndexOutOfBoundsException();
        int n = 0;
        while (n < len) {
            int count = in.read(b, off + n, len - n);
            if (count < 0)
                throw new EOFException();
            n += count;
        }
    }

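    // Tries to skip n bytes; fewer than n may remain in the stream, so the number actually skipped (the return value) can be less than n
    public final int skipBytes(int n) throws IOException {
        int total = 0;
        int cur = 0;
        // in.skip may skip fewer bytes than requested, so loop until n bytes are skipped or no progress is made
        while ((total < n) && ((cur = (int) in.skip(n - total)) > 0)) {
            total += cur;
        }
        return total;
    }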
    
    // Reads a boolean from the input stream (any non-zero byte is true)
    public final boolean readBoolean() throws IOException {
        int ch = in.read();
        if (ch < 0)
            throw new EOFException();
        return (ch != 0);
    }
    
    // Reads a signed byte from the input stream
    public final byte readByte() throws IOException {
        int ch = in.read();
        if (ch < 0)
            throw new EOFException();
        return (byte)(ch);
    }

    // Reads an unsigned byte from the input stream, i.e. the byte value as an int in the range 0-255
    public final int readUnsignedByte() throws IOException {
        int ch = in.read();
        if (ch < 0)
            throw new EOFException();
        return ch;
    }

    // Reads a signed short (2 bytes) from the input stream; the data is big-endian, so the high byte ch1 is shifted left by 8 bits
    public final short readShort() throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        if ((ch1 | ch2) < 0)
            throw new EOFException();
        return (short)((ch1 << 8) + (ch2 << 0));
    }

    // Reads an unsigned short from the input stream
    public final int readUnsignedShort() throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        if ((ch1 | ch2) < 0)
            throw new EOFException();
        return (ch1 << 8) + (ch2 << 0);
    }

    // Reads a char (2 bytes) from the input stream
    public final char readChar() throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        if ((ch1 | ch2) < 0)
            throw new EOFException();
        return (char)((ch1 << 8) + (ch2 << 0));
    }

    // Reads an int (4 bytes) from the input stream
    public final int readInt() throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        int ch3 = in.read();
        int ch4 = in.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0)
            throw new EOFException();
        return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
    }
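
To make the big-endian assembly and the end-of-stream check concrete, the sketch below feeds hand-picked bytes to readShort, readUnsignedShort and readInt. BigEndianReadDemo and its helper methods are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original article.
class BigEndianReadDemo {

    // Wraps the given byte values (0-255) in a DataInputStream.
    static DataInputStream streamOf(int... values) {
        byte[] data = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            data[i] = (byte) values[i];
        }
        return new DataInputStream(new ByteArrayInputStream(data));
    }

    // Interprets hi and lo as a signed 16-bit value, high byte first.
    static int signedShort(int hi, int lo) {
        try {
            return streamOf(hi, lo).readShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Interprets the same two bytes as an unsigned 16-bit value.
    static int unsignedShort(int hi, int lo) {
        try {
            return streamOf(hi, lo).readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if readInt reaches the end of the stream before 4 bytes arrive.
    static boolean readIntHitsEof(int... values) {
        try {
            streamOf(values).readInt();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signedShort(0xFF, 0xFE));   // -2
        System.out.println(unsignedShort(0xFF, 0xFE)); // 65534
        System.out.println(readIntHitsEof(1, 2, 3));   // true: only 3 of 4 bytes available
    }
}
```

The same two bytes give different results only because readShort casts the assembled value to short, which reinterprets the top bit as a sign.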

The methods above are straightforward and need no further analysis. Below we focus on readLong, readDouble, readFloat and readUTF.

1) The readLong method

    private byte readBuffer[] = new byte[8];

    /**
     * See the general contract of the <code>readLong</code>
     * method of <code>DataInput</code>.
     * <p>
     * Bytes
     * for this operation are read from the contained
     * input stream.
     *
     * @return     the next eight bytes of this input stream, interpreted as a
     *             <code>long</code>.
     * @exception  EOFException  if this input stream reaches the end before
     *               reading eight bytes.
     * @exception  IOException   the stream has been closed and the contained
     *             input stream does not support reading after close, or
     *             another I/O error occurs.
     * @see        java.io.FilterInputStream#in
     */
    public final long readLong() throws IOException {
        readFully(readBuffer, 0, 8);
        return (((long)readBuffer[0] << 56) +
                ((long)(readBuffer[1] & 255) << 48) +
                ((long)(readBuffer[2] & 255) << 40) +
                ((long)(readBuffer[3] & 255) << 32) +
                ((long)(readBuffer[4] & 255) << 24) +
                ((readBuffer[5] & 255) << 16) +
                ((readBuffer[6] & 255) <<  8) +
                ((readBuffer[7] & 255) <<  0));
    }

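readLong reads 8 bytes from the input stream and converts them into a long. It first calls readFully, which blocks until all 8 bytes are in readBuffer; if the stream ends before 8 bytes have been read, EOFException is thrown. Deserialization mirrors serialization and uses big-endian order: the most significant byte of the value comes first in the stream, so readBuffer[0] is shifted into the highest position. Every byte except the first is masked with & 255 because Java bytes are signed and would otherwise sign-extend when widened.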
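
The effect of the & 255 masks is easiest to see by comparing the expression from readLong with a variant that omits them. This is only a sketch: ReadLongMaskDemo, assemble, assembleWithoutMask and viaStream are hypothetical names, and assembleWithoutMask is deliberately wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original article.
class ReadLongMaskDemo {

    // Same expression as in readLong: bytes 1-7 are masked to 0-255,
    // and bytes 0-4 are widened to long before shifting to avoid int overflow.
    static long assemble(byte[] b) {
        return (((long) b[0] << 56)
                + ((long) (b[1] & 255) << 48)
                + ((long) (b[2] & 255) << 40)
                + ((long) (b[3] & 255) << 32)
                + ((long) (b[4] & 255) << 24)
                + ((b[5] & 255) << 16)
                + ((b[6] & 255) << 8)
                + ((b[7] & 255) << 0));
    }

    // Deliberately wrong: negative bytes sign-extend and corrupt the higher bits.
    static long assembleWithoutMask(byte[] b) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v += (long) b[i] << (56 - 8 * i);
        }
        return v;
    }

    // Reference result from the real DataInputStream.readLong.
    static long viaStream(byte[] b) {
        try {
            return new DataInputStream(new ByteArrayInputStream(b)).readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] b = {0, 0, 0, 0, 0, 0, 0, (byte) 0xFF};
        System.out.println(assemble(b));            // 255
        System.out.println(assembleWithoutMask(b)); // -1
        System.out.println(viaStream(b));           // 255
    }
}
```

readBuffer[0] needs no mask because its sign-extension bits are shifted out past bit 63.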

2) The readDouble method

    public final double readDouble() throws IOException {
        return Double.longBitsToDouble(readLong());
    }

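As the source shows, readDouble first reads a full 8 bytes from the input stream as a long, then uses Double.longBitsToDouble to restore the double value that was written to the stream.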
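
The bit-pattern conversion can be tried in isolation. The sketch below shows the long behind a double and, for comparison, the int behind a float, which is what readFloat relies on through Float.intBitsToFloat(readInt()). DoubleBitsDemo and its helpers are hypothetical names.

```java
// Hypothetical demo class; not part of the JDK or of the original article.
class DoubleBitsDemo {

    // Hex form of the 64-bit IEEE 754 pattern that writeDouble sends as a long.
    static String doubleBits(double d) {
        return Long.toHexString(Double.doubleToLongBits(d));
    }

    // Hex form of the 32-bit IEEE 754 pattern that writeFloat sends as an int.
    static String floatBits(float f) {
        return Integer.toHexString(Float.floatToIntBits(f));
    }

    // Converts to bits and back, as a writeDouble/readDouble pair does.
    static double restore(double d) {
        return Double.longBitsToDouble(Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        System.out.println(doubleBits(1.0)); // 3ff0000000000000
        System.out.println(floatBits(1.0f)); // 3f800000
        System.out.println(restore(-2.5));   // -2.5
    }
}
```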

3) The readUTF method

This method has the most complex logic and is the focus of this class.

    public final static String readUTF(DataInput in) throws IOException {
        // Reads an unsigned short: the first 2 bytes of a string written by writeUTF hold the length of the encoded data in bytes
        int utflen = in.readUnsignedShort(); // number of encoded bytes that follow
        byte[] bytearr = null;
        char[] chararr = null;
        // Obtain the byte array bytearr and the char array chararr
        if (in instanceof DataInputStream) {
            DataInputStream dis = (DataInputStream)in;
            if (dis.bytearr.length < utflen){
                dis.bytearr = new byte[utflen*2];
                dis.chararr = new char[utflen*2];
            }
            chararr = dis.chararr;
            bytearr = dis.bytearr;
        } else {
            bytearr = new byte[utflen];
            chararr = new char[utflen];
        }

        int c, char2, char3;
        int count = 0;
        int chararr_count=0;
        // Read exactly utflen bytes from the data input stream into bytearr
        in.readFully(bytearr, 0, utflen);
        // In this encoding (modified UTF-8) a char takes 1 to 3 bytes; this first loop is a fast path for the leading run of single-byte characters
        while (count < utflen) {
            c = (int) bytearr[count] & 0xff;
            // A single-byte character has a value of at most 127; stop at the first byte above 127
            if (c > 127) break;
            count++;
            chararr[chararr_count++]=(char)c;
        }
        // After the single-byte prefix, decode the rest according to the bit pattern of each character's first byte
        while (count < utflen) {
            // Convert the byte to an unsigned int value
            c = (int) bytearr[count] & 0xff;
            // Shift c right by 4 bits and dispatch on the high nibble
            switch (c >> 4) {
                // Single byte: bytearr[count] matches the pattern "0xxxxxxx",
                // so c >> 4 is in the range 0-7 and c can be cast to char directly
                case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                    /* 0xxxxxxx*/
                    count++;
                    chararr[chararr_count++]=(char)c;
                    break;
                // Two bytes: bytearr[count] is the first byte of the pattern "110xxxxx 10xxxxxx",
                // so c >> 4 is 12 or 13; the char is assembled from both bytes by masking and shifting
                case 12: case 13:
                    /* 110x xxxx   10xx xxxx*/
                    count += 2;
                    if (count > utflen)
                        throw new UTFDataFormatException(
                            "malformed input: partial character at end");
                    char2 = (int) bytearr[count-1];
                    if ((char2 & 0xC0) != 0x80)
                        throw new UTFDataFormatException(
                            "malformed input around byte " + count);
                    chararr[chararr_count++]=(char)(((c & 0x1F) << 6) |
                                                    (char2 & 0x3F));
                    break;
                // Three bytes: bytearr[count] is the first byte of the pattern "1110xxxx 10xxxxxx 10xxxxxx",
                // so c >> 4 is 14
                case 14:
                    /* 1110 xxxx  10xx xxxx  10xx xxxx */
                    count += 3;
                    if (count > utflen)
                        throw new UTFDataFormatException(
                            "malformed input: partial character at end");
                    char2 = (int) bytearr[count-2];
                    char3 = (int) bytearr[count-1];
                    if (((char2 & 0xC0) != 0x80) || ((char3 & 0xC0) != 0x80))
                        throw new UTFDataFormatException(
                            "malformed input around byte " + (count-1));
                    chararr[chararr_count++]=(char)(((c     & 0x0F) << 12) |
                                                    ((char2 & 0x3F) << 6)  |
                                                    ((char3 & 0x3F) << 0));
                    break;
                default:
                    /* 10xx xxxx,  1111 xxxx */
                    throw new UTFDataFormatException(
                        "malformed input around byte " + count);
            }
        }
        // The number of chars produced may be less than utflen
        return new String(chararr, 0, chararr_count);
    }

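readUTF reads modified UTF-8 encoded data from the input stream and returns it as a String. Its flow is as follows:

1) Read the length in bytes of the UTF-8 data from the input stream.

2) Obtain two arrays, the byte array bytearr and the char array chararr, which hold the UTF-8 bytes from the input stream and the decoded characters respectively.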

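It first checks whether the argument in is a DataInputStream:

If not, it allocates new bytearr and chararr arrays whose capacity equals the UTF-8 byte length just read. Since the number of bytes per character cannot be known in advance, chararr is sized for the worst case in which every character is a single byte.

If so, it checks whether the stream's bytearr field is shorter than the UTF-8 byte length. If it is, both the bytearr and chararr fields are reallocated at twice that length. The local bytearr and chararr are then pointed at the stream's two fields. (It is not obvious why the arrays grow to twice utflen, which looks like wasting half the space. A plausible reason is that the fields are reused across calls, so the headroom avoids reallocating for the next, slightly longer string.)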

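3) Read all of the UTF-8 data into bytearr.

4) Pre-process the leading single-byte characters.

5) Continue with the data left after step 4. A character here takes 1 to 3 bytes (readUTF rejects 4-byte sequences as malformed), so the first byte of each character is converted to an int and shifted right by 4 bits to tell how many bytes it occupies, and each case is decoded accordingly.

6) Convert the char array chararr to a String and return it.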
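
The steps above can be exercised with hand-built input. The sketch below passes a 2-byte length prefix followed by one 1-byte, one 2-byte and one 3-byte character to readUTF, then two malformed inputs. ReadUtfDemo, streamOf, decode and isMalformed are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original article.
class ReadUtfDemo {

    // Wraps the given byte values (0-255) in a DataInputStream.
    static DataInputStream streamOf(int... values) {
        byte[] data = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            data[i] = (byte) values[i];
        }
        return new DataInputStream(new ByteArrayInputStream(data));
    }

    // Decodes a length-prefixed modified UTF-8 string with the real readUTF.
    static String decode(int... values) {
        try {
            return streamOf(values).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if readUTF rejects the input with UTFDataFormatException.
    static boolean isMalformed(int... values) {
        try {
            streamOf(values).readUTF();
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prefix 00 06, then 'a' (61), U+00E9 (C3 A9), U+4E2D (E4 B8 AD)
        String s = decode(0x00, 0x06, 0x61, 0xC3, 0xA9, 0xE4, 0xB8, 0xAD);
        System.out.println(s.equals("a\u00e9\u4e2d"));               // true
        System.out.println(s.length());                              // 3 chars from 6 bytes
        System.out.println(isMalformed(0x00, 0x02, 0xE4, 0xB8));     // true: 3-byte char cut short
        System.out.println(isMalformed(0x00, 0x01, 0x80));           // true: lone continuation byte
    }
}
```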

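II. DataOutputStream

1 - Introduction to DataOutputStream

DataOutputStream is a data output stream that extends FilterOutputStream and decorates another output stream. By implementing the DataOutput interface it adds the ability to write Java's primitive data types to the wrapped stream; an application can then use DataInputStream to read back the values written by DataOutputStream.

2 - DataOutputStream source code analysis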

public
class DataOutputStream extends FilterOutputStream implements DataOutput {
    // Number of bytes written to the data output stream so far
    protected int written;

    // Working buffer used by writeUTF to hold the encoded bytes; allocated on demand
    private byte[] bytearr = null;

    // Constructor: binds the underlying output stream
    public DataOutputStream(OutputStream out) {
        super(out);
    }

    // Increases the written counter by value, clamping at Integer.MAX_VALUE on overflow
    private void incCount(int value) {
        int temp = written + value;
        if (temp < 0) {
            temp = Integer.MAX_VALUE;
        }
        written = temp;
    }

    // Writes the low 8 bits of the int b to the underlying output stream
    public synchronized void write(int b) throws IOException {
        out.write(b);
        incCount(1);
    }

    // Writes len bytes of the array b, starting at off, to the underlying output stream
    public synchronized void write(byte b[], int off, int len)
        throws IOException
    {
        out.write(b, off, len);
        incCount(len);
    }

    // Flushes the stream, forcing any buffered output bytes to be written out
    public void flush() throws IOException {
        out.flush();
    }

    // Writes a boolean to the output stream as one byte (1 for true, 0 for false)
    public final void writeBoolean(boolean v) throws IOException {
        out.write(v ? 1 : 0);
        incCount(1);
    }

    // Writes a byte to the output stream
    public final void writeByte(int v) throws IOException {
        out.write(v);
        incCount(1);
    }

    // Writes a short to the output stream, high byte first
    public final void writeShort(int v) throws IOException {
        out.write((v >>> 8) & 0xFF);
        out.write((v >>> 0) & 0xFF);
        incCount(2);
    }

    // Writes a char to the output stream, high byte first; a char occupies 2 bytes
    public final void writeChar(int v) throws IOException {
        out.write((v >>> 8) & 0xFF);
        out.write((v >>> 0) & 0xFF);
        incCount(2);
    }

    // Writes an int to the output stream, high byte first
    public final void writeInt(int v) throws IOException {
        out.write((v >>> 24) & 0xFF);
        out.write((v >>> 16) & 0xFF);
        out.write((v >>>  8) & 0xFF);
        out.write((v >>>  0) & 0xFF);
        incCount(4);
    }

    private byte writeBuffer[] = new byte[8];

    // Writes a long to the output stream; a long occupies 8 bytes
    public final void writeLong(long v) throws IOException {
        writeBuffer[0] = (byte)(v >>> 56);
        writeBuffer[1] = (byte)(v >>> 48);
        writeBuffer[2] = (byte)(v >>> 40);
        writeBuffer[3] = (byte)(v >>> 32);
        writeBuffer[4] = (byte)(v >>> 24);
        writeBuffer[5] = (byte)(v >>> 16);
        writeBuffer[6] = (byte)(v >>>  8);
        writeBuffer[7] = (byte)(v >>>  0);
        out.write(writeBuffer, 0, 8);
        incCount(8);
    }

    // Writes a float to the output stream: the value is first converted to its int bit pattern, which is then written as an int
    public final void writeFloat(float v) throws IOException {
        writeInt(Float.floatToIntBits(v));
    }

    // Writes a double to the output stream via its long bit pattern
    public final void writeDouble(double v) throws IOException {
        writeLong(Double.doubleToLongBits(v));
    }

    // Writes a String to the output stream as one byte per character; only the low 8 bits of each char are kept
    public final void writeBytes(String s) throws IOException {
        int len = s.length();
        for (int i = 0 ; i < len ; i++) {
            out.write((byte)s.charAt(i));
        }
        incCount(len);
    }

    // Writes a String to the output stream as a sequence of 2-byte chars, high byte first
    public final void writeChars(String s) throws IOException {
        int len = s.length();
        for (int i = 0 ; i < len ; i++) {
            int v = s.charAt(i);
            out.write((v >>> 8) & 0xFF);
            out.write((v >>> 0) & 0xFF);
        }
        incCount(len * 2);
    }

    // Writes a string to the output stream in modified UTF-8 encoding
    public final void writeUTF(String str) throws IOException {
        writeUTF(str, this);
    }

   
    static int writeUTF(String str, DataOutput out) throws IOException {
        // Length of the String in chars
        int strlen = str.length();
        // Number of bytes in the encoded form
        int utflen = 0;
        int c, count = 0;

        // Count the encoded bytes: each char takes 1, 2 or 3 bytes depending on its value
        for (int i = 0; i < strlen; i++) {
            c = str.charAt(i);
            if ((c >= 0x0001) && (c <= 0x007F)) {
                utflen++;
            } else if (c > 0x07FF) {
                utflen += 3;
            } else {
                utflen += 2;
            }
        }
        // If the encoded length exceeds 65535 bytes, throw UTFDataFormatException (encoded string too long)
        if (utflen > 65535)
            throw new UTFDataFormatException(
                "encoded string too long: " + utflen + " bytes");
        // Byte array that will hold the length prefix and the encoded bytes
        byte[] bytearr = null;
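        // If out is a DataOutputStream (or a subclass) and its bytearr field is null or shorter than utflen + 2,
        // reallocate the field to utflen * 2 + 2 bytes; the local bytearr then refers to that field.
        // The extra 2 bytes hold the length prefix.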
        if (out instanceof DataOutputStream) {
            DataOutputStream dos = (DataOutputStream)out;
            if(dos.bytearr == null || (dos.bytearr.length < (utflen+2)))
                dos.bytearr = new byte[(utflen*2) + 2];
            bytearr = dos.bytearr;
        } else {
            bytearr = new byte[utflen+2];
        }

        // Write the 2-byte length prefix
        bytearr[count++] = (byte) ((utflen >>> 8) & 0xFF);
        bytearr[count++] = (byte) ((utflen >>> 0) & 0xFF);

        int i=0;
        // Fast path for the leading run of single-byte characters
        for (i=0; i<strlen; i++) {
           c = str.charAt(i);
           if (!((c >= 0x0001) && (c <= 0x007F))) break;
           bytearr[count++] = (byte) c;
        }
        
        // Encode the remaining characters
        for (;i < strlen; i++){
            c = str.charAt(i);
            // Single-byte character
            if ((c >= 0x0001) && (c <= 0x007F)) {
                bytearr[count++] = (byte) c;

            }
            // Three-byte character
            else if (c > 0x07FF) {
                bytearr[count++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                bytearr[count++] = (byte) (0x80 | ((c >>  6) & 0x3F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            }
            // Two-byte character
            else {
                bytearr[count++] = (byte) (0xC0 | ((c >>  6) & 0x1F));
                bytearr[count++] = (byte) (0x80 | ((c >>  0) & 0x3F));
            }
        }
        // Write the byte array bytearr to the data output stream
        out.write(bytearr, 0, utflen+2);
        return utflen + 2;
    }

    // Returns the number of bytes written to this data output stream so far
    public final int size() {
        return written;
    }
}
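
The three string-writing methods in the class above produce different bytes for the same input, which a short experiment makes visible. This is a sketch only: StringWriteDemo, its mode selector strings ("bytes", "chars", "utf") and the hex helper are hypothetical and exist just for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the JDK or of the original article.
class StringWriteDemo {

    // Writes s with the method selected by mode and returns the raw bytes.
    static byte[] write(String mode, String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            switch (mode) {
                case "bytes": out.writeBytes(s); break;
                case "chars": out.writeChars(s); break;
                case "utf":   out.writeUTF(s);   break;
                default: throw new IllegalArgumentException(mode);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Writes s three ways into one stream and returns the written counter via size().
    static int countedSize(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeBytes(s);
            out.writeChars(s);
            out.writeUTF(s);
            return out.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // a single char, U+4E2D
        System.out.println(hex(write("bytes", s))); // 2D (high byte lost)
        System.out.println(hex(write("chars", s))); // 4E 2D
        System.out.println(hex(write("utf", s)));   // 00 03 E4 B8 AD
        System.out.println(countedSize("abc"));     // 14 = 3 + 6 + (2 + 3)
    }
}
```

Only writeUTF stores a length, so it is the only one of the three whose output DataInputStream can read back without knowing the string length in advance.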

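Overall, the member methods of DataOutputStream are simple. The one worth a closer look is writeUTF, whose flow is as follows:

1) Get the length strlen of the string in chars;

2) Count the encoded byte length utflen; if it exceeds what the 2-byte, 16-bit length prefix can express, 2^16-1 (65535), throw an exception reporting that the encoded string is too long;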

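3) Declare the byte array bytearr that holds the encoded bytes; its capacity is chosen as follows:

1- If the output stream out is not an instance of DataOutputStream or a subclass, bytearr is a new array of length utflen+2;

2- If out is an instance of DataOutputStream or a subclass and its bytearr field is null or shorter than utflen+2, the field is reallocated to utflen*2+2 bytes and the local bytearr points to it; otherwise the local bytearr points directly to the existing field.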

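4) Write the 2 bytes that record the encoded length into the buffer bytearr;

5) Pre-process the leading single-byte characters of the string, stopping at the first character that is not single-byte;

6) Encode the remaining characters, single-byte or multi-byte as appropriate, into the buffer bytearr;

7) Write the buffer bytearr to the data output stream;

8) Return the number of bytes written, utflen+2 (including the 2-byte length prefix).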
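
Steps 1 and 2 can be checked against the real method. The sketch below repeats the counting loop from writeUTF and compares it with what writeUTF actually writes, including the 65535-byte limit. WriteUtfLengthDemo and its helpers encodedLength, writtenBytes, rejected and repeat are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; not part of the JDK or of the original article.
class WriteUtfLengthDemo {

    // Mirrors the counting loop at the top of writeUTF.
    static int encodedLength(String str) {
        int utflen = 0;
        for (int i = 0; i < str.length(); i++) {
            int c = str.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                utflen++;
            } else if (c > 0x07FF) {
                utflen += 3;
            } else {
                utflen += 2; // U+0080..U+07FF, and also U+0000
            }
        }
        return utflen;
    }

    // Number of bytes the real writeUTF produces, length prefix included.
    static int writtenBytes(String str) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(str);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    // Returns true if writeUTF refuses the string with UTFDataFormatException.
    static boolean rejected(String str) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(str);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u4e2d";
        System.out.println(encodedLength(s));              // 6 = 1 + 2 + 3
        System.out.println(writtenBytes(s));               // 8 = 6 + 2-byte prefix
        System.out.println(encodedLength("\u0000"));       // 2: U+0000 is not single-byte here
        System.out.println(rejected(repeat('a', 65535)));  // false
        System.out.println(rejected(repeat('a', 65536)));  // true
    }
}
```

The limit applies to encoded bytes rather than chars, so a string of 3-byte characters is rejected once it passes 21845 chars.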
