Okio Read/Write Stream Source Code Explained (Part 3: GzipSink Compression)

飞雨的夏天

Article link: https://blog.csdn.net/xiatiandefeiyu/article/details/78026939

Before reading source code you should be comfortable using the API, so let's first see how these two classes are used, starting with GzipSink:

import java.io.File;
import okio.BufferedSink;
import okio.GzipSink;
import okio.Okio;
import okio.Sink;

/**
 * Compress a string into a gzip file.
 */
private static void zipCompress() {
  String filePath = "D:/1.txt";
  try {
    // BufferedSink -> GzipSink -> file Sink
    Sink sink = Okio.sink(new File(filePath));
    BufferedSink gzipSink = Okio.buffer(new GzipSink(sink));
    gzipSink.writeUtf8("中国好男儿");
    gzipSink.flush();
    gzipSink.close();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
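For comparison, here is a minimal JDK-only sketch of the same layering idea: a buffering layer on top of a gzip layer on top of a raw byte sink. It does not use Okio at all; `LayeredGzipDemo` and its method names are made up for illustration, and it writes to memory instead of `D:/1.txt` so the result can be read back right away.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: JDK streams only, no Okio.
final class LayeredGzipDemo {

  // Buffering layer -> gzip layer -> raw byte sink.
  static byte[] compress(String text) {
    ByteArrayOutputStream raw = new ByteArrayOutputStream();
    try (OutputStream out = new BufferedOutputStream(new GZIPOutputStream(raw))) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Closing the outer stream finished the gzip stream, so raw is complete here.
    return raw.toByteArray();
  }

  // Reads the gzip data back and decodes it as UTF-8.
  static String decompress(byte[] gzipped) {
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
      byte[] chunk = new byte[1024];
      int n;
      while ((n = in.read(chunk)) != -1) {
        plain.write(chunk, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] gzipped = compress("hello, gzip");
    System.out.println(decompress(gzipped));
  }
}
```

The output starts with the two gzip magic bytes 0x1f 0x8b, the same ID that GzipSink writes in its header.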

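As the previous two articles explained, Okio is built around the decorator pattern: GzipSink decorates a Sink, and BufferedSink in turn decorates the GzipSink. A write is therefore handled first by BufferedSink, then passed to GzipSink, and finally to the Sink. Think of them as stations on a production line: each station processes the data according to its own responsibility and hands the result to the next one. The chain is BufferedSink -> GzipSink -> Sink. Now step into BufferedSink's writeUtf8 method: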

@Override public BufferedSink writeUtf8(String string) throws IOException {
    if (closed) throw new IllegalStateException("closed");
    buffer.writeUtf8(string);
    return emitCompleteSegments();
  }

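This delegates to Buffer's writeUtf8 method, which was covered in detail in the previous two articles and is not repeated here; it simply writes the data into the Segment linked list. One thing to add is the emitCompleteSegments() method: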

 public BufferedSink emitCompleteSegments() throws IOException {
    if (closed) throw new IllegalStateException("closed");
    long byteCount = buffer.completeSegmentByteCount();
    if (byteCount > 0) sink.write(buffer, byteCount);
    return this;
  }

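After the data has been written to the linked list, completeSegmentByteCount() reports how many bytes sit in completely filled Segments. If there are any, those full Segments are handed to the wrapped sink right away, while a partially filled tail Segment stays in the buffer. Next, look at flush: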

  @Override public void flush() throws IOException {
    if (closed) throw new IllegalStateException("closed");
    if (buffer.size > 0) {
      sink.write(buffer, buffer.size);
    }
    sink.flush();
  }
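The difference between emitCompleteSegments() and flush() can be shown with a toy model. This is not Okio code: `SegmentEmitModel` only counts bytes, and it assumes segments are filled back to back, so that only the tail segment can be partially full. The segment size of 8192 matches Okio's Segment.SIZE.

```java
// Toy model, not Okio code: it tracks byte counts only, no real data.
final class SegmentEmitModel {
  static final int SEGMENT_SIZE = 8192; // same value as Okio's Segment.SIZE

  long buffered;   // bytes still waiting in the buffer
  long downstream; // bytes already handed to the wrapped sink

  // Models writeUtf8(): buffer first, then emit whatever is complete.
  void write(long byteCount) {
    buffered += byteCount;
    emitCompleteSegments();
  }

  // Hands over full segments only; a partially filled tail stays buffered.
  void emitCompleteSegments() {
    long complete = buffered - (buffered % SEGMENT_SIZE);
    downstream += complete;
    buffered -= complete;
  }

  // Hands over everything, including the partial tail.
  void flush() {
    downstream += buffered;
    buffered = 0;
  }

  public static void main(String[] args) {
    SegmentEmitModel model = new SegmentEmitModel();
    model.write(15);
    System.out.println("after small write: downstream=" + model.downstream);
    model.flush();
    System.out.println("after flush: downstream=" + model.downstream);
  }
}
```

This is why the short string in the usage example reaches GzipSink only when flush() (or close()) is called: it never fills a whole Segment.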

Here sink is the GzipSink, so the data is handed over to GzipSink to be compressed. Step into GzipSink's write method:

 @Override public void write(Buffer source, long byteCount) throws IOException {
    if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
    if (byteCount == 0) return;
    // Update the CRC32 checksum used for the integrity check
    updateCrc(source, byteCount);
    deflaterSink.write(source, byteCount);
  }

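This method first runs the data in the linked list through CRC32 to produce a checksum, so that the reader can verify it later: if the data was corrupted, the checksum will not match and reading throws an exception. Note that CRC32 is a 32-bit checksum rather than a truly unique value, and it protects against accidental corruption, not deliberate tampering.

Put simply, CRC is one of the most widely used integrity-check algorithms for files. If you are not familiar with it, a quick web search will help.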

 private void updateCrc(Buffer buffer, long byteCount) {
    for (Segment head = buffer.head; byteCount > 0; head = head.next) {
      int segmentLength = (int) Math.min(byteCount, head.limit - head.pos);
      crc.update(head.data, head.pos, segmentLength);
      byteCount -= segmentLength;
    }
  }
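updateCrc can feed the checksum one Segment at a time because java.util.zip.CRC32 is incremental: updating it chunk by chunk gives the same value as updating it with the whole array at once. A small sketch of that property, where `ChunkedCrcDemo` and the chunk sizes are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class; CRC32 itself is the real JDK class used by GzipSink.
final class ChunkedCrcDemo {

  // Checksum of the whole array in one call.
  static long crcWhole(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  // Checksum built up chunk by chunk, like updateCrc does per Segment.
  static long crcInChunks(byte[] data, int chunkSize) {
    CRC32 crc = new CRC32();
    for (int pos = 0; pos < data.length; pos += chunkSize) {
      int len = Math.min(chunkSize, data.length - pos);
      crc.update(data, pos, len);
    }
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "the quick brown fox".getBytes(StandardCharsets.UTF_8);
    System.out.println(crcWhole(data) == crcInChunks(data, 4));
  }
}
```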

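updateCrc walks the Segments in the linked list and feeds them to the CRC32 object to compute the checksum. When is this value used? We will get to that shortly. Next comes deflaterSink.write(source, byteCount). The sink that actually performs the compression is created in the GzipSink constructor: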

 public GzipSink(Sink sink) {
    if (sink == null) throw new IllegalArgumentException("sink == null");
    this.deflater = new Deflater(DEFAULT_COMPRESSION, true /* No wrap */);
    // Wrap the given sink in a new BufferedSink
    this.sink = Okio.buffer(sink);
    this.deflaterSink = new DeflaterSink(this.sink, deflater);

    writeHeader();
  }

It is the DeflaterSink, built around a Deflater, whose write method does the real work. Step into it:

 @Override public void write(Buffer source, long byteCount) throws IOException {
    checkOffsetAndCount(source.size, 0, byteCount);
    // Feed the data to be compressed into the deflater, one Segment at a time
    while (byteCount > 0) {
      // Share bytes from the head segment of 'source' with the deflater.
      Segment head = source.head;
      int toDeflate = (int) Math.min(byteCount, head.limit - head.pos);
      // Set the bytes to be compressed as the deflater's input
      deflater.setInput(head.data, head.pos, toDeflate);

      // Deflate those bytes into sink.
      deflate(false);

      // Mark those bytes as read.
      source.size -= toDeflate;
      head.pos += toDeflate;
      // Recycle the Segment once it has been fully consumed
      if (head.pos == head.limit) {
        source.head = head.pop();
        SegmentPool.recycle(head);
      }

      byteCount -= toDeflate;
    }
  }

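The core idea of this method is to loop over the linked list, feed each Segment's data into the deflater, and compress it. So what is this deflater? It is declared as private final Deflater deflater, a JDK class (java.util.zip.Deflater) that implements DEFLATE compression, the algorithm used inside the gzip format. Because GzipSink constructs it with nowrap set to true, it emits a raw DEFLATE stream, and GzipSink writes the gzip header and trailer itself. If this class is new to you, no problem: take a look at its documentation, which is quite friendly and includes a complete example: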

try {
 // Encode a String into bytes
 String inputString = "blahblahblah??";
 byte[] input = inputString.getBytes("UTF-8");

 // Compress the bytes
 byte[] output = new byte[100];
 Deflater compresser = new Deflater();
 compresser.setInput(input);
 compresser.finish();
 int compressedDataLength = compresser.deflate(output);

 // Decompress the bytes
 Inflater decompresser = new Inflater();
 decompresser.setInput(output, 0, compressedDataLength);
 byte[] result = new byte[100];
 int resultLength = decompresser.inflate(result);
 decompresser.end();

 // Decode the bytes into a String
 String outputString = new String(result, 0, resultLength, "UTF-8");
 } catch(java.io.UnsupportedEncodingException ex) {
     // handle
 } catch (java.util.zip.DataFormatException ex) {
     // handle
 }
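The documentation example uses the default Deflater, which wraps its output in the zlib format (a 2-byte header and an Adler-32 checksum). GzipSink instead passes nowrap = true to get a raw DEFLATE stream. The sketch below contrasts the two modes; `NowrapDemo` and its helper names are made up for illustration, and only the java.util.zip calls are real API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; only the java.util.zip calls are real API.
final class NowrapDemo {

  // Compresses input as zlib (nowrap = false) or raw DEFLATE (nowrap = true).
  static byte[] deflate(byte[] input, boolean nowrap) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[256];
    try {
      deflater.setInput(input);
      deflater.finish();
      while (!deflater.finished()) {
        int n = deflater.deflate(chunk);
        out.write(chunk, 0, n);
      }
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  // Decompresses data produced by deflate() with the same nowrap setting.
  static byte[] inflate(byte[] compressed, boolean nowrap) {
    Inflater inflater = new Inflater(nowrap);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[256];
    try {
      // The Inflater Javadoc asks for one extra dummy byte in nowrap mode.
      inflater.setInput(nowrap ? Arrays.copyOf(compressed, compressed.length + 1) : compressed);
      while (!inflater.finished()) {
        int n = inflater.inflate(chunk);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported stream");
        }
        out.write(chunk, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = "blahblahblahblah".getBytes(StandardCharsets.UTF_8);
    System.out.println("zlib bytes: " + deflate(data, false).length);
    System.out.println("raw bytes:  " + deflate(data, true).length);
  }
}
```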

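So now we roughly know that Deflater is the utility class that performs the DEFLATE compression. Here comes the key part, where the compression actually happens: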

 private void deflate(boolean syncFlush) throws IOException {
    Buffer buffer = sink.buffer();
    while (true) {
      Segment s = buffer.writableSegment(1);

      // The 4-parameter overload of deflate() doesn't exist in the RI until
      // Java 1.7, and is public (although with @hide) on Android since 2.3.
      // The @hide tag means that this code won't compile against the Android
      // 2.3 SDK, but it will run fine there.
      int deflated = syncFlush
          ? deflater.deflate(s.data, s.limit, Segment.SIZE - s.limit, Deflater.SYNC_FLUSH)
          : deflater.deflate(s.data, s.limit, Segment.SIZE - s.limit);

      if (deflated > 0) {
        s.limit += deflated;
        buffer.size += deflated;
        // Once the compressed output fills complete Segments, pass them on to the wrapped sink
        sink.emitCompleteSegments();
      } 
      /**
       * No output was produced and the deflater has consumed all of its input.
       */
      else if (deflater.needsInput()) {
        if (s.pos == s.limit) {
          // We allocated a tail segment, but didn't end up needing it. Recycle!
          buffer.head = s.pop();
          SegmentPool.recycle(s);
        }
        return;
      }
    }
  }
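The loop above follows the usual streaming protocol of java.util.zip.Deflater: set some input, call deflate() until needsInput() is true, repeat, then call finish() and drain until finished(). Here is a JDK-only sketch of that protocol. It is not DeflaterSink itself: `StreamingDeflateDemo`, the chunk size parameter, and the 256-byte scratch buffer are made up for illustration, with the scratch buffer playing the role of the writable Segment.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class that mimics the write()/deflate()/finishDeflate() flow.
final class StreamingDeflateDemo {

  // chunkSize must be positive; each chunk stands in for one source Segment.
  static byte[] deflateInChunks(byte[] input, int chunkSize) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] scratch = new byte[256];
    try {
      for (int pos = 0; pos < input.length; pos += chunkSize) {
        int len = Math.min(chunkSize, input.length - pos);
        deflater.setInput(input, pos, len);
        // Drain output until the deflater has consumed this chunk.
        while (!deflater.needsInput()) {
          int n = deflater.deflate(scratch);
          out.write(scratch, 0, n);
        }
      }
      // No more input: emit whatever is still held back, like finishDeflate().
      deflater.finish();
      while (!deflater.finished()) {
        int n = deflater.deflate(scratch);
        out.write(scratch, 0, n);
      }
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  // Inflates a raw DEFLATE stream so the result can be checked.
  static byte[] inflateRaw(byte[] compressed) {
    Inflater inflater = new Inflater(true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] scratch = new byte[256];
    try {
      // One extra dummy byte, as the Inflater Javadoc asks for in nowrap mode.
      inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
      while (!inflater.finished()) {
        int n = inflater.inflate(scratch);
        if (n == 0 && inflater.needsInput()) {
          throw new IllegalStateException("truncated stream");
        }
        out.write(scratch, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = new byte[20000];
    byte[] compressed = deflateInChunks(data, 8192);
    System.out.println(Arrays.equals(data, inflateRaw(compressed)));
  }
}
```

Note that a deflate() call may return 0 bytes while the deflater is still collecting input; most of the output typically appears only after finish(), which is why close() matters so much in the real class.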

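To sum up: the loop takes the data from the BufferedSink's linked list and feeds it to the Deflater, and deflater.deflate(s.data, s.limit, Segment.SIZE - s.limit) writes the compressed output into the linked list of the BufferedSink that GzipSink created internally. In other words, the data is compressed from one linked list into another. What happens after that? Follow gzipSink.close(), which ends up in GzipSink's close method: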

 @Override public void close() throws IOException {
    if (closed) return;

    // This method delegates to the DeflaterSink for finishing the deflate process
    // but keeps responsibility for releasing the deflater's resources. This is
    // necessary because writeFooter needs to query the processed byte count which
    // only works when the deflater is still open.

    Throwable thrown = null;
    try {
      // Finish the compression
      deflaterSink.finishDeflate();
      // Write the gzip trailer (CRC and original length)
      writeFooter();
    } catch (Throwable e) {
      thrown = e;
    }

    try {
      deflater.end();
    } catch (Throwable e) {
      if (thrown == null) thrown = e;
    }

    try {
      sink.close();
    } catch (Throwable e) {
      if (thrown == null) thrown = e;
    }
    closed = true;

    if (thrown != null) Util.sneakyRethrow(thrown);
  }
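The shape of close() is a common pattern: attempt every cleanup step even if an earlier one fails, remember only the first failure, and rethrow it at the end. A rough standalone sketch of that pattern follows. `CloseAllDemo.closeAll` is a made-up helper, and it rethrows with ordinary casts instead of Okio's Util.sneakyRethrow.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper that illustrates the "first failure wins" close pattern.
final class CloseAllDemo {

  // Closes every resource in order, even if an earlier close() throws.
  static void closeAll(Closeable... resources) throws IOException {
    Throwable first = null;
    for (Closeable resource : resources) {
      try {
        resource.close();
      } catch (Throwable t) {
        if (first == null) first = t; // keep the first failure only
      }
    }
    if (first instanceof IOException) throw (IOException) first;
    if (first instanceof RuntimeException) throw (RuntimeException) first;
    if (first instanceof Error) throw (Error) first;
    if (first != null) throw new IOException(first);
  }

  public static void main(String[] args) {
    try {
      closeAll(
          () -> { throw new IOException("first failed"); },
          () -> System.out.println("second still closed"),
          () -> { throw new IOException("third failed"); });
    } catch (IOException e) {
      System.out.println("rethrown: " + e.getMessage());
    }
  }
}
```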

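The two most important calls in this method are deflaterSink.finishDeflate(), which tells the deflater that no more input is coming and runs one last deflate pass to emit the remaining compressed data,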

 void finishDeflate() throws IOException {
    deflater.finish();
    // Run one last deflate pass to emit the remaining output
    deflate(false);
  }

and writeFooter():

  private void writeFooter() throws IOException {
    // Write the CRC32 checksum so the reader can verify the data's integrity
    sink.writeIntLe((int) crc.getValue()); // CRC of original data.
    // Then write the total number of uncompressed bytes that were fed to the deflater
    sink.writeIntLe((int) deflater.getBytesRead()); // Length of original data.
  }

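This method records the length of the original, uncompressed data (not an amount of data that failed to compress) together with the CRC32 checksum; both are checked when the data is read back. Careful readers may have noticed that when a GzipSink is constructed, it first writes a fixed header into the linked list. What is that for? As you might guess, it is also used for validation on the reading side: it identifies the data as gzip and names the compression method.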

  private void writeHeader() {
    // Write the Gzip header directly into the buffer for the sink to avoid handling IOException.
    Buffer buffer = this.sink.buffer();
    buffer.writeShort(0x1f8b); // Two-byte Gzip ID.
    buffer.writeByte(0x08); // 8 == Deflate compression method.
    buffer.writeByte(0x00); // No flags.
    buffer.writeInt(0x00); // No modification time.
    buffer.writeByte(0x00); // No extra flags.
    buffer.writeByte(0x00); // No OS.
  }
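To see how the header, the raw DEFLATE body, and the footer fit together, here is a JDK-only sketch that assembles a gzip stream by hand with the same field values as writeHeader() and writeFooter(), and then lets the JDK's GZIPInputStream read it back. `ManualGzipDemo` and its helpers are made up for illustration; this is not how Okio structures its code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class: header + raw DEFLATE body + footer, built by hand.
final class ManualGzipDemo {

  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // 10-byte header, same values as writeHeader().
    out.write(0x1f);
    out.write(0x8b);                              // two-byte gzip ID
    out.write(0x08);                              // 8 == DEFLATE
    out.write(0x00);                              // no flags
    for (int i = 0; i < 4; i++) out.write(0x00);  // no modification time
    out.write(0x00);                              // no extra flags
    out.write(0x00);                              // no OS

    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] scratch = new byte[512];
      while (!deflater.finished()) {
        int n = deflater.deflate(scratch);
        out.write(scratch, 0, n);
      }
      // Footer: both values little-endian, read before deflater.end().
      writeIntLe(out, (int) crc.getValue());
      writeIntLe(out, (int) deflater.getBytesRead());
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  private static void writeIntLe(ByteArrayOutputStream out, int value) {
    out.write(value & 0xff);
    out.write((value >>> 8) & 0xff);
    out.write((value >>> 16) & 0xff);
    out.write((value >>> 24) & 0xff);
  }

  // Reads the stream back; GZIPInputStream checks the header and the footer.
  static byte[] gunzip(byte[] gzipped) {
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
      byte[] chunk = new byte[512];
      int n;
      while ((n = in.read(chunk)) != -1) {
        plain.write(chunk, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return plain.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = "hello, gzip footer".getBytes(StandardCharsets.UTF_8);
    System.out.println(new String(gunzip(gzip(data)), StandardCharsets.UTF_8));
  }
}
```

If a byte of the CRC in the footer is altered, GZIPInputStream rejects the stream, which is the read-side validation the header and footer exist for.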

What exactly these fields are used for will be covered in detail in the next article, which looks at the reading side.