Okio Internals: Segment Data Management

Latest recommended article published on 2023-04-17 23:58:51

chadm

Category: Open-Source Framework Analysis. Tags: okio, Buffer, Segment, SegmentPool

Article link: https://blog.csdn.net/weixin_42695485/article/details/112066128

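Data movement in Okio is controlled mainly by three key classes:

Buffer: holds a circular doubly linked list of segments. Data is read from the head segment and written to the tail segment.
Segment: the class that actually stores the data. pos and limit mark the positions available for reading and writing; shared and owner indicate whether the data in this Segment may be modified.
SegmentPool: holds a singly linked list of Segments, containing at most 8 Segments (64 KiB in total, 8 KiB per Segment). recycle adds a Segment to the pool and take removes one; both operate on the next node, which is the head of the list.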

Buffer

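Represents a collection of bytes held in memory. Moving data from one Buffer to another is fast: rather than copying the bytes from one place to another, Buffer reassigns ownership of the underlying byte arrays.

Key fields:

Segment head: points to the head node of the circular doubly linked list of segments
long size: the number of bytes held in the Buffer; it is updated whenever data is read from or written to the Buffer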

Segment

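The byte array held by a Segment may be shared between buffers and byte strings. When shared is true, the segment can neither be recycled nor have its existing data changed. The one exception is that the owner segment may append to it, writing data at limit and beyond.

Each byte array has exactly one owning segment.
pos, limit, prev, and next are not shared.

Key fields:

int SIZE = 8192: the default size of a segment's byte array
int SHARE_MINIMUM = 1024: when a segment is split and at least this many bytes are split off, the byte array is shared rather than copied
final byte[] data: the data held by the segment; its size cannot change after initialization
int limit: the first writable position in the segment
int pos: the first readable position in the segment
boolean shared: true if the data array is shared with other segments or byte strings
boolean owner: true if this segment owns the byte array and may write to it
Segment next: the segment that follows this one
Segment prev: the segment that precedes this one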
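The fields above can be pictured with a stripped-down model. This is a minimal sketch rather than Okio's source: the class name MiniSegment and its append and readable helpers are hypothetical, and sharing and ownership are left out. Only the pos/limit bookkeeping and the circular linking are modeled.

```java
// Hypothetical, simplified model of a segment; not Okio's real class.
final class MiniSegment {
    static final int SIZE = 8192;

    final byte[] data = new byte[SIZE];
    int pos;    // next byte to read
    int limit;  // next byte to write
    MiniSegment next;
    MiniSegment prev;

    /** Number of readable bytes currently held. */
    int readable() {
        return limit - pos;
    }

    /** Appends bytes at limit and returns how many of them fit. */
    int append(byte[] src) {
        int n = Math.min(src.length, SIZE - limit);
        System.arraycopy(src, 0, data, limit, n);
        limit += n;
        return n;
    }

    /** Inserts segment right after this one in the circular list and returns it. */
    MiniSegment push(MiniSegment segment) {
        segment.prev = this;
        segment.next = next;
        next.prev = segment;
        next = segment;
        return segment;
    }

    /** Unlinks this segment and returns its successor, or null if it was alone. */
    MiniSegment pop() {
        MiniSegment result = (next != this) ? next : null;
        prev.next = next;
        next.prev = prev;
        next = null;
        prev = null;
        return result;
    }
}
```

A single segment forms a valid circular list when its next and prev both point to itself, which is why a buffer with one segment needs no special casing in push and pop.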

SegmentPool

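Manages a linked list of Segments, in which next is the head node. Key fields:

long MAX_SIZE = 64 * 1024: the maximum capacity in bytes
Segment next: the head node of the Segment list; insertion and removal both happen at this node
long byteCount: the total number of bytes currently held by the pool

The class has only two methods:

take(): removes the head node of the list. If next is null, a new Segment is created. Otherwise the head node result is returned, next is set to result.next, result.next is set to null, and byteCount is reduced by Segment.SIZE.
recycle(): adds the Segment being recycled as the new head node, increases byteCount, and resets the segment's pos and limit. If the pool would exceed its maximum capacity, the Segment is ignored and left for the JVM to garbage-collect.

If the segment is shared, recycle() returns immediately and the segment is not added.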
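The take and recycle behavior described above can be sketched as a small free list. This is an illustrative model under assumptions: PoolDemo and Node are hypothetical names, the pool is an instance rather than a static class, and only the capacity and shared checks are reproduced.

```java
// Hypothetical, simplified pool; names are illustrative, not Okio's API.
final class PoolDemo {
    static final int SEGMENT_SIZE = 8192;
    static final long MAX_SIZE = 64 * 1024;

    static final class Node {
        final byte[] data = new byte[SEGMENT_SIZE];
        int pos;
        int limit;
        boolean shared;
        Node next;
    }

    private Node next;      // head of the singly linked free list
    private long byteCount; // bytes currently pooled

    synchronized Node take() {
        if (next == null) return new Node(); // pool empty: allocate a fresh node
        Node result = next;
        next = result.next;
        result.next = null;
        byteCount -= SEGMENT_SIZE;
        return result;
    }

    synchronized void recycle(Node node) {
        if (node.shared) return;                         // shared data cannot be reused
        if (byteCount + SEGMENT_SIZE > MAX_SIZE) return; // pool full: leave the node to the GC
        byteCount += SEGMENT_SIZE;
        node.next = next;
        node.pos = node.limit = 0;
        next = node;
    }

    synchronized long byteCount() {
        return byteCount;
    }
}
```

With these constants the pool accepts eight nodes (8 x 8192 = 65536 bytes); a ninth recycle call is silently dropped, and take hands back the most recently recycled node first.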

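The relationship among the three objects is shown in the following figure:
[Figure: relationship among Buffer, Segment, and SegmentPool]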

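Reading data from a Buffer

When reading, up to 8192 bytes are first read from the InputStream into a Segment, and head points to that Segment. If the number of bytes requested is larger than what the Buffer holds, another read of up to 8192 bytes is issued, and so on until the Buffer holds at least the requested number of bytes.

Reading one line serves as the example.

Reading a line

readUtf8Line reads one line from a Source. Its implementation handles both "\n" and "\r\n" line endings.

Start with a simple example: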

    static void readLines() throws IOException {
        // Read the contents of ActivityThread.java
        try (BufferedSource bufferedSource = Okio.buffer(Okio.source(new File(READ)))) {
            while (true) {
                String line = bufferedSource.readUtf8Line();
                if (line == null) break;
                if (line.contains("ActivityThreadMain")) {
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

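As described in the previous article, Okio.source accepts a File argument and returns an anonymous implementation of Source that implements the read method. Later reads from the Source call this method; the details are covered below.

Okio.buffer returns a RealBufferedSource object: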

  public static BufferedSource buffer(Source source) {
    return new RealBufferedSource(source);
  }

Now look at the implementation of readUtf8Line:

  @Override public @Nullable String readUtf8Line() throws IOException {
    // First locate the newline character.
    long newline = indexOf((byte) '\n');
    // If none is found, return whatever remains in the buffer (or null if it is empty).
    if (newline == -1) {
      return buffer.size != 0 ? readUtf8(buffer.size) : null;
    }
    // If one is found, read the bytes up to the newline.
    return buffer.readUtf8Line(newline);
  }

Because this is the first read, data is first read from the Source and stored in the Buffer:

  @Override public long indexOf(byte b) throws IOException {
    return indexOf(b, 0, Long.MAX_VALUE);
  }
  
  @Override public long indexOf(byte b, long fromIndex, long toIndex) throws IOException {
    if (closed) throw new IllegalStateException("closed");
    if (fromIndex < 0 || toIndex < fromIndex) {
      throw new IllegalArgumentException(
          String.format("fromIndex=%s toIndex=%s", fromIndex, toIndex));
    }

    while (fromIndex < toIndex) {
      long result = buffer.indexOf(b, fromIndex, toIndex);
      if (result != -1L) return result;

      // The byte wasn't in the buffer. Give up if we've already reached our target size or if the
      // underlying stream is exhausted.
      long lastBufferSize = buffer.size;
      if (lastBufferSize >= toIndex || source.read(buffer, Segment.SIZE) == -1) return -1L;

      // Continue the search from where we left off.
      fromIndex = Math.max(fromIndex, lastBufferSize);
    }
    return -1L;
  }
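The search-then-refill pattern in this loop can be tried in isolation. The following is a minimal sketch under assumptions: RefillSearch is a hypothetical class, a ByteArrayOutputStream stands in for the Buffer, chunkSize plays the role of Segment.SIZE, and IOException is wrapped in UncheckedIOException to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper illustrating "search what is buffered, refill, search again".
final class RefillSearch {
    /**
     * Returns the index of b within everything buffered so far, pulling up to
     * chunkSize bytes at a time from in until b is found or the stream ends.
     */
    static long indexOf(InputStream in, ByteArrayOutputStream buffered, byte b, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize <= 0");
        byte[] chunk = new byte[chunkSize];
        int fromIndex = 0;
        while (true) {
            // toByteArray copies on every pass; acceptable for a sketch only.
            byte[] bytes = buffered.toByteArray();
            for (int i = fromIndex; i < bytes.length; i++) {
                if (bytes[i] == b) return i;
            }
            // Not found: continue the next search from where this one stopped.
            fromIndex = bytes.length;
            int read;
            try {
                read = in.read(chunk, 0, chunkSize);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (read == -1) return -1L; // stream exhausted
            buffered.write(chunk, 0, read);
        }
    }
}
```

As in the original loop, bytes that were already searched are never rescanned after a refill, because fromIndex is moved up to the previous buffer size.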

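Here fromIndex is 0 and toIndex is Long.MAX_VALUE. The Buffer is empty at this point, so result is -1 inside the while loop and execution continues to source.read, which is called on the object returned by Okio.source: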

  private static Source source(final InputStream in, final Timeout timeout) {
    if (in == null) throw new IllegalArgumentException("in == null");
    if (timeout == null) throw new IllegalArgumentException("timeout == null");

    return new Source() {
      @Override public long read(Buffer sink, long byteCount) throws IOException {
        if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
        if (byteCount == 0) return 0;
        try {
          timeout.throwIfReached();
          // Get a Segment that can be written to.
          Segment tail = sink.writableSegment(1);
          // Read as much data as possible into tail.
          int maxToCopy = (int) Math.min(byteCount, Segment.SIZE - tail.limit);
          int bytesRead = in.read(tail.data, tail.limit, maxToCopy);
          if (bytesRead == -1) return -1;
          // Advance tail's write position; limit becomes 8192 if a full segment was read.
          tail.limit += bytesRead;
          // Update the size of the Buffer that received the data.
          sink.size += bytesRead;
          return bytesRead;
        } catch (AssertionError e) {
          if (isAndroidGetsocknameError(e)) throw new IOException(e);
          throw e;
        }
      }
    };
  }

First look at writableSegment, which returns a Segment that can be written to, taking a new one from SegmentPool if none is available:

  Segment writableSegment(int minimumCapacity) {
    if (minimumCapacity < 1 || minimumCapacity > Segment.SIZE) throw new IllegalArgumentException();

    if (head == null) {
      head = SegmentPool.take(); // Acquire a first segment.
      return head.next = head.prev = head;
    }

    Segment tail = head.prev;
    if (tail.limit + minimumCapacity > Segment.SIZE || !tail.owner) {
      tail = tail.push(SegmentPool.take()); // Append a new empty segment to fill up.
    }
    return tail;
  }

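At this point the Buffer is empty and head is null, so SegmentPool.take() supplies a segment (a newly created one if the pool is empty). Its next and prev pointers are both set to the head node itself, and it is returned.

Back in the read method: once the writable Segment tail has been obtained, InputStream.read is called to read as much data as possible into the Buffer, capped by the amount tail can still accept. Here, up to Segment.SIZE bytes are read into the tail segment.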
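The cap on how much a single read may copy is easiest to see with numbers. The sketch below is an assumption-laden model: TailFill is a hypothetical class that stands for one tail segment, and IOException is wrapped to keep the example compact.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a tail segment that is filled from an InputStream.
final class TailFill {
    static final int SIZE = 8192;

    final byte[] data = new byte[SIZE];
    int limit; // next writable index

    /** Reads at most byteCount bytes, capped by the free space after limit; returns bytes read or -1. */
    int fillFrom(InputStream in, long byteCount) {
        int maxToCopy = (int) Math.min(byteCount, SIZE - limit);
        int bytesRead;
        try {
            bytesRead = in.read(data, limit, maxToCopy);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (bytesRead == -1) return -1;
        limit += bytesRead; // the segment now holds bytesRead more readable bytes
        return bytesRead;
    }
}
```

For example, a tail whose limit is already 8000 can accept only 192 more bytes, even when 8192 bytes are requested and the stream has more available.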

Execution then returns to the while loop in indexOf. This time buffer.indexOf finds the newline and its position is returned directly. buffer.readUtf8Line is then called:

  String readUtf8Line(long newline) throws EOFException {
    if (newline > 0 && getByte(newline - 1) == '\r') {
      // Read everything until '\r\n', then skip the '\r\n'.
      String result = readUtf8((newline - 1));
      skip(2);
      return result;

    } else {
      // Read everything until '\n', then skip the '\n'.
      String result = readUtf8(newline);
      skip(1);
      return result;
    }
  }

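This method distinguishes two cases:

If the newline is preceded by "\r", the data before "\r" is read and the two bytes "\r\n" are skipped.
Otherwise, the data before "\n" is read and that single byte is skipped.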
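The two cases can be restated on a plain String. This is only an illustration: LineEndings and its methods are hypothetical, and newline is assumed to be the index of a '\n' character, as in the method above.

```java
// Hypothetical helper mirroring the "\r\n" versus "\n" decision on a String.
final class LineEndings {
    /** Returns the text before the terminator that ends at index newline. */
    static String lineBefore(String text, int newline) {
        if (newline > 0 && text.charAt(newline - 1) == '\r') {
            return text.substring(0, newline - 1); // stop before "\r\n"
        }
        return text.substring(0, newline);         // stop before "\n"
    }

    /** Number of terminator characters to skip: 2 for "\r\n", 1 for "\n". */
    static int terminatorLength(String text, int newline) {
        return (newline > 0 && text.charAt(newline - 1) == '\r') ? 2 : 1;
    }
}
```

The newline > 0 guard matters when the very first character is "\n": there is no preceding character to inspect, so the line is simply empty.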

readUtf8 delegates to readString, which creates and returns a new String:

@Override public String readString(long byteCount, Charset charset) throws EOFException {
    checkOffsetAndCount(size, 0, byteCount);
    if (charset == null) throw new IllegalArgumentException("charset == null");
    if (byteCount > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("byteCount > Integer.MAX_VALUE: " + byteCount);
    }
    if (byteCount == 0) return "";

    Segment s = head;
    if (s.pos + byteCount > s.limit) {
      // If the string spans multiple segments, delegate to readBytes().
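      // If the requested data is not contained in a single Segment, read it with readByteArray.
      return new String(readByteArray(byteCount), charset);
    }
    // Decode the requested number of bytes straight from the head segment's array.
    String result = new String(s.data, s.pos, (int) byteCount, charset);
    // Advance the head segment's read position; the next read starts here.
    s.pos += byteCount;
    // Shrink the Buffer's size by the number of bytes consumed.
    size -= byteCount;
    // Once the head segment is fully consumed, unlink and recycle it.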
    if (s.pos == s.limit) {
      head = s.pop();
      SegmentPool.recycle(s);
    }

    return result;
}

After one line has been read, the Buffer looks like this:
[Figure: Buffer structure after reading one line]

Reading the second line only advances pos:
[Figure: Buffer structure after reading the second line]
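How pos moves across consecutive line reads inside one segment can be sketched as follows. SingleSegmentReader is a hypothetical class, it models a single byte array only, and for brevity it recognizes "\n" terminators but not "\r\n".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical single-segment reader: data never moves, only pos advances.
final class SingleSegmentReader {
    final byte[] data;
    final int limit; // end of readable bytes
    int pos;         // next byte to read

    SingleSegmentReader(String content) {
        this.data = content.getBytes(StandardCharsets.UTF_8);
        this.limit = data.length;
    }

    /** Returns the next line without its '\n', or null when nothing is left. */
    String readLine() {
        if (pos == limit) return null;
        int newline = pos;
        while (newline < limit && data[newline] != '\n') newline++;
        String line = new String(data, pos, newline - pos, StandardCharsets.UTF_8);
        pos = Math.min(newline + 1, limit); // consume the line and its terminator
        return line;
    }
}
```

For the content "first\nsecond\n", the first call leaves pos at 6 and the second leaves it at 13, which equals limit; in the real Buffer this is the moment the head segment would be recycled.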

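When the first Segment in the Buffer does not hold enough data for the read, a new Segment is created. The Buffer then looks like this:
[Figure: Buffer structure with a second Segment appended]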

In Buffer.readString, because s.pos + byteCount > s.limit, readByteArray is called, which ends up in the read method through readFully:

  @Override public void readFully(byte[] sink) throws EOFException {
    int offset = 0;
    while (offset < sink.length) {
      // Keep reading from the Buffer until the requested number of bytes has been copied.
      int read = read(sink, offset, sink.length - offset);
      if (read == -1) throw new EOFException();
      offset += read;
    }
  }

  @Override public int read(byte[] sink, int offset, int byteCount) {
    checkOffsetAndCount(sink.length, offset, byteCount);
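    // At this point the head segment holds fewer readable bytes than requested.
    Segment s = head;
    if (s == null) return -1;
    int toCopy = Math.min(byteCount, s.limit - s.pos);
    // Copy everything readable in the head segment into the sink array.
    System.arraycopy(s.data, s.pos, sink, offset, toCopy);
    // Advance pos; in this case it now equals limit.
    s.pos += toCopy;
    // Reduce size by the number of bytes copied.
    size -= toCopy;
    // If the head segment has been fully read, recycle it into SegmentPool.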
    if (s.pos == s.limit) {
      head = s.pop();
      SegmentPool.recycle(s);
    }

    return toCopy;
  }

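readByteArray reads in a loop starting at the Buffer's head segment until the requested number of bytes has been read. Each time a head segment is drained, it is recycled into SegmentPool.
[Figure: reading across Segments and recycling the drained head]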
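A read that crosses a segment boundary can be modeled with a queue of byte arrays. This is a sketch under assumptions: MultiSegmentReader is hypothetical, an ArrayDeque replaces the linked list, and a counter stands in for SegmentPool.recycle.

```java
import java.util.ArrayDeque;

// Hypothetical model of reading across several segments.
final class MultiSegmentReader {
    private final ArrayDeque<byte[]> segments = new ArrayDeque<>();
    private int headPos; // read position inside the first segment
    int recycled;        // how many drained segments were released

    void addSegment(byte[] bytes) {
        segments.addLast(bytes);
    }

    /** Copies up to byteCount bytes from the head segment only; returns -1 if empty. */
    int read(byte[] sink, int offset, int byteCount) {
        byte[] head = segments.peekFirst();
        if (head == null) return -1;
        int toCopy = Math.min(byteCount, head.length - headPos);
        System.arraycopy(head, headPos, sink, offset, toCopy);
        headPos += toCopy;
        if (headPos == head.length) { // head drained: drop it and move on
            segments.removeFirst();
            headPos = 0;
            recycled++;
        }
        return toCopy;
    }

    /** Calls read() repeatedly until sink is full, crossing segment boundaries as needed. */
    void readFully(byte[] sink) {
        int offset = 0;
        while (offset < sink.length) {
            int read = read(sink, offset, sink.length - offset);
            if (read == -1) throw new IllegalStateException("not enough data");
            offset += read;
        }
    }
}
```

Each call to read touches one segment at most, so a request for four bytes from two three-byte segments takes two passes and releases exactly one segment.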

The complete process is as follows:
[Figure: complete read process]

Writing data to a Buffer

Writing with Buffer.write is somewhat more involved:

public void write(Buffer source, long byteCount)

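Don't waste CPU
- Copying large amounts of data is expensive. Instead, Okio reassigns whole segments from the source buffer to the target buffer.
Don't waste memory
- As an invariant, adjacent pairs of segments in a Buffer should be at least 50% full, except for the head and tail segments.
- The head segment cannot maintain the invariant because the application is consuming bytes from it, which lowers its fill level.
- The tail segment cannot maintain the invariant because the application is producing bytes into it, which may require a new, empty Segment to be appended as the tail.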

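Moving segments between two buffers

When data is written from BufferA to BufferB, Okio prefers to reassign whole segments rather than copy data.

Scenario 1: writing BufferA [72%] into BufferB [91%, 61%] simply relinks BufferA's segment, giving BufferB [91%, 61%, 72%].
Scenario 2: writing BufferA [99%, 3%] into BufferB [100%, 2%] relinks the segments, giving BufferB [100%, 2%, 99%, 3%].
Scenario 3: compacting while combining buffers. Writing BufferA [30%, 80%] into BufferB [100%, 40%] gives BufferB [100%, 70%, 80%].
Scenario 4: splitting a segment. When only part of a source buffer is written to a sink buffer, for example the first 30% of source [92%, 82%] into sink [51%, 91%], the source's head segment is split first so that source becomes [30%, 62%, 82%]. The source's head is then relinked as the sink's tail, giving sink [51%, 91%, 30%].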
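The four scenarios can be replayed with a toy model that tracks only fill levels. This is a sketch under stated assumptions, not Okio's implementation: SegmentLevels is a hypothetical class, each segment is an integer percentage, every segment is treated as writable and unshared with pos at 0, and byteCount must not exceed the total held by source.

```java
import java.util.List;

// Hypothetical model: each list element is one segment's fill level in percent.
final class SegmentLevels {
    static final int FULL = 100;

    /** Moves byteCount percent from the front of source to the end of sink. */
    static void write(List<Integer> source, List<Integer> sink, int byteCount) {
        while (byteCount > 0) {
            int head = source.get(0);
            if (byteCount < head) {
                int tailIndex = sink.size() - 1;
                if (tailIndex >= 0 && byteCount + sink.get(tailIndex) <= FULL) {
                    // The prefix fits into the sink's tail: copy it there and stop.
                    sink.set(tailIndex, sink.get(tailIndex) + byteCount);
                    source.set(0, head - byteCount);
                    return;
                }
                // Otherwise split the source's head into [byteCount, remainder].
                source.set(0, head - byteCount);
                source.add(0, byteCount);
            }
            // Relink the source's head as the sink's tail, then try to compact.
            int moved = source.remove(0);
            sink.add(moved);
            compactTail(sink);
            byteCount -= moved;
        }
    }

    /** Merges the tail into its predecessor when the predecessor has room for all of it. */
    static void compactTail(List<Integer> sink) {
        int last = sink.size() - 1;
        if (last < 1) return;
        int tail = sink.get(last);
        int prev = sink.get(last - 1);
        if (tail > FULL - prev) return; // not enough writable space
        sink.set(last - 1, prev + tail);
        sink.remove(last);
    }
}
```

Writing [30, 80] into [100, 40] with byteCount 110 yields [100, 70, 80], and writing 30 from [92, 82] into [51, 91] yields sink [51, 91, 30] and source [62, 82], matching scenarios 3 and 4.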

The material above comes from the comments on the write method, which moves bytes from the head of the source Buffer to the tail of the current Buffer. The full method is:

public void write(Buffer source, long byteCount) {
    if (source == null) throw new IllegalArgumentException("source == null");
    if (source == this) throw new IllegalArgumentException("source == this");
    checkOffsetAndCount(source.size, 0, byteCount);

    while (byteCount > 0) {
      // The requested byteCount is smaller than what source's head segment currently holds.
      // Is a prefix of the source's head segment all that we need to move?
      if (byteCount < (source.head.limit - source.head.pos)) {
        Segment tail = head != null ? head.prev : null;
        if (tail != null && tail.owner
            && (byteCount + tail.limit - (tail.shared ? 0 : tail.pos) <= Segment.SIZE)) {
          // If the bytes fit into the current Buffer's tail segment, copy them there and return.
          // Our existing segments are sufficient. Move bytes from source's head to our tail.
          // writeTo copies the data with System.arraycopy.
          source.head.writeTo(tail, (int) byteCount);
          // Update the sizes of source and of the current Buffer.
          source.size -= byteCount;
          size += byteCount;
          return;
        } else {
          // If the bytes do not fit into the current Buffer's tail, split source's head into two segments and move the first one to the current Buffer.
          // We're going to need another segment. Split the source's head
          // segment in two, then move the first of those two to this buffer.
          // split divides the Segment in two: the first holds pos..pos+byteCount, the second holds pos+byteCount..limit. There are two cases:
          // a) if byteCount >= SHARE_MINIMUM, a shared segment is created (shared is true)
          // b) if byteCount < SHARE_MINIMUM, a Segment is taken from SegmentPool and the bytes are copied into it
          source.head = source.head.split((int) byteCount);
        }
      }
      // If a split happened above, source's head is now the newly created prefix segment.
      // Remove the source's head segment and append it to our tail.
      // Take source's head as the segment to move, then point source.head at its successor.
      Segment segmentToMove = source.head;
      long movedByteCount = segmentToMove.limit - segmentToMove.pos;
      source.head = segmentToMove.pop();
      if (head == null) {
        // If the current Buffer is empty, the moved segment becomes head and links to itself.
        head = segmentToMove;
        head.next = head.prev = head;
      } else {
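        // Otherwise append source's head to the current Buffer's tail, then check whether the new tail can be merged into its predecessor.
        Segment tail = head.prev;
        tail = tail.push(segmentToMove);
        // compact returns without merging if prev's available space is smaller than the tail's byte count; otherwise it writes the tail's bytes into prev, unlinks the tail, and recycles it.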
        tail.compact();
      }
      // Update the sizes of source and the current Buffer and the remaining byteCount, then repeat.
      source.size -= movedByteCount;
      size += movedByteCount;
      byteCount -= movedByteCount;
    }
}

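Scenario 1: writing BufferA [72%] into BufferB [91%, 61%]

The following check is false, because the amount to write (72%) is not smaller than the 72% held by the source's head segment: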

if (byteCount < (source.head.limit - source.head.pos))

Execution continues with the logic that follows, which takes the size of the source's head segment and moves that segment:

Segment segmentToMove = source.head;
long movedByteCount = segmentToMove.limit - segmentToMove.pos;
source.head = segmentToMove.pop();

Segment tail = head.prev;
tail = tail.push(segmentToMove);
tail.compact();

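Here head is not null, so source's head segment is linked in as the new tail of the current Buffer. In effect, BufferA drops its reference to the head segment and BufferB's tail now points to it: ownership of the segment changes hands and no data is copied.

In addition, the available space in the new tail's predecessor (100% - 61% = 39%) is smaller than the tail's 72%, so the two cannot be merged. The final result is BufferB [91%, 61%, 72%], with BufferA empty.

Scenario 2: writing BufferA [99%, 3%] into BufferB [100%, 2%]

The logic is similar to scenario 1: the 99% and 3% Segments of BufferA are appended to BufferB one after the other. The final result is BufferB [100%, 2%, 99%, 3%], with BufferA empty.

In this scenario the segments are not merged into the denser layout [100%, 100%, 4%], because compact is designed this way: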

  public final void compact() {
    if (prev == this) throw new IllegalStateException();
    if (!prev.owner) return; // Cannot compact: prev isn't writable.
    // Number of bytes held by the current Segment.
    int byteCount = limit - pos;
    // Space available in the current Segment's predecessor.
    int availableByteCount = SIZE - prev.limit + (prev.shared ? 0 : prev.pos);
    if (byteCount > availableByteCount) return; // Cannot compact: not enough writable space.
    writeTo(prev, byteCount);
    pop();
    SegmentPool.recycle(this);
  }

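When the 99% Segment is appended to BufferB as the tail, it holds 99% while its prev segment has only 98% available, so the following check is true and compact returns immediately: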

if (byteCount > availableByteCount)

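Likewise, when the 3% Segment is appended as the tail, its prev segment (99%) has only 1% available. It does not fit either, so compact returns immediately.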
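The space check in compact can be worked through with concrete byte counts. The helper below is a hypothetical restatement of that arithmetic (CompactCheck is not part of Okio); it shows how pos and shared change the available space.

```java
// Hypothetical restatement of the space check performed before compacting.
final class CompactCheck {
    static final int SIZE = 8192;

    /** Writable space in prev: the room after limit, plus the room before pos if prev is unshared. */
    static int availableByteCount(int prevPos, int prevLimit, boolean prevShared) {
        return SIZE - prevLimit + (prevShared ? 0 : prevPos);
    }

    /** True when all readable bytes of the tail fit into a writable prev. */
    static boolean canCompact(int tailPos, int tailLimit, int prevPos, int prevLimit,
                              boolean prevShared, boolean prevOwner) {
        if (!prevOwner) return false; // prev is not writable
        int byteCount = tailLimit - tailPos;
        return byteCount <= availableByteCount(prevPos, prevLimit, prevShared);
    }
}
```

A prev segment holding 164 bytes (about 2%) has 8028 bytes free, so a tail of 8110 bytes (about 99%) does not fit. A prev segment that is full up to limit but has already been read up to pos 1000 still offers 1000 bytes, but only while it is unshared, since its remaining data must be shifted to the front to make room.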

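Scenario 3: compacting while combining buffers. Writing BufferA [30%, 80%] into BufferB [100%, 40%] gives BufferB [100%, 70%, 80%].

The following check is false, because 110% is to be written, which is not smaller than the 30% held by the source's head segment: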

if (byteCount < (source.head.limit - source.head.pos))

Execution continues with the logic that follows, which takes the size of the source's head segment and moves that segment:

Segment segmentToMove = source.head;
long movedByteCount = segmentToMove.limit - segmentToMove.pos;
source.head = segmentToMove.pop();

Here head is not null, so source's head segment is linked in as the new tail of the current Buffer. Then, in the compact method:

  public final void compact() {
    if (prev == this) throw new IllegalStateException();
    if (!prev.owner) return; // Cannot compact: prev isn't writable.
    int byteCount = limit - pos;
    int availableByteCount = SIZE - prev.limit + (prev.shared ? 0 : prev.pos);
    if (byteCount > availableByteCount) return; // Cannot compact: not enough writable space.
    writeTo(prev, byteCount);
    pop();
    SegmentPool.recycle(this);
  }

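BufferB is now [100%, 40%, 30%]. The tail holds 30% and its predecessor has 100% - 40% = 60% available, so availableByteCount > byteCount. writeTo copies the tail's contents into the prev segment, merging the two Segments into [100%, 70%].

Appending the 80% segment proceeds as in scenario 1.

Scenario 4: splitting a segment. Only part of a source buffer is written to a sink buffer, for example the first 30% of source [92%, 82%] into sink [51%, 91%].

When writing 30%, the source's head segment holds 92%, which is more than 30%, so the following check is true: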

if (byteCount < (source.head.limit - source.head.pos))

The sink's tail segment holds 91%, and 30% + 91% > 100%, so the following check is false:

 if (tail != null && tail.owner
            && (byteCount + tail.limit - (tail.shared ? 0 : tail.pos) <= Segment.SIZE))

Execution reaches split, which divides the source's head segment into 30% and 62%:

source.head = source.head.split((int) byteCount);

split is implemented as follows:

  public final Segment split(int byteCount) {
    if (byteCount <= 0 || byteCount > limit - pos) throw new IllegalArgumentException();
    Segment prefix;

    // We have two competing performance goals:
    //  - Avoid copying data. We accomplish this by sharing segments.
    //  - Avoid short shared segments. These are bad for performance because they are readonly and
    //    may lead to long chains of short segments.
    // To balance these goals we only share segments when the copy will be large.
    if (byteCount >= SHARE_MINIMUM) {
      prefix = sharedCopy();
    } else {
      prefix = SegmentPool.take();
      System.arraycopy(data, pos, prefix.data, 0, byteCount);
    }

    prefix.limit = prefix.pos + byteCount;
    pos += byteCount;
    prev.push(prefix);
    return prefix;
  }

The amount to split off is 30%, which is above SHARE_MINIMUM (1024 / 8192 = 12.5%), so a shared Segment is created:

  final Segment sharedCopy() {
    shared = true;
    return new Segment(data, pos, limit, true, false);
  }

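The data array is shared rather than copied, and a Segment prefix is returned. At this point source [92%, 82%] has become source [30%, 62%, 82%].

Continuing with the remaining logic: source.head is now the 30% segment. It is appended to the sink as its tail, so sink becomes [51%, 91%, 30%] and source becomes [62%, 82%].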
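The share-or-copy decision made during a split can be observed directly by checking whether two views refer to the same array. This is a minimal sketch with assumptions: SplitDemo and View are hypothetical names, list linking and pooling are omitted, and the copy branch allocates a fresh array instead of taking one from a pool.

```java
// Hypothetical model of splitting a segment with and without sharing.
final class SplitDemo {
    static final int SIZE = 8192;
    static final int SHARE_MINIMUM = 1024;

    static final class View {
        final byte[] data;
        int pos;
        int limit;
        boolean shared;
        final boolean owner;

        View(byte[] data, int pos, int limit, boolean shared, boolean owner) {
            this.data = data;
            this.pos = pos;
            this.limit = limit;
            this.shared = shared;
            this.owner = owner;
        }
    }

    /** Splits the first byteCount readable bytes of original off into a new view. */
    static View split(View original, int byteCount) {
        if (byteCount <= 0 || byteCount > original.limit - original.pos) {
            throw new IllegalArgumentException();
        }
        View prefix;
        if (byteCount >= SHARE_MINIMUM) {
            // Large prefix: both views point at the same array; nothing is copied.
            original.shared = true;
            prefix = new View(original.data, original.pos, original.limit, true, false);
        } else {
            // Small prefix: copy the bytes into an array of their own.
            prefix = new View(new byte[SIZE], 0, 0, false, true);
            System.arraycopy(original.data, original.pos, prefix.data, 0, byteCount);
        }
        prefix.limit = prefix.pos + byteCount;
        original.pos += byteCount;
        return prefix;
    }
}
```

Splitting 2458 bytes (about 30% of 8192) leaves both views on one array with shared set, whereas splitting 100 bytes produces an independent, writable copy.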

One point deserves mention here: sharedCopy sets shared to true. This flag is consulted in several places; a typical one is SegmentPool.recycle:

  static void recycle(Segment segment) {
    if (segment.next != null || segment.prev != null) throw new IllegalArgumentException();
    // A shared Segment cannot be recycled.
    if (segment.shared) return; // This segment cannot be recycled.
    synchronized (SegmentPool.class) {
      if (byteCount + Segment.SIZE > MAX_SIZE) return; // Pool is full.
      byteCount += Segment.SIZE;
      segment.next = next;
      segment.pos = segment.limit = 0;
      next = segment;
    }
  }
