(CS144 2024) Lab Checkpoint 1: stitching substrings into a byte stream (detailed walkthrough)

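Requirements analysis:

Purpose:

Implement a Reassembler that handles out-of-order and overlapping data.

Structure:

The Reassembler accepts a sequence of substrings taken from the same byte stream and reassembles them back into the original stream.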

Byte stream index ranges:

 * [0 , unpopped_index)                  bytes already popped

 * [unpopped_index , unassemble_index)   bytes pushed but not yet popped

 * [unassemble_index , capacity_index)   remaining capacity
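To make the three ranges concrete, here is a small sketch that derives them from counters. It assumes, as in the code later in this post, that unpopped_index equals the number of bytes popped, unassemble_index equals the number of bytes pushed, and capacity_index is unpopped_index + capacity; StreamWindow, Region, make_window and classify are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical helper types: the three index ranges of the byte stream,
// derived from bytes popped, bytes pushed, and the stream capacity.
struct StreamWindow
{
  uint64_t unpopped_index;   // first byte not yet popped
  uint64_t unassemble_index; // first byte not yet pushed
  uint64_t capacity_index;   // one past the last byte that currently fits
};

enum class Region { Popped, PushedNotPopped, FreeCapacity, BeyondCapacity };

StreamWindow make_window( uint64_t bytes_popped, uint64_t bytes_pushed, uint64_t capacity )
{
  return { bytes_popped, bytes_pushed, bytes_popped + capacity };
}

// Which half-open range does a stream index fall into?
Region classify( const StreamWindow& w, uint64_t index )
{
  if ( index < w.unpopped_index ) {
    return Region::Popped;
  }
  if ( index < w.unassemble_index ) {
    return Region::PushedNotPopped;
  }
  if ( index < w.capacity_index ) {
    return Region::FreeCapacity;
  }
  return Region::BeyondCapacity;
}
```

For example, with 3 bytes popped, 5 pushed and a capacity of 4, the window is [3, 7): indices 5 and 6 are free, and capacity_index - unassemble_index = 2 matches the capacity minus the 2 buffered bytes.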

Reassembler ranges:

All data stored in the Reassembler lies within [unassemble_index , capacity_index).

As soon as the buffered data is contiguous starting at unassemble_index, it is pushed.
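Ignoring overlaps for a moment, the "push once contiguous" rule is just a loop over the front of an ordered map. This is a simplified sketch, not the author's code: ContiguousPusher is a hypothetical name, segments are assumed not to overlap, and a std::string stands in for the ByteStream writer.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified model: incoming segments are assumed NOT to overlap,
// so the only job is to hold them until the gap before them is filled.
struct ContiguousPusher
{
  std::map<uint64_t, std::string> pending; // start index -> bytes
  uint64_t unassemble_index = 0;          // first index not yet pushed
  std::string output;                     // stands in for the ByteStream writer

  void insert( uint64_t first_index, std::string data )
  {
    if ( data.empty() || first_index < unassemble_index ) {
      return; // nothing new (overlap handling is out of scope here)
    }
    pending[first_index] = std::move( data );
    // Push every segment that now starts exactly at unassemble_index.
    auto it = pending.begin();
    while ( it != pending.end() && it->first == unassemble_index ) {
      output += it->second;
      unassemble_index += it->second.size();
      it = pending.erase( it );
    }
  }
};
```

Inserting "de" at 3 leaves it pending; inserting "abc" at 0 then flushes both, so output becomes "abcde".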

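Key points:

(1) Because the ByteStream has limited capacity, only data between the end of the popped bytes and capacity_index is accepted:

 *  [unpopped_index , capacity_index)       the whole byte stream window

 *  [unpopped_index , unassemble_index)     range already stored in the ByteStream

 *  [unassemble_index , capacity_index)     remaining capacity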
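Accepting only data inside the window amounts to intersecting the substring's range with [unassemble_index, capacity_index). A sketch, assuming both bounds are already computed; clip_to_window is a hypothetical name and, unlike handle_data_overflow below, it does not look at available_capacity.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: trim the substring [first_index, first_index + data.size())
// to the window [unassemble_index, capacity_index).
// Returns the kept range, or std::nullopt (and clears data) if nothing is left.
std::optional<std::pair<uint64_t, uint64_t>> clip_to_window( uint64_t first_index,
                                                             std::string& data,
                                                             uint64_t unassemble_index,
                                                             uint64_t capacity_index )
{
  const uint64_t first = std::max( first_index, unassemble_index );
  const uint64_t end = std::min<uint64_t>( first_index + data.size(), capacity_index );
  if ( first >= end ) {
    data.clear();
    return std::nullopt;
  }
  data = data.substr( first - first_index, end - first );
  return std::make_pair( first, end );
}
```

With the window [2, 6), "abcdefgh" at index 0 is cut down to "cdef" covering [2, 6), while "xy" at 6 and "ab" at 0 are dropped entirely.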

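(2) Buffered data in the Reassembler is processed in three steps:

1. If the preceding buffered segment overlaps the current segment, merge them (if it already covers the current segment entirely, there is nothing new to store).

2. If the current segment covers segments stored separately in the buffer, remove all of them.

3. If the current segment overlaps the following buffered segment, merge that segment into the current one.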
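The three steps can be sketched with a std::map keyed by start index. This is a compact rewrite for illustration, not the author's merge_prev / delete_repeating / merge_next: store_segment is a hypothetical name, data is assumed to be trimmed to the window already, and the map is assumed never to hold overlapping segments (an invariant the function preserves).

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: insert [first, first + data.size()) into buf,
// merging with any overlapping segments so buf stays non-overlapping.
void store_segment( std::map<uint64_t, std::string>& buf, uint64_t first, std::string data )
{
  if ( data.empty() ) {
    return;
  }
  uint64_t end = first + data.size();

  // Step 1: absorb an overlapping preceding segment (or stop if it covers us).
  auto it = buf.upper_bound( first );
  if ( it != buf.begin() ) {
    auto prev = std::prev( it );
    const uint64_t p_end = prev->first + prev->second.size();
    if ( p_end >= end ) {
      return; // already fully stored
    }
    if ( p_end > first ) {
      data = prev->second + data.substr( p_end - first );
      first = prev->first;
      buf.erase( prev );
    }
  }

  // Steps 2 and 3: drop segments we fully cover; keep the tail of one we partly cover.
  it = buf.lower_bound( first );
  while ( it != buf.end() && it->first < end ) {
    const uint64_t n_end = it->first + it->second.size();
    if ( n_end > end ) {
      data += it->second.substr( end - it->first );
      end = n_end;
    }
    it = buf.erase( it );
  }
  buf[first] = std::move( data );
}
```

Starting from {2: "cd", 6: "gh"}, storing "bcdef" at 1 removes "cd" (step 2); storing "fghij" at 5 then merges with the preceding "bcdef" (step 1) and swallows "gh", leaving a single segment "bcdefghij" at 1.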

(3) Only the part of the data inside [unassemble_index , capacity_index) is kept and pushed; the rest is trimmed.

(4) Record eof = true once the end of the stream is known (i.e., is_last_substring is true and the whole substring ends at or before capacity_index).
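Because the ranges are half-open, a last substring that ends exactly at capacity_index still fits completely. The check should therefore use the end of the substring as received, before any trimming. should_record_eof is a hypothetical helper for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: may the reassembler remember that the stream ends here?
// Uses the end of the *untrimmed* substring: if any of its bytes lie at or
// beyond capacity_index they are discarded, so the stream cannot end yet.
bool should_record_eof( bool is_last_substring, uint64_t first_index, uint64_t size, uint64_t capacity_index )
{
  return is_last_substring && first_index + size <= capacity_index;
}
```

With capacity_index = 2, "ab" at 0 qualifies (it ends exactly at 2), while "bc" at 1 does not, because 'c' would be discarded.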

Full code:

byte_stream.cc:

#include "byte_stream.hh"

using namespace std;

ByteStream::ByteStream( uint64_t capacity ) : capacity_( capacity ) {}

bool Writer::is_closed() const
{
  return close_;
}

void Writer::push( string data )
{
  if ( is_closed() ) {
    return;
  }
  uint64_t available_space = available_capacity();
  uint64_t data_size = data.size();
  uint64_t push_size = std::min( available_space, data_size );
  bytes_pushed_ += push_size;
  buffer_ += data.substr(0, push_size );
  return;
}

void Writer::close()
{
  close_ = true;
  return;
}

uint64_t Writer::available_capacity() const
{
  return capacity_ - buffer_.size();
}

uint64_t Writer::bytes_pushed() const
{
  return bytes_pushed_;
}

bool Reader::is_finished() const
{
  if ( close_ && !buffer_.size() ) {
    return true;
  } else {
    return false;
  }
}

uint64_t Reader::bytes_popped() const
{
  return bytes_popped_;
}

std::string_view Reader::peek() const
{
  if ( 0 < buffer_.size() ) {
    // If the buffer holds unread bytes, return a view of all of them
    return std::string_view( &buffer_[0], buffer_.size());
  } else {
    // The buffer has been fully read: return an empty string_view
    return std::string_view();
  }
}

void Reader::pop( uint64_t len )
{
  if ( is_finished() ) {
    return;
  }
  // Never pop more than is buffered: otherwise substr/erase could throw
  // and bytes_popped_ would over-count.
  len = std::min( len, bytes_buffered() );
  buffer_.erase( 0, len );
  bytes_popped_ += len;
  position_ += len;
}

uint64_t Reader::bytes_buffered() const
{
  return buffer_.size();
}
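The capacity rules of push and pop (clamp what is pushed to the free space, never pop more than is buffered, finish only when closed and empty) can be exercised in isolation. MiniStream is a hypothetical single-class stand-in, not the lab's Reader/Writer interface.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for ByteStream, used only to illustrate the clamping rules.
class MiniStream
{
public:
  explicit MiniStream( uint64_t capacity ) : capacity_( capacity ) {}

  void push( std::string_view data )
  {
    if ( closed_ ) {
      return;
    }
    // Accept only as many bytes as there is free space for.
    const uint64_t n = std::min<uint64_t>( data.size(), available_capacity() );
    buffer_.append( data.substr( 0, n ) );
    pushed_ += n;
  }

  void pop( uint64_t len )
  {
    len = std::min<uint64_t>( len, buffer_.size() ); // never pop more than is buffered
    buffer_.erase( 0, len );
    popped_ += len;
  }

  void close() { closed_ = true; }

  uint64_t available_capacity() const { return capacity_ - buffer_.size(); }
  std::string_view peek() const { return buffer_; }
  bool is_finished() const { return closed_ && buffer_.empty(); }
  uint64_t bytes_pushed() const { return pushed_; }
  uint64_t bytes_popped() const { return popped_; }

private:
  uint64_t capacity_;
  std::string buffer_;
  uint64_t pushed_ = 0;
  uint64_t popped_ = 0;
  bool closed_ = false;
};
```

Pushing "hello" into a capacity-4 stream keeps only "hell"; a later pop(10) removes just those 4 bytes.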

byte_stream.hh:

#pragma once

#include <cstdint>
#include <string>
#include <string_view>

class Reader;
class Writer;

class ByteStream
{
public:
  explicit ByteStream( uint64_t capacity );

  // Helper functions (provided) to access the ByteStream's Reader and Writer interfaces
  Reader& reader();
  const Reader& reader() const;
  Writer& writer();
  const Writer& writer() const;

  void set_error() { error_ = true; };       // Signal that the stream suffered an error.
  bool has_error() const { return error_; }; // Has the stream had an error?
  uint64_t unpopped_index()const {return position_;};
  uint64_t capacity()const {return capacity_;};
protected:
  // Please add any additional state to the ByteStream here, and not to the Writer and Reader interfaces.
  uint64_t capacity_;
  std::string buffer_ = "";
  uint64_t position_ = 0;
  uint64_t bytes_popped_ = 0; // Initialize to zero
  uint64_t bytes_pushed_ = 0; // Initialize to zero
  bool close_ = false;        // Initialize to false
  bool error_ = false;        // Initialize to false
};

class Writer : public ByteStream
{
public:
  void push( std::string data ); // Push data to stream, but only as much as available capacity allows.
  void close();                  // Signal that the stream has reached its ending. Nothing more will be written.

  bool is_closed() const;              // Has the stream been closed?
  uint64_t available_capacity() const; // How many bytes can be pushed to the stream right now?
  uint64_t bytes_pushed() const;       // Total number of bytes cumulatively pushed to the stream
};

class Reader : public ByteStream
{
public:
  std::string_view peek() const; // Peek at the next bytes in the buffer
  void pop( uint64_t len );      // Remove `len` bytes from the buffer

  bool is_finished() const;        // Is the stream finished (closed and fully popped)?
  uint64_t bytes_buffered() const; // Number of bytes currently buffered (pushed and not popped)
  uint64_t bytes_popped() const;   // Total number of bytes cumulatively popped from stream
};

/*
 * read: A (provided) helper function thats peeks and pops up to `len` bytes
 * from a ByteStream Reader into a string;
 */
void read( Reader& reader, uint64_t len, std::string& out );

reassembler.hh:

#pragma once

#include "byte_stream.hh"
#include <map>
class Reassembler
{
public:
  // Construct Reassembler to write into given ByteStream.
  explicit Reassembler( ByteStream&& output ) : output_( std::move( output ) ), buffer_{} {}

  /*
   * Insert a new substring to be reassembled into a ByteStream.
   *   `first_index`: the index of the first byte of the substring
   *   `data`: the substring itself
   *   `is_last_substring`: this substring represents the end of the stream
   *   `output`: a mutable reference to the Writer
   *
   * The Reassembler's job is to reassemble the indexed substrings (possibly out-of-order
   * and possibly overlapping) back into the original ByteStream. As soon as the Reassembler
   * learns the next byte in the stream, it should write it to the output.
   *
   * If the Reassembler learns about bytes that fit within the stream's available capacity
   * but can't yet be written (because earlier bytes remain unknown), it should store them
   * internally until the gaps are filled in.
   *
   * The Reassembler should discard any bytes that lie beyond the stream's available capacity
   * (i.e., bytes that couldn't be written even if earlier gaps get filled in).
   *
   * The Reassembler should close the stream after writing the last byte.
   */
  bool merge_prev(uint64_t &first, uint64_t &end,std::string& data);
  bool merge_next(uint64_t &first, uint64_t &end,std::string& data);
  bool delete_repeating(uint64_t &first, uint64_t &end);

  bool merge_buffer_repeating(uint64_t &first, uint64_t &end,std::string& data);

  std::pair<uint64_t, uint64_t> handle_data_overflow( uint64_t first_index, std::string& data );
  void push_buffer_data();
  void insert( uint64_t first_index, std::string data, bool is_last_substring );

  // How many bytes are stored in the Reassembler itself?
  uint64_t bytes_pending() const;

  // Access output stream reader
  Reader& reader() { return output_.reader(); }
  const Reader& reader() const { return output_.reader(); }

  // Access output stream writer, but const-only (can't write from outside)
  const Writer& writer() const { return output_.writer(); }

private:
  ByteStream output_; // the Reassembler writes to this ByteStream
  std::map<uint64_t, std::string> buffer_;
  uint64_t unassemble_index = 0;
  bool eof = false;
};

reassembler.cc:

#include "reassembler.hh"

using namespace std;

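/**
 * Purpose: handle data that would overflow, so nothing is stored beyond the capacity limit.
 * Function: trim the input data to fit the remaining space.
 *
 * Writer buffer ranges:
 * [0 , unpopped_index)                  bytes already popped
 * [unpopped_index , unassemble_index)   bytes pushed but not yet popped
 * [unassemble_index , capacity_index)   remaining capacity
 *
 * @param first_index start index of the data.
 * @param data the data string; it may be trimmed in place to fit the storable range.
 * @return the adjusted range [first, end) of valid data, or {0, 0} if nothing can be stored.
 */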
pair<uint64_t, uint64_t> Reassembler::handle_data_overflow(uint64_t first_index, std::string& data) {
    // Current state of the storable window
    uint64_t unpopped_index = output_.writer().unpopped_index();             // index just past the last popped byte
    uint64_t capacity_index = unpopped_index + output_.writer().capacity(); // end of the window (exclusive)
    uint64_t available_capacity = output_.writer().available_capacity();    // free space left in the ByteStream

    // End index of the data (exclusive)
    uint64_t end_index = first_index + data.size();

    uint64_t end = end_index;
    uint64_t first = first_index;

    // Discard non-empty data if there is no free capacity, if it starts at or beyond
    // capacity_index, or if it has already been pushed entirely
    if (!data.empty() && (!available_capacity || first_index >= capacity_index || end_index <= unassemble_index)) {
        data.clear();
        return {0, 0}; // empty range
    }

    // Trim the tail that lies beyond capacity_index
    if (end > capacity_index) {
        end = capacity_index;
        data.erase(end - first_index);
    }
    // Trim the head that has already been pushed (end <= capacity_index always holds here)
    first = max(first_index, unassemble_index);
    data.erase(0, first - first_index);

    // Return the new range of valid data
    return {first, end};
}

/*
  * Purpose: merge the current data with the preceding segment in buffer_
*/
bool Reassembler::merge_prev(uint64_t &first, uint64_t &end,string& data){
  auto it = buffer_.lower_bound(first);
  if(it != buffer_.begin()){
    --it;
      string p_data = it->second;
      uint64_t p_end = it->first + p_data.size();
      // This data is already stored: return 1 so insert exits
      if (!p_data.empty() && p_end >= end) { 
          return 1;
      }
      // No overlap with the preceding segment: return 0 so insert continues
      if (!p_data.empty() && first >= p_end) { 
          return 0;
      }

      data.erase(0, p_end - first);
      p_data += std::move(data);
      first = it->first;
      data = std::move(p_data);
  }
  return 0;
}

/*
  * Purpose: merge the current data with the following segment in buffer_
*/
bool Reassembler::merge_next(uint64_t &first, uint64_t &end,string& data){
  auto next = buffer_.upper_bound(first);
  if (next != buffer_.end()) { // is there a following segment?
      uint64_t n_first = next->first;
      string n_data = next->second;

      if (!n_data.empty() && end >= n_first) {
          buffer_.erase(next); // erase via the iterator
          n_data.erase(0, end - n_first);
          data += n_data;
          end += n_data.size();
      }
  } 
  return 1;
}
/*
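  * Purpose: delete buffered segments fully covered by the current data (returns 1 if a segment starting at the same index already covers it)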
*/
bool Reassembler::delete_repeating(uint64_t &first, uint64_t &end){
    auto temp = buffer_.begin();
    while (temp != buffer_.end()) {
        uint64_t temp_end = temp->first + temp->second.size();
        if (temp->first < first) {
            ++temp;
            continue;
        }

        if (first == temp->first && end <= temp_end) {
            return 1;
        }

        if (end < temp_end) {
            return 0;
        }

        if (first <= temp->first && end >= temp_end) {
            temp = buffer_.erase(temp);
        } else {
            ++temp; 
        }
    }
    return false;
}

/*
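  * Purpose: merge the current data with overlapping buffered data, then store it (returns 1 if it was already stored)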
*/
bool Reassembler::merge_buffer_repeating(uint64_t &first, uint64_t &end,string& data){
  if(!buffer_.empty()){
      if(merge_prev(first,end,data)){
        return 1;
      }
      if(delete_repeating(first,end)){
        return 1;
      }
      merge_next(first,end,data);
  }
  if(!data.empty()){
      buffer_[first]=std::move(data);
  }
  return 0;
}
/*
  * Purpose: push the contiguous data at the front of buffer_ into the ByteStream
*/
void Reassembler::push_buffer_data(){
  auto it_ = buffer_.begin();
  while (!buffer_.empty() && it_->first == unassemble_index) {
      output_.writer().push(it_->second);
      unassemble_index += it_->second.size();
      buffer_.erase(it_++);
  }
}
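void Reassembler::insert( uint64_t first_index, string data, bool is_last_substring )
{
  uint64_t unpopped_index = output_.writer().unpopped_index();
  uint64_t capacity_index = unpopped_index + output_.writer().capacity();

  // Record eof before trimming, using the untrimmed end: the stream can only end here
  // if the whole last substring fits, i.e. it ends at or before capacity_index.
  if ( is_last_substring && first_index + data.size() <= capacity_index ) {
    eof = true;
  }

  auto [first, end] = handle_data_overflow( first_index, data );

  // merge_buffer_repeating returns 1 when the data is already stored
  if ( merge_buffer_repeating( first, end, data ) ) {
    return;
  }

  push_buffer_data();
  if ( eof && buffer_.empty() ) {
    output_.writer().close();
  }
}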

uint64_t Reassembler::bytes_pending() const
{
  uint64_t bytes=0;
  for(const auto& stream:buffer_){
    bytes+=stream.second.size();
  }
  return bytes;
}
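When debugging a segment-merging Reassembler, it can help to cross-check against a much simpler model. The sketch below swaps in a different technique: it stores pending bytes one per std::map entry, which is slow but easy to verify by eye. ByteLevelReassembler and its members are hypothetical, and a std::string stands in for the ByteStream. It records the end index of the stream rather than a boolean, so an empty last substring that arrives before the earlier bytes does not close the stream early.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical reference model: one map entry per pending byte.
class ByteLevelReassembler
{
public:
  explicit ByteLevelReassembler( uint64_t capacity ) : capacity_( capacity ) {}

  void insert( uint64_t first_index, const std::string& data, bool is_last_substring )
  {
    const uint64_t capacity_index = popped_ + capacity_; // window end (exclusive)

    // Remember where the stream ends, but only if the whole last substring fits.
    if ( is_last_substring && first_index + data.size() <= capacity_index ) {
      has_end_ = true;
      end_index_ = first_index + data.size();
    }

    // Keep bytes inside [next_, capacity_index); emplace ignores duplicates.
    for ( uint64_t i = 0; i < data.size(); ++i ) {
      const uint64_t idx = first_index + i;
      if ( idx >= next_ && idx < capacity_index ) {
        pending_.emplace( idx, data[i] );
      }
    }

    // Write out the contiguous run that starts at next_.
    auto it = pending_.begin();
    while ( it != pending_.end() && it->first == next_ ) {
      output_ += it->second;
      ++next_;
      it = pending_.erase( it );
    }

    // Close once every byte up to the recorded end has been written.
    if ( has_end_ && next_ == end_index_ ) {
      closed_ = true;
    }
  }

  // Simulates a reader consuming everything written so far.
  std::string read_all()
  {
    std::string out = std::move( output_ );
    output_.clear();
    popped_ += out.size();
    return out;
  }

  uint64_t bytes_pending() const { return pending_.size(); }
  bool closed() const { return closed_; }

private:
  uint64_t capacity_;
  uint64_t popped_ = 0;    // bytes consumed by the reader
  uint64_t next_ = 0;      // first index not yet written (unassemble_index)
  bool has_end_ = false;   // has the end of the stream been seen?
  uint64_t end_index_ = 0; // one past the last byte of the stream
  bool closed_ = false;
  std::map<uint64_t, char> pending_;
  std::string output_;
};
```

Feeding both implementations the same insert/read sequence and comparing the output, bytes_pending and closed state after each step is a cheap way to find divergences.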

Test results:

ubun22@DESKTOP-1VEP92H:/mnt/e/Web/minnow$ cmake --build build --target check1
Test project /mnt/e/Web/minnow/build
      Start  1: compile with bug-checkers
 1/17 Test  #1: compile with bug-checkers ........   Passed    8.86 sec
      Start  3: byte_stream_basics
 2/17 Test  #3: byte_stream_basics ...............   Passed    0.09 sec
      Start  4: byte_stream_capacity
 3/17 Test  #4: byte_stream_capacity .............   Passed    0.10 sec
      Start  5: byte_stream_one_write
 4/17 Test  #5: byte_stream_one_write ............   Passed    0.10 sec
      Start  6: byte_stream_two_writes
 5/17 Test  #6: byte_stream_two_writes ...........   Passed    0.10 sec
      Start  7: byte_stream_many_writes
 6/17 Test  #7: byte_stream_many_writes ..........   Passed    0.13 sec
      Start  8: byte_stream_stress_test
 7/17 Test  #8: byte_stream_stress_test ..........   Passed    0.11 sec
      Start  9: reassembler_single
 8/17 Test  #9: reassembler_single ...............   Passed    0.10 sec
      Start 10: reassembler_cap
 9/17 Test #10: reassembler_cap ..................   Passed    0.13 sec
      Start 11: reassembler_seq
10/17 Test #11: reassembler_seq ..................   Passed    0.14 sec
      Start 12: reassembler_dup
11/17 Test #12: reassembler_dup ..................   Passed    0.13 sec
      Start 13: reassembler_holes
12/17 Test #13: reassembler_holes ................   Passed    0.11 sec
      Start 14: reassembler_overlapping
13/17 Test #14: reassembler_overlapping ..........   Passed    0.16 sec
      Start 15: reassembler_win
14/17 Test #15: reassembler_win ..................   Passed    0.39 sec
      Start 37: compile with optimization
15/17 Test #37: compile with optimization ........   Passed    3.05 sec
      Start 38: byte_stream_speed_test
             ByteStream throughput: 1.09 Gbit/s
16/17 Test #38: byte_stream_speed_test ...........   Passed    0.19 sec
      Start 39: reassembler_speed_test
             Reassembler throughput: 11.94 Gbit/s
17/17 Test #39: reassembler_speed_test ...........   Passed    0.18 sec

100% tests passed, 0 tests failed out of 17

Total Test time (real) =  14.26 sec
Built target check1
ubun22@DESKTOP-1VEP92H:/mnt/e/Web/minnow$ 
