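Requirements analysis:
Goal:
Implement a reassembler that handles out-of-order and overlapping data.
Structure:
The reassembler accepts a sequence of substrings from the same byte stream and reassembles them into the original stream.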
Byte stream ranges:
* [0 , unpopped_index): bytes that have already been popped
* [unpopped_index , unassemble_index): bytes that have been pushed but not yet popped
* [unassemble_index , capacity_index): remaining capacity of the byte stream
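The three boundaries above can be derived from the stream's counters. The following is a minimal sketch, assuming that unassemble_index always equals the number of bytes pushed so far; `StreamWindow` and `make_window` are hypothetical names used only for illustration and are not part of the lab code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type holding the three boundaries described above.
struct StreamWindow
{
  uint64_t unpopped_index;   // first byte not yet popped by the reader
  uint64_t unassemble_index; // first byte not yet pushed to the stream
  uint64_t capacity_index;   // one past the last byte the stream can accept
};

// Derive the boundaries from the ByteStream counters (parameter names are assumptions).
inline StreamWindow make_window( uint64_t bytes_popped, uint64_t bytes_pushed, uint64_t capacity )
{
  assert( bytes_popped <= bytes_pushed );            // cannot pop more than was pushed
  assert( bytes_pushed - bytes_popped <= capacity ); // buffered bytes never exceed capacity
  return { bytes_popped, bytes_pushed, bytes_popped + capacity };
}
```

For example, with 3 bytes popped, 5 bytes pushed and a capacity of 10, the window is [3, 5) buffered and [5, 13) still free.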
Reassembler range:
All data stored in the reassembler lies within [unassemble_index , capacity_index).
Once a stored range is contiguous with unassemble_index, it is pushed to the stream.
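Key points:
(1) Because the ByteStream has limited capacity, only data between the end of the popped data and the capacity index is accepted
* [unpopped_index , capacity_index): the whole byte stream window
* [unpopped_index , unassemble_index): bytes already stored in the stream
* [unassemble_index , capacity_index): remaining capacity of the byte stream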
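Clipping an incoming substring to the acceptable range can be written as a single max/min computation. This is a simplified sketch rather than the handle_data_overflow function from the full code; `clip_to_window` is a hypothetical helper that takes the two boundaries as plain parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Clip `data`, which starts at first_index, to [unassemble_index, capacity_index).
// Returns the clipped half-open range and trims `data` in place.
// An empty result is reported as a range with first == second.
inline std::pair<uint64_t, uint64_t> clip_to_window( uint64_t first_index,
                                                     std::string& data,
                                                     uint64_t unassemble_index,
                                                     uint64_t capacity_index )
{
  assert( unassemble_index <= capacity_index );
  const uint64_t end_index = first_index + data.size();
  const uint64_t first = std::max( first_index, unassemble_index );
  const uint64_t end = std::min( end_index, capacity_index );
  if ( first >= end ) { // nothing falls inside the window
    data.clear();
    return { first, first };
  }
  data = data.substr( first - first_index, end - first );
  return { first, end };
}
```

Computing both bounds before touching the string avoids unsigned underflow when the substring lies entirely outside the window.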
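(2) Buffered data in the reassembler is handled in three steps:
1. If the previous stored range overlaps the current range, merge the two (if it already contains the current range, the current data is dropped)
2. If the current range contains ranges stored separately in the buffer, erase all of those contained ranges
3. If the current range overlaps the next stored range, merge the two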
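The three steps can be sketched on a std::map keyed by start index. This is a condensed illustration, not the code from the full listing: `SegmentMap` and `insert_segment` are hypothetical names, and steps 2 and 3 are folded into one loop.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

using SegmentMap = std::map<uint64_t, std::string>; // start index -> bytes

// Insert [first, first + data.size()) into `segments`, merging any overlaps.
inline void insert_segment( SegmentMap& segments, uint64_t first, std::string data )
{
  if ( data.empty() ) {
    return;
  }
  uint64_t end = first + data.size();

  // Step 1: look at the predecessor (the last segment starting before `first`).
  auto it = segments.lower_bound( first );
  if ( it != segments.begin() ) {
    auto prev = std::prev( it );
    const uint64_t prev_end = prev->first + prev->second.size();
    if ( prev_end >= end ) {
      return; // the new data is already stored
    }
    if ( prev_end > first ) { // partial overlap: keep prev and append the new tail
      data = prev->second + data.substr( prev_end - first );
      first = prev->first;
      it = segments.erase( prev );
    }
  }

  // Steps 2 and 3: erase covered segments, absorb the tail of one that sticks out.
  while ( it != segments.end() && it->first <= end ) {
    const uint64_t seg_end = it->first + it->second.size();
    if ( seg_end > end ) {
      data += it->second.substr( end - it->first );
      end = seg_end;
    }
    it = segments.erase( it );
  }

  assert( first + data.size() == end ); // the merged range and its bytes stay in sync
  segments.emplace( first, std::move( data ) );
}
```

Inserting "fgh" at 5, "abc" at 0 and then "cdefg" at 2 leaves a single segment "abcdefgh" at index 0, because the third insert bridges the two stored segments.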
(3) Pushed data must be clipped to [unassemble_index , capacity_index)
(4) Set eof to true once the end of the data is seen (that is, is_last_substring is true and the data range lies within capacity_index)
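One way to express point (4) is to remember the index at which the stream ends and to close only when assembly reaches it. Note that this is a variation on the boolean eof flag used in the full code, not the same mechanism; `EofTracker` and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical tracker for the end of the stream.
class EofTracker
{
public:
  // Record the stream end only if the last substring fits below capacity_index,
  // i.e. none of its bytes were discarded.
  void note_last_substring( uint64_t first_index, uint64_t size, uint64_t capacity_index )
  {
    const uint64_t end_index = first_index + size;
    if ( end_index <= capacity_index ) {
      eof_index_ = end_index;
    }
  }

  // True once every byte up to the recorded end has been assembled.
  bool should_close( uint64_t unassemble_index ) const
  {
    // In a well-formed stream, assembly never passes the recorded end.
    assert( !eof_index_.has_value() || unassemble_index <= *eof_index_ );
    return eof_index_.has_value() && unassemble_index == *eof_index_;
  }

private:
  std::optional<uint64_t> eof_index_ {};
};
```

A last substring that is truncated by the capacity limit leaves the tracker unset, so the stream stays open until that substring is resent and fits.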
Complete code:
byte_stream.cc :
#include "byte_stream.hh"
#include <algorithm>
using namespace std;
ByteStream::ByteStream( uint64_t capacity ) : capacity_( capacity ) {}
bool Writer::is_closed() const
{
return close_;
}
void Writer::push( string data )
{
if ( is_closed() ) {
return;
}
uint64_t available_space = available_capacity();
uint64_t data_size = data.size();
uint64_t push_size = std::min( available_space, data_size );
bytes_pushed_ += push_size;
buffer_ += data.substr(0, push_size );
return;
}
void Writer::close()
{
close_ = true;
return;
}
uint64_t Writer::available_capacity() const
{
return capacity_ - buffer_.size();
}
uint64_t Writer::bytes_pushed() const
{
return bytes_pushed_;
}
bool Reader::is_finished() const
{
if ( close_ && !buffer_.size() ) {
return true;
} else {
return false;
}
}
uint64_t Reader::bytes_popped() const
{
return bytes_popped_;
}
std::string_view Reader::peek() const
{
if ( 0 < buffer_.size() ) {
// Unread bytes remain: return a view of everything currently buffered
return std::string_view( &buffer_[0], buffer_.size());
} else {
// Nothing is buffered: return an empty string_view
return std::string_view();
}
}
void Reader::pop( uint64_t len )
{
if ( is_finished() ) {
return;
}
// Never pop more than is buffered; substr() past the end would throw std::out_of_range
len = std::min( len, bytes_buffered() );
buffer_.erase( 0, len );
bytes_popped_ += len;
position_ += len;
}
uint64_t Reader::bytes_buffered() const
{
return buffer_.size();
}
byte_stream.hh:
#pragma once
#include <cstdint>
#include <string>
#include <string_view>
class Reader;
class Writer;
class ByteStream
{
public:
explicit ByteStream( uint64_t capacity );
// Helper functions (provided) to access the ByteStream's Reader and Writer interfaces
Reader& reader();
const Reader& reader() const;
Writer& writer();
const Writer& writer() const;
void set_error() { error_ = true; }; // Signal that the stream suffered an error.
bool has_error() const { return error_; }; // Has the stream had an error?
uint64_t unpopped_index() const { return position_; } // Index of the first byte not yet popped
uint64_t capacity() const { return capacity_; } // Total capacity of the stream
protected:
// Please add any additional state to the ByteStream here, and not to the Writer and Reader interfaces.
uint64_t capacity_;
std::string buffer_ = "";
uint64_t position_ = 0;
uint64_t bytes_popped_ = 0; // Initialize to zero
uint64_t bytes_pushed_ = 0; // Initialize to zero
bool close_ = false; // Initialize to false
bool error_ = false; // Initialize to false
};
class Writer : public ByteStream
{
public:
void push( std::string data ); // Push data to stream, but only as much as available capacity allows.
void close(); // Signal that the stream has reached its ending. Nothing more will be written.
bool is_closed() const; // Has the stream been closed?
uint64_t available_capacity() const; // How many bytes can be pushed to the stream right now?
uint64_t bytes_pushed() const; // Total number of bytes cumulatively pushed to the stream
};
class Reader : public ByteStream
{
public:
std::string_view peek() const; // Peek at the next bytes in the buffer
void pop( uint64_t len ); // Remove `len` bytes from the buffer
bool is_finished() const; // Is the stream finished (closed and fully popped)?
uint64_t bytes_buffered() const; // Number of bytes currently buffered (pushed and not popped)
uint64_t bytes_popped() const; // Total number of bytes cumulatively popped from stream
};
/*
* read: A (provided) helper function that peeks and pops up to `len` bytes
* from a ByteStream Reader into a string;
*/
void read( Reader& reader, uint64_t len, std::string& out );
reassembler.hh:
#pragma once
#include "byte_stream.hh"
#include <map>
class Reassembler
{
public:
// Construct Reassembler to write into given ByteStream.
explicit Reassembler( ByteStream&& output ) : output_( std::move( output ) ), buffer_{} {}
/*
* Insert a new substring to be reassembled into a ByteStream.
* `first_index`: the index of the first byte of the substring
* `data`: the substring itself
* `is_last_substring`: this substring represents the end of the stream
* `output`: a mutable reference to the Writer
*
* The Reassembler's job is to reassemble the indexed substrings (possibly out-of-order
* and possibly overlapping) back into the original ByteStream. As soon as the Reassembler
* learns the next byte in the stream, it should write it to the output.
*
* If the Reassembler learns about bytes that fit within the stream's available capacity
* but can't yet be written (because earlier bytes remain unknown), it should store them
* internally until the gaps are filled in.
*
* The Reassembler should discard any bytes that lie beyond the stream's available capacity
* (i.e., bytes that couldn't be written even if earlier gaps get filled in).
*
* The Reassembler should close the stream after writing the last byte.
*/
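// Helper steps used by insert()
bool merge_prev(uint64_t &first, uint64_t &end,std::string& data); // merge with the preceding buffered segment
bool merge_next(uint64_t &first, uint64_t &end,std::string& data); // merge with the following buffered segment
bool delete_repeating(uint64_t &first, uint64_t &end); // erase buffered segments covered by [first, end)
bool merge_buffer_repeating(uint64_t &first, uint64_t &end,std::string& data); // run the three steps and store the result
std::pair<uint64_t, uint64_t> handle_data_overflow( uint64_t first_index, std::string& data ); // clip data to the storable window
void push_buffer_data(); // push contiguous buffered data to the output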
void insert( uint64_t first_index, std::string data, bool is_last_substring );
// How many bytes are stored in the Reassembler itself?
uint64_t bytes_pending() const;
// Access output stream reader
Reader& reader() { return output_.reader(); }
const Reader& reader() const { return output_.reader(); }
// Access output stream writer, but const-only (can't write from outside)
const Writer& writer() const { return output_.writer(); }
private:
ByteStream output_; // the Reassembler writes to this ByteStream
std::map<uint64_t, std::string> buffer_;
uint64_t unassemble_index = 0;
bool eof = false;
};
reassembler.cc:
#include "reassembler.hh"
#include <algorithm>
#include <utility>
using namespace std;
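/**
* Purpose: handle data overflow so that data never exceeds the capacity limit of the buffer.
* Function: trim the input data to fit the remaining space.
*
* Writer buffer space:
* [0 , unpopped_index): bytes that have already been popped
* [unpopped_index , unassemble_index): bytes that have been pushed but not yet popped
* [unassemble_index , capacity_index): remaining capacity of the byte stream
*
* @param first_index Start index of the data.
* @param data The data to process; it may be trimmed to fit the storable window.
* @return The adjusted range, i.e. the start and end index of the data that remains.
*
*/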
pair<uint64_t, uint64_t> Reassembler::handle_data_overflow(uint64_t first_index, std::string& data) {
// Current state of the storable window
uint64_t unpopped_index = output_.writer().unpopped_index(); // index of the first byte not yet popped
uint64_t capacity_index = unpopped_index + output_.writer().capacity(); // one past the last storable index
uint64_t available_capacity = output_.writer().available_capacity(); // free space left in the stream
// End index of the incoming data (exclusive)
uint64_t end_index = first_index + data.size();
uint64_t end = end_index;
uint64_t first = first_index;
// Empty data has nothing to trim; returning early also avoids unsigned underflow below
if (data.empty()) {
return {first_index, first_index};
}
// Discard the data when no capacity is left, when it starts at or beyond capacity_index,
// or when it ends at or before unassemble_index (it has already been pushed)
if (!available_capacity || first_index >= capacity_index || end_index <= unassemble_index) {
data.clear(); // nothing can be stored
return {0, 0}; // invalid (empty) range
}
// Trim the tail that lies beyond the storable window
if (end > capacity_index){
end = capacity_index;
data.erase(end - first_index);
}
// Trim the head that has already been pushed
first = max(first_index, unassemble_index);
data.erase(0, first - first_index);
// Return the index range of the data that remains
return {first, end};
}
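/*
* Purpose: merge the current data with the previous segment in the buffer
*/
bool Reassembler::merge_prev(uint64_t &first, uint64_t &end,string& data){
auto it = buffer_.lower_bound(first);
if(it != buffer_.begin()){
--it;
string p_data = it->second;
uint64_t p_end = it->first + p_data.size();
// The data is already stored; return true to signal that nothing new needs to be stored
if (!p_data.empty() && p_end >= end) {
return true;
}
// No overlap with the previous segment; return false so that processing continues
if (!p_data.empty() && first >= p_end) {
return false;
}
// Partial overlap: drop the overlapping prefix and append the rest to the previous segment
data.erase(0, p_end - first);
p_data += std::move(data);
first = it->first;
data = std::move(p_data);
}
return false;
}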
/*
* Purpose: merge the current data with the next segment in the buffer
*/
bool Reassembler::merge_next(uint64_t &first, uint64_t &end,string& data){
auto next = buffer_.upper_bound(first);
if (next != buffer_.end()) { // a following segment exists
uint64_t n_first = next->first;
string n_data = next->second;
if (!n_data.empty() && end >= n_first) {
buffer_.erase(next); // erase through the iterator
n_data.erase(0, end - n_first); // drop the part that overlaps the current data
data += n_data;
end += n_data.size();
}
}
return 1;
}
/*
* Purpose: erase buffered segments that the current data fully covers
*/
bool Reassembler::delete_repeating(uint64_t &first, uint64_t &end){
auto temp = buffer_.begin();
while (temp != buffer_.end()) {
uint64_t temp_end = temp->first + temp->second.size();
if (temp->first < first) {
++temp;
continue;
}
if (first == temp->first && end <= temp_end) {
return 1;
}
if (end < temp_end) {
return 0;
}
if (first <= temp->first && end >= temp_end) {
temp = buffer_.erase(temp);
} else {
++temp;
}
}
return false;
}
/*
* Purpose: merge the current data with overlapping data in the buffer
*/
bool Reassembler::merge_buffer_repeating(uint64_t &first, uint64_t &end,string& data){
if(!buffer_.empty()){
if(merge_prev(first,end,data)){
return 1;
}
if(delete_repeating(first,end)){
return 1;
}
merge_next(first,end,data);
}
if(!data.empty()){
buffer_[first]=std::move(data);
}
return 0;
}
/*
* Purpose: push contiguous data from the buffer to the output stream
*/
void Reassembler::push_buffer_data(){
auto it_ = buffer_.begin();
while (!buffer_.empty() && it_->first == unassemble_index) {
output_.writer().push(it_->second);
unassemble_index += it_->second.size();
buffer_.erase(it_++);
}
}
void Reassembler::insert( uint64_t first_index, string data, bool is_last_substring )
{
uint64_t unpopped_index = output_.writer().unpopped_index();
uint64_t capacity_index = unpopped_index+output_.writer().capacity();
// End of the substring before clipping; the stream end only counts if it fits in the window
const uint64_t original_end = first_index + data.size();
auto [first,end]=handle_data_overflow(first_index,data);
// Flush only when new data was stored; a duplicate still has to record eof below
if(!merge_buffer_repeating(first,end,data)){
push_buffer_data();
}
if(is_last_substring && original_end <= capacity_index){
eof = true;
}
if(eof && buffer_.empty()){
output_.writer().close();
}
}
uint64_t Reassembler::bytes_pending() const
{
uint64_t bytes=0;
for(const auto& stream:buffer_){
bytes+=stream.second.size();
}
return bytes;
}
Test run:
ubun22@DESKTOP-1VEP92H:/mnt/e/Web/minnow$ cmake --build build --target check1
Test project /mnt/e/Web/minnow/build
Start 1: compile with bug-checkers
1/17 Test #1: compile with bug-checkers ........ Passed 8.86 sec
Start 3: byte_stream_basics
2/17 Test #3: byte_stream_basics ............... Passed 0.09 sec
Start 4: byte_stream_capacity
3/17 Test #4: byte_stream_capacity ............. Passed 0.10 sec
Start 5: byte_stream_one_write
4/17 Test #5: byte_stream_one_write ............ Passed 0.10 sec
Start 6: byte_stream_two_writes
5/17 Test #6: byte_stream_two_writes ........... Passed 0.10 sec
Start 7: byte_stream_many_writes
6/17 Test #7: byte_stream_many_writes .......... Passed 0.13 sec
Start 8: byte_stream_stress_test
7/17 Test #8: byte_stream_stress_test .......... Passed 0.11 sec
Start 9: reassembler_single
8/17 Test #9: reassembler_single ............... Passed 0.10 sec
Start 10: reassembler_cap
9/17 Test #10: reassembler_cap .................. Passed 0.13 sec
Start 11: reassembler_seq
10/17 Test #11: reassembler_seq .................. Passed 0.14 sec
Start 12: reassembler_dup
11/17 Test #12: reassembler_dup .................. Passed 0.13 sec
Start 13: reassembler_holes
12/17 Test #13: reassembler_holes ................ Passed 0.11 sec
Start 14: reassembler_overlapping
13/17 Test #14: reassembler_overlapping .......... Passed 0.16 sec
Start 15: reassembler_win
14/17 Test #15: reassembler_win .................. Passed 0.39 sec
Start 37: compile with optimization
15/17 Test #37: compile with optimization ........ Passed 3.05 sec
Start 38: byte_stream_speed_test
ByteStream throughput: 1.09 Gbit/s
16/17 Test #38: byte_stream_speed_test ........... Passed 0.19 sec
Start 39: reassembler_speed_test
Reassembler throughput: 11.94 Gbit/s
17/17 Test #39: reassembler_speed_test ........... Passed 0.18 sec
100% tests passed, 0 tests failed out of 17
Total Test time (real) = 14.26 sec
Built target check1
ubun22@DESKTOP-1VEP92H:/mnt/e/Web/minnow$