Lab1: Stitching substrings into a byte stream
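0 Preface
- In lab0, we used the socket interface, backed by the TCP implementation built into Linux, to fetch a web page and send an email.
- We also implemented ByteStream, a socket-like abstraction of a reliable byte stream.
- Over the next four weeks, you will implement TCP yourself to provide a reliable byte stream between hosts.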
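1 Overview
1.1 Sending (send)
- The application writes data into the ByteStream
- TCPSender sends the data out at the appropriate time
  - when an ack arrives, it updates the size of the send window
  - it adds the seqno, SYN, payload, FIN, port numbers, etc. to build a TCP segment
- The segment is wrapped in an IP header and handed to the network layer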
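1.2 Receiving (recv)
- A datagram arrives from the network layer
- TCPSender and TCPReceiver each receive the parts of the segment they need (TCPSender at the appropriate time)
- TCPReceiver passes the received data to the StreamReassembler, which reassembles it into a contiguous byte stream
- The reassembled bytes are written into a ByteStream
- Finally, the application layer reads the data from the ByteStream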
1.3 Roadmap
- In lab1 (this lab), we implement the StreamReassembler
- In lab2, we implement the TCPReceiver
- In lab3, we implement the TCPSender
- In lab4, we implement the TCPConnection
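2 Stitching substrings into a byte stream
2.1 Overview
- In this lab and the next, we implement the TCP receiver: the module that receives TCP segments and turns them back into a reliable byte stream.
- The TCP sender splits its byte stream into short segments (substrings of no more than about 1,460 bytes each) so that each one fits inside a datagram. The network may lose, duplicate, or reorder these datagrams, so the TCP receiver must reassemble the segments into the original contiguous byte stream.
- In this lab you will implement the StreamReassembler. It receives substrings together with the index of each substring's first byte in the overall stream (indices start at 0). As soon as it knows the next byte of the stream, it writes that byte into a ByteStream, which its owner can read from whenever it wants.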
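To make the interface concrete, here is a minimal sketch of the behaviour described above. It is not the lab's StreamReassembler: `ToyReassembler` is a hypothetical class that ignores the capacity limit and appends to a `std::string` instead of a ByteStream. It buffers out-of-order bytes in a `std::map` keyed by stream index, stores each index once, and flushes the contiguous prefix as soon as the gap before it is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical toy reassembler for illustration only: no capacity limit, and the
// reassembled bytes go into a std::string instead of a ByteStream.
class ToyReassembler {
  public:
    // `data` starts at stream index `index`; `eof` marks its last byte as the end of the stream.
    void push_substring(const std::string &data, uint64_t index, bool eof) {
        if (eof) {
            // Assumes all substrings agree on where the stream ends.
            assert(!_eof_seen || index + data.size() == _eof_index);
            _eof_seen = true;
            _eof_index = index + data.size();
        }
        for (std::size_t i = 0; i < data.size(); i++) {
            const uint64_t pos = index + i;
            if (pos >= _next) {
                _pending.emplace(pos, data[i]);  // no-op if this index is already buffered
            }
        }
        // Flush the contiguous run that now starts at the first unassembled index.
        auto it = _pending.begin();
        while (it != _pending.end() && it->first == _next) {
            _output.push_back(it->second);
            ++_next;
            it = _pending.erase(it);
        }
    }

    const std::string &output() const { return _output; }
    std::size_t unassembled_bytes() const { return _pending.size(); }
    bool input_ended() const { return _eof_seen && _next == _eof_index; }

  private:
    std::map<uint64_t, char> _pending{};  // stream index -> byte, waiting for an earlier gap
    std::string _output{};                // bytes already reassembled, in order
    uint64_t _next = 0;                   // first unassembled index
    uint64_t _eof_index = 0;              // one past the last byte, once known
    bool _eof_seen = false;
};
```

For example, pushing `"cde"` at index 2, `"b"` at 1 and `"abc"` at 0 leaves `"abcde"` in the output (the overlapping `"b"` and `"c"` are stored only once), and a final `"fg"` at index 5 with eof set makes `input_ended()` true.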
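2.2 FAQ
- What is the capacity? It bounds the sliding window of bytes the reassembler may hold:
  - start: the first byte that has not been read yet
  - end: start + capacity
- What is the index of the first byte of the whole stream? 0.
- What is the time limit? Each test case should pass within 0.5 s.
- How should inconsistent substrings be handled? You may assume they do not occur: every substring is an accurate slice of the same underlying stream, so when substrings overlap, the duplicate copy of the overlapping bytes can simply be discarded.
- What may you use? Anything in the standard library.
- When should bytes be written to the stream? As soon as possible: a byte may be missing from the stream only if some earlier byte has not been pushed yet, so the first unassembled index must always advance to the first byte that has not been pushed.
- Can substrings overlap? Yes.
- Can I add private members? Yes; substrings arrive out of order, so they must be buffered until the gap before them is filled.
- May the reassembler store overlapping bytes? No. Each index should be stored at most once; otherwise capacity no longer bounds memory use.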
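3 Implementation
3.1 Key points
- The whole byte stream is split into four regions by the following indices (the reassembler accepts data written by the lower layer and supplies it to the upper layer):
  - stream start: the beginning of the byte stream (index 0)
  - first unread: stream start + ByteStream's bytes_read(); everything before it has been read by the upper layer
  - first unassembled: stream start + ByteStream's bytes_written(); everything before it has been written into the ByteStream by the reassembler
  - first unacceptable: first unread + capacity; bytes at or beyond it lie outside the window and cannot be accepted
- The sliding window [first unread, first unacceptable) has two parts:
  - [first unread, first unassembled): bytes already reassembled and held in the ByteStream, waiting for the upper layer to read them
  - [first unassembled, first unacceptable): bytes that cannot be reassembled yet because an earlier substring is missing
- When to end the input: once both of the following hold
  - the end of the byte stream has been seen (a substring with eof set has arrived)
  - every byte up to that end has been written into the ByteStream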
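The four boundaries can be computed directly from the two ByteStream counters. The sketch below is a hypothetical helper (the names `Window`, `make_window`, `Range` and `usable_part` are illustrative, not part of the lab skeleton); it assumes the stream starts at index 0 and shows which part of an incoming substring is worth storing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: the boundaries from section 3.1, with the stream starting at index 0.
struct Window {
    uint64_t first_unread;        // stream start + bytes_read()
    uint64_t first_unassembled;   // stream start + bytes_written()
    uint64_t first_unacceptable;  // first_unread + capacity
};

Window make_window(uint64_t bytes_read, uint64_t bytes_written, uint64_t capacity) {
    assert(bytes_read <= bytes_written);  // the application cannot read more than was written
    return Window{bytes_read, bytes_written, bytes_read + capacity};
}

// Half-open range [begin, end) of stream indices; empty when begin >= end.
struct Range {
    uint64_t begin;
    uint64_t end;
    bool empty() const { return begin >= end; }
};

// Trim a substring of `len` bytes starting at `index`: bytes before first_unassembled are
// already in the ByteStream, and bytes at or after first_unacceptable exceed the capacity.
Range usable_part(const Window &w, uint64_t index, uint64_t len) {
    return Range{std::max(index, w.first_unassembled), std::min(index + len, w.first_unacceptable)};
}
```

With 2 bytes read, 5 written and capacity 8, the window is [2, 10): a substring covering [3, 7) keeps only [5, 7), and one covering [8, 14) keeps only [8, 10).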
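3.2.1 Implementation (unordered_map-based, slow)
- stream_reassembler.cc
```cpp
#include "stream_reassembler.hh"

#include <algorithm>

using namespace std;

StreamReassembler::StreamReassembler(const size_t capacity) : _output(capacity), _capacity(capacity) {}

void StreamReassembler::push_substring(const string &data, const size_t index, const bool eof) {
    if (eof) {
        // Remember where the stream ends; its last bytes may still lie outside the window.
        _eof = true;
        _eof_index = index + data.size();
    }
    // Only bytes in [first unassembled, first unacceptable) are stored,
    // where first unacceptable = first unread + capacity.
    const size_t window_end = _output.bytes_read() + _capacity;
    const size_t first_index = max(index, _first_index);
    const size_t last_index = min(index + data.size(), window_end);  // one past the last byte kept
    for (size_t i = first_index; i < last_index; i++) {
        if (!mp.count(i)) {  // a byte pushed more than once is stored (and counted) once
            mp[i] = data[i - index];
            _unassembled_bytes++;
        }
    }
    // Work out how many bytes can be written: collect the contiguous run starting at _first_index.
    string str;
    while (mp.count(_first_index) && str.size() < _output.remaining_capacity()) {
        str.push_back(mp[_first_index]);
        mp.erase(_first_index);  // free the slot once the byte leaves the reassembler
        _first_index++;
        _unassembled_bytes--;
    }
    _output.write(str);
    // The end of the stream has been seen and every byte before it is reassembled: end the input.
    if (_eof && _first_index == _eof_index)
        _output.end_input();
}

size_t StreamReassembler::unassembled_bytes() const { return _unassembled_bytes; }

bool StreamReassembler::empty() const { return _unassembled_bytes == 0; }
```
- stream_reassembler.hh
```cpp
#include "byte_stream.hh"

#include <cstdint>
#include <string>
#include <unordered_map>

class StreamReassembler {
  private:
    // Your code here -- add private members as necessary.
    ByteStream _output;  //!< The reassembled in-order byte stream
    size_t _capacity;    //!< The maximum number of bytes
    bool _eof{};                    //!< Whether the substring carrying the end of the stream has arrived
    size_t _eof_index = 0;          //!< Index one past the last byte of the stream
    size_t _first_index = 0;        //!< First unassembled index
    size_t _unassembled_bytes = 0;  //!< Number of bytes buffered in mp
    std::unordered_map<size_t, char> mp{};  //!< Stream index -> byte, for bytes not yet written

  public:
    // ... public interface from the lab skeleton, unchanged ...
};
```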
3.2.2 Implementation (set of intervals, fast)
- This version is based on the LeetCode problem https://leetcode.cn/problems/count-integers-in-intervals/
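The idea taken from that problem is to keep disjoint intervals in an ordered container and, when a new interval arrives, merge it with every stored interval it overlaps or touches. The sketch below shows just that merging step on integer indices; `IntervalSet` is a hypothetical name, and it uses a `std::map` from left end to right end instead of a set of nodes that also carry the bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: disjoint, non-adjacent closed intervals [l, r] stored as l -> r.
class IntervalSet {
  public:
    void add(uint64_t l, uint64_t r) {
        assert(l <= r);
        // First stored interval whose left end is > l; its predecessor may still reach l.
        auto it = _iv.upper_bound(l);
        if (it != _iv.begin()) {
            auto before = std::prev(it);
            if (before->second + 1 >= l)
                it = before;  // start merging from the interval on the left
        }
        // Absorb every stored interval that overlaps or is adjacent to [l, r].
        while (it != _iv.end() && it->first <= r + 1) {
            l = std::min(l, it->first);
            r = std::max(r, it->second);
            _count -= it->second - it->first + 1;
            it = _iv.erase(it);
        }
        _iv.emplace(l, r);
        _count += r - l + 1;
    }
    uint64_t count() const { return _count; }         // integers covered, each counted once
    std::size_t pieces() const { return _iv.size(); }  // number of disjoint intervals

  private:
    std::map<uint64_t, uint64_t> _iv{};
    uint64_t _count = 0;
};
```

Because absorbed intervals are removed before the combined one is inserted, each index is counted once, which is the same property `unassembled_bytes()` needs.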
- stream_reassembler.cc
```cpp
#include "stream_reassembler.hh"

#include <algorithm>

using namespace std;

StreamReassembler::StreamReassembler(const size_t capacity) : _output(capacity), _capacity(capacity) {}

//! \details This function accepts a substring (aka a segment) of bytes,
//! possibly out-of-order, from the logical stream, and assembles any newly
//! contiguous substrings and writes them into the output stream in order.
void StreamReassembler::push_substring(const string &data, const size_t index, const bool eof) {
    if (eof) {
        _eof = true;
        _eof_index = index + data.size();  // one past the last byte of the whole stream
    }
    // Keep only the bytes in [first unassembled, first unacceptable),
    // where first unacceptable = first unread + capacity.
    const size_t window_end = _output.bytes_read() + _capacity;
    size_t l = max(index, _first_index);
    const size_t end = min(index + data.size(), window_end);
    if (l < end) {
        size_t r = end - 1;  // the new bytes cover the closed interval [l, r]
        const string s = data.substr(l - index, end - l);
        string l_string, r_string;  // parts of merged intervals sticking out on the left / right
        // First stored interval whose left end is >= l.
        auto it = _interval.lower_bound(node{l, 0, ""});
        auto merge_begin = it;
        // The interval just before it may overlap or touch [l, r] from the left.
        if (it != _interval.begin()) {
            auto before = it;
            --before;
            if (before->r + 1 >= l) {
                l_string = before->data.substr(0, l - before->l);
                if (r < before->r)
                    r_string = before->data.substr(r + 1 - before->l);
                _unassembled_bytes -= before->r - before->l + 1;
                l = before->l;
                r = max(r, before->r);
                merge_begin = before;
            }
        }
        // Absorb every following interval that overlaps or touches [l, r].
        while (it != _interval.end() && r + 1 >= it->l) {
            if (r < it->r)
                r_string = it->data.substr(r + 1 - it->l);
            _unassembled_bytes -= it->r - it->l + 1;
            r = max(r, it->r);
            ++it;
        }
        _interval.erase(merge_begin, it);
        _interval.insert(node{l, r, l_string + s + r_string});
        _unassembled_bytes += r - l + 1;
        // If the first buffered interval starts at _first_index, write it out. The window bound
        // above guarantees it fits in the ByteStream's remaining capacity, so it is written whole.
        auto first = _interval.begin();
        if (first->l == _first_index) {
            const size_t written = _output.write(first->data);
            _unassembled_bytes -= written;
            _first_index += written;
            _interval.erase(first);
        }
    }
    // The end of the stream has been seen and every byte before it is reassembled: end the input.
    if (_eof && _first_index == _eof_index)
        _output.end_input();
}

size_t StreamReassembler::unassembled_bytes() const { return _unassembled_bytes; }

bool StreamReassembler::empty() const { return _unassembled_bytes == 0; }
```
- stream_reassembler.hh
```cpp
#include "byte_stream.hh"

#include <cstdint>
#include <set>
#include <string>

class StreamReassembler {
  private:
    // Your code here -- add private members as necessary.
    ByteStream _output;  //!< The reassembled in-order byte stream
    size_t _capacity;    //!< The maximum number of bytes
    bool _eof{};                    //!< Whether the substring carrying the end of the stream has arrived
    size_t _eof_index = 0;          //!< Index one past the last byte of the stream
    size_t _first_index = 0;        //!< First unassembled index (== bytes written to _output)
    size_t _unassembled_bytes = 0;  //!< Number of bytes buffered in _interval
    //! A buffered run of bytes covering stream indices [l, r] (closed interval).
    struct node {
        size_t l = 0;
        size_t r = 0;
        std::string data = "";
        bool operator<(const node &t) const {
            if (l != t.l)
                return l < t.l;
            return r < t.r;
        }
    };
    //! Disjoint, non-adjacent intervals, ordered by their left end.
    std::set<node> _interval{};

  public:
    //! \brief Construct a `StreamReassembler` that will store up to `capacity` bytes.
    //! \note This capacity limits both the bytes that have been reassembled,
    //! and those that have not yet been reassembled.
    StreamReassembler(const size_t capacity);

    //! \brief Receive a substring and write any newly contiguous bytes into the stream.
    //!
    //! The StreamReassembler will stay within the memory limits of the `capacity`.
    //! Bytes that would exceed the capacity are silently discarded.
    //!
    //! \param data the substring
    //! \param index indicates the index (place in sequence) of the first byte in `data`
    //! \param eof the last byte of `data` will be the last byte in the entire stream
    void push_substring(const std::string &data, const uint64_t index, const bool eof);

    //! \name Access the reassembled byte stream
    //!@{
    const ByteStream &stream_out() const { return _output; }
    ByteStream &stream_out() { return _output; }
    //!@}

    //! The number of bytes in the substrings stored but not yet reassembled
    //!
    //! \note If the byte at a particular index has been pushed more than once, it
    //! should only be counted once for the purpose of this function.
    size_t unassembled_bytes() const;

    //! \brief Is the internal state empty (other than the output stream)?
    //! \returns `true` if no substrings are waiting to be assembled
    bool empty() const;

    size_t get_first_bytes() const;
};
```