IOBuf, the non-contiguous buffer, and related structures
Block
Definition:
struct IOBuf::Block {
butil::atomic<int> nshared;
uint16_t size;
uint16_t cap;
Block* portal_next;
char data[0];
explicit Block(size_t block_size)
: nshared(1), size(0), cap(block_size - offsetof(Block, data))
, portal_next(NULL) {
assert(block_size <= MAX_BLOCK_SIZE);
iobuf::g_nblock.fetch_add(1, butil::memory_order_relaxed);
iobuf::g_blockmem.fetch_add(block_size, butil::memory_order_relaxed);
}
// Reference-counting and free-space helpers
void inc_ref();
void dec_ref();
int ref_count() const;
bool full() const;
size_t left_space();
};
A Block is simply a chunk of memory, 8KB by default. `size` records how many bytes are in use, `cap` is the capacity of the chunk, the payload lives in `data`, and `portal_next` points to the next block when blocks are chained into a list.
Blocks are reclaimed by reference counting. The count starts at 1; taking a reference increments it, releasing one decrements it, and when the count drops to 0 the block is destroyed.
Writes to a block are always appends; previously written content is never modified.
BlockRef
Definition:
struct BlockRef {
// NOTICE: first bit of `offset' is shared with BigView::start
uint32_t offset;
uint32_t length;
Block* block;
};
A BlockRef is a reference to a region inside a Block: it holds a Block pointer, an offset, and a length. The IOBuf described below actually stores a list of BlockRefs, which is what makes it a non-contiguous, zero-copy buffer.
IOBuf
brpc uses butil::IOBuf as the data structure for attachments in several protocols and for the http body. It is a non-contiguous zero-copy buffer; in essence, a sequence of references to memory blocks.
Member variables
union {
BigView _bv;
SmallView _sv;
};
where BigView and SmallView are defined as:
struct BigView {
int32_t magic; // distinguishes _bv from _sv: a BigView sets magic to -1 (negative); in a SmallView these bytes overlap refs[0].offset, which is >= 0
uint32_t start; // index of the first ref in use, advanced by pop_front
BlockRef* refs; // the BlockRef array, i.e. where the data references live
uint32_t nref; // number of refs in use
uint32_t cap_mask; // capacity mask, 2^n - 1
size_t nbytes; // total number of bytes
const BlockRef& ref_at(uint32_t i) const;
BlockRef& ref_at(uint32_t i);
uint32_t capacity() const;
};
struct SmallView {
BlockRef refs[2];
};
Whether through a BigView or a SmallView, an IOBuf is ultimately a list of BlockRefs.
Key member functions
- append
void IOBuf::append(const IOBuf& other) {
    const size_t nref = other._ref_num();
    for (size_t i = 0; i < nref; ++i) {
        _push_back_ref(other._ref_at(i));
    }
}
int IOBuf::append(void const* data, size_t count) {
    if (BAIDU_UNLIKELY(!data)) {
        return -1;
    }
    if (count == 1) {
        return push_back(*((char const*)data));
    }
    size_t total_nc = 0;
    while (total_nc < count) {  // excluded count == 0
        IOBuf::Block* b = iobuf::share_tls_block();
        if (BAIDU_UNLIKELY(!b)) {
            return -1;
        }
        const size_t nc = std::min(count - total_nc, b->left_space());
        iobuf::cp(b->data + b->size, (char*)data + total_nc, nc);
        const IOBuf::BlockRef r = { (uint32_t)b->size, (uint32_t)nc, b };
        _push_back_ref(r);
        b->size += nc;
        total_nc += nc;
    }
    return 0;
}
These are not the only two variants; append is overloaded for many other parameter types.
Appending another IOBuf adds all of other's BlockRefs to this IOBuf's refs, incrementing the reference count of every block in other. In addition, _push_back_ref handles the case where the refs array is full: it allocates a new BlockRef array with twice the previous capacity, migrates the existing refs into it, and frees the old array; it also handles the migration from _sv to _bv. Note that this form of append shares memory with other, i.e. it is a shallow copy.
Appending a raw memory region instead fetches a usable Block from the shared pool (iobuf::share_tls_block()), copies data into it, builds a BlockRef, and pushes it onto refs. This form of append is a deep copy.
- pop_front/pop_back
size_t IOBuf::pop_front(size_t n) {
    const size_t len = length();
    if (n >= len) {
        clear();
        return len;
    }
    const size_t saved_n = n;
    while (n) {  // length() == 0 does not enter
        IOBuf::BlockRef &r = _front_ref();
        if (r.length > n) {
            r.offset += n;
            r.length -= n;
            if (!_small()) {
                _bv.nbytes -= n;
            }
            return saved_n;
        }
        n -= r.length;
        _pop_front_ref();
    }
    return saved_n;
}
size_t IOBuf::pop_back(size_t n) {
    const size_t len = length();
    if (n >= len) {
        clear();
        return len;
    }
    const size_t saved_n = n;
    while (n) {  // length() == 0 does not enter
        IOBuf::BlockRef &r = _back_ref();
        if (r.length > n) {
            r.length -= n;
            if (!_small()) {
                _bv.nbytes -= n;
            }
            return saved_n;
        }
        n -= r.length;
        _pop_back_ref();
    }
    return saved_n;
}
Removes the first/last n bytes. Data is trimmed from the BlockRefs at the front/back one by one until n bytes have been dropped. Note that when an entire BlockRef in refs is consumed, _pop_front_ref/_pop_back_ref must be called to release the reference to that BlockRef, decrementing the block's reference count.
- cutn
size_t IOBuf::cutn(IOBuf* out, size_t n) {
    const size_t len = length();
    if (n > len) {
        n = len;
    }
    const size_t saved_n = n;
    while (n) {  // length() == 0 does not enter
        IOBuf::BlockRef &r = _front_ref();
        if (r.length <= n) {
            out->_push_back_ref(r);
            n -= r.length;
            _pop_front_ref();
        } else {
            const IOBuf::BlockRef cr = { r.offset, (uint32_t)n, r.block };
            out->_push_back_ref(cr);
            r.offset += n;
            r.length -= n;
            if (!_small()) {
                _bv.nbytes -= n;
            }
            return saved_n;
        }
    }
    return saved_n;
}
The actual implementation overloads this for other parameter types as well. The function is similar to pop_front, except that the removed data is returned in out; the main difference lies in how the reference counts are handled.
- cut_until
inline int IOBuf::cut_until(IOBuf* out, char const* delim) {
    if (*delim) {
        if (!*(delim + 1)) {
            return _cut_by_char(out, *delim);
        } else {
            return _cut_by_delim(out, delim, strlen(delim));
        }
    }
    return -1;
}
int IOBuf::_cut_by_char(IOBuf* out, char d) {
    const size_t nref = _ref_num();
    size_t n = 0;
    for (size_t i = 0; i < nref; ++i) {
        IOBuf::BlockRef const& r = _ref_at(i);
        char const* const s = r.block->data + r.offset;
        for (uint32_t j = 0; j < r.length; ++j, ++n) {
            if (s[j] == d) {
                // There's no way cutn/pop_front fails
                cutn(out, n);
                pop_front(1);
                return 0;
            }
        }
    }
    return -1;
}
int IOBuf::_cut_by_delim(IOBuf* out, char const* dbegin, size_t ndelim) {
    typedef unsigned long SigType;
    const size_t NMAX = sizeof(SigType);
    if (ndelim > NMAX || ndelim > length()) {
        return -1;
    }
    SigType dsig = 0;
    for (size_t i = 0; i < ndelim; ++i) {
        dsig = (dsig << CHAR_BIT) | static_cast<SigType>(dbegin[i]);
    }
    const SigType SIGMASK =
        (ndelim == NMAX ? (SigType)-1
                        : (((SigType)1 << (ndelim * CHAR_BIT)) - 1));
    const size_t nref = _ref_num();
    SigType sig = 0;
    size_t n = 0;
    for (size_t i = 0; i < nref; ++i) {
        IOBuf::BlockRef const& r = _ref_at(i);
        char const* const s = r.block->data + r.offset;
        for (uint32_t j = 0; j < r.length; ++j, ++n) {
            sig = ((sig << CHAR_BIT) | static_cast<SigType>(s[j])) & SIGMASK;
            if (sig == dsig) {
                // There's no way cutn/pop_front fails
                cutn(out, n + 1 - ndelim);
                pop_front(ndelim);
                return 0;
            }
        }
    }
    return -1;
}
Cuts leading bytes off the IOBuf up to and including the string delim (matched exactly): the bytes before delim are moved into out, and delim itself is discarded.
If delim is a single character, simply walk every byte of the IOBuf, find the first occurrence, and call cutn.
If delim has multiple characters, its length l must not exceed sizeof(unsigned long). Matching is then done on packed signatures rather than on the raw bytes. delim is folded into an unsigned long dsig via dsig = (dsig << CHAR_BIT) | delim[i], so the last character ends up in bits 0-7, the one before it in bits 8-15, and so on. A mask SIGMASK, whose low l*8 bits are all 1, confines the signature to l bytes. While iterating over the IOBuf, each byte is shifted into a running signature sig in the same way and ANDed with SIGMASK, so sig always encodes the most recent l bytes. When, at the n-th byte, sig equals dsig, the scan stops: cutn moves the first n + 1 - l bytes into out, and pop_front(l) then discards the delimiter itself.
- pcut_into_file_descriptor
ssize_t IOBuf::pcut_into_file_descriptor(int fd, off_t offset, size_t size_hint) {
    if (empty()) {
        return 0;
    }
    const size_t nref = std::min(_ref_num(), IOBUF_IOV_MAX);
    struct iovec vec[nref];
    size_t nvec = 0;
    size_t cur_len = 0;
    do {
        IOBuf::BlockRef const& r = _ref_at(nvec);
        vec[nvec].iov_base = r.block->data + r.offset;
        vec[nvec].iov_len = r.length;
        ++nvec;
        cur_len += r.length;
    } while (nvec < nref && cur_len < size_hint);
    ssize_t nw = 0;
    if (offset >= 0) {
        static iobuf::iov_function pwritev_func = iobuf::get_pwritev_func();
        nw = pwritev_func(fd, vec, nvec, offset);
    } else {
        nw = ::writev(fd, vec, nvec);
    }
    if (nw > 0) {
        pop_front(nw);
    }
    return nw;
}
This is mainly used by Socket to write the data held in an IOBuf to the socket fd. The IOBuf's refs are first converted into an iovec array, then writev (or pwritev via pwritev_func when offset >= 0) writes the array to fd; on success, the written bytes are popped from the front of the IOBuf.
TLSData
TLSData is essentially a linked list of Blocks. Each thread owns one TLSData, declared as
static __thread TLSData g_tls_data = { NULL, 0, false };
Definition:
struct TLSData {
IOBuf::Block* block_head; // head of the block list
int num_blocks; // number of blocks in the list
bool registered; // whether the thread-exit TLS cleanup function has been registered
};
g_tls_data maintains the list of all blocks available to the current thread.
IOBuf::Block* share_tls_block() {
TLSData& tls_data = g_tls_data;
IOBuf::Block* const b = tls_data.block_head;
if (b != NULL && !b->full()) {
return b;
}
IOBuf::Block* new_block = NULL;
if (b) {
new_block = b;
while (new_block && new_block->full()) {
IOBuf::Block* const saved_next = new_block->portal_next;
new_block->dec_ref();
--tls_data.num_blocks;
new_block = saved_next;
}
} else if (!tls_data.registered) {
tls_data.registered = true;
// Only register atexit at the first time
butil::thread_atexit(remove_tls_block_chain);
}
if (!new_block) {
new_block = create_block(); // may be NULL
if (new_block) {
++tls_data.num_blocks;
}
}
tls_data.block_head = new_block;
return new_block;
}
share_tls_block, mentioned earlier, finds a usable block in g_tls_data. It starts from the list head block_head: if the head is not full, it is returned directly. Otherwise the list is walked until a non-full Block is found, which becomes the new head; every full Block passed along the way is unlinked from the list and has its reference count decremented. If no non-full Block exists, a new one is created.
Besides share_tls_block there are remove_tls_block_chain, release_tls_block, and acquire_tls_block. acquire_tls_block is similar to share_tls_block, except that it removes the returned block from the list. remove_tls_block_chain deletes the blocks in the list, and release_tls_block returns a block to the list.
IOPortal
A subclass of IOBuf, mainly used to hold data read from a file descriptor. In brpc the most common use is reading data from a socket and buffering it.
class IOPortal : public IOBuf {
public:
IOPortal() : _block(NULL) { }
IOPortal(const IOPortal& rhs) : IOBuf(rhs), _block(NULL) { }
~IOPortal();
IOPortal& operator=(const IOPortal& rhs);
// Read at most `max_count' bytes from file descriptor `fd' and
// append to self.
ssize_t append_from_file_descriptor(int fd, size_t max_count);
// Read at most `max_count' bytes from file descriptor `fd' at a given
// offset and append to self. The file offset is not changed.
// If `offset' is negative, does exactly what append_from_file_descriptor does.
ssize_t pappend_from_file_descriptor(int fd, off_t offset, size_t max_count);
// Read as many bytes as possible from SSL channel `ssl', and stop until `max_count'.
// Returns total bytes read and the ssl error code will be filled into `ssl_error'
ssize_t append_from_SSL_channel(struct ssl_st* ssl, int* ssl_error,
size_t max_count = 1024*1024);
// Remove all data inside and return cached blocks.
void clear();
// Return cached blocks to TLS. This function should be called by users
// when this IOPortal are cut into intact messages and becomes empty, to
// let continuing code on IOBuf to reuse the blocks. Calling this function
// after each call to append_xxx does not make sense and may hurt
// performance. Read comments on field `_block' below.
void return_cached_blocks();
private:
static void return_cached_blocks_impl(Block*);
// Cached blocks for appending. Notice that the blocks are released
// until return_cached_blocks()/clear()/dtor() are called, rather than
// released after each append_xxx(), which makes messages read from one
// file descriptor more likely to share blocks and have less BlockRefs.
Block* _block;
};
Key interface
ssize_t IOPortal::pappend_from_file_descriptor(
int fd, off_t offset, size_t max_count) {
iovec vec[MAX_APPEND_IOVEC];
int nvec = 0;
size_t space = 0;
Block* prev_p = NULL;
Block* p = _block;
// Prepare at most MAX_APPEND_IOVEC blocks or space of blocks >= max_count
do {
if (p == NULL) {
p = iobuf::acquire_tls_block();
if (BAIDU_UNLIKELY(!p)) {
errno = ENOMEM;
return -1;
}
if (prev_p != NULL) {
prev_p->portal_next = p;
} else {
_block = p;
}
}
vec[nvec].iov_base = p->data + p->size;
vec[nvec].iov_len = std::min(p->left_space(), max_count - space);
space += vec[nvec].iov_len;
++nvec;
if (space >= max_count || nvec >= MAX_APPEND_IOVEC) {
break;
}
prev_p = p;
p = p->portal_next;
} while (1);
ssize_t nr = 0;
if (offset < 0) {
nr = readv(fd, vec, nvec);
} else {
static iobuf::iov_function preadv_func = iobuf::get_preadv_func();
nr = preadv_func(fd, vec, nvec, offset);
}
if (nr <= 0) { // -1 or 0
if (empty()) {
return_cached_blocks();
}
return nr;
}
size_t total_len = nr;
do {
const size_t len = std::min(total_len, _block->left_space());
total_len -= len;
const IOBuf::BlockRef r = { _block->size, (uint32_t)len, _block };
_push_back_ref(r);
_block->size += len;
if (_block->full()) {
Block* const saved_next = _block->portal_next;
_block->dec_ref(); // _block may be deleted
_block = saved_next;
}
} while (total_len);
return nr;
}
This interface reads data from fd with readv and stores it into _refs. append_from_file_descriptor works the same way, just without the offset.
append_from_SSL_channel differs only in that it reads from an SSL channel.
IOPortal keeps a _block field that caches pre-allocated block space. Before calling readv, it acquires blocks from the TLS list (via acquire_tls_block, which removes them from that list) to receive the incoming data; the blocks are chained into _block, preparing enough space for max_count bytes or up to MAX_APPEND_IOVEC iovec entries. After readv returns, the data already sits in the blocks, but their size fields have not been updated yet, so BlockRefs are built from them and appended to _refs. Blocks in the _block chain that become full are removed from it; _block keeps only non-full blocks, effectively acting as a local tls_block list.
IOBufAsZeroCopyInputStream/IOBufAsZeroCopyOutputStream
These inherit from google::protobuf::io::ZeroCopyInputStream and google::protobuf::io::ZeroCopyOutputStream respectively, adapting IOBuf for protobuf's zero-copy input and output.
The protobuf serialization and parsing entry points used here are SerializeToZeroCopyStream and ParseFromZeroCopyStream, which belong to protobuf's own Message/MessageLite interface. IOBufAsZeroCopyInputStream/IOBufAsZeroCopyOutputStream make them work over IOBuf by overriding the virtual functions Next, BackUp, Skip, and so on.
SerializeToZeroCopyStream and ParseFromZeroCopyStream are how rpc payloads are serialized and deserialized under the baidu_std protocol.
References
- https://blog.csdn.net/KIDGIN7439/article/details/111560093