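1. Introduction
The previous article (https://blog.csdn.net/happytree001/article/details/119120808) covered bitmap operations, but those commands set or get a single bit at a time. Setting a batch of bits means calling SETBIT in a loop, which is inefficient: every call is a separate network request and consumes bandwidth. Redis 3.2.0 therefore added the BITFIELD command, which operates on many bitmap fields in a single call.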
2. Command overview
BITFIELD
BITFIELD key [GET type offset] [SET type offset value] [INCRBY type offset increment] [OVERFLOW WRAP|SAT|FAIL]
As the syntax shows, BITFIELD supports four subcommands:
- GET (read)
- SET (write)
- INCRBY (increment)
- OVERFLOW (how overflow and underflow are handled)
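Here type says whether the field is treated as signed (i) or unsigned (u), followed by its width in bits, e.g. i16 or u8.
Besides batched writes, BITFIELD supports batched reads and atomic increments, and lets the caller choose how overflow is handled.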
GET type offset
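GET reads width bits starting at bit offset offset and returns them as a signed or unsigned integer.
Both offset and width are measured in bits. Signed types may be 1 to 64 bits wide, but unsigned types only 1 to 63 bits, because the Redis protocol cannot return an unsigned integer greater than INT64_MAX: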
/* This helper function for BITFIELD parses a bitfield type in the form
* <sign><bits> where sign is 'u' or 'i' for unsigned and signed, and
* the bits is a value between 1 and 64. However 64 bits unsigned integers
* are reported as an error because of current limitations of Redis protocol
* to return unsigned integer values greater than INT64_MAX.
*/
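A minimal sketch of the value ranges behind this rule, assuming two's complement. The helpers signed_range and unsigned_max are hypothetical, not part of Redis. The largest u63 value, 2^63 - 1, equals INT64_MAX and still fits in a RESP integer reply, while u64 would need values up to 2^64 - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Range of a signed field iN, 1 <= N <= 64 (two's complement). */
static void signed_range(int bits, int64_t *min, int64_t *max) {
    assert(bits >= 1 && bits <= 64);
    *max = (bits == 64) ? INT64_MAX
                        : (int64_t)(((uint64_t)1 << (bits - 1)) - 1);
    *min = -*max - 1;
}

/* Largest value of an unsigned field uN, 1 <= N <= 63: always fits in int64_t. */
static int64_t unsigned_max(int bits) {
    assert(bits >= 1 && bits <= 63);
    return (int64_t)(((uint64_t)1 << bits) - 1);
}
```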
SET type offset value
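SET writes value into the width-bit signed or unsigned field at offset and returns the field's previous value.
value may fall outside the range representable in width bits; such overflow is handled according to the OVERFLOW subcommand.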
INCRBY type offset increment
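INCRBY is similar to SET, but instead of plain assignment it reads the current value, adds increment, writes the result back and returns the new value. Like SET, it can overflow.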
OVERFLOW WRAP|SAT|FAIL
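SET and INCRBY may overflow; there are three ways to handle it:
- WRAP
This is the default when no OVERFLOW subcommand is given.
Unsigned:
The value wraps around modulo 2^width, i.e. modulo the maximum value plus one.
For example, for u8 (unsigned char, [0, 255]) with current value 254, adding 3 gives 257, which overflows; the stored value is 257 mod 256 = 1.
Signed:
Overflow above the maximum wraps to negative values, and underflow below the minimum wraps to positive values.
For example, for i8 (char, [-128, 127]) with current value 127, adding 1 overflows and wraps to -128.
- SAT
On overflow the value is set to the maximum, on underflow to the minimum; while later operations keep overflowing, the value stays at that limit.
For example, for i8 with current value 120, adding 10 overflows, so the value becomes 127; further overflowing increments keep it at 127.
- FAIL
The write is not performed and nil is returned for that subcommand.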
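The three policies can be modeled with a short sketch. It is not the Redis implementation: apply_incr and enum policy are hypothetical names, and the sketch avoids 64-bit overflow by limiting widths to 32 bits and doing the arithmetic in int64_t, whereas Redis uses the explicit range checks in checkSignedBitfieldOverflow/checkUnsignedBitfieldOverflow. The test values are the examples above.

```c
#include <assert.h>
#include <stdint.h>

enum policy { WRAP, SAT, FAIL };   /* hypothetical, mirrors BFOVERFLOW_* */

/* Applies value + incr to a bits-wide field (value assumed in range).
   Returns 1 and stores the result in *out, or 0 for FAIL on overflow
   (Redis would then reply nil and leave the field unchanged). */
static int apply_incr(int64_t value, int32_t incr, int bits, int is_signed,
                      enum policy p, int64_t *out) {
    assert(bits >= 1 && bits <= 32);
    int64_t min = is_signed ? -((int64_t)1 << (bits - 1)) : 0;
    int64_t max = is_signed ? ((int64_t)1 << (bits - 1)) - 1
                            : ((int64_t)1 << bits) - 1;
    int64_t sum = value + incr;        /* cannot overflow int64_t for bits <= 32 */
    if (sum >= min && sum <= max) { *out = sum; return 1; }
    switch (p) {
    case SAT:
        *out = (sum > max) ? max : min;  /* clamp to the nearest limit */
        return 1;
    case FAIL:
        return 0;
    case WRAP:
    default: {
        int64_t span = max - min + 1;    /* 2^bits */
        int64_t r = (sum - min) % span;  /* C '%' may be negative */
        if (r < 0) r += span;
        *out = r + min;
        return 1;
    }
    }
}
```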
3. Implementation
bitops.c
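Command processing is split into two parts:
- parse the subcommands one by one and, once all are parsed, build an array of operations;
- execute the operations in that array one by one.
Why not execute each subcommand as soon as it is parsed?
- Parsing everything first catches errors up front (a wrong number of arguments, an invalid type, a value/increment that is not a valid integer, and so on). This keeps the whole BITFIELD command atomic: it never fails halfway through after some subcommands have already modified the data, which Redis cannot roll back.
- The reply can be built in one batch and sent together, rather than replying subcommand by subcommand.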
3.1 First half: parsing the subcommands
3.1.1 Identifying the subcommand
for (j = 2; j < c->argc; j++) {
int remargs = c->argc-j-1; /* Remaining args other than current. */
char *subcmd = c->argv[j]->ptr; /* Current command name. */
int opcode; /* Current operation code. */
long long i64 = 0; /* Signed SET value. */
int sign = 0; /* Signed or unsigned type? */
int bits = 0; /* Bitfield width in bits. */
if (!strcasecmp(subcmd,"get") && remargs >= 2)
opcode = BITFIELDOP_GET;
else if (!strcasecmp(subcmd,"set") && remargs >= 3)
opcode = BITFIELDOP_SET;
else if (!strcasecmp(subcmd,"incrby") && remargs >= 3)
opcode = BITFIELDOP_INCRBY;
else if (!strcasecmp(subcmd,"overflow") && remargs >= 1) {
char *owtypename = c->argv[j+1]->ptr;
j++;
if (!strcasecmp(owtypename,"wrap"))
owtype = BFOVERFLOW_WRAP;
else if (!strcasecmp(owtypename,"sat"))
owtype = BFOVERFLOW_SAT;
else if (!strcasecmp(owtypename,"fail"))
owtype = BFOVERFLOW_FAIL;
else {
addReplyError(c,"Invalid OVERFLOW type specified");
zfree(ops);
return;
}
continue;
} else {
addReplyErrorObject(c,shared.syntaxerr);
zfree(ops);
return;
}
...
3.1.2 Parsing type
Extract the signedness and the width:
/* Get the type and offset arguments, common to all the ops. */
if (getBitfieldTypeFromArgument(c,c->argv[j+1],&sign,&bits) != C_OK) {
zfree(ops);
return;
}
int getBitfieldTypeFromArgument(client *c, robj *o, int *sign, int *bits) {
char *p = o->ptr;
char *err = "Invalid bitfield type. Use something like i16 u8. Note that u64 is not supported but i64 is.";
long long llbits;
if (p[0] == 'i') {
*sign = 1;
} else if (p[0] == 'u') {
*sign = 0;
} else {
addReplyError(c,err);
return C_ERR;
}
if ((string2ll(p+1,strlen(p+1),&llbits)) == 0 ||
llbits < 1 ||
(*sign == 1 && llbits > 64) ||
(*sign == 0 && llbits > 63))
{
addReplyError(c,err);
return C_ERR;
}
*bits = llbits;
return C_OK;
}
3.1.3 Parsing offset
if (getBitOffsetFromArgument(c,c->argv[j+2],&bitoffset,1,bits) != C_OK){
zfree(ops);
return;
}
int getBitOffsetFromArgument(client *c, robj *o, uint64_t *offset, int hash, int bits) {
long long loffset;
char *err = "bit offset is not an integer or out of range";
char *p = o->ptr;
size_t plen = sdslen(p);
int usehash = 0;
/* Handle #<offset> form. */
if (p[0] == '#' && hash && bits > 0) usehash = 1;
if (string2ll(p+usehash,plen-usehash,&loffset) == 0) {
addReplyError(c,err);
return C_ERR;
}
/* Adjust the offset by 'bits' for #<offset> form. */
if (usehash) loffset *= bits;
/* Limit offset to server.proto_max_bulk_len (512MB in bytes by default) */
if ((loffset < 0) || (loffset >> 3) >= server.proto_max_bulk_len)
{
addReplyError(c,err);
return C_ERR;
}
*offset = loffset;
return C_OK;
}
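Computing offsets by hand is tedious and error-prone when several subcommands write neighboring fields, so Redis also accepts the form #offset. It treats the bitmap as an array whose elements have the width given in type: #N refers to element N, i.e. bit offset N * width.
For example, in array mode BITFIELD key GET u8 #2 reads the u8 element at index 2, which starts at bit offset 2 * 8 = 16.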
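A sketch of how an offset argument resolves to an absolute bit offset, following the rule in getBitOffsetFromArgument (#N becomes N * bits). resolve_offset is a hypothetical helper; it uses strtoll, which is more lenient than Redis's string2ll (it accepts leading whitespace, for instance), and it omits the proto_max_bulk_len limit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resolves "N" or "#N" to an absolute bit offset.
   "#N" means element N of an array of bits-wide fields, i.e. N * bits.
   Returns 0 on success, -1 on a parse error, a negative offset or overflow. */
static int resolve_offset(const char *arg, int bits, uint64_t *out) {
    assert(arg != NULL && bits >= 1 && bits <= 64);
    int hash = (arg[0] == '#');
    const char *digits = arg + hash;
    char *end;
    errno = 0;
    long long v = strtoll(digits, &end, 10);
    if (errno != 0 || end == digits || *end != '\0') return -1;
    if (v < 0) return -1;                       /* offsets are never negative */
    if (hash) {
        if (v > LLONG_MAX / bits) return -1;    /* avoid signed overflow */
        v *= bits;
    }
    *out = (uint64_t)v;
    return 0;
}
```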
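3.1.4 Computing the highest write offset
Each subcommand may use a different offset. For writes (SET/INCRBY), if a field extends past the end of the existing bitmap, the string is grown automatically, so the parser tracks the highest bit any write will touch. For reads, bits past the end of the bitmap are treated as zeros, so reading there returns 0.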
if (opcode != BITFIELDOP_GET) {
readonly = 0;
if (highest_write_offset < bitoffset + bits - 1)
highest_write_offset = bitoffset + bits - 1;
...
}
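A sketch of the same bookkeeping over a list of operations, using a hypothetical trimmed-down struct op: only writes contribute, the last bit a write touches is offset + bits - 1, and that bit lives in byte (offset + bits - 1) >> 3, so the string must be one byte longer than that index (the same byte + 1 that lookupStringForBitCommand passes to sdsgrowzero).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down version of struct bitfieldOp. */
struct op { int is_write; uint64_t offset; int bits; };

/* Returns how many bytes the string needs so that every write fits,
   or 0 if there are no writes (reads past the end just see zero bits). */
static uint64_t bytes_needed(const struct op *ops, size_t n) {
    uint64_t highest = 0;
    int any_write = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!ops[i].is_write) continue;
        uint64_t last = ops[i].offset + (uint64_t)ops[i].bits - 1; /* last bit touched */
        if (!any_write || last > highest) highest = last;
        any_write = 1;
    }
    return any_write ? (highest >> 3) + 1 : 0;
}
```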
3.1.5 Parsing value/increment for write subcommands
if (opcode != BITFIELDOP_GET) {
...
/* INCRBY and SET require another argument. */
if (getLongLongFromObjectOrReply(c,c->argv[j+3],&i64,NULL) != C_OK){
zfree(ops);
return;
}
...
}
3.1.6 Building the operation array
/* Populate the array of operations we'll process. */
ops = zrealloc(ops,sizeof(*ops)*(numops+1));
ops[numops].offset = bitoffset;
ops[numops].i64 = i64;
ops[numops].opcode = opcode;
ops[numops].owtype = owtype;
ops[numops].bits = bits;
ops[numops].sign = sign;
numops++;
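// Skip the arguments just parsed and move on to the next subcommand.
// No if/else is needed here: GET takes 2 arguments and SET/INCRBY take 3, while OVERFLOW has already hit 'continue' above and never reaches this line.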
j += 3 - (opcode == BITFIELDOP_GET);
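The parsing logic shows that OVERFLOW may appear several times. Each occurrence applies to the SET/INCRBY subcommands that follow it, so different writes in the same command can use different overflow policies.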
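A simplified sketch of this argument walk, showing that each OVERFLOW only affects the operations recorded after it and that j += 3 - (opcode == GET) lands on the next subcommand once the loop's own j++ runs. parse_subcommands and struct parsed_op are hypothetical; type, offset and value arguments are not validated here.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

enum { OP_GET, OP_SET, OP_INCRBY };
enum { OW_WRAP, OW_SAT, OW_FAIL };

/* argi: index of the op's type argument in argv. */
struct parsed_op { int opcode; int owtype; int argi; };

/* Walks the arguments after the key. Returns the number of ops, or -1 on error. */
static int parse_subcommands(const char **argv, int argc,
                             struct parsed_op *ops, int maxops) {
    int owtype = OW_WRAP, n = 0;
    assert((argv != NULL || argc == 0) && ops != NULL && maxops >= 0);
    for (int j = 0; j < argc; j++) {
        int remargs = argc - j - 1;
        int opcode;
        if (!strcasecmp(argv[j], "get") && remargs >= 2) opcode = OP_GET;
        else if (!strcasecmp(argv[j], "set") && remargs >= 3) opcode = OP_SET;
        else if (!strcasecmp(argv[j], "incrby") && remargs >= 3) opcode = OP_INCRBY;
        else if (!strcasecmp(argv[j], "overflow") && remargs >= 1) {
            j++;                               /* consume the policy name */
            if (!strcasecmp(argv[j], "wrap")) owtype = OW_WRAP;
            else if (!strcasecmp(argv[j], "sat")) owtype = OW_SAT;
            else if (!strcasecmp(argv[j], "fail")) owtype = OW_FAIL;
            else return -1;
            continue;                          /* records no op */
        } else return -1;
        if (n == maxops) return -1;
        ops[n].opcode = opcode;
        ops[n].owtype = owtype;                /* policy in effect right now */
        ops[n].argi = j + 1;
        n++;
        j += 3 - (opcode == OP_GET);           /* skip this op's arguments */
    }
    return n;
}
```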
Thoughts on the bitfieldOp structure
struct bitfieldOp {
uint64_t offset; /* Bitfield offset. */
int64_t i64; /* Increment amount (INCRBY) or SET value */
int opcode; /* Operation id. */
int owtype; /* Overflow type to use. */
int bits; /* Integer bitfield bits width. */
int sign; /* True if signed, otherwise unsigned op. */
};
#define BITFIELDOP_GET 0
#define BITFIELDOP_SET 1
#define BITFIELDOP_INCRBY 2
#define BFOVERFLOW_WRAP 0
#define BFOVERFLOW_SAT 1
#define BFOVERFLOW_FAIL 2 /* Used by the BITFIELD command implementation. */
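- Data types
sign only says whether the field is signed, taking the values 0 and 1, yet it is declared as int;
bits is the field width, at most 64, and is also declared as int;
owtype is the overflow policy, one of the macros with values 0, 1 and 2, and is also declared as int;
opcode is the subcommand, one of the macros with values 0, 1 and 2, and is also declared as int.
My first thought was that bit-fields would be enough here and would save space.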
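Whether bit-fields actually save space can be checked with sizeof. The second struct is a hypothetical bit-field variant. Bit-field layout is implementation-defined, but on a typical x86-64 build it comes to 24 bytes: the 12 bits of flags share one 4-byte unit and the struct is then padded to the 8-byte alignment of its uint64_t member. The saving is 8 bytes per operation in a short-lived per-command array.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as in bitops.c. */
struct bitfieldOp {
    uint64_t offset;
    int64_t i64;
    int opcode;
    int owtype;
    int bits;
    int sign;
};

/* Hypothetical bit-field variant of the same structure. */
struct bitfieldOpPacked {
    uint64_t offset;
    int64_t i64;
    unsigned opcode : 2;   /* BITFIELDOP_GET/SET/INCRBY: 0..2 */
    unsigned owtype : 2;   /* BFOVERFLOW_WRAP/SAT/FAIL: 0..2 */
    unsigned bits   : 7;   /* width 1..64 needs 7 bits */
    unsigned sign   : 1;   /* 0 = unsigned, 1 = signed */
};

/* 8 + 8 + 4 * 4 bytes, no padding needed: 32 bytes (assumes 4-byte int). */
static_assert(sizeof(struct bitfieldOp) == 32, "two 8-byte fields plus four ints");
```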
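Then again, offset and i64 already take 16 bytes, and the four 4-byte int fields add exactly another 16, so the whole structure is a neatly aligned 32 bytes.
- Memory allocation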
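After each subcommand is parsed, ops is grown by one element with zrealloc, which can lead to repeated copying and memory fragmentation.
In my view a static array would do: its size could be estimated from statistics of real workloads, or even exposed as a configuration option.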
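One way to realize this idea is a small fixed buffer with a heap fallback. This is a hypothetical sketch, not Redis code: INLINE_OPS, struct opvec and its functions are illustrative names. Up to INLINE_OPS operations need no allocation at all; beyond that, capacity doubles, so n operations cost O(log n) reallocations instead of one per subcommand.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_OPS 16   /* hypothetical; would be tuned from real workloads */

struct op { uint64_t offset; int64_t i64; int opcode, owtype, bits, sign; };

struct opvec {
    struct op inline_buf[INLINE_OPS];   /* used while the command is small */
    struct op *data;                    /* inline_buf or a heap array */
    size_t len, cap;
};

static void opvec_init(struct opvec *v) {
    v->data = v->inline_buf;
    v->len = 0;
    v->cap = INLINE_OPS;
}

/* Appends one op, moving to the heap (doubling) only when full.
   Returns 0 on success, -1 if allocation fails. */
static int opvec_push(struct opvec *v, struct op o) {
    assert(v != NULL && v->len <= v->cap);
    if (v->len == v->cap) {
        size_t ncap = v->cap * 2;
        struct op *nd = malloc(ncap * sizeof(*nd));
        if (nd == NULL) return -1;
        memcpy(nd, v->data, v->len * sizeof(*nd));
        if (v->data != v->inline_buf) free(v->data);
        v->data = nd;
        v->cap = ncap;
    }
    v->data[v->len++] = o;
    return 0;
}

static void opvec_free(struct opvec *v) {
    if (v->data != v->inline_buf) free(v->data);
    opvec_init(v);
}
```

Because data may point into the struct itself, an opvec should not be copied by value while in use.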
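3.2 Second half: executing the operations
3.2.1 Looking up the bitmap object by key
- If the key does not exist, a new string object is created.
- If the bitmap is too short, it is grown automatically.
This is the same function SETBIT uses; BITFIELD calls it only when at least one subcommand is a write.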
/* Lookup by making room up to the farest bit reached by
* this operation. */
if ((o = lookupStringForBitCommand(c,
highest_write_offset)) == NULL) {
zfree(ops);
return;
}
robj *lookupStringForBitCommand(client *c, uint64_t maxbit) {
size_t byte = maxbit >> 3;
robj *o = lookupKeyWrite(c->db,c->argv[1]);
if (checkType(c,o,OBJ_STRING)) return NULL;
if (o == NULL) {
o = createObject(OBJ_STRING,sdsnewlen(NULL, byte+1));
dbAdd(c->db,c->argv[1],o);
} else {
o = dbUnshareStringValue(c->db,c->argv[1],o);
o->ptr = sdsgrowzero(o->ptr,byte+1);
}
return o;
}
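A sketch of the growth step, with a hypothetical grow_zero standing in for sdsgrowzero (grow to the requested length, zero-filling the new bytes, and do nothing if the string is already long enough) and a hypothetical ensure_bit mirroring the byte + 1 computation above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for sdsgrowzero(): make *buf at least newlen bytes,
   zero-filling added bytes and keeping existing content intact. */
static int grow_zero(unsigned char **buf, size_t *len, size_t newlen) {
    assert(buf != NULL && len != NULL);
    if (newlen <= *len) return 0;            /* already long enough: no-op */
    unsigned char *p = realloc(*buf, newlen);
    if (p == NULL) return -1;
    memset(p + *len, 0, newlen - *len);      /* new bytes read as zero bits */
    *buf = p;
    *len = newlen;
    return 0;
}

/* As in lookupStringForBitCommand(): bit maxbit lives in byte maxbit >> 3,
   so the string needs (maxbit >> 3) + 1 bytes. */
static int ensure_bit(unsigned char **buf, size_t *len, uint64_t maxbit) {
    return grow_zero(buf, len, (size_t)(maxbit >> 3) + 1);
}
```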
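3.2.2 Writing the reply array length
BITFIELD replies with an array, so according to the Redis protocol the array length is sent first: * marks an array and the number after it is the number of elements.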
addReplyArrayLen(c,numops);
void addReplyArrayLen(client *c, long length) {
addReplyAggregateLen(c,length,'*');
}
void addReplyAggregateLen(client *c, long length, int prefix) {
serverAssert(length >= 0);
addReplyLongLongWithPrefix(c,length,prefix);
}
/* Add a long long as integer reply or bulk len / multi bulk count.
* Basically this is used to output <prefix><long long><crlf>. */
void addReplyLongLongWithPrefix(client *c, long long ll, char prefix) {
char buf[128];
int len;
/* Things like $3\r\n or *2\r\n are emitted very often by the protocol
* so we have a few shared objects to use if the integer is small
* like it is most of the times. */
if (prefix == '*' && ll < OBJ_SHARED_BULKHDR_LEN && ll >= 0) {
addReply(c,shared.mbulkhdr[ll]);
return;
} else if (prefix == '$' && ll < OBJ_SHARED_BULKHDR_LEN && ll >= 0) {
addReply(c,shared.bulkhdr[ll]);
return;
}
buf[0] = prefix;
len = ll2string(buf+1,sizeof(buf)-1,ll);
buf[len+1] = '\r';
buf[len+2] = '\n';
addReplyProto(c,buf,len+3);
}
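addReplyLongLongWithPrefix is also optimized: for small lengths it reuses the shared objects shared.mbulkhdr and shared.bulkhdr instead of formatting the header every time, which saves a lot of work.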
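A sketch of the same caching idea for array headers. SHARED_HDRS, init_shared and array_header are hypothetical; Redis keeps its precomputed headers as shared objects created at startup, while here they are plain strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SHARED_HDRS 32   /* hypothetical cache size */

static char shared_mbulk[SHARED_HDRS][16];   /* precomputed "*<n>\r\n" */

static void init_shared(void) {
    for (int i = 0; i < SHARED_HDRS; i++)
        snprintf(shared_mbulk[i], sizeof(shared_mbulk[i]), "*%d\r\n", i);
}

/* Returns the "*<n>\r\n" header for an array of n elements. Small n reuse
   the precomputed strings; larger n are formatted into out (24 bytes). */
static const char *array_header(long long n, char out[24]) {
    assert(n >= 0);
    if (n < SHARED_HDRS) return shared_mbulk[n];
    snprintf(out, 24, "*%lld\r\n", n);
    return out;
}
```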
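3.2.3 Executing the subcommands in a loop
Execution is split into read subcommands and write subcommands, and each is further split into handling of signed and unsigned data.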
for (j = 0; j < numops; j++) {
struct bitfieldOp *thisop = ops+j;
if (thisop->opcode == BITFIELDOP_SET ||
thisop->opcode == BITFIELDOP_INCRBY) // write subcommand
{
if (thisop->sign) { // signed
...
} else { // unsigned
...
}
}
else { // read subcommand
if (thisop->sign) { // signed
...
} else { // unsigned
...
}
}
}
3.2.3.1 Write subcommands
if (thisop->opcode == BITFIELDOP_SET ||
thisop->opcode == BITFIELDOP_INCRBY)
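3.2.3.1.1 Signed
(1) Reading the current signed field value
This step serves two purposes:
- for SET, the old value is what gets returned;
- for INCRBY, it is the starting value that the increment is added to.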
int64_t oldval, newval, wrapped, retval;
int overflow;
oldval = getSignedBitfield(o->ptr,thisop->offset, thisop->bits);
int64_t getSignedBitfield(unsigned char *p, uint64_t offset, uint64_t bits) {
int64_t value;
union {uint64_t u; int64_t i;} conv;
/* Converting from unsigned to signed is undefined when the value does
* not fit, however here we assume two's complement and the original value
* was obtained from signed -> unsigned conversion, so we'll find the
* most significant bit set if the original value was negative.
*
* Note that two's complement is mandatory for exact-width types
* according to the C99 standard. */
conv.u = getUnsignedBitfield(p,offset,bits);
value = conv.i;
/* If the top significant bit is 1, propagate it to all the
* higher bits for two's complement representation of signed
* integers. */
if (bits < 64 && (value & ((uint64_t)1 << (bits-1))))
value |= ((uint64_t)-1) << bits;
return value;
}
uint64_t getUnsignedBitfield(unsigned char *p, uint64_t offset, uint64_t bits) {
uint64_t byte, bit, byteval, bitval, j, value = 0;
for (j = 0; j < bits; j++) {
byte = offset >> 3;
bit = 7 - (offset & 0x7);
byteval = p[byte];
bitval = (byteval >> bit) & 1;
value = (value<<1) | bitval;
offset++;
}
return value;
}
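As the call chain shows, a signed value is obtained by first reading the field as unsigned and then converting it to signed.
If the most significant bit of the field is 1 and the width is less than 64, all higher bits are set to 1 (sign extension).
For example, with offset=6 and type=i7, the field covers bits 6 through 12: the last two bits of byte 0 and the first five bits of byte 1.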
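A worked version of this example, using renamed local copies of the two functions above (get_unsigned, get_signed); the union-based conversion is the same one Redis uses. With the bytes 0x03 0xF0, bits 6..12 are 1111110, i.e. 126 as u7; the top bit is set, so as i7 the value is 126 - 128 = -2.

```c
#include <assert.h>
#include <stdint.h>

/* Copy of getUnsignedBitfield(): bit 0 is the most significant bit of byte 0. */
static uint64_t get_unsigned(const unsigned char *p, uint64_t offset, uint64_t bits) {
    uint64_t value = 0;
    assert(bits >= 1 && bits <= 64);
    for (uint64_t j = 0; j < bits; j++, offset++) {
        uint64_t bit = 7 - (offset & 0x7);
        value = (value << 1) | ((p[offset >> 3] >> bit) & 1);
    }
    return value;
}

/* Copy of getSignedBitfield(): sign-extend from bit (bits - 1). */
static int64_t get_signed(const unsigned char *p, uint64_t offset, uint64_t bits) {
    union { uint64_t u; int64_t i; } conv;
    conv.u = get_unsigned(p, offset, bits);
    if (bits < 64 && (conv.u & ((uint64_t)1 << (bits - 1))))
        conv.u |= UINT64_MAX << bits;   /* set bits 'bits'..63 to 1 */
    return conv.i;
}
```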
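(2) Checking the signed value for overflow
For SET the new value itself is checked; for INCRBY, oldval and increment are checked together. On overflow, the value to store is derived from the overflow policy.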
if (thisop->opcode == BITFIELDOP_INCRBY) {
newval = oldval + thisop->i64;
overflow = checkSignedBitfieldOverflow(oldval,
thisop->i64,thisop->bits,thisop->owtype,&wrapped);
if (overflow) newval = wrapped;
retval = newval;
} else {
newval = thisop->i64;
overflow = checkSignedBitfieldOverflow(newval,
0,thisop->bits,thisop->owtype,&wrapped);
if (overflow) newval = wrapped;
retval = oldval;
}
int checkSignedBitfieldOverflow(int64_t value, int64_t incr, uint64_t bits, int owtype, int64_t *limit) {
int64_t max = (bits == 64) ? INT64_MAX : (((int64_t)1<<(bits-1))-1);
int64_t min = (-max)-1;
/* Note that maxincr and minincr could overflow, but we use the values
* only after checking 'value' range, so when we use it no overflow
* happens. */
int64_t maxincr = max-value;
int64_t minincr = min-value;
if (value > max || (bits != 64 && incr > maxincr) || (value >= 0 && incr > 0 && incr > maxincr))
{// overflow (above max)
if (limit) {
if (owtype == BFOVERFLOW_WRAP) { // wrap around
goto handle_wrap;
} else if (owtype == BFOVERFLOW_SAT) {// clamp to max
*limit = max;
}
}
return 1;
} else if (value < min || (bits != 64 && incr < minincr) || (value < 0 && incr < 0 && incr < minincr)) { // underflow (below min)
if (limit) {
if (owtype == BFOVERFLOW_WRAP) {// wrap around
goto handle_wrap;
} else if (owtype == BFOVERFLOW_SAT) {// clamp to min
*limit = min;
}
}
return -1;
}
return 0;
handle_wrap:
{
uint64_t msb = (uint64_t)1 << (bits-1);
uint64_t a = value, b = incr, c;
c = a+b; /* Perform addition as unsigned so that's defined. */
/* If the sign bit is set, propagate to all the higher order
* bits, to cap the negative value. If it's clear, mask to
* the positive integer limit. */
if (bits < 64) {
uint64_t mask = ((uint64_t)-1) << bits;
if (c & msb) {
c |= mask;
} else {
c &= ~mask;
}
}
*limit = c;
}
return 1;
}
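In effect, handle_wrap reduces the true sum modulo 2^b and reinterprets it as a b-bit two's-complement number. Writing v for the current value (or the value being SET), d for the increment (0 for SET) and b for the width, the wrapped result is as follows, with mod taken as the non-negative remainder. The unsigned counterpart in checkUnsignedBitfieldOverflow is simply (v + d) mod 2^b.

$$
\operatorname{wrap}_b(v + d) = \left( (v + d + 2^{b-1}) \bmod 2^b \right) - 2^{b-1},
\qquad \text{e.g. } b = 8,\ v = 127,\ d = 1:\ (256 \bmod 256) - 128 = -128 .
$$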
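(3) Filling the reply and writing the new signed value
If the value overflowed and the policy is BFOVERFLOW_FAIL, nil is returned and nothing is written; otherwise the value is added to the reply and the new value is written.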
/* On overflow of type is "FAIL", don't write and return
* NULL to signal the condition. */
if (!(overflow && thisop->owtype == BFOVERFLOW_FAIL)) {
addReplyLongLong(c,retval);
setSignedBitfield(o->ptr,thisop->offset,
thisop->bits,newval);
} else {
addReplyNull(c);
}
Writing also goes through the unsigned path: the signed value is converted to unsigned and passed to the unsigned setter.
void setSignedBitfield(unsigned char *p, uint64_t offset, uint64_t bits, int64_t value) {
uint64_t uv = value; /* Casting will add UINT64_MAX + 1 if v is negative. */
setUnsignedBitfield(p,offset,bits,uv);
}
void setUnsignedBitfield(unsigned char *p, uint64_t offset, uint64_t bits, uint64_t value) {
uint64_t byte, bit, byteval, bitval, j;
for (j = 0; j < bits; j++) {
bitval = (value & ((uint64_t)1<<(bits-1-j))) != 0;
byte = offset >> 3;
bit = 7 - (offset & 0x7);
byteval = p[byte];
byteval &= ~(1 << bit);
byteval |= bitval << bit;
p[byte] = byteval & 0xff;
offset++;
}
}
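A round-trip sketch with renamed copies of the setters (set_unsigned, set_signed) makes the MSB-first bit order concrete: writing 0xAB as u8 at offset 4 splits it across two bytes as 0x0A 0xB0, and a negative value is stored as its low bits in two's complement (-2 as i4 is 1110).

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as setUnsignedBitfield(): writes 'bits' bits of value, MSB first. */
static void set_unsigned(unsigned char *p, uint64_t offset, uint64_t bits, uint64_t value) {
    assert(bits >= 1 && bits <= 64);
    for (uint64_t j = 0; j < bits; j++, offset++) {
        uint64_t bitval = (value >> (bits - 1 - j)) & 1;
        uint64_t bit = 7 - (offset & 0x7);
        p[offset >> 3] = (unsigned char)((p[offset >> 3] & ~(1u << bit)) | (bitval << bit));
    }
}

/* As in setSignedBitfield(): store the two's-complement bit pattern. */
static void set_signed(unsigned char *p, uint64_t offset, uint64_t bits, int64_t value) {
    set_unsigned(p, offset, bits, (uint64_t)value);
}
```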
3.2.3.1.2 Unsigned
(1) Reading the current unsigned field value
This is getUnsignedBitfield, the same function the signed read calls internally.
uint64_t oldval, newval, wrapped, retval;
int overflow;
oldval = getUnsignedBitfield(o->ptr,thisop->offset,
thisop->bits);
(2) Checking the unsigned value for overflow
if (thisop->opcode == BITFIELDOP_INCRBY) {
newval = oldval + thisop->i64;
overflow = checkUnsignedBitfieldOverflow(oldval,
thisop->i64,thisop->bits,thisop->owtype,&wrapped);
if (overflow) newval = wrapped;
retval = newval;
} else {
newval = thisop->i64;
overflow = checkUnsignedBitfieldOverflow(newval,
0,thisop->bits,thisop->owtype,&wrapped);
if (overflow) newval = wrapped;
retval = oldval;
}
int checkUnsignedBitfieldOverflow(uint64_t value, int64_t incr, uint64_t bits, int owtype, uint64_t *limit) {
uint64_t max = (bits == 64) ? UINT64_MAX : (((uint64_t)1<<bits)-1);
int64_t maxincr = max-value;
int64_t minincr = -value;
if (value > max || (incr > 0 && incr > maxincr)) {// overflow (above max)
if (limit) {
if (owtype == BFOVERFLOW_WRAP) {// wrap around
goto handle_wrap;
} else if (owtype == BFOVERFLOW_SAT) {// clamp to max
*limit = max;
}
}
return 1;
} else if (incr < 0 && incr < minincr) {// underflow (below 0)
if (limit) {
if (owtype == BFOVERFLOW_WRAP) {// wrap around
goto handle_wrap;
} else if (owtype == BFOVERFLOW_SAT) {// clamp to min, i.e. 0
*limit = 0;
}
}
return -1;
}
return 0;
handle_wrap:
{
uint64_t mask = ((uint64_t)-1) << bits;
uint64_t res = value+incr;
res &= ~mask;
*limit = res;
}
return 1;
}
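Unsigned handling is simpler than signed handling.
(3) Filling the reply and writing the new unsigned value
If the value overflowed and the policy is BFOVERFLOW_FAIL, nil is returned and nothing is written; otherwise the value is added to the reply and the new value is written.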
/* On overflow of type is "FAIL", don't write and return
* NULL to signal the condition. */
if (!(overflow && thisop->owtype == BFOVERFLOW_FAIL)) {
addReplyLongLong(c,retval);
setUnsignedBitfield(o->ptr,thisop->offset,
thisop->bits,newval);
} else {
addReplyNull(c);
}
3.2.3.1.3 Counting writes
Write subcommands are counted; the count is used afterwards for the dirty counter and keyspace notifications.
changes++;
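3.2.3.2 Read subcommands
3.2.3.2.1 Getting the address of the bitmap string
For a string-encoded object the buffer address is returned directly; an integer-encoded object is first converted to its decimal string in the stack buffer llbuf.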
unsigned char buf[9];
long strlen = 0;
unsigned char *src = NULL;
char llbuf[LONG_STR_SIZE];
if (o != NULL)
src = getObjectReadOnlyString(o,&strlen,llbuf);
unsigned char *getObjectReadOnlyString(robj *o, long *len, char *llbuf) {
serverAssert(o->type == OBJ_STRING);
unsigned char *p = NULL;
/* Set the 'p' pointer to the string, that can be just a stack allocated
* array if our string was integer encoded. */
if (o && o->encoding == OBJ_ENCODING_INT) {
p = (unsigned char*) llbuf;
if (len) *len = ll2string(llbuf,LONG_STR_SIZE,(long)o->ptr);
} else if (o) {
p = (unsigned char*) o->ptr;
if (len) *len = sdslen(o->ptr);
} else {
if (len) *len = 0;
}
return p;
}
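As a consequence, BITFIELD sees an integer-encoded value as the bytes of its decimal text. A hypothetical illustration, with snprintf in place of ll2string: the value 123 becomes the bytes '1' '2' '3' (49, 50, 51), so after SET k 123, BITFIELD k GET u8 0 would return 49.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: format an integer the way an OBJ_ENCODING_INT
   value is exposed to bit commands (decimal text); returns the byte count. */
static size_t int_as_bitmap(long value, unsigned char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    snprintf((char *)out, outlen, "%ld", value);   /* always NUL-terminated */
    return strlen((const char *)out);
}
```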
3.2.3.2.2 Copying up to 9 bytes starting at the field's first byte
The field is then decoded from this local copy.
/* For GET we use a trick: before executing the operation
* copy up to 9 bytes to a local buffer, so that we can easily
* execute up to 64 bit operations that are at actual string
* object boundaries. */
memset(buf,0,9);
int i;
uint64_t byte = thisop->offset >> 3;
for (i = 0; i < 9; i++) {
if (src == NULL || i+byte >= (uint64_t)strlen) break;
buf[i] = src[i+byte];
}
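Why 9 bytes: the field starts at bit (offset & 7), between 0 and 7, of its first byte, so even a 64-bit field spans at most 7 + 64 = 71 bits, which fits in ceil(71 / 8) = 9 bytes. Bytes past the end of the string stay zero, which is why reads beyond the bitmap return 0. A sketch of this read path, where windowed_get_u is a hypothetical name and get_unsigned is a compact copy of getUnsignedBitfield:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t get_unsigned(const unsigned char *p, uint64_t offset, uint64_t bits) {
    uint64_t value = 0;
    for (uint64_t j = 0; j < bits; j++, offset++)
        value = (value << 1) | ((p[offset >> 3] >> (7 - (offset & 7))) & 1);
    return value;
}

/* Copy at most 9 bytes from the field's first byte into a zeroed buffer,
   then decode the field relative to that buffer. */
static uint64_t windowed_get_u(const unsigned char *src, size_t srclen,
                               uint64_t offset, uint64_t bits) {
    unsigned char buf[9];
    uint64_t byte = offset >> 3;
    assert(bits >= 1 && bits <= 64);
    memset(buf, 0, sizeof buf);
    for (int i = 0; i < 9; i++) {
        if (src == NULL || byte + i >= srclen) break;   /* past the end: zeros */
        buf[i] = src[byte + i];
    }
    return get_unsigned(buf, offset - byte * 8, bits);
}
```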
(1) Reading a signed value and adding it to the reply
int64_t val = getSignedBitfield(buf,thisop->offset-(byte*8),
thisop->bits);
addReplyLongLong(c,val);
(2) Reading an unsigned value and adding it to the reply
uint64_t val = getUnsignedBitfield(buf,thisop->offset-(byte*8),
thisop->bits);
addReplyLongLong(c,val);
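3.2.4 Finishing up
- If any write subcommand ran, signal that the key was modified, send a keyspace notification, and add the number of changes to server.dirty, which later drives AOF propagation and similar bookkeeping.
- Free the temporary operation array.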
if (changes) {
signalModifiedKey(c,c->db,c->argv[1]);
notifyKeyspaceEvent(NOTIFY_STRING,"setbit",c->argv[1],c->db->id);
server.dirty += changes;
}
zfree(ops);
3.3 BITFIELD_RO key GET type offset
To read safely from read-only replicas without any risk of accidental modification, Redis 6.2.0 added BITFIELD_RO, a read-only variant of BITFIELD that accepts only GET subcommands.