VENUS: The Root Cause of MSMR Performance Issue

(BTW: I tag this report VENUS in the hope that we reach our goal smoothly.)


High-level Description

My mentor Heming and I have found the root cause of the MSMR performance issue:

When the proxy module sends operations to the consensus module, Nagle's algorithm on the proxy side interacts with TCP delayed acknowledgment on the consensus side and adds a delay of about 40 ms. The proxy side sometimes sends only a small amount of data. The consensus side holds its ACK, expecting more data to arrive, until the delayed-ACK timeout of about 40 ms fires. Meanwhile, Nagle's algorithm on the proxy side does not send the next small packet until the outstanding data has been acknowledged, so the following operations wait about 40 ms for that ACK. (For details, see http://jerrypeng.me/2013/08/mythical-40ms-delay-and-tcp-nodelay/)
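
The sender-side half of this interaction can be made concrete with a simplified model. The sketch below is not kernel code: the function name nagle_may_send_now and its parameters are hypothetical, and it reduces Nagle's algorithm to its core rule, namely that a segment smaller than the MSS is held back while earlier data is still unacknowledged, unless TCP_NODELAY is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of Nagle's rule (illustrative only, not kernel code).
 * pending_bytes: data the application has queued for sending
 * unacked_bytes: data already sent but not yet acknowledged by the peer
 * mss:           maximum segment size of the connection
 * nodelay:       true if TCP_NODELAY is set on the socket */
bool nagle_may_send_now(size_t pending_bytes, size_t unacked_bytes,
                        size_t mss, bool nodelay)
{
    if (pending_bytes == 0)
        return false;               /* nothing to send */
    if (nodelay)
        return true;                /* Nagle disabled: send immediately */
    if (pending_bytes >= mss)
        return true;                /* a full segment is always sent */
    return unacked_bytes == 0;      /* small segment: only if nothing is in flight */
}
```

For example, with an 80-byte operation queued behind 72 unacknowledged bytes and an assumed MSS of 1460, the model returns false without TCP_NODELAY and true with it. In the false case the data can only go out once the peer's ACK arrives, and that ACK is exactly what delayed acknowledgment postpones.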




In Details

First, experiments confirm that in some cases sending an operation from the proxy to consensus takes far too long.

In our MSMR, this happens when we increase concurrency, for example with apache_ab.cfg, server_count = 1, client_count = 1, ab -n3000 -c6.

I suspect that in these cases only a small amount of data is sent to the consensus side while more data stays buffered in the output evbuffer on the proxy side. I do not yet know the deeper reason why only a small amount of data is sent, but it may be related to many connections (P_CONNECT, P_CLOSE, P_SEND) calling bufferevent_write concurrently.


Consider the following experiment data:

Warning: output evbuffer has 72 bytes left when P_CLOSE 
Warning: P_CLOSE timestamp: 1422506138.861455 
Warning: output evbuffer has 144 bytes left when P_CLOSE 
Warning: P_CLOSE timestamp: 1422506138.861485 
Warning: output evbuffer has 72 bytes left when P_CONNECT 
Warning: P_CONNECT timestamp: 1422506138.861739 
Warning: output evbuffer has 72 bytes left when P_CONNECT 
Warning: P_CONNECT timestamp: 1422506138.861895 
Warning: output evbuffer has 144 bytes left when P_CLOSE 
Warning: P_CLOSE timestamp: 1422506138.861979 
Warning: output evbuffer has 72 bytes left when P_CONNECT 
Warning: P_CONNECT timestamp: 1422506138.862039 
Warning: output evbuffer has 80 bytes left when P_SEND 
Warning: P_SEND timestamp: 1422506138.862140 
Warning: output evbuffer has 242 bytes left when P_SEND 
Warning: P_SEND timestamp: 1422506138.862165 
Warning: output evbuffer has 80 bytes left when P_SEND 
Warning: P_SEND timestamp: 1422506138.862276 
Warning: output evbuffer has 242 bytes left when P_SEND 
Warning: P_SEND timestamp: 1422506138.862300 
Warning: output evbuffer has 80 bytes left when P_SEND 
Warning: P_SEND timestamp: 1422506138.862317 
Warning: output evbuffer has 162 bytes left when P_SEND 
Warning: P_SEND timestamp: 1422506138.862353 
Warning: replica_on_read time 1422506138.899324 
Warning: consensus input evbuffer has 1158 bytes left 
Warning from proxy to consensus: 37889, timestamp: 1422506138.899344 
Warning from proxy to consensus: 38062, timestamp: 1422506138.899547 
Warning from proxy to consensus: 37874, timestamp: 1422506138.899613 
Warning from proxy to consensus: 37819, timestamp: 1422506138.899714 
Warning from proxy to consensus: 37798, timestamp: 1422506138.899777 
Warning from proxy to consensus: 37796, timestamp: 1422506138.899835 
Warning from proxy to consensus: 37753, timestamp: 1422506138.899893 
Warning from proxy to consensus: 37849, timestamp: 1422506138.900014 
Warning from proxy to consensus: 37868, timestamp: 1422506138.900144 
Warning from proxy to consensus: 37960, timestamp: 1422506138.900260 
Warning from proxy to consensus: 38084, timestamp: 1422506138.900401 
Warning from proxy to consensus: 38165, timestamp: 1422506138.900518 




We can see that all 12 abnormal operations take about 40 ms to get from proxy to consensus (the "from proxy to consensus" values are in microseconds). According to the output evbuffer records, the 12 operations had already left the output evbuffer on the proxy side at the start of that window. On the consensus side, I set a timeout event and found that the operations which had left the proxy's output evbuffer had not yet arrived in the input evbuffer of the consensus side. The data above also contains only one line about the consensus input evbuffer, which suggests that the data arrived in a single batch. So the problem lies below the application, at the TCP level: Nagle's algorithm and TCP delayed acknowledgment together cause the 40 ms delay.
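
The per-operation delays can be recomputed from the timestamps in the log. The sketch below assumes that the i-th proxy-side timestamp pairs with the i-th "from proxy to consensus" timestamp; the array and function names are hypothetical, and only the timestamp values come from the log above.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a timestamp in microseconds from its seconds and microseconds parts. */
#define TS(sec, usec) ((int64_t)(sec) * 1000000 + (usec))

#define N_OPS 12

/* Proxy-side timestamps (P_CLOSE / P_CONNECT / P_SEND lines), in log order. */
static const int64_t proxy_ts_us[N_OPS] = {
    TS(1422506138, 861455), TS(1422506138, 861485), TS(1422506138, 861739),
    TS(1422506138, 861895), TS(1422506138, 861979), TS(1422506138, 862039),
    TS(1422506138, 862140), TS(1422506138, 862165), TS(1422506138, 862276),
    TS(1422506138, 862300), TS(1422506138, 862317), TS(1422506138, 862353),
};

/* Consensus-side timestamps ("from proxy to consensus" lines), in log order. */
static const int64_t consensus_ts_us[N_OPS] = {
    TS(1422506138, 899344), TS(1422506138, 899547), TS(1422506138, 899613),
    TS(1422506138, 899714), TS(1422506138, 899777), TS(1422506138, 899835),
    TS(1422506138, 899893), TS(1422506138, 900014), TS(1422506138, 900144),
    TS(1422506138, 900260), TS(1422506138, 900401), TS(1422506138, 900518),
};

/* Delay of operation i from proxy to consensus, in microseconds. */
int64_t latency_us(size_t i)
{
    return consensus_ts_us[i] - proxy_ts_us[i];
}

/* Smallest and largest delay over all operations, in microseconds. */
void latency_range_us(int64_t *min_us, int64_t *max_us)
{
    *min_us = *max_us = latency_us(0);
    for (size_t i = 1; i < N_OPS; i++) {
        int64_t d = latency_us(i);
        if (d < *min_us) *min_us = d;
        if (d > *max_us) *max_us = d;
    }
}
```

Under that pairing, every computed difference equals the value logged on the corresponding "from proxy to consensus" line, and all 12 delays fall between 37753 and 38165 microseconds, i.e. just under the 40 ms delayed-ACK timeout.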




Solution

Use TCP_NODELAY to disable Nagle's algorithm.

You can get the updated msmr project from thebcmatrix branch of msmr on Heming's github.

Or just get the updated libevent_paxos from https://github.com/bluecloudmatrix/libevent_paxos.git, which is forked from Cheng's github.


In proxy.c 

change connect_consensus to

//void consensus_on_read(struct bufferevent* bev,void*);
// Needs <netinet/tcp.h> for TCP_NODELAY.
void connect_consensus(proxy_node* proxy){
    // tom add 20150129: create the socket ourselves so that TCP_NODELAY can be set on it
    evutil_socket_t fd = socket(AF_INET, SOCK_STREAM, 0);
    if(fd < 0){
        printf("Proxy-side: SOCKET CREATION ERROR!\n");
        return;
    }
    // a socket handed to bufferevent_socket_new must be non-blocking
    evutil_make_socket_nonblocking(fd);
    // disable Nagle before connecting so that small writes go out immediately
    int enable = 1;
    if(setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (void*)&enable, sizeof(enable)) < 0)
        printf("Proxy-side: TCP_NODELAY SETTING ERROR!\n");
    proxy->con_conn = bufferevent_socket_new(proxy->base,fd,BEV_OPT_CLOSE_ON_FREE);
    // end tom add
    // proxy->con_conn = bufferevent_socket_new(proxy->base,-1,BEV_OPT_CLOSE_ON_FREE);
    bufferevent_setcb(proxy->con_conn,NULL,NULL,consensus_on_event,proxy);
    bufferevent_enable(proxy->con_conn,EV_READ|EV_WRITE|EV_PERSIST);
    bufferevent_socket_connect(proxy->con_conn,(struct sockaddr*)&proxy->sys_addr.c_addr,proxy->sys_addr.c_sock_len);

    return;
}


In replica.c

Add this code

    int enable = 1;
    if(setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (void*)&enable, sizeof(enable)) < 0)
        printf("Consensus-side: TCP_NODELAY SETTING ERROR!\n");
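
To confirm that the option actually took effect on either side, the value can be read back with getsockopt. This is a minimal sketch using plain POSIX sockets; the helper names set_tcp_nodelay and tcp_nodelay_enabled are hypothetical and are not part of msmr or libevent.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on fd. Returns 0 on success, -1 on error. */
int set_tcp_nodelay(int fd)
{
    int enable = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &enable, sizeof(enable));
}

/* Returns 1 if TCP_NODELAY is set on fd, 0 if it is not, -1 on error. */
int tcp_nodelay_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

Calling tcp_nodelay_enabled(fd) right after the setsockopt call in proxy.c or replica.c and logging the result is a cheap way to check that the option was applied to the socket that carries the proxy-to-consensus traffic.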


Results

[Figure: apache_ab.cfg, s1c1, n3000c6, with TCP_NODELAY]

[Figure: apache_ab.cfg, s1c1, n3000c6, without TCP_NODELAY]


We can see that with TCP_NODELAY enabled, performance is good and stable.



