For the past few days I have been debugging ScheduledThreadPoolThread under heavy concurrent load. The test case is very simple; the code is as follows:
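// Create the scheduled pool and record the start time.
ScheduledExecutorService pool = st(Executors)::newScheduledThreadPool();
long time = st(System)::currentTimeMillis();
// Schedule 32K tasks, each with a delay of 0.
for (int i = 0; i < 32 * 1024; i++) {
    pool->schedule(createMyLoopSubmit(), 0);
}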
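This submits 32K actions at the same time, and the for loop turns out to take about 3 s. That is really poor performance: if 32K tasks already cost 3 s, heavy concurrency is basically hopeless. So let's use perf to see what exactly is going on.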
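For reference, a loop time like the one quoted above can be measured with a small helper along these lines. This is only a sketch that uses the standard std::chrono clock instead of Obotcha's st(System)::currentTimeMillis(); the name elapsedMillis is hypothetical and is not part of Obotcha.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper (not an Obotcha API): runs `work` once and returns
// the elapsed time in milliseconds, measured with a monotonic clock.
std::int64_t elapsedMillis(const std::function<void()>& work) {
    // Calling an empty std::function would throw std::bad_function_call.
    assert(static_cast<bool>(work));
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
        .count();
}
```

Wrapping the scheduling loop in a lambda and passing it to elapsedMillis would give the time spent in the loop itself, independent of wall-clock adjustments.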
1. Install perf
sudo apt-get install perf-tools-unstable
2. Capture perf data
$ sudo perf record -e cpu-clock -g ./mytest
3. Analyze the perf data
$ sudo perf report -g -i perf.data
The report shows the following entries with the highest CPU share:
Samples: 25K of event 'cpu-clock', Event count (approx.): 6382250000
Children Self Command Shared Object Symbol
+ 93.50% 0.00% mytest libpthread-2.27.so [.] start_thread
+ 93.50% 0.00% mytest libobotcha.so [.] obotcha::_Thread::localRun
+ 86.65% 0.07% mytest libobotcha.so [.] obotcha::_ScheduledThreadPoolThread::run
+ 82.76% 0.02% mytest libobotcha.so [.] obotcha::_ArrayList<obotcha::sp<obotcha::_WaitingTask> >::removeAt
+ 82.69% 0.01% mytest libobotcha.so [.] std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::sp<obotcha::_WaitingTask> > >::erase
+ 82.64% 0.02% mytest libobotcha.so [.] std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::sp<obotcha::_WaitingTask> > >::_M_erase
+ 82.56% 0.02% mytest libobotcha.so [.] std::move<__gnu_cxx::__normal_iterator<obotcha::sp<obotcha::_WaitingTask>*, std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::
+ 82.53% 0.03% mytest libobotcha.so [.] std::__copy_move_a2<true, __gnu_cxx::__normal_iterator<obotcha::sp<obotcha::_WaitingTask>*, std::vector<obotcha::sp<obotcha::_WaitingTask>, std::all
+ 82.48% 0.02% mytest libobotcha.so [.] std::__copy_move_a<true, obotcha::sp<obotcha::_WaitingTask>*, obotcha::sp<obotcha::_WaitingTask>*>
+ 79.04% 3.91% mytest libobotcha.so [.] std::__copy_move<true, false, std::random_access_iterator_tag>::__copy_m<obotcha::sp<obotcha::_WaitingTask>*, obotcha::sp<obotcha::_WaitingTask>*>
+ 71.24% 12.47% mytest libobotcha.so [.] obotcha::sp<obotcha::_WaitingTask>::operator=
+ 37.14% 3.63% mytest mytest [.] obotcha::Object::incStrong
+ 34.33% 34.32% mytest mytest [.] std::__atomic_base<int>::operator--
+ 25.72% 4.51% mytest mytest [.] obotcha::Object::decStrong
+ 22.99% 22.99% mytest mytest [.] std::__atomic_base<int>::operator--
+ 6.51% 0.02% mytest libobotcha.so [.] obotcha::_ThreadCachedPoolExecutorHandler::run
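So ArrayList's removeAt accounts for more than 80% of the samples, which is most likely the main reason the for loop is so slow.
Next, use the up/down arrow keys to move the cursor to removeAt and press Enter to expand its detailed cost: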
- 82.76% 0.02% mytest libobotcha.so [.] obotcha::_ArrayList<obotcha::sp<obotcha::_WaitingTask> >::removeAt
- 82.75% obotcha::_ArrayList<obotcha::sp<obotcha::_WaitingTask> >::removeAt
- 82.68% std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::sp<obotcha::_WaitingTask> > >::erase
- 82.64% std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::sp<obotcha::_WaitingTask> > >::_M_erase
- 82.55% std::move<__gnu_cxx::__normal_iterator<obotcha::sp<obotcha::_WaitingTask>*, std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::sp<obotcha::_WaitingTask> > > >, _
- 82.52% std::__copy_move_a2<true, __gnu_cxx::__normal_iterator<obotcha::sp<obotcha::_WaitingTask>*, std::vector<obotcha::sp<obotcha::_WaitingTask>, std::allocator<obotcha::sp<obotcha::_Wa
- 82.47% std::__copy_move_a<true, obotcha::sp<obotcha::_WaitingTask>*, obotcha::sp<obotcha::_WaitingTask>*>
- 79.03% std::__copy_move<true, false, std::random_access_iterator_tag>::__copy_m<obotcha::sp<obotcha::_WaitingTask>*, obotcha::sp<obotcha::_WaitingTask>*>
- 70.22% obotcha::sp<obotcha::_WaitingTask>::operator=
+ 33.31% obotcha::Object::incStrong
+ 22.80% obotcha::Object::decStrong
1.72% std::__atomic_base<int>::operator--
0.79% std::__atomic_base<int>::operator--
1.79% obotcha::Object::incStrong
1.70% obotcha::Object::decStrong
1.34% std::move<obotcha::sp<obotcha::_WaitingTask>&>
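The two hot spots turn out to be incStrong and decStrong. After removeAt removes one item, every item behind it has to be shifted forward by one slot, and each shift goes through sp::operator=, which calls incStrong and decStrong and therefore updates the atomic reference counts. For 32K items the total work is on the order of 1 + 2 + 3 + ... + 32K, that is roughly n²/2 ≈ 5 × 10^8 element moves for n = 32K, which is a very large amount of computation.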
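The quadratic cost can be reproduced outside Obotcha with a small experiment. The sketch below is an illustration under assumptions, not Obotcha code: CountingHandle is a hypothetical stand-in for obotcha::sp<_WaitingTask> that merely counts assignments, and the vector is drained from the front, which is the worst case for erase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for obotcha::sp<_WaitingTask>: it only counts how
// often it is assigned, which is where incStrong/decStrong would run.
struct CountingHandle {
    static std::uint64_t assignments;
    int id = 0;

    CountingHandle() = default;
    explicit CountingHandle(int v) : id(v) {}
    CountingHandle(const CountingHandle&) = default;
    CountingHandle& operator=(const CountingHandle& other) {
        id = other.id;
        ++assignments;  // a ref-counted pointer would touch two counters here
        return *this;
    }
};

std::uint64_t CountingHandle::assignments = 0;

// Drains a vector of n handles by always erasing the first element and
// returns how many element assignments the erase calls performed.
std::uint64_t countFrontEraseAssignments(std::size_t n) {
    assert(n > 0);
    std::vector<CountingHandle> tasks;
    tasks.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        tasks.emplace_back(static_cast<int>(i));
    }
    CountingHandle::assignments = 0;  // count only the work done by erase
    while (!tasks.empty()) {
        tasks.erase(tasks.begin());   // shifts every remaining element
    }
    return CountingHandle::assignments;
}
```

Erasing the first of k elements shifts the other k - 1, so draining n elements costs n(n-1)/2 assignments. For n = 32 * 1024 that is 536,854,528 assignments, each of which would be an incStrong plus a decStrong in the real smart pointer.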
4. How to optimize
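My first idea was to use an RBTree, so that removing an item would not require moving the others. However, I could not find a matching interface in the STL (std::map and std::set are usually implemented as red-black trees, but the tree itself is not exposed). So I settled for the next best option and put the data into a HashMap, where erase does not move any items either.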
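Why a node-based container avoids the problem can be shown with the standard library alone. The sketch below is an assumption-laden illustration and not the change from the commit: it uses std::unordered_map and std::multimap in place of Obotcha's HashMap, and TrackedHandle is a hypothetical stand-in for obotcha::sp<_WaitingTask> that counts assignments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>

// Hypothetical stand-in for obotcha::sp<_WaitingTask> that counts
// assignments, i.e. the places where reference counts would be updated.
struct TrackedHandle {
    static std::uint64_t assignments;
    int id = 0;

    TrackedHandle() = default;
    explicit TrackedHandle(int v) : id(v) {}
    TrackedHandle(const TrackedHandle&) = default;
    TrackedHandle& operator=(const TrackedHandle& other) {
        id = other.id;
        ++assignments;
        return *this;
    }
};

std::uint64_t TrackedHandle::assignments = 0;

// Assumed container choices for illustration only.
using HashTasks = std::unordered_map<int, TrackedHandle>;
using TreeTasks = std::multimap<int, TrackedHandle>;

// Fills a node-based container with n handles, erases them one at a time,
// and returns how many element assignments the erase calls performed.
template <typename Container>
std::uint64_t countEraseAssignments(std::size_t n) {
    assert(n > 0);
    Container tasks;
    for (std::size_t i = 0; i < n; ++i) {
        const int key = static_cast<int>(i);
        tasks.emplace(key, TrackedHandle(key));
    }
    TrackedHandle::assignments = 0;  // count only the work done by erase
    while (!tasks.empty()) {
        tasks.erase(tasks.begin());  // unlinks one node, others stay put
    }
    return TrackedHandle::assignments;
}
```

Both countEraseAssignments<HashTasks>(n) and countEraseAssignments<TreeTasks>(n) return 0, because erase only destroys the removed node. A std::multimap keyed by deadline would additionally keep tasks ordered by time, which is a design option rather than what the commit does.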
For the related change, see:
https://github.com/wangsun1983/Obotcha/commit/44d067cdda087c1049d5aed8ae7d2fb6ab6f3ce3