Troubleshooting a slow XGBoost run (with notes on XGBoost tuning)
Environment:
Python3.7.4
xgboost-1.4.2
k8s 1.21.6
ENV OMP_NUM_THREADS=10
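Symptom:
Running XGBoost in a 10C / 10G container in K8s environment (1) took far longer than the same test on a local machine, roughly 550 times longer (the xgb means in the test data below differ by a factor of about 770).
Test data (times in seconds):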
trees 10
(K8s pod) 10C 10G
xgb Count: 10 Mean: 59.16686143875122 Std: 11.287767932928924
catboost Count: 10 Mean: 1.2580039262771605 Std: 0.24807552537562824
lgb Count: 10 Mean: 0.07808444499969483 Std: 0.20500893014009666
PC
xgb Count: 10 Mean: 0.0768716812133789 Std: 0.2131022629363367
catboost Count: 10 Mean: 0.023910140991210936 Std: 0.019320902641279603
lgb Count: 10 Mean: 0.015625333786010744 Std: 0.020853957416939885
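The script that produced these figures is not shown, so the following is only a minimal sketch of a timing harness that prints lines in the same "Count / Mean / Std" shape. The function names `benchmark` and `summarize` are hypothetical, the workload is a placeholder, and the use of the population standard deviation is an assumption about how Std was computed.

```python
import statistics
import time


def benchmark(fn, runs=10):
    """Call fn() `runs` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(name, durations):
    """Format one summary line in the same shape as the test data above."""
    mean = statistics.mean(durations)
    # Assumption: Std is the population standard deviation (as numpy.std would give).
    std = statistics.pstdev(durations)
    return f"{name} Count: {len(durations)} Mean: {mean} Std: {std}"


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with the model's fit call to time real training.
    print(summarize("demo", benchmark(lambda: sum(range(100_000)))))
```

Timing every run separately, rather than only the total, makes a slow first run visible, which would be one way a Std larger than the Mean could arise.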
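Investigation:
1. The K8s cluster nodes were not configured with exclusive CPUs, so the first guess was that the container did not own its cores and that frequent time-slicing on a busy node was hurting performance. A Pod was started on the k8s1 cluster to test this: while the algorithm was running, cluster CPU usage was not high, yet the task still took abnormally long.
2. Started a Docker container with exclusive CPUs (pinned with --cpuset-cpus); the run time was short.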
docker run --rm -it --cpus="10" -m="5G" --cpuset-cpus="0-9" registry.aps.datacanvas.com:5000/aps/module/base/baseimage-dl-cpu:6.2.0 bash
xgb Run 9: time 0.016218900680541992
xgb Count: 10 Mean: 0.32392058372497556 Std: 0.9691304524983718
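3. Started a task on the k8s2 cluster and timed the same algorithm; it ran quickly. Neither cluster had exclusive CPUs configured. The documentation shows that a Pod with request == limit falls into the Guaranteed QoS class; its containers get dedicated cores only when the kubelet also runs the static CPU manager policy and the CPU request is an integer, otherwise they still share cores under a CFS quota. Since both clusters were configured the same way and only one was slow, CPU exclusivity was ruled out as the cause (inside the Docker container from step 2, the thread count was also low).
4. Checking the process status inside the fast Pod showed a small number of threads, close to the number of CPU cores; inside the Pod on the slow cluster, the process had far more threads. Reading the source showed that the thread count can be set through n_jobs (in this version it defaults to one thread per CPU core visible to the process). With n_jobs set, the test ran at normal speed.
5. Test code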
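import time
import xgboost  # imported so the library is loaded while the process is inspected

if __name__ == '__main__':
    # Sleep long enough to read /proc/<pid>/status from outside the container.
    print("sleep start")
    time.sleep(15)
    print("sleep end")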
Verification (check the Threads field): pid=`docker top fad7c792ccf35b65ddd | grep test.py | awk '{print $2}'`; cat /proc/$pid/status
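The same check can be done from inside the process. The sketch below parses the Threads field of /proc/<pid>/status, which is the value compared between the fast and slow Pods in step 4. The helper names `thread_count` and `read_thread_count` are hypothetical, and the code assumes a Linux /proc filesystem.

```python
import os


def thread_count(status_text):
    """Return the integer on the 'Threads:' line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def read_thread_count(pid="self"):
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return thread_count(fh.read())


if __name__ == "__main__":
    # Only attempt the read where /proc is available.
    if os.path.exists("/proc/self/status"):
        print("threads in this process:", read_thread_count())
    print("CPUs visible to Python:", os.cpu_count())
```

Calling `read_thread_count()` once before and once after `import xgboost`, or during training, shows whether the thread count tracks the container's CPU limit or the node's core count.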
Key points:
1. n_jobs
2. OMP_NUM_THREADS
3. cat /proc/$pid/status
4. CPU exclusivity (pinning)
5. NUMA
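To tie the first two points together: a container's CPU limit is enforced as a cgroup quota, while a library that sizes its thread pool from the host core count may start many more threads than the quota allows. The sketch below derives a thread count from the cgroup quota so it can be passed as n_jobs. This is an illustration and not what the original fix did (that simply set n_jobs explicitly); the function names are hypothetical and the file paths in the comments are the usual cgroup v1 / v2 locations inside a container, stated here as an assumption about the environment.

```python
import math
import os


def cpus_from_cgroup_v1(quota_us, period_us):
    """CPU limit from cpu.cfs_quota_us and cpu.cfs_period_us; None if unlimited (quota -1)."""
    quota, period = int(quota_us), int(period_us)
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def cpus_from_cgroup_v2(cpu_max):
    """CPU limit from the content of cpu.max, e.g. '1000000 100000' or 'max 100000'."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def pick_n_jobs(limit, visible=None):
    """Choose a thread count no larger than the CPU limit or the visible core count."""
    visible = visible or os.cpu_count() or 1
    if limit is None:
        return visible
    return max(1, min(visible, math.floor(limit)))


# Typical locations (assumption, they vary by host setup):
#   cgroup v2: /sys/fs/cgroup/cpu.max
#   cgroup v1: /sys/fs/cgroup/cpu/cpu.cfs_quota_us and cpu.cfs_period_us
if __name__ == "__main__":
    # A 10-CPU limit on a 64-core node: quota 1000000 us per 100000 us period.
    limit = cpus_from_cgroup_v1("1000000", "100000")
    print("n_jobs to use:", pick_n_jobs(limit, visible=64))
```

The resulting value could then be passed on as, for example, `XGBClassifier(n_estimators=10, n_jobs=pick_n_jobs(limit))`.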
Solutions & reference links:
1. Set the n_jobs parameter: XGBClassifier(n_estimators=10, n_jobs=1)
https://cloud.tencent.com/developer/article/1773845
https://blog.csdn.net/zhou_438/article/details/108143617
https://datascience.stackexchange.com/questions/74253/what-needs-to-be-done-to-make-n-jobs-work-properly-on-sklearn-in-particular-on
https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html#joblib.Parallel
https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html
2. Enable exclusive CPUs
https://kubernetes.io/zh-cn/docs/tasks/administer-cluster/cpu-management-policies/
https://cloud.tencent.com/developer/article/1773846
3. Performance comparison
https://benchmark.laurae.design/speed_r_perf_analysis.html
https://medium.com/data-design/getting-the-most-of-xgboost-and-lightgbm-speed-compiler-cpu-pinning-374c38d82b86