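XGBoost4J is the Java binding for XGBoost. Its underlying implementation is C++, so computation is fast. However, the C++ interface is not thread-safe, which limits where it can be used. The XGBoost4J jar works around this by putting a lock directly on the model prediction method, so concurrent prediction calls on one Booster are serialized. Throughput is therefore low, and it is hard to meet the engineering requirement of fast computation. The method in question: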
private synchronized float[][] predict(DMatrix data,
                                       boolean outputMargin,
                                       int treeLimit,
                                       boolean predLeaf,
                                       boolean predContribs) throws XGBoostError {
  int optionMask = 0;
  if (outputMargin) {
    optionMask = 1;
  }
  if (predLeaf) {
    optionMask = 2;
  }
  if (predContribs) {
    optionMask = 4;
  }
  float[][] rawPredicts = new float[1][];
  XGBoostJNI.checkCall(XGBoostJNI.XGBoosterPredict(handle, data.getHandle(), optionMask,
          treeLimit, rawPredicts));
  int row = (int) data.rowNum();
  int col = rawPredicts[0].length / row;
  float[][] predicts = new float[row][col];
  int r, c;
  for (int i = 0; i < rawPredicts[0].length; i++) {
    r = i / col;
    c = i % col;
    predicts[r][c] = rawPredicts[0][i];
  }
  return predicts;
}
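To make the cost of the lock concrete, here is a minimal sketch that does not depend on XGBoost at all. `LockedPredictor` is a hypothetical stand-in for Booster, and the `Thread.sleep` call only simulates the time spent in the native call. It uses the same `synchronized` pattern as the method above and records how many callers are ever inside `predict` at the same moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockedPredictorDemo {

    // Hypothetical stand-in for Booster; not part of XGBoost4J.
    static class LockedPredictor {
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicInteger maxInFlight = new AtomicInteger();

        // Same locking pattern as the predict method above:
        // the monitor on `this` admits one caller at a time.
        synchronized float predict(float x) throws InterruptedException {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            Thread.sleep(5); // simulated native call (assumption, not real work)
            inFlight.decrementAndGet();
            return 2.0f * x;
        }

        int maxConcurrentCalls() {
            return maxInFlight.get();
        }
    }

    // Runs `threads` workers against one shared predictor and returns the
    // largest number of callers observed inside predict() simultaneously.
    static int runConcurrently(int threads, int callsPerThread) {
        LockedPredictor predictor = new LockedPredictor();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    predictor.predict(i);
                }
                return null; // makes the lambda a Callable, so checked exceptions are allowed
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return predictor.maxConcurrentCalls();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent calls: " + runConcurrently(4, 5));
    }
}
```

With four worker threads the observed maximum is still 1: however many threads call the shared instance, the work inside the method runs one call at a time, which is the bottleneck described above.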
To remove the lock and improve throughput, the C++ code has to be modified and recompiled.
Build environment: Mac, jdk1.8, cmake, GCC
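(1) Modifying and compiling the C++ code:
1. Overview of the xgboost C++ code
Booster: all external interfaces are in c_api; DMatrix-related code is in data.h
The prediction functions are all in p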