Installing xgboost for Java and Python (macOS example)

I had used xgboost under Python before; now I wanted to get the Java version of xgboost running on my own machine (macOS). I ran into quite a few pitfalls along the way, so here are my notes.


1. Download the xgboost source

git clone --recursive https://github.com/dmlc/xgboost


2. Build xgboost

Check whether gcc and g++ are already on your machine (look under /usr/local/bin); if not, install them with Homebrew:
brew install gcc --without-multilib

(On recent Homebrew, plain `brew install gcc` is enough; the `--without-multilib` option has been removed.) This step takes a long time; it ran for about two hours on my machine.

Then verify the compilers are there:
ls /usr/local/bin/*

2.1 The official docs give two ways to build xgboost; on a Mac, one supports multi-threading and the other does not.

To build without multi-threading (no OpenMP needed):
cd xgboost
cp make/minimum.mk ./config.mk
make -j4

To build with multi-threading support (requires an OpenMP-capable compiler such as Homebrew's gcc):
cd xgboost
cp make/config.mk ./config.mk
make -j4

If gcc was installed successfully, the build should go through. If you instead see an error like clang: error: unsupported option '-fopenmp', point the build at the Homebrew compilers by adding the following to config.mk (my gcc is already at version 7; adjust the suffix to match your install):

export CC = /usr/local/bin/gcc-7
export CXX = /usr/local/bin/g++-7



3. Install the Python xgboost package:

cd python-package; sudo python setup.py install

It installed successfully on my machine and works normally. Test code:
import numpy as np
import xgboost as xgb

data = np.loadtxt('train.csv', delimiter=',',
                  converters={14: lambda x: int(x == '?'), 15: lambda x: int(x)})
sz = data.shape
np.random.shuffle(data)  # shuffle first, otherwise the test split may contain only one class
train = data[:int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7):, :]
train_X = train[:, 0:14]
train_Y = train[:, 15]
print(type(train_Y))
test_X = test[:, 0:14]
test_Y = test[:, 15]
xg_train = xgb.DMatrix(train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)
params = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'scale_pos_weight': 1,
    'eval_metric': 'auc',
    'gamma': 0.1,
    'max_depth': 8,
    'lambda': 550,
    'subsample': 0.7,
    'colsample_bytree': 0.4,
    'min_child_weight': 3,
    'eta': 0.02,
    'seed': 27,
    'nthread': 7,
}
watchlist = [(xg_train, 'train'), (xg_test, 'test')]
# note: early_stopping_rounds must be passed to xgb.train(), not put in params
xgboost_model = xgb.train(params, xg_train, num_boost_round=3000,
                          evals=watchlist, early_stopping_rounds=100)
xgboost_model.save_model('xgb.model')
pred = xgboost_model.predict(xg_test)
print(pred)




4. Build the Java (JVM) version of xgboost

cd jvm-packages
mvn package

If you hit an error like the following:

    Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (checkstyle)
    on project xgboost-jvm: Execution checkstyle of goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check
    failed: A required class was missing while executing org.scalastyle:scalastyle-maven-plugin:0.8.0:check: scala/xml/Node

then comment out the checkstyle plugin in the pom.xml under the jvm-packages directory:
    
    
    <!-- <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-checkstyle-plugin</artifactId>
      <version>2.17</version>
      <configuration>
        <configLocation>checkstyle.xml</configLocation>
        <failOnViolation>true</failOnViolation>
      </configuration>
      <executions>
        <execution>
          <id>checkstyle</id>
          <phase>validate</phase>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
    </plugin> -->


5. Change the Scala version for the build

Change it here, in the pom.xml properties:
    
    
    <properties>
      <spark.version>2.0.1</spark.version>
      <flink.suffix>_2.11</flink.suffix>
      <scala.version>2.10.6</scala.version>
      <scala.binary.version>2.10</scala.binary.version>
    </properties>


Then package with mvn clean install. Barring surprises, two jars will be generated under xgboost4j: a plain xgboost4j jar and a jar-with-dependencies. The plain jar additionally requires the two dependencies listed below.

6. The two jars xgboost depends on

    
    
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.4</version>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.2</version>
    </dependency>

7. Install the built jar into your local Maven repository

    
    
    mvn install:install-file -Dfile=xgboost4j-0.7-jar-with-dependencies.jar -DgroupId=ml.dmlc -DartifactId=xgboost4j -Dversion=0.7 -Dpackaging=jar


8. Add the xgboost4j dependency to your own Maven project and run a test


<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j</artifactId>
  <version>0.7</version>
</dependency>


package com.meituan.model.xgboost;

import java.util.HashMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Arrays;
import java.util.Map;
import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class PredictFirstNtree {
	private static String path = "/Users/shuubiasahi/Documents/workspace/xgboost/demo/data/";
	private static String trainString = "agaricus.txt.train";
	private static String testString = "agaricus.txt.test";

	public static void main(String[] args) throws XGBoostError {

		DMatrix trainMat = new DMatrix(path + trainString);
		DMatrix testMat = new DMatrix(path + testString);

		// specify parameters
		Map<String, Object> params = new HashMap<String, Object>();
		params.put("eta", 1.0);
		params.put("max_depth", 2);
		params.put("silent", 1);
		params.put("objective", "binary:logistic");

		// specify watchList
		HashMap<String, DMatrix> watches = new HashMap<String, DMatrix>();
		watches.put("train", trainMat);
		watches.put("test", testMat);

		// train a booster
		int round = 3;
		Booster booster = XGBoost.train(trainMat, params, round, watches, null,
				null);

		// predict using the first 2 trees
		float[][] leafindex = booster.predictLeaf(testMat, 2);
		for (float[] leafs : leafindex) {
			System.out.println(Arrays.toString(leafs));
		}

		// predict all trees
		leafindex = booster.predictLeaf(testMat, 0);
		for (float[] leafs : leafindex) {
			System.out.println(Arrays.toString(leafs));
		}

	}

}





Sample output:

[5.0, 4.0, 5.0]

[3.0, 3.0, 3.0]

[5.0, 4.0, 5.0]

[3.0, 3.0, 3.0]

[0] test-error:0.042831 train-error:0.046522

[1] test-error:0.021726 train-error:0.022263

[2] test-error:0.006207 train-error:0.007063



If it still doesn't work after reading all this, feel free to message me.


