I have used xgboost under Python before; now I wanted to get the Java version of xgboost working on my own machine (macOS). I hit quite a few pitfalls along the way, so I am writing them down here.
1. Download the xgboost source
git clone --recursive https://github.com/dmlc/xgboost
2. Build xgboost
First check whether gcc and g++ are already installed on your machine; if not, install them with Homebrew:
brew install gcc --without-multilib
This step takes a long time; on my machine it ran for about two hours.
Then confirm the new compilers are there:
ls /usr/local/bin/*
2.1 The official docs give two ways to build xgboost on a Mac: one supports multithreading and the other does not.
To build without multithreading support:
cd xgboost
cp make/minimum.mk ./config.mk
make -j4
To build with multithreading support (this needs the gcc installed above):
cd xgboost
cp make/config.mk ./config.mk
make -j4
If gcc installed successfully, the build should go through cleanly. If instead you get an error like clang: error: unsupported option '-fopenmp',
then point the build at the Homebrew gcc by adding these lines to config.mk (my gcc is at version 7):
export CC = /usr/local/bin/gcc-7
export CXX = /usr/local/bin/g++-7
3. Install the Python xgboost package:
cd python-package; sudo python setup.py install
On my machine this succeeded and the package works. Test code:
import numpy as np
import xgboost as xgb
data = np.loadtxt('train.csv', delimiter=',',converters={14: lambda x:int(x == '?'), 15: lambda x:int(x) } )
sz = data.shape
np.random.shuffle(data)  # shuffle the data, otherwise the test split may be all zeros
train = data[:int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7):, :]
train_X = train[:,0:14]
train_Y = train[:, 15]
print(type(train_Y))
test_X = test[:,0:14]
test_Y = test[:, 15]
xg_train = xgb.DMatrix( train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)
params = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'scale_pos_weight': 1,
    'eval_metric': 'auc',
    'gamma': 0.1,
    'max_depth': 8,
    'lambda': 550,
    'subsample': 0.7,
    'colsample_bytree': 0.4,
    'min_child_weight': 3,
    'eta': 0.02,
    'seed': 27,
    'nthread': 7,
}
watchlist = [(xg_train, 'train'), (xg_test, 'test')]
# early_stopping_rounds is an argument of train(), not a params entry
xgboost_model = xgb.train(params, xg_train, num_boost_round=3000,
                          evals=watchlist, early_stopping_rounds=100)
xgboost_model.save_model('xgb.model')
pred= xgboost_model.predict(xg_test)
print(pred)
4. Build the Java version of xgboost
cd jvm-packages
mvn package
If the build fails with an error like:
Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (checkstyle)
on project xgboost-jvm: Execution checkstyle of goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check
failed: A required class was missing while executing org.scalastyle:scalastyle-maven-plugin:0.8.0:check: scala/xml/Node
then comment out the style-check plugin in the pom.xml under jvm-packages, like so:
<!-- <plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
<version>2.17</version>
<configuration>
<configLocation>checkstyle.xml</configLocation>
<failOnViolation>true</failOnViolation>
</configuration>
<executions>
<execution>
<id>checkstyle</id>
<phase>validate</phase>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
</plugin> -->
5. Changing the Scala version used for the build
Change it in the properties block of the same pom.xml:
<properties>
<spark.version>2.0.1</spark.version>
<flink.suffix>_2.11</flink.suffix>
<scala.version>2.10.6</scala.version>
<scala.binary.version>2.10</scala.binary.version>
</properties>
Then package with mvn clean install. If all goes well, two jars are generated under xgboost4j: a plain xgboost4j jar and a jar-with-dependencies.
6. The two libraries xgboost4j depends on
If you use the plain jar, add these two dependencies to your project yourself:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.4</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
7. Install the built jar into your local maven repository
mvn install:install-file -Dfile=xgboost4j-0.7-jar-with-dependencies.jar -DgroupId=ml.dmlc -DartifactId=xgboost4j -Dversion=0.7 -Dpackaging=jar
8. Add the xgboost4j dependency to your own maven project and run a test
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>0.7</version>
</dependency>
package com.meituan.model.xgboost;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class PredictFirstNtree {

    private static String path = "/Users/shuubiasahi/Documents/workspace/xgboost/demo/data/";
    private static String trainString = "agaricus.txt.train";
    private static String testString = "agaricus.txt.test";

    public static void main(String[] args) throws XGBoostError {
        DMatrix trainMat = new DMatrix(path + trainString);
        DMatrix testMat = new DMatrix(path + testString);

        // specify parameters
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("eta", 1.0);
        params.put("max_depth", 2);
        params.put("silent", 1);
        params.put("objective", "binary:logistic");

        // specify watchList
        HashMap<String, DMatrix> watches = new HashMap<String, DMatrix>();
        watches.put("train", trainMat);
        watches.put("test", testMat);

        // train a booster
        int round = 3;
        Booster booster = XGBoost.train(trainMat, params, round, watches, null, null);

        // predict leaf indices using only the first 2 trees
        float[][] leafindex = booster.predictLeaf(testMat, 2);
        for (float[] leafs : leafindex) {
            System.out.println(Arrays.toString(leafs));
        }

        // predict leaf indices using all trees
        leafindex = booster.predictLeaf(testMat, 0);
        for (float[] leafs : leafindex) {
            System.out.println(Arrays.toString(leafs));
        }
    }
}
Sample output (truncated), leaf indices followed by the training eval log:
[5.0, 4.0, 5.0]
[3.0, 3.0, 3.0]
[5.0, 4.0, 5.0]
[3.0, 3.0, 3.0]
[0] test-error:0.042831 train-error:0.046522
[1] test-error:0.021726 train-error:0.022263
[2] test-error:0.006207 train-error:0.007063
If you have read all of this and still can't get it working, feel free to message me.