Engineering Machine Learning Models

Henzox

Category: Machine Learning. Tags: Tensorflow, Java

Article link: https://blog.csdn.net/Henzox/article/details/82152129

Background

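Machine learning has pushed Python's popularity to a new peak, and most of us write machine learning code in Python. In real projects, however, the business side is more likely to run on Java. So how do you call a trained model, written in Python, from Java? This article gives a minimal, code-first demonstration of how a Java application can call a classification model trained with Tensorflow.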

Environment

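Oddly, correct examples of this are hard to find online. More precisely, there are few examples of calling from Java a model exported by DNNClassifier, one of Tensorflow's high-level classifiers. To avoid trouble, use the environment below if you want to reproduce the experiment.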

Python: 3.6
Tensorflow: 1.10.0
Gradle dependencies in the Java project:
compile 'org.tensorflow:tensorflow:1.10.0'
compile 'org.tensorflow:proto:1.10.0'

Code Walkthrough

The Python code is below. It uses the well-known iris dataset as the example and classifies it with DNNClassifier.

#
# Created on: 2018/8/23.
# Author:     Heng Xiangzhong
# 
# May the code be with you!
#

import pandas as pd
import tensorflow as tf


CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
data_train = pd.read_csv('./data/iris_training.csv', names=CSV_COLUMN_NAMES, header=0)
data_test = pd.read_csv('./data/iris_test.csv', names=CSV_COLUMN_NAMES, header=0)
#print(data_train.head())

x_train = data_train
y_train = data_train.pop('Species')
x_test = data_test
y_test = data_test.pop('Species')

feature_columns = []
for key in x_train.keys():
    feature_columns.append(tf.feature_column.numeric_column(key=key))
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns, hidden_units=[10, 10], n_classes=3)
# Used when exporting the model as a SavedModel
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)


def train_func(x, y):
    dataset = tf.data.Dataset.from_tensor_slices((dict(x), y))
    dataset = dataset.shuffle(1000).repeat().batch(100)
    return dataset


classifier.train(input_fn=lambda: train_func(x_train, y_train), steps=1000)
# Escape the backslash explicitly; "\m" is not a valid escape sequence
export_dir = classifier.export_savedmodel(".\\model", serving_input_receiver_fn)
print(export_dir)

def eval_input_func(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)
    return dataset


predict_arr = []
predictions = classifier.predict(input_fn=lambda: eval_input_func(x_test, labels=y_test, batch_size=100))
for predict in predictions:
    predict_arr.append(predict['probabilities'].argmax())
# Element-wise comparison of the predicted class ids with the true labels
result = predict_arr == y_test
result1 = [w for w in result if w]
print("Accuracy: %s" % str(len(result1) / len(result)))


# To load the SavedModel from Python instead, use the approach below
def savedmodel_predict_demo():
    with tf.Session(graph=tf.Graph()) as sess:
        predictor_func = tf.contrib.predictor.from_saved_model(
            ".\\model\\1535446427\\")

        feature = {}
        feature["SepalLength"] = tf.train.Feature(float_list=tf.train.FloatList(value=[6.4]))
        feature["SepalWidth"] = tf.train.Feature(float_list=tf.train.FloatList(value=[2.8]))
        feature["PetalLength"] = tf.train.Feature(float_list=tf.train.FloatList(value=[5.6]))
        feature["PetalWidth"] = tf.train.Feature(float_list=tf.train.FloatList(value=[2.2]))
        examples = []
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        examples.append(example.SerializeToString())

        result = predictor_func({"inputs": examples})

        print(result)

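The key API here is classifier.export_savedmodel, which saves the model in pb (protobuf) format under the given directory. This format is easy to move between languages and is suited to production use. In the snippet above, the savedmodel_predict_demo function shows how to load a model saved this way from Python and run a prediction.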
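Each call to export_savedmodel writes into a new timestamp-named subdirectory, which is why the demos here hard-code a path such as model\1535446427. A minimal sketch of picking the newest export instead, using only the standard library; the helper name latest_export_dir is hypothetical and not part of Tensorflow:

```python
import os
import tempfile


def latest_export_dir(base_dir):
    """Return the path of the newest timestamp-named subdirectory of base_dir."""
    # Keep only subdirectories whose names are all digits (e.g. "1535446427").
    candidates = [
        name for name in os.listdir(base_dir)
        if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))
    ]
    if not candidates:
        raise FileNotFoundError("no timestamped export found in %r" % base_dir)
    # Compare numerically so that a longer timestamp always counts as newer.
    return os.path.join(base_dir, max(candidates, key=int))


if __name__ == "__main__":
    # Demo on a throwaway directory tree rather than a real export.
    with tempfile.TemporaryDirectory() as base:
        for name in ("1535446427", "1535440000", "not-an-export"):
            os.mkdir(os.path.join(base, name))
        print(os.path.basename(latest_export_dir(base)))
```

The returned path could then be passed to the loader in place of the hard-coded directory.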
The Java code below loads the same model and runs a prediction.

package org.henzox.test;

/*
 * Created on: 2018/8/28.
 * Author:     Heng Xiangzhong
 *
 * May the code be with you!
 */

import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.example.*;

public class LoadModelDemo {

    public static void main(String[] args) {

        Session session = SavedModelBundle.load(".\\model\\1535446427\\", "serve").session();

        Features.Builder builder = Features.newBuilder();
        builder.putFeature("SepalLength", Feature.newBuilder().setFloatList(FloatList.newBuilder().addValue(6.4f).build()).build());
        builder.putFeature("SepalWidth", Feature.newBuilder().setFloatList(FloatList.newBuilder().addValue(2.8f).build()).build());
        builder.putFeature("PetalLength", Feature.newBuilder().setFloatList(FloatList.newBuilder().addValue(5.6f).build()).build());
        builder.putFeature("PetalWidth", Feature.newBuilder().setFloatList(FloatList.newBuilder().addValue(2.2f).build()).build());

        Example.Builder exampleBuilder = Example.newBuilder().setFeatures(builder.build());

        byte[] str = exampleBuilder.build().toByteArray();
        byte[][] input = new byte[][]{str};
        Tensor<?> x = Tensor.create(input);

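        float[][] y = session.runner().feed("input_example_tensor", x).fetch("dnn/head/predictions/probabilities").run().get(0).copyTo(new float[1][3]);
        // Print the three class probabilities for the single input example
        System.out.println(java.util.Arrays.toString(y[0]));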
    }

}

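Compare the snippet above with the Python way of loading the model: the steps are the same. Note that the Tensor is built from byte arrays, and that every value handed to the model has first been serialized with protobuf.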

Afterword

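The code contains some hard-coded strings, such as "input_example_tensor" and "dnn/head/predictions/probabilities". They come from the output of the command below, where they appear as tensor names with a ":0" suffix (the index of the operation's output):
*** model\1535446427>saved_model_cli show --dir ./ --all
The output is: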

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['classification']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 3)
        name: dnn/head/Tile:0
    outputs['scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: dnn/head/predictions/probabilities:0
  Method name is: tensorflow/serving/classify

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['class_ids'] tensor_info:
        dtype: DT_INT64
        shape: (-1, 1)
        name: dnn/head/predictions/ExpandDims:0
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 1)
        name: dnn/head/predictions/str_classes:0
    outputs['logits'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: dnn/logits/BiasAdd:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: dnn/head/predictions/probabilities:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 3)
        name: dnn/head/Tile:0
    outputs['scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: dnn/head/predictions/probabilities:0
  Method name is: tensorflow/serving/classify

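My understanding is that a model is essentially a set of method declarations, each with a name and a calling convention. When an experiment fails, you can follow this line of thinking to track down where the problem is.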
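When troubleshooting along these lines, it can help to pull the tensor names out of the listing instead of copying them by hand. The sketch below parses text in the layout shown above with regular expressions. SAMPLE is a shortened copy of the 'predict' signature from that output. The parse_signatures helper is hypothetical and assumes the CLI keeps this exact text layout.

```python
import re

SAMPLE = """\
signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_example_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 3)
        name: dnn/head/predictions/probabilities:0
  Method name is: tensorflow/serving/predict
"""


def parse_signatures(text):
    """Map signature name -> {'inputs': {key: tensor}, 'outputs': {key: tensor}}."""
    signatures = {}
    current_sig = None
    current_slot = None  # (kind, key) of the tensor_info block being read
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"signature_def\['([^']+)'\]:", line)
        if m:
            current_sig = signatures.setdefault(
                m.group(1), {"inputs": {}, "outputs": {}})
            continue
        m = re.match(r"(inputs|outputs)\['([^']+)'\] tensor_info:", line)
        if m:
            current_slot = (m.group(1), m.group(2))
            continue
        # "Method name is: ..." does not match because re.match anchors at the start.
        m = re.match(r"name: (\S+)", line)
        if m and current_sig is not None and current_slot is not None:
            kind, key = current_slot
            current_sig[kind][key] = m.group(1)
            current_slot = None
    return signatures


if __name__ == "__main__":
    for sig, io in parse_signatures(SAMPLE).items():
        print(sig, io["inputs"], io["outputs"])
```

The names recovered this way are the ones fed to and fetched from the session in the Java demo.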
I hope this saves others from falling into the same pits.
