Converting Python algorithms to Scala with PyJava: PyJava is a library for exchanging data between Java/Scala and Python...

Latest recommended article published on 2023-12-11 13:42:53

我甜死了


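Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_35867979/article/details/112921413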

PyJava

This library is an ongoing effort to bring data exchange between Java/Scala and Python. PyJava uses Apache Arrow as the exchange format, which means we can avoid serialization/deserialization between Java/Scala and Python and make communication considerably faster than the traditional approach.

When you invoke Python code from the Java/Scala side, PyJava automatically starts Python workers, sends the data to them, and sends the results back once they have been processed. The Python workers are reused by default.
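The request/response loop with a reused worker can be pictured with a small stand-alone sketch. This is not PyJava's implementation: PyJava exchanges Arrow batches and its workers are started from the JVM, whereas this sketch uses line-delimited JSON over pipes purely for illustration. The names `ReusableWorker` and `WORKER_SOURCE` are hypothetical.

```python
import json
import subprocess
import sys

# Code run inside the long-lived worker: read one JSON row per line,
# transform it, and write the result back on stdout.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    row = json.loads(line)
    row["value1"] = row["value"] + "_suffix"
    sys.stdout.write(json.dumps(row) + "\n")
    sys.stdout.flush()
"""


class ReusableWorker:
    """Hypothetical helper: one Python process serving many requests."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def process(self, row):
        # Send one row to the worker and wait for its reply.
        self.proc.stdin.write(json.dumps(row) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        # Closing stdin ends the worker's read loop.
        self.proc.stdin.close()
        self.proc.wait()


worker = ReusableWorker()
first = worker.process({"value": "a"})
second = worker.process({"value": "b"})  # same process, no restart
still_running = worker.proc.poll() is None
worker.close()
```

Both calls are served by the same process, which is the point of reuse: interpreter startup and imports are paid once rather than per request.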

The initial code in this lib is from Apache Spark.

Install

Set up the Python (>= 3.6) environment (Conda is recommended):

pip uninstall pyjava && pip install pyjava

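Set up the Java environment (Maven is recommended):

<dependency>
    <groupId>tech.mlsql</groupId>
    <artifactId>pyjava-2.4_2.12</artifactId>
    <version>0.2.8.0</version>
</dependency>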

Using a Python code snippet to process data in Java/Scala

With PyJava, you can run any Python code in your Java/Scala application.

val envs = new util.HashMap[String, String]()
// prepare the Python environment
envs.put(str(PythonConf.PYTHON_ENV), "source activate dev && export ARROW_PRE_0_15_IPC_FORMAT=1 ")

// describe the data that will be transferred to Python
val sourceSchema = StructType(Seq(StructField("value", StringType)))

val batch = new ArrowPythonRunner(
  Seq(ChainedPythonFunctions(Seq(PythonFunction(
    """
      |import pandas as pd
      |import numpy as np
      |
      |def process():
      |    for item in context.fetch_once_as_rows():
      |        item["value1"] = item["value"] + "_suffix"
      |        yield item
      |
      |context.build_result(process())
    """.stripMargin, envs, "python", "3.6")))), sourceSc
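To try the logic of the embedded Python snippet outside the JVM, one option is to substitute a stand-in for `context`. The sketch below assumes rows arrive as dicts (inferred from the `item["value"]` access in the snippet). `FakeContext` is a hypothetical class written for this example and is not part of PyJava; only the method names `fetch_once_as_rows` and `build_result` come from the snippet.

```python
class FakeContext:
    """Hypothetical stand-in for the `context` object PyJava injects."""

    def __init__(self, rows):
        self._rows = rows
        self.result = None

    def fetch_once_as_rows(self):
        # Yield each input row as a dict (assumed row representation).
        for row in self._rows:
            yield dict(row)

    def build_result(self, items):
        # Materialize the generator so the output can be inspected.
        self.result = list(items)


context = FakeContext([{"value": "a"}, {"value": "b"}])


def process():
    # Same body as the snippet: add a derived column to every row.
    for item in context.fetch_once_as_rows():
        item["value1"] = item["value"] + "_suffix"
        yield item


context.build_result(process())
print(context.result)
```

Because `process()` is a generator, rows are transformed one at a time as `build_result` consumes them, so the stand-in never needs the whole input in a second copy.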
