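1. Environment
The tests below were run on CentOS 7.8 with Flink 1.14, JDK 1.8, Python 3.8, and PyFlink 1.14. This article covers only the installation of Python 3.8 and PyFlink 1.14; look up the setup of the other components separately.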
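2. Installing Python 3.8
- Python 3.8 installation package. Download link: Baidu Netdisk, extraction code: f25h
Upload the package to the server and extract it (use the actual archive file name, including its extension):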
tar -zxvf Python-3.8.0
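Prerequisites
Install GCC, which is needed to build Python 3.8:
yum -y install gcc
Install the libraries that Python 3 needs (answer y to every prompt):
yum install openssl-devel bzip2-devel expat-devel gdbm-devel
yum install readline-devel sqlite*-devel mysql-devel libffi-devel
From the extracted Python directory, run the configure, build, and install steps in order: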
sudo ./configure
sudo make
sudo make install
Check the Python installation
If the following command prints version information, the installation succeeded.
python3 --version
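Because Python was compiled from source, a version string alone does not show whether the optional standard-library modules were built against the development packages installed earlier. The following is a minimal sketch of such a check; the helper name `missing_modules` and the module list are illustrative assumptions, not part of the original procedure.

```python
import importlib

# Standard-library modules that are only built when the matching -devel
# package (openssl, bzip2, gdbm, readline, sqlite, libffi) was present.
OPTIONAL_MODULES = ["ssl", "bz2", "dbm.gnu", "readline", "sqlite3", "ctypes"]


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_MODULES)
    if missing:
        print("Not built: " + ", ".join(missing))
    else:
        print("All optional modules are available")
```

Run it with python3. If `ssl` is reported as missing, pip will not be able to download packages over HTTPS, so install the development package and rebuild before continuing.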
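Check the system default version; on CentOS 7 the default python is usually 2.7:
python --version
This does not match the version PyFlink requires, so create a symbolic link that makes python resolve to 3.8 by default.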
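Create the symbolic link
Back up the existing python in /usr/bin first; otherwise ln reports that python already exists. Note that yum on CentOS 7 runs on the system Python 2, so its scripts must reference python2 explicitly after this change.
mv /usr/bin/python /usr/bin/python_bak
ln -s /usr/local/bin/python3 /usr/bin/python # python now points to python3; the bin path depends on your environment and is not necessarily under the extracted package
If python --version now reports 3.8, the Python installation is complete.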
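The version comparison above can also be scripted. This is a small sketch under the assumption that PyFlink 1.14 supports Python 3.6, 3.7, and 3.8; the helper name `is_supported` is hypothetical. It avoids f-strings on purpose so that it also runs under the system Python 2.7.

```python
import sys

# Assumption: PyFlink 1.14 supports Python 3.6, 3.7 and 3.8.
SUPPORTED_MINORS = {(3, 6), (3, 7), (3, 8)}


def is_supported(version_info):
    """Return True if the (major, minor, ...) tuple is a supported version."""
    return tuple(version_info[:2]) in SUPPORTED_MINORS


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported(sys.version_info) else "NOT supported"
    print("Python {}: {}".format(current, status))
```

Running the script with python before and after creating the symbolic link shows whether the default interpreter has actually changed.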
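3. Installing PyFlink 1.14
When pip installs Python libraries, the default index server is hosted overseas, so downloads can take a long time. This article recommends the Tsinghua mirror.
Run the following commands in order to switch permanently to the Tsinghua mirror: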
pip3 install pip -U
pip3 config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# "pip3 install pip -U" upgrades pip itself.
Install PyFlink 1.14
python -m pip install apache-flink==1.14.4
If no error is reported, the installation succeeded.
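A missing error message is a weak signal, for example when pip installed into a different interpreter than the one Flink will use. As a hedged sketch, the installed version can be read back through the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is an assumption made for illustration.

```python
from importlib import metadata  # in the standard library from Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("apache-flink")
    if version is None:
        print("apache-flink is not installed for this interpreter")
    else:
        print("apache-flink " + version)
```

Run it with the same interpreter path that the Flink job configures as its Python executable (/usr/local/bin/python3 in this article).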
4. Test case
Write the Flink job in Java as follows:
package com.xw.flink;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.python.PythonOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class TestPython {
    public static void main(String[] args) throws Exception {
        // StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment("172.16.100.9",8081);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // EnvironmentSettings build = EnvironmentSettings.newInstance().inStreamingMode().useBlinkPlanner().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        Configuration configuration = tableEnv.getConfig().getConfiguration();
        // Set the job name
        configuration.setString("pipeline.name", "dddddd");
        // Location of the Python UDF file on the server
        configuration.setString(PythonOptions.PYTHON_FILES, "/usr/local/phfile/test1.py");
        // Python interpreter used on the client side and for executing the UDF
        configuration.setString(PythonOptions.PYTHON_CLIENT_EXECUTABLE, "/usr/local/bin/python3");
        configuration.setString(PythonOptions.PYTHON_EXECUTABLE, "/usr/local/bin/python3");
        // Register the Python function test1.FunctionName under the SQL name FunctionName
        tableEnv.executeSql("CREATE TEMPORARY SYSTEM FUNCTION FunctionName AS 'test1.FunctionName' LANGUAGE PYTHON");
        String sourceDDL = "CREATE TABLE table_source(" +
                "column_name INT, " +
                "vonvon STRING, " +
                "bbm DOUBLE )" +
                "WITH(" +
                " 'connector' = 'jdbc', " +
                " 'driver'='com.mysql.cj.jdbc.Driver', " +
                " 'url'='jdbc:mysql://172.16.0.68:3306/titandb?rewriteBatchedStatements=true&useUnicode=true&characterEncoding=utf8&autoReconnect=true&useSSL=false&zeroDateTimeBehavior=convertToNull&allowMultiQueries=true&serverTimezone=GMT%2B8', " +
                " 'table-name'='abcd', " +
                " 'username'='root', " +
                " 'password'='root'," +
                " 'sink.buffer-flush.max-rows' = '20000'," +
                " 'sink.buffer-flush.interval' = '3000'" +
                ")";
        // Create the source table
        tableEnv.executeSql(sourceDDL);
        String tableSink = "CREATE TABLE table_sink (" +
                " a INT," +
                "b STRING," +
                "c DOUBLE" +
                ") WITH (" +
                " 'connector'='print')";
        System.out.println(tableSink);
        // Create the sink table
        tableEnv.executeSql(tableSink);
        // The Python user-defined function is applied to the vonvon column here
        String inserttable = "insert into table_sink(a,b,c) select column_name,FunctionName(vonvon),bbm from table_source";
        tableEnv.executeSql(inserttable);
        // env.execute();
    }
}
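In the job above, the identifier after AS in the CREATE FUNCTION statement is the module-qualified name of the Python function: test1 is the module name, taken from the file test1.py registered through the python.files option, and FunctionName is the object defined in that file. The sketch below derives the statement from the file path to make that relationship explicit; the helper `python_udf_ddl` is hypothetical and not part of Flink.

```python
import os


def python_udf_ddl(sql_name, python_file, function_name):
    """Build the CREATE FUNCTION statement for a UDF defined in `python_file`."""
    # "/usr/local/phfile/test1.py" -> module name "test1"
    module_name = os.path.splitext(os.path.basename(python_file))[0]
    return (
        "CREATE TEMPORARY SYSTEM FUNCTION {} AS '{}.{}' LANGUAGE PYTHON"
        .format(sql_name, module_name, function_name)
    )


if __name__ == "__main__":
    print(python_udf_ddl("FunctionName", "/usr/local/phfile/test1.py", "FunctionName"))
```

If the file is renamed, both the python.files path and the module part of the function name have to change together.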
The Python user-defined function (test1.py):
#!/usr/bin/env python3
from pyflink.table import DataTypes
from pyflink.table.udf import udf


@udf(input_types=DataTypes.STRING(), result_type=DataTypes.STRING())
def FunctionName(value):
    # Append the fixed suffix "|123" to the incoming string value
    return f"{value}|123"
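After packaging, submit the job through the Flink web UI and run it; the output appears in the TaskManager's stdout. Adjust the example code to your own setup, and make sure Flink's lib directory contains the JARs the job needs, such as the Flink Python, JDBC connector, and MySQL driver JARs used here. The Table API can also call Python UDFs; this has been verified.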
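Before submitting to a cluster, it can help to test the UDF's logic without starting Flink. One possible approach, sketched here as an assumption rather than the author's method, is to keep the logic in a plain function (named `append_suffix` here for illustration) and wrap it with `udf` only when PyFlink is importable.

```python
def append_suffix(value):
    """Plain-Python core of the UDF: append the fixed suffix "|123"."""
    return f"{value}|123"


try:
    # Wrap the plain function only when PyFlink is importable.
    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    FunctionName = udf(append_suffix, result_type=DataTypes.STRING())
except ImportError:
    FunctionName = None  # PyFlink is absent; the plain function still works


if __name__ == "__main__":
    print(append_suffix("abc"))  # prints "abc|123"
```

The plain function can then be covered by ordinary unit tests, while the job still registers FunctionName by its module-qualified name.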