Gateway for Flink SQL
Introduction
github: flink-sql-gateway
Flink SQL gateway is a service that allows other applications to easily interact with a Flink cluster through a REST API.
User applications (e.g. Java/Python/Shell program, Postman) can use the REST API to submit queries, cancel jobs, retrieve results, etc.
Flink JDBC driver enables JDBC clients to connect to Flink SQL gateway based on the REST API.
Currently, the REST API is a set of internal APIs, and we recommend that users interact with the gateway through the JDBC API. Flink SQL gateway currently stores session properties in memory; if the service stops or crashes, all properties are lost. We will improve this in the future.
This project is at an early stage. Feel free to file an issue if you run into any problems or have any suggestions.
Key Features
- Exposes a RESTful API so users can submit SQL, launch jobs, query job status, and so on
- Supports interactive use via Beeline together with the Flink JDBC driver
Areas for Improvement
- The session store is currently in-memory only; session state is not persisted
- A session's execution type (stream or batch) is fixed when the session is created
- Some endpoints are unfriendly and need polishing; e.g. the session heartbeat endpoint returns an exception stack trace for sessions that are closed or do not exist
- The catalog defaults to in-memory; integrating Hive Metastore or a custom catalog is worth considering
- The job store is tied to session management, so jobs cannot be accessed across sessions
- There is no tenant-related design such as namespaces
Architecture Overview
- Handlers: bound to the router to serve requests for different paths
- SessionManager: handles session-related events and provides the in-memory sessionStore
- Session: handles statements and other events within one interactive cycle; bound to an execType that decides whether it serves stream or batch requests. Embeds a sessionCtx that stores session-level state such as the tableEnv and flinkConfig
- catalogManager: interacts with catalogs to serve DDL-related requests
- catalog: defaults to inmemoryCatalog; can be switched to hiveMetastoreCatalog
- sqlCommandParser: parses the SQL to determine the statement type, which decides the operator type
- operationFactory: creates the concrete operator
- Operator: determines and executes the behavior for a statement; it may update the catalog directly, or build a pipeline (StreamGraph) for the programDeployer to deploy
- programDeployer: generates the JobGraph and submits it to the target platform according to the pipelineExecutor type
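The parse → operator → deploy flow above can be sketched very roughly in Python. This is an illustrative simplification, not the gateway's actual classes: the function name and regexes below are my own, but the statement types mirror the `statement_types` values the gateway returns in the examples later in this document.

```python
import re

# Hypothetical, simplified mirror of sqlCommandParser: classify a statement
# so the right kind of operator can be produced by the operation factory.
COMMAND_PATTERNS = [
    ("CREATE_TABLE", re.compile(r"^\s*CREATE\s+TABLE\b", re.IGNORECASE)),
    ("SHOW_TABLES", re.compile(r"^\s*SHOW\s+TABLES\b", re.IGNORECASE)),
    ("SELECT", re.compile(r"^\s*SELECT\b", re.IGNORECASE)),
]

def classify_statement(sql: str) -> str:
    """Return the statement type, mimicking how the gateway picks an operator.

    A CREATE_TABLE leads to a direct catalog update; a SELECT leads to a
    pipeline being built and handed to the program deployer.
    """
    for name, pattern in COMMAND_PATTERNS:
        if pattern.match(sql):
            return name
    return "UNKNOWN"
```

The real gateway recognizes many more statement kinds; the point here is only that the statement text alone decides which operator runs.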
Getting Started
- Download Flink 1.12
- Start a Flink cluster
- Build flink-sql-gateway, or download the flink-sql-gateway 1.12 release directly
- Download the required jars (e.g. kafka-clients, flink-connector-kafka) into a directory, referred to below as ref_dir
- Start the gateway: ./bin/sql-gateway.sh -l ref_dir
See the official documentation for detailed startup instructions.
Get gateway info
req: GET /v1/info
rsp:
{
"product_name": "Apache Flink",
"version": "1.12.2"
}
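Any HTTP client can call this endpoint; below is a minimal Python sketch using only the standard library. The host and port (8083) are assumptions — use whatever address your gateway is actually configured with.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8083"  # assumed default port; adjust to your deployment

def info_url(base_url: str) -> str:
    """Build the /v1/info endpoint URL from a base URL."""
    return f"{base_url.rstrip('/')}/v1/info"

def get_gateway_info(base_url: str = GATEWAY) -> dict:
    """GET /v1/info and return the parsed JSON payload,
    e.g. {"product_name": "Apache Flink", "version": "1.12.2"}."""
    with urllib.request.urlopen(info_url(base_url)) as rsp:
        return json.loads(rsp.read().decode("utf-8"))
```

Calling `get_gateway_info()` against a running gateway should return the same shape of payload as the rsp above.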
Create a session
req: POST /v1/sessions
rsp:
{
"session_id": "52a56a2c3b25932e9249807786b1595d"
}
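A session can be opened the same way from Python. The request body fields below (`planner`, `execution_type`) follow the flink-sql-gateway README; treat the exact field names as assumptions and check them against your gateway version. The gateway URL is likewise an assumption.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8083"  # assumed default port; adjust to your deployment

def parse_session_id(raw: str) -> str:
    """Pull session_id out of the POST /v1/sessions response body."""
    return json.loads(raw)["session_id"]

def open_session(base_url: str = GATEWAY, planner: str = "blink",
                 execution_type: str = "streaming") -> str:
    """POST /v1/sessions and return the new session_id.

    Note: as described above, the execution type is fixed for the
    lifetime of the session.
    """
    body = json.dumps({"planner": planner,
                       "execution_type": execution_type}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/sessions", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as rsp:
        return parse_session_id(rsp.read().decode("utf-8"))
```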
Execute a SQL statement
req: POST /v1/sessions/:session_id/statements
req_param: session_id = 52a56a2c3b25932e9249807786b1595d
Create a table
req_body:
{
"statement":"CREATE TABLE Orders (\n `user` BIGINT,\n product STRING,\n order_time TIMESTAMP(3)\n) WITH ( \n 'connector' = 'kafka',\n 'topic' = 'user_behavior',\n 'properties.bootstrap.servers' = 'localhost:9092',\n 'properties.group.id' = 'testGroup',\n 'scan.startup.mode' = 'latest-offset',\n 'format' = 'csv'\n)",
"execution_timeout":"10000"
}
rsp:
{
"results": [
{
"result_kind": "SUCCESS",
"columns": [
{
"name": "result",
"type": "VARCHAR(2)"
}
],
"data": [
[
"OK"
]
]
}
],
"statement_types": [
"CREATE_TABLE"
]
}
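The same POST can be issued from Python with the standard library. The helper names below are mine, and note that `execution_timeout` is sent as a string, matching the request bodies in this document.

```python
import json
import urllib.request

def build_statement_body(statement: str, timeout_ms: int = 10000) -> bytes:
    """JSON body for POST /v1/sessions/:session_id/statements."""
    return json.dumps({"statement": statement,
                       "execution_timeout": str(timeout_ms)}).encode("utf-8")

def submit_statement(base_url: str, session_id: str, statement: str,
                     timeout_ms: int = 10000) -> dict:
    """POST the statement under the given session and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/v1/sessions/{session_id}/statements",
        data=build_statement_body(statement, timeout_ms),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as rsp:
        return json.loads(rsp.read().decode("utf-8"))
```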
Show tables
req_body:
{
"statement":"show tables",
"execution_timeout":"10000"
}
rsp:
{
"results": [
{
"result_kind": "SUCCESS_WITH_CONTENT",
"columns": [
{
"name": "tables",
"type": "VARCHAR(6) NOT NULL"
}
],
"data": [
[
"Orders"
]
]
}
],
"statement_types": [
"SHOW_TABLES"
]
}
Run a query
req_body:
{
"statement":" select * from Orders",
"execution_timeout":"10000"
}
rsp:
{
"results": [
{
"result_kind": "SUCCESS_WITH_CONTENT",
"columns": [
{
"name": "job_id",
"type": "VARCHAR(32) NOT NULL"
}
],
"data": [
[
"e19ac03b71e7ec9768ffba72e89a10ad"
]
]
}
],
"statement_types": [
"SELECT"
]
}
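For a SELECT, the job id sits in the first data cell of the first result, as in the rsp above. A small helper to read it (assuming a single result with a single row, which is what the gateway returns for this case):

```python
def extract_job_id(rsp: dict) -> str:
    """Read the job id from a SELECT statement response.

    The gateway returns it as the only cell of the only data row,
    under a column named job_id.
    """
    return rsp["results"][0]["data"][0][0]
```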
Query job status
req: GET /v1/sessions/:session_id/jobs/:job_id/status
req_param:
session_id="52a56a2c3b25932e9249807786b1595d"
job_id="e19ac03b71e7ec9768ffba72e89a10ad"
rsp:
{
"status": "RUNNING"
}
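A typical pattern is to poll this endpoint until the job leaves RUNNING. A minimal sketch — the terminal-state set below is an assumption based on Flink's globally terminal job states; extend it if your cluster reports others (e.g. SUSPENDED):

```python
import json
import time
import urllib.request

# Assumed set of globally terminal Flink job states.
TERMINAL_STATES = {"FINISHED", "CANCELED", "FAILED"}

def is_terminal(status: str) -> bool:
    """True once the job can no longer change state."""
    return status in TERMINAL_STATES

def wait_for_job(base_url: str, session_id: str, job_id: str,
                 interval_s: float = 2.0) -> str:
    """Poll GET /v1/sessions/:session_id/jobs/:job_id/status until the job ends.

    Note: because the job store is tied to the session (see Areas for
    Improvement above), the status must be polled from the same session
    that submitted the job.
    """
    url = f"{base_url}/v1/sessions/{session_id}/jobs/{job_id}/status"
    while True:
        with urllib.request.urlopen(url) as rsp:
            status = json.loads(rsp.read().decode("utf-8"))["status"]
        if is_terminal(status):
            return status
        time.sleep(interval_s)
```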