Flink on YARN
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/resource-providers/yarn/
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#memory-configuration
Prepare a YARN cluster
Download and install Flink
$ wget https://archive.apache.org/dist/flink/flink-1.17.1/flink-1.17.1-bin-scala_2.12.tgz
$ tar -xzf flink-1.17.1-bin-scala_2.12.tgz
$ useradd flink
$ hdfs dfs -mkdir /user/flink
$ hdfs dfs -chown flink /user/flink
Session Mode
flink-1.17.1]$ export HADOOP_CLASSPATH=`hadoop classpath`
flink-1.17.1]$ bin/yarn-session.sh --detached
# These figures correspond to the following two settings in flink-conf.yaml
# jobmanager.memory.process.size: 1600m
# taskmanager.memory.process.size: 1728m
2023-08-09 16:32:06,449 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured JobManager memory is 1600 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 448 MB may not be used by Flink.
2023-08-09 16:32:06,449 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2023-08-09 16:32:31,211 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface datanode01.example.io:44386 of application 'application_1691567651792_0002'.
JobManager Web Interface: http://datanode01.example.io:44386
2023-08-09 16:32:31,676 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - The Flink YARN session cluster has been started in detached mode. In order to stop Flink gracefully, use the following command:
$ echo "stop" | ./bin/yarn-session.sh -id application_1691567651792_0002
If this should not be possible, then you can also kill Flink via YARN's web interface or via:
$ yarn application -kill application_1691567651792_0002
Note that killing Flink might not clean up all job artifacts and temporary files.
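The two INFO lines in the log above come from YARN rounding every container request up to a multiple of `yarn.scheduler.minimum-allocation-mb`; the difference is allocated but unusable by Flink. A minimal sketch of that rounding (`yarn_alloc` is a hypothetical helper for illustration, not a Flink or YARN command):

```shell
# Hypothetical helper: round a requested container size (MB) up to the next
# multiple of yarn.scheduler.minimum-allocation-mb, as YARN does.
yarn_alloc() {
  local requested_mb=$1 min_alloc_mb=$2
  echo $(( (requested_mb + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))
}

yarn_alloc 1600 1024   # JobManager:  2048 -> 448 MB not usable by Flink
yarn_alloc 1728 1024   # TaskManager: 2048 -> 320 MB not usable by Flink
```

To avoid the waste, set `jobmanager.memory.process.size` and `taskmanager.memory.process.size` to a multiple of the minimum allocation (e.g. 2048m).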
flink-1.17.1]$ bin/flink run -d ./examples/streaming/TopSpeedWindowing.jar
flink-1.17.1]$ echo "stop" | ./bin/yarn-session.sh -id application_1691567651792_0002
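Session sizes can also be set on the command line instead of flink-conf.yaml. A sketch using yarn-session.sh's sizing options (values are example assumptions; requires a running YARN cluster):

```shell
# -jm / -tm / -s map to jobmanager.memory.process.size,
# taskmanager.memory.process.size and taskmanager.numberOfTaskSlots.
bin/yarn-session.sh --detached -jm 2048m -tm 4096m -s 2
```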
Application Mode
flink-1.17.1]$ export HADOOP_CLASSPATH=`hadoop classpath`
flink-1.17.1]$ bin/flink run-application -t yarn-application -d ./examples/streaming/TopSpeedWindowing.jar
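In application mode the client uploads the Flink dist and the user jar on every submission. Per the Flink YARN docs, this can be avoided by pre-uploading them to HDFS and pointing `yarn.provided.lib.dirs` at that location. A sketch (the HDFS paths below are assumptions for illustration):

```shell
# One-time upload of the Flink distribution and the job jar (example paths)
hdfs dfs -mkdir -p /flink/dist /flink/jars
hdfs dfs -put lib/ plugins/ /flink/dist
hdfs dfs -put examples/streaming/TopSpeedWindowing.jar /flink/jars/

# Submit against the pre-uploaded artifacts instead of shipping them each time
bin/flink run-application -t yarn-application \
    -Dyarn.provided.lib.dirs="hdfs:///flink/dist/lib;hdfs:///flink/dist/plugins" \
    hdfs:///flink/jars/TopSpeedWindowing.jar
```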
Flink Table API
Hive Connector
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/hive/overview/
Download the bundled Hive jar into $FLINK_HOME/lib:
- flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar
flink-1.17.1]$ cp /opt/apache-hive-3.1.3-bin/lib/hive-exec-3.1.3.jar lib/
flink-1.17.1]$ cp /opt/apache-hive-3.1.3-bin/lib/antlr-runtime-3.5.2.jar lib/
flink-1.17.1]$ mv opt/flink-table-planner_2.12-1.17.1.jar lib/
flink-1.17.1]$ mv lib/flink-table-planner-loader-1.17.1.jar opt/
# Without this jar you get: ClassNotFoundException: org.apache.hadoop.mapred.JobConf
flink-1.17.1]$ cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar lib/
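The planner-jar swap above is easy to get wrong, so a quick check helps. `check_planner_swap` below is a hypothetical helper, not part of Flink:

```shell
# Hypothetical helper: returns 0 iff <libdir> is in the state the Hive
# connector needs -- real planner jar present, planner-loader jar moved out.
check_planner_swap() {
  local libdir=$1
  ls "$libdir" | grep -q '^flink-table-planner_' \
    && ! ls "$libdir" | grep -q '^flink-table-planner-loader'
}

# e.g. after the mv's above, from $FLINK_HOME:
#   check_planner_swap lib/ && echo "planner swap OK"
```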
JDBC Connector
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/jdbc/
Download the required dependency jars into $FLINK_HOME/lib:
- flink-connector-jdbc-3.1.0-1.17.jar
- mysql-connector-java-8.0.27.jar
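Both jars are published to Maven Central; the URLs below follow the standard repository layout (verify the versions match your setup before downloading):

```shell
cd $FLINK_HOME/lib
wget https://repo1.maven.org/maven2/org/apache/flink/flink-connector-jdbc/3.1.0-1.17/flink-connector-jdbc-3.1.0-1.17.jar
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.27/mysql-connector-java-8.0.27.jar
```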
SQL Client
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sqlclient/
flink-1.17.1]$ export HADOOP_CLASSPATH=`hadoop classpath`
flink-1.17.1]$ bin/yarn-session.sh --detached
flink-1.17.1]$ bin/sql-client.sh
SET 'sql-client.execution.result-mode' = 'tableau';
CREATE TABLE Orders (
  order_number BIGINT,
  price        DECIMAL(32,2),
  first_name   STRING,
  last_name    STRING,
  order_time   TIMESTAMP(3)
) WITH (
  'connector' = 'datagen',
  'number-of-rows' = '10'
);

CREATE TABLE MyOrders (
  order_number BIGINT,
  price        DECIMAL,
  first_name   STRING,
  last_name    STRING,
  order_time   TIMESTAMP
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://192.168.36.128:3306/demo',
  'table-name' = 'Orders',
  'username' = 'root',
  'password' = 'root'
);
INSERT INTO MyOrders SELECT * FROM Orders;
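The JDBC connector does not create the target table, so `Orders` must already exist on the MySQL side. A sketch of matching DDL, run through the mysql client with the credentials from the WITH clause (column types are assumptions chosen to line up with the Flink schema; Flink's plain DECIMAL defaults to DECIMAL(10,0)):

```shell
mysql -h 192.168.36.128 -u root -proot demo <<'SQL'
CREATE TABLE Orders (
  order_number BIGINT,
  price        DECIMAL(10,0),
  first_name   VARCHAR(255),
  last_name    VARCHAR(255),
  order_time   TIMESTAMP(6)
);
SQL
```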
CREATE CATALOG hive WITH (
  'type' = 'hive',
  'default-database' = 'default',
  'hive-conf-dir' = '/etc/hive/conf'
);
USE CATALOG hive;
SELECT * FROM Orders;
Flink HistoryServer
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/advanced/historyserver/
# Edit flink-conf.yaml and append the HistoryServer-related settings
flink-1.17.1]$ vi conf/flink-conf.yaml
jobmanager.archive.fs.dir: hdfs:///tmp/flink/completed-jobs
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
historyserver.archive.fs.dir: hdfs:///tmp/flink/completed-jobs
historyserver.archive.fs.refresh-interval: 10000
historyserver.log.jobmanager.url-pattern: http://edge.example.io:8082/<jobid>
historyserver.log.taskmanager.url-pattern: http://edge.example.io:8082/<jobid>/<tmid>
flink-1.17.1]$ export HADOOP_CLASSPATH=`hadoop classpath`
flink-1.17.1]$ bin/historyserver.sh start
Completed Flink jobs can now be browsed at http://edge.example.io:8082
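Once a job finishes and the JobManager has written its archive to jobmanager.archive.fs.dir, the setup can be verified from the command line (fragment; assumes the cluster above, and that the HistoryServer's REST API exposes the /jobs/overview endpoint):

```shell
# Archived job files should appear under the configured directory...
hdfs dfs -ls hdfs:///tmp/flink/completed-jobs
# ...and the HistoryServer serves them over REST as well as the web UI
curl http://edge.example.io:8082/jobs/overview
```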