Flink Learning - 3. Flink Installation and Deployment
Downloads
Download from the official site: https://flink.apache.org/downloads.html
Requirements
- Java 8 or later
java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
Local
Extract the tarball:
cd ~/Downloads
tar -zxvf flink-1.9.1-bin-scala_2.12.tgz
Start the cluster:
cd flink-1.9.1
./bin/start-cluster.sh
On success you should see output like:
Starting cluster.
Starting standalonesession daemon on host Jerome-zimu.local.
Starting taskexecutor daemon on host Jerome-zimu.local.
Web UI
Open http://localhost:8081 in a browser to reach the Flink dashboard.
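Besides the browser, the same port serves Flink's REST API, so the dashboard's availability can be scripted. A minimal liveness check, assuming the default port 8081:

```shell
# Probe Flink's REST API /overview endpoint (default port 8081).
# STATUS becomes "up" only if the endpoint answers with a 2xx response.
STATUS=$(curl -sf http://localhost:8081/overview >/dev/null 2>&1 && echo up || echo down)
echo "Flink web UI is $STATUS"
```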
You can also inspect the execution logs under the log directory:
$ tail log/flink-*-standalonesession-*.log
INFO ... - Rest endpoint listening at localhost:8081
INFO ... - http://localhost:8081 was granted leadership ...
INFO ... - Web frontend listening at http://localhost:8081.
INFO ... - Starting RPC endpoint for StandaloneResourceManager at akka://flink/user/resourcemanager .
INFO ... - Starting RPC endpoint for StandaloneDispatcher at akka://flink/user/dispatcher .
INFO ... - ResourceManager akka.tcp://flink@localhost:6123/user/resourcemanager was granted leadership ...
INFO ... - Starting the SlotManager.
INFO ... - Dispatcher akka.tcp://flink@localhost:6123/user/dispatcher was granted leadership ...
INFO ... - Recovering all persisted jobs.
INFO ... - Registering TaskManager ... at ResourceManager
Stop the cluster:
./bin/stop-cluster.sh
Standalone Cluster
Edit the configuration files
- flink-conf.yaml
vim flink-conf.yaml
Edit the following important parameters:
jobmanager.rpc.address: 10.0.0.1  # set to your master node's IP address
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
taskmanager.tmp.dirs: /tmp
jobmanager.web.port: 8081  # web UI port
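Note that Flink's configuration parser expects the "key: value" form, with a space after the colon. A small sketch of writing such a file and sanity-checking its format (all values are placeholders, and the path assumes you are inside the Flink directory):

```shell
# Write a minimal flink-conf.yaml fragment; all values are placeholders.
mkdir -p conf
cat > conf/flink-conf.yaml <<'EOF'
jobmanager.rpc.address: 10.0.0.1
jobmanager.rpc.port: 6123
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
EOF
# Count well-formed "key: value" lines; an entry missing the space after
# the colon would not match this pattern.
OK_LINES=$(grep -cE '^[^#:]+: ' conf/flink-conf.yaml)
echo "$OK_LINES well-formed entries"
```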
- slaves
vim slaves
List the IP addresses (or hostnames) of the worker nodes in conf/slaves, one per line.
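For example, a two-worker slaves file could be generated like this (the IPs are placeholders for your own workers):

```shell
# Each line of conf/slaves names one worker node; start-cluster.sh will SSH
# to every host listed here and launch a TaskManager on it.
mkdir -p conf
cat > conf/slaves <<'EOF'
10.0.0.2
10.0.0.3
EOF
wc -l < conf/slaves
```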
- Set the JAVA_HOME environment variable, or set env.java.home to the JDK path.
- Copy the Flink directory to every worker node.
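The copy step can be scripted; a sketch with rsync, assuming the hypothetical hostnames and install path below (DRY_RUN=1 only prints the commands instead of executing them):

```shell
# Hypothetical worker hosts and install path - adjust for your cluster.
WORKERS="10.0.0.2 10.0.0.3"
FLINK_DIR=/data/flink-1.9.1
DRY_RUN=1   # set to 0 to actually copy
for host in $WORKERS; do
  # rsync preserves permissions and only transfers changed files on re-runs
  CMD="rsync -az $FLINK_DIR/ $host:$FLINK_DIR/"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$CMD"   # preview only
  else
    $CMD
  fi
done
```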
Start
bin/start-cluster.sh
Stop
bin/stop-cluster.sh
Add a JobManager to the cluster
bin/jobmanager.sh ((start|start-foreground) [host] [webui-port])|stop|stop-all
Add a TaskManager
bin/taskmanager.sh start|start-foreground|stop|stop-all
YARN
Requires Hadoop to be properly installed on the cluster, with the relevant environment variables configured:
export SCALA_HOME=/data/scala
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/data/flink/latest
yarn-session
Start a long-running Flink cluster (session) on YARN:
bin/yarn-session.sh -n 4 -jm 1024 -tm 2048 -s 2
The command above requests:
- a JobManager with 1 GB of memory (-jm 1024)
- 4 TaskManagers (-n 4), each with 2 GB of memory (-tm 2048)
- 2 task slots per TaskManager (-s 2)
The physical and environment information of the launched cluster can be seen in Flink's web UI.
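The resource arithmetic implied by those flags is worth spelling out, since the total slot count caps the parallelism the session can run:

```shell
# 4 TaskManagers x 2 slots each = 8 slots; 4 x 2048 MB = 8192 MB of TM memory.
TM_COUNT=4; TM_MEM_MB=2048; SLOTS_PER_TM=2
TOTAL_SLOTS=$((TM_COUNT * SLOTS_PER_TM))   # upper bound on total job parallelism
TOTAL_TM_MEM=$((TM_COUNT * TM_MEM_MB))     # TaskManager memory requested from YARN
echo "slots=$TOTAL_SLOTS tm_mem=${TOTAL_TM_MEM}MB"
```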
- Within this session, Flink jobs can be submitted with the flink run command:
flink run -m hnode4:34707 /data/flink/latest/examples/batch/WordCount.jar --input hdfs:///tmp/yarn/photo_test.csv
Run on YARN
flink run \
-m yarn-cluster \
-yqu test \
-yD name=hadoop \
-p 2 \
-yjm 1024 \
-c com.jerome.zimu.WordCount \
-ynm Flink_WordCount_Test \
-yD HADOOP_HOME=/usr/local/service/hadoop \
/data/jerome/run_jar/jerome-zimu-0.1.jar --bootstrapServers local:9092 --sinkBootstrapServers local:9092 --zookeeperServers local:2181 --sourceTopic jerome_test --sinkTopic jerome_sink_test --groupId test_groupid --offsetReset earliest
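Once submitted this way, the job runs as a YARN application, so the standard YARN CLI can inspect or stop it. A guarded sketch (it no-ops on machines where the yarn CLI is absent):

```shell
# List running YARN applications to find the Flink app's ID; -kill stops it.
# Guarded so the snippet is safe to run where Hadoop is not installed.
if command -v yarn >/dev/null 2>&1; then
  yarn application -list
  # yarn application -kill <applicationId>   # stop the Flink job/cluster
  YARN_CLI=present
else
  YARN_CLI=absent
fi
echo "yarn CLI: $YARN_CLI"
```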