Spring Cloud DataFlow can be deployed to a YARN cluster.
1) Download the source code and build it
git clone https://github.com/spring-cloud/spring-cloud-dataflow-server-yarn.git
cd spring-cloud-dataflow-server-yarn
mvn clean package -DskipTests
The installation package can be found in the dist project:
spring-cloud-dataflow-server-yarn-dist-1.2.3.BUILD-SNAPSHOT.zip
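Step 1 can be scripted. The sketch below assumes the build is run next to the clone; because the exact output subdirectory of the dist module is not stated here, the hypothetical helper `find_dist_zip` simply searches for the zip instead of hard-coding a path.

```shell
#!/usr/bin/env bash
# find_dist_zip DIR
# Prints the path of the first spring-cloud-dataflow-server-yarn-dist-*.zip
# found anywhere under DIR (sorted, so the result is deterministic),
# or prints nothing if no such file exists.
find_dist_zip() {
  find "$1" -type f -name 'spring-cloud-dataflow-server-yarn-dist-*.zip' | sort | head -n 1
}

# Example (not run automatically):
#   git clone https://github.com/spring-cloud/spring-cloud-dataflow-server-yarn.git
#   (cd spring-cloud-dataflow-server-yarn && mvn clean package -DskipTests)
#   find_dist_zip spring-cloud-dataflow-server-yarn
```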
2) Copy the package to the installation host, unzip it, and edit config/servers.yml; the settings that mainly need changing are the Hadoop ones:
hadoop:
  fsUri: hdfs://${NN}:8020
  resourceManagerHost: ${RM}
  resourceManagerPort: 8050
  resourceManagerSchedulerAddress: ${RM}:8030
# Configured for Redis running on localhost. Replace at least host property when running in a
.....
# H2 is used by default; no changes are needed in a test environment
datasource:
  url: jdbc:h2:tcp://localhost:19092/mem:dataflow
  username: sa
  password:
  driverClassName: org.h2.Driver
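`${NN}` and `${RM}` stand for the NameNode and ResourceManager hosts and have to be replaced with real host names. A minimal sketch of doing that with sed; the function `render_servers_yml` and the host names in the usage line are hypothetical, and it assumes host names contain no `/`, `&` or `\` characters.

```shell
#!/usr/bin/env bash
# render_servers_yml NN_HOST RM_HOST < template > output
# Replaces every literal ${NN} with NN_HOST and every literal ${RM} with RM_HOST.
# In sed's basic regex syntax, \$ is a literal dollar sign and { } are literal braces.
render_servers_yml() {
  local nn_host="$1" rm_host="$2"
  sed -e 's/\${NN}/'"$nn_host"'/g' -e 's/\${RM}/'"$rm_host"'/g'
}

# Example (hypothetical host names):
#   render_servers_yml nn1.example.com rm1.example.com < config/servers.yml > servers.yml.new
```

It is also worth checking `resourceManagerPort` against `yarn.resourcemanager.address` in the cluster's yarn-site.xml: the value above is 8050, while stock Apache Hadoop defaults to 8032.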
3) Prepare the Hadoop environment: create the directory and upload the dependency JAR
Create the default directory in HDFS:
hdfs dfs -mkdir -p /dataflow/apps/task
Copy the task appmaster JAR into that directory:
hdfs dfs -put spring-cloud-deployer-yarn-tasklauncherappmaster-1.2.2.RELEASE.jar /dataflow/apps/task/
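Step 3 can be wrapped in one reusable function. This is a sketch assuming the `hdfs` client is on PATH and configured for the cluster; `upload_task_appmaster` is a hypothetical name, while `-mkdir -p`, `-put -f` (overwrite an existing copy) and `-ls` are standard `hdfs dfs` options.

```shell
#!/usr/bin/env bash
# upload_task_appmaster JAR [DIR]
# Creates DIR in HDFS (default: /dataflow/apps/task), uploads JAR into it,
# overwriting an existing copy (-f), then lists DIR so the upload can be checked.
upload_task_appmaster() {
  local jar="$1"
  local dir="${2:-/dataflow/apps/task}"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  hdfs dfs -mkdir -p "$dir" || return 1
  hdfs dfs -put -f "$jar" "$dir/" || return 1
  hdfs dfs -ls "$dir"
}

# Example:
#   upload_task_appmaster spring-cloud-deployer-yarn-tasklauncherappmaster-1.2.2.RELEASE.jar
```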
4) Start the Spring Cloud DataFlow server
bin/dataflow-server-yarn
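The server needs some time to start, and the shell in the next step must be able to reach it. A generic retry helper (hypothetical name `retry`) can wait for it; the usage line assumes the server listens on port 9393, the usual Spring Cloud Data Flow default, and that `curl` is installed.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between failed attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # no sleep after the last failed attempt
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example (port 9393 is an assumption):
#   bin/dataflow-server-yarn &
#   retry 30 2 curl -fsS -o /dev/null http://localhost:9393/ && echo "server is up"
```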
5) Start the Spring Cloud DataFlow shell
bin/dataflow-shell
6) Create and launch a task on YARN
dataflow:>app register --name timestamp --type task --uri maven://org.springframework.cloud.task.app:timestamp-task:1.3.0.RELEASE
Successfully registered application 'task:timestamp'
dataflow:>app list
dataflow:>task create --name printTimeStamp --definition "timestamp"
Created new task 'printTimeStamp'
dataflow:>task list
dataflow:>task launch printTimeStamp
Launched task 'printTimeStamp'
dataflow:>task execution list
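The shell session above can also be kept in a file and replayed with the Spring Shell `script --file` command. A sketch; `write_task_script` and the file name are hypothetical. Replaying the same file a second time will likely fail at `app register` or `task create`, because the app and the task definition already exist.

```shell
#!/usr/bin/env bash
# write_task_script FILE
# Writes the Data Flow shell commands from step 6 to FILE, one command per line.
write_task_script() {
  cat > "$1" <<'EOF'
app register --name timestamp --type task --uri maven://org.springframework.cloud.task.app:timestamp-task:1.3.0.RELEASE
task create --name printTimeStamp --definition "timestamp"
task launch printTimeStamp
task execution list
EOF
}

# Example:
#   write_task_script task-demo.txt
#   bin/dataflow-shell
#   dataflow:>script --file task-demo.txt
```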
7) Check the task execution in YARN
Check the application in the YARN ResourceManager web UI.
The task ran successfully. Open the YARN container logs:
- YARN appmaster log
vim Appmaster.stdout
Starting TaskAppmasterApplication v1.2.2.RELEASE on cmhhost2.novalocal with PID 5887 (/hadoop/yarn/local/usercache/root/appcache/application_1531745702560_0004/filecache/10/spring-cloud-deployer-yarn-tasklauncherappmaster-1.2.2.RELEASE.jar started by yarn in /hadoop/yarn/local/usercache/root/appcache/application_1531745702560_0004/container_1531745702560_0004_01_000001)
o.s.c.d.s.y.tasklauncher.TaskAppmaster : Using command list for task container: $JAVA_HOME/bin/java,,-Dserver.port=0,-Dspring.jmx.enabled=false,-Dspring.config.location=servers.yml,-jar,timestamp-task-1.3.0.RELEASE.jar,--spring.datasource.driverClassName='org.h2.Driver',--spring.datasource.username='sa',--spring.cloud.task.name='printTimeStamp',--spring.datasource.url='jdbc:h2:tcp://localhost:19092/mem:dataflow','--spring.cloud.task.executionid=1',1><LOG_DIR>/Container.stdout,2><LOG_DIR>/Container.stderr
- Task container log
Executed SQL script from class path resource [org/springframework/cloud/task/schema-h2.sql]
TimestampTaskConfiguration$TimestampTask : 2018-07-19 01:25:04.643
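The appmaster log can also be read programmatically. The "Using command list" line shows the task container receiving the server's datasource settings verbatim, including `jdbc:h2:tcp://localhost:19092`; since `localhost` inside a container refers to the NodeManager host it runs on, the H2 default is likely to work only when the task lands on the same host as the server. The sketch below pulls such arguments and the application ID out of `Appmaster.stdout`, and fetches the task output with `yarn logs -applicationId`, which requires YARN log aggregation (`yarn.log-aggregation-enable`) to be turned on. The function names `task_arg`, `app_id` and `task_output` are hypothetical.

```shell
#!/usr/bin/env bash
# task_arg LOGFILE NAME
# Prints the value of --NAME=... from the comma-separated
# "Using command list for task container" line; single quotes are stripped first.
task_arg() {
  grep -F 'Using command list for task container' "$1" \
    | tr ',' '\n' | tr -d "'" \
    | awk -v key="--$2=" 'index($0, key) == 1 { print substr($0, length(key) + 1); exit }'
}

# app_id LOGFILE: prints the first YARN application ID found in the log.
app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' "$1" | head -n 1
}

# task_output APP_ID: prints the timestamp task's output lines from the aggregated logs.
task_output() {
  yarn logs -applicationId "$1" | grep 'TimestampTask'
}

# Example:
#   task_arg Appmaster.stdout spring.datasource.url
#   task_output "$(app_id Appmaster.stdout)"
```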
Reference:
https://github.com/spring-cloud/spring-cloud-dataflow-server-yarn