1. Command type: single-job workflow example
Create a text file named command.job locally (the .job extension is required).
Its contents:
#command.job
type=command
command=echo 'hello'
Then package the file into a zip archive.
Log in to https://hdp-3:8443 and use the Azkaban web UI to create a project and upload the job zip.
Job 1 completes!
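The flow up to the upload can be sketched in a few shell commands (a sketch; the archive name command.zip is my choice, and the `zip` utility is assumed to be installed):

```shell
# Create the single-job description file (same content as above).
cat > command.job <<'EOF'
#command.job
type=command
command=echo 'hello'
EOF

# Package it into a flat zip for upload through the Azkaban web UI.
# (Any archiver that produces a flat zip works; `zip` is one option.)
if command -v zip >/dev/null; then
  zip -q command.zip command.job
fi
```

Azkaban reads each .job file in the archive as one job; `type=command` means the `command=` line is executed as a shell command on the executor.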
2. Command type: multi-job workflow example
Create several job descriptions with dependencies between them.
Create a new text file foo.job:
type=command
command=echo foo
Then create another text file bar.job, which depends on foo.job:
type=command
dependencies=foo
command=echo bar
Package the two files into a single .zip archive.
Following the steps above, create a project in the Azkaban web UI and upload the job zip.
Result:
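The two dependent jobs can be generated and packaged the same way (a sketch; the archive name foobar.zip is my choice):

```shell
# foo.job: the upstream job.
cat > foo.job <<'EOF'
type=command
command=echo foo
EOF

# bar.job: declares its dependency via the upstream job's name
# (the name without the .job extension, not the file name).
cat > bar.job <<'EOF'
type=command
dependencies=foo
command=echo bar
EOF

# Both files go into one flat zip for upload.
if command -v zip >/dev/null; then
  zip -q foobar.zip foo.job bar.job
fi
```

When the flow runs, Azkaban executes foo first and starts bar only after foo succeeds.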
3. HDFS task
Create the job description file fs.job:
type=command
command=hadoop fs -mkdir /azkaban
Create a project in the Azkaban web UI and upload the job zip.
Start the job.
Check the result:
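The same fs.job can be generated from the shell, with the verification step shown as a comment since it needs a running cluster (a sketch):

```shell
# fs.job: a one-line HDFS command wrapped as an Azkaban command job.
cat > fs.job <<'EOF'
type=command
command=hadoop fs -mkdir /azkaban
EOF

# After the job succeeds, verify from any cluster node:
#   hadoop fs -ls /
# The listing should now include /azkaban.
```

Note that `-mkdir` fails if /azkaban already exists, so re-running the job as-is reports an error; `hadoop fs -mkdir -p` would make it re-runnable.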
4. MapReduce task
MapReduce jobs can also be run with the command job type.
Preparation: create an input directory under /azkaban on HDFS and upload the file azkabanmrwc.data into it.
On hdp-3, create the file azkabanmrwc.data under /root:
[root@hdp-3 ~]# vi azkabanmrwc.data
Contents:
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
Method one: run the commands directly on Linux.
Create the input path on HDFS:
hadoop fs -mkdir /azkaban/input
Upload azkabanmrwc.data into input:
[root@hdp-3 ~]# hadoop fs -put /root/azkabanmrwc.data /azkaban/input/
Method two: wrap the same commands as Azkaban jobs.
Create shangchuan1.job (creates the input directory):
type=command
command=hadoop fs -mkdir /azkaban/input
Create shangchuan2.job (uploads azkabanmrwc.data); it must declare a dependency on the first job:
type=command
dependencies=shangchuan1
command=hadoop fs -put /root/azkabanmrwc.data /azkaban/input/
Package the two job files into a zip, create a project in the Azkaban web UI, upload it, and run.
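Sketch of the two upload jobs; note that `dependencies` must name the first job by its job name, not its file name (the archive name shangchuan.zip is my choice):

```shell
# shangchuan1.job: creates /azkaban/input on HDFS.
cat > shangchuan1.job <<'EOF'
type=command
command=hadoop fs -mkdir /azkaban/input
EOF

# shangchuan2.job: uploads the data file; runs only after shangchuan1 succeeds.
cat > shangchuan2.job <<'EOF'
type=command
dependencies=shangchuan1
command=hadoop fs -put /root/azkabanmrwc.data /azkaban/input/
EOF

# One flat zip with both jobs, for upload through the web UI.
if command -v zip >/dev/null; then
  zip -q shangchuan.zip shangchuan1.job shangchuan2.job
fi
```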
Create the job description file and the MR program jar (this example uses the examples jar shipped with Hadoop).
mrwc.job:
type=command
command=hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /azkaban/input/azkabanmrwc.data /azkaban/output
Package mrwc.job together with hadoop-mapreduce-examples-2.8.1.jar into one zip archive.
Upload it to Azkaban and run:
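Packaging sketch: the jar sits next to mrwc.job inside the zip, which is why the command above can reference the jar by its bare file name (the copy path is a typical Hadoop layout, not confirmed by this document, so it is shown as a comment):

```shell
# mrwc.job: runs the built-in wordcount example against the uploaded data.
cat > mrwc.job <<'EOF'
type=command
command=hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /azkaban/input/azkabanmrwc.data /azkaban/output
EOF

# Copy the real examples jar next to the job file, then zip both together:
#   cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar .
#   zip mrwc.zip mrwc.job hadoop-mapreduce-examples-2.8.1.jar
# The relative jar name resolves because Azkaban runs the command from the
# unpacked project's working directory (see "Working directory" in the log below).
```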
While it runs, the execution log shows the MapReduce progress.
Successful result:
05-12-2019 14:13:26 CST mrwc INFO - Starting job mrwc at 1575526406250
05-12-2019 14:13:26 CST mrwc INFO - Building command job executor.
05-12-2019 14:13:26 CST mrwc INFO - 1 commands to execute.
05-12-2019 14:13:26 CST mrwc INFO - Command: hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /azkaban/input/azkabanmrwc.data /azkaban/output
05-12-2019 14:13:26 CST mrwc INFO - Environment variables: {JOB_OUTPUT_PROP_FILE=/root/apps/azkaban/azkaban-executor-2.5.0/executions/9/mrwc_output_4723153487441072185_tmp, JOB_PROP_FILE=/root/apps/azkaban/azkaban-executor-2.5.0/executions/9/mrwc_props_3535293413267978868_tmp, JOB_NAME=mrwc}
05-12-2019 14:13:26 CST mrwc INFO - Working directory: /root/apps/azkaban/azkaban-executor-2.5.0/executions/9
05-12-2019 14:13:28 CST mrwc ERROR - 19/12/05 14:13:28 INFO client.RMProxy: Connecting to ResourceManager at hdp-1/192.168.150.151:8032
05-12-2019 14:13:29 CST mrwc ERROR - 19/12/05 14:13:29 INFO input.FileInputFormat: Total input files to process : 1
05-12-2019 14:13:29 CST mrwc ERROR - 19/12/05 14:13:29 INFO mapreduce.JobSubmitter: number of splits:1
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575510666483_0005
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO impl.YarnClientImpl: Submitted application application_1575510666483_0005
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO mapreduce.Job: The url to track the job: http://hdp-1:8088/proxy/application_1575510666483_0005/
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO mapreduce.Job: Running job: job_1575510666483_0005
05-12-2019 14:13:37 CST mrwc ERROR - 19/12/05 14:13:37 INFO mapreduce.Job: Job job_1575510666483_0005 running in uber mode : false
05-12-2019 14:13:37 CST mrwc ERROR - 19/12/05 14:13:37 INFO mapreduce.Job: map 0% reduce 0%
05-12-2019 14:13:45 CST mrwc ERROR - 19/12/05 14:13:45 INFO mapreduce.Job: map 100% reduce 0%
05-12-2019 14:13:51 CST mrwc ERROR - 19/12/05 14:13:51 INFO mapreduce.Job: map 100% reduce 100%
05-12-2019 14:13:51 CST mrwc ERROR - 19/12/05 14:13:51 INFO mapreduce.Job: Job job_1575510666483_0005 completed successfully
05-12-2019 14:13:51 CST mrwc ERROR - 19/12/05 14:13:51 INFO mapreduce.Job: Counters: 49
05-12-2019 14:13:52 CST mrwc ERROR - File System Counters
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of bytes read=208
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of bytes written=272765
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of read operations=0
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of large read operations=0
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of write operations=0
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of bytes read=189
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of bytes written=118
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of read operations=6
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of large read operations=0
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of write operations=2
05-12-2019 14:13:52 CST mrwc ERROR - Job Counters
05-12-2019 14:13:52 CST mrwc ERROR - Launched map tasks=1
05-12-2019 14:13:52 CST mrwc ERROR - Launched reduce tasks=1
05-12-2019 14:13:52 CST mrwc ERROR - Rack-local map tasks=1
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all maps in occupied slots (ms)=4526
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all reduces in occupied slots (ms)=3723
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all map tasks (ms)=4526
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all reduce tasks (ms)=3723
05-12-2019 14:13:52 CST mrwc ERROR - Total vcore-milliseconds taken by all map tasks=4526
05-12-2019 14:13:52 CST mrwc ERROR - Total vcore-milliseconds taken by all reduce tasks=3723
05-12-2019 14:13:52 CST mrwc ERROR - Total megabyte-milliseconds taken by all map tasks=4634624
05-12-2019 14:13:52 CST mrwc ERROR - Total megabyte-milliseconds taken by all reduce tasks=3812352
05-12-2019 14:13:52 CST mrwc ERROR - Map-Reduce Framework
05-12-2019 14:13:52 CST mrwc ERROR - Map input records=7
05-12-2019 14:13:52 CST mrwc ERROR - Map output records=21
05-12-2019 14:13:52 CST mrwc ERROR - Map output bytes=160
05-12-2019 14:13:52 CST mrwc ERROR - Map output materialized bytes=208
05-12-2019 14:13:52 CST mrwc ERROR - Input split bytes=113
05-12-2019 14:13:52 CST mrwc ERROR - Combine input records=21
05-12-2019 14:13:52 CST mrwc ERROR - Combine output records=21
05-12-2019 14:13:52 CST mrwc ERROR - Reduce input groups=21
05-12-2019 14:13:52 CST mrwc ERROR - Reduce shuffle bytes=208
05-12-2019 14:13:52 CST mrwc ERROR - Reduce input records=21
05-12-2019 14:13:52 CST mrwc ERROR - Reduce output records=21
05-12-2019 14:13:52 CST mrwc ERROR - Spilled Records=42
05-12-2019 14:13:52 CST mrwc ERROR - Shuffled Maps =1
05-12-2019 14:13:52 CST mrwc ERROR - Failed Shuffles=0
05-12-2019 14:13:52 CST mrwc ERROR - Merged Map outputs=1
05-12-2019 14:13:52 CST mrwc ERROR - GC time elapsed (ms)=134
05-12-2019 14:13:52 CST mrwc ERROR - CPU time spent (ms)=1130
05-12-2019 14:13:52 CST mrwc ERROR - Physical memory (bytes) snapshot=291848192
05-12-2019 14:13:52 CST mrwc ERROR - Virtual memory (bytes) snapshot=4161122304
05-12-2019 14:13:52 CST mrwc ERROR - Total committed heap usage (bytes)=139329536
05-12-2019 14:13:52 CST mrwc ERROR - Shuffle Errors
05-12-2019 14:13:52 CST mrwc ERROR - BAD_ID=0
05-12-2019 14:13:52 CST mrwc ERROR - CONNECTION=0
05-12-2019 14:13:52 CST mrwc ERROR - IO_ERROR=0
05-12-2019 14:13:52 CST mrwc ERROR - WRONG_LENGTH=0
05-12-2019 14:13:52 CST mrwc ERROR - WRONG_MAP=0
05-12-2019 14:13:52 CST mrwc ERROR - WRONG_REDUCE=0
05-12-2019 14:13:52 CST mrwc ERROR - File Input Format Counters
05-12-2019 14:13:52 CST mrwc ERROR - Bytes Read=76
05-12-2019 14:13:52 CST mrwc ERROR - File Output Format Counters
05-12-2019 14:13:52 CST mrwc ERROR - Bytes Written=118
05-12-2019 14:13:52 CST mrwc INFO - Process completed successfully in 26 seconds.
05-12-2019 14:13:52 CST mrwc INFO - Finishing job mrwc at 1575526432377 with status SUCCEEDED
At this point the /azkaban directory on HDFS contains the output directory.
Back on hdp-3, check:
[root@hdp-3 ~]# hadoop fs -cat /azkaban/output/part-r-00000
1 1
10 1
15 1
18 1
2 1
20 1
23 1
25 1
3 1
4 1
5 1
6 1
7 1
9 1
blacke 1
blue 1
green 1
orange 1
red 1
white 1
yellow 1
[root@hdp-3 ~]#
You can see the input was split on whitespace and each token counted once.
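As a sanity check, the same counting can be reproduced locally with standard tools (a sketch: wordcount tokenizes on whitespace and counts occurrences of each token, which is what this pipeline does for the sample data):

```shell
# Recreate the input data used above.
cat > azkabanmrwc.data <<'EOF'
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
EOF

# One token per line, then count occurrences of each token --
# every token in this data set appears exactly once, matching the
# all-1 counts in part-r-00000 above.
tr -s ' \t' '\n' < azkabanmrwc.data | sort | uniq -c | sort -k2
```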