1. Command type: single-job workflow example
Create a text file named command.job locally (the .job extension is required).
Its contents:
#command.job
type=command
command=echo 'hello'
Then package the file into a zip archive.
Log in to https://hdp-3:8443 and use the Azkaban web UI to create a project and upload the job zip.
Job 1 completes!
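The flow up to the upload can be sketched in a few shell commands (a sketch; the archive name command.zip is my choice, and the `zip` utility is assumed to be installed):

```shell
# Create the single-job description file (same content as above).
cat > command.job <<'EOF'
#command.job
type=command
command=echo 'hello'
EOF

# Package it into a flat zip for upload through the Azkaban web UI.
# (Any archiver that produces a flat zip works; `zip` is one option.)
if command -v zip >/dev/null; then
  zip -q command.zip command.job
fi
```

Azkaban reads each .job file in the archive as one job; `type=command` means the `command=` line is executed as a shell command on the executor.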
2. Command type: multi-job workflow example
Create several job descriptions with dependencies between them.
Create a new text file foo.job:
type=command
command=echo foo
Then create another text file bar.job, which depends on foo.job:
type=command
dependencies=foo
command=echo bar
Package the two files into a single .zip archive.
Following the steps above, create a project in the Azkaban web UI and upload the job zip.
Result:
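The two dependent jobs can be generated and packaged the same way (a sketch; the archive name foobar.zip is my choice):

```shell
# foo.job: the upstream job.
cat > foo.job <<'EOF'
type=command
command=echo foo
EOF

# bar.job: declares its dependency via the upstream job's name
# (the name without the .job extension, not the file name).
cat > bar.job <<'EOF'
type=command
dependencies=foo
command=echo bar
EOF

# Both files go into one flat zip for upload.
if command -v zip >/dev/null; then
  zip -q foobar.zip foo.job bar.job
fi
```

When the flow runs, Azkaban executes foo first and starts bar only after foo succeeds.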
3. HDFS task
Create the job description file fs.job:
type=command
command=hadoop fs -mkdir /azkaban
Create a project in the Azkaban web UI and upload the job zip.
Start the job.
Check the result:
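The same fs.job can be generated from the shell, with the verification step shown as a comment since it needs a running cluster (a sketch):

```shell
# fs.job: a one-line HDFS command wrapped as an Azkaban command job.
cat > fs.job <<'EOF'
type=command
command=hadoop fs -mkdir /azkaban
EOF

# After the job succeeds, verify from any cluster node:
#   hadoop fs -ls /
# The listing should now include /azkaban.
```

Note that `-mkdir` fails if /azkaban already exists, so re-running the job as-is reports an error; `hadoop fs -mkdir -p` would make it re-runnable.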
4. MapReduce task
MapReduce jobs can also be run with the command job type.
Preparation: create an input directory under /azkaban on HDFS and upload the file azkabanmrwc.data into it.
On hdp-3, create the file azkabanmrwc.data under /root:
[root@hdp-3 ~]# vi azkabanmrwc.data
Contents:
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
Method one: run the commands directly on Linux.
Create the input path on HDFS:
hadoop fs -mkdir /azkaban/input
Upload azkabanmrwc.data into input:
[root@hdp-3 ~]# hadoop fs -put /root/azkabanmrwc.data /azkaban/input/
Method two: wrap the same commands as Azkaban jobs.
Create shangchuan1.job (creates the input directory):
type=command
command=hadoop fs -mkdir /azkaban/input
Create shangchuan2.job (uploads azkabanmrwc.data); it must declare a dependency on the first job:
type=command
dependencies=shangchuan1
command=hadoop fs -put /root/azkabanmrwc.data /azkaban/input/
Package the two job files into a zip, create a project in the Azkaban web UI, upload it, and run.
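Sketch of the two upload jobs; note that `dependencies` must name the first job by its job name, not its file name (the archive name shangchuan.zip is my choice):

```shell
# shangchuan1.job: creates /azkaban/input on HDFS.
cat > shangchuan1.job <<'EOF'
type=command
command=hadoop fs -mkdir /azkaban/input
EOF

# shangchuan2.job: uploads the data file; runs only after shangchuan1 succeeds.
cat > shangchuan2.job <<'EOF'
type=command
dependencies=shangchuan1
command=hadoop fs -put /root/azkabanmrwc.data /azkaban/input/
EOF

# One flat zip with both jobs, for upload through the web UI.
if command -v zip >/dev/null; then
  zip -q shangchuan.zip shangchuan1.job shangchuan2.job
fi
```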
Create the job description file and the MR program jar (this example uses the examples jar shipped with Hadoop).
mrwc.job:
type=command
command=hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /azkaban/input/azkabanmrwc.data /azkaban/output
Package mrwc.job together with hadoop-mapreduce-examples-2.8.1.jar into one zip archive.
Upload it to Azkaban and run:
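Packaging sketch: the jar sits next to mrwc.job inside the zip, which is why the command above can reference the jar by its bare file name (the copy path is a typical Hadoop layout, not confirmed by this document, so it is shown as a comment):

```shell
# mrwc.job: runs the built-in wordcount example against the uploaded data.
cat > mrwc.job <<'EOF'
type=command
command=hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /azkaban/input/azkabanmrwc.data /azkaban/output
EOF

# Copy the real examples jar next to the job file, then zip both together:
#   cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar .
#   zip mrwc.zip mrwc.job hadoop-mapreduce-examples-2.8.1.jar
# The relative jar name resolves because Azkaban runs the command from the
# unpacked project's working directory (see "Working directory" in the log below).
```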
While it runs, the execution log shows the MapReduce progress.
Successful result:
05-12-2019 14:13:26 CST mrwc INFO - Starting job mrwc at 1575526406250
05-12-2019 14:13:26 CST mrwc INFO - Building command job executor.
05-12-2019 14:13:26 CST mrwc INFO - 1 commands to execute.
05-12-2019 14:13:26 CST mrwc INFO - Command: hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /azkaban/input/azkabanmrwc.data /azkaban/output
05-12-2019 14:13:26 CST mrwc INFO - Environment variables: {JOB_OUTPUT_PROP_FILE=/root/apps/azkaban/azkaban-executor-2.5.0/executions/9/mrwc_output_4723153487441072185_tmp, JOB_PROP_FILE=/root/apps/azkaban/azkaban-executor-2.5.0/executions/9/mrwc_props_3535293413267978868_tmp, JOB_NAME=mrwc}
05-12-2019 14:13:26 CST mrwc INFO - Working directory: /root/apps/azkaban/azkaban-executor-2.5.0/executions/9
05-12-2019 14:13:28 CST mrwc ERROR - 19/12/05 14:13:28 INFO client.RMProxy: Connecting to ResourceManager at hdp-1/192.168.150.151:8032
05-12-2019 14:13:29 CST mrwc ERROR - 19/12/05 14:13:29 INFO input.FileInputFormat: Total input files to process : 1
05-12-2019 14:13:29 CST mrwc ERROR - 19/12/05 14:13:29 INFO mapreduce.JobSubmitter: number of splits:1
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575510666483_0005
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO impl.YarnClientImpl: Submitted application application_1575510666483_0005
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO mapreduce.Job: The url to track the job: http://hdp-1:8088/proxy/application_1575510666483_0005/
05-12-2019 14:13:30 CST mrwc ERROR - 19/12/05 14:13:30 INFO mapreduce.Job: Running job: job_1575510666483_0005
05-12-2019 14:13:37 CST mrwc ERROR - 19/12/05 14:13:37 INFO mapreduce.Job: Job job_1575510666483_0005 running in uber mode : false
05-12-2019 14:13:37 CST mrwc ERROR - 19/12/05 14:13:37 INFO mapreduce.Job: map 0% reduce 0%
05-12-2019 14:13:45 CST mrwc ERROR - 19/12/05 14:13:45 INFO mapreduce.Job: map 100% reduce 0%
05-12-2019 14:13:51 CST mrwc ERROR - 19/12/05 14:13:51 INFO mapreduce.Job: map 100% reduce 100%
05-12-2019 14:13:51 CST mrwc ERROR - 19/12/05 14:13:51 INFO mapreduce.Job: Job job_1575510666483_0005 completed successfully
05-12-2019 14:13:51 CST mrwc ERROR - 19/12/05 14:13:51 INFO mapreduce.Job: Counters: 49
05-12-2019 14:13:52 CST mrwc ERROR - File System Counters
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of bytes read=208
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of bytes written=272765
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of read operations=0
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of large read operations=0
05-12-2019 14:13:52 CST mrwc ERROR - FILE: Number of write operations=0
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of bytes read=189
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of bytes written=118
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of read operations=6
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of large read operations=0
05-12-2019 14:13:52 CST mrwc ERROR - HDFS: Number of write operations=2
05-12-2019 14:13:52 CST mrwc ERROR - Job Counters
05-12-2019 14:13:52 CST mrwc ERROR - Launched map tasks=1
05-12-2019 14:13:52 CST mrwc ERROR - Launched reduce tasks=1
05-12-2019 14:13:52 CST mrwc ERROR - Rack-local map tasks=1
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all maps in occupied slots (ms)=4526
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all reduces in occupied slots (ms)=3723
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all map tasks (ms)=4526
05-12-2019 14:13:52 CST mrwc ERROR - Total time spent by all reduce tasks (ms)=3723
05-12-2019 14:13:52 CST mrwc ERROR - Total vcore-milliseconds taken by all map tasks=4526
05-12-2019 14:13:52 CST mrwc ERROR - Total vcore-milliseconds taken by all reduce tasks=3723
05-12-2019 14:13:52 CST mrwc ERROR - Total megabyte-milliseconds taken by all map tasks=4634624
05-12-2019 14:13:52 CST mrwc ERROR - Total megabyte-milliseconds taken by all reduce tasks=3812352
05-12-2019 14:13:52 CST mrwc ERROR - Map-Reduce Framework
05-12-2019 14:13:52 CST mrwc ERROR - Map input records=7
05-12-2019 14:13:52 CST mrwc ERROR - Map output records=21
05-12-2019 14:13:52 CST mrwc ERROR - Map output bytes=160
05-12-2019 14:13:52 CST mrwc ERROR - Map output materialized bytes=208
05-12-2019 14:13:52 CST mrwc ERROR - Input split bytes=113
05-12-2019 14:13:52 CST mrwc ERROR - Combine input records=21
05-12-2019 14:13:52 CST mrwc ERROR - Combine output records=21
05-12-2019 14:13:52 CST mrwc ERROR - Reduce input groups=21
05-12-2019 14:13:52 CST mrwc ERROR - Reduce shuffle bytes=208
05-12-2019 14:13:52 CST mrwc ERROR - Reduce input records=21
05-12-2019 14:13:52 CST mrwc ERROR - Reduce output records=21
05-12-2019 14:13:52 CST mrwc ERROR - Spilled Records=42
05-12-2019 14:13:52 CST mrwc ERROR - Shuffled Maps =1
05-12-2019 14:13:52 CST mrwc ERROR - Failed Shuffles=0
05-12-2019 14:13:52 CST mrwc ERROR - Merged Map outputs=1
05-12-2019 14:13:52 CST mrwc ERROR - GC time elapsed (ms)=134
05-12-2019 14:13:52 CST mrwc ERROR - CPU time spent (ms)=1130
05-12-2019 14:13:52 CST mrwc ERROR - Physical memory (bytes) snapshot=291848192
05-12-2019 14:13:52 CST mrwc ERROR - Virtual memory (bytes) snapshot=4161122304
05-12-2019 14:13:52 CST mrwc ERROR - Total committed heap usage (bytes)=139329536
05-12-2019 14:13:52 CST mrwc ERROR - Shuffle Errors
05-12-2019 14:13:52 CST mrwc ERROR - BAD_ID=0
05-12-2019 14:13:52 CST mrwc ERROR - CONNECTION=0
05-12-2019 14:13:52 CST mrwc ERROR - IO_ERROR=0
05-12-2019 14:13:52 CST mrwc ERROR - WRONG_LENGTH=0
05-12-2019 14:13:52 CST mrwc ERROR - WRONG_MAP=0
05-12-2019 14:13:52 CST mrwc ERROR - WRONG_REDUCE=0
05-12-2019 14:13:52 CST mrwc ERROR - File Input Format Counters
05-12-2019 14:13:52 CST mrwc ERROR - Bytes Read=76
05-12-2019 14:13:52 CST mrwc ERROR - File Output Format Counters
05-12-2019 14:13:52 CST mrwc ERROR - Bytes Written=118
05-12-2019 14:13:52 CST mrwc INFO - Process completed successfully in 26 seconds.
05-12-2019 14:13:52 CST mrwc INFO - Finishing job mrwc at 1575526432377 with status SUCCEEDED
At this point the /azkaban directory on HDFS contains the output directory.
Back on hdp-3, check:
[root@hdp-3 ~]# hadoop fs -cat /azkaban/output/part-r-00000
1 1
10 1
15 1
18 1
2 1
20 1
23 1
25 1
3 1
4 1
5 1
6 1
7 1
9 1
blacke 1
blue 1
green 1
orange 1
red 1
white 1
yellow 1
[root@hdp-3 ~]#
You can see the input was split on whitespace and each token counted once.
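As a sanity check, the same counting can be reproduced locally with standard tools (a sketch: wordcount tokenizes on whitespace and counts occurrences of each token, which is what this pipeline does for the sample data):

```shell
# Recreate the input data used above.
cat > azkabanmrwc.data <<'EOF'
1 blue 20
2 yellow 25
3 red 18
4 blacke 10
5 orange 15
6 white 23
7 green 9
EOF

# One token per line, then count occurrences of each token --
# every token in this data set appears exactly once, matching the
# all-1 counts in part-r-00000 above.
tr -s ' \t' '\n' < azkabanmrwc.data | sort | uniq -c | sort -k2
```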