-
1.准备MR执行的数据
MR的程序可以是自己写的,也可以是hadoop工程自带的。这里选用hadoop工程自带的MR程序来运行wordcount的示例
准备以下数据上传到HDFS的/oozie/input路径下去hdfs dfs -mkdir -p /oozie/input vim wordcount.txt
hello world hadoop spark hive hadoop
hdfs dfs -put wordcount.txt /oozie/input
将数据上传到hdfs对应目录 -
2.执行官方测试案例
yarn jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar wordcount /oozie/input/ /oozie/output
-
3.准备我们调度的资源
将需要调度的资源都准备好放到一个文件夹下面去,包括jar包、ob.properties以及workflow.xml
拷贝MR的任务模板cd /export/servers/oozie-4.1.0-cdh5.14.0 cp -ra examples/apps/map-reduce/ oozie_works/
删掉MR任务模板lib目录下自带的jar包
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib rm -rf oozie-examples-4.1.0-cdh5.14.0.jar
拷贝jar包到对应目录
从上一步的删除当中,可以看到需要调度的jar包存放在了/export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib
目录下,所以把需要调度的jar包也放到这个路径下即可
cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/
-
4.修改配置文件
修改job.properties
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce vim job.properties
nameNode=hdfs://node01:8020 jobTracker=node01:8032 queueName=default examplesRoot=oozie_works oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml outputDir=/oozie/output inputdir=/oozie/input
修改workflow.xml
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce vim workflow.xml
<?xml version="1.0" encoding="UTF-8"?> <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf"> <start to="mr-node"/> <action name="mr-node"> <map-reduce> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <prepare> <delete path="${nameNode}/${outputDir}"/> </prepare> <configuration> <property> <name>mapred.job.queue.name</name> <value>${queueName}</value> </property> <!--把这些原有的配置注释掉--> <!-- <property> <name>mapred.mapper.class</name> <value>org.apache.oozie.example.SampleMapper</value> </property> <property> <name>mapred.reducer.class</name> <value>org.apache.oozie.example.SampleReducer</value> </property> <property> <name>mapred.map.tasks</name> <value>1</value> </property> <property> <name>mapred.input.dir</name> <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value> </property> <property> <name>mapred.output.dir</name> <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value> </property> --> <!-- 开启使用新的API来进行配置 --> <property> <name>mapred.mapper.new-api</name> <value>true</value> </property> <property> <name>mapred.reducer.new-api</name> <value>true</value> </property> <!-- 指定MR的输出key的类型 --> <property> <name>mapreduce.job.output.key.class</name> <value>org.apache.hadoop.io.Text</value> </property> <!-- 指定MR的输出的value的类型--> <property> <name>mapreduce.job.output.value.class</name> <value>org.apache.hadoop.io.IntWritable</value> </property> <!-- 指定输入路径 --> <property> <name>mapred.input.dir</name> <value>${nameNode}/${inputdir}</value> </property> <!-- 指定输出路径 --> <property> <name>mapred.output.dir</name> <value>${nameNode}/${outputDir}</value> </property> <!-- 指定执行的map类 --> <property> <name>mapreduce.job.map.class</name> <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value> </property> <!-- 指定执行的reduce类 --> <property> <name>mapreduce.job.reduce.class</name> <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value> </property> <!-- 配置map task的个数 --> <property> <name>mapred.map.tasks</name> <value>1</value> </property> </configuration> </map-reduce> <ok to="end"/> <error to="fail"/> </action> <kill name="fail"> <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> </kill> <end name="end"/> </workflow-app>
-
5.上传调度任务到hdfs对应目录
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works hdfs dfs -put map-reduce/ /user/root/oozie_works/
-
6.执行调度任务
执行调度任务,然后通过oozie的11000端口进行查看任务结果
cd /export/servers/oozie-4.1.0-cdh5.14.0 bin/oozie job -oozie http://node03:11000/oozie -config oozie_works/map-reduce/job.properties -run
【Hadoop离线基础总结】oozie调度MapReduce任务
最新推荐文章于 2021-02-20 16:52:18 发布