需求:
1.基于hadoop jar 执行hadoop的job
2.参数也要可输入
3.shell脚本可供crontab调度
方式:
1.java解析输入的参数,并对参数进行规范定义
2.shell实现hadoop jar命令行执行,调度脚本用shell实现
3.crontab调度调度脚本
实现:
java解析输入参数:
/**
* <pre>
* 获取命令行参数,命令行job参数格式如下:
* --param1 val1 \
* --param2 val2 \
* </pre>
*
* @param args 命令行参数
* @return 返回map参数映射对
* @date 2013-11-13
*/
public Map<String, String> parseMRCommands(String[] args) {
Map<String, String> commands = new HashMap<String, String>();
String key = null;
for (String cmdStr : args) {
if (cmdStr.startsWith("--")) {
if (key != null) {
commands.put(key, "");
}
key = cmdStr.substring(2);
} else {
// add new command key:value
commands.put(key, cmdStr);
// clear key
key = null;
}
}
return commands;
}
输入的参数规范如下:
--param1 val1 \
shell执行脚本run.sh:
#! /bin/bash
hadoop jar ../lib/test-SNAPSHOT.jar com.test.TTask \
--input.path.key /user/input/texts \
--output.path.key /user/output/texts.out \
shell调度脚本cron-run.sh:
#!/bin/sh
#File:cron-run.sh
source /user/.bash_profile
cd $DEV_WORKING/mapred/bin
process_id=`jps -m | grep "TTask" | awk '{print $1}'`
process_id=${process_id:=0}
date
if [ $process_id -gt 0 ]
then
echo "job is running, pid = $process_id"
else
echo "pid is null, job runing now, start..."
nohup ./run.sh > run.log 2>&1 &
fi
然后就可以直接在crontab中对cron-run.sh做周期性调度