This article analyzes the Hadoop startup scripts; the target version is 1.2.1.
Hadoop is usually launched through start-all.sh in the bin directory. Taking start-all.sh as the entry point, this article walks through Hadoop's startup process.
-
start-all.sh
The contents of start-all.sh are as follows:
# Start all hadoop daemons. Run this on master node.
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
if [ -e "$bin/../libexec/hadoop-config.sh" ]; then
. "$bin"/../libexec/hadoop-config.sh
else
. "$bin/hadoop-config.sh"
fi
# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR
The file is fairly simple: it starts the HDFS daemons and the MapReduce daemons.
The HDFS daemons are started by calling start-dfs.sh, and the MapReduce daemons are started by calling start-mapred.sh.
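As a quick sanity check on a single-node (pseudo-distributed) setup, a successful run of start-all.sh leaves five daemons visible through jps; the process IDs below are purely illustrative:
$ bin/start-all.sh
$ jps
2481 NameNode
2603 DataNode
2719 SecondaryNameNode
2794 JobTracker
2912 TaskTracker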
-
start-dfs.sh
Hadoop starts the HDFS daemons through this file. The script accepts two optional arguments, -upgrade and -rollback, used respectively to upgrade and to roll back HDFS; if neither is given, the HDFS daemons start normally (the option parsing is sketched after the code quoted below).
The script starts three groups of daemons: the NameNode daemon, the DataNode daemons on all slave nodes, and the SecondaryNameNode daemon that checkpoints the NameNode. The corresponding lines are:
# start dfs daemons
# start namenode after datanodes, to minimize time namenode is up w/o data
# note: datanodes will log connection errors until namenode starts
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode
start-dfs.sh starts the NameNode daemon by calling hadoop-daemon.sh, and starts all DataNode daemons and the SecondaryNameNode daemon by calling hadoop-daemons.sh.
- hadoop-daemon.sh
This file starts or stops a daemon on the specified node by calling the hadoop file under the bin directory. Its core code is shown below.
start (the core script for starting a daemon):
  (start)

    mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`. Stop it first.
        exit 1
      fi
    fi

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_HOME"
    fi

    hadoop_rotate_log $log
    echo starting $command, logging to $log
    cd "$HADOOP_PREFIX"
    nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
    echo $! > $pid
    sleep 1
    head "$log"
    # capture the ulimit output
    if [ "true" = "$starting_secure_dn" ]; then
      echo "ulimit -a for secure datanode user $HADOOP_SECURE_DN_USER" >> $log
      # capture the ulimit info for the appropriate user
      su --shell=/bin/bash $HADOOP_SECURE_DN_USER -c 'ulimit -a' >> $log 2>&1
    else
      echo "ulimit -a for user $USER" >> $log
      ulimit -a >> $log 2>&1
    fi
    ;;
The specified command is ultimately executed by invoking the hadoop file under the bin directory:
nohup nice -n $HADOOP_NICENESS "$HADOOP_PREFIX"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
echo $! > $pid
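This is the usual "launch in the background, remember the PID" pattern. A minimal, self-contained sketch of the same idea (the file names and the sleep placeholder are purely illustrative):
#!/usr/bin/env bash
log=/tmp/mydaemon.log
pid=/tmp/mydaemon.pid
# Detach the child from the terminal: ignore HUP, redirect all streams,
# and adjust its priority with nice, just as hadoop-daemon.sh does.
nohup nice -n 0 sleep 300 > "$log" 2>&1 < /dev/null &
# $! holds the PID of the last background job; recording it in a pid file
# is what later allows the stop branch to find and kill the daemon.
echo $! > "$pid"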
stop (the core script for stopping a daemon):
  (stop)

    if [ -f $pid ]; then
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;
A daemon is stopped by killing its process: the PID recorded in the pid file is sent a normal kill, the script waits $HADOOP_STOP_TIMEOUT seconds, and it falls back to kill -9 if the process is still alive.
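hadoop-daemon.sh can also be run by hand to manage a single daemon on the local node, for example:
# Run from $HADOOP_HOME; --config may be omitted if HADOOP_CONF_DIR is already set.
bin/hadoop-daemon.sh --config conf start namenode
bin/hadoop-daemon.sh --config conf stop namenode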
- hadoop-daemons.sh
This file is used to start the DataNode daemons on all slave nodes and the SecondaryNameNode daemon. It does its work by calling slaves.sh and hadoop-daemon.sh: slaves.sh takes the configured (or explicitly specified) list of slave addresses, logs into each remote node over SSH, and runs hadoop-daemon.sh there to start that node's daemon, roughly as sketched below.
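A simplified sketch of the fan-out loop in slaves.sh (the real script also strips comments and blank lines from the host list and applies $HADOOP_SSH_OPTS):
# Run the same command on every host in the list, in parallel over SSH,
# prefixing each line of output with the host name.
for slave in `cat "$HOSTLIST"`; do
  ssh $slave "$@" 2>&1 | sed "s/^/$slave: /" &
done
wait   # do not return until every remote command has finished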
Note: the startup path for the DataNode slave nodes is easy to follow, but how the SecondaryNameNode gets started is less obvious. hadoop-daemon.sh does contain file-synchronization code (the rsync from $HADOOP_MASTER), but it is unclear whether that has anything to do with the SecondaryNameNode.
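For reference, the key to the secondarynamenode line quoted earlier is the --hosts masters option: hadoop-config.sh translates it into the host list that slaves.sh iterates over, so the SecondaryNameNode daemon is started on the hosts listed in conf/masters rather than conf/slaves. A paraphrased sketch of the relevant lines in hadoop-config.sh (assuming the 1.x layout):
if [ "--hosts" = "$1" ]; then
  shift
  slavesfile=$1
  shift
  # slaves.sh reads $HADOOP_SLAVES, so "--hosts masters" points it at conf/masters
  export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$slavesfile"
fi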
From the description above, every daemon is started and stopped through hadoop-daemon.sh, and hadoop-daemon.sh in turn calls the hadoop file under the bin directory to carry out the requested operation. The real work of launching each daemon therefore happens in bin/hadoop, which is discussed later in this article.
-
start-mapred.sh
This file starts the MapReduce JobTracker daemon and the TaskTracker daemons on the individual nodes. Its core code is as follows:
# start mapred daemons
# start jobtracker first to minimize connection errors at startup
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker
This file again delegates to hadoop-daemon.sh and hadoop-daemons.sh; the process mirrors the startup of the NameNode and the DataNodes.
-
The hadoop file
This file lives in the bin directory of the Hadoop installation. Every daemon is ultimately launched through it, and it is also the command-line entry point to the individual services: by passing different commands and arguments, users can access the different features Hadoop provides. The beginning of the file prints the range of commands it supports; the script code is as follows:
print_usage()
{
echo "Usage: hadoop [--config confdir] COMMAND"
echo "where COMMAND is one of:"
echo " namenode -format format the DFS filesystem"
echo " secondarynamenode run the DFS secondary namenode"
echo " namenode run the DFS namenode"
echo " datanode run a DFS datanode"
echo " dfsadmin run a DFS admin client"
echo " mradmin run a Map-Reduce admin client"
echo " fsck run a DFS filesystem checking utility"
echo " fs run a generic filesystem user client"
echo " balancer run a cluster balancing utility"
echo " oiv apply the offline fsimage viewer to an fsimage"
echo " fetchdt fetch a delegation token from the NameNode"
echo " jobtracker run the MapReduce job Tracker node"
echo " pipes run a Pipes job"
echo " tasktracker run a MapReduce task Tracker node"
echo " historyserver run job history servers as a standalone daemon"
echo " job manipulate MapReduce jobs"
echo " queue get information regarding JobQueues"
echo " version print the version"
echo " jar <jar> run a jar file"
echo " distcp <srcurl> <desturl> copy file or directories recursively"
echo " distcp2 <srcurl> <desturl> DistCp version 2"
echo " archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
echo " classpath prints the class path needed to get the"
echo " Hadoop jar and the required libraries"
echo " daemonlog get/set the log level for each daemon"
echo " or"
echo " CLASSNAME run the class named CLASSNAME"
echo "Most commands print help when invoked w/o parameters."
}
As the code above shows, the hadoop file can run the namenode, datanode, secondarynamenode, jobtracker, tasktracker, fs, job and other commands, and each command in turn accepts its own arguments to perform different operations. The hadoop file contains a separate handling branch for each command; the namenode command is used as an example below:
- namenode
The handling script for namenode is as follows:
elif [ "$COMMAND" = "namenode" ] ; then
CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
That is, if the user passes the namenode command, the class org.apache.hadoop.hdfs.server.namenode.NameNode is launched to handle the request, and additional JVM options can be supplied through HADOOP_OPTS. Because these options are user-controlled, they allow flexible tuning. A common example is remote debugging: to debug the Hadoop source remotely, simply append the remote-debugging options to HADOOP_OPTS, roughly like this:
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000"
With that in place, Hadoop can be debugged remotely. How to actually attach a debugger will be covered in a later article; readers can also find material on this online.
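For completeness: after the command dispatch has set $CLASS and $HADOOP_OPTS, bin/hadoop assembles the classpath and finally launches the JVM. The last line of the 1.x script looks roughly like this (paraphrased):
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"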