1. Download the Hadoop release tarball
2. Extract it:
tar zxvf hadoop-0.20.203.0rc1.tar.gz
3. Set environment variables
Add the extracted Hadoop directory to your PATH by exporting it in your shell profile.
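The step above can be sketched like this in your shell profile (the extraction path below is hypothetical; use wherever you actually untarred Hadoop):

```shell
# Hypothetical extraction path -- adjust to where you untarred the release.
HADOOP_HOME="$HOME/hadoop-0.20.203.0"
export HADOOP_HOME
# Put Hadoop's launcher scripts on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin"
```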
4.在hadoop环境变量中设置JAVA_HOME
vi hadoop-env.sh中写入
export JAVA_HOME=/home/ymkyve/ytool/jdk1.6.0_24
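A quick sanity check for this step (a sketch; the path is the one from this guide, substitute your own JDK location):

```shell
# Verify that JAVA_HOME actually contains a java binary before starting Hadoop.
JAVA_HOME=/home/ymkyve/ytool/jdk1.6.0_24   # example path from this guide
if [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME OK: $JAVA_HOME"
else
    echo "warning: no executable at $JAVA_HOME/bin/java" >&2
fi
```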
5. Install ssh and set up passwordless login
$ sudo apt-get install ssh
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost   # should now log you in without prompting for a password
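If ssh localhost still asks for a password, the usual culprit is file permissions: sshd ignores keys stored in group- or world-writable locations. A tightening sketch:

```shell
# sshd refuses overly permissive key files; lock down the ssh directory.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```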
Standalone (single-machine) mode is enough to start with. Run the following in the Hadoop install directory to see it work:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*
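What the example job computes can be approximated with a plain-shell pipeline, shown here on toy data (a sketch for intuition, not the MapReduce job itself):

```shell
# Count occurrences of strings matching dfs[a-z.]+ across XML files --
# roughly what the Hadoop grep example writes into output/.
mkdir -p demo-input
printf '<name>dfs.name.dir</name>\n<name>dfs.replication</name>\n<name>dfs.name.dir</name>\n' \
    > demo-input/sample.xml   # toy data standing in for conf/*.xml
grep -ohE 'dfs[a-z.]+' demo-input/*.xml | sort | uniq -c | sort -rn
```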
In standalone mode you can debug a job just by editing the bin/hadoop launcher script. In its jar branch, add the -Xdebug/-Xrunjdwp line:
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
  HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=y"
With that line added, the JVM suspends at startup and waits for a debugger to attach on port 8787.
Create input/test.txt containing:
aa b a b d c
Then just run the go.sh script, which does:
rm -rf output
hadoop jar hadoop-test.jar input output
Check the result in the output directory:
$ cat output/part-r-00000
a 1
aa 1
b 2
c 1
d 1
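The counts above can be sanity-checked against a plain-shell word count of the same input (a sketch of the check, not part of the job):

```shell
# Word-count the test data the Unix way and compare with part-r-00000.
printf 'aa b a b d c\n' > /tmp/test.txt    # same data as input/test.txt
tr ' ' '\n' < /tmp/test.txt | sort | uniq -c
# b appears twice; every other token once, matching the job output above.
```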