I. Local (Standalone) Mode
Run the jar directly with the hadoop command; below is the official example:
root@ubuntu:~/hadoop/output# cd $HADOOP_HOME
root@ubuntu:~/hadoop# mkdir input
root@ubuntu:~/hadoop# cp etc/hadoop/*.xml input
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
root@ubuntu:~/hadoop# cat output/*
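The example's grep job behaves much like an extended-regex grep that counts each distinct match of the pattern ('dfs[a-z.]+' in the official example). A rough local equivalent without Hadoop (a sketch; assumes GNU grep with -E/-o, and sample.txt is an illustrative file, not part of the Hadoop distribution):

```shell
# Build a small sample file (illustrative only).
printf 'dfs.replication\ndfs.replication\ndfs.namenode.name.dir\n' > sample.txt
# Extract every match of the pattern, then count distinct matches,
# highest count first -- roughly what the Hadoop grep example reports.
grep -Eo 'dfs[a-z.]+' sample.txt | sort | uniq -c | sort -rn
```

The count-then-sort pipeline mirrors the shape of the output shown in step 10 below.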
II. Pseudo-Distributed Mode
1. Set up passwordless SSH login
root@ubuntu:~# ssh localhost
2. If you are prompted for a password, you need to generate a key pair and import the public key
root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:9xUa1hH5XJJQYr3G5AU5rapYcYXNICK8hIgshfI/SWQ root@ubuntu
The key's randomart image is:
+---[DSA 1024]----+
|.+.. o. . . =O=B |
|=.. E o. . o.o%.+|
|o. o . . o=oO.|
| . . . ...o*.o|
| o . S .o.o. |
| + ..... |
| . o .. |
| . . |
| |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized.keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized.keys
root@ubuntu:~# ssh localhost
root@localhost's password:
PS: Following the official documentation exactly, passwordless login still failed; switching to RSA made it work. (Two things likely explain the confusion. First, the key above was appended to ~/.ssh/authorized.keys, but sshd only reads ~/.ssh/authorized_keys. Second, OpenSSH 7.0+, which Ubuntu 16.04 ships, disables DSA public keys by default, so an id_dsa key is rejected even when installed correctly.)
root@ubuntu:~# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:bc4qQR2NMHsGVL6NzNSJ5ycuUBiqXtS9jB1BQJGXxsM root@ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| oXX++ |
| o.OE+.. |
| o ++Xo+ |
| o .@.X |
| . ..o S * . |
| . . .. = o |
| . .. + |
| . o |
| .. |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized_keys
root@ubuntu:~# ssh localhost
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-21-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Mon Jun 27 16:57:53 2016 from 192.168.80.1
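The working steps above can be collected into one idempotent script (a sketch; it assumes OpenSSH 7.0 or newer, where DSA public keys are disabled by default, hence RSA, and SSH_DIR is an illustrative variable defaulting to ~/.ssh):

```shell
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
# Generate an RSA key pair only if one does not exist yet.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" -q
# Authorize the key only if it is not already present (keeps reruns harmless).
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" 2>/dev/null \
  || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
# sshd ignores authorized_keys unless its permissions are strict.
chmod 600 "$SSH_DIR/authorized_keys"
```

After this, `ssh localhost` should log in without a password prompt.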
3. Format the filesystem
root@ubuntu:~# hdfs namenode -format
PS: According to the configuration reference at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml, hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}. Since /tmp is cleared on reboot, the formatted filesystem would be lost after a restart; to avoid this, edit etc/hadoop/core-site.xml and add the following:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>
</configuration>
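A malformed edit to core-site.xml fails in confusing ways at startup, so it is worth checking that the file is well-formed XML before restarting. A minimal sketch (assumes python3 is on PATH; the file written here is a scratch copy, not your real config):

```shell
# Write a scratch copy of the snippet above (illustrative only).
cat > /tmp/core-site-check.xml <<'EOF'
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>
</configuration>
EOF
# An XML parser rejects the file if a tag is unclosed or mismatched.
python3 -c 'import xml.etree.ElementTree as ET; ET.parse("/tmp/core-site-check.xml"); print("well-formed")'
```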
4. Start the NameNode and DataNode daemons
root@ubuntu:~# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
ERROR 1: JAVA_HOME could not be found, even though it was configured earlier.
Troubleshooting steps:
1) Search for the error message; the check turns out to live in the hadoop-config.sh script.
root@ubuntu:~/hadoop# grep -R "JAVA_HOME is not set and could not be found" .
./libexec/hadoop-config.sh: echo "Error: JAVA_HOME is not set and could not be found." 1>&2
2) The $JAVA_HOME variable in hadoop-config.sh is exported by the etc/hadoop/hadoop-env.sh script; change ${JAVA_HOME} there to an absolute path:
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/jdk
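If you are unsure what absolute path to write, the JDK root can be derived from the java binary by resolving symlinks and stripping the trailing bin/java. A small helper sketch (jdk_root is a hypothetical name; /root/jdk above is just this machine's install path):

```shell
# Resolve symlinks on the given java binary path, then strip /bin/java
# to get the directory that JAVA_HOME should point at.
jdk_root() {
  dirname "$(dirname "$(readlink -f "$1")")"
}
# e.g. jdk_root "$(command -v java)"  ->  the JDK install directory
```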
root@ubuntu:~/hadoop# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
0.0.0.0: Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:471)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:461)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:454)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:229)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)
ERROR 2: On the next startup, the filesystem URI is invalid ("Invalid URI for NameNode address").
Troubleshooting steps:
root@ubuntu:~/hadoop# vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>
  <!-- add the following property -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
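The "file:/// has no authority" error above means the default filesystem URI had no host:port part (its "authority" in URI terms), which is exactly what fs.defaultFS supplies. A rough shell check of a candidate value (check_fs_uri is a hypothetical helper, for illustration only):

```shell
# An HDFS default-FS URI needs scheme + authority, e.g. hdfs://localhost:9000.
check_fs_uri() {
  case "$1" in
    hdfs://*:[0-9]*) echo "ok: $1" ;;
    *)               echo "missing authority: $1" ;;
  esac
}
check_fs_uri 'hdfs://localhost:9000'
check_fs_uri 'file:///'   # the failing default from the stack trace
```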
5. Startup succeeds
root@ubuntu:~/hadoop# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
6. View the NameNode through the web interface: http://localhost:50070/
7. Create the HDFS directories
root@ubuntu:~/hadoop# hdfs dfs -mkdir /user
root@ubuntu:~/hadoop# hdfs dfs -mkdir /user/root
8. Put files into HDFS
root@ubuntu:~/hadoop# hdfs dfs -put etc/hadoop input
root@ubuntu:~/hadoop# hdfs dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2016-06-28 09:44 input
9. Run a MapReduce job with the bundled examples jar, grepping the input for a string pattern
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
10. View the results
1) You can view them directly in HDFS:
root@ubuntu:~/hadoop# hdfs dfs -cat output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.file
2) Or fetch the output from HDFS to a local directory first, then view it:
root@ubuntu:~/hadoop# hdfs dfs -get output output
root@ubuntu:~/hadoop# cat output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.file
Summary
Even when you follow the documentation strictly, unexpected things can still happen, such as the failed NameNode/DataNode startup in step 4 above. When you hit a problem like this and there is an error message, use it as a trail and check and fix things step by step. If you cannot solve it on your own, there is plenty of material and Q&A online to consult.
TO ME: KEEP GOING, JUST DO IT!!!