Hadoop has three run modes: local (standalone) mode, pseudo-distributed mode, and fully distributed mode.
Pseudo-distributed mode runs Hadoop's MapReduce jobs on a single machine; in this mode each Hadoop daemon runs as a separate Java process, independent of the others.
Environment:
CentOS release 5.11 (Final)
hadoop-2.5.0
jdk-8u102-linux-i586
The following walks through configuring and using pseudo-distributed mode:
- Check whether ssh and rsync are already installed on the server
```
[yh.zeng@namenode1 hadoop]$ rpm -qa | grep -i ssh
[yh.zeng@namenode1 hadoop]$ rpm -qa | grep -i rsync
```
- If they are missing, install them with the following commands (yum requires root privileges)
```
[yh.zeng@namenode1 hadoop]$ yum install -y openssh-server openssh-clients
[yh.zeng@namenode1 hadoop]$ yum install -y rsync
```
- Install and configure the JDK; this step should be familiar, so it is skipped here
- Configure passwordless SSH login to the local machine
```
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
```
Verify that you can now ssh to the local machine without a password:
```
[yh.zeng@namenode1 ~]$ ssh namenode1
Last login: Sat Aug 20 23:17:59 2016 from namenode1
[yh.zeng@namenode1 ~]$
```
- Unpack hadoop-2.5.0 and create a symlink named hadoop pointing to the unpacked directory. The symlink means that when Hadoop is upgraded later, you only need to recreate the symlink instead of editing the system environment variables again!
```
[yh.zeng@namenode1 local]$ ll
total 88
drwxr-xr-x 2 root    root    4096 2011-05-11 bin
drwxr-xr-x 2 root    root    4096 2011-05-11 etc
drwxr-xr-x 2 root    root    4096 2011-05-11 games
lrwxrwxrwx 1 yh.zeng yh.zeng   12 08-20 17:56 hadoop -> hadoop-2.5.0
drwxrwxr-x 9 yh.zeng yh.zeng 4096 08-20 17:38 hadoop-2.5.0
drwxr-xr-x 2 root    root    4096 2011-05-11 include
drwxr-xr-x 2 root    root    4096 2011-05-11 lib
drwxr-xr-x 2 root    root    4096 2011-05-11 lib64
drwxr-xr-x 2 root    root    4096 2011-05-11 libexec
drwxr-xr-x 2 root    root    4096 2011-05-11 sbin
drwxr-xr-x 4 root    root    4096 07-09 20:42 share
drwxr-xr-x 2 root    root    4096 2011-05-11 src
[yh.zeng@namenode1 local]$ pwd
/usr/local
```
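The upgrade workflow this symlink enables can be sketched as follows; a minimal simulation under /tmp (the directory names and the hadoop-2.6.0 version are illustrative, not part of the actual install):

```shell
# Simulate the /usr/local layout under a scratch directory.
rm -rf /tmp/hadoop-demo
mkdir -p /tmp/hadoop-demo/hadoop-2.5.0 /tmp/hadoop-demo/hadoop-2.6.0
cd /tmp/hadoop-demo

# Initial install: the stable name "hadoop" points at the current version.
ln -s hadoop-2.5.0 hadoop

# Environment variables reference only the symlink, e.g. in ~/.bashrc:
#   export HADOOP_HOME=/usr/local/hadoop
#   export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Upgrade: repoint the symlink; no environment variable changes needed.
ln -sfn hadoop-2.6.0 hadoop
readlink hadoop    # prints: hadoop-2.6.0
```

`ln -sfn` replaces the symlink itself instead of creating a new link inside the directory it points to, which is what makes the repoint safe to repeat.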
- Edit etc/hadoop/hadoop-env.sh and set the JDK installation path in it
```
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_102/
```
- Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml under /usr/local/hadoop/etc/hadoop (the distribution ships only mapred-site.xml.template; copy it to mapred-site.xml first if it does not exist yet)
/usr/local/hadoop/etc/hadoop/core-site.xml:
```
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
        <description>
            How long (in minutes) a deleted file stays in the trash before it is
            removed permanently; 10080 minutes = 7 days. The default is 0, which
            means deleted files bypass the trash and are removed immediately!
        </description>
    </property>
    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>10080</value>
        <description>
            Interval (in minutes) between trash checkpoints, i.e. how often the
            trash is scanned for files whose retention time has expired. It should
            be less than or equal to fs.trash.interval. The default is 0, in which
            case the checkpoint interval is set to the value of fs.trash.interval.
        </description>
    </property>
</configuration>
```
/usr/local/hadoop/etc/hadoop/hdfs-site.xml:
```
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.heartbeat.recheck-interval</name>
        <value>10</value>
        <description>
            Interval (in milliseconds) at which the NameNode rechecks DataNode
            heartbeats. Note: Hadoop 2.x reads this property name; the legacy
            name heartbeat.recheckinterval is ignored.
        </description>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>namenode1:50090</value>
    </property>
</configuration>
```
/usr/local/hadoop/etc/hadoop/mapred-site.xml:
```
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>namenode1:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>namenode1:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
</configuration>
```
/usr/local/hadoop/etc/hadoop/yarn-site.xml:
```
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>namenode1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
        <description>
            Enable log aggregation: after a job finishes, its log files are
            uploaded to the file system (e.g. HDFS) automatically. Without this,
            viewing logs through the namenode1:8088 web UI fails with
            "Aggregation is not enabled. Try the nodemanager at namenode1:54951".
        </description>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
        <description>
            How long (in seconds) aggregated logs are kept on the file system
            (e.g. HDFS). The default is -1, i.e. keep them forever. The value
            here is 7 days = 3600 * 24 * 7 = 604800 seconds.
        </description>
    </property>
</configuration>
```
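These intervals use different units (minutes for the trash settings, seconds for log retention), which is easy to get wrong; a quick shell check of the 7-day values:

```shell
# fs.trash.interval / fs.trash.checkpoint.interval are in minutes.
echo $((7 * 24 * 60))     # 7 days in minutes, prints: 10080

# yarn.log-aggregation.retain-seconds is in seconds.
echo $((7 * 24 * 3600))   # 7 days in seconds, prints: 604800
```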
- Next, start the HDFS file system and the YARN resource manager
For a first deployment, format the HDFS file system:
```
[yh.zeng@namenode1 hadoop]$ bin/hdfs namenode -format
```
Start the NameNode and DataNode:
```
[yh.zeng@namenode1 hadoop]$ sbin/start-dfs.sh
```
Start YARN:
```
[yh.zeng@namenode1 hadoop]$ sbin/start-yarn.sh
```
Start the MapReduce JobHistory server configured in mapred-site.xml:
```
[yh.zeng@namenode1 hadoop]$ sbin/mr-jobhistory-daemon.sh start historyserver
```
Check that everything started successfully:
```
[yh.zeng@namenode1 hadoop]$ jps
4800 SecondaryNameNode
9523 NodeManager
4579 NameNode
4678 DataNode
9419 ResourceManager
9871 JobHistoryServer
9903 Jps
```
You can also check the status of HDFS and YARN through their web UIs:
HDFS web UI: http://localhost:50070/
YARN ResourceManager web UI: http://localhost:8088/
- As an example, run the word-count demo that ships in hadoop-mapreduce-examples-2.5.0.jar as a MapReduce job
Create a word.txt file and write a few words into it:
```
[yh.zeng@namenode1 hadoop]$ mkdir wcinut
[yh.zeng@namenode1 hadoop]$ cd wcinut/
[yh.zeng@namenode1 wcinut]$ touch word.txt
[yh.zeng@namenode1 wcinut]$ vi word.txt
abc java android c++ tomcat javascript c oralce
~
~
```
Upload the new word.txt to the HDFS file system and run the job:
```
[yh.zeng@namenode1 hadoop]$ bin/hdfs dfs -mkdir -p /user/testdir/mapreduce/wordcount/input
[yh.zeng@namenode1 hadoop]$ bin/hdfs dfs -put wcinut/word.txt /user/testdir/mapreduce/wordcount/input
[yh.zeng@namenode1 hadoop]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/testdir/mapreduce/wordcount/input/ /user/testdir/mapreduce/wordcount/output
```
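What the wordcount job computes can be reproduced locally with standard tools, which is handy for sanity-checking the expected result on a small input. This is only an illustrative equivalent, not the MapReduce job itself:

```shell
# Recreate the sample input locally (same words as word.txt, including
# the "oralce" typo, so the counts match the demo output).
printf 'abc java android c++ tomcat javascript c oralce\n' > /tmp/word.txt

# Split on whitespace and count each word -- the same word->count
# mapping that the wordcount MapReduce job produces.
tr -s ' ' '\n' < /tmp/word.txt | sort | uniq -c
```

Each of the eight words appears exactly once, matching the part-r-00000 output shown below in the next step.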
- Check the result of the demo run above
```
[yh.zeng@namenode1 hadoop]$ bin/hdfs dfs -ls /user/testdir/mapreduce/wordcount/output
16/08/21 00:30:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 yh.zeng supergroup          0 2016-08-21 00:28 /user/testdir/mapreduce/wordcount/output/_SUCCESS
-rw-r--r--   1 yh.zeng supergroup         64 2016-08-21 00:28 /user/testdir/mapreduce/wordcount/output/part-r-00000
[yh.zeng@namenode1 hadoop]$ bin/hdfs dfs -cat /user/testdir/mapreduce/wordcount/output/part-r-00000
16/08/21 00:31:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
abc	1
android	1
c	1
c++	1
java	1
javascript	1
oralce	1
tomcat	1
```
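Each line of part-r-00000 is a tab-separated word and count, so the result can be post-processed with standard tools; for example, summing the counts recovers the total number of words in the input. The file is recreated locally here purely for illustration:

```shell
# Recreate the demo output locally: one "word<TAB>count" line per word.
printf 'abc\t1\nandroid\t1\nc\t1\nc++\t1\njava\t1\njavascript\t1\noralce\t1\ntomcat\t1\n' \
    > /tmp/part-r-00000

# Sum the second column: total word occurrences in the input.
awk -F'\t' '{ total += $2 } END { print total }' /tmp/part-r-00000   # prints: 8
```

Against the real cluster, the same awk command can be fed from `bin/hdfs dfs -cat` instead of a local file.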
- Stop the NameNode, DataNode, and YARN in the following order
```
[yh.zeng@namenode1 hadoop]$ sbin/stop-dfs.sh
[yh.zeng@namenode1 hadoop]$ sbin/stop-yarn.sh
[yh.zeng@namenode1 hadoop]$ sbin/mr-jobhistory-daemon.sh stop historyserver
```