Building a Hadoop 1.2.1 Cluster on Ubuntu 12.10 under VMware 9
I have been learning Hadoop recently, so I am writing down the process of building a Hadoop cluster for easy reference later. The walkthrough covers a lot of small details and may get a bit wordy, but that should make it all the more useful for newcomers. Enough chatter, let's get started.
The goal is a Hadoop cluster with five nodes.
1. Environment
Five Ubuntu virtual machines are created with VMware; the details are as follows:
| Virtualization software | Operating system | JDK | Hadoop |
| --- | --- | --- | --- |
| VMware Workstation 9 | ubuntu-12.10-server-amd64 | jdk-7u51-linux-x64 | hadoop-1.2.1 |
| Hostname | IP address | VM name | Node roles |
| --- | --- | --- | --- |
| master | 192.168.1.30 | Ubuntu64-Master | namenode, jobtracker |
| secondary | 192.168.1.39 | Ubuntu64-Secondary | secondarynamenode |
| slaver1 | 192.168.1.31 | Ubuntu64-slaver1 | datanode, tasktracker |
| slaver2 | 192.168.1.32 | Ubuntu64-slaver2 | datanode, tasktracker |
| slaver3 | 192.168.1.33 | Ubuntu64-slaver3 | datanode, tasktracker |
2. Setting up the virtual machines
Download the 64-bit Ubuntu Server ISO; an ISO image is the easiest thing to install in VMware.
Each virtual machine gets one dual-core CPU, 1 GB of RAM and a 20 GB disk. Configure a Shared Folder so that files can be copied from the Windows host into the VMs.
Install Ubuntu using the Easy Install option and create a hadoop user; Hadoop, ZooKeeper and HBase will all be deployed under this hadoop user later on.
You can fully install and configure one machine first, using master as the template, then use VMware's clone feature to create the other machines and simply adjust their IP addresses and hostnames.
Creating the user
First create the group:
sudo addgroup hadoop
Then create the user:
sudo adduser --ingroup hadoop hadoop
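Later steps run sudo as the hadoop user, so that account needs sudo rights. If the installer did not already grant them, one minimal way to do so on Ubuntu (a sketch, assuming the default sudo group) is:
# run from an account that already has sudo rights;
# this adds the hadoop user to the sudo group
sudo adduser hadoop sudo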
Updating the package sources
First back up the sources list that ships with the system (we are logged in as the hadoop user, hence the sudo):
sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup
Edit the sources list:
sudo vi /etc/apt/sources.list
Below is a set of sources found online; paste it into vi:
## Official Ubuntu update servers (located in Europe; slower from mainland China but always in sync, usable from any ISP):
deb http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

## Additional software distributed by Ubuntu (third-party / closed-source packages):
deb http://archive.canonical.com/ubuntu/ quantal partner
deb http://extras.ubuntu.com/ubuntu/ quantal main

## Community-maintained mirror hosted in a Hangzhou (China Telecom) data center, also carries Deepin images:
deb http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse
deb http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse
deb-src http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

## Sohu mirror (China Unicom, Shandong; the official mainland-China mirror redirects here), also carries other open-source mirrors:
deb http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse
deb-src http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse
Run apt-get update so that the new sources take effect:
sudo apt-get update
Installing vim
I still cannot get used to vi, so install vim to replace it:
sudo apt-get install vim
Configuring the IP address
On Ubuntu the IP address is changed by editing /etc/network/interfaces directly:
sudo vim /etc/network/interfaces
Using the master host as an example, change it to:
# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.1.30
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.1
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 8.8.8.8
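The new address only takes effect once the interface is brought up again. A quick way to apply it without rebooting (a sketch; run it from the VM console rather than over ssh, since the connection will drop, or simply reboot):
sudo ifdown eth0 && sudo ifup eth0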
Configuring the hostname
On Ubuntu the hostname lives in /etc/hostname, and /etc/hosts maps hostnames to IP addresses.
First set the hostname:
sudo vim /etc/hostname
Set it to:
master
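The change to /etc/hostname only takes effect after a reboot; to apply it to the running system right away as well (optional):
sudo hostname master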
Then configure the hostname-to-IP mapping for all hosts:
sudo vim /etc/hosts
Set it to (add every server in the cluster in one go, so this never needs touching again):
127.0.0.1 localhost
192.168.1.30 master
192.168.1.31 slaver1
192.168.1.32 slaver2
192.168.1.33 slaver3
192.168.1.39 secondary
Each entry in the hosts file has the format:
IP-address  hostname  aliases (zero or more, separated by spaces)
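A quick optional check that name resolution works between the nodes (hostnames as configured above):
# ping each cluster host once by name
for h in master secondary slaver1 slaver2 slaver3; do
    ping -c 1 "$h" > /dev/null && echo "$h: OK" || echo "$h: unreachable"
done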
Cloning the system
Clone the configured Ubuntu VM into the required number of copies to build the small five-node cluster, then change the IP address and hostname on each clone.
3. Installing and configuring SSH
Installing SSH
Install it with apt-get, which is the most convenient way:
sudo apt-get install openssh-server
Check whether the ssh service is running:
ps -ef | grep ssh
Output like the following means it is running:
hadoop    2147  2105  0 13:11 ?        00:00:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu
root      7226     1  0 23:31 ?        00:00:00 /usr/sbin/sshd -D
hadoop    7287  6436  0 23:33 pts/0    00:00:00 grep --color=auto ssh
SSH consists of a client and a server: the client is used to log in to other machines over ssh, while the server provides the ssh service that remote users log in to. Ubuntu installs the ssh client by default, which is why only openssh-server has to be installed here.
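An optional sanity check that both sides are installed and that sshd is listening (netstat comes from the net-tools package, which Ubuntu 12.10 ships by default):
# list the installed OpenSSH packages
dpkg -l | grep openssh
# sshd should be listening on port 22
sudo netstat -tlnp | grep :22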
Generating an RSA key pair
As the hadoop user, generate a key pair with ssh-keygen:
ssh-keygen -t rsa
You will be asked whether to protect the key with a passphrase; just leave it empty. If nothing goes wrong, the key pair (id_rsa and id_rsa.pub) is created in the hadoop user's .ssh directory. id_rsa is the private key, which stays on the server and must not leak; id_rsa.pub is the public key, which is handed out to every server that should accept passwordless logins.
Note: there is no space between ssh and -keygen. Running ssh-keygen -t rsa -P "" skips the passphrase prompt entirely.
Go into the .ssh directory and append the public key to the authorization file (authorized_keys), which stores the public keys of every server that is allowed in:
cat id_rsa.pub >> authorized_keys
In authorized_keys each public key starts with ssh-rsa and ends with username@hostname; keys from several servers are simply stored one after another, for example:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDs5A9sjk+44DtptGw4fXm5n0qbpnSnFsqRQJnbyD4DGMG7AOpfrEZrMmRiNJA8GZUIcrN71pHEgQimoQGD5CWyVgi1ctWFrULOnGksgixJj167m+FPdpcCFJwfAS34bD6DoVXJgyjWIDT5UFz+RnElNC14s8F0f/w44EYM49y2dmP8gGmzDQ0jfIgPSknUSGoL7fSFJ7PcnRrqWjQ7iq3B0gwyfCvWnq7OmzO8VKabUnzGYST/lXCaSBC5WD2Hvqep8C9+dZRukaa00g2GZVH3UqWO4ExSTefyUMjsal41YVARMGLEfyZzvcFQ8LR0MWhx2WMSkYp6Z6ARbdHZB4MN hadoop@master
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2Hb6mCi6sd6IczIn/pBbj8L9PMS1ac0tlalex/vlSRj2E6kzUrw/urEUVeO76zFcZgjUgKvoZsNAHGrr1Bfw8FiiDcxPtlIREl2L9Qg8Vd0ozgE22bpuxBTn1Yed/bbJ/VxGJsYbOyRB/mBCvEI4ECy/EEPf5CRMDgiTL9XP86MNJ/kgG3odR6hhSE3Ik/NMARTZySXE90cFB0ELr/Io4SaINy7b7m6ssaP16bO8aPbOmsyY2W2AT/+O726Py6tcxwhe2d9y2tnJiELfrMLUPCYGEx0Z/SvEqWhEvvoGn8qnpPJCGg6AxYaXy8jzSqWNZwP3EcFqmVrg9I5v8mvDd hadoop@slaver1
Distributing the public keys
Each server hands its public key to the others so that it can log in to them without a password. Here all public keys are first collected on one server and then pushed back out to the others; this way all five servers can freely ssh into each other without passwords.
The distribution uses scp, which needs the ssh service running on both ends; the very first scp connection will still ask for a password.
On every server except master, copy the local public key to master (each server runs only the line with its own suffix):
cd .ssh
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver1
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver2
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver3
scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.secondary
On master, collect all the keys:
cd .ssh
cat id_rsa.pub.slaver1 >> authorized_keys
cat id_rsa.pub.slaver2 >> authorized_keys
cat id_rsa.pub.slaver3 >> authorized_keys
cat id_rsa.pub.secondary >> authorized_keys
Still on master, distribute the combined authorized_keys file:
scp authorized_keys hadoop@slaver1:/home/hadoop/.ssh/authorized_keys
scp authorized_keys hadoop@slaver2:/home/hadoop/.ssh/authorized_keys
scp authorized_keys hadoop@slaver3:/home/hadoop/.ssh/authorized_keys
scp authorized_keys hadoop@secondary:/home/hadoop/.ssh/authorized_keys
Test the passwordless login:
ssh slaver1
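If a password is still requested, it is usually a permissions problem: sshd ignores keys whose directory or file is too open. The usual fix, plus an alternative to the manual scp/cat steps above (ssh-copy-id appends the local public key to a remote authorized_keys in one step), looks roughly like this:
# tighten permissions on every node
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# alternative way to push a key to another host
ssh-copy-id hadoop@slaver1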
4. Installing and configuring the JDK
Deploying the JDK
Extract jdk-7u51-linux-x64.tar.gz to /usr/lib/jdk1.7.0_51 (writing to /usr/lib requires root, hence sudo):
sudo tar -zxvf jdk-7u51-linux-x64.tar.gz -C /usr/lib/
Configuring environment variables
Add the JDK to the global environment variables:
sudo vim /etc/profile
Append the following at the bottom:
export JAVA_HOME=/usr/lib/jdk1.7.0_51
export JRE_HOME=/usr/lib/jdk1.7.0_51/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Note: on Linux the separator in environment variables is the colon ':' (on Windows it is the semicolon ';'), and CLASSPATH must contain '.'.
Reload the environment variables with:
source /etc/profile
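A quick check that the JDK is now picked up:
echo $JAVA_HOME    # should print /usr/lib/jdk1.7.0_51
java -version      # should report version 1.7.0_51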
Distributing the JDK
Distribute the installed JDK with scp; copying a directory needs the -r flag:
scp -r /usr/lib/jdk1.7.0_51 hadoop@slaver1:/usr/lib/
Distributing the environment variables
/etc/profile is owned by root, so the scp needs sudo and the file has to be sent to the root account on slaver1:
sudo scp /etc/profile root@slaver1:/etc/profile
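After distributing, the same JDK check can be run on each node over ssh; the explicit sourcing is needed because non-interactive shells do not read /etc/profile (a sketch for slaver1):
ssh slaver1 '. /etc/profile && java -version'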
5. Installing and configuring Hadoop
Deploying Hadoop
Extract hadoop-1.2.1.tar.gz to /home/hadoop/hadoop-1.2.1:
tar -zxvf hadoop-1.2.1.tar.gz -C /home/hadoop/
Configuring environment variables
Add Hadoop to the global environment variables:
sudo vim /etc/profile
Append the following at the bottom:
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin
Reload the environment variables:
source /etc/profile
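A quick check that the hadoop command is on the PATH:
hadoop version    # should report Hadoop 1.2.1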
conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jdk1.7.0_51
export HADOOP_TASKTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_TASKTRACKER_OPTS"
export HADOOP_LOG_DIR=/home/hadoop/hadoop_home/logs
export HADOOP_MASTER=master:/home/$USER/hadoop-1.2.1
export HADOOP_SLAVE_SLEEP=0.1
HADOOP_MASTER tells Hadoop which host:directory to rsync its configuration from; when Hadoop starts, each slave synchronizes its configuration from that master directory.
HADOOP_SLAVE_SLEEP=0.1 is the sleep time (in seconds) between the slaves' sync requests, so that many nodes asking for a sync at the same moment do not overload the master.
conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_home/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
    <description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>600</value>
    <description>The number of seconds between two periodic checkpoints.</description>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
    <description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>
  </property>
</configuration>
conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop_home/name1,/home/hadoop/hadoop_home/name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop_home/data1,/home/hadoop/hadoop_home/data2</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/home/hadoop/hadoop_home/namesecondary1,/home/hadoop/hadoop_home/namesecondary2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>master:50470</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>secondary:50090</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50020</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50475</value>
  </property>
</configuration>
conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop_home/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/hadoop_home/system</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>5</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>5</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapred.task.tracker.http.address</name>
    <value>0.0.0.0:50060</value>
  </property>
</configuration>
conf/masters
secondary
conf/masters holds the hostname of the secondarynamenode; in this setup the secondarynamenode runs on its own server and, despite the file name, has nothing to do with the namenode.
conf/slaves
slaver1
slaver2
slaver3
Distributing the Hadoop installation
scp -r /home/hadoop/hadoop-1.2.1 hadoop@slaver1:/home/hadoop/
Distributing the environment variables
As before, /etc/profile is owned by root, so the scp needs sudo and the file has to be sent to the root account on slaver1:
sudo scp /etc/profile root@slaver1:/etc/profile
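One step worth spelling out before the first start: on a brand-new cluster the HDFS namenode must be formatted once (on master, as the hadoop user), otherwise it will not come up. A minimal sketch:
# run exactly once on master before the very first start-all.sh;
# do NOT re-run it on a cluster that already holds data, it wipes the HDFS metadata
hadoop namenode -format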
6. Starting and testing Hadoop
Starting the Hadoop cluster
The Hadoop start/stop commands are:
| Command | Purpose |
| --- | --- |
| start-all.sh | Start the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker |
| stop-all.sh | Stop the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker |
| start-dfs.sh | Start the HDFS daemons: namenode, secondarynamenode, datanode |
| stop-dfs.sh | Stop the HDFS daemons: namenode, secondarynamenode, datanode |
| start-mapred.sh | Start the MapReduce daemons: jobtracker, tasktracker |
| stop-mapred.sh | Stop the MapReduce daemons: jobtracker, tasktracker |
| hadoop-daemons.sh start namenode | Start only the namenode daemon |
| hadoop-daemons.sh stop namenode | Stop only the namenode daemon |
| hadoop-daemons.sh start datanode | Start only the datanode daemons |
| hadoop-daemons.sh stop datanode | Stop only the datanode daemons |
| hadoop-daemons.sh start secondarynamenode | Start only the secondarynamenode daemon |
| hadoop-daemons.sh stop secondarynamenode | Stop only the secondarynamenode daemon |
| hadoop-daemons.sh start jobtracker | Start only the jobtracker daemon |
| hadoop-daemons.sh stop jobtracker | Stop only the jobtracker daemon |
| hadoop-daemons.sh start tasktracker | Start only the tasktracker daemons |
| hadoop-daemons.sh stop tasktracker | Stop only the tasktracker daemons |
Start the cluster:
start-all.sh
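A common way to check that every daemon actually came up is to run jps on each node. A sketch (the explicit profile sourcing puts jps from the JDK on the PATH of the non-interactive shell):
for h in master secondary slaver1 slaver2 slaver3; do
    echo "== $h =="
    ssh "$h" '. /etc/profile && jps'
done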
When stopping Hadoop with stop-all.sh, the datanode logs always contained an error like this:
2014-06-10 15:52:20,216 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.1.30:9000 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)
        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1031)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:845)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:790)
The reason is that the datanodes are stopped after the namenode, so their heartbeat connection to the namenode fails and raises the exception above. Looking at the stop scripts, stop-dfs.sh stops the namenode first and the datanodes afterwards, which strikes me as the wrong order; reordering it so that the namenode is stopped last should avoid the connection warning.
After the adjustment, the relevant part of stop-dfs.sh reads:
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop datanode
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode
"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode
With this order the datanode exception no longer appears in my tests. I am not sure whether the change has any other impact on Hadoop; corrections are welcome.
The HDFS web UI is available at http://master:50070.
The MapReduce web UI is available at http://master:50030.
Check the state of HDFS from the command line:
hadoop dfsadmin -report
hadoop fsck /
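Right after startup HDFS stays in safe mode for a short while and rejects writes; before running jobs it can be useful to confirm (or wait for) safe mode being off. A sketch using the Hadoop 1.x dfsadmin options:
hadoop dfsadmin -safemode get     # prints whether safe mode is ON or OFF
hadoop dfsadmin -safemode wait    # blocks until the namenode leaves safe mode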
Testing the Hadoop cluster
Run the wordcount example that ships with Hadoop to check that the cluster works.
First create two input files:
echo "Hello World Bye World" > text1.txt
echo "Hello Hadoop Goodbye Hadoop" > text2.txt
Upload them to HDFS:
hadoop fs -put text1.txt hdfs://master:9000/user/hadoop/input/text1.txt
hadoop fs -put text2.txt hdfs://master:9000/user/hadoop/input/text2.txt
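Optionally confirm the upload; relative paths resolve to /user/hadoop on HDFS:
hadoop fs -ls input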
Run the wordcount program (the glob matches the two text files uploaded above):
hadoop jar hadoop-examples-1.2.1.jar wordcount input/text*.txt output-0
The job log looks like this:
14/06/12 01:55:21 INFO input.FileInputFormat: Total input paths to process : 2
14/06/12 01:55:21 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/12 01:55:21 WARN snappy.LoadSnappy: Snappy native library not loaded
14/06/12 01:55:21 INFO mapred.JobClient: Running job: job_201406111818_0001
14/06/12 01:55:22 INFO mapred.JobClient:  map 0% reduce 0%
14/06/12 01:55:28 INFO mapred.JobClient:  map 50% reduce 0%
14/06/12 01:55:30 INFO mapred.JobClient:  map 100% reduce 0%
14/06/12 01:55:36 INFO mapred.JobClient:  map 100% reduce 33%
14/06/12 01:55:37 INFO mapred.JobClient:  map 100% reduce 100%
14/06/12 01:55:38 INFO mapred.JobClient: Job complete: job_201406111818_0001
14/06/12 01:55:38 INFO mapred.JobClient: Counters: 29
14/06/12 01:55:38 INFO mapred.JobClient:   Job Counters
14/06/12 01:55:38 INFO mapred.JobClient:     Launched reduce tasks=1
14/06/12 01:55:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8281
14/06/12 01:55:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/06/12 01:55:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/06/12 01:55:38 INFO mapred.JobClient:     Launched map tasks=2
14/06/12 01:55:38 INFO mapred.JobClient:     Data-local map tasks=2
14/06/12 01:55:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8860
14/06/12 01:55:38 INFO mapred.JobClient:   File Output Format Counters
14/06/12 01:55:38 INFO mapred.JobClient:     Bytes Written=41
14/06/12 01:55:38 INFO mapred.JobClient:   FileSystemCounters
14/06/12 01:55:38 INFO mapred.JobClient:     FILE_BYTES_READ=79
14/06/12 01:55:38 INFO mapred.JobClient:     HDFS_BYTES_READ=272
14/06/12 01:55:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=166999
14/06/12 01:55:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
14/06/12 01:55:38 INFO mapred.JobClient:   File Input Format Counters
14/06/12 01:55:38 INFO mapred.JobClient:     Bytes Read=50
14/06/12 01:55:38 INFO mapred.JobClient:   Map-Reduce Framework
14/06/12 01:55:38 INFO mapred.JobClient:     Map output materialized bytes=85
14/06/12 01:55:38 INFO mapred.JobClient:     Map input records=2
14/06/12 01:55:38 INFO mapred.JobClient:     Reduce shuffle bytes=85
14/06/12 01:55:38 INFO mapred.JobClient:     Spilled Records=12
14/06/12 01:55:38 INFO mapred.JobClient:     Map output bytes=82
14/06/12 01:55:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=336338944
14/06/12 01:55:38 INFO mapred.JobClient:     CPU time spent (ms)=3010
14/06/12 01:55:38 INFO mapred.JobClient:     Combine input records=8
14/06/12 01:55:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=222
14/06/12 01:55:38 INFO mapred.JobClient:     Reduce input records=6
14/06/12 01:55:38 INFO mapred.JobClient:     Reduce input groups=5
14/06/12 01:55:38 INFO mapred.JobClient:     Combine output records=6
14/06/12 01:55:38 INFO mapred.JobClient:     Physical memory (bytes) snapshot=394276864
14/06/12 01:55:38 INFO mapred.JobClient:     Reduce output records=5
14/06/12 01:55:38 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2918625280
14/06/12 01:55:38 INFO mapred.JobClient:     Map output records=8
List the output directory; the presence of the _SUCCESS file means the job completed successfully, which shows the cluster setup is basically working:
hadoop fs -ls output-0
Found 3 items
-rw-r--r--   3 hadoop supergroup          0 2014-06-12 01:55 /user/hadoop/output-0/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2014-06-12 01:55 /user/hadoop/output-0/_logs
-rw-r--r--   3 hadoop supergroup         41 2014-06-12 01:55 /user/hadoop/output-0/part-r-00000
View the result:
hadoop fs -cat output-0/part-r-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
This matches the expected result for the test data.