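1. Preparation: configure the network interface and hostname (on all three nodes)
1) Configure network interface 1:
Note: one network interface is enough here; the private IP (10.10.10.1) does not need to be configured.
vi /etc/sysconfig/network-scripts/ifcfg-eth0
Configure network interface 2:
After configuring, restart the network service:
Check the interface configuration:
2) Set the hostnames (edit /etc/hosts) (on all three nodes)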
[root@hadoop2 ~]# vi /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost
192.168.6.52 hadoop2
10.10.10.2 hadoop2priv
192.168.6.51 hadoop1
10.10.10.1 hadoop1priv
192.168.6.53 hadoop3
10.10.10.3 hadoop3priv
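Note: this file can be copied to node 2 and node 3.
After editing the file, remember to run hostname with the new name on each machine (for example hostname master, hostname slave1, and so on). This step is optional.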
[root@hadoop2 ~]# vi /etc/hostname
hadoop2
[root@hadoop2 ~]# hostname hadoop2
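The /etc/hosts entries above can also be added with a small helper instead of editing the file by hand on every node. The following is a minimal sketch, not part of the original procedure: the function name add_host_entry is hypothetical, and it assumes a POSIX shell with awk available.

```shell
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless NAME is already listed as a hostname there.
# (Hypothetical helper for illustration only.)
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Look for NAME as an exact hostname field on any non-comment line,
    # so that "hadoop1" is not confused with "hadoop1priv".
    if awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run as root, a call such as add_host_entry /etc/hosts 192.168.6.51 hadoop1 adds the line once; repeating the call leaves the file unchanged.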
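3) Turn off the firewall
SELinux must be turned off. Run: /usr/sbin/setenforce 0 (this switches SELinux to permissive mode until the next reboot)
Note: it is best to turn it off manually.
Also turn off the firewall on every server; otherwise errors occur at run time.
Commands for turning off the firewall on Linux:
1) Permanent, survives a reboot. Enable: chkconfig iptables on. Disable: chkconfig iptables off
2) Immediate, reverts after a reboot. Start: service iptables start. Stop: service iptables stop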
2. Create the user and group, and set up trust between the nodes for passwordless login
1) Create the user and group (on all three nodes)
[root@hadoop ~]# groupadd -g 200 hadoop
[root@hadoop ~]# useradd -u 200 -g hadoop hadoop
[root@hadoop ~]# passwd hadoop
Changing password for user hadoop.
New UNIX password:
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
[root@hadoop ~]# su - hadoop
2) Set up the trust relationship
On node 1
Generate the key pair as the hadoop user; accept the default at every prompt for the RSA key
[hadoop@hadoop ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
1a:d9:48:f8:de:5b:be:e7:1f:5b:fd:48:df:59:59:94 hadoop@hadoop
[hadoop@hadoop ~]$ cd .ssh
[hadoop@hadoop .ssh]$ ls
id_rsa id_rsa.pub
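Append the public key to the authorized keys file
[hadoop@hadoop ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
# Fix the permissions of the key file on the master; this is a very common source of errors.
chmod go-rwx /home/hadoop/.ssh/authorized_keys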
[hadoop@h1 .ssh]$ ll
total 24
-rw------- 1 hadoop hadoop 391 Jun 7 17:07 authorized_keys    (note: the mode is 600)
-rw------- 1 hadoop hadoop 1675 Jun 7 17:06 id_rsa
-rw-r--r-- 1 hadoop hadoop 391 Jun 7 17:06 id_rsa.pub
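The permission step matters because sshd, with its default StrictModes setting, ignores an authorized_keys file that other users can write to. A minimal sketch of applying the usual modes in one call follows; the function name fix_ssh_perms is hypothetical, and the 700/600 modes are the common convention rather than something the original text prescribes beyond the 600 shown above.

```shell
# fix_ssh_perms DIR
# Restrict DIR (an .ssh directory) to its owner and, if present,
# set authorized_keys inside it to mode 600.
# (Hypothetical helper for illustration only.)
fix_ssh_perms() {
    dir=$1
    chmod 700 "$dir" || return 1
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys" || return 1
    fi
}
```

For the hadoop user this would be called as fix_ssh_perms /home/hadoop/.ssh on each node.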
On node 2 (note: as the hadoop user):
[root@hadoop2 ~]# su - hadoop
[hadoop@hadoop2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
43:c8:05:b9:6d:44:2c:b9:f3:c9:da:2d:64:b7:e9:83 hadoop@hadoop2
[hadoop@hadoop2 ~]$ cd .ssh
[hadoop@hadoop2 .ssh]$ ls
id_rsa id_rsa.pub
[hadoop@hadoop2 .ssh]$
On node 3 (note: as the hadoop user):
[root@hadoop3 ~]# su - hadoop
[hadoop@hadoop3 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
be:2b:89:28:99:e2:46:1d:86:3c:cf:01:78:eb:5d:f3 hadoop@hadoop3
[hadoop@hadoop3 ~]$ cd .ssh
[hadoop@hadoop3 .ssh]$ ls
id_rsa id_rsa.pub
[hadoop@hadoop3 .ssh]$
On node 1, append the public keys of node 2 and node 3 to node 1's authorized_keys:
[hadoop@hadoop1 ~]$ ssh hadoop2 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'hadoop2 (192.168.6.52)' can't be established.
RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop2,192.168.6.52' (RSA) to the list of known hosts.
hadoop@hadoop2's password:
[hadoop@hadoop1 ~]$ ssh hadoop3 cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'hadoop3 (192.168.6.53)' can't be established.
RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop3,192.168.6.53' (RSA) to the list of known hosts.
hadoop@hadoop3's password:
Copy the authorized_keys file from node 1 to node 2 and node 3:
[hadoop@hadoop1 ~]$ scp .ssh/authorized_keys hadoop2:~/.ssh
hadoop@hadoop2's password:
authorized_keys 100% 792 0.8KB/s 00:00
[hadoop@hadoop1 ~]$ scp .ssh/authorized_keys hadoop3:~/.ssh
hadoop@hadoop3's password:
authorized_keys 100% 1188 1.2KB/s 00:00
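Note: the authorized_keys file copied to node 2 and node 3 keeps the permissions set on node 1, so they do not need to be changed again on node 2 and node 3.
[hadoop@hadoop2 .ssh]$ ll
total 32
-rw------- 1 hadoop hadoop 1188 Dec 19 14:20 authorized_keys
Verify that node 1 can connect to node 1, node 2, and node 3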
[hadoop@hadoop1 ~]$ ssh hadoop1
Last login: Thu Dec 19 14:14:24 2013 from hadoop1
[hadoop@hadoop1 ~]$ ssh hadoop2 date
Thu Dec 19 14:16:09 CST 2013
[hadoop@hadoop1 ~]$ ssh hadoop3 date
Thu Dec 19 14:22:31 CST 2013
Verify that node 2 can connect to node 1, node 2, and node 3
[hadoop@hadoop2 ~]$ ssh hadoop2
Last login: Thu Dec 19 14:15:42 2013 from hadoop2
[hadoop@hadoop2 ~]$ ssh hadoop1 date
Thu Dec 19 14:23:12 CST 2013
[hadoop@hadoop2 ~]$ ssh hadoop3 date
The authenticity of host 'hadoop3 (192.168.6.53)' can't be established.
RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop3,192.168.6.53' (RSA) to the list of known hosts.
Thu Dec 19 14:23:18 CST 2013
Verify that node 3 can connect to node 1, node 2, and node 3
[hadoop@hadoop3 .ssh]$ ssh hadoop3
The authenticity of host 'hadoop3 (192.168.6.53)' can't be established.
RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop3,192.168.6.53' (RSA) to the list of known hosts.
Last login: Thu Dec 19 14:22:03 2013 from hadoop1
[hadoop@hadoop3 ~]$ ssh hadoop2 date
The authenticity of host 'hadoop2 (192.168.6.52)' can't be established.
RSA key fingerprint is be:ac:97:91:50:9c:63:b6:4d:35:3f:60:be:e1:ab:3d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop2,192.168.6.52' (RSA) to the list of known hosts.
Thu Dec 19 14:24:08 CST 2013
[hadoop@hadoop3 ~]$ ssh hadoop2 date
Thu Dec 19 14:24:16 CST 2013
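The manual checks above can be repeated in one pass once every host key has been accepted. The sketch below is an illustration under stated assumptions: the function name check_passwordless is hypothetical, and it relies on the standard OpenSSH options BatchMode and ConnectTimeout. With BatchMode=yes, ssh fails instead of prompting, so a host that still asks for a password (or whose host key has not been accepted yet) is reported as FAIL.

```shell
# check_passwordless HOST...
# Print "OK" or "FAIL" for each host and return non-zero if any host fails.
# (Hypothetical helper for illustration only.)
check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts; "true" is a no-op remote command.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_passwordless hadoop1 hadoop2 hadoop3 as the hadoop user on each node covers the same nine connections tested by hand above.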
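3. Download and install the Java JDK
(Note: install as root; otherwise the installation reports errors)
See the guide on installing the JDK on Linux (on all three nodes)
# Download the JDK
wget http://60.28.110.228/source/package/jdk-6u21-linux-i586-rpm.bin
# Install the JDK
chmod +x jdk-6u21-linux-i586-rpm.bin    # do not forget to make the file executable
./jdk-6u21-linux-i586-rpm.bin
To install, run:
[root@hn ~]# rpm -ivh jdk-6u17-linux-i586.rpm
(the default JDK path is /usr/java/jdk1.6.0_17)
# Configure the environment variables
Note: you can edit .bash_profile, /etc/profile, or /etc/profile.d/java.sh. .bash_profile is the usual choice and is used as the example below. (Editing /etc/profile is preferable here.)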
[root@linux64 ~]# vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
unset USERNAME
Add the Java settings
export JAVA_HOME=/home/hadoop/java/jdk1.7.0_45
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-1.1.2
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
vi /etc/profile
# Copy and paste the following into vi.
export JAVA_HOME=/home/hadoop/java/jdk1.8.0_25
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
# Apply the changes immediately
source /etc/profile
# Test: either of the following two commands may report the error below
[hadoop@ha2 ~]$ java
[hadoop@ha2 ~]$ jps
Error: dl failure on line 863
Error: failed /home/hadoop/java/jdk1.7.0_45/jre/lib/i386/client/libjvm.so, because /home/hadoop/java/jdk1.7.0_45/jre/lib/i386/client/libjvm.so: cannot restore segment prot after reloc: Permission denied
This is caused by SELinux. Turn it off by running: /usr/sbin/setenforce 0
Note: it is best to turn it off manually.
Also turn off the firewall on every server; otherwise errors occur at run time.
4. Check the base environment
/sbin/ifconfig
[hadoop@master root]$ /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:7A:DE:12
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe7a:de12/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:14 errors:0 dropped:0 overruns:0 frame:0
TX packets:821 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1591 (1.5 KiB) TX bytes:81925 (80.0 KiB)
Interrupt:67 Base address:0x2024
ping master
ssh master
jps
echo $JAVA_HOME
echo $HADOOP_HOME
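Echoing the two variables shows their values but not whether they point anywhere useful. As a rough sketch, the check can be made explicit; the function name check_home_var is hypothetical, and the argument must be a trusted variable name because the function uses eval to read it.

```shell
# check_home_var NAME
# Succeed only if the variable called NAME is set and names an existing directory.
# NAME must be a trusted identifier such as JAVA_HOME (it is passed to eval).
# (Hypothetical helper for illustration only.)
check_home_var() {
    name=$1
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}
```

Calling check_home_var JAVA_HOME and check_home_var HADOOP_HOME after source /etc/profile catches a mistyped path before Hadoop is started.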
Hadoop
5. Unpack and install Hadoop on node 1
Create the Hadoop installation directory (on all three nodes)
[root@hadoop1 hadoop]# mkdir -p /opt/hadoop
[root@hadoop1 hadoop]# chown -R hadoop:hadoop /opt/hadoop/
[root@hadoop1 hadoop]# chmod 755 *
[root@hadoop1 hadoop]# ll
total 61036
-rwxr-xr-x 1 hadoop hadoop 62428860 Dec 17 22:48 hadoop-1.0.3.tar.gz
Unpack Hadoop (install it on node 1 only; once configured, copy it to the other nodes)
tar -xzvf hadoop-1.0.3.tar.gz
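6. Install Hadoop 2.2 and build the cluster
Install Hadoop on hadoopMaster
First download the Hadoop 2.2 archive from the Apache website, unpack it into the current user's home directory (/home/fyzwjd/), and rename the unpacked directory to hadoop.
$ sudo mv hadoop-2.2.0 hadoop
Before configuring, create the following directories on the local file system: ~/hadoop/tmp, ~/dfs/data, ~/dfs/name. Seven configuration files are involved, all under ~/hadoop/etc/hadoop; they can be edited with gedit.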
~/hadoop/etc/hadoop/hadoop-env.sh
~/hadoop/etc/hadoop/yarn-env.sh
~/hadoop/etc/hadoop/slaves
~/hadoop/etc/hadoop/core-site.xml
~/hadoop/etc/hadoop/hdfs-site.xml
~/hadoop/etc/hadoop/mapred-site.xml
~/hadoop/etc/hadoop/yarn-site.xml
(1) Configuration file 1: hadoop-env.sh
Set JAVA_HOME (export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51)
(2) Configuration file 2: yarn-env.sh
Set JAVA_HOME (export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51)
(3) Configuration file 3: slaves
hadoopSlave1
hadoopSlave2
(4) Configuration file 4: core-site.xml
Create the directory:
[hadoop@had1 hadoop-2.2.0]$ mkdir tmp
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://had1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.2.0/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.fyzwjd.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.fyzwjd.groups</name>
<value>*</value>
</property>
</configuration>
(5) Configuration file 5: hdfs-site.xml
Create the directory:
[hadoop@had1 hadoop-2.2.0]$ mkdir dfs
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>had1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.2.0/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
(6) Configuration file 6: mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>had1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>had1:19888</value>
</property>
</configuration>
(7) Configuration file 7: yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>had1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>had1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>had1:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>had1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>had1:8088</value>
</property>
</configuration>
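All four *-site.xml files above share one shape: a configuration element wrapping name/value property pairs. As an illustrative sketch only, a property element can be printed from the shell; the function name hadoop_property is hypothetical, and the function does no XML escaping, so it suits only plain values like the ones used in this section.

```shell
# hadoop_property NAME VALUE
# Print one <property> element for a Hadoop *-site.xml file.
# No XML escaping is performed; NAME and VALUE must not contain <, > or &.
# (Hypothetical helper for illustration only.)
hadoop_property() {
    printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' "$1" "$2"
}
```

For example, hadoop_property dfs.replication 3 prints the same element that appears in hdfs-site.xml above; the output still has to be placed between the configuration tags by hand.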
2. Copy the hadoop directory to hadoopSlave1 and hadoopSlave2.
scp -r /home/fyzwjd/hadoop fyzwjd@hadoopSlave1:~/
scp -r /home/fyzwjd/hadoop fyzwjd@hadoopSlave2:~/
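7. Verify and run
The scripts that start and stop every component are in ~/hadoop/sbin. The namenode is normally formatted before Hadoop is started for the first time. The commands are as follows:
Change to the installation directory: cd ~/hadoop/
Format the namenode: ./bin/hdfs namenode -format
Start HDFS: ./sbin/start-dfs.sh
Processes now running on hadoopMaster: namenode, secondarynamenode
Processes running on hadoopSlave1 and hadoopSlave2: datanode
Start YARN: ./sbin/start-yarn.sh
Processes now running on hadoopMaster: namenode, secondarynamenode, resourcemanager
Processes running on hadoopSlave1 and hadoopSlave2: datanode, nodemanager
Check the cluster status: ./bin/hdfs dfsadmin -report
Check the block layout of files: ./bin/hdfs fsck / -files -blocks
View HDFS: http://hadoopMaster:50070
View the ResourceManager: http://hadoopMaster:8088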
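The process lists in the steps above can be checked against the output of jps rather than read by eye. The following sketch assumes the daemon names that jps prints for Hadoop 2 (NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager); the function name missing_daemons is hypothetical.

```shell
# missing_daemons NAME...
# Read `jps` output on standard input and print every expected daemon
# that does not appear in it. Returns non-zero if any daemon is missing.
# (Hypothetical helper for illustration only.)
missing_daemons() {
    jps_output=$(cat)
    rc=0
    for daemon in "$@"; do
        # jps prints "<pid> <ClassName>"; compare the second field exactly.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            rc=1
        fi
    done
    return "$rc"
}
```

On hadoopMaster this would be run as jps | missing_daemons NameNode SecondaryNameNode ResourceManager, and on each slave as jps | missing_daemons DataNode NodeManager.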
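8. Configure system parameters:
# Configure the hadoop-env.sh environment variables (on all three nodes)
1) # Set the maximum Hadoop heap size, HADOOP_HEAPSIZE; the commented-out line in the file uses 2000.
[hadoop@hadoop1 hadoop]$ vi /opt/hadoop/hadoop-1.0.3/conf/hadoop-env.sh
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=1000    # remove the leading comment marker; the value in the file is 2000 and is changed to 1000 here, but it can also be left unchanged
2) Add the JAVA_HOME path to hadoop-env.sh, the file that sets the variables for the Hadoop runtime environment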
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/java/jdk1.6.0_21
9. Set the master node and the worker nodes
Edit the masters and slaves files
Configure the master node
[hadoop@hadoop1 conf]$ vi masters
hadoop1
Configure the other nodes
[hadoop@hadoop1 conf]$ vi slaves
hadoop2
hadoop3
10. As root, create the Hadoop mapred, HDFS namenode, and datanode directories
(on all three nodes)
mkdir -p /opt/data/hadoop/
chown -R hadoop:hadoop /opt/data/*
# Switch to the hadoop user
su hadoop
# Create the MapReduce and HDFS directories
mkdir -p /opt/data/hadoop/mapred/mrlocal
mkdir -p /opt/data/hadoop/mapred/mrsystem
mkdir -p /opt/data/hadoop/hdfs/name
mkdir -p /opt/data/hadoop/hdfs/data
mkdir -p /opt/data/hadoop/hdfs/var
mkdir -p /opt/data/hadoop/hdfs/namesecondary
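Because the same six directories have to exist on all three nodes, the mkdir commands above can be wrapped in a loop. This is a minimal sketch; the function name make_hadoop_dirs is hypothetical, and the subdirectory list simply mirrors the commands above.

```shell
# make_hadoop_dirs BASE
# Create the MapReduce and HDFS directories from this step under BASE.
# (Hypothetical helper for illustration only.)
make_hadoop_dirs() {
    base=$1
    for sub in mapred/mrlocal mapred/mrsystem \
               hdfs/name hdfs/data hdfs/var hdfs/namesecondary; do
        mkdir -p "$base/$sub" || return 1
    done
}
```

As the hadoop user, make_hadoop_dirs /opt/data/hadoop produces the same layout as the individual commands.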
# Hadoop Common: configure core-site.xml (the entry-point file)
# Edit core-site.xml
vi /opt/modules/hadoop/hadoop-1.0.3/conf/core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name> <!-- Note: do not omit this setting, and do not forget to create this temporary directory (on all three nodes). -->
<value>/home/hadoop/hadoop-1.0.3/tmp</value>
</property>
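hadoop.tmp.dir: Hadoop's default temporary path. It is best to set it explicitly. If a DataNode inexplicably fails to start after a node is added or in a similar situation, delete the tmp directory under this path. If this directory is deleted on the NameNode machine, however, the NameNode format command must be run again.
Note: the configuration above is the minimal one; it is best to also add a Hadoop temporary directory, as follows
# HDFS NameNode and DataNode: configure hdfs-site.xml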
vi /opt/modules/hadoop/hadoop-1.0.3/conf/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/hadoop/hdfs/data</value>
</property>
<!-- Several data directories can be given as a comma-separated list, for example: /home/hadoop/hadoop/hdfs/data/data1,/home/hadoop/hadoop/hdfs/data/data2 -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
The dfs.replication property above sets the HDFS replication factor.
# MapReduce: JobTracker and TaskTracker startup configuration
vi /opt/modules/hadoop/hadoop-1.0.3/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>hadoop1:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/hadoop_home/var</value>
</property>
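The mapred.job.tracker property above sets the JobTracker address and port.
11. Copy the Hadoop directory
Copy the hadoop directory on node 1 to node 2 and node 3. Note: the masters and slaves configuration files on node 1 do not need to be copied to node 2 and node 3; keeping them on node 1 is enough.
# Switch to the hadoop user
su hadoop
scp -r /opt/hadoop/hadoop-1.0.3/ hadoop2:/opt/hadoop/
scp -r /opt/hadoop/hadoop-1.0.3/ hadoop3:/opt/hadoop/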
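Note: before proceeding, turn off the firewall and SELinux. Using the graphical interface is the most reliable way, because the commands sometimes do not take effect and later steps then fail.
service iptables status    # show the iptables status
service iptables restart    # restart the iptables service
service iptables stop    # stop the iptables service
There are three ways to configure SELinux on Linux.
a. In the graphical interface:
Desktop --> Administration --> Security Level and Firewall, set to disabled.
b. On the command line:
Edit /etc/selinux/config (set SELINUX=disabled), then reboot the system.
c. Run the setup command, open "Firewall configuration", and choose "Disabled" in the SELinux field. (This method is rarely used.)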
12. Initial setup
Perform this step on node 1, the master node
Format the HDFS file system. Change to the /jz/hadoop-0.20.2/bin directory and run:
hadoop namenode -format
[hadoop@hadoop1 conf]$ hadoop namenode -format
13/12/19 17:39:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop1/192.168.6.51
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
Re-format filesystem in /opt/data/hadoop/hdfs/name ? (Y or N) Y
13/12/19 17:42:35 INFO util.GSet: VM type = 32-bit
13/12/19 17:42:35 INFO util.GSet: 2% max memory = 19.33375 MB
13/12/19 17:42:35 INFO util.GSet: capacity = 2^22 = 4194304 entries
13/12/19 17:42:35 INFO util.GSet: recommended=4194304, actual=4194304
13/12/19 17:42:35 INFO namenode.FSNamesystem: fsOwner=hadoop
13/12/19 17:42:35 INFO namenode.FSNamesystem: supergroup=supergroup
13/12/19 17:42:35 INFO namenode.FSNamesystem: isPermissionEnabled=false
13/12/19 17:42:35 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/12/19 17:42:35 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/12/19 17:42:35 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/12/19 17:42:36 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/12/19 17:42:36 INFO common.Storage: Storage directory /opt/data/hadoop/hdfs/name has been successfully formatted.
13/12/19 17:42:36 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1/192.168.6.51
************************************************************/
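13. Start and stop the whole cluster
Perform this step on node 1, the master node
1) # Start everything
/opt/modules/hadoop/hadoop-1.0.3/bin/start-all.sh
Start Hadoop on the master node hadoop1; the master starts Hadoop on all worker nodes.
After startup, the cluster is healthy if jps shows
DataNode, Jps, NameNode, JobTracker, TaskTracker, and SecondaryNameNode.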
[hadoop@hadoop1 hadoop-1.0.3]$ jps
10560 JobTracker
10474 SecondaryNameNode
10184 NameNode
10705 TaskTracker
10896 Jps
[hadoop@hadoop2 ~]$ jps
6053 Jps
5826 DataNode
5938 TaskTracker
[root@hadoop3 hadoop-1.0.3]# jps
5724 DataNode
5944 Jps
5835 TaskTracker
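# Stop everything
/opt/modules/hadoop/hadoop-1.0.3/bin/stop-all.sh
Stop Hadoop from the master node; the master stops Hadoop on all worker nodes.
2) In the /jz/hadoop-0.20.2/bin directory (note: run this step only after step 11 below has been completed)
Run: hadoop fs -ls /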
[hadoop@hadoop1 conf]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-12-19 17:45 /opt
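If the console returns a listing, initialization succeeded and data can now be loaded.
14. Run the classic examples
Run the WordCount example to check that the cluster works
[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -ls /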
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2014-05-11 10:16 /home
drwxr-xr-x - hadoop supergroup 0 2014-05-11 10:41 /user
[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -mkdir /input
[hadoop@hadoop1 hadoop-1.0.3]$ hadoop fs -ls /
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2014-05-11 10:16 /home
drwxr-xr-x - hadoop supergroup 0 2014-05-11 10:48 /input
drwxr-xr-x - hadoop supergroup 0 2014-05-11 10:41 /user
[hadoop@hadoop1 bin]$ hadoop fs -put *.sh /input
[hadoop@hadoop1 bin]$ hadoop fs -ls /input
Found 14 items
-rw-r--r-- 3 hadoop supergroup 2373 2014-05-11 10:49 /input/hadoop-config.sh
-rw-r--r-- 3 hadoop supergroup 4336 2014-05-11 10:49 /input/hadoop-daemon.sh
-rw-r--r-- 3 hadoop supergroup 1329 2014-05-11 10:49 /input/hadoop-daemons.sh
-rw-r--r-- 3 hadoop supergroup 2143 2014-05-11 10:49 /input/slaves.sh
-rw-r--r-- 3 hadoop supergroup 1166 2014-05-11 10:49 /input/start-all.sh
-rw-r--r-- 3 hadoop supergroup 1065 2014-05-11 10:49 /input/start-balancer.sh
-rw-r--r-- 3 hadoop supergroup 1745 2014-05-11 10:49 /input/start-dfs.sh
-rw-r--r-- 3 hadoop supergroup 1145 2014-05-11 10:49 /input/start-jobhistoryserver.sh
-rw-r--r-- 3 hadoop supergroup 1259 2014-05-11 10:49 /input/start-mapred.sh
-rw-r--r-- 3 hadoop supergroup 1119 2014-05-11 10:49 /input/stop-all.sh
-rw-r--r-- 3 hadoop supergroup 1116 2014-05-11 10:49 /input/stop-balancer.sh
-rw-r--r-- 3 hadoop supergroup 1246 2014-05-11 10:49 /input/stop-dfs.sh
-rw-r--r-- 3 hadoop supergroup 1131 2014-05-11 10:49 /input/stop-jobhistoryserver.sh
-rw-r--r-- 3 hadoop supergroup 1168 2014-05-11 10:49 /input/stop-mapred.sh
[hadoop@hadoop1 hadoop-1.0.3]$ hadoop jar hadoop-examples-1.0.3.jar wordcount /input /output
14/05/11 10:50:34 INFO input.FileInputFormat: Total input paths to process : 14
14/05/11 10:50:34 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/11 10:50:34 WARN snappy.LoadSnappy: Snappy native library not loaded
14/05/11 10:50:34 INFO mapred.JobClient: Running job: job_201405111029_0002
14/05/11 10:50:35 INFO mapred.JobClient: map 0% reduce 0%
14/05/11 10:50:48 INFO mapred.JobClient: map 7% reduce 0%
14/05/11 10:50:51 INFO mapred.JobClient: map 28% reduce 0%
14/05/11 10:50:52 INFO mapred.JobClient: map 42% reduce 0%
14/05/11 10:50:59 INFO mapred.JobClient: map 57% reduce 0%
14/05/11 10:51:00 INFO mapred.JobClient: map 71% reduce 0%
14/05/11 10:51:01 INFO mapred.JobClient: map 85% reduce 0%
14/05/11 10:51:08 INFO mapred.JobClient: map 100% reduce 0%
14/05/11 10:51:11 INFO mapred.JobClient: map 100% reduce 28%
14/05/11 10:51:22 INFO mapred.JobClient: map 100% reduce 100%
14/05/11 10:51:27 INFO mapred.JobClient: Job complete: job_201405111029_0002
14/05/11 10:51:27 INFO mapred.JobClient: Counters: 30
14/05/11 10:51:27 INFO mapred.JobClient: Job Counters
14/05/11 10:51:27 INFO mapred.JobClient: Launched reduce tasks=1
14/05/11 10:51:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=105264
14/05/11 10:51:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/11 10:51:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/05/11 10:51:27 INFO mapred.JobClient: Rack-local map tasks=4
14/05/11 10:51:27 INFO mapred.JobClient: Launched map tasks=14
14/05/11 10:51:27 INFO mapred.JobClient: Data-local map tasks=10
14/05/11 10:51:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=31039
14/05/11 10:51:27 INFO mapred.JobClient: File Output Format Counters
14/05/11 10:51:27 INFO mapred.JobClient: Bytes Written=6173
14/05/11 10:51:27 INFO mapred.JobClient: FileSystemCounters
14/05/11 10:51:27 INFO mapred.JobClient: FILE_BYTES_READ=28724
14/05/11 10:51:27 INFO mapred.JobClient: HDFS_BYTES_READ=23830
14/05/11 10:51:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=382189
14/05/11 10:51:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=6173
14/05/11 10:51:27 INFO mapred.JobClient: File Input Format Counters
14/05/11 10:51:27 INFO mapred.JobClient: Bytes Read=22341
14/05/11 10:51:27 INFO mapred.JobClient: Map-Reduce Framework
14/05/11 10:51:27 INFO mapred.JobClient: Map output materialized bytes=28802
14/05/11 10:51:27 INFO mapred.JobClient: Map input records=691
14/05/11 10:51:27 INFO mapred.JobClient: Reduce shuffle bytes=28802
14/05/11 10:51:27 INFO mapred.JobClient: Spilled Records=4018
14/05/11 10:51:27 INFO mapred.JobClient: Map output bytes=34161
14/05/11 10:51:27 INFO mapred.JobClient: Total committed heap usage (bytes)=2266947584
14/05/11 10:51:27 INFO mapred.JobClient: CPU time spent (ms)=7070
14/05/11 10:51:27 INFO mapred.JobClient: Combine input records=3137
14/05/11 10:51:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=1489
14/05/11 10:51:27 INFO mapred.JobClient: Reduce input records=2009
14/05/11 10:51:27 INFO mapred.JobClient: Reduce input groups=497
14/05/11 10:51:27 INFO mapred.JobClient: Combine output records=2009
14/05/11 10:51:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=2002677760
14/05/11 10:51:27 INFO mapred.JobClient: Reduce output records=497
14/05/11 10:51:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=5151145984
14/05/11 10:51:27 INFO mapred.JobClient: Map output records=3137
Run the Hadoop pi example to check that the cluster works
[hadoop@hadoop1 hadoop-1.0.3]$ hadoop jar /home/hadoop/hadoop-1.0.3/hadoop-examples-1.0.3.jar pi 10 100
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/05/11 10:41:40 INFO mapred.FileInputFormat: Total input paths to process : 10
14/05/11 10:41:40 INFO mapred.JobClient: Running job: job_201405111029_0001
14/05/11 10:41:41 INFO mapred.JobClient: map 0% reduce 0%
14/05/11 10:41:54 INFO mapred.JobClient: map 20% reduce 0%
14/05/11 10:42:00 INFO mapred.JobClient: map 40% reduce 0%
14/05/11 10:42:01 INFO mapred.JobClient: map 60% reduce 0%
14/05/11 10:42:03 INFO mapred.JobClient: map 80% reduce 0%
14/05/11 10:42:05 INFO mapred.JobClient: map 100% reduce 0%
14/05/11 10:42:11 INFO mapred.JobClient: map 100% reduce 26%
14/05/11 10:42:17 INFO mapred.JobClient: map 100% reduce 100%
14/05/11 10:42:22 INFO mapred.JobClient: Job complete: job_201405111029_0001
14/05/11 10:42:22 INFO mapred.JobClient: Counters: 31
14/05/11 10:42:22 INFO mapred.JobClient: Job Counters
14/05/11 10:42:22 INFO mapred.JobClient: Launched reduce tasks=1
14/05/11 10:42:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=70846
14/05/11 10:42:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/11 10:42:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/05/11 10:42:22 INFO mapred.JobClient: Rack-local map tasks=2
14/05/11 10:42:22 INFO mapred.JobClient: Launched map tasks=10
14/05/11 10:42:22 INFO mapred.JobClient: Data-local map tasks=8
14/05/11 10:42:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=22725
14/05/11 10:42:22 INFO mapred.JobClient: File Input Format Counters
14/05/11 10:42:22 INFO mapred.JobClient: Bytes Read=1180
14/05/11 10:42:22 INFO mapred.JobClient: File Output Format Counters
14/05/11 10:42:22 INFO mapred.JobClient: Bytes Written=97
14/05/11 10:42:22 INFO mapred.JobClient: FileSystemCounters
14/05/11 10:42:22 INFO mapred.JobClient: FILE_BYTES_READ=226
14/05/11 10:42:22 INFO mapred.JobClient: HDFS_BYTES_READ=2390
14/05/11 10:42:22 INFO mapred.JobClient: FILE_BYTES_WRITTEN=239428
14/05/11 10:42:22 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
14/05/11 10:42:22 INFO mapred.JobClient: Map-Reduce Framework
14/05/11 10:42:22 INFO mapred.JobClient: Map output materialized bytes=280
14/05/11 10:42:22 INFO mapred.JobClient: Map input records=10
14/05/11 10:42:22 INFO mapred.JobClient: Reduce shuffle bytes=280
14/05/11 10:42:22 INFO mapred.JobClient: Spilled Records=40
14/05/11 10:42:22 INFO mapred.JobClient: Map output bytes=180
14/05/11 10:42:22 INFO mapred.JobClient: Total committed heap usage (bytes)=1623891968
14/05/11 10:42:22 INFO mapred.JobClient: CPU time spent (ms)=4990
14/05/11 10:42:22 INFO mapred.JobClient: Map input bytes=240
14/05/11 10:42:22 INFO mapred.JobClient: SPLIT_RAW_BYTES=1210
14/05/11 10:42:22 INFO mapred.JobClient: Combine input records=0
14/05/11 10:42:22 INFO mapred.JobClient: Reduce input records=20
14/05/11 10:42:22 INFO mapred.JobClient: Reduce input groups=20
14/05/11 10:42:22 INFO mapred.JobClient: Combine output records=0
14/05/11 10:42:22 INFO mapred.JobClient: Physical memory (bytes) snapshot=1441292288
14/05/11 10:42:22 INFO mapred.JobClient: Reduce output records=0
14/05/11 10:42:22 INFO mapred.JobClient: Virtual memory (bytes) snapshot=3777863680
14/05/11 10:42:22 INFO mapred.JobClient: Map output records=20
Job Finished in 42.086 seconds
Estimated value of Pi is 3.14800000000000000000
15. View Hadoop through the web interface
View the cluster status
View the job status
# Confirm through the web interface that the cluster was deployed successfully
# Check that the namenode and the datanodes are healthy
# Check that the jobtracker and the tasktrackers are healthy
hadoop fs -ls /
hadoop fs -mkdir /data/
# Run the Hadoop pi example to check that the cluster works
cd /opt/modules/hadoop/hadoop-1.0.3
bin/hadoop jar hadoop-examples-1.0.3.jar pi 10 100
# Output from a healthy cluster looks like this
12/07/15 10:50:48 INFO mapred.FileInputFormat: Total input paths to process : 10
12/07/15 10:50:48 INFO mapred.JobClient: Running job: job_201207151041_0001
12/07/15 10:50:49 INFO mapred.JobClient: map 0% reduce 0%
12/07/15 10:51:42 INFO mapred.JobClient: map 40% reduce 0%
12/07/15 10:52:07 INFO mapred.JobClient: map 70% reduce 13%
12/07/15 10:52:10 INFO mapred.JobClient: map 80% reduce 16%
12/07/15 10:52:11 INFO mapred.JobClient: map 90% reduce 16%
12/07/15 10:52:22 INFO mapred.JobClient: map 100% reduce 100%
.....................
12/07/15 10:52:28 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2155343872
12/07/15 10:52:28 INFO mapred.JobClient: Map output records=20
Job Finished in 100.608 seconds
Estimated value of Pi is 3.14800000000000000000