A Hadoop cluster divides its machines into two roles: Master and Slave. The master runs the NameNode and JobTracker, which manage the distributed data and break jobs into tasks; the slaves run the DataNode and TaskTracker, which store the data blocks and execute the tasks.
System version
rhel-server-6.3-x86_64
Hadoop version
hadoop-0.20.2.tar.gz // other releases kept giving me trouble, but this one has worked well; your mileage may vary
JDK
jdk-7u7-linux-x64.tar.gz
IP address assignment
IP | hostname |
192.168.100.11 | master |
192.168.100.12 | slave1 |
Configure /etc/hosts
The NameNode (master) machine needs entries for the IPs of every machine in the cluster.
Edit /etc/hosts:
127.0.0.1 localhost
::1 localhost6.localdomain6 localhost6
192.168.100.11 master
192.168.100.12 slave1
On the DataNodes, /etc/hosts only needs the NameNode's IP and the machine's own IP, though listing every node works too:
192.168.100.11 master
192.168.100.12 slave1
Set the hostname (optional)
echo master > /proc/sys/kernel/hostname // takes effect immediately but is lost on reboot; also set HOSTNAME= in /etc/sysconfig/network
Note:
The name in /etc/hosts, the HOSTNAME in /etc/sysconfig/network, and the output of the hostname command must all match.
Disable the firewall
[root@master bin]# service iptables stop;
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@slave1 ~]# chkconfig iptables off;
Create the hadoop user and group
groupadd -g 600 hadoop
useradd -u 800 -g hadoop -d /home/hadoop hadoop
Every node must have this same hadoop user.
Log in as a user with root privileges and give hadoop sudo access. /etc/sudoers ships read-only, so either use visudo (which also syntax-checks the file) or make it writable and restore the mode afterwards; sudo refuses to run if the file is left group- or world-writable:
[root@master ~]# chmod 640 /etc/sudoers
[root@master ~]# vi /etc/sudoers
Add the following line to the file:
hadoop ALL=(ALL) ALL
[root@master ~]# chmod 440 /etc/sudoers
Note: this is so the hadoop user can conveniently read and modify system files and configuration.
Set up passwordless SSH login
Note: run these commands from the hadoop user's home directory.
Run the following on both nodes:
[hadoop@master ~]$ ssh-keygen -t rsa // press Enter at every prompt
[hadoop@master ~]$ ssh-keygen -t dsa
Run the following on the master node:
[hadoop@master ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[hadoop@master ~]$ cat .ssh/id_dsa.pub >> .ssh/authorized_keys
[hadoop@master ~]$ chmod 644 .ssh/authorized_keys // group-writable modes such as 775 make sshd reject the file
Note: the following two steps must be performed on master; if run on slave1, the trust relationship will not be established.
[hadoop@master ~]$ ssh slave1 cat .ssh/id_rsa.pub >>.ssh/authorized_keys
[hadoop@master ~]$ ssh slave1 cat .ssh/id_dsa.pub >>.ssh/authorized_keys
[hadoop@master ~]$ scp .ssh/authorized_keys slave1:~/.ssh/
Test SSH
ssh slave1 date
ssh master date
Each node must be able to ssh both to itself and to the other node without a password:
[hadoop@master ~]$ ssh slave1 date
Tue Oct 16 17:43:46 CST 2012
[hadoop@master ~]$ ssh master date
Tue Oct 16 17:43:53 CST 2012
[hadoop@slave1 ~]$ ssh master date
Tue Oct 16 17:44:16 CST 2012
[hadoop@slave1 ~]$ ssh slave1 date
Tue Oct 16 17:44:20 CST 2012
Note:
[hadoop@master ~]$ ssh-keygen -t rsa // generates the passphrase-free id_rsa/id_rsa.pub private/public key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
[hadoop@master ~]$ ls -l ~/.ssh // which shows something like:
-rw------- 1 hadoop hadoop 1675 Jul 5 14:01 id_rsa
-rw-r--r-- 1 hadoop hadoop 394 Jul 5 14:01 id_rsa.pub
Then append the contents of id_rsa.pub to ~/.ssh/authorized_keys on every machine (including this one):
[hadoop@master ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
Because sshd enforces strict file permissions (authorized_keys must be writable only by its owner), adjust the mode:
[hadoop@master ~]$ chmod 644 ~/.ssh/authorized_keys
[hadoop@master ~]$ scp .ssh/authorized_keys slave1:~/.ssh/ // copy it to the other machines
If this passwordless-SSH section is unclear, consult other references.
Download the JDK and Hadoop
Install the JDK (all nodes)
The archive needs no installer; just extract it in place:
[root@prac1 ~]# tar zxvf jdk-7u7-linux-x64.tar.gz // downloaded from the Oracle site
[root@prac1 jdk1.7.0_07]# pwd
/root/jdk1.7.0_07
[root@prac1 jdk1.7.0_07]# cd ..
[root@prac1 ~]# mv jdk1.7.0_07/ jdk
[root@prac1 ~]# mv jdk /usr/local // placing hadoop or the jdk directly under the root directory did not work for me
Then edit /etc/profile:
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
Save, then run source /etc/profile.
Install Hadoop
[hadoop@master ~]$ su - root
Password:
[root@master ~]# cd /usr/local/
[root@master local]# tar zxvf hadoop-0.20.2.tar.gz
[root@master local]# mv hadoop-0.20.2 hadoop
Then edit /etc/profile:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
source /etc/profile
Configure Hadoop
As root, create the directories Hadoop needs:
mkdir -p /hadoop/hdfs/tmp
mkdir -p /hadoop/hdfs/name
mkdir -p /hadoop/hdfs/data
mkdir -p /hadoop/mapred/local
mkdir -p /hadoop/mapred/system
chown -R hadoop:hadoop /hadoop
chmod -R 775 /hadoop
1. Edit /usr/local/hadoop/conf/hadoop-env.sh to point at the JDK // this step is essential
export JAVA_HOME=/usr/local/jdk
2. Edit conf/core-site.xml and add the following (dfs.replication is really an hdfs-site property and is repeated there; listing it here is harmless but redundant):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hdfs/tmp</value>
  </property>
</configuration>
3. Edit conf/hdfs-site.xml and add the following (dfs.datanode.max.xcievers is the property's actual, misspelled name; note that with only one DataNode, a replication factor of 3 leaves every block under-replicated, as the dfsadmin report at the end shows):
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
4. Edit conf/mapred-site.xml and add the following:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/hadoop/mapred/local</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
    <final>true</final>
  </property>
</configuration>
5. Edit conf/masters; this file determines which machine runs the SecondaryNameNode:
master
6. Edit conf/slaves; this file lists all the DataNode machines:
slave1
Note: for what each of these configuration options means, see other references.
Copy the configured Hadoop tree to every DataNode
[root@master ~]# scp -rp /usr/local/hadoop slave1:/usr/local/
Format the HDFS filesystem on the NameNode (better run as the hadoop user so /hadoop/hdfs/name stays writable by it; the log below was captured as root)
[root@master conf]# hadoop namenode -format
12/10/15 17:30:51 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master.oracle.com/60.195.191.227
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /hadoop/hdfs/name ? (Y or N) Y // note: the Y must be uppercase
12/10/15 17:30:54 INFO namenode.FSNamesystem: fsOwner=root,root,sfcb
12/10/15 17:30:54 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/15 17:30:54 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/15 17:30:54 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/10/15 17:30:54 INFO common.Storage: Storage directory /hadoop/hdfs/name has been successfully formatted.
12/10/15 17:30:54 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master.oracle.com/60.195.191.227
************************************************************/
Start the Hadoop cluster
[root@master hadoop]# bin/start-all.sh
HDFS operations
Create a directory
[root@master hadoop]# bin/hadoop dfs -mkdir testdir
List existing files
[root@master hadoop]# bin/hadoop dfs -ls
Shut down the cluster
[root@master hadoop]# bin/stop-all.sh // stops MapReduce as well as HDFS
Check the running services with jps
Master:
[hadoop@master ~]$ jps
2167 NameNode
2355 JobTracker
2297 SecondaryNameNode
4027 Jps
[hadoop@master ~]$
Slave:
[hadoop@slave1 ~]$ jps
6192 Jps
2096 DataNode
2178 TaskTracker
[hadoop@slave1 ~]$
[hadoop@slave1 ~]$ hadoop dfsadmin -report
Configured Capacity: 10568916992 (9.84 GB)
Present Capacity: 3969232896 (3.7 GB)
DFS Remaining: 3818074112 (3.56 GB)
DFS Used: 151158784 (144.16 MB)
DFS Used%: 3.81%
Under replicated blocks: 29
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.100.12:50010
Decommission Status : Normal
Configured Capacity: 10568916992 (9.84 GB)
DFS Used: 151158784 (144.16 MB)
Non DFS Used: 6599684096 (6.15 GB)
DFS Remaining: 3818074112(3.56 GB)
DFS Used%: 1.43%
DFS Remaining%: 36.13%
Last contact: Tue Oct 16 17:37:26 CST 2012
[hadoop@slave1 ~]$
Source: http://blog.itpub.net/28254374/viewspace-1059609/