Three run modes

- Standalone mode: simple to install and needs almost no configuration, but is suitable only for debugging.
- Pseudo-distributed mode: starts all five daemons (namenode, datanode, jobtracker, tasktracker and secondary namenode) on a single node, simulating the individual nodes of a distributed cluster. [Given my machine's specs, I chose this mode for learning.]
- Fully distributed mode: a normal Hadoop cluster, made up of multiple nodes, each with its own role.

Installing and configuring pseudo-distributed mode

Download and unpack

Download and unpack the Hadoop tarball. Many tutorials currently use version 0.20.2, so I chose that version too.

http://hadoop.apache.org/releases.html
http://archive.apache.org/dist/hadoop/core/
http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/
hadoop-0.20.2.tar.gz

root@debian3:/usr/local# tar zxvf hadoop-0.20.2.tar.gz

Edit the configuration files

Go into the unpacked Hadoop directory and edit conf/hadoop-env.sh to set the JDK installation path (note that the location of the configuration files changed from version 0.23 onwards).

conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/jdk1.6.0_38

For JDK installation, see: Installing and removing the JDK on Debian (Ubuntu) - VPS environment setup notes (part 1)

Next, edit the three core configuration files in the conf directory: core-site.xml, hdfs-site.xml and mapred-site.xml.

core-site.xml
<configuration>
  <!-- fs.default.name - the URI (protocol, hostname, port) of the cluster's
       NameNode. Every machine in the cluster needs to know the NameNode's
       address: DataNodes first register with the NameNode so that their data
       can be used, and standalone client programs contact the NameNode
       through this URI to obtain the block lists of files. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.102:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-0.20.2/mytmp</value>
  </property>
</configuration>

hdfs-site.xml
<configuration>
  <!-- dfs.replication: the number of replicas kept for each file block. For a
       real deployment it should be set to 3 (there is no upper limit, but
       extra replicas rarely add value and take up more space). Fewer than
       three replicas can reduce reliability (data may be lost on a system
       failure).
       dfs.data.dir: the directory where DataNodes store their blocks. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop-0.20.2/data</value>
  </property>
</configuration>

mapred-site.xml
<configuration>
  <!-- mapred.job.tracker - the host (or IP) and port of the JobTracker. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.102:9001</value>
  </property>
</configuration>

Then edit conf/masters and conf/slaves in the conf directory and add the IP addresses:

Add the master's IP to conf/masters: 192.168.1.102
Add the slave's IP to conf/slaves: 192.168.1.102

Configure ssh

Generate an SSH key pair so that ssh can connect to 192.168.1.102 without a password.
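The interactive session below walks through the key setup step by step. The same sequence can be sketched non-interactively as follows (my own script, assuming root's home directory and an empty passphrase; copying the public key into authorized_keys is what enables password-free logins to this host):

```shell
# Non-interactive sketch of the SSH key setup shown below.
# Skips key generation if a key already exists, to avoid an overwrite prompt.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys   # sshd may refuse looser permissions
```

After this, `ssh 192.168.1.102` should log in without prompting for a password.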
root@debian3:/usr/local/hadoop-0.20.2# cd /root/
root@debian3:~# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
11:25:02:a9:9b:70:2f:52:72:10:96:3e:a6:21:9f:89 root@debian3
The key's randomart image is:
+--[ RSA 2048]----+
|.o. .o. o..      |
|o. . . o         |
|.. . .           |
|=+= .            |
|+X.* S           |
|E B .            |
| . .             |
|                 |
|                 |
+-----------------+
root@debian3:~# cd .ssh
root@debian3:~/.ssh# ls
id_rsa  id_rsa.pub  known_hosts

id_rsa is the private key file and id_rsa.pub is the public key file.

root@debian3:~/.ssh# cp id_rsa.pub authorized_keys
root@debian3:~/.ssh# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts

Format HDFS
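Formatting initializes the NameNode's storage directory. In 0.20.x, dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name, and hadoop.tmp.dir itself defaults to /tmp/hadoop-${user.name} when not configured; the format log below shows /tmp/hadoop-root/dfs/name, which is that unconfigured default (with the hadoop.tmp.dir set in core-site.xml above, the image would normally land under /usr/local/hadoop-0.20.2/mytmp/dfs/name instead). A small sketch of computing the default path (the variable names are my own):

```shell
# Default NameNode image location in Hadoop 0.20.x when nothing is configured:
# dfs.name.dir = ${hadoop.tmp.dir}/dfs/name
# hadoop.tmp.dir = /tmp/hadoop-${user.name}
HADOOP_TMP="/tmp/hadoop-$(whoami)"     # default hadoop.tmp.dir
NAME_DIR="$HADOOP_TMP/dfs/name"        # default dfs.name.dir
echo "$NAME_DIR"
```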
root@debian3:/usr/local/hadoop-0.20.2# bin/hadoop namenode -format
13/02/23 22:57:17 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = debian3/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
13/02/23 22:57:18 INFO namenode.FSNamesystem: fsOwner=root,root
13/02/23 22:57:18 INFO namenode.FSNamesystem: supergroup=supergroup
13/02/23 22:57:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/02/23 22:57:18 INFO common.Storage: Image file of size 94 saved in 0 seconds.
13/02/23 22:57:18 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
13/02/23 22:57:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at debian3/127.0.1.1
************************************************************/

Start Hadoop

Start Hadoop with bin/start-all.sh.
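start-all.sh is itself a thin wrapper that runs start-dfs.sh followed by start-mapred.sh. Starting the two layers separately can make it easier to see which one fails; a sketch, assuming the install path used throughout this post:

```shell
# Start HDFS and MapReduce separately instead of via start-all.sh.
HADOOP_HOME=/usr/local/hadoop-0.20.2    # assumed install path
if [ -x "$HADOOP_HOME/bin/start-dfs.sh" ]; then
    "$HADOOP_HOME/bin/start-dfs.sh"      # NameNode, DataNode, SecondaryNameNode
    "$HADOOP_HOME/bin/start-mapred.sh"   # JobTracker, TaskTracker
else
    echo "Hadoop not found at $HADOOP_HOME"
fi
```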
root@debian3:/usr/local/hadoop-0.20.2# bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-debian3.out
192.168.1.102: starting datanode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-debian3.out
192.168.1.102: starting secondarynamenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-debian3.out
starting jobtracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-debian3.out
192.168.1.102: starting tasktracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-debian3.out

Check that the daemons started
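jps ships with the JDK and lists running Java processes; it is called by its full path below because the JDK's bin directory is not on PATH. A sketch that checks for each of the five daemons by name (the loop and messages are my own):

```shell
# Check that the five Hadoop daemons are running, using jps from the JDK
# configured in hadoop-env.sh earlier (assumed path).
JPS=/usr/local/jdk1.6.0_38/bin/jps
if [ -x "$JPS" ]; then
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <name>"; match the name field exactly, so that
        # NameNode does not also match the SecondaryNameNode line.
        "$JPS" | awk '{print $2}' | grep -qx "$d" \
            && echo "up:   $d" || echo "DOWN: $d"
    done
else
    echo "jps not found at $JPS"
fi
```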
root@debian3:/usr/local/hadoop-0.20.2# /usr/local/jdk1.6.0_38/bin/jps
9622 DataNode
9872 TaskTracker
9533 NameNode
9781 JobTracker
9711 SecondaryNameNode
9919 Jps

Stop Hadoop

Stop Hadoop with bin/stop-all.sh.
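Before tearing the cluster down (or after the next startup), it is worth verifying that HDFS and MapReduce actually work end to end. A hedged smoke test using the wordcount example bundled with the 0.20.2 release; the input text, the /smoke-in and /smoke-out directory names, and the install path are my own assumptions:

```shell
# Smoke test: upload a file to HDFS and run the bundled wordcount example.
HADOOP_HOME=/usr/local/hadoop-0.20.2      # assumed install path
HADOOP="$HADOOP_HOME/bin/hadoop"
if [ -x "$HADOOP" ]; then
    echo "hello hadoop hello hdfs" > /tmp/smoke.txt
    "$HADOOP" fs -mkdir /smoke-in
    "$HADOOP" fs -put /tmp/smoke.txt /smoke-in/
    "$HADOOP" jar "$HADOOP_HOME/hadoop-0.20.2-examples.jar" wordcount /smoke-in /smoke-out
    "$HADOOP" fs -cat '/smoke-out/part*'   # per-word counts: hello should be 2
else
    echo "hadoop not found at $HADOOP"
fi
```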
root@debian3:/usr/local/hadoop-0.20.2# bin/stop-all.sh
stopping jobtracker
192.168.1.102: stopping tasktracker
stopping namenode
192.168.1.102: stopping datanode
192.168.1.102: stopping secondarynamenode

Original article: "Setting up a Hadoop environment on Linux", by 领悟书生. Please credit the source when reposting: http://www.656463.com/article/377