Hadoop 2.2.0 Installation and Configuration

Pre-installation

Make sure JDK 1.6+ and ssh are installed on all hosts.
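A quick way to verify on each host:

java -version    # should report 1.6 or newer
ssh -V           # confirms the ssh client is installed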

Add hostnames to /etc/hosts

Edit /etc/hosts:

sudo vi /etc/hosts

and add:

10.5.5.3    master
10.5.5.4    slave1
10.5.5.5    slave2
10.5.5.6    slave3

The hostname itself is set in /etc/sysconfig/network (on RHEL/CentOS systems).
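On master, for example, that file would contain something like (the HOSTNAME value is assumed to match the hosts file above):

NETWORKING=yes
HOSTNAME=master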

Set up passwordless ssh (so that master can ssh to each slave without a password)

Generate an ssh key pair on every host:

ssh-keygen -t rsa

On master, create authorized_keys:

cat ~/.ssh/id_rsa.pub  >> ~/.ssh/authorized_keys

Then scp master's public key to each slave and append it to that slave's authorized_keys, as sketched below.
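A minimal sketch, assuming the hostnames above and the same user account on each slave (ssh-copy-id, where available, does the same thing in one step):

for h in slave1 slave2 slave3; do
    scp ~/.ssh/id_rsa.pub $h:/tmp/master.pub
    ssh $h 'mkdir -p ~/.ssh && cat /tmp/master.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
done

Afterwards, ssh slave1 from master should log in without a password prompt.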

Installation

Download Apache Hadoop

Apache Hadoop: http://www.apache.org/dyn/closer.cgi/hadoop/common/

Make sure Hadoop is extracted to the same directory on master and all slaves, for example:
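(Archive name and target directory assumed.)

tar -xzf hadoop-2.2.0.tar.gz -C /root/

Repeat on every host, or scp the extracted tree, so each one ends up with the same /root/hadoop-2.2.0.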

Set Environment Variables

export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export JAVA_HOME=/home/hadoop/jdk1.6.0_45

Set JAVA_HOME to the local JDK path on each machine (whether a JRE path would also work is untested). These lines can also go in ~/.bashrc.
 
Various online guides also add the following to /etc/profile:
#hadoop variable settings
export HADOOP_HOME=/root/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
   
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
It is unclear whether these last two are required.
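To confirm the variables are picked up:

source /etc/profile
echo $HADOOP_HOME
hadoop version    # should report 2.2.0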

Hadoop Configuration

Edit the following files under $HADOOP_CONF_DIR (in Hadoop 2.x the configuration directory moved from conf/ to etc/hadoop/); the settings below were compiled from various sources:

  • core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>   
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop-2.2.0/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>    
</configuration>
  • hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/hadoop-2.2.0/name</value>
        <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/hadoop-2.2.0/data</value>
        <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
        <final>true</final>
    </property>
    
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
  • mapred-site.xml (ships as mapred-site.xml.template in 2.2.0; see the copy step after this block)
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
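The 2.2.0 tarball includes only a template for this file; create mapred-site.xml from it first:

cp $HADOOP_CONF_DIR/mapred-site.xml.template $HADOOP_CONF_DIR/mapred-site.xml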
  • yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
        <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
        <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nodemanager/local</value>
        <description>the local directories used by the nodemanager</description>
    </property>
    <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:8034</value>
        <description>the nodemanagers bind to this port</description>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>${hadoop.tmp.dir}/nodemanager/remote</value>
        <description>directory on hdfs where the application logs are moved to </description>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${hadoop.tmp.dir}/nodemanager/logs</value>
        <description>the directories used by Nodemanagers as log directories</description>
    </property>
    <!-- Use mapreduce_shuffle instead of mapreduce.shuffle (YARN-1229) -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
  • slaves
slave1
slave2
slave3
  • hadoop-env.sh
export JAVA_HOME=/home/hadoop/jdk1.6.0_45

JAVA_HOME here can be set per server, depending on where the JDK is installed on each machine.
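Once all files are edited, the same configuration must reach every node; a sketch of pushing it out (hostnames and paths assumed from the settings above):

for h in slave1 slave2 slave3; do
    scp /root/hadoop-2.2.0/etc/hadoop/* $h:/root/hadoop-2.2.0/etc/hadoop/
done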

Format the NameNode

Before the first start, the NameNode must be formatted:

hdfs namenode -format

(hadoop namenode -format still works but is deprecated in 2.x.)

Start Hadoop

~/hadoop-2.2.0/sbin/start-all.sh
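start-all.sh is itself deprecated in 2.x; it simply invokes the two scripts below, which can also be run separately:

~/hadoop-2.2.0/sbin/start-dfs.sh
~/hadoop-2.2.0/sbin/start-yarn.sh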

The script prints where each daemon's log is created as it starts on each node.

Once it finishes, run jps in a shell on master and on a slave; they should show, respectively:

[root@master ~]# jps
15096 ResourceManager
7980 Jps
14782 NameNode
14956 SecondaryNameNode

[root@slave1 ~]# jps
7839 NodeManager
7733 DataNode
20035 Jps

Test

HDFS Web UI (NameNode): http://10.108.100.18:50070/
YARN Web UI (ResourceManager): http://10.108.100.18:8088/

YARN & MapReduce test (run wordcount):

[root@master hadoop-2.2.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /root/output

Both paths live in the Hadoop namespace (HDFS). Files and directories in the namespace are manipulated with commands of the form: hadoop dfs -<command> <path> (in 2.x, hdfs dfs is the preferred spelling).

e.g.:

hadoop dfs -mkdir /home/hadoop/input
hadoop dfs -cat /home/hadoop/output/part-r-00000

To upload files from the local filesystem to HDFS: hadoop dfs -put input/* /home/hadoop/input

The first path is on the local filesystem; the second is in HDFS.
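Putting the pieces together, a minimal end-to-end wordcount run might look like this (the local file name is assumed for illustration; run from $HADOOP_HOME):

hadoop dfs -mkdir /input
hadoop dfs -put ./words.txt /input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /root/output
hadoop dfs -cat /root/output/part-r-00000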


With that, a Hadoop cluster is up and running across several VMs in OpenStack. Note that each slave should be deployed on a different compute node to achieve good efficiency.

In March 2014, Sahara (formerly Savanna), the Hadoop-on-OpenStack project, graduated from OpenStack incubation and will be one of the core OpenStack projects starting with the next release, Juno.

It will be interesting to see how the Hortonworks experts schedule and allocate Nova VMs to build Hadoop clusters.

