Hadoop 3.0 has been released, and I wanted to try out its new features and the MapReduce performance improvements. The setup below uses a six-machine Hadoop cluster with hostnames hadoop1, hadoop2, hadoop3, hadoop4, hadoop5, and hadoop6, where hadoop1-3 serve as NameNode machines and hadoop4-6 as DataNode machines.
I. Prerequisites
1. The JDK is installed on all six machines, with its environment variables configured (JDK 1.8 is recommended).
For JDK installation, see the "install JDK" part (section 1.7) of: http://blog.csdn.net/u011563666/article/details/50170465
2. A ZooKeeper cluster is installed; HDFS HA depends on it.
For ZooKeeper cluster installation, see: http://blog.csdn.net/u011563666/article/details/51320364
Here I assume ZooKeeper runs on hadoop1, hadoop2, and hadoop3.
3. Passwordless SSH login is configured between all six machines.
For the SSH configuration, see: http://blog.csdn.net/u011563666/article/details/78200771
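For reference, a minimal sketch of that setup, assuming the same hadoop user on every machine (the linked post has the details). Run on each of the six hosts:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for h in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5 hadoop6; do ssh-copy-id $h; done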
II. Hadoop 3.0 Installation Steps
1. Download Hadoop 3.0
wget http://mirrors.shu.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
2. Extract the archive
tar -zxvf hadoop-3.0.0.tar.gz -C /opt
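The steps below call scripts such as hadoop-daemon.sh and start-dfs.sh by name, so it is convenient (though optional) to put Hadoop on the PATH, for example in ~/.bashrc on every machine:
export HADOOP_HOME=/opt/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin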
3. Edit hadoop-env.sh and set the JDK path
export JAVA_HOME=/opt/jdk1.8.0_121
4. Edit hdfs-site.xml
<configuration>
  <!-- Hadoop 3.0 HA configuration -->
  <property>
    <name>dfs.nameservices</name>
    <value>hdfscluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hdfscluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfscluster.nn1</name>
    <value>hadoop1:9820</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfscluster.nn2</name>
    <value>hadoop2:9820</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfscluster.nn3</name>
    <value>hadoop3:9820</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfscluster.nn1</name>
    <value>hadoop1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfscluster.nn2</name>
    <value>hadoop2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hdfscluster.nn3</name>
    <value>hadoop3:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/hdfscluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfscluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop-3.0.0/datas/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-3.0.0/datas/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-3.0.0/datas/datanode</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
5. Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfscluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-3.0.0/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
6. Edit yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop1:8031</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
7. Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /opt/hadoop-3.0.0/etc/hadoop,
      /opt/hadoop-3.0.0/share/hadoop/common/*,
      /opt/hadoop-3.0.0/share/hadoop/common/lib/*,
      /opt/hadoop-3.0.0/share/hadoop/hdfs/*,
      /opt/hadoop-3.0.0/share/hadoop/hdfs/lib/*,
      /opt/hadoop-3.0.0/share/hadoop/mapreduce/*,
      /opt/hadoop-3.0.0/share/hadoop/mapreduce/lib/*,
      /opt/hadoop-3.0.0/share/hadoop/yarn/*,
      /opt/hadoop-3.0.0/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
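To double-check these entries against what Hadoop itself resolves, the runtime classpath can be printed with:
/opt/hadoop-3.0.0/bin/hadoop classpath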
8. Edit the workers file
hadoop4
hadoop5
hadoop6
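The file lives at /opt/hadoop-3.0.0/etc/hadoop/workers (Hadoop 3 renamed the old slaves file to workers); one way to write it:
cat > /opt/hadoop-3.0.0/etc/hadoop/workers <<EOF
hadoop4
hadoop5
hadoop6
EOF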
9. Create the corresponding directories
mkdir -p /opt/hadoop-3.0.0/datas/journal
mkdir -p /opt/hadoop-3.0.0/datas/namenode
mkdir -p /opt/hadoop-3.0.0/datas/datanode
10. Copy the configured Hadoop directory to the other machines (hadoop2-6)
scp -r /opt/hadoop-3.0.0 hadoop2:/opt
Run the same command for each of the remaining machines; the loop below saves some typing.
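A loop such as this, assuming the same /opt layout on every host, does it in one go:
for h in hadoop2 hadoop3 hadoop4 hadoop5 hadoop6; do
  scp -r /opt/hadoop-3.0.0 $h:/opt
done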
III. Starting the Hadoop Cluster
Note: the following steps must be done before starting the cluster.
1. Format the ZKFC state in ZooKeeper
hdfs zkfc -formatZK
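To confirm the format took effect, the ZooKeeper CLI (zkCli.sh, shipped with ZooKeeper) should now show a znode for the nameservice configured above:
zkCli.sh -server hadoop1:2181 ls /hadoop-ha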
2. Start the JournalNodes on each of hadoop1, hadoop2, and hadoop3 (the hosts listed in dfs.namenode.shared.edits.dir); the NameNode format in the next step will fail if they are not running. On each machine:
hadoop-daemon.sh start journalnode
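From hadoop1, a loop over ssh starts all three at once (assuming the same install path everywhere; in Hadoop 3 the non-deprecated equivalent is hdfs --daemon start journalnode):
for h in hadoop1 hadoop2 hadoop3; do
  ssh $h /opt/hadoop-3.0.0/sbin/hadoop-daemon.sh start journalnode
done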
3. Format the NameNode on hadoop1
Command: /opt/hadoop-3.0.0/bin/hdfs namenode -format
(The older hadoop namenode -format form still works, but is deprecated in Hadoop 3 in favor of hdfs.)
4. Copy the formatted NameNode metadata from hadoop1 to the other two NameNode machines, hadoop2 and hadoop3
scp -r /opt/hadoop-3.0.0/datas/namenode/* hadoop2:/opt/hadoop-3.0.0/datas/namenode/
scp -r /opt/hadoop-3.0.0/datas/namenode/* hadoop3:/opt/hadoop-3.0.0/datas/namenode/
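Alternatively, once the NameNode on hadoop1 has been started, each standby can pull the metadata itself; in that case run this on hadoop2 and hadoop3 instead of the scp commands above:
/opt/hadoop-3.0.0/bin/hdfs namenode -bootstrapStandby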
5. With the steps above complete, the cluster can now be started.
Before starting HDFS, stop the JournalNodes started earlier on hadoop1-3 (start-dfs.sh brings them up again itself):
hadoop-daemon.sh stop journalnode
Start the HDFS cluster: /opt/hadoop-3.0.0/sbin/start-dfs.sh
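A quick sanity check with jps: hadoop1-3 should each show NameNode, JournalNode, and DFSZKFailoverController, and hadoop4-6 should show DataNode. To bring up YARN with the configuration from step 6 of part II, also run on hadoop1:
/opt/hadoop-3.0.0/sbin/start-yarn.sh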
6. Verify that HDFS started successfully
Visit each of the three NameNode web UIs (the dfs.namenode.http-address values configured above):
http://hadoop1:9870
http://hadoop2:9870
http://hadoop3:9870
If all three pages load, the installation succeeded. Check the state shown for each NameNode: with automatic failover enabled, one should be active and the other two standby.
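The same check works from the command line (nn1, nn2, and nn3 are the IDs defined in dfs.ha.namenodes.hdfscluster):
/opt/hadoop-3.0.0/bin/hdfs haadmin -getServiceState nn1
/opt/hadoop-3.0.0/bin/hdfs haadmin -getServiceState nn2
/opt/hadoop-3.0.0/bin/hdfs haadmin -getServiceState nn3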
7. Notes
Many of the default ports changed in Hadoop 3.0; the main ones are listed below:
NameNode ports:
50470 --> 9871 (HTTPS)
50070 --> 9870 (HTTP)
8020 --> 9820 (RPC)
Secondary NameNode ports:
50091 --> 9869 (HTTPS)
50090 --> 9868 (HTTP)
DataNode ports:
50020 --> 9867 (IPC)
50010 --> 9866 (data transfer)
50475 --> 9865 (HTTPS)
50075 --> 9864 (HTTP)