Upgrading hadoop 1.0.4 to hadoop 2.4.1 with HA
1. Preparation
a) Nodes
Server | Roles |
---|---|
namenode0 | namenode,ResourceManager,zkfc |
datanode0 | namenode,NodeManager,zkfc,datanode,JournalNode |
datanode1 | datanode,NodeManager,JournalNode |
datanode2 | datanode,NodeManager,JournalNode |
b) Software
Download the hadoop-2.4.1 package:
cd /usr/local
wget http://mirrors.cnnic.cn/apache/hadoop/common/stable2/hadoop-2.4.1.tar.gz
tar xzvf hadoop-2.4.1.tar.gz
2. HDFS upgrade
2.1 Upgrading HDFS
Attempting to upgrade straight to hadoop2's HA mode as the official documentation describes did not succeed; the HDFS upgrade only worked after splitting it into two stages.
2.1.1 Backups before the upgrade
a) Stop hadoop1 and back up its configuration
# assume HADOOP_HOME is already set
# stop hadoop1
$HADOOP_HOME/bin/stop-all.sh
# back up the configuration directory
cp -r ${HADOOP_HOME}/conf ${HADOOP_HOME}/conf_bak
# create a symlink pointing at the new version
ln -s hadoop-2.4.1 hadoop
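The symlink switch can be rehearsed safely before touching the real install. A minimal sketch, run in a throwaway directory (the version directories here are empty placeholders); `-sfn` is used so the command also works when a `hadoop` link already exists:

```shell
# Rehearse the version-switch symlink in a scratch directory so nothing
# under /usr/local is touched. `ln -sfn` replaces an existing link.
work=$(mktemp -d)
cd "$work"
mkdir hadoop-1.0.4 hadoop-2.4.1

ln -s hadoop-1.0.4 hadoop      # old layout: hadoop -> hadoop-1.0.4
ln -sfn hadoop-2.4.1 hadoop    # repoint the link at the new version
target=$(readlink hadoop)
echo "$target"                 # prints: hadoop-2.4.1
```

Every path that refers to `${HADOOP_HOME}` as `/usr/local/hadoop` then follows the link to whichever version is current.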
b) Back up hadoop1's metadata
# ${dfs.name.dir} here stands for that property's value in hdfs-site.xml
cp -r ${dfs.name.dir} ${dfs.name.dir}_bak
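Since `${dfs.name.dir}` is shorthand for a value inside hdfs-site.xml, resolving it can be scripted. A minimal sketch using a toy config file created on the spot (in practice point `CONF` at the real hdfs-site.xml, and prefer a proper XML tool such as `xmllint`; this grep/sed pipeline assumes each `<value>` line directly follows its `<name>` line):

```shell
# Toy hdfs-site.xml fragment, created only for illustration.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/sdb1/hadoop/name1</value>
  </property>
</configuration>
EOF

# Grab the <value> on the line after the matching <name>.
name_dir=$(grep -A1 '<name>dfs.name.dir</name>' "$CONF" \
             | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$name_dir"    # prints: /sdb1/hadoop/name1
rm -f "$CONF"
```

The backup command then becomes `cp -r "$name_dir" "${name_dir}_bak"`.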
c) Copy the configuration backup from step a) into the hadoop2 tree
cp ${OLD_HADOOP_HOME}/conf_bak/* ${HADOOP_HOME}/etc/hadoop/
2.1.2 Upgrading HDFS (no HA)
a) Distribute the prepared hadoop2 package
The preparation above was all done on namenode0; repeat the same steps on the other nodes.
# distribute the package (run on namenode0)
scp -r /usr/local/hadoop-2.4.1 datanode0:/usr/local/
scp -r /usr/local/hadoop-2.4.1 datanode1:/usr/local/
scp -r /usr/local/hadoop-2.4.1 datanode2:/usr/local/
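With more nodes, the three scp lines generalize to a loop. A sketch with a dry-run mode (the node list matches the table in section 1; swap `echo scp` for `scp` to actually copy):

```shell
# Distribute the hadoop2 package to every worker node.
NODES="datanode0 datanode1 datanode2"
PKG=/usr/local/hadoop-2.4.1

distribute() {
    copy_cmd=$1             # "scp" for real, "echo scp" for a dry run
    for host in $NODES; do
        $copy_cmd -r "$PKG" "$host:/usr/local/"
    done
}

distribute "echo scp"       # dry run: prints the three scp commands
```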
b) Run the upgrade (on namenode0)
cd $HADOOP_HOME/bin
./hdfs namenode -rollingUpgrade started
# if the upgrade fails, roll back:
./hdfs namenode -rollback
# then start hdfs
cd ${HADOOP_HOME}/sbin
./start-dfs.sh
If these commands complete without errors, HDFS has been upgraded from 1.x to 2.x (without HA). The next step converts HDFS to HA mode.
2.2 Converting HDFS to HA mode
a) Edit the configuration files
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://oppo-hdp1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/tmpdata</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
<description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>datanode0:2181,datanode1:2181,datanode2:2181</value>
<description>
A list of ZooKeeper server addresses, separated by commas, that are
to be used by the ZKFailoverController in automatic failover.
</description>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>36000000</value>
</property>
<property>
<name>ipc.server.tcpnodelay</name>
<value>true</value>
</property>
<property>
<name>ipc.client.tcpnodelay</name>
<value>true</value>
</property>
<property>
<name>webinterface.private.actions</name>
<value>true</value>
<description> If set to true, the web interfaces of JT and NN may contain
actions, such as kill job, delete file, etc., that should
not be exposed to public. Enable this option if the interfaces
are only reachable by those who have the right authorization.
</description>
</property>
<!--security conf -->
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
</configuration>
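`fs.trash.interval` is expressed in minutes, so the 10080 configured above keeps deleted files in the trash for one week before they are purged. A quick check:

```shell
# fs.trash.interval is in minutes: 10080 / 60 / 24 = days retained
echo $((10080 / 60 / 24))    # prints: 7
```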
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/sdb1/hadoop/name1,/sdb1/hadoop/name2,/sdb1/hadoop/name3</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/var/hadoop/data1,/var/hadoop/data2,/var/hadoop/data3</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>100000</value>
</property>
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>3600000</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>3600000</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>oppo-hdp1</value>
<description>Logical name of the nameservice; must match the name used in core-site.xml</description>
</property>
<property>
<name>dfs.ha.namenodes.oppo-hdp1</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.oppo-hdp1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>This class determines which NameNode is currently active</description>
</property>
<property>
<name>dfs.namenode.rpc-address.oppo-hdp1.nn1</name>
<value>namenode0:7001</value>
</property>
<property>
<name>dfs.namenode.http-address.oppo-hdp1.nn1</name>
<value>namenode0:7005</value>
</property>
<property>
<name>dfs.namenode.rpc-address.oppo-hdp1.nn2</name>
<value>datanode0:7001</value>
</property>
<property>
<name>dfs.namenode.http-address.oppo-hdp1.nn2</name>
<value>datanode0:7005</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:7003</value>
</property>
<property>
<name>dfs.datanode.balance.bandwidthPerSec</name>
<value>10485760</value>
<description>
Specifies the maximum amount of bandwidth that each datanode
can utilize for the balancing purpose in term of
the number of bytes per second.
</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://datanode0:8485;datanode1:8485;datanode2:8485/oppo-hdp1</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence(root:22)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
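Two of the values above are raw byte counts, so it is worth double-checking what they amount to: the `dfs.blocksize` of 134217728 bytes is 128 MiB (the hadoop 2.x default), and the balancer bandwidth cap of 10485760 bytes/s is 10 MiB/s:

```shell
# dfs.blocksize: 134217728 bytes -> MiB
echo $((134217728 / 1024 / 1024))    # prints: 128
# dfs.datanode.balance.bandwidthPerSec: 10485760 bytes/s -> MiB/s
echo $((10485760 / 1024 / 1024))     # prints: 10
```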
b) Perform the HA conversion
# setting up and starting the zookeeper cluster itself is omitted here
# stop dfs
${HADOOP_HOME}/sbin/stop-dfs.sh
# format zkfc (on namenode0)
${HADOOP_HOME}/bin/hdfs zkfc -formatZK
# start zkfc
${HADOOP_HOME}/sbin/hadoop-daemon.sh start zkfc
ssh datanode0 "${HADOOP_HOME}/sbin/hadoop-daemon.sh start zkfc"
# start all journalnodes (they must be running before the shared edits dir can be initialized)
ssh datanode0 "hostname;${HADOOP_HOME}/sbin/hadoop-daemon.sh start journalnode"
ssh datanode1 "hostname;${HADOOP_HOME}/sbin/hadoop-daemon.sh start journalnode"
ssh datanode2 "hostname;${HADOOP_HOME}/sbin/hadoop-daemon.sh start journalnode"
# initialize the dfs.namenode.shared.edits.dir directory
${HADOOP_HOME}/bin/hdfs namenode -initializeSharedEdits
# start the active namenode, then bootstrap and start the standby
${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode
ssh datanode0 "hostname;${HADOOP_HOME}/bin/hdfs namenode -bootstrapStandby"
ssh datanode0 "hostname;${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode"
# start dfs
${HADOOP_HOME}/sbin/start-dfs.sh
This completes the HDFS HA upgrade. The HDFS web UI is now reachable at http://namenode0:7005/ and http://datanode0:7005/.
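Besides the web UI, each node's `jps` listing should show the expected daemons (NameNode and DFSZKFailoverController on the two namenode hosts, DataNode/JournalNode on the workers). A sketch of such a check; the `jps` output below is hard-coded sample data for illustration, in practice capture it with `jps_output=$(jps)`:

```shell
# Sample jps listing for namenode0 (hard-coded for illustration).
jps_output="12001 NameNode
12102 DFSZKFailoverController
12345 Jps"

# Report whether each expected daemon appears in the listing.
for daemon in NameNode DFSZKFailoverController; do
    if echo "$jps_output" | grep -q "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: MISSING"
    fi
done
```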
3. YARN deployment (no HA)
Deploying yarn is straightforward; it mainly involves two configuration files.
a) yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>namenode0:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>namenode0:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>namenode0:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>namenode0:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>namenode0:7013</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<description>Number of CPU cores that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/var/hadoop/nm-local-dir1,/var/hadoop/nm-local-dir2,/var/hadoop/nm-local-dir3</value>
</property>
</configuration>
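With `yarn.nodemanager.resource.memory-mb` at 2048, each NodeManager offers 2 GB to containers. Assuming the default `yarn.scheduler.minimum-allocation-mb` of 1024 MB (an assumption: it is not set in the yarn-site.xml above), that is at most two minimum-size containers per NodeManager, and with the three NodeManagers from the table in section 1:

```shell
# Container capacity, assuming the default scheduler minimum allocation of
# 1024 MB (not overridden in the yarn-site.xml above).
NM_MEM_MB=2048
MIN_ALLOC_MB=1024
NM_COUNT=3    # datanode0..2 run NodeManager per the node table

per_nm=$((NM_MEM_MB / MIN_ALLOC_MB))
cluster=$((per_nm * NM_COUNT))
echo "$per_nm containers/NM, $cluster cluster-wide"   # prints: 2 containers/NM, 6 cluster-wide
```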
b) mapred-site.xml
The parameter names in this file were reworked in 2.x; the old 1.x names are still accepted for compatibility but are no longer recommended.
<?xml version="1.0"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512m</value>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>256</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/var/hadoop/mapred1/local,/var/hadoop/mapred2/local,/var/hadoop/mapred3/local</value>
</property>
<property>
<name>mapreduce.job.jvm.numtasks</name>
<value>15</value>
<description>How many tasks to run per jvm. If set to -1, there is no limit.</description>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>15</value>
<description>The default number of parallel transfers run by reduce during the copy(shuffle) phase.</description>
</property>
</configuration>
yarn itself needs no upgrade; once the configuration is in place it can be started directly.
# log in to namenode0
$HADOOP_HOME/sbin/start-yarn.sh
4. Summary
This completes the upgrade from hadoop 1.x to 2.x. Upgrading HDFS was the tricky part: neither the official documentation nor the better-known blog posts worked as described, possibly because of our environment, possibly because of errors in those sources. After some experimentation, the upgrade finally succeeded in two steps: 1) copy the 1.x HDFS configuration straight into 2.x and run the upgrade command; 2) convert the non-HA 2.x cluster to HA.