Migrating data (including Hive tables) between Hortonworks HDP 2.1 and 2.2 clusters

tiimfei

Article link: https://blog.csdn.net/tiimfei/article/details/43936945

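I previously set up a cluster based on HDP 2.1. I have now built a new HDP 2.2 cluster to serve as the new production environment. The HDP 2.1 cluster holds about 600 GB of data, mostly in the form of Hive tables, so that data has to be migrated to the new cluster.

The approach follows this article: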

https://amalgjose.wordpress.com/2013/10/11/migrating-hive-from-one-hadoop-cluster-to-another-cluster-2/

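1) Install Hive, including the metastore, on the new Hadoop cluster.
2) Transfer the data in the Hive warehouse directory (/user/hive/warehouse in the referenced article; /apps/hive/warehouse on my HDP clusters) to the new Hadoop cluster. I used the hadoop distcp command to copy the data directories from the old cluster to the new one.

The commands I ran: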

hadoop distcp hdfs://[oldcluster.fqdn]:8020/user/hive hdfs://[newcluster.fqdn]:8020/user/

hadoop distcp hdfs://[oldcluster.fqdn]:8020/apps/hive/warehouse hdfs://[newcluster.fqdn]:8020/apps/hive/

Documentation for the distcp command: http://hadoop.apache.org/docs/r1.2.1/distcp.html
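When more than a couple of directories have to be copied, the distcp commands above can be generated from a list of paths. The following is only a minimal sketch: the hostnames `oldcluster.example.com` and `newcluster.example.com` and the helper name `distcp_command` are placeholders I made up for illustration, and the script only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Placeholder namenode URIs (assumptions); substitute the real FQDNs and ports.
OLD_NN="hdfs://oldcluster.example.com:8020"
NEW_NN="hdfs://newcluster.example.com:8020"

# Hypothetical helper: print the distcp command for one HDFS directory.
# The target is the parent directory on the new cluster, mirroring the
# commands shown above.
distcp_command() {
    src_path=$1
    parent=$(dirname "$src_path")
    # ${parent%/} drops a trailing slash so that a parent of "/" does not
    # produce a double slash in the target URI.
    printf 'hadoop distcp %s%s %s%s/\n' "$OLD_NN" "$src_path" "$NEW_NN" "${parent%/}"
}

# Dry run: print one command per directory to migrate.
for path in /user/hive /apps/hive/warehouse; do
    distcp_command "$path"
done
```

Once the printed commands look right, they can be pasted into a shell on a node that can reach both clusters.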

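3) Take a dump of the MySQL metastore.

On the node of the old cluster that hosts the Hive metastore, use the database's dump tool to export the entire metastore database. The Hive metastore stores metadata such as table structures and database information. My Hive metastore runs on MySQL, so I used the MySQL commands from the table below.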

Database Type	Backup	Restore
MySQL	`mysqldump $dbname > $outputfilename.sql` For example: `mysqldump hive > /tmp/mydir/backup_hive.sql`	`mysql $dbname < $inputfilename.sql` For example: `mysql hive < /tmp/mydir/backup_hive.sql`
Postgres	`sudo -u $username pg_dump $databasename > $outputfilename.sql` For example: `sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql`	`sudo -u $username psql $databasename < $inputfilename.sql` For example: `sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql`
Oracle	Connect to the Oracle database using `sqlplus`, then export the database: `exp username/password@database full=yes file=output_file.dmp`	Import the database: `imp username/password@database file=input_file.dmp`

4) Install MySQL on the new Hadoop cluster. (This step can be skipped, because Hive and the Hive metastore were already installed together with the HDP 2.2 cluster.)

5) Open the Hive MySQL metastore dump in a text editor such as Notepad or Notepad++, search for hdfs://ip-address-old-namenode:port, replace every occurrence with hdfs://ip-address-new-namenode:port, and save the file.

Here ip-address-old-namenode is the IP address of the old Hadoop cluster's namenode and ip-address-new-namenode is the IP address of the new Hadoop cluster's namenode.

In my case I copied the exported dump file to my local machine and used a text editor to find the old cluster's namenode hostname and replace it with the new cluster's namenode hostname.
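For a large dump, the same search-and-replace can be scripted with sed instead of a text editor. This is a minimal sketch, not what the referenced article does: the function names `rewrite_namenode` and `count_uri_lines` and the example hostnames are hypothetical, and it assumes the URIs contain no `|` or `&` characters, since those are special in the sed expression used here.

```shell
#!/bin/sh
# Hypothetical helper: rewrite_namenode OLD_URI NEW_URI INPUT OUTPUT
# Writes a copy of the dump INPUT to OUTPUT with every OLD_URI replaced
# by NEW_URI. The original dump is left untouched.
rewrite_namenode() {
    old_uri=$1
    new_uri=$2
    input=$3
    output=$4
    # Escape dots so that "." in the hostname matches only a literal dot.
    old_pattern=$(printf '%s' "$old_uri" | sed 's/\./\\./g')
    sed "s|$old_pattern|$new_uri|g" "$input" > "$output"
}

# Hypothetical helper: count_uri_lines URI FILE
# Prints the number of lines in FILE that contain URI as a fixed string.
count_uri_lines() {
    # grep exits non-zero when nothing matches but still prints 0.
    grep -cF -- "$1" "$2" || true
}
```

It could be used as `rewrite_namenode hdfs://old-nn.example.com:8020 hdfs://new-nn.example.com:8020 /tmp/mydir/backup_hive.sql /tmp/mydir/backup_hive_new.sql`, followed by `count_uri_lines hdfs://old-nn.example.com:8020 /tmp/mydir/backup_hive_new.sql` to confirm that no line still points at the old namenode before the dump is restored.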

6) After completing the steps above, restore the edited MySQL dump into MySQL on the new Hadoop cluster.

Copy the modified dump file to the node of the new cluster that hosts the Hive metastore, then import it with the mysql command line.

For example: mysql hive < /tmp/mydir/backup_hive.sql

7) Configure Hive as usual and upgrade the Hive schema if needed.

At this point you should be able to use the Hive shell or Hue to check whether the imported tables are accessible.