Hive Data Migration Approaches (Tested in Practice)

南风知我意丿

Category: Hive. Tags: hive, hadoop, hdfs

Article link: https://blog.csdn.net/Lzx116/article/details/126539520

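Migrating Hive involves two technical points:

1. Migrating metadata only

Reference: NetEase metadata management - Hive metadata migration and merging (网易元数据管理 - hive 元数据迁移与合并)

2. Full migration of metadata and Hive data

Main workflow
1. Export the Hive data on the old cluster to that cluster's HDFS.
2. Download the exported data from the old cluster's HDFS to the local filesystem.
3. Upload the local export to the new cluster's HDFS.
4. Import the data from the new cluster's HDFS into Hive on the new cluster.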
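
The four steps above can be sketched as a small dry-run script. This is only an outline under stated assumptions: the local staging directory, the host name new-cluster-gateway, and the helper names run and migration_plan are all hypothetical, and in practice each step is executed on the cluster noted in its comment.

```shell
# Dry-run sketch of the four-step flow. Paths and host name are
# placeholders (assumptions), not values prescribed by the article.
HDFS_DIR=/tmp/export_db_export      # export directory in HDFS (both clusters)
LOCAL_DIR=/tmp/export_db_local      # hypothetical local staging directory
NEW_HOST=new-cluster-gateway        # hypothetical host on the new cluster

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

migration_plan() {
  # Step 1 (old cluster): export the Hive tables into HDFS.
  run hive -f export.hql
  # Step 2 (old cluster): download the export from HDFS to local disk.
  run hdfs dfs -get "$HDFS_DIR" "$LOCAL_DIR"
  # Step 3: copy the files to the new cluster and upload them to its HDFS.
  run scp -r "$LOCAL_DIR" "$NEW_HOST:$LOCAL_DIR"
  run ssh "$NEW_HOST" hdfs dfs -put "$LOCAL_DIR" "$HDFS_DIR"
  # Step 4 (new cluster): import from HDFS into Hive
  # (import.hql must already exist on that host).
  run ssh "$NEW_HOST" hive -f import.hql
}

# Usage: "migration_plan" prints the plan; "DRY_RUN=0 migration_plan" runs it.
```

Printing the commands first makes it easy to review paths and hosts before anything touches either cluster.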

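2.1 Full-table migration

2.1.1 Old cluster

Set the default Hive database by adding a use statement to ~/.hiverc

vim ~/.hiverc
use export_db;

Create the export directory in HDFS

hdfs dfs -mkdir -p /tmp/export_db_export

Generate and run the export script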

hive -e "show tables;" | awk  '{printf "export table %s to |/tmp/export_db_export/%s|;\n",$1,$1}' | sed "s/|/'/g" | grep -v tab_name > ~/export.hql

hive -f ~/export.hql
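
The awk/sed pipeline above uses | as a stand-in for the single quote. As an alternative sketch, the same statements can be produced by a small shell function that reads table names from standard input. The name gen_export is hypothetical, and the function assumes one table name per line, as printed by show tables.

```shell
# gen_export DIR: read one table name per line on stdin and print one
# EXPORT statement per table. gen_export is a hypothetical helper name.
gen_export() {
  local dir="$1"
  local tbl
  while IFS= read -r tbl; do
    # Skip blank lines and the "tab_name" header that hive may print.
    [ -z "$tbl" ] && continue
    [ "$tbl" = "tab_name" ] && continue
    printf "export table %s to '%s/%s';\n" "$tbl" "$dir" "$tbl"
  done
}

# Usage (on the old cluster):
#   hive -e "show tables;" | gen_export /tmp/export_db_export > ~/export.hql
```

Using printf with explicit arguments avoids the quote-substitution step and keeps the header filtering in one place.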

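Download the export from HDFS and send it to the new cluster

hdfs dfs -get /tmp/export_db_export ./export_db_export
sudo scp -r export_db_export/ hr@192.168.1.xx:/opt/lzx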

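2.1.2 New cluster

Upload the data to HDFS

# /tmp/export_db_export should not exist yet, otherwise the data lands one level deeper
hdfs dfs -put /opt/lzx/export_db_export /tmp/export_db_export

Generate and run the import script

# export.hql was generated on the old cluster; copy it to this host first
cp ~/export.hql ~/import.sql
sed -i 's/export /import /g' ~/import.sql
sed -i 's/ to / from /g' ~/import.sql

hive -f ~/import.sql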
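
The two global sed substitutions above rewrite every occurrence of "export " and " to " on a line, which could also touch a table name or path that happens to contain those strings. A more conservative sketch anchors the pattern to the statement shape. The function name to_import is hypothetical, and it assumes each statement sits on one line and ends with a single-quoted path followed by a semicolon.

```shell
# to_import: convert EXPORT statements on stdin into IMPORT statements.
# Only lines of the form "export table ... to '<path>';" are rewritten;
# other lines (for example a leading "use ...;") pass through unchanged.
to_import() {
  sed -E "s/^export table (.+) to ('[^']*';)[[:space:]]*\$/import table \1 from \2/"
}

# Usage:
#   to_import < ~/export.hql > ~/import.sql
```

Because the rewrite is anchored at both ends of the line, partition clauses between the table name and the path are carried over as they are.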

2.2 Migrating only some partitions (main steps)

2.2.1 Old cluster

Generate and run the export script

vim ~/export.hql

export table hr_task_scan_official_3hh partition(ds='20200409') to '/tmp/export_db_export/20200409';
export table hr_task_scan_official_3hh partition(ds='20200410') to '/tmp/export_db_export/20200410';
export table hr_task_scan_official_3hh partition(ds='20200411') to '/tmp/export_db_export/20200411';
export table hr_task_scan_official_3hh partition(ds='20200412') to '/tmp/export_db_export/20200412';
export table hr_task_scan_official_3hh partition(ds='20200413') to '/tmp/export_db_export/20200413';
export table hr_task_scan_official_3hh partition(ds='20200414') to '/tmp/export_db_export/20200414';

hive -f ~/export.hql
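
Writing one statement per partition by hand gets tedious for longer date ranges. As a sketch, the per-partition export statements can be generated from a list of partition values. The helper name gen_partition_exports is hypothetical, and the partition column is assumed to be ds, as in the statements above.

```shell
# gen_partition_exports TABLE DIR DS...: print one EXPORT statement per
# partition value, each written to its own HDFS subdirectory.
# The helper name is hypothetical; the partition column is assumed to be "ds".
gen_partition_exports() {
  local tbl="$1" dir="$2"
  shift 2
  local ds
  for ds in "$@"; do
    printf "export table %s partition(ds='%s') to '%s/%s';\n" \
      "$tbl" "$ds" "$dir" "$ds"
  done
}

# Usage:
#   gen_partition_exports hr_task_scan_official_3hh /tmp/export_db_export \
#     20200409 20200410 20200411 > ~/export.hql
```

Each partition gets its own target directory because an export target is expected to be empty.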

2.2.2 New cluster

There is no need to create the table in advance; import creates it if it does not exist.

Generate and run the import script

vim ~/import.sql

import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200409';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200410';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200411';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200412';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200413';
import table hr_task_scan_official_3hh from '/tmp/export_db_export/20200414';

hive -f ~/import.sql

2.3 Connecting to Hive with beeline and migrating data

Generate the export script with beeline

beeline -u jdbc:hive2://cdh01:10000 -e "use export_db;show tables;"| awk '{printf "export table %s to |/tmp/export_db_export/%s|;\n",$2,$2}' | sed "s/|/'/g"|sed '1,3d'|sed '$d' > ~/export.hql
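
The pipeline above depends on beeline printing exactly three lines before the first table row and one after the last. A sketch that is less sensitive to the border layout keeps only the rows that start with | and drops the tab_name header. The function name parse_beeline_tables is hypothetical, and it assumes beeline's default table output format.

```shell
# parse_beeline_tables: read beeline's table-format output on stdin and
# print the table names, one per line. Only rows that start with "|" are
# considered, and the "tab_name" header row is dropped, so the result does
# not depend on how many border lines beeline prints.
parse_beeline_tables() {
  awk -F'|' '/^\|/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)          # trim padding around the name
    if ($2 != "" && $2 != "tab_name") print $2
  }'
}

# Usage:
#   beeline -u jdbc:hive2://cdh01:10000 -e "use export_db;show tables;" \
#     | parse_beeline_tables
# beeline also accepts --outputformat=tsv2 --showHeader=false --silent=true,
# which may make this parsing step unnecessary.
```

The resulting names can then be turned into export statements in the same way as in the awk command above.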

Run the script

sed -i '1i use export_db;' ~/export.hql
beeline -u jdbc:hive2://cdh01:10000 -n hdfs -f ~/export.hql

Send the data to the new cluster's HDFS

# The target HDFS directory on the new cluster must be created in advance
hadoop distcp hdfs://cdh01:8020/tmp/export_db_export/ hdfs://cdh02:8020/tmp/export_db_export

Generate the import script

cp ~/export.hql ~/import.hql
sed -i 's/export /import /g' ~/import.hql
sed -i 's/ to / from /g' ~/import.hql
sed -i '1d' ~/import.hql
sed -i '1i use import_db;' ~/import.hql

Import

beeline -u jdbc:hive2://cdh02:10000 -n hdfs -e "create database if not exists import_db;"
beeline -u jdbc:hive2://cdh02:10000 -n hdfs -f ~/import.hql
