【Hive】hive数据迁移

背景:现将hive 数据从aws迁移至阿里云

按照网上的方法踩了大坑最后总结了区别

分两种情况:

事务参数:TBLPROPERTIES ('transactional'='true')
#########建表开启事务的情况下
1 方法一export/import
export table collection_server_have_trans  to '/tmp/collection_server_have_trans/';
scp 到另一个集群下
import from '/tmp/collection_server_have_trans';
结果:读不出数据
2 方法二:
直接get hdfs,拷贝到另一个集群,然后get上传,然后repair
结果,读不出数据
3 将上述两个办法在一个集群内做实验是可以的,难道是跨集群导致的问题?还是版本的原因呢?
#########建表不开启事务的情况下
1 方法一export/import
export table collection_server_have_trans  to '/tmp/collection_server_have_trans/';
scp 到另一个集群下
import from '/tmp/collection_server_have_trans';
结果:读的出数据
2方法二:
直接拷贝分区,上传 然后repair
结果,读的出数据
3。将上述两个办法在一个集群内做实验也是可以的


通过上面的实验对比不难得出结论:

  1. 开启事务 的情况下:本集群内export/import 拷贝hdfs的方法都可用,跨集群不可使用该方法

  2. 不开启事务 的情况下:本集群内export/import 拷贝hdfs的方法都可用,跨集群也可使用该方法




具体操作流程如下:

1.登录目前hive,导出数据

源端导出数据
export table collection_server to '/tmp/collection_server' ;
查看
[hadoop@ip-172-31-33-52 tmp]$ hadoop fs -ls /tmp/collection_server
Found 7 items
-rwxrwxrwt   1 hadoop hadoop      12971 2018-08-21 07:37 /tmp/collection_server/_metadata
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180816
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180817
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180818
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180821
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180823
drwxrwxrwt   - hadoop hadoop          0 2018-08-21 07:37 /tmp/collection_server/partition_day=180824


2.从hdfs中下载并拷贝至阿里云hfds

hfds文件系统下载到本地
[hadoop@ip-172-31-33-52 tmp]$ hadoop fs -get /tmp/collection_server /tmp/
拷贝:略
上传至阿里云hdfs文件系统
[hadoop@ip-172-31-46-220 ~]$ hadoop fs -put collection_server /tmp/
查看:
[hadoop@ip-172-31-46-220 ~]$ hadoop fs -ls /tmp/
Found 4 items
drwxr-xr-x   - hadoop hadoop          0 2018-08-21 07:47 /tmp/collection_server
drwxrwxrwx   - mapred mapred          0 2018-06-22 09:04 /tmp/hadoop-yarn
drwx-wx-wx   - hive   hadoop          0 2018-07-20 09:43 /tmp/hive
-rw-r--r--   1 hadoop hadoop         22 2018-06-27 05:40 /tmp/tt.txt


3.导入新的阿里云hive集群中

hive> import from '/tmp/collection_server';
Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000001_0000010
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000011_0000020
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000021_0000030
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000031_0000040
Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000041_0000050
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000051_0000060
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000061_0000070
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000071_0000080
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000081_0000090
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000091_0000100
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000101_0000110
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000111_0000120
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000121_0000130
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000131_0000140
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000141_0000150
Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000151_0000160
Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180818
检查:
hive> select count(*) from collection_server;
Query ID = hadoop_20180821074918_5ad9d0e5-9bce-4c52-b1dc-93ddd9ef766f
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1529658302761_18066)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     30         30        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.60 s     
----------------------------------------------------------------------------------------------
OK
21570



更多迁移方法参考:

https://blog.csdn.net/u9999/article/details/78830818

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29096438/viewspace-2212476/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/29096438/viewspace-2212476/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值