背景:现将hive 数据从aws迁移至阿里云
按照网上的方法踩了大坑最后总结了区别
分两种情况:
事务参数:TBLPROPERTIES ('transactional'='true') #########建表开启事务的情况下 1 方法一export/import export table collection_server_have_trans to '/tmp/collection_server_have_trans/'; scp 到另一个集群下 import from '/tmp/collection_server_have_trans'; 结果:读不出数据 2 方法二: 直接get hdfs,拷贝到另一个集群,然后get上传,然后repair 结果,读不出数据 3 将上述两个办法在一个集群内做实验是可以的,难道是跨集群导致的问题?还是版本的原因呢? #########建表不开启事务的情况下 1 方法一export/import export table collection_server_have_trans to '/tmp/collection_server_have_trans/'; scp 到另一个集群下 import from '/tmp/collection_server_have_trans'; 结果:读的出数据 2方法二: 直接拷贝分区,上传 然后repair 结果,读的出数据 3。将上述两个办法在一个集群内做实验也是可以的
通过上面的实验对比不难得出结论:
-
在 开启事务 的情况下:本集群内export/import 拷贝hdfs的方法都可用,跨集群不可使用该方法
-
在 不开启事务 的情况下:本集群内export/import 拷贝hdfs的方法都可用,跨集群也可使用该方法
具体操作流程如下:
1.登录目前hive,导出数据
源端导出数据 export table collection_server to '/tmp/collection_server' ; 查看 [hadoop@ip-172-31-33-52 tmp]$ hadoop fs -ls /tmp/collection_server Found 7 items -rwxrwxrwt 1 hadoop hadoop 12971 2018-08-21 07:37 /tmp/collection_server/_metadata drwxrwxrwt - hadoop hadoop 0 2018-08-21 07:37 /tmp/collection_server/partition_day=180816 drwxrwxrwt - hadoop hadoop 0 2018-08-21 07:37 /tmp/collection_server/partition_day=180817 drwxrwxrwt - hadoop hadoop 0 2018-08-21 07:37 /tmp/collection_server/partition_day=180818 drwxrwxrwt - hadoop hadoop 0 2018-08-21 07:37 /tmp/collection_server/partition_day=180821 drwxrwxrwt - hadoop hadoop 0 2018-08-21 07:37 /tmp/collection_server/partition_day=180823 drwxrwxrwt - hadoop hadoop 0 2018-08-21 07:37 /tmp/collection_server/partition_day=180824
2.从hdfs中下载并拷贝至阿里云hfds
hfds文件系统下载到本地 [hadoop@ip-172-31-33-52 tmp]$ hadoop fs -get /tmp/collection_server /tmp/ 拷贝:略 上传至阿里云hdfs文件系统 [hadoop@ip-172-31-46-220 ~]$ hadoop fs -put collection_server /tmp/ 查看: [hadoop@ip-172-31-46-220 ~]$ hadoop fs -ls /tmp/ Found 4 items drwxr-xr-x - hadoop hadoop 0 2018-08-21 07:47 /tmp/collection_server drwxrwxrwx - mapred mapred 0 2018-06-22 09:04 /tmp/hadoop-yarn drwx-wx-wx - hive hadoop 0 2018-07-20 09:43 /tmp/hive -rw-r--r-- 1 hadoop hadoop 22 2018-06-27 05:40 /tmp/tt.txt
3.导入新的阿里云hive集群中
hive> import from '/tmp/collection_server'; Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000001_0000010 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000011_0000020 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000021_0000030 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180816/delta_0000031_0000040 Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000041_0000050 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000051_0000060 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000061_0000070 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000071_0000080 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000081_0000090 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000091_0000100 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000101_0000110 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000111_0000120 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000121_0000130 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000131_0000140 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000141_0000150 Copying file: hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180817/delta_0000151_0000160 Copying data from hdfs://ip-172-31-46-220.ap-south-1.compute.internal:8020/tmp/collection_server/partition_day=180818 检查: hive> select count(*) from collection_server; Query ID = hadoop_20180821074918_5ad9d0e5-9bce-4c52-b1dc-93ddd9ef766f Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1529658302761_18066) ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... container SUCCEEDED 30 30 0 0 0 0 Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 7.60 s ---------------------------------------------------------------------------------------------- OK 21570
更多迁移方法参考:
https://blog.csdn.net/u9999/article/details/78830818
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29096438/viewspace-2212476/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/29096438/viewspace-2212476/