本文通过实际案例描述如何将一个集群中的Trafodion表数据迁移到另外一个集群的Trafodion表,利用Trafodion的数据导入导出功能。
1 首先在源集群与目标集群有相同结构的两张表,源集群表有数据,而目标集群的表为空表,
SQL>showddl tbl_src;
CREATE TABLE TRAFODION.SEABASE.TBL_SRC
(
A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, B VARCHAR(10) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, C DATE DEFAULT NULL NOT SERIALIZED
, D TIMESTAMP(6) DEFAULT NULL NOT SERIALIZED
, E NUMERIC(10, 5) DEFAULT NULL NOT SERIALIZED
, PRIMARY KEY (A ASC)
)
SALT USING 4 PARTITIONS
ATTRIBUTES ALIGNED FORMAT
HBASE_OPTIONS
(
DATA_BLOCK_ENCODING = 'FAST_DIFF',
COMPRESSION = 'SNAPPY',
MEMSTORE_FLUSH_SIZE = '536870912'
)
;
SQL>select * from tbl_src;
A B C D E
----------- ---------- ---------- -------------------------- ------------
2 DEF 2017-06-09 2017-06-09 16:44:29.420805 456.78000
1 ABC 2017-06-09 2017-06-09 16:44:06.567763 123.45000
3 GHI 2017-06-09 2017-06-09 16:44:54.214790 789.12000
--- 3 row(s) selected.
SQL>select * from tbl_tgt;
--- 0 row(s) selected.
2 利用Trafodion自带UNLOAD功能将源表数据导出到一个HDFS目录,关于UNLOAD用法,可参考 http://blog.csdn.net/post_yuan/article/details/53405224
SQL>unload into '/bulkload/tbl_src' select * from tbl_src;
UTIL_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
Task: UNLOAD Status: Started
Task: EXTRACT Status: Started Time: 2017-06-09 16:49:37.122
Rows Processed: 3
Task: EXTRACT Status: Ended Time: 2017-06-09 16:49:37.659
Task: EXTRACT Status: Ended Elapsed Time: 00:00:00.537
--- SQL operation complete.
3 验证上述UNLOAD成功,
[trafodion@openstack-1 ~]$ hadoop fs -ls /bulkload/tbl_src
Found 1 items
-rw-r--r-- 3 trafodion trafodion 162 2017-06-09 16:49 /bulkload/tbl_src/file0-20170609084937-367
[trafodion@openstack-1 ~]$ hadoop fs -cat /bulkload/tbl_src/file0-20170609084937-367
2|DEF|2017-06-09|2017-06-09 16:44:29.420805|456.78000
1|ABC|2017-06-09|2017-06-09 16:44:06.567763|123.45000
3|GHI|2017-06-09|2017-06-09 16:44:54.214790|789.12000
4 将HDFS文件get到源集群本地,
[trafodion@openstack-1 ~]$ hadoop fs -get /bulkload/tbl_src ./
[trafodion@openstack-1 ~]$ ll tbl_src/
total 4
-rw-r--r-- 1 trafodion trafodion 162 Jun 9 16:58 file0-20170609084937-367
5 将上述生成的文件移动到新的集群的本地路径,并使用odb工具向目标表进行导入工作,
关于odb使用,可参考 http://blog.csdn.net/post_yuan/article/details/54631881
[trafodion@openstack-2 bin64]$ ./odb64luo -d traf -u zz -p zz -l src=/home/trafodion/tbl_src/file0-20170609084937-367:tgt=trafodion.seabase.tbl_tgt:fs=\|:rows=5000:loadcmd=UL:truncate:parallel=4
odb [2017-06-09 17:04:02]: starting ODBC connection(s)... (1) 1 2 3 4
Connected to Trafodion
[0.0.0]--- command executed in 9.661s (prep 0.008s, exec 9.653s, fetch 0.000s/0.000s)
[1] 3 records inserted [commit]
[0] odb version 1.1.0 Load(2) statistics:
[0] Target table: TRAFODION.SEABASE.TBL_TGT
[0] Source: /home/trafodion/tbl_src/file0-20170609084937-367
[0] Pre-loading time: 23.110 s (00:00:23.110)
[0] Loading time: 0.081 s(00:00:00.081)
[0] Total records read: 3
[0] Total records inserted: 3
[0] Total number of columns: 5
[0] Total bytes read: 162
[0] Average input row size: 54.0 B
[0] ODBC row size: 69 B (data) + 40 B (len ind)
[0] Rowset size: 5,000
[0] Rowset buffer size: 532.23 KiB
[0] Load throughput (real data): 1.953 KiB/s
[0] Load throughput (ODBC): 2.496 KiB/s
[0] Reader Total/Wait Cycles: 1/0
odb [2017-06-09 17:04:26]: exiting. Session Elapsed time 23.220 seconds (00:00:23.220)
6 验证数据是否成功导入目标表,
SQL>select * from tbl_tgt;
A B C D E
----------- ---------- ---------- -------------------------- ------------
2 DEF 2017-06-09 2017-06-09 16:44:29.420805 456.78000
1 ABC 2017-06-09 2017-06-09 16:44:06.567763 123.45000
3 GHI 2017-06-09 2017-06-09 16:44:54.214790 789.12000
--- 3 row(s) selected.
至此,完成单表数据从集群向目标集群迁移的工作。