Scenario: data on an old cluster needs to be migrated to a new cluster.
hadoop distcp [option] hdfs://master_ip:8020/hive/warehouse/xxx.db/tab_name hdfs://master_ip:8020/hive/warehouse/xxx.db/tab_name
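To see the available options, run hadoop distcp with no arguments; it prints the help text, so they are not explained here.
master_ip: the IP of the cluster's master node. The first URI is the source (old cluster) and the second is the destination (new cluster), so use each cluster's own master IP.
tab_name: the name of the table to migrate.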
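The paths must be correct. If you do not know a table's path, run desc formatted db_name.tab_name to look it up. The Location field is the correct path; just replace the nameservice in it (test01 in the example below) with master_ip:port.
For example: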
hive> desc formatted aidemo.ac_ref;
OK
# col_name data_type comment
pkg_name string
label string
# Detailed Table Information
Database: aidemo
Owner: hchou
CreateTime: Wed Jun 07 15:34:35 CST 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://test01/hive/warehouse/aidemo.db/ac_ref
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1496820875
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.078 seconds, Fetched: 28 row(s)
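
As a minimal sketch of the path rewrite described above, the following shell snippet takes the Location value from the output and swaps the nameservice for an explicit master_ip:port on each side. The helper name rewrite_location and the addresses 10.0.0.1 and 10.0.0.2 are hypothetical placeholders, not values from this setup; the port 8020 is the one used in the command at the top. The snippet only prints the resulting distcp command so it can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the authority part of an HDFS URI
# (e.g. the nameservice "test01") with an explicit host:port,
# leaving the path unchanged.
rewrite_location() {
  local location="$1" authority="$2"
  printf '%s\n' "$location" | sed -E "s#^hdfs://[^/]+#hdfs://${authority}#"
}

# Location as reported by desc formatted aidemo.ac_ref
location="hdfs://test01/hive/warehouse/aidemo.db/ac_ref"

# Assumed addresses, replace with the real master IPs of each cluster
old_master="10.0.0.1:8020"   # old (source) cluster
new_master="10.0.0.2:8020"   # new (destination) cluster

src="$(rewrite_location "$location" "$old_master")"
dst="$(rewrite_location "$location" "$new_master")"

# Print the command for review instead of executing it
echo "hadoop distcp $src $dst"
```

Once the printed source and destination look right, the command can be copied and run, with any options added after distcp.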