hive sql对hdfs的操作最终都会转化为mr任务,下面介绍如何将已经存在的hdfs文件“导入”hive表,很简单
条件及要求:
1)hdfs文件为经过lzo压缩的seqFile
2)seqFile数据样例
127.0.0.1 ae8186613401633040361974|1340163304040|0.1|hsf.qa.test|yuanfeng|test_07_01
其中key是hostIp, value为一条日志,格式为:traceid|time|rpcid|appname|querykey|msg
3)hive表是外在分区表
步骤1. 建立hive表
CREATE EXTERNAL TABLE biz_eagleeye (traceId STRING, time STRING, rpcId STRING,
appName STRING, queryKey STRING, msg STRING, kvMap STRING)
PARTITIONED BY(pt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\|'
STORED AS SEQUENCEFILE;
其中EXTERNAL和PARTITIONED关键字指明表为外表和分区表,STORED AS SEQUENCEFILE是专门指定加载seqFile数据源的,如果是普通文本可换成TEXTFILE