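Hive storage formats:
1. TEXTFILE:
Row-oriented storage with high disk usage and high parsing overhead. When the files are compressed with a non-splittable codec such as Gzip, Hive cannot split them, so the data cannot be processed in parallel.
Example:
CREATE EXTERNAL TABLE page_view_stg(viewTime INT, userid BIGINT,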
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/opt/hive2.1/data/page_view';
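2. SEQUENCEFILE:
A binary file format in which records are serialized as <key,value> pairs.
Storage layout: row-oriented.
Splittable and compressible.
Block compression is generally the preferred option.
Its advantage is that the files are compatible with MapFile in the Hadoop API.
Example:
CREATE TABLE page_view (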
viewTime INT,
userid BIGINT,
page_url STRING,
referrer_url STRING,
ip STRING COMMENT 'IP Address of the User'
) COMMENT 'This is the page view table'
PARTITIONED BY (dt STRING, country STRING)
CLUSTERED BY (userid) SORTED BY (viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '1'
COLLECTION ITEMS TERMINATED BY '2'
MAP KEYS TERMINATED BY '3'
STORED AS SEQUENCEFILE;
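As a rough sketch of the block-compression setting mentioned above, the session below turns on block-compressed output and fills one partition of page_view from the staging table. The property names are the Hadoop 2 ones (older releases use mapred.output.compression.type and mapred.output.compression.codec); the Snappy codec and the partition values are assumptions chosen for illustration.

```sql
-- Compress the files written by the final job of a query
SET hive.exec.compress.output=true;
-- Compress whole blocks of records rather than individual records
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
-- Assumed codec; any codec installed on the cluster can be used instead
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Partition values below are illustrative
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='US')
SELECT viewTime, userid, page_url, referrer_url, ip
FROM page_view_stg
WHERE country = 'US';
```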
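3. RCFILE:
Storage layout: data is split horizontally into row groups, and each row group is stored column by column.
Compresses quickly and gives fast column access.
All fields of a record sit in the same row group, so reading a record touches as few blocks as possible.
To read only the columns it needs, a reader consults the header of each row group and skips the other columns.
For full-table reads, performance may show no clear advantage over SEQUENCEFILE.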
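The notes give no RCFILE example, so here is a minimal sketch. The table name page_view_rc is hypothetical, and its columns simply mirror the staging table page_view_stg defined earlier. The last query needs a single column, which is the access pattern where the columnar layout inside each row group should help.

```sql
-- Hypothetical table name; the columns mirror page_view_stg
CREATE TABLE page_view_rc (
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING
)
STORED AS RCFILE;

-- Copy the rows from the text staging table into the RCFile table
INSERT OVERWRITE TABLE page_view_rc
SELECT viewTime, userid, page_url, referrer_url, ip
FROM page_view_stg;

-- Only the userid column is needed here, so the other columns can be skipped
SELECT userid, COUNT(*) AS views
FROM page_view_rc
GROUP BY userid;
```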
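4. ORCFILE:
Storage layout: data is split horizontally into row groups, and each row group is stored column by column.
Compresses quickly and gives fast column access.
More efficient than RCFILE; it is an improved version of RCFILE.
Example:
create external table `driver_butie_order_info`(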
`drive_license_number` bigint COMMENT '1',
`drive_num` bigint COMMENT '2',
`order_num` bigint COMMENT '3',
`coupon_num` bigint COMMENT '4',
`passenger_num` bigint COMMENT '5',
`total_coupon_amount` bigint COMMENT '6')
COMMENT 'table description'
PARTITIONED BY (
`event_day` string)
-- ORC is a binary columnar format, so no text field delimiter is declared
STORED AS orcfile;
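Because ORC files are binary, a plain text file cannot simply be dropped into the table directory; data usually arrives through a query. A minimal sketch, assuming the text staging table page_view_stg from the TEXTFILE example is available. The table name page_view_orc and the SNAPPY codec are illustrative choices (ORC compresses with ZLIB by default).

```sql
-- Hypothetical table name; SNAPPY is an assumed codec choice
CREATE TABLE page_view_orc
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY")
AS
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM page_view_stg;
```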
------- Hive external table mapped to HBase -------
hive> CREATE EXTERNAL TABLE navi.test(
rowkey string,
ename string
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:ename")
TBLPROPERTIES ("hbase.table.name" = "test")
;
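Once the mapping exists, the HBase table can be queried like any other Hive table. A minimal sketch, assuming the HBase table test with column family cf already exists (an EXTERNAL table over HBase requires this); the row key 'row1' is a made-up value.

```sql
-- Read the row key and the cf:ename cell of each HBase row
SELECT rowkey, ename
FROM navi.test
LIMIT 10;

-- Filter on the row key; 'row1' is an illustrative value
SELECT ename
FROM navi.test
WHERE rowkey = 'row1';
```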