HiveSQL

Mr.Persimmon

Category column: Big Data Development Notes Based on Alibaba Cloud ECS Servers. Tags: hive

Article link: https://blog.csdn.net/NoBuggie/article/details/117364582

1. Create a partitioned table that supports LZO compression

create external table table_name(
	`column_name` column_type
)
partitioned by (`dt` string) -- partition by date
row format delimited fields terminated by '\t' -- use \t as the field delimiter
stored as -- storage format: read with DeprecatedLzoTextInputFormat to support LZO compression
	inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
	outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/warehouse/...'; -- location of the data on HDFS

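Notes:

outputformat: the output format

HiveIgnoreKeyTextOutputFormat replaces key with null before feeding the <key, value> to TextOutputFormat.RecordWriter.

The table is partitioned by date (the dt column).
Rows are terminated by \n by default; the field delimiter is set to \t explicitly in the statement above (without that clause Hive would use \001, Ctrl-A).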
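To make the template above concrete, here is a minimal shell sketch that fills it in and prints the resulting statement. The table name `ods_log_demo`, the single `line` column, and the location `/warehouse/demo/ods_log_demo` are hypothetical values chosen for illustration, not taken from the original setup, and `if not exists` is added only so the script can be rerun safely. The statement is executed only when `RUN=1` is set and the `hive` CLI is available.

```shell
#!/bin/bash
# Print a CREATE EXTERNAL TABLE statement for an LZO-compressed, date-partitioned text table.
# Table name, column, and location are hypothetical placeholders.
build_ddl() {
    local table="$1" location="$2"
    # Unquoted heredoc: \` becomes a literal backtick, '\t' is passed through unchanged.
    cat <<EOF
create external table if not exists ${table}(
    \`line\` string
)
partitioned by (\`dt\` string)
row format delimited fields terminated by '\t'
stored as
    inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '${location}';
EOF
}

ddl=$(build_ddl "ods_log_demo" "/warehouse/demo/ods_log_demo")
echo "$ddl"

# Dry run by default; set RUN=1 to actually create the table.
if [ "${RUN:-0}" = "1" ]; then
    hive -e "$ddl"
fi
```

Printing the statement first makes it easy to check the input and output format class names before anything is created in the metastore.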

2. Load data

load data inpath 'HDFS path to the data' into table table_name partition(dt='date');
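The load step is typically repeated once per date partition. A minimal shell sketch of that, assuming a hypothetical table `ods_log_demo` and a hypothetical source layout where each day's files sit under `/origin_data/log/<date>`; neither name comes from the original. It prints the statement and runs it only when `RUN=1` is set.

```shell
#!/bin/bash
# Build the LOAD DATA statement for one date partition.
# Assumes (hypothetically) that the source files for a day live in <src_dir>/<dt>.
build_load_sql() {
    local table="$1" src_dir="$2" dt="$3"
    printf "load data inpath '%s/%s' into table %s partition(dt='%s');" \
        "$src_dir" "$dt" "$table" "$dt"
}

sql=$(build_load_sql "ods_log_demo" "/origin_data/log" "2021-05-28")
echo "$sql"

# Dry run by default; set RUN=1 to execute through the hive CLI.
if [ "${RUN:-0}" = "1" ]; then
    hive -e "$sql"
fi
```

Note that `load data inpath` without `local` moves the files from the source HDFS path into the partition directory rather than copying them, so the source directory is empty afterwards.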

3. Create an index for the LZO-compressed files

hadoop jar (path to hadoop-lzo-0.4.20.jar) com.hadoop.compression.lzo.DistributedLzoIndexer -Dmapreduce.job.queuename=(Hive execution queue name) (path of the data on HDFS)
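The indexing command can be wrapped the same way. This is a sketch under stated assumptions: the jar location `/opt/lib/hadoop-lzo-0.4.20.jar`, the queue name `default`, and the target path `/warehouse/demo/ods_log_demo/dt=2021-05-28` are all hypothetical and must be replaced with the values of the actual cluster. The command is only printed unless `RUN=1` is set.

```shell
#!/bin/bash
# Assemble the DistributedLzoIndexer command for one HDFS path.
# Jar path, queue name, and target path are hypothetical placeholders.
build_index_cmd() {
    local jar="$1" queue="$2" target="$3"
    printf 'hadoop jar %s com.hadoop.compression.lzo.DistributedLzoIndexer -Dmapreduce.job.queuename=%s %s' \
        "$jar" "$queue" "$target"
}

cmd=$(build_index_cmd "/opt/lib/hadoop-lzo-0.4.20.jar" "default" \
    "/warehouse/demo/ods_log_demo/dt=2021-05-28")
echo "$cmd"

# Dry run by default; set RUN=1 to submit the indexing job.
# Relies on word splitting, so the paths must not contain spaces.
if [ "${RUN:-0}" = "1" ]; then
    $cmd
fi
```

The indexer writes a `.index` file next to each `.lzo` file. Without that index an LZO file cannot be split and is read by a single map task, which is why this step follows every load.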