Impala Complex Types,map,array

impala不支持直接insert complex type data,需要从hive中或者从parquet file中取出来.而且需要使用parquet格式的表.
官方文档如下:

Because the Impala INSERT statement does not currently support creating new data with complex type columns, or copying existing complex type values from one table to another, you primarily use Impala to query Parquet tables with complex types where the data was inserted through Hive, or create tables with complex types where you already have existing Parquet data files.

If you have created a Hive table with the Parquet file format and containing complex types, use the same table for Impala queries with no changes. If you have such a Hive table in some other format, use a Hive CREATE TABLE AS SELECT … STORED AS PARQUET or INSERT … SELECT statement to produce an equivalent Parquet table that Impala can query.

If you have existing Parquet data files containing complex types, located outside of any Impala or Hive table, such as data files created by Spark jobs, you can use an Impala CREATE TABLE … STORED AS PARQUET statement, followed by an Impala LOAD DATA statement to move the data files into the table. As an alternative, you can use an Impala CREATE EXTERNAL TABLE statement to create a table pointing to the HDFS directory that already contains the data files.

具体demo步骤如下:

创建hive表
create  table array_map_1(id string , column_name array<string>,info map<string, string>)
     ROW FORMAT DELIMITED
     FIELDS TERMINATED BY '|'
     COLLECTION ITEMS TERMINATED BY ','
     MAP KEYS TERMINATED BY ':';
导入数据到hive中
LOAD DATA local INPATH '/home/weblog/demodata' INTO TABLE array_map_1 ;
# 数据如下
1|体育,娱乐,科技,社会|name:tom,age:16,city:beijing
2|娱乐,社会,NBA,时尚,历史|name:jerry,age:17
3|汽车,NBA,健康|name:alice,city:shanghai
4|视频,政务|name:harry,age:16,city:jinan
5|体育,军事|name:danny,age:17,city:
6|体育,政务,军事,话题,历史,社会|name:liming,city:dalian

因为hive中的parquet表map,array不支持直接导入txt之类的非parquet格式数据,我也暂时不知道怎么创建parquet格式文件,只能暂时创建普通表格导入数据,再select到一个parquet表中.

创建新的parquet表格并导入数据
create table array_map(id string,column_name array<string>,info map<string, string>) stored as parquet;
insert into array_map select id,column_name,info from array_map_1;
查看array_map的hdfs路径
describe formatted array_map;

...
| Location: | hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/array_map 
...
查看hdfs具体路径
$ hdfs dfs -ls hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/array_map
Found 4 items
-rwxr-xr-x   3 dev supergroup  171298918 2015-09-22 23:30 hdfs://localhost:20500/blah/tpch_nested_parquet.db/array_map/000000_0
...
在impala shell中把数据拉过去
CREATE TABLE array_map
  LIKE PARQUET 'hdfs://localhost:20500/blah/tpch_nested_parquet.db/array_map/000000_0'
  STORED AS PARQUET location 'hdfs://localhost:20500/blah/tpch_nested_parquet.db/array_map/';

最终结果显示
这里写图片描述

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值