impala不支持直接insert complex type data,需要从hive中或者从parquet file中取出来.而且需要使用parquet格式的表.
官方文档如下:
Because the Impala INSERT statement does not currently support creating new data with complex type columns, or copying existing complex type values from one table to another, you primarily use Impala to query Parquet tables with complex types where the data was inserted through Hive, or create tables with complex types where you already have existing Parquet data files.
If you have created a Hive table with the Parquet file format and containing complex types, use the same table for Impala queries with no changes. If you have such a Hive table in some other format, use a Hive CREATE TABLE AS SELECT … STORED AS PARQUET or INSERT … SELECT statement to produce an equivalent Parquet table that Impala can query.
If you have existing Parquet data files containing complex types, located outside of any Impala or Hive table, such as data files created by Spark jobs, you can use an Impala CREATE TABLE … STORED AS PARQUET statement, followed by an Impala LOAD DATA statement to move the data files into the table. As an alternative, you can use an Impala CREATE EXTERNAL TABLE statement to create a table pointing to the HDFS directory that already contains the data files.
具体demo步骤如下:
创建hive表
create table array_map_1(id string , column_name array<string>,info map<string, string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
导入数据到hive中
LOAD DATA local INPATH '/home/weblog/demodata' INTO TABLE array_map_1 ;
# 数据如下
1|体育,娱乐,科技,社会|name:tom,age:16,city:beijing
2|娱乐,社会,NBA,时尚,历史|name:jerry,age:17
3|汽车,NBA,健康|name:alice,city:shanghai
4|视频,政务|name:harry,age:16,city:jinan
5|体育,军事|name:danny,age:17,city:
6|体育,政务,军事,话题,历史,社会|name:liming,city:dalian
因为hive中的parquet表map,array不支持直接导入txt之类的非parquet格式数据,我也暂时不知道怎么创建parquet格式文件,只能暂时创建普通表格导入数据,再select到一个parquet表中.
创建新的parquet表格并导入数据
create table array_map(id string,column_name array<string>,info map<string, string>) stored as parquet;
insert into array_map select id,column_name,info from array_map_1;
查看array_map的hdfs路径
describe formatted array_map;
...
| Location: | hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/array_map
...
查看hdfs具体路径
$ hdfs dfs -ls hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/array_map
Found 4 items
-rwxr-xr-x 3 dev supergroup 171298918 2015-09-22 23:30 hdfs://localhost:20500/blah/tpch_nested_parquet.db/array_map/000000_0
...
在impala shell中把数据拉过去
CREATE TABLE array_map
LIKE PARQUET 'hdfs://localhost:20500/blah/tpch_nested_parquet.db/array_map/000000_0'
STORED AS PARQUET location 'hdfs://localhost:20500/blah/tpch_nested_parquet.db/array_map/';
最终结果显示