Common primitive data types
Primitive data type | Size in bytes |
int | |
boolean | |
float | |
double | |
string | |
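Complex data types
Complex data type | Description |
array | An array is an ordered sequence of elements that all share the same data type. Elements are accessed by index. Note: indexes start at 0. |
map | A map holds key-value pairs; an element is accessed by its key. |
struct | A struct can hold fields of different data types, much like an object. Fields are accessed the way object attributes are. |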
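The three access styles described in the table can be sketched side by side. This is only an illustration: the table demo and its columns scores, props and person are hypothetical names that do not appear elsewhere in these notes.

```sql
-- Hypothetical table, used only to illustrate the access syntax.
create table demo (
  scores array<int>,
  props map<string,string>,
  person struct<name:string,age:int>
);

-- scores[0] is the first array element (indexes start at 0),
-- props['city'] is the map value stored under the key 'city',
-- person.name is one field of the struct.
select scores[0], props['city'], person.name from demo;
```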
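Array type
Create-table statement: create external table xx(xx array<int>) row format delimited fields terminated by '\t' collection items terminated by ',' location 'data path';
Notes: 1. collection items terminated by ',' means that the elements of the array are separated by ','.
2. A single table can declare several array columns. For example: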
Source data (the two columns are separated by a tab):
100,200,300 tom,jary
200,300,500 rose,jack
Create-table statement:
create external table ex1(info1 array<int>,info2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',' location '/ex';
Result:
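The query output is not reproduced in these notes. As a minimal sketch, assuming the two sample rows are stored tab-separated under /ex, the array columns of ex1 can be read as whole values, by index, or measured with the built-in size function:

```sql
-- Whole rows: each array column is returned as a list of its elements.
select * from ex1;

-- Index into the arrays (indexes start at 0) and count the elements.
-- For the two sample rows this should return:
--   100  jary  3
--   200  jack  3
select info1[0], info2[1], size(info1) from ex1;
```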
Map type
Case 1
Source data:
tom,23
rose,25
jary,28
Create-table statement:
create external table m1 (vals map<string,int>) row format delimited fields terminated by '\t' map keys terminated by ',' location '/map';
Query:
select vals['tom'] from m1;
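Each row of m1 holds a map with a single entry, so looking up 'tom' returns 23 for the first row and NULL for the other two, because Hive returns NULL for a key that is absent from a map. A short sketch, assuming the table m1 defined above, that inspects the maps with the built-in map_keys, map_values and array_contains functions:

```sql
-- Show the keys and the values stored in each row's map.
select map_keys(vals), map_values(vals) from m1;

-- Keep only the rows whose map contains the key 'tom'.
select vals['tom'] from m1 where array_contains(map_keys(vals), 'tom');
```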
Case 2: find every site that tom visited, leaving out null values
Source data (space-delimited):
tom 192.168.234.21
rose 192.168.234.21
tom 192.168.234.22
jary 192.168.234.21
tom 192.168.234.24
tom 192.168.234.21
rose 192.168.234.21
tom 192.168.234.22
jary 192.168.234.21
tom 192.168.234.22
tom 192.168.234.23
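Create-table statement:
create external table ex (vals map<string,string>) row format delimited fields terminated by '\t' map keys terminated by ' ' location '/ex';
Note: for a map column, the field delimiter must not be the same character as the map key delimiter. The keys here are separated from their values by a space, so '\t' is used as the field delimiter.
Query:
select vals['tom'] from ex where vals['tom'] is not null;
To remove duplicates, use the distinct keyword:
select distinct(ip) from (select vals['tom'] ip from ex where vals['tom'] is not null)ex1;
The same result without a subquery:
select distinct(vals['tom']) from ex where vals['tom'] is not null;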
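A related sketch, assuming the map table ex and the sample data from this case: rather than only removing duplicates, group by the looked-up value to see how many times tom visited each address.

```sql
-- Count tom's visits per address.
-- For the sample data the expected counts are (row order is not guaranteed):
--   192.168.234.21  2
--   192.168.234.22  3
--   192.168.234.23  1
--   192.168.234.24  1
select vals['tom'] as ip, count(*) as visits
from ex
where vals['tom'] is not null
group by vals['tom'];
```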
Struct type
Source data:
tom 23
rose 22
jary 26
Create-table statement:
create external table ex (vals struct<name:string,age:int>) row format delimited collection items terminated by ' ' location '/ex';
Query:
select vals.age from ex where vals.name='tom';
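Struct fields can be used in both the select list and the where clause. A minimal sketch, assuming the struct table ex and the three sample rows above:

```sql
-- Return both fields of the struct for everyone older than 22.
-- For the sample rows this should return tom (23) and jary (26).
select vals.name, vals.age from ex where vals.age > 22;
```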