复杂数据类型
-
Array:数组类型,对应了Java中的数组或者集合类型
-
原始数据(注意中间有空格)
2,3,4,1,3,4 3,4,2,4,5,6,2,3
4,5,7,4,7,3,2 6,3,3,6,2,5,2
7,3,6,8 2,4,47,7,2
4,5,7,9,0,3 2,4,5,6,8,5
-
建表:
create table num(nums1 array<int>,nums2 array<int>) row format delimited fields terminated by ' ' collection items terminated by ',';
-
加载数据:
load data local inpath '/home/hivedata/arr.txt' into table num;
-
非空查询:
select nums1[6] from num where nums1[6] is not null;
-
-
Map:映射类型,对应了Java中的Map类型
-
原始数据:
1 tom,15
2 sam,18
3 jack,20
4 lucy,16
5 mark,22 -
建表:
create table infos(id int,info map<string,int>) row format delimited fields terminated by ' ' map keys terminated by ',';
-
加载数据:
load data local inpath '/home/hivedata/info.txt' into table infos;
-
非空查询:
select nums1['jack'] from infos where nums1['jack'] is not null;
-
-
Struct:结构体类型,对应了Java中的对象
-
原始数据:
1 tom,15,male alex,16,male
2 amy,14,female bob,19,male
3 lucy,15,female lily,16,female -
建表:
create table groups(groupid int,m1 struct<name:string,age:int,gender:string>,m2 struct<name:string,age:int,gender:string>) row format delimited fields terminated by ' ' collection items terminated by ',';
-
加载数据:
load data local inpath '/home/hivedata/group.txt' into table groups;
-
查询数据:
select m1.age from groups;
-
函数
-
为了对数据进行较为方便地操作,Hive提供了大量的内置运算符和内置函数:数学、类型转换、日期、条件、字符串、聚合等
-
expolde:将数组中的每一个元素提取出来成为单独一行
-
实例:单词统计
-
原始数据:
hello tom hello bob hello joy
hello rose hello joy hello
jerry hello tom
hello rose hello joy -
建表(外部表):
create external table words (word array<string>) row format delimited collection items terminated by ' ' location '/words';
-
查询每个单词出现的次数:
select w,count(w) from (select explode(word) w from words) ws group by w;
-
统计次数并且把结果写到本地目录中:
insert overwrite local directory '/home/hivedata/' row format delimited fields terminated by '\t' select w,count(w) from (select explode(word) w from words) ws group by w;
-
-
Hive中的自定义函数分为三大类
-
UDF(User Defined Function):用户定义函数,定入(每次都
- split:一行输入一行输出
-
UDAF
- Max(age):多行输入、一行输出
-
UDTF
- explode:一行输入、多行输出
-
自定义函数:
-
package cn.tedu.udf; import org.apache.hadoop.hive.ql.exec.UDF; public class RepeatString extends UDF { // 覆盖evaluate方法 // 为什么父类中没有将此方法设计为抽象方法 // 原因:业务不同,返回值和参数类型就不能确定 public String evaluate(String str,int num){ StringBuilder sb = new StringBuilder(); for (int i = 0; i < num; i++) { sb.append(str); } return sb.toString(); } }
-
打成jar包后导入虚拟机中
-
add jar /hive_test-1.0-SNAPSHOT.jar
-
create temporary function stringrepeat as 'cn.tedu.udf.RepeatString';
-
select stringrepeat('abcd',5);
-
-