join operations
inner join: returns only the rows that match the join condition
outer join
left: the left table is the base
right: the right table is the base
full: rows from both tables are returned
select
e.empno, e.ename, e.deptno, d.dname
from emp e join dept d
on e.deptno = d.deptno;
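For the outer variants only the join keyword changes; a sketch against the same emp and dept tables:

```sql
-- left join: every emp row is kept; dept columns are NULL when deptno has no match
select e.empno, e.ename, e.deptno, d.dname
from emp e left join dept d
on e.deptno = d.deptno;

-- full join: rows from both sides are kept, with NULLs where the other side has no match
select e.empno, e.ename, e.deptno, d.dname
from emp e full join dept d
on e.deptno = d.deptno;
```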
HiveServer2 (HS2)
HiveServer2 is a server process
The default port is 10000; it can be changed to another port, e.g. 10086
Starting the Hive server
Configuration properties reference: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
| key | value | desc |
| --- | --- | --- |
| hive.server2.thrift.port | the port to use | lets the user start HS2 on a specified port |
Go to the bin directory
[hadoop@JD bin]$ pwd
/home/hadoop/app/hive-1.1.0-cdh5.16.2/bin
Run the startup command
[hadoop@JD bin]$ hiveserver2
Start on a specified port (kill the old process first, then start)
[hadoop@JD bin]$ ./hiveserver2 --hiveconf hive.server2.thrift.port=10086
Start in the background and write the log to a file
[hadoop@JD bin]$ nohup hiveserver2 > ~/log/hiveserver.log 2>&1 &
beeline: a client
Connecting to the server
When starting beeline, first cd into $HIVE_HOME/bin: both Hive and Spark ship a beeline, and only by entering a component's bin directory can you be sure you are starting that component's client
Connect to the server (JD: hostname running the server; 10000: default server port; db_hive1: database name; -n: the user to connect as)
[hadoop@JD bin]$ ./beeline -u jdbc:hive2://JD:10000/db_hive1 -n hadoop
scan complete in 1ms
Connecting to jdbc:hive2://JD:10000/db_hive1 # connecting to the server address
Connected to: Apache Hive (version 1.1.0-cdh5.16.2) # connected to Hive
Driver: Hive JDBC (version 1.1.0-cdh5.16.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.16.2 by Apache Hive
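Once connected, a couple of beeline built-ins (inherited from SQLLine, not HiveQL) are handy: `!tables` lists the tables visible over the connection, and `!quit` exits the client:

```
0: jdbc:hive2://JD:10000/db_hive1> !tables
0: jdbc:hive2://JD:10000/db_hive1> !quit
```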
Hive complex data types
For complex types, the key skills are storing and retrieving the data
The row_format clause of the CREATE TABLE statement
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
Complex types reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
array
array<T> is generic: every element in the array must have the same type
Sample data:
[root@JD data]# cat hive_array.txt
pk beijing,shanghai,tianjin,hangzhou
jepson changchu,chengdu,wuhan,beijing
Create the table and load the data
Fields are separated by \t; array items are separated by ,
create table hive_array(name string, work_locations array<string>)
row format delimited
fields terminated by '\t'
COLLECTION ITEMS TERMINATED BY ',';
Load the data
0: jdbc:hive2://JD:10086/db_hive1> load data local inpath '/home/hadoop/data/hive_array.txt' into table hive_array;
Query
0: jdbc:hive2://JD:10086/db_hive1> select * from hive_array;
+------------------+----------------------------------------------+--+
| hive_array.name | hive_array.work_locations |
+------------------+----------------------------------------------+--+
| pk | ["beijing","shanghai","tianjin","hangzhou"] |
| jepson | ["changchu","chengdu","wuhan","beijing"] |
+------------------+----------------------------------------------+--+
2 rows selected (0.216 seconds)
Describe the table
0: jdbc:hive2://JD:10086/db_hive1> desc hive_array;
+-----------------+----------------+----------+--+
| col_name | data_type | comment |
+-----------------+----------------+----------+--+
| name | string | |
| work_locations | array<string> | |
+-----------------+----------------+----------+--+
2 rows selected (0.116 seconds)
Take the first element of the array
0: jdbc:hive2://JD:10086/db_hive1> select name,work_locations[0] from hive_array;
+---------+-----------+--+
| name | _c1 |
+---------+-----------+--+
| pk | beijing |
| jepson | changchu |
+---------+-----------+--+
Get the array length
0: jdbc:hive2://JD:10086/db_hive1> select name,size(work_locations) from hive_array;
+---------+------+--+
| name | _c1 |
+---------+------+--+
| pk | 4 |
| jepson | 4 |
+---------+------+--+
Find rows whose array contains tianjin
0: jdbc:hive2://JD:10086/db_hive1> select name,work_locations from hive_array where array_contains(work_locations,'tianjin');
+-------+----------------------------------------------+--+
| name | work_locations |
+-------+----------------------------------------------+--+
| pk | ["beijing","shanghai","tianjin","hangzhou"] |
+-------+----------------------------------------------+--+
1 row selected (0.098 seconds)
map type
MAP<primitive_type, data_type>: key-value pairs
Sample data:
[root@JD data]# cat hive_map.txt
1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26
Create the table
create table hive_map(
id int, name string, members map<string,string>, age int
)
row format delimited
fields terminated by ','
COLLECTION ITEMS TERMINATED BY '#'
MAP KEYS TERMINATED BY ':'
;
Load the data
load data local inpath '/home/hadoop/data/hive_map.txt' into table hive_map;
Query the data
0: jdbc:hive2://JD:10086/db_hive1> select * from hive_map;
+--------------+----------------+----------------------------------------------------+---------------+--+
| hive_map.id | hive_map.name | hive_map.members | hive_map.age |
+--------------+----------------+----------------------------------------------------+---------------+--+
| 1 | zhangsan | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28 |
| 2 | lisi | {"father":"mayun","mother":"huangyi","brother":"guanyu"} | 22 |
| 3 | wangwu | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29 |
| 4 | mayun | {"father":"mayongzhen","mother":"angelababy"} | 26 |
+--------------+----------------+----------------------------------------------------+---------------+--+
Look up values in the map
0: jdbc:hive2://JD:10086/db_hive1> select id,name,members['father'] as father,members['mother'] as mother,age from hive_map;
+-----+-----------+--------------+-------------+------+--+
| id | name | father | mother | age |
+-----+-----------+--------------+-------------+------+--+
| 1 | zhangsan | xiaoming | xiaohuang | 28 |
| 2 | lisi | mayun | huangyi | 22 |
| 3 | wangwu | wangjianlin | ruhua | 29 |
| 4 | mayun | mayongzhen | angelababy | 26 |
+-----+-----------+--------------+-------------+------+--+
List all keys in the map
0: jdbc:hive2://JD:10086/db_hive1> select id,name,map_keys(members) as cation from hive_map;
+-----+-----------+--------------------------------+--+
| id | name | cation |
+-----+-----------+--------------------------------+--+
| 1 | zhangsan | ["father","mother","brother"] |
| 2 | lisi | ["father","mother","brother"] |
| 3 | wangwu | ["father","mother","sister"] |
| 4 | mayun | ["father","mother"] |
+-----+-----------+--------------------------------+--+
List all values in the map
0: jdbc:hive2://JD:10086/db_hive1> select id,name,map_values(members) as cation from hive_map;
+-----+-----------+-------------------------------------+--+
| id | name | cation |
+-----+-----------+-------------------------------------+--+
| 1 | zhangsan | ["xiaoming","xiaohuang","xiaoxu"] |
| 2 | lisi | ["mayun","huangyi","guanyu"] |
| 3 | wangwu | ["wangjianlin","ruhua","jingtian"] |
| 4 | mayun | ["mayongzhen","angelababy"] |
+-----+-----------+-------------------------------------+--+
Find everyone who has a brother, and who that brother is
Approach: use array_contains(map_keys(members),'brother') to find the people who have a brother, then use members['brother'] to get the brother's name
Plain query
0: jdbc:hive2://JD:10086/db_hive1> select id,name,members['brother'] from hive_map where array_contains(map_keys(members),'brother');
+-----+-----------+---------+--+
| id | name | _c2 |
+-----+-----------+---------+--+
| 1 | zhangsan | xiaoxu |
| 2 | lisi | guanyu |
+-----+-----------+---------+--+
Subquery version
0: jdbc:hive2://JD:10086/db_hive1> select a.id,a.name,a.members['brother'] as brother from (select * from hive_map where array_contains(map_keys(members),'brother')) as a;
+-------+-----------+----------+--+
| a.id | a.name | brother |
+-------+-----------+----------+--+
| 1 | zhangsan | xiaoxu |
| 2 | lisi | guanyu |
+-------+-----------+----------+--+
struct type
struct: STRUCT<col_name : data_type [COMMENT col_comment], …> — the fields may have different types
Sample data:
[root@JD data]# cat hive_struct.txt
192.168.1.1#zhangsan:40
192.168.1.2#lisi:50
192.168.1.3#wangwu:60
192.168.1.4#zhaoliu:70
Create the table
create table hive_struct(
ip string, info struct<name:string,age:int>
)
row format delimited
fields terminated by '#'
COLLECTION ITEMS TERMINATED BY ':'
;
Load the data
0: jdbc:hive2://JD:10086/db_hive1> load data local inpath '/home/hadoop/data/hive_struct.txt' into table hive_struct;
Query
0: jdbc:hive2://JD:10086/db_hive1> select * from hive_struct;
+-----------------+-------------------------------+--+
| hive_struct.ip | hive_struct.info |
+-----------------+-------------------------------+--+
| 192.168.1.1 | {"name":"zhangsan","age":40} |
| 192.168.1.2 | {"name":"lisi","age":50} |
| 192.168.1.3 | {"name":"wangwu","age":60} |
| 192.168.1.4 | {"name":"zhaoliu","age":70} |
+-----------------+-------------------------------+--+
Query the struct fields
0: jdbc:hive2://JD:10086/db_hive1> select ip,info.name,info.age from hive_struct;
+--------------+-----------+------+--+
| ip | name | age |
+--------------+-----------+------+--+
| 192.168.1.1 | zhangsan | 40 |
| 192.168.1.2 | lisi | 50 |
| 192.168.1.3 | wangwu | 60 |
| 192.168.1.4 | zhaoliu | 70 |
+--------------+-----------+------+--+
4 rows selected (0.076 seconds)
Clickstream example (group by, join, count)
Two tables, click_log and ad_list, with the following data
[hadoop@JD data]$ cat ad_list.txt
ad_101 http://www.google.com catalog8|catalog1
ad_102 http://www.sohu.com catalog6|catalog3
ad_103 http://www.baidu.com catalog7
ad_104 http://www.qq.com catalog5|catalog1|catalog4|catalog9
ad_105 http://sina.com
[hadoop@JD data]$ cat click_log.txt
11 ad_101 2014-05-01 06:01:12.334+01
22 ad_102 2014-05-01 07:28:12.342+01
33 ad_103 2014-05-01 07:50:12.33+01
11 ad_104 2014-05-01 09:27:12.33+01
22 ad_103 2014-05-01 09:03:12.324+01
33 ad_102 2014-05-02 19:10:12.343+01
11 ad_101 2014-05-02 09:07:12.344+01
35 ad_105 2014-05-03 11:07:12.339+01
22 ad_104 2014-05-03 12:59:12.743+01
77 ad_103 2014-05-03 18:04:12.355+01
99 ad_102 2014-05-04 00:36:39.713+01
33 ad_101 2014-05-04 19:10:12.343+01
11 ad_101 2014-05-05 09:07:12.344+01
35 ad_102 2014-05-05 11:07:12.339+01
22 ad_103 2014-05-05 12:59:12.743+01
77 ad_104 2014-05-05 18:04:12.355+01
99 ad_105 2014-05-05 20:36:39.713+01
Create the tables
0: jdbc:hive2://JD:10086/db_hive1> create table ad_list(ad_id string,url string,catalogs string)row format delimited fields terminated by '\t';
0: jdbc:hive2://JD:10086/db_hive1> create table click_log(cookie_id string,ad_id string,time string)row format delimited fields terminated by '\t';
Load the data
0: jdbc:hive2://JD:10086/db_hive1> load data local inpath '/home/hadoop/data/ad_list.txt' into table ad_list;
0: jdbc:hive2://JD:10086/db_hive1> load data local inpath '/home/hadoop/data/click_log.txt' into table click_log;
Query
0: jdbc:hive2://JD:10086/db_hive1> select * from click_log;
+----------------------+------------------+-----------------------------+--+
| click_log.cookie_id | click_log.ad_id | click_log.time |
+----------------------+------------------+-----------------------------+--+
| 11 | ad_101 | 2014-05-01 06:01:12.334+01 |
| 22 | ad_102 | 2014-05-01 07:28:12.342+01 |
| 33 | ad_103 | 2014-05-01 07:50:12.33+01 |
| 11 | ad_104 | 2014-05-01 09:27:12.33+01 |
| 22 | ad_103 | 2014-05-01 09:03:12.324+01 |
| 33 | ad_102 | 2014-05-02 19:10:12.343+01 |
| 11 | ad_101 | 2014-05-02 09:07:12.344+01 |
| 35 | ad_105 | 2014-05-03 11:07:12.339+01 |
| 22 | ad_104 | 2014-05-03 12:59:12.743+01 |
| 77 | ad_103 | 2014-05-03 18:04:12.355+01 |
| 99 | ad_102 | 2014-05-04 00:36:39.713+01 |
| 33 | ad_101 | 2014-05-04 19:10:12.343+01 |
| 11 | ad_101 | 2014-05-05 09:07:12.344+01 |
| 35 | ad_102 | 2014-05-05 11:07:12.339+01 |
| 22 | ad_103 | 2014-05-05 12:59:12.743+01 |
| 77 | ad_104 | 2014-05-05 18:04:12.355+01 |
| 99 | ad_105 | 2014-05-05 20:36:39.713+01 |
+----------------------+------------------+-----------------------------+--+
0: jdbc:hive2://JD:10086/db_hive1> select * from ad_list;
+----------------+------------------------+--------------------------------------+--+
| ad_list.ad_id | ad_list.url | ad_list.catalogs |
+----------------+------------------------+--------------------------------------+--+
| ad_101 | http://www.google.com | catalog8|catalog1 |
| ad_102 | http://www.sohu.com | catalog6|catalog3 |
| ad_103 | http://www.baidu.com | catalog7 |
| ad_104 | http://www.qq.com | catalog5|catalog1|catalog4|catalog9 |
| ad_105 | http://sina.com | NULL |
+----------------+------------------------+--------------------------------------+--+
List the ad_ids each user clicked, grouped by user
0: jdbc:hive2://JD:10000/db_hive1> select cookie_id,collect_list(ad_id) from click_log group by cookie_id;
+------------+----------------------------------------+--+
| cookie_id | _c1 |
+------------+----------------------------------------+--+
| 11 | ["ad_101","ad_104","ad_101","ad_101"] |
| 22 | ["ad_102","ad_103","ad_104","ad_103"] |
| 33 | ["ad_103","ad_102","ad_101"] |
| 35 | ["ad_105","ad_102"] |
| 77 | ["ad_103","ad_104"] |
| 99 | ["ad_102","ad_105"] |
+------------+----------------------------------------+--+
List the ad_ids each user clicked, deduplicated
0: jdbc:hive2://JD:10000/db_hive1> select cookie_id,collect_set(ad_id) from click_log group by cookie_id;
+------------+-------------------------------+--+
| cookie_id | _c1 |
+------------+-------------------------------+--+
| 11 | ["ad_101","ad_104"] |
| 22 | ["ad_102","ad_103","ad_104"] |
| 33 | ["ad_103","ad_102","ad_101"] |
| 35 | ["ad_105","ad_102"] |
| 77 | ["ad_103","ad_104"] |
| 99 | ["ad_102","ad_105"] |
+------------+-------------------------------+--+
6 rows selected (21.369 seconds)
Count how many times each user clicked each ad_id
0: jdbc:hive2://JD:10000/db_hive1> select cookie_id,ad_id,count(1) from click_log group by cookie_id,ad_id;
+------------+---------+------+--+
| cookie_id | ad_id | _c2 |
+------------+---------+------+--+
| 11 | ad_101 | 3 |
| 11 | ad_104 | 1 |
| 22 | ad_102 | 1 |
| 22 | ad_103 | 2 |
| 22 | ad_104 | 1 |
| 33 | ad_101 | 1 |
| 33 | ad_102 | 1 |
| 33 | ad_103 | 1 |
| 35 | ad_102 | 1 |
| 35 | ad_105 | 1 |
| 77 | ad_103 | 1 |
| 77 | ad_104 | 1 |
| 99 | ad_102 | 1 |
| 99 | ad_105 | 1 |
+------------+---------+------+--+
Join the per-user counts to ad_list on ad_id to get each ad's catalogs
select
click.cookie_id,click.ad_id,click.amount,ad_list.catalogs
from
(select cookie_id, ad_id, count(1) amount from click_log group by cookie_id, ad_id) click
join ad_list
on ad_list.ad_id = click.ad_id;
+------------------+--------------+---------------+--------------------------------------+--+
| click.cookie_id | click.ad_id | click.amount | ad_list.catalogs |
+------------------+--------------+---------------+--------------------------------------+--+
| 11 | ad_101 | 3 | catalog8|catalog1 |
| 11 | ad_104 | 1 | catalog5|catalog1|catalog4|catalog9 |
| 22 | ad_102 | 1 | catalog6|catalog3 |
| 22 | ad_103 | 2 | catalog7 |
| 22 | ad_104 | 1 | catalog5|catalog1|catalog4|catalog9 |
| 33 | ad_101 | 1 | catalog8|catalog1 |
| 33 | ad_102 | 1 | catalog6|catalog3 |
| 33 | ad_103 | 1 | catalog7 |
| 35 | ad_102 | 1 | catalog6|catalog3 |
| 35 | ad_105 | 1 | NULL |
| 77 | ad_103 | 1 | catalog7 |
| 77 | ad_104 | 1 | catalog5|catalog1|catalog4|catalog9 |
| 99 | ad_102 | 1 | catalog6|catalog3 |
| 99 | ad_105 | 1 | NULL |
+------------------+--------------+---------------+--------------------------------------+--+
Exploding a column into rows
Original format: ad_101 catalog8|catalog1
Becomes: ad_101 catalog8
ad_101 catalog1
Syntax: lateral view outer explode(split(catalogs,'\\|')) t as catalog
0: jdbc:hive2://JD:10000/db_hive1> select * from ad_list;
+----------------+------------------------+--------------------------------------+--+
| ad_list.ad_id | ad_list.url | ad_list.catalogs |
+----------------+------------------------+--------------------------------------+--+
| ad_101 | http://www.google.com | catalog8|catalog1 |
| ad_102 | http://www.sohu.com | catalog6|catalog3 |
| ad_103 | http://www.baidu.com | catalog7 |
| ad_104 | http://www.qq.com | catalog5|catalog1|catalog4|catalog9 |
| ad_105 | http://sina.com | NULL |
+----------------+------------------------+--------------------------------------+--+
5 rows selected (0.12 seconds)
Explode to rows; outer works like a left join (rows whose catalogs is NULL are kept)
0: jdbc:hive2://JD:10000/db_hive1> select ad_id,catalog from ad_list lateral view outer explode(split(catalogs,'\\|')) t as catalog;
+---------+-----------+--+
| ad_id | catalog |
+---------+-----------+--+
| ad_101 | catalog8 |
| ad_101 | catalog1 |
| ad_102 | catalog6 |
| ad_102 | catalog3 |
| ad_103 | catalog7 |
| ad_104 | catalog5 |
| ad_104 | catalog1 |
| ad_104 | catalog4 |
| ad_104 | catalog9 |
| ad_105 | NULL |
+---------+-----------+--+
Explode to rows; dropping outer returns only the rows that produce a match
0: jdbc:hive2://JD:10000/db_hive1> select ad_id,catalog from ad_list lateral view explode(split(catalogs,'\\|')) t as catalog;
+---------+-----------+--+
| ad_id | catalog |
+---------+-----------+--+
| ad_101 | catalog8 |
| ad_101 | catalog1 |
| ad_102 | catalog6 |
| ad_102 | catalog3 |
| ad_103 | catalog7 |
| ad_104 | catalog5 |
| ad_104 | catalog1 |
| ad_104 | catalog4 |
| ad_104 | catalog9 |
+---------+-----------+--+
Functions
Usage
List all functions
0: jdbc:hive2://JD:10000/db_hive1> show functions;
If you don't know what a function does or how to use it, describe it
0: jdbc:hive2://JD:10000/db_hive1> desc function upper;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| upper(str) - Returns str with all characters changed to uppercase |
+----------------------------------------------------+--+
Show detailed usage with an example
0: jdbc:hive2://JD:10000/db_hive1> desc function extended upper;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| upper(str) - Returns str with all characters changed to uppercase |
| Synonyms: ucase |
| Example: |
| > SELECT upper('Facebook') FROM src LIMIT 1; |
| 'FACEBOOK' |
+----------------------------------------------------+--+
sort_array(column): sort an array
Sort the catalogs array in ascending order
The data:
0: jdbc:hive2://JD:10000/db_hive1> select * from ad_list_2;
+------------------+------------------------+------------------------------------------------+--+
| ad_list_2.ad_id | ad_list_2.url | ad_list_2.catalogs |
+------------------+------------------------+------------------------------------------------+--+
| ad_101 | http://www.google.com | ["catalog8","catalog1"] |
| ad_102 | http://www.sohu.com | ["catalog6","catalog3"] |
| ad_103 | http://www.baidu.com | ["catalog7"] |
| ad_104 | http://www.qq.com | ["catalog5","catalog1","catalog4","catalog9"] |
| ad_105 | http://sina.com | NULL |
+------------------+------------------------+------------------------------------------------+--+
Query
0: jdbc:hive2://JD:10000/db_hive1> select ad_id,sort_array(catalogs) from ad_list_2;
+---------+------------------------------------------------+--+
| ad_id | _c1 |
+---------+------------------------------------------------+--+
| ad_101 | ["catalog1","catalog8"] |
| ad_102 | ["catalog3","catalog6"] |
| ad_103 | ["catalog7"] |
| ad_104 | ["catalog1","catalog4","catalog5","catalog9"] |
| ad_105 | NULL |
+---------+------------------------------------------------+--+
Date types and type conversion
0: jdbc:hive2://JD:10000/db_hive1> create table dual(x string);
Insert one empty row
0: jdbc:hive2://JD:10000/db_hive1> insert into dual values('');
Query
0: jdbc:hive2://JD:10000/db_hive1> select * from dual;
+---------+--+
| dual.x |
+---------+--+
| |
+---------+--+
Get the current date
0: jdbc:hive2://JD:10000/db_hive1> select current_date from dual;
+-------------+--+
| _c0 |
+-------------+--+
| 2019-12-20 |
+-------------+--+
Get the full current timestamp
0: jdbc:hive2://JD:10000/db_hive1> select current_timestamp from dual;
+--------------------------+--+
| _c0 |
+--------------------------+--+
| 2019-12-20 14:02:37.041 |
+--------------------------+--+
Get the number of seconds since 1970 (the Unix epoch)
0: jdbc:hive2://JD:10000/db_hive1> select unix_timestamp() from dual;
+-------------+--+
| _c0 |
+-------------+--+
| 1576821909 |
+-------------+--+
Get the epoch seconds for a given time
0: jdbc:hive2://JD:10000/db_hive1> select unix_timestamp('2019-12-31 12:10:09') from dual;
+-------------+--+
| _c0 |
+-------------+--+
| 1577765409 |
+-------------+--+
Parse a time string in a given format into epoch seconds
0: jdbc:hive2://JD:10000/db_hive1> select unix_timestamp('20191231 12:10:09','yyyyMMdd HH:mm:ss') from dual;
+-------------+--+
| _c0 |
+-------------+--+
| 1577765409 |
+-------------+--+
Convert epoch seconds to a date string
0: jdbc:hive2://JD:10000/db_hive1> desc function extended from_unixtime;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| from_unixtime(unix_time, format) - returns unix_time in the specified format |
| Example: |
| > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss') FROM src LIMIT 1; |
| '1970-01-01 00:00:00' |
+----------------------------------------------------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT from_unixtime(9987659999, 'yyyy-MM-dd HH:mm:ss') FROM dual;
+----------------------+--+
| _c0 |
+----------------------+--+
| 2286-07-01 05:59:59 |
+----------------------+--+
Extract the date part of a timestamp
0: jdbc:hive2://JD:10000/db_hive1> desc function extended to_date;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| to_date(expr) - Extracts the date part of the date or datetime expression expr |
| Example: |
| > SELECT to_date('2009-07-30 04:17:52') FROM src LIMIT 1; |
| '2009-07-30' |
+----------------------------------------------------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT to_date('2009-07-30 04:17:52') FROM dual;
+-------------+--+
| _c0 |
+-------------+--+
| 2009-07-30 |
+-------------+--+
Extracting the year, month, day, hour, minute, and second
0: jdbc:hive2://JD:10000/db_hive1> desc function extended year;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| year(date) - Returns the year of date |
| date is a string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. |
| Example: |
| > SELECT year('2009-07-30') FROM src LIMIT 1; |
| 2009 |
+----------------------------------------------------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT year('2009-07-30') FROM dual;
+-------+--+
| _c0 |
+-------+--+
| 2009 |
+-------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT month('2009-07-30 18:30:05') FROM dual;
+------+--+
| _c0 |
+------+--+
| 7 |
+------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT day('2009-07-30 18:30:05') FROM dual;
+------+--+
| _c0 |
+------+--+
|  30  |
+------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT hour('2009-07-30 18:30:05') FROM dual;
+------+--+
| _c0 |
+------+--+
|  18  |
+------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT minute('2009-07-30 18:30:05') FROM dual;
+------+--+
| _c0 |
+------+--+
|  30  |
+------+--+
0: jdbc:hive2://JD:10000/db_hive1> SELECT second('2009-07-30 18:30:05') FROM dual;
+------+--+
| _c0 |
+------+--+
|  5   |
+------+--+
Date arithmetic functions
0: jdbc:hive2://JD:10000/db_hive1> desc function extended date_add;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| date_add(start_date, num_days) - Returns the date that is num_days after start_date. |
| start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored. |
| Example: |
| > SELECT date_add('2009-07-30', 1) FROM src LIMIT 1; |
| '2009-07-31' |
+----------------------------------------------------+--+
0: jdbc:hive2://JD:10000/db_hive1> desc function extended date_sub;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| date_sub(start_date, num_days) - Returns the date that is num_days before start_date. |
| start_date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. num_days is a number. The time part of start_date is ignored. |
| Example: |
| > SELECT date_sub('2009-07-30', 1) FROM src LIMIT 1; |
| '2009-07-29' |
+----------------------------------------------------+--+
The type conversion function cast(value as type): convert a string to int
0: jdbc:hive2://JD:10000/db_hive1> select cast('10' as int) from dual;
+------+--+
| _c0 |
+------+--+
| 10 |
+------+--+
Convert a string to a date
0: jdbc:hive2://JD:10000/db_hive1> select cast('2016-12-30' as date) from dual;
+-------------+--+
| _c0 |
+-------------+--+
| 2016-12-30 |
+-------------+--+
A cast can fail: some conversions are impossible, and a value that cannot be converted becomes NULL
The data:
0: jdbc:hive2://JD:10000/db_hive1> select * from emp_test1;
+------------------+------------------+----------------+----------------+---------------------+----------------+-----------------+-------------------+--+
| emp_test1.empno | emp_test1.ename | emp_test1.job | emp_test1.mgr | emp_test1.hiredate | emp_test1.sal | emp_test1.comm | emp_test1.deptno |
+------------------+------------------+----------------+----------------+---------------------+----------------+-----------------+-------------------+--+
| 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.0 | NULL | 20 |
| 7499 | ALLEN | SALESMAN | 7698 | 1981-2-20 | 1600.0 | 300.0 | 30 |
| 7521 | WARD | SALESMAN | 7698 | 1981-2-22 | 1250.0 | 500.0 | 30 |
| 7566 | JONES | MANAGER | 7839 | 1981-4-2 | 2975.0 | NULL | 20 |
| 7654 | MARTIN | SALESMAN | 7698 | 1981-9-28 | 1250.0 | 1400.0 | 30 |
| 7698 | BLAKE | MANAGER | 7839 | 1981-5-1 | 2850.0 | NULL | 30 |
| 7782 | CLARK | MANAGER | 7839 | 1981-6-9 | 2450.0 | NULL | 10 |
| 7788 | SCOTT | ANALYST | 7566 | 1987-4-19 | 3000.0 | NULL | 20 |
| 7839 | KING | PRESIDENT | NULL | 1981-11-17 | 5000.0 | NULL | 10 |
| 7844 | TURNER | SALESMAN | 7698 | 1981-9-8 | 1500.0 | 0.0 | 30 |
| 7876 | ADAMS | CLERK | 7788 | 1987-5-23 | 1100.0 | NULL | 20 |
| 7900 | JAMES | CLERK | 7698 | 1981-12-3 | 950.0 | NULL | 30 |
| 7902 | FORD | ANALYST | 7566 | 1981-12-3 | 3000.0 | NULL | 20 |
| 7934 | MILLER | CLERK | 7782 | 1982-1-23 | 1300.0 | NULL | 10 |
| 8888 | HIVE | PROGRAM | 7839 | 1988-1-23 | 10300.0 | NULL | NULL |
+------------------+------------------+----------------+----------------+---------------------+----------------+-----------------+-------------------+--+
Applying the cast function
0: jdbc:hive2://JD:10000/db_hive1> select empno,comm,cast(comm as int) from emp_test1;
+--------+---------+-------+--+
| empno | comm | _c2 |
+--------+---------+-------+--+
| 7369 | NULL | NULL |
| 7499 | 300.0 | 300 |
| 7521 | 500.0 | 500 |
| 7566 | NULL | NULL |
| 7654 | 1400.0 | 1400 |
| 7698 | NULL | NULL |
| 7782 | NULL | NULL |
| 7788 | NULL | NULL |
| 7839 | NULL | NULL |
| 7844 | 0.0 | 0 |
| 7876 | NULL | NULL |
| 7900 | NULL | NULL |
| 7902 | NULL | NULL |
| 7934 | NULL | NULL |
| 8888 | NULL | NULL |
+--------+---------+-------+--+
There is one special type, binary: binary can only be cast to string, so if an int is required you must convert binary → string → int
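A minimal sketch of the two-step conversion (the table some_table and the binary column b here are hypothetical):

```sql
-- binary -> string -> int; a direct cast(b as int) is not supported
select cast(cast(b as string) as int) from some_table;
```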
Math functions
Round to a given number of decimal places
0: jdbc:hive2://JD:10000/db_hive1> desc function extended round;
+-------------------------------------------------+--+
| tab_name |
+-------------------------------------------------+--+
| round(x[, d]) - round x to d decimal places |
| Example: |
| > SELECT round(12.3456, 1) FROM src LIMIT 1; |
| 12.3' |
+-------------------------------------------------+--+
ceil: returns the smallest integer greater than or equal to the input
0: jdbc:hive2://JD:10000/db_hive1> desc function extended ceil;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| ceil(x) - Find the smallest integer not smaller than x |
| Synonyms: ceiling |
| Example: |
| > SELECT ceil(-0.1) FROM src LIMIT 1; |
| 0 |
| > SELECT ceil(5) FROM src LIMIT 1; |
| 5 |
+----------------------------------------------------+--+
floor: returns the largest integer less than or equal to the input
0: jdbc:hive2://JD:10000/db_hive1> desc function extended floor;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| floor(x) - Find the largest integer not greater than x |
| Example: |
| > SELECT floor(-0.1) FROM src LIMIT 1; |
| -1 |
| > SELECT floor(5) FROM src LIMIT 1; |
| 5 |
+----------------------------------------------------+--+
abs: absolute value
0: jdbc:hive2://JD:10000/db_hive1> desc function extended abs;
+-------------------------------------------+--+
| tab_name |
+-------------------------------------------+--+
| abs(x) - returns the absolute value of x |
| Example: |
| > SELECT abs(0) FROM src LIMIT 1; |
| 0 |
| > SELECT abs(-5) FROM src LIMIT 1; |
| 5 |
+-------------------------------------------+--+
least: returns the smallest of the given values
0: jdbc:hive2://JD:10000/db_hive1> desc function extended least;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| least(v1, v2, ...) - Returns the least value in a list of values |
| Example: |
| > SELECT least(2, 3, 1) FROM src LIMIT 1; |
| 1 |
+----------------------------------------------------+--+
greatest: returns the largest of the given values
0: jdbc:hive2://JD:10000/db_hive1> desc function extended greatest;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| greatest(v1, v2, ...) - Returns the greatest value in a list of values |
| Example: |
| > SELECT greatest(2, 3, 1) FROM src LIMIT 1; |
| 3 |
+----------------------------------------------------+--+
String functions
Substring extraction (positions start at 1)
substr and substring behave identically
0: jdbc:hive2://JD:10000/db_hive1> desc function extended substr;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| substr(str, pos[, len]) - returns the substring of str that starts at pos and is of length len orsubstr(bin, pos[, len]) - returns the slice of byte array that starts at pos and is of length len |
| Synonyms: substring |
| pos is a 1-based index. If pos<0 the starting position is determined by counting backwards from the end of str. |
| Example: |
| > SELECT substr('Facebook', 5) FROM src LIMIT 1; |
| 'book' |
| > SELECT substr('Facebook', -5) FROM src LIMIT 1; |
| 'ebook' |
| > SELECT substr('Facebook', 5, 1) FROM src LIMIT 1; |
| 'b' |
+----------------------------------------------------+--+
Concatenate strings
0: jdbc:hive2://JD:10000/db_hive1> desc function extended concat;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data bin1, bin2, ... binN |
| Returns NULL if any argument is NULL. |
| Example: |
| > SELECT concat('abc', 'def') FROM src LIMIT 1; |
| 'abcdef' |
+----------------------------------------------------+--+
Join strings with a separator
0: jdbc:hive2://JD:10000/db_hive1> desc function extended concat_ws;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator. |
| Example: |
| > SELECT concat_ws('.', 'www', array('facebook', 'com')) FROM src LIMIT 1; |
| 'www.facebook.com' |
+----------------------------------------------------+--+
Length of a string, or number of bytes in binary data
0: jdbc:hive2://JD:10000/db_hive1> desc function extended length;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| length(str | binary) - Returns the length of str or number of bytes in binary data |
| Example: |
| > SELECT length('Facebook') FROM src LIMIT 1; |
| 8 |
+----------------------------------------------------+--+
Split a string by a regex, returning an array
0: jdbc:hive2://JD:10000/db_hive1> desc function extended split;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| split(str, regex) - Splits str around occurances that match regex |
| Example: |
| > SELECT split('oneAtwoBthreeC', '[ABC]') FROM src LIMIT 1; |
| ["one", "two", "three"] |
+----------------------------------------------------+--+
Sometimes the delimiter must be escaped; splitting on . for example requires escaping
0: jdbc:hive2://JD:10000/db_hive1> SELECT split('192.168.1.2', '\\.') FROM dual;
+------------------------+--+
| _c0 |
+------------------------+--+
| ["192","168","1","2"] |
+------------------------+--+
Functions can be nested, for example
0: jdbc:hive2://JD:10000/db_hive1> select upper(substr('abcdeg',2,3)) from dual;
+------+--+
| _c0 |
+------+--+
| BCD |
+------+--+
Summary of collection operations
array_contains — returns a boolean; used in WHERE clauses
sort_array — returns the sorted array
size — returns a number
map_keys — returns an array of keys
map_values — returns an array of values
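Several of these can be combined in one query; a sketch assuming the hive_array table created earlier:

```sql
select name,
       sort_array(work_locations)                as sorted_locations,  -- sorted copy of the array
       size(work_locations)                      as num_locations,     -- element count
       array_contains(work_locations, 'beijing') as visits_beijing     -- boolean membership test
from hive_array;
```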
JSON functions
The data:
[root@JD data]# cat regions.txt
{"movie":"2321","rate":"3","time":"9908886622","userid":"1"}
{"movie":"2682","rate":"4","time":"9902226622","userid":"1"}
{"movie":"2333","rate":"2","time":"9808886622","userid":"1"}
{"movie":"2222","rate":"1","time":"8908886622","userid":"1"}
Create the table
0: jdbc:hive2://JD:10000/db_hive1> create table rating_json(json string);
Load the data
0: jdbc:hive2://JD:10000/db_hive1> load data local inpath '/home/hadoop/data/regions.txt' into table rating_json;
Inspect the data
0: jdbc:hive2://JD:10000/db_hive1> select * from rating_json;
+----------------------------------------------------+--+
| rating_json.json |
+----------------------------------------------------+--+
| {"movie":"2321","rate":"3","time":"9908886622","userid":"1"} |
| {"movie":"2682","rate":"4","time":"9902226622","userid":"1"} |
| {"movie":"2333","rate":"2","time":"9808886622","userid":"1"} |
| {"movie":"2222","rate":"1","time":"8908886622","userid":"1"} |
+----------------------------------------------------+--+
The json_tuple function
0: jdbc:hive2://JD:10000/db_hive1> desc function extended json_tuple;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| json_tuple(jsonStr, p1, p2, ..., pn) - like get_json_object, but it takes multiple names and return a tuple. All the input parameters and output column types are string. |
+----------------------------------------------------+--+
Pass in the JSON string column and the field names to extract:
0: jdbc:hive2://JD:10000/db_hive1> select json_tuple(json,'movie','rate','time','userid') as (movie,rate,time,userid) from rating_json;
+--------+-------+-------------+---------+--+
| movie | rate | time | userid |
+--------+-------+-------------+---------+--+
| 2321 | 3 | 9908886622 | 1 |
| 2682 | 4 | 9902226622 | 1 |
| 2333 | 2 | 9808886622 | 1 |
| 2222 | 1 | 8908886622 | 1 |
+--------+-------+-------------+---------+--+
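As a mental model, json_tuple parses each row as JSON and returns one string column per requested key. A rough Python equivalent, using two rows of the sample data above:

```python
import json

lines = [
    '{"movie":"2321","rate":"3","time":"9908886622","userid":"1"}',
    '{"movie":"2682","rate":"4","time":"9902226622","userid":"1"}',
]

for line in lines:
    obj = json.loads(line)
    # json_tuple yields one string value per requested key, in order
    movie, rate, time_, userid = (obj.get(k) for k in ("movie", "rate", "time", "userid"))
    print(movie, rate, time_, userid)
```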
Extracting the year fails at first: from_unixtime() requires a bigint argument, but json_tuple returns strings, so cast() is needed for the conversion
0: jdbc:hive2://JD:10000/db_hive1> select year(from_unixtime(time)) as year from (select json_tuple(json, 'movie','rate','time','userid') as (movie, rate,time, userid) from rating_json) t;
Error: Error while compiling statement: FAILED: SemanticException [Error 10014]: Line 1:12 Wrong arguments 'time': No matching method for class org.apache.hadoop.hive.ql.udf.UDFFromUnixTime with (string). Possible choices: _FUNC_(bigint) _FUNC_(bigint, string) _FUNC_(int) _FUNC_(int, string) (state=42000,code=10014)
Final SQL:
select userid,movie,rate,time,
from_unixtime(cast(time as bigint)) as ts,
year(from_unixtime(cast(time as bigint))) as year,
month(from_unixtime(cast(time as bigint))) as month,
day(from_unixtime(cast(time as bigint))) as day,
hour(from_unixtime(cast(time as bigint))) as hour,
minute(from_unixtime(cast(time as bigint))) as minute,
second(from_unixtime(cast(time as bigint))) as second
from (select json_tuple(json, 'movie','rate','time','userid') as (movie, rate,time, userid) from rating_json ) t limit 10;
+---------+--------+-------+-------------+----------------------+-------+--------+------+-------+---------+---------+--+
| userid | movie | rate | time | ts | year | month | day | hour | minute | second |
+---------+--------+-------+-------------+----------------------+-------+--------+------+-------+---------+---------+--+
| 1 | 2321 | 3 | 9908886622 | 2284-01-01 12:30:22 | 2284 | 1 | 1 | 12 | 30 | 22 |
| 1 | 2682 | 4 | 9902226622 | 2283-10-16 10:30:22 | 2283 | 10 | 16 | 10 | 30 | 22 |
| 1 | 2333 | 2 | 9808886622 | 2280-10-31 02:43:42 | 2280 | 10 | 31 | 2 | 43 | 42 |
| 1 | 2222 | 1 | 8908886622 | 2252-04-24 10:43:42 | 2252 | 4 | 24 | 10 | 43 | 42 |
+---------+--------+-------+-------------+----------------------+-------+--------+------+-------+---------+---------+--+
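from_unixtime interprets the value as seconds since the Unix epoch, rendered in the session's local time zone. The same conversion in Python, shown in UTC for the first sample timestamp (the 12:30:22 in the Hive output above is consistent with a UTC+8 session):

```python
from datetime import datetime, timezone

ts = int("9908886622")                 # equivalent of cast(time as bigint)
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt.year, dt.month, dt.day)       # 2284 1 1
print(dt.hour, dt.minute, dt.second)   # 4 30 22 (UTC; +8h gives 12:30:22)
```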
URL-parsing functions
parse_url_tuple splits a URL into host, path, and query, and can also extract individual values from the query string
0: jdbc:hive2://JD:10000/db_hive1> desc function extended parse_url_tuple;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| parse_url_tuple(url, partname1, partname2, ..., partnameN) - extracts N (N>=1) parts from a URL. |
| It takes a URL and one or multiple partnames, and returns a tuple. All the input parameters and output column types are string. |
| Partname: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY_NAME> |
| Note: Partnames are case-sensitive, and should not contain unnecessary white spaces. |
| Example: |
| > SELECT b.* FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id LIMIT 1; |
| > SELECT parse_url_tuple(a.fullurl, 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'FILE', 'AUTHORITY', 'USERINFO', 'QUERY:k1') as (ho, pa, qu, re, pr, fi, au, us, qk1) from src a; |
+----------------------------------------------------+--+
Parse the host, path, query, and a query-string value:
0: jdbc:hive2://JD:10000/db_hive1> select parse_url_tuple('http://www.bigdata.com/bigdata/hive?cookie_id=10&a=b','HOST','PATH','QUERY','QUERY:cookie_id') from dual;
+------------------+----------------+-------------------+-----+--+
| c0 | c1 | c2 | c3 |
+------------------+----------------+-------------------+-----+--+
| www.bigdata.com | /bigdata/hive | cookie_id=10&a=b | 10 |
+------------------+----------------+-------------------+-----+--+
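The part names map directly onto standard URL components. A Python sketch of the same decomposition, using the sample URL above:

```python
from urllib.parse import urlparse, parse_qs

url = "http://www.bigdata.com/bigdata/hive?cookie_id=10&a=b"
parts = urlparse(url)
print(parts.hostname)                      # www.bigdata.com   (HOST)
print(parts.path)                          # /bigdata/hive     (PATH)
print(parts.query)                         # cookie_id=10&a=b  (QUERY)
print(parse_qs(parts.query)["cookie_id"])  # ['10']            (QUERY:cookie_id)
```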
Function to test for NULL
0: jdbc:hive2://JD:10000/db_hive1> desc function extended isnull;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| isnull a - Returns true if a is NULL and false otherwise |
+----------------------------------------------------+--+
Function to test for NOT NULL
0: jdbc:hive2://JD:10000/db_hive1> desc function extended isnotnull;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| isnotnull a - Returns true if a is not NULL and false otherwise |
+----------------------------------------------------+--+
Query:
0: jdbc:hive2://JD:10000/db_hive1> select empno,comm,isnull(comm) as is_null,isnotnull(comm) as is_not_null from emp_test1;
+--------+---------+----------+--------------+--+
| empno | comm | is_null | is_not_null |
+--------+---------+----------+--------------+--+
| 7369 | NULL | true | false |
| 7499 | 300.0 | false | true |
| 7521 | 500.0 | false | true |
| 7566 | NULL | true | false |
| 7654 | 1400.0 | false | true |
| 7698 | NULL | true | false |
| 7782 | NULL | true | false |
| 7788 | NULL | true | false |
| 7839 | NULL | true | false |
| 7844 | 0.0 | false | true |
| 7876 | NULL | true | false |
| 7900 | NULL | true | false |
| 7902 | NULL | true | false |
| 7934 | NULL | true | false |
| 8888 | NULL | true | false |
+--------+---------+----------+--------------+--+
15 rows selected (0.071 seconds)
The assert_true function (throws an exception if the condition is not true)
0: jdbc:hive2://JD:10000/db_hive1> desc function extended assert_true;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| assert_true(condition) - Throw an exception if 'condition' is not true. |
| Example: |
| > SELECT assert_true(x >= 0) FROM src LIMIT 1; |
| NULL |
+----------------------------------------------------+--+
The elt function
n=1 returns the first string, n=2 returns the second, and so on
0: jdbc:hive2://JD:10000/db_hive1> desc function extended elt;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| elt(n, str1, str2, ...) - returns the n-th string |
| Example: |
| > SELECT elt(1, 'face', 'book') FROM src LIMIT 1; |
| 'face' |
+----------------------------------------------------+--+
4 rows selected (0.027 seconds)
The nvl function
Returns a default value when the input is null
0: jdbc:hive2://JD:10000/db_hive1> desc function extended nvl;
+----------------------------------------------------+--+
| tab_name |
+----------------------------------------------------+--+
| nvl(value,default_value) - Returns default value if value is null else returns value |
| Example: |
| > SELECT nvl(null,'bla') FROM src LIMIT 1; |
| bla |
+----------------------------------------------------+--+
0: jdbc:hive2://JD:10000/db_hive1> select ename,comm,nvl(comm,0) from emp_test1;
+---------+---------+---------+--+
| ename | comm | _c2 |
+---------+---------+---------+--+
| SMITH | NULL | 0.0 |
| ALLEN | 300.0 | 300.0 |
| WARD | 500.0 | 500.0 |
| JONES | NULL | 0.0 |
| MARTIN | 1400.0 | 1400.0 |
| BLAKE | NULL | 0.0 |
| CLARK | NULL | 0.0 |
| SCOTT | NULL | 0.0 |
| KING | NULL | 0.0 |
| TURNER | 0.0 | 0.0 |
| ADAMS | NULL | 0.0 |
| JAMES | NULL | 0.0 |
| FORD | NULL | 0.0 |
| MILLER | NULL | 0.0 |
| HIVE | NULL | 0.0 |
+---------+---------+---------+--+
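nvl is the classic null-coalescing pattern. In Python the same idea is an explicit None check; note that a shortcut like `comm or 0` would also replace a legitimate 0.0 (TURNER's row above) with the default, so the comparison should be against None specifically:

```python
def nvl(value, default):
    """Return default if value is None, else value (like Hive's nvl)."""
    return default if value is None else value

# A few rows mirroring the emp_test1 output above
rows = [("SMITH", None), ("ALLEN", 300.0), ("TURNER", 0.0)]
for ename, comm in rows:
    print(ename, nvl(comm, 0))
```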