HiveQL or Trino (Presto): Queries

三生暮雨渡瀟瀟

Last modified 2022-05-16 21:27:23

First published 2022-05-14 15:49:47

Article link: https://blog.csdn.net/weixin_42771366/article/details/124769128

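I work with big data and have used Hive, Impala, and Trino, with Hive and Trino by far the most. I have collected the notes below. It is a bit long, but it is worth reading to the end.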

Note: in each example the upper statement is Hive and the lower one is Trino; the explanatory text describes Hive.

1. SELECT ... FROM statements:

select name,salary from hive.presto.employees;
select e.name,e.salary from hive.presto.employees e;

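When a selected column is a collection type, Hive renders the output using JSON syntax. Note that string elements of a collection are quoted, whereas values of a column with the primitive string type are not.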

----trino (same in Hive):
select e.name,e.salary,e.subordinates from hive.presto.employees e;
---------------------------------------------------
  name	  salary	subordinates
John Doe  100,000	['Mary Smith','Todd Jonhes']

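The deductions column is a map. Hive renders a map in JSON form, that is, as a comma-separated list of key:value pairs enclosed in {...}; Trino prints the pairs as key=value: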

select e.name, e.deductions from hive.presto.employees e;
--trino:
-------------------------------------
name	         deductions
John Doe	{Insurance=0.1, Federal Taxes=0.2, State Taxes=0.05}

The address column is a struct (a row in Trino), which is also rendered in JSON map form:

select e.name, e.address from hive.presto.employees e;
--trino:
select e.name, e.address from hive.presto.employees e;
-----------------------------------------
name	        address
John Doe	{street=1 Michigan Ave., city=Chicago, state=IL, zip=60600}

Array indexes in Hive are 0-based, as in Java, whereas Trino arrays are 1-based. Here is a query that selects the first element of the subordinates array:

select name,subordinates[0] from hive.presto.employees;
---trino:
select name,subordinates[1] from hive.presto.employees;

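Note that in Hive, referencing a nonexistent element returns NULL. (In Trino the [] subscript raises an error when the index is out of bounds; element_at returns NULL instead.) Also, an extracted value of string type is no longer quoted!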
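
A short Trino sketch of safe element access. It uses a literal array so that it runs without the employees table; the values 'a', 'b', 'c' are made up for illustration.

```sql
-- Trino: array positions start at 1
select element_at(array['a','b','c'], 1);   -- a
-- an index past the end yields NULL with element_at (the [] subscript would raise an error)
select element_at(array['a','b','c'], 5);   -- NULL
-- a negative index counts from the end
select element_at(array['a','b','c'], -1);  -- c
```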

To reference a map element, use the same [...] syntax as for arrays, but with a key instead of an integer index:

select name,deductions["State Taxes"] from hive.presto.employees;
--trino:
select name,deductions['State Taxes'] from hive.presto.employees;
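
One thing to watch in Trino: as far as I know, the [] subscript on a map raises an error when the key is missing, while element_at returns NULL. A minimal sketch, assuming a hand-built map (the keys and values here are illustrative, not read from the employees table):

```sql
-- Trino: build a small map from a key array and a value array
with t as (
  select map(array['Insurance','State Taxes'], array[0.1, 0.05]) as deductions
)
select
  element_at(deductions, 'State Taxes')   as state_taxes,    -- 0.05
  element_at(deductions, 'Federal Taxes') as federal_taxes   -- NULL, key not in the map
from t;
```

Note also that Trino string literals need single quotes; double quotes denote identifiers there.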

Finally, to reference a field of a struct, use dot notation:

select name,address.city from hive.presto.employees;
--trino:
select name,address.city from hive.presto.employees;

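2. Arithmetic operators:

In an operation on an int and a bigint, the int is converted to bigint. In an operation on an int and a float, the int is promoted to float.

When doing arithmetic, watch out for overflow and underflow. Multiplication and division are the most likely to trigger them.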
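
A minimal sketch of the overflow concern, with literals chosen purely for illustration. It assumes default settings; how an overflow surfaces (silent wrap-around versus an error) depends on the engine, so it is worth checking on your own version.

```sql
-- int * int is evaluated as int, so a product past 2147483647 does not fit
select 2147483647 * 2;
-- widen one operand to bigint first so the product fits (works in Hive and Trino)
select cast(2147483647 as bigint) * 2;   -- 4294967294
-- division differs between engines: Hive's / returns a double,
-- Trino's / on two integers is integer division
select 7 / 2;                            -- Hive: 3.5, Trino: 3
-- Trino: cast one side to double to get a fractional result
select cast(7 as double) / 2;            -- 3.5
```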

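3. Using functions:

In Hive, floor and ceil ("round up") take a double and return a bigint, that is, they convert a floating-point number to an integer; round takes a double and returns a double. In Trino, all three return the same type as their input.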
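
In Trino these functions keep the type of their input, so an explicit cast is needed when an integer result is wanted. A small sketch with illustrative literals:

```sql
-- Trino: floor/ceil/round return the same type they were given
select typeof(floor(double '3.7'));          -- double
-- cast explicitly when a bigint is required
select cast(floor(double '3.7') as bigint);  -- 3
select cast(ceil(double '3.2') as bigint);   -- 4
select cast(round(double '3.5') as bigint);  -- 4
```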

4. Date functions:

1. to_date:
select to_date('1970-01-01 00:00:00');
---trino: use date
select date(substr('1970-01-01 00:00:00',1,10));
select date(format_datetime(cast('1970-01-01 00:00:00' as timestamp),'yyyy-MM-dd'));
2. year: returns the year from a time string as an int
select year('1970-01-01 00:00:00');
---trino: year(date) , year(interval year to month) , year(timestamp(p)) , year(timestamp(p) with time zone)
select year(cast('1970-01-01 00:00:00' as timestamp));
3. month: returns the month from a time string as an int
select month("1970-11-01 00:00:00");
---trino: month(date) , month(interval year to month) , month(timestamp(p)) , month(timestamp(p) with time zone)
select month(cast('1970-11-01 00:00:00' as timestamp));
4. day: returns the day of the month from a time string as an int
select day("1970-11-12 00:00:00");
---trino: day(date) , day(interval day to second) , day(timestamp(p)) , day(timestamp(p) with time zone)
select day(cast('1970-11-12 00:00:00' as timestamp));
5. hour: returns the hour from a timestamp string as an int:
select hour('2009-07-30 12:58:59');
select hour('12:58:59');
----trino: hour(interval day to second) , hour(timestamp(p)) , hour(timestamp(p) with time zone) , hour(time(p)) , hour(time(p) with time zone)
select hour(cast('2009-07-30 12:58:59' as timestamp));
select hour(cast('12:58:59' as time));
6. minute: returns the minutes from a time string
7. second: returns the seconds from a time string
8. datediff: returns the number of days between two dates (Hive takes the end date first, then the start date)
select datediff('2022-05-25','2022-01-01');
---trino: date_diff(varchar(x), date, date) , date_diff(varchar(x), timestamp(p), timestamp(p)) , date_diff(varchar(x), timestamp(p) with time zone, timestamp(p) with time zone) , date_diff(varchar(x), time(p), time(p)) , date_diff(varchar(x), time(p) with time zone, time(p) with time zone)
select date_diff('day',date'2022-01-01',date'2022-05-25');
--------trino: difference in months or years
select date_diff('month',date'2022-01-01',date'2022-05-25');
select date_diff('year',date'2022-01-01',date'2025-05-25');
9. date_add: adds days
select date_add('2022-05-23',1);
---trino: date_add(varchar(x), bigint, date) , date_add(varchar(x), bigint, timestamp(p)) , date_add(varchar(x), bigint, timestamp(p) with time zone) , date_add(varchar(x), bigint, time(p)) , date_add(varchar(x), bigint, time(p) with time zone)
select date_add('day',1,current_date);
10. date_sub(): subtracts days
select date_sub('2022-05-23',1);
----trino:
select date_add('day',-1,current_date);
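
The minute and second entries in the list above have no example, so here is a brief sketch for both engines, reusing the timestamp from the hour example:

```sql
-- Hive
select minute('2009-07-30 12:58:59');            -- 58
select second('2009-07-30 12:58:59');            -- 59
-- Trino: pass a timestamp rather than a string
select minute(timestamp '2009-07-30 12:58:59');  -- 58
select second(timestamp '2009-07-30 12:58:59');  -- 59
```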

5. Truncation:

Use trunc in Hive and date_trunc in Trino.
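
A small sketch of both functions, with an illustrative date. Note that the argument order differs: Hive takes the date first and the unit second, Trino takes the unit first.

```sql
-- Hive: trunc(date, format)
select trunc('2022-05-23','MM');                 -- 2022-05-01
select trunc('2022-05-23','YEAR');               -- 2022-01-01
-- Trino: date_trunc(unit, value)
select date_trunc('month', date '2022-05-23');   -- 2022-05-01
select date_trunc('year', date '2022-05-23');    -- 2022-01-01
```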

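6. A small note:

One problem I ran into at work was computing the number of seconds between two times. The previous environment was DB2, where a user-defined function handled this; after the move to the big data stack that function was no longer available.

Solution:

In Hive: convert both values to Unix timestamps with unix_timestamp, then subtract to get the difference in seconds.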

select
 unix_timestamp(concat(substr('20170728102031',1,4),'-',substr('20170728102031',5,2),'-',substr('20170728102031',7,2),' ',substr('20170728102031',9,2),':',substr('20170728102031',11,2),':',substr('20170728102031',13,2)))
-
unix_timestamp(concat(substr('20170728112031',1,4),'-',substr('20170728112031',5,2),'-',substr('20170728112031',7,2),' ',substr('20170728112031',9,2),':',substr('20170728112031',11,2),':',substr('20170728112031',13,2)))
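
As a possibly simpler variant, Hive's unix_timestamp also accepts a format pattern as a second argument, which avoids the substr/concat reassembly. A sketch using the same two values, subtracting the earlier time from the later one:

```sql
-- Hive: parse the 14-digit strings directly with a pattern
select unix_timestamp('20170728112031','yyyyMMddHHmmss')
     - unix_timestamp('20170728102031','yyyyMMddHHmmss');   -- 3600
```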

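In Trino: use to_unixtime to convert each timestamp to Unix time (in seconds, returned as a double) and subtract. From the difference in seconds you can then derive whatever unit you need: hours, minutes, or seconds.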

select to_unixtime(cast('2019-09-09 12:32:05' as timestamp))
-to_unixtime(cast('2019-09-08 12:32:05' as timestamp))
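
Another option in Trino is date_diff with the 'second' unit, which returns a whole number directly. A sketch, assuming the same sample values as above; date_parse uses MySQL-style format specifiers (%i is minutes, %s is seconds):

```sql
-- Trino: difference in seconds between two timestamps
select date_diff('second',
                 timestamp '2019-09-08 12:32:05',
                 timestamp '2019-09-09 12:32:05');           -- 86400
-- Trino: the same for 14-digit strings, parsed with date_parse
select date_diff('second',
                 date_parse('20170728102031','%Y%m%d%H%i%s'),
                 date_parse('20170728112031','%Y%m%d%H%i%s'));  -- 3600
```

Changing the first argument to 'minute' or 'hour' gives the difference in that unit instead.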

If you spot any problems, omissions, or mistakes, corrections are welcome.

For other operations with Trino time functions, see the following article:

https://blog.csdn.net/weixin_42771366/article/details/122289547?spm=1001.2014.3001.5502
