HQL Statements

Joseph-Growth

Category: Big Data: Hive

Article link: https://blog.csdn.net/joseph_happy/article/details/50464848

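Introduction

HQL is short for Hive SQL. Hive itself is written in Java, so when Hive cannot meet a particular need out of the box, you can write a Hive UDF to cover it (a later post will describe how to develop UDFs).

Syntax

The list below covers only the syntax the author has found useful.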

    LIKE operator
    Syntax: A LIKE B
    Operand types: string
    Description: Returns NULL if string A or string B is NULL; returns TRUE if string A matches the simple SQL pattern B (not a regular expression), and FALSE otherwise. In B, the character "_" matches any single character and "%" matches any number of characters.
    Example:
    hive> select * from tbl_lzo where sport like 'foot%';
    ...footbar...
    hive> select * from tbl_lzo where sport like 'foot___';
    ...footbar...
    Negation:
    hive> select * from tbl_lzo where not sport like 'foot___';

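    REGEXP operator
    Syntax: A REGEXP B
    Operand types: string
    Description: Returns NULL if string A or string B is NULL; returns TRUE if string A matches the Java regular expression B, and FALSE otherwise.
    Example:
    hive> select * from tbl_lzo where sport REGEXP '^f.*r$';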
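
The difference between LIKE and REGEXP is easiest to see on literal values. The following is a minimal sketch, not taken from the original post: it assumes Hive 0.13 or later (where SELECT without FROM is allowed), and the column aliases are made up for illustration.

```sql
select
  -- true: '%' absorbs 'bar'
  'footbar' like 'foot%'              as like_prefix,
  -- false: LIKE must match the whole string, and 'foot_' covers only 5 characters
  'footbar' like 'foot_'              as like_too_short,
  -- true: starts with 'f' and ends with 'r'
  'footbar' regexp '^f.*r$'           as regexp_anchored,
  -- true: an unanchored REGEXP succeeds on any matching substring
  'footbar' regexp 'oot'              as regexp_partial,
  -- NULL: a NULL operand yields NULL, not FALSE
  cast(null as string) like 'foot%'   as like_null;
```

Because an unanchored REGEXP matches substrings, add `^` and `$` when the whole value has to match.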

    ROUND function
    Syntax: round(double a, int decim)
    Return type: double
    Description: Returns a rounded to decim decimal places, as a double.
    Example:
    hive> select round(score, 2) from tbl_lzo;

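    FROM_UNIXTIME function
    Syntax: from_unixtime(bigint unixtime[, string format])
    Return type: string
    Description: Converts a Unix timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) to a time string in the current time zone, using format if it is given.
    Example:
    hive> select from_unixtime(`timestamp`,'yyyyMMdd') from tbl_lzo;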

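    UNIX_TIMESTAMP function
    Syntax: unix_timestamp([string date[, string pattern]])
    Return type: bigint
    Description: With no arguments, returns the current Unix timestamp. If date is given, parses it according to pattern (yyyy-MM-dd HH:mm:ss when pattern is omitted) and converts it to a Unix timestamp. Returns 0 if the conversion fails. The date string must actually match the pattern.
    Example:
    hive> select unix_timestamp(), unix_timestamp('2016-01-05 17:50:33'), unix_timestamp('20160105', 'yyyyMMdd') from tbl_lzo;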
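
unix_timestamp and from_unixtime are often chained to convert a date string from one format to another. The following is a sketch of that idiom rather than something from the original post: it assumes Hive 0.13 or later (SELECT without FROM), and the aliases are illustrative only.

```sql
select
  -- Parse with the default pattern, then format again: the string is unchanged
  -- because both calls use the same session time zone.
  from_unixtime(unix_timestamp('2016-01-05 17:50:33'))                as round_trip,
  -- Parse a compact day id and print it as yyyy-MM-dd, giving 2016-01-05.
  from_unixtime(unix_timestamp('20160105', 'yyyyMMdd'), 'yyyy-MM-dd') as reformatted;
```

The second expression is handy when a partition column such as dayid stores dates as yyyyMMdd but a date function expects yyyy-MM-dd.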

    TO_DATE function
    Syntax: to_date(string datetime)
    Return type: string
    Description: Returns the date part of a date-time value.
    Example:
    hive> select to_date('2016-01-05 17:54:22') from tbl_lzo;

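    DATEDIFF function (see also date_add, date_sub)
    Syntax: datediff(string enddate, string startdate)
    Return type: int
    Description: Returns the number of days obtained by subtracting startdate from enddate.
    Example:
    hive> select datediff('2016-01-05', '2015-11-13') from tbl_lzo;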
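
The entry above names date_add and date_sub without showing them. The sketch below puts the three functions side by side. It assumes Hive 0.13 or later (SELECT without FROM), and the 7-day offset and the aliases are arbitrary illustration values.

```sql
select
  -- 53 days: 17 left in November, 31 in December, 5 in January
  datediff('2016-01-05', '2015-11-13') as days_between,
  -- 2016-01-12
  date_add('2016-01-05', 7)            as week_later,
  -- 2015-12-29
  date_sub('2016-01-05', 7)            as week_earlier;
```

Note the argument order of datediff: the end date comes first, so swapping the two arguments flips the sign of the result.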

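    CASE expression
    Syntax:
    1. CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
    2. CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
    Return type: T
    Description:
    1. If a equals b, returns c; if a equals d, returns e; otherwise returns f.
    2. If a is TRUE, returns b; if c is TRUE, returns d; otherwise returns e.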
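
The CASE entry has no example, so here is a small sketch of both forms. It reuses the sport and score columns of tbl_lzo from the earlier examples. The value 'swim', the category labels, and the score thresholds are invented for illustration and do not come from the original post.

```sql
-- Form 1: compare one expression against a list of values.
select sport,
       case sport
         when 'footbar' then 'ball game'
         when 'swim'    then 'water sport'
         else 'other'
       end as sport_type
from tbl_lzo;

-- Form 2: evaluate independent conditions in order; the first TRUE branch wins.
select score,
       case
         when score >= 90 then 'A'
         when score >= 60 then 'B'
         else 'C'
       end as grade
from tbl_lzo;
```

If the ELSE branch is omitted and no WHEN matches, the expression evaluates to NULL.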

    String length function: length
    Syntax: length(string A)
    Return type: int
    Description: Returns the length of string A.
    Example:
    hive> select length('abcedfg') from tbl_lzo;

    String reverse function: reverse
    Syntax: reverse(string A)
    Return type: string
    Description: Returns string A reversed.
    Example:
    hive> select reverse('abcedfg') from tbl_lzo;

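    String concatenation function: concat
    Syntax: concat(string A, string B...)
    Return type: string
    Description: Returns the input strings joined together; any number of input strings is accepted.
    Example:
    hive> select concat('abc','def','gh') from tbl_lzo;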

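    String concatenation with separator: concat_ws
    Syntax: concat_ws(string SEP, string A, string B...)
    Return type: string
    Description: Returns the input strings joined together, with SEP inserted between them as the separator.
    Example:
    hive> select concat_ws(',','abc','def','gh') from tbl_lzo;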

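    Substring functions: substr, substring
    Syntax:
    substr(string A, int start, int len)
    substring(string A, int start, int len)
    Return type: string
    Description: Returns the substring of A that begins at position start (counting from 1) and is len characters long.
    Example:
    hive> select substr('abcde',3,2) from tbl_lzo;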

    Map construction: map
    Syntax: map(key1, value1, key2, value2, ...)
    Description: Builds a map from the given key/value pairs.
    Example:
    hive> create table lxw_test as select
    map('100','tom','200','mary') as t from tbl_lzo;

    hive> describe lxw_test;
    t       map<string,string>

    hive> select t from lxw_test;
    {"100":"tom","200":"mary"}

    Struct construction: struct
    Syntax: struct(val1, val2, val3, ...)
    Description: Builds a struct from the given arguments.
    Example:
    hive> create table lxw_test as select
    struct('tom','mary','tim') as t from tbl_lzo;

    hive> describe lxw_test;
    t       struct<col1:string,col2:string,col3:string>

    hive> select t from lxw_test;
    {"col1":"tom","col2":"mary","col3":"tim"}

    Array construction: array
    Syntax: array(val1, val2, ...)
    Description: Builds an array from the given arguments.
    Example:
    hive> create table lxw_test as
    select array("tom","mary","tim") as t from tbl_lzo;

    hive> describe lxw_test;
    t       array<string>

    hive> select t from lxw_test;
    ["tom","mary","tim"]

    Struct field access: S.x
    Syntax: S.x
    Operand types: S is a struct
    Description: Returns field x of struct S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the foo field.
    Example:
    hive> create table lxw_test as select
    struct('tom','mary','tim') as t from tbl_lzo;

    hive> describe lxw_test;
    t       struct<col1:string,col2:string,col3:string>

    hive> select t.col1,t.col3 from lxw_test;
    tom     tim

    Array length function: size(Array<T>)
    Syntax: size(Array<T>)
    Return type: int
    Description: Returns the number of elements in the array.
    Example:
    hive> select size(array('100','101','102','103'))
    from tbl_lzo;

References

For the full list of commands, see:

http://blog.csdn.net/wisgood/article/details/17376393
http://www.cnblogs.com/end/archive/2012/06/18/2553682.html

Key point

The function below is the main point of this post.

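    Deduplication with the collect_set function
    Syntax: collect_set(string column)
    Return type: array
    Description: Returns the values of the column as an array with duplicates removed.
    Example:
    hive> select imei, device_os, collect_set(province), collect_set(game) from s_tbl_lzo where dayid='20160104' group by imei, device_os;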

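collect_set is an aggregate function: for each combination of the grouping keys (imei and device_os above), it gathers the distinct values of the other columns, which are string columns here, into an array.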
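
The arrays returned by collect_set can be passed straight to the size and concat_ws functions described earlier. The following is a sketch built on the query above. It assumes province and game are string columns, as the description implies, and the output aliases are made up for illustration.

```sql
select imei,
       device_os,
       -- number of distinct provinces seen for this device
       size(collect_set(province))           as province_cnt,
       -- distinct values flattened into one comma-separated string
       concat_ws(',', collect_set(province)) as provinces,
       concat_ws(',', collect_set(game))     as games
from s_tbl_lzo
where dayid = '20160104'
group by imei, device_os;
```

The order of elements inside each array is not guaranteed, so the comma-separated strings may list the same values in a different order from run to run.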