Impala 自用使用手册
一、函数
1. 数据类型转换函数
cast(a as T)
/*
Converts the results of the expression expr to type T. For example, cast('1' as BIGINT) will convert the string '1' to its integral representation.
A null is returned if the conversion does not succeed.
If cast(expr as boolean) Hive returns true for a non-empty string.
即 CAST(字段名 as 要转换成的类型)
*/
# eg.
CAST(id as string)
2. 计算某列不重复值
count([DISTINCT|ALL] col)
/* An aggregate function that returns the number of rows, or the number of non-NULL rows. */
# eg.
COUNT(DISTINCT [brand_name])
3. 取出某一字段的前N行的数据(Lag)或后N行的数据(Lead)
lag(expr [, offset] [, default]) OVER ([partition_by_clause] order_by_clause)
/*
This function returns the value of an expression using column values from a preceding row.
You specify an integer offset, which designates a row position some number of rows previous to the current row.
Any column references in the expression argument refer to column values from that prior row.
Offset是偏移量,即是上1个或上N个的值,假设当前行在表中排在第5行,offset为3,则表示我们所要找的数据行就是表中的第2行(即5-3=2)
default默认值,当两个函数取上N/下N个值,当在表中从当前行位置向前数N行已经超出了表的范围时,函数将default这个参数值作为函数的返回值,若没有指定默认值,则返回NULL
*/
# eg.
round(lag(sum(sales),1)over(PARTITION BY platform ORDER BY dt)
lead(expr [, offset] [, default]) OVER ([partition_by_clause