hive学习笔记之七:内置函数

  1. Sqoop

  2. 基础UDF

  3. 用户自定义聚合函数(UDAF)

  4. UDTF

本篇概览

  • 本文是《hive学习笔记》系列的第七篇,前文熟悉了HiveQL的常用语句,接下来把常用的内置函数简单过一遍,分为以下几部分:
  1. 数学

  2. 字符

  3. json处理

  4. 转换

  5. 日期

  6. 条件

  7. 聚合

准备数据

  1. 本次实战要准备两个表:学生表和住址表,字段都很简单,如下图所示,学生表有个住址ID字段,是住址表里的记录的唯一ID:

在这里插入图片描述

  1. 先创建住址表:

create table address (addressid int, province string, city string)

row format delimited

fields terminated by ‘,’;

  1. 创建address.txt文件,内容如下:

1,guangdong,guangzhou

2,guangdong,shenzhen

3,shanxi,xian

4,shanxi,hanzhong

6,jiangshu,nanjing

  1. 加载数据到address表:

load data

local inpath ‘/home/hadoop/temp/202010/25/address.txt’

into table address;

  1. 创建学生表,其addressid字段关联了address表的addressid字段:

create table student (name string, age int, addressid int)

row format delimited

fields terminated by ‘,’;

  1. 创建student.txt文件,内容如下:

tom,11,1

jerry,12,2

mike,13,3

john,14,4

mary,15,5

  1. 加载数据到student表:

load data

local inpath ‘/home/hadoop/temp/202010/25/student.txt’

into table student;

  1. 至此,本次操作所需数据已准备完毕,如下所示:

hive> select * from address;

OK

1 guangdong guangzhou

2 guangdong shenzhen

3 shanxi xian

4 shanxi hanzhong

6 jiangshu nanjing

Time taken: 0.043 seconds, Fetched: 5 row(s)

hive> select * from student;

OK

tom 11 1

jerry 12 2

mike 13 3

john 14 4

mary 15 5

Time taken: 0.068 seconds, Fetched: 5 row(s)

  • 开始体验内置函数;

总览

  1. 进入hive控制台;

  2. 执行命令show functions;显示内置函数列表:

hive> show functions;

OK

!

!=

%

&

/

<

<=

<=>

<>

=

==

=

^

abs

acos

add_months

and

array

array_contains

ascii

asin

assert_true

atan

avg

base64

between

bin

case

cbrt

ceil

ceiling

coalesce

collect_list

collect_set

compute_stats

concat

concat_ws

context_ngrams

conv

corr

cos

count

covar_pop

covar_samp

create_union

cume_dist

current_database

current_date

current_timestamp

current_user

date_add

date_format

date_sub

datediff

day

dayofmonth

decode

degrees

dense_rank

div

e

elt

encode

ewah_bitmap

ewah_bitmap_and

ewah_bitmap_empty

ewah_bitmap_or

exp

explode

factorial

field

find_in_set

first_value

floor

format_number

from_unixtime

from_utc_timestamp

get_json_object

greatest

hash

hex

histogram_numeric

hour

if

in

in_file

index

initcap

inline

instr

isnotnull

isnull

java_method

json_tuple

lag

last_day

last_value

lcase

lead

least

length

levenshtein

like

ln

locate

log

log10

log2

lower

lpad

ltrim

map

map_keys

map_values

matchpath

max

min

minute

month

months_between

named_struct

negative

next_day

ngrams

noop

noopstreaming

noopwithmap

noopwithmapstreaming

not

ntile

nvl

or

parse_url

parse_url_tuple

percent_rank

percentile

percentile_approx

pi

pmod

posexplode

positive

pow

power

printf

radians

rand

rank

reflect

reflect2

regexp

regexp_extract

regexp_replace

repeat

reverse

rlike

round

row_number

rpad

rtrim

second

sentences

shiftleft

shiftright

shiftrightunsigned

sign

sin

size

sort_array

soundex

space

split

sqrt

stack

std

stddev

stddev_pop

stddev_samp

str_to_map

struct

substr

substring

sum

tan

to_date

to_unix_timestamp

to_utc_timestamp

translate

trim

trunc

ucase

unbase64

unhex

unix_timestamp

upper

var_pop

var_samp

variance

weekofyear

when

windowingtablefunction

xpath

xpath_boolean

xpath_double

xpath_float

xpath_int

xpath_long

xpath_number

xpath_short

xpath_string

year

|

~

Time taken: 0.003 seconds, Fetched: 216 row(s)

  1. 以lower函数为例,执行命令describe function lower;即可查看lower函数的说明:

hive> describe function lower;

OK

lower(str) - Returns str with all characters changed to lowercase

Time taken: 0.005 seconds, Fetched: 1 row(s)

  • 接下来从计算函数开始,体验常用函数;

  • 先执行以下命令,使查询结果中带有字段名:

set hive.cli.print.header=true;

计算函数

  1. 加法+:

hive> select name, age, age+1 as add_value from student;

OK

name age add_value

tom 11 12

jerry 12 13

mike 13 14

john 14 15

mary 15 16

Time taken: 0.098 seconds, Fetched: 5 row(s)

  1. 减法(-)、乘法(*)、除法(/)的使用与加法类似,不再赘述了;

  2. 四舍五入round:

hive> select round(1.1), round(1.6);

OK

_c0 _c1

1.0 2.0

Time taken: 0.028 seconds, Fetched: 1 row(s)

  1. 向上取整ceil:

hive> select ceil(1.1);

OK

_c0

2

Time taken: 0.024 seconds, Fetched: 1 row(s)

  1. 向下取整floor:

hive> select floor(1.1);

OK

_c0

1

Time taken: 0.024 seconds, Fetched: 1 row(s)

  1. 平方pow,例如pow(2,3)表示2的三次方,等于8:

hive> select pow(2,3);

OK

_c0

8.0

Time taken: 0.027 seconds, Fetched: 1 row(s)

  1. 取模pmod:

hive> select pmod(10,3);

OK

_c0

1

Time taken: 0.059 seconds, Fetched: 1 row(s)

字符函数

  1. 转小写lower,转大写upper:

hive> select lower(name), upper(name) from student;

OK

_c0 _c1

tom TOM

jerry JERRY

mike MIKE

john JOHN

mary MARY

Time taken: 0.051 seconds, Fetched: 5 row(s)

  1. 字符串长度length:

hive> select name, length(name) from student;

OK

tom 3

jerry 5

mike 4

john 4

mary 4

Time taken: 0.322 seconds, Fetched: 5 row(s)

  1. 字符串拼接concat:

hive> select concat(“prefix_”, name) from student;

OK

prefix_tom

prefix_jerry

prefix_mike

prefix_john

prefix_mary

Time taken: 0.106 seconds, Fetched: 5 row(s)

  1. 子串substr,substr(xxx,2)表示从第二位开始到右边所有,substr(xxx,2,3)表示从第二位开始取三个字符:

hive> select substr(“0123456”,2);

OK

123456

Time taken: 0.067 seconds, Fetched: 1 row(s)

hive> select substr(“0123456”,2,3);

OK

123

Time taken: 0.08 seconds, Fetched: 1 row(s)

  1. 去掉前后空格trim:

hive> select trim(" 123 ");

OK

123

Time taken: 0.065 seconds, Fetched: 1 row(s)

json处理(get_json_object)

为了使用json处理的函数,先准备一些数据:

  1. 先创建表t15,只有一个字段用于保存字符串:

create table t15(json_raw string)

row format delimited;

  1. 创建t15.txt文件,内容如下:

{“name”:“tom”,“age”:“10”}

{“name”:“jerry”,“age”:“11”}

  1. 加载数据到t15表:

load data

local inpath ‘/home/hadoop/temp/202010/25/015.txt’

into table t15;

  1. 使用get_json_object函数,解析json_raw字段,分别取出指定name和age属性:

select

get_json_object(json_raw, “$.name”),

get_json_object(json_raw, “$.age”)

from t15;

得到结果:

hive> select

get_json_object(json_raw, “$.name”),

get_json_object(json_raw, “$.age”)

from t15;

OK

tom 10

jerry 11

Time taken: 0.081 seconds, Fetched: 2 row(s)

日期

  1. 获取当前日期current_date:

hive> select current_date();

OK

2020-11-02

Time taken: 0.052 seconds, Fetched: 1 row(s)

  1. 获取当前时间戳current_timestamp:

hive> select current_timestamp();

OK

2020-11-02 10:07:58.967

Time taken: 0.049 seconds, Fetched: 1 row(s)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值