hive中regexp_replace,regexp_extract,replace和translate的区别

1.Hive内嵌函数说明
官方地址:hive函数大全官方地址

   Hive字符处理函数系列1:hive内嵌字符串函数1

String Functions
Return Type

Name(Signature)

Description

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract(‘foothebar’, ‘foo(.*?)(bar)’, 2) returns ‘bar.’ Note that some care is necessary in using predefined character classes: using ‘\s’ as the second argument will match the letter s; ‘\s’ is necessary to match whitespace, etc. The ‘index’ parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the ‘index’ or Java regex group() method…

抽取字符串subject中符合正则表达式pattern的第index个部分的子字符串,注意些预定义字符的使用,如第二个参数如果使用’\s’将被匹配到s,’\s’才是匹配空格

hive> select regexp_extract(‘foothebar’, ‘foo(.*?)(bar)’, 2) —注意2是匹配分组的

         'bar.'

regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)	

Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace(“foobar”, “oo|ar”, “”) returns ‘fb.’ Note that some care is necessary in using predefined character classes: using ‘\s’ as the second argument will match the letter s; ‘\s’ is necessary to match whitespace, etc…

按照Java正则表达式PATTERN将字符串INTIAL_STRING中符合条件的部分成REPLACEMENT所指定的字符串,如里REPLACEMENT这空的话,抽符合正则的部分将被去掉 如:regexp_replace(“foobar”, “oo|ar”, “”) = ‘fb.’ 注意些预定义字符的使用,如第二个参数如果使用’\s’将被匹配到s,’\s’才是匹配空格

string replace(string A, string OLD, string NEW) Returns the string A with all non-overlapping occurrences of OLD replaced with NEW (as of Hive 1.3.0 and 2.1.0). Example: select replace(“ababab”, “abab”, “Z”); returns “Zab”.
split(string str, string pat)
Splits str around pat (pat is a regular expression)…

按照正则表达式pat来分割字符串str,并将分割后的数组字符串的形式返回

string

translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)

Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types)

Char/varchar support added as of Hive 0.14.0.

解释:

hive>select translate(‘abcdefga’,‘abc’,‘wo’)

   wodefgw   注意,结果不是wodefga,

hive>select replace(‘abcdefga’,‘abc’,‘wo’)

   wodefga   注意,两个结果,这就是replace和translate的区别。

map<string,string>

str_to_map(text[, delimiter1, delimiter2])

Splits text into key-value pairs using two delimiters. Delimiter1 separates text into K-V pairs, and Delimiter2 splits each K-V pair. Default delimiters are ‘,’ for delimiter1 and ‘:’ for delimiter2.

解释:

使用两个分隔符将文本拆分为键值对。

分隔符1将文本分成K-V对,分隔符2分割每个K-V对。对于分隔符1默认分隔符是 ‘,’,对于分隔符2默认分隔符是 ‘=’。

hive> select
str_to_map(concat(path_id,’:’,filter_name )) —是map类型的
from FDM_SOR.T_FIBA_MULTI_UBA_CFG_PATH_DETAIL_D
group by path_id,filter_Name

hive>

2.split,regexp_replace,regexp_extract的使用
2.1 split函数,支持使用正则表达式对字符串进行切割,返回值为数组

SELECT
SPLIT(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’,’\?’) [0] AS A,
–对url进行使用?进行切割,返回值是数组,这里取?前面的值。
SPLIT(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’,’\?’) [1] AS A1,
SPLIT(‘http://facebook.com/index.html’,’\?’) [0] AS B ,
SPLIT(‘http://facebook.com/index.html’,’\?’) [1] AS B1,
SPLIT(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1’,’\\w\/’) [0] AS C1
FROM FDM_SOR.T_PLPLFIS_TB_LOAN_APPLY_CEBANK_ED

注意:所有正则表达式中的预定义字符比如?,},|等需要在这里用\进行反转义才能表达本意。比如正则表达式中\w表示匹配字母,所以也属于预定义字符,单独的\w表示匹配的是字母w,而\\w才表示匹配字母。

2.2. regexp_replace函数,比较简单,难的是里面参数正则表达式的书写。

select
case when regexp_replace(uniscid,’[0-9A-HJ-NPQRTUWXY]{2}\d{6}[0-9A-HJ-NPQRTUWXY]{10}’,‘fbietl’) = ‘fbietl’ then uniscid
else null end uniscid,
from fdm_sor.aaaaaaaaaaaaaaa;
2.3 replace和translate的区别

replace:字符串级别的代替
translate:字符级别的代替
hive>select translate(‘abcdefga’,‘abc’,‘wo’)

   wodefgw   注意,结果不是wodefga,

hive>select replace(‘abcdefga’,‘abc’,‘wo’)

   wodefga   注意,两个结果,这就是replace和translate的区别。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值