translate介绍
String Functions
Return Type | Name(Signature) | Description |
---|---|---|
string | translate(string input, string from, string to) | Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well. (Available as of Hive 0.10.0, for string types) Char/varchar support added as of Hive 0.14.0. |
string | regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) | Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT |
这里拿translate函数和regexp_replace进行了对比。简答来说,对字符串进行替换操作时,可以选用translate和regexp_replace,不过它们的原理和具体含义不太相同,下面会一一解析。
translate函数测试
//sql1
select TRANSLATE('abcdef-abcdef','abcd','1234');
_co
1234ef-1234ef
//sql2
select TRANSLATE('abcdef-abcdef','ace','123');
_co
1b2d3f-1b2d3f
结论:translate(input, from, to)
from这里是你想要替换的字符,而不是字符串,to 即想用哪些字符替换或者删除,而且两个里面的字符是一一对应的;
//sql3
select TRANSLATE('abcdef-abcdef','abcd','123');
_co
123ef-123ef
//sql4
select TRANSLATE ('abcdaabbaaabbb','aa','12');
_co
1bcd11bb111bbb
//sql5
select TRANSLATE ('abcdaabbaaabbb','a','123');
_co
1bcd11bb111bbb
结论:
01.如果 from 字符串长度>to的字符串长度,from中多余的字段会被删除(替换为"")
02.from里有重复字符,重复的对应to的替换不会起作用;
03.from长度<to的长度,不报错但是to里面长的字符没有意义;
需求
删除字符串"ABCDEFGHIOUUFEBCAADDEEII",中的元音字符,也就是出现在该字段中包含"AEIOU"任意一个字符都需要删除掉。
实现1:
select TRANSLATE ('ABCDEFGHIOUUFEBCAADDEEII','AEIOU','');
_co
BCDFGHFBCDD
实现2:
select regexp_replace ('ABCDEFGHIOUUFEBCAADDEEII','A|E|I|O|U','');
_co
BCDFGHFBCDD