Impala String函数大全

最新推荐文章于 2024-08-18 22:53:24 发布

原创最新推荐文章于 2024-08-18 22:53:24 发布 · 4.9w 阅读

80 ·

CC 4.0 BY-SA版权

文章标签：

#impala #sql

Impala 专栏收录该内容

3 篇文章

订阅专栏

本文详细介绍了Impala中的各种字符串函数及其用法，包括加密解密、字符串操作、搜索替换等实用功能，适用于处理varchar、char和string类型的数据。

Impala字符串函数

Impala中字符串函数主要应用于 varchar、char、string类型，如果把varchar或者char类型的值传递给字符串函数，返回将是一个string类型的值

函数列表

base64encode(string str)

base64decode(string str)

加密和解密,返回值为4字节的倍数，可以用来存储特殊字符串

--将hello world加密
[master:21000] > select base64encode('hello world') as encoded;
+------------------+
| encoded          |
+------------------+
| aGVsbG8gd29ybGQ= |
+------------------+
--将加密后的密文解密
[master:21000] > select base64decode('aGVsbG8gd29ybGQ=') as decoded;
+-------------+
| decoded     |
+-------------+
| hello world |
+-------------+

ascii(string str)

返回参数字符串的第一个字符的ascii码

--得到字符a的ascii码
[master:21000] > select ascii('a') as ascii;
+-------+
| ascii |
+-------+
| 97    |
+-------+
--验证是否只能返回第一个字符
[master:21000] > select ascii('abc') as ascii;
+-------+
| ascii |
+-------+
| 97    |
+-------+

chr(int character_code)

返回数值ascii码对应的字符

--得到数值97对应的字符
[master:21000] > select chr(97) as chr;
+-----+
| chr |
+-----+
| a   |
+-----+

btrim(string a)

去除字符串之前和之后的任意个数的空格

--去除hello前的空格
[master:21000] > select btrim('    hello ') as btrim;
+-------+
| btrim |
+-------+
| hello |
+-------+

btrim(string a,string chars_to_trim)

去除第一个字符串之前和之后的任何包含在第二个字符串中出现任意次数的字符（真的难理解QAQ）

--去除xyz并验证是否去除空格
[master:21000] > select btrim('xy    hello zyzzxx','xyz') as btrim;
+------------+
| btrim      |
+------------+
|     hello  |
+------------+
--验证是否会去除其他字符中间的应去除字符
[master:21000] > select btrim('xyhelxyzlozyzzxx','xyz') as btrim;
+----------+
| btrim    |
+----------+
| helxyzlo |
+----------+

char_length(string a)

character_length(string a)

返回字符串的长度，两个函数功能相同

--char_length得到hello world的长度
[master:21000] > select char_length('hello world') as char_length;
+-------------+
| char_length |
+-------------+
| 11          |
+-------------+
--通过函数character_length得到hello world的长度
[master:21000] > select character_length('hello world') as character_length;
+------------------+
| character_length |
+------------------+
| 11               |
+------------------+

concat(string a,string b…)

拼接多个字符串

--连接hello和world两个字符串
[master:21000] > select concat('hello','world') as concat;
+------------+
| concat     |
+------------+
| helloworld |
+------------+
--连接hello、world、cauchy三个字符串
[master:21000] > select concat('hello','world','cauchy') as concat;
+------------------+
| concat           |
+------------------+
| helloworldcauchy |
+------------------+

concat_ws(string sep,string a,string b…)

拼接多个字符串，由指定分隔符分割

--通过'-'连接两个字符串
[master:21000] > select concat_ws('-','hello','world') as concat_ws;
+-------------+
| concat_ws   |
+-------------+
| hello-world |
+-------------+

find_in_set(string str,string strList)

查找某个字符串在一个以逗号为分隔符的列表中第一次出现的位置（以1为起点），如果查询不到或查询字符串中出现’，’(逗号)，返回则为0

--在以逗号间隔的abcdefg中字符c第一次出现的位置
[master:21000] > select find_in_set('c','a,b,c,d,e,f,g') as find_in_set;
+-------------+
| find_in_set |
+-------------+
| 3           |
+-------------+
--在查询'，'的位置时的返回值
[master:21000] > select find_in_set(',','a,b,c,d,e,f,g') as find_in_set;
+-------------+
| find_in_set |
+-------------+
| 0           |
+-------------+
--在查询不存在字符的位置时的返回值
[master:21000] > select find_in_set('h','a,b,c,d,e,f,g') as find_in_set;
+-------------+
| find_in_set |
+-------------+
| 0           |
+-------------+

initcap(string str)

将字符串首字符大写并返回

--将'abc'首字母大写
[master:21000] > select initcap('abc') as initcap;
+---------+
| initcap |
+---------+
| Abc     |
+---------+

instr(string str,string substr)

返回较长字符串中第一次出现子字符串的位置（从1开始）

--在字符串'abcdefg'中查找'bcd'第一次出现的位置
[master:21000] > select instr('abcdefg','bcd') as instr;
+-------+
| instr |
+-------+
| 2     |
+-------+

length(string a)

返回参数字符串的字符长度

--得到字符串'abcdefg'的长度
[master:21000] > select length('abcdefg') as length;
+--------+
| length |
+--------+
| 7      |
+--------+

locate(string substr,string str,[int pos])

返回字符串中第一次出现子字符串的位置（从1开始），可指定位置

--返回长字符串中'bc'第一次出现的位置
[master:21000] > select locate('bc','abcdefgabc') as locate;
+--------+
| locate |
+--------+
| 2      |
+--------+
--返回长字符串中'bc'从第三位之后第一次出现的位置
[master:21000] > select locate('bc','abcdefgabc',3) as locate;
+--------+
| locate |
+--------+
| 9      |
+--------+

lower(string a)

lcase(string a)

返回全部为小写字符的字符串

--使用lower返回全小写的hello world
[master:21000] > select lower('Hello World') as lower;
+-------------+
| lower       |
+-------------+
| hello world |
+-------------+
--使用lcase返回全小写的hello world
[master:21000] > select lcase('Hello World') as lcase;
+-------------+
| lcase       |
+-------------+
| hello world |

upper(string a)

ucase(string a)

返回全部为大写字符的字符串

--使用upper返回全小写的hello world
[master:21000] > select upper('hello world') as upper;
+-------------+
| upper       |
+-------------+
| HELLO WORLD |
+-------------+
--使用ucase返回全小写的hello world
[master:21000] > select ucase('hello world') as ucase;
+-------------+
| ucase       |
+-------------+
| HELLO WORLD |
+-------------+

lpad(string str,int len,string pad)

返回更改了长度的第一个字符串，如果小于长度，则用pad字符串在左边补齐，如果大于长度，则从左边截取对应长度字符串返回

--从左边截取长度为7的'hello world'
[master:21000] > select lpad('hello world',7,'/') as lpad;
+---------+
| lpad    |
+---------+
| hello w |
+---------+
--从左边截取长度为13的'hello world',长度不足在左侧用'/'补齐
[master:21000] > select lpad('hello world',13,'/') as lpad;
+---------------+
| lpad          |
+---------------+
| //hello world |
+---------------+

rpad(string str,int len,string pad)

返回更改了长度的第一个字符串，如果小于长度，则用pad字符串在右边补齐，如果大于长度，则从左边截取对应长度字符串返回

--从左边截取长度为7的'hello world'
[master:21000] > select rpad('hello world',7,'/') as rpad;
+---------+
| rpad    |
+---------+
| hello w |
+---------+
--从左边截取长度为13的'hello world',长度不足在右侧用'/'补齐
[master:21000] > select rpad('hello world',13,'/') as rpad;
+---------------+
| rpad          |
+---------------+
| hello world// |
+---------------+

ltrim(string a)

返回参数字符串，并从左侧删除任何前导空格

--删除字符串'  hello  '左侧的所有空格
[master:21000] > select ltrim('  hello  ') as ltrim;
+---------+
| ltrim   |
+---------+
| hello   |
+---------+

rtrim(string a)

返回参数字符串，并从右侧删除任何后置空格

--删除字符串'  hello  '右侧的所有空格
[master:21000] > select rtrim('  hello  ') as rtrim;
+---------+
| rtrim   |
+---------+
|   hello |
+---------+

trim(string a)

去掉字符串中所有前导和后置空格

--去掉'  hello world  '的前导和后置空格
[master:21000] > select trim('  hello world  ') as trim;
+-------------+
| trim        |
+-------------+
| hello world |
+-------------+

regexp_extract(string subject,string pattern,int index)

返回通过正则表达式提取的字符串，
impala使用\字符进行转义，所以\d需要\d，也可以采用[[:digit:]]

--匹配任意字符以数字结尾，返回匹配的整个字符串
[master:21000] >  select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
+------------------------------------------------------+
| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
+------------------------------------------------------+
| abcdef123ghi456                                      |
+------------------------------------------------------+
--匹配任意字符以数字结尾，只返回匹配的第一个值
[master:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
+------------------------------------------------------+
| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
+------------------------------------------------------+
| 456                                                  |
+------------------------------------------------------+
--匹配任意字符以小写字母结尾，返回匹配的整个字符串
[master:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',0);
+--------------------------------------------------------+
| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 0) |
+--------------------------------------------------------+
| AbcdBCdef                                              |
+--------------------------------------------------------+
--匹配任意字符以小写字母结尾，只返回匹配的第一个值
[master:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
+--------------------------------------------------------+
| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
+--------------------------------------------------------+
| def                                                    |
+--------------------------------------------------------+

regexp_like(string source,string pattern,[string options])

返回true或者false，表示字符串是否包含正则表达式的值
options参数：
- c：区分大小写匹配（默认）
- i：不区分大小写
- m：多行匹配
- n：换行符匹配

--判断字符'foo'是否包含'f'
[master:21000] > select regexp_like('foo','f');
+-------------------------+
| regexp_like('foo', 'f') |
+-------------------------+
| true                    |
+-------------------------+
--判断字符'foo'是否包含'F'
[master:21000] > select regexp_like('foo','F');
+-------------------------+
| regexp_like('foo', 'f') |
+-------------------------+
| false                   |
+-------------------------+
--判断字符'foo'是否包含'f',设置参数不区分大小写
[master:21000] > select regexp_like('foo','F','i');
+------------------------------+
| regexp_like('foo', 'f', 'i') |
+------------------------------+
| true                         |
+------------------------------+

regexp_replace(string initial,string pattern,string replacement)

替换字符串与正则表达式匹配项为新字符串并返回

--将字符串中任意的字符'b'替换为'xyz'
[master:21000] > select regexp_replace('aaabbbaaa','b+','xyz');
+------------------------------------------+
| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
+------------------------------------------+
| aaaxyzaaa                                |
+------------------------------------------+
--将字符串中任意的非数字字符替换为''(空)
[master:21000] > select regexp_replace('123-456-789','[^[:digit:]]','');
+---------------------------------------------------+
| regexp_replace('123-456-789', '[^[:digit:]]', '') |
+---------------------------------------------------+
| 123456789                                         |
+---------------------------------------------------+

repeat(string str,int n)

返回指定重复次数的字符串

--将'hello'重复5次
[master:21000] > select repeat('hello',5) as repeat;
+---------------------------+
| repeat                    |
+---------------------------+
| hellohellohellohellohello |
+---------------------------+

reverse(string a)

返回反转字符串

--反转字符串'hello world'
[master:21000] > select reverse('hello world') as reverse;
+-------------+
| reverse     |
+-------------+
| dlrow olleh |
+-------------+

space(int n)

返回指定数量的空格的连接字符串

--返回5个连续空格的字符串
[master:21000] > select space(5) as space;
+-------+
| space |
+-------+
|       |
+-------+

split_part(string source,string delimiter,bigint n)

以delimiter字符串作为拆分项，取第n个字符串返回

--以','为分隔符拆分'x,y,z'并返回第1个字符串
[master:21000] > select split_part('x,y,z',',',1);
+-----------------------------+
| split_part('x,y,z', ',', 1) |
+-----------------------------+
| x                           |
+-----------------------------+
--以','为分隔符拆分'x,y,z'并返回第2个字符串
[master:21000] > select split_part('x,y,z',',',2);
+-----------------------------+
| split_part('x,y,z', ',', 2) |
+-----------------------------+
| y                           |
+-----------------------------+
--以','为分隔符拆分'x,y,z'并返回第3个字符串
[master:21000] > select split_part('x,y,z',',',3);
+-----------------------------+
| split_part('x,y,z', ',', 3) |
+-----------------------------+
| z                           |
+-----------------------------+

strleft(string a,int num_chars)

截取字符串，返回左边的n个字符

--从左边截取字符串'hello world'，返回长度为4的字符串
[master:21000] > select strleft('hello world',4) as strleft;
+---------+
| strleft |
+---------+
| hell    |
+---------+

strright(string a,int num_chars)

截取字符串，返回右边的n个字符

--从右边截取字符串'hello world'，返回长度为4的字符串
[master:21000] > select strright('hello world',4) as strright;
+----------+
| strright |
+----------+
| orld     |
+----------+

substr(string a,int start,[int len])

substring(string a,int start,[int len])

返回从指定点开始的字符串部分,可选地指定最大长度

--截取字符串'hello world'，从第6位开始
[master:21000] > select substr('hello world',6) as substr;
+--------+
| substr |
+--------+
|  world |
+--------+
--截取字符串'hello world'，从第6位开始，长度为3
[master:21000] > select substr('hello world',6,3) as substr;
+--------+
| substr |
+--------+
|  wo    |
+--------+
--截取字符串'hello world'，从第6位开始
[master:21000] > select substring('hello world',6) as substring;
+-----------+
| substring |
+-----------+
|  world    |
+-----------+
--截取字符串'hello world'，从第6位开始，长度为3
[master:21000] > select substring('hello world',6,3) as substring;
+-----------+
| substring |
+-----------+
|  wo       |
+-----------+

translate(string input,string from,string to)

将字符串中的一些字符替换为其他字符
注：不能替换字符串，from字符串与to字符串一一对应，再替换 input字符串中所有对应字符

--将'world'替换为'cauchy',只能匹配到想相同长度,即'cauch',且拆分为w->c,o->a,r->u,l->c,d->h
[master:21000] > select translate('hello world','world','cauchy') as translate;
+-------------+
| translate   |
+-------------+
| hecca cauch |
+-------------+
--替换字符串中所有属于'world'的字符为'abcde'
[master:21000] > select translate('hello world','world','abcde') as translate;
+-------------+
| translate   |
+-------------+
| heddb abcde |
+-------------+