hive 内置正则匹配函数 regexp_replace,regexp_extract的分组捕获

最新推荐文章于 2024-09-19 16:34:12 发布

星期天的亚索

最新推荐文章于 2024-09-19 16:34:12 发布

阅读量2.7k

点赞数

分类专栏： hive 文章标签： hive 正则表达式

本文链接：https://blog.csdn.net/zbz1006572352/article/details/104944294

版权

hive 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

1. regexp_replace

先看下官网定义：

返回值	函数格式	解释
string	regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)	Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo\|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.

需要注意的是这里使用正则规定的符号时需要多加一个反斜杠 '\',比如想要匹配数据，就需要用'\\d'来匹配

在自己试验时发现该函数也可以做到分组捕获，使用'\$1'来匹配第一个分组，以此类推，直接看例子：

-- 分组捕获
select regexp_replace('123-456-789',"(\\d+)-(\\d+)-(\\d+)",'\$1+\$3+');

123+789


-- 匹配空白字符
select regexp_replace('hello world!',"\\s",'-');

hello-world!

2.regexp_extract

官方解释：

string

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.

重点也是在使用正则内置变量时需要多加一个反斜杠进行转义，比如匹配空白字符需要用'\\s'；

最后一个入参index就是java正则匹配的group()方法的入参，入参只能是一个整数，也就是每次只能取到一个分组

看例子：

--获取匹配的第2个分组
select regexp_extract('123-456-789','(\\d+)-(\\d+)-(\\d+)',2);

456

星期天的亚索

关注

0
点赞
踩
7

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录