Hive functions: regexp_extract


cclovezbf


Article link: https://blog.csdn.net/cclovezbf/article/details/116230439


To learn a Hive function, the first step is of course to run:

desc function extended regexp_extract;

regexp_extract(str, regexp[, idx]) - extracts a group that matches regexp
Example:
> SELECT regexp_extract('100-200', '(\d+)-(\d+)', 1) FROM src LIMIT 1;
'100'

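However... when I saw the result I nearly cried. I had copied the example verbatim, so why did it not match what Hive's own help text says? Was I wrong, or was Hive?

Too lazy to read the official docs, I searched Baidu and found https://blog.csdn.net/jv_rookie/article/details/55211955

Other people's examples work fine. Then it hit me: the backslashes in my pattern were not escaped.

So I went to the official documentation after all: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF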

Return type: string

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex group() method.

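In other words:

1. The return type is string.

2. The function signature is regexp_extract(string subject, string pattern, int index).

Remember that index starts at 0: 0 means the entire matched string, 1 means the first parenthesized group () in pattern, 2 means the second parenthesized group (), and so on. Anyone familiar with Java regex matching will recognize these as Matcher groups.

3. It returns the string extracted according to the pattern you specify.

regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Breaking down the regex: foo(.*?)(bar) matches foothebar, where the is captured by the first group and bar by the second. Since index=2, the return value is bar.

Note that '\s' matches the letter s; '\\s' is what you need for the regex to match whitespace.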
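Since the index parameter is the Java Matcher group() index, the numbering can be checked directly in Java. The following is a minimal sketch; the class name GroupIndexDemo and the helper groups are made up for illustration and are not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupIndexDemo {
    // Hypothetical helper: returns group 0 (the whole match) followed by each
    // capture group of the first match, or an empty list if nothing matches.
    static List<String> groups(String subject, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(subject);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> g = groups("foothebar", "foo(.*?)(bar)");
        // Prints:
        // index 0 -> foothebar
        // index 1 -> the
        // index 2 -> bar
        for (int i = 0; i < g.size(); i++) {
            System.out.println("index " + i + " -> " + g.get(i));
        }
    }
}
```

The list position is exactly the value you would pass as the third argument of regexp_extract.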
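The effect of the missing escape can be reproduced in plain Java. This sketch assumes, as the wiki note suggests, that Hive drops the lone backslash from a literal such as '\s' or '\d', so the regex engine only sees s or d. The class name EscapeDemo and the helper found are hypothetical.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Hypothetical helper: true if the regex matches anywhere in the subject.
    static boolean found(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    public static void main(String[] args) {
        // Hive literal '\s'  -> regex s   (assumption: the backslash is dropped)
        // Hive literal '\\s' -> regex \s  (written "\\s" in Java source as well)
        System.out.println(found("s", "a b"));    // false: there is no letter s
        System.out.println(found("\\s", "a b"));  // true: the space is whitespace
        System.out.println(found("s", "ass"));    // true: matches the letter s

        // The same thing happens to the example copied from the help text.
        System.out.println(found("(d+)-(d+)", "100-200"));      // false
        System.out.println(found("(\\d+)-(\\d+)", "100-200"));  // true
    }
}
```

Conveniently, a Java string literal needs the same doubled backslash, so a pattern that works as a Java string can usually be pasted into a Hive query unchanged.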

Examples

SELECT regexp_extract('100-200-陈池-abcd', '(\\d+)-(\\d+)(.*)', 0), -- the entire match
regexp_extract('100-200-陈池-abcd', '(\\d+)-(\\d+).*', 1), -- I want 100
regexp_extract('100-200-陈池-abcd', '(\\d+)-(\\d+).*', 2), -- I want 200
regexp_extract('100-200-陈池-abcd', '(\\d+)-(\\d+)(.*)', 3), -- I want -陈池-abcd
regexp_extract('100-200-陈池-abcd', '(\\d+)-(\\d+)-(\\W+)-.*', 3); -- I want 陈池
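If no Hive session is at hand, the intended results of these patterns can be sanity-checked with Java's regex engine. This is only a sketch: regexpExtract is a hypothetical stand-in written for this post, not Hive's implementation, and returning an empty string on no match is an assumption. The third capture group is written out explicitly wherever index 3 is requested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexpExtractSketch {
    // Hypothetical stand-in for regexp_extract: group `index` of the first
    // match, or "" when the pattern matches nowhere (assumed behaviour).
    static String regexpExtract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String s = "100-200-陈池-abcd";
        System.out.println(regexpExtract(s, "(\\d+)-(\\d+)(.*)", 0));       // 100-200-陈池-abcd
        System.out.println(regexpExtract(s, "(\\d+)-(\\d+).*", 1));         // 100
        System.out.println(regexpExtract(s, "(\\d+)-(\\d+).*", 2));         // 200
        System.out.println(regexpExtract(s, "(\\d+)-(\\d+)(.*)", 3));       // -陈池-abcd
        // \W is "not [a-zA-Z_0-9]" by default, so it covers the Chinese characters.
        System.out.println(regexpExtract(s, "(\\d+)-(\\d+)-(\\W+)-.*", 3)); // 陈池
    }
}
```

Asking for a group index that the pattern does not define, for example index 3 with only two pairs of parentheses, makes Matcher.group throw IndexOutOfBoundsException in Java, so the number of groups in the pattern has to cover the index you request.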
