Spark SQL error: Caused by: java.util.regex.PatternSyntaxException: Illegal repetition near index 1

1. Problem description

Spark runs a SQL statement that parses a JSON-formatted string column stored in Hive. The statement is:

spark.sql("SELECT split(regexp_replace(regexp_replace(data,'\\[\\]',''),'\\}\\,\\{','\\}\\;\\{'),';') from tt").show

It fails with:

Caused by: java.util.regex.PatternSyntaxException: Illegal repetition near index 1
},{
 ^
  at java.util.regex.Pattern.error(Pattern.java:1957)
  at java.util.regex.Pattern.closure(Pattern.java:3159)
  at java.util.regex.Pattern.sequence(Pattern.java:2136)
  at java.util.regex.Pattern.expr(Pattern.java:1998)
  at java.util.regex.Pattern.compile(Pattern.java:1698)
  at java.util.regex.Pattern.<init>(Pattern.java:1351)
  at java.util.regex.Pattern.compile(Pattern.java:1028)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)

However, the same query runs without any problem when executed directly in Hive:

 select * from tt;
 +----------------------------------------------------+
|                      tt.data                       |
+----------------------------------------------------+
| {"movie":"594","rate":"4","timeStamp":"978302268","uid":"1"} |
| [{"website":"www.baidu.com","name":"百"},{"website":"google.com","name":"谷歌"}] |
+----------------------------------------------------+


SELECT split(regexp_replace(regexp_replace(data,'\\[\\]',''),'\\}\\,\\{','\\}\\;\\{'),';') from tt;

+----------------------------------------------------+
|                        _c0                         |
+----------------------------------------------------+
| ["{\"movie\":\"594\",\"rate\":\"4\",\"timeStamp\":\"978302268\",\"uid\":\"1\"}"] |
| ["[{\"website\":\"www.baidu.com\",\"name\":\"百\"}","{\"website\":\"google.com\",\"name\":\"谷歌\"}]"] |
+----------------------------------------------------+

2. Cause

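The pattern goes through two rounds of string unescaping before it reaches java.util.regex. First, the Scala compiler turns every '\\' inside the string literal passed to spark.sql into a single '\'. Then, with default parser settings, Spark SQL unescapes the SQL string literal once more and drops the remaining backslash. The pattern written as '\\}\\,\\{' therefore arrives at the regex engine as },{ (exactly the text shown in the stack trace), and the unescaped '{' is read as the start of a repetition quantifier. In Hive the query is typed directly, so there is no Scala layer: only one round of unescaping happens and the engine receives \}\,\{. To get the same pattern through spark.sql, every backslash has to be escaped once more, that is, write '\\\\' wherever the Hive query has '\\'.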
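The two layers can be checked without a Spark cluster. The sketch below is plain Java and only illustrates the idea: unescapeSqlLiteral is a hypothetical helper that approximates how a SQL string literal is unescaped (drop the backslash, keep the next character). It is not Spark's actual implementation and ignores sequences such as \n and \t. Java string literals follow the same backslash rules as the Scala literals in the post, so the Java compiler plays the role of the first layer here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeLayers {

    // Hypothetical stand-in for SQL string-literal unescaping (an assumption,
    // not Spark's code): a backslash is dropped and the next character kept.
    static String unescapeSqlLiteral(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length()) {
                out.append(literal.charAt(++i)); // skip the backslash, keep what follows
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns true when java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Layer 1 (compiler): "\\}\\,\\{" in source is the 6-character text \}\,\{
        String twoBackslashes = "\\}\\,\\{";
        // Layer 1 (compiler): "\\\\}\\\\,\\\\{" in source is the 9-character text \\}\\,\\{
        String fourBackslashes = "\\\\}\\\\,\\\\{";

        // Layer 2 (SQL literal): one more level of backslashes is consumed.
        String failing = unescapeSqlLiteral(twoBackslashes);  // },{
        String working = unescapeSqlLiteral(fourBackslashes); // \}\,\{

        System.out.println(failing + " compiles: " + compiles(failing));
        System.out.println(working + " compiles: " + compiles(working));
    }
}
```

Running main prints "},{ compiles: false" followed by "\}\,\{ compiles: true", which matches the failing and the working form of the query.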

3. Solution

scala> spark.sql("SELECT split(regexp_replace(regexp_replace(data,'\\\\[\\\\]',''),'\\\\}\\\\,\\\\{','\\\\}\\\\;\\\\{'),';') from tt").show
      
+----------------------------------------------------------------------+        
|split(regexp_replace(regexp_replace(data, \[\], ), \}\,\{, \}\;\{), ;)|
+----------------------------------------------------------------------+
|                                                  [[{"website":"www...|
|                                                  [{"movie":"594","...|
+----------------------------------------------------------------------+
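To see what the corrected patterns do to a row, the three steps of the query can be replayed locally with plain java.util.regex calls. This is only a sketch: splitObjects is a hypothetical helper name, and the input is a trimmed, ASCII-only version of the second row of tt (the name fields are left out).

```java
public class SplitJsonArray {

    // Replays the query's three steps: two regex replacements, then a split.
    static String[] splitObjects(String data) {
        // Regex \[\] matches only the two-character text "[]", not single brackets.
        String step1 = data.replaceAll("\\[\\]", "");
        // Regex \}\,\{ matches "},{"; the replacement \}\;\{ produces "};{".
        String step2 = step1.replaceAll("\\}\\,\\{", "\\}\\;\\{");
        // The semicolon inserted above becomes the split point.
        return step2.split(";");
    }

    public static void main(String[] args) {
        // Trimmed sample row (assumption: name fields omitted for brevity).
        String row = "[{\"website\":\"www.baidu.com\"},{\"website\":\"google.com\"}]";
        for (String part : splitObjects(row)) {
            System.out.println(part);
        }
    }
}
```

The two printed parts still carry the outer '[' and ']', just like the Hive result, because the first pattern only removes an empty "[]" pair. If stripping the brackets was the intent, a character class such as [\[\]] would be needed instead.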

