Spark SQL error: Caused by: java.util.regex.PatternSyntaxException: Illegal repetition near index 1
I. Problem description
In spark-shell, running SQL to parse JSON-formatted strings stored in a Hive table with the following statement:
spark.sql("SELECT split(regexp_replace(regexp_replace(data,'\\[\\]',''),'\\}\\,\\{','\\}\\;\\{'),';') from tt").show
fails with:
Caused by: java.util.regex.PatternSyntaxException: Illegal repetition near index 1
},{
^
at java.util.regex.Pattern.error(Pattern.java:1957)
at java.util.regex.Pattern.closure(Pattern.java:3159)
at java.util.regex.Pattern.sequence(Pattern.java:2136)
at java.util.regex.Pattern.expr(Pattern.java:1998)
at java.util.regex.Pattern.compile(Pattern.java:1698)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
However, the same query runs without problems in Hive:
select * from tt;
+----------------------------------------------------+
| tt.data |
+----------------------------------------------------+
| {"movie":"594","rate":"4","timeStamp":"978302268","uid":"1"} |
| [{"website":"www.baidu.com","name":"百"},{"website":"google.com","name":"谷歌"}] |
+----------------------------------------------------+
SELECT split(regexp_replace(regexp_replace(data,'\\[\\]',''),'\\}\\,\\{','\\}\\;\\{'),';') from tt;
+----------------------------------------------------+
| _c0 |
+----------------------------------------------------+
| ["{\"movie\":\"594\",\"rate\":\"4\",\"timeStamp\":\"978302268\",\"uid\":\"1\"}"] |
| ["[{\"website\":\"www.baidu.com\",\"name\":\"百\"}","{\"website\":\"google.com\",\"name\":\"谷歌\"}]"] |
+----------------------------------------------------+
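II. Cause
In spark-shell the SQL statement is a Scala string literal, so its backslashes are unescaped twice before the regex engine sees them. First the Scala compiler turns each '\\' into a single '\', so Spark receives '\}\,\{'. Then Spark SQL's parser unescapes the SQL string literal and drops the backslash in front of characters such as '}' and '{', so java.util.regex receives '},{', which is exactly the pattern quoted in the error. A bare '{' there is read as the start of a repetition quantifier (like {2}), hence "Illegal repetition". The Hive CLI unescapes the literal only once, which is why the same statement works in Hive. The fix is to put another '\' in front of every '\' in the Scala source: '\\\\' becomes '\\' after Scala and '\' after the SQL parser.
III. Solution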
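The escaping layers described above can be reproduced in plain Scala without Spark. This is a minimal sketch: it does not call Spark's parser, it simply assumes (as the error message shows) that the parser hands '},{' to java.util.regex. The object and member names are illustrative only.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

object EscapeLayers {
  // Layer 1 (Scala compiler): "\\}" in source code becomes the two characters \ and }.
  // This is the pattern text the original statement hands to spark.sql.
  val afterScala: String = "\\}\\,\\{"

  // Layer 2 (Spark SQL parser): assumed from the error message, the backslashes
  // are gone by the time the regex engine receives the pattern.
  val afterSparkParser: String = "},{"

  // Layer 3 (java.util.regex): true if the text is a valid Java regex.
  def compiles(regex: String): Boolean =
    try { Pattern.compile(regex); true }
    catch { case _: PatternSyntaxException => false }

  def main(args: Array[String]): Unit = {
    println(s"after Spark parser: $afterSparkParser compiles=${compiles(afterSparkParser)}")
    // After the fix ('\\\\}' in Scala source), Spark's parser receives \\} and
    // unescapes it to \}, so the regex engine sees the same text as afterScala.
    println(s"after the fix:      $afterScala compiles=${compiles(afterScala)}")
    println(Pattern.compile(afterScala).matcher("{a},{b}").replaceAll("};{"))
  }
}
```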
scala> spark.sql("SELECT split(regexp_replace(regexp_replace(data,'\\\\[\\\\]',''),'\\\\}\\\\,\\\\{','\\\\}\\\\;\\\\{'),';') from tt").show
+----------------------------------------------------------------------+
|split(regexp_replace(regexp_replace(data, \[\], ), \}\,\{, \}\;\{), ;)|
+----------------------------------------------------------------------+
| [[{"website":"www...|
| [{"movie":"594","...|
+----------------------------------------------------------------------+
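A possibly less error-prone alternative, sketched here as an assumption rather than what this post tested: Scala triple-quoted strings do not process backslash escapes, so the SQL can be written exactly as it is typed into the Hive CLI. The snippet below (plain Scala, no Spark) only checks that both spellings produce the identical string that would be passed to spark.sql; the object and value names are hypothetical.

```scala
object TripleQuoted {
  // The solution above: every backslash doubled once more for the Scala compiler.
  val escaped: String =
    "SELECT split(regexp_replace(regexp_replace(data,'\\\\[\\\\]',''),'\\\\}\\\\,\\\\{','\\\\}\\\\;\\\\{'),';') from tt"

  // Triple-quoted strings keep backslashes verbatim, so this is the Hive query as-is.
  val raw: String =
    """SELECT split(regexp_replace(regexp_replace(data,'\\[\\]',''),'\\}\\,\\{','\\}\\;\\{'),';') from tt"""

  def main(args: Array[String]): Unit = {
    // Both are the same text; in spark-shell either could be run with spark.sql(raw).show
    println(escaped == raw)
  }
}
```

Another route, also not tried in this post, is the Spark SQL setting spark.sql.parser.escapedStringLiterals=true, which turns off the parser's own unescaping so the original statement from section I should behave as it does in Hive. Because it changes how every string literal in the session is parsed, it is worth testing before relying on it.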