1. Background
Environment | Server | MySQL | mysql-connector-java.jar |
Local | macOS | 5.7.32 | 5.1.48 |
Production | CentOS | 5.7.32 | 5.1.48 |
utf8mb4 is supported from MySQL 5.5.3 onward. If your MySQL is older than that, upgrade it first.
2. Problem
Everything I found online for the following error points clearly to MySQL's handling of emoji:
22/05/30 14:47:41 WARN [task-result-getter-0] TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, dasd-test-100, executor 1): java.sql.SQLException: Incorrect string value: '\xF0\x9F\x8E\x81' for column 'nick_name' at row 1 Query:
at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:527)
at org.apache.commons.dbutils.QueryRunner.batch(QueryRunner.java:195)
at org.apache.commons.dbutils.QueryRunner.batch(QueryRunner.java:151)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2118)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2118)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
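The code ran without problems locally, but failed in production. At first I suspected the server itself, thinking it might be mishandling emoji, but the material I found showed that was not the cause. One pitfall to note: MySQL's utf8 is not the same as Java's UTF-8. MySQL's utf8 stores at most 3 bytes per character, while real UTF-8 uses up to 4.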
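To make the byte-length difference concrete, here is a minimal Scala sketch that prints the UTF-8 bytes of the emoji from the error message. The object name `EmojiBytes` and its helper methods are made up for this illustration; only standard JDK calls are used.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper object, for illustration only.
object EmojiBytes {
  // Render a string's UTF-8 bytes in the same \xNN style MySQL uses in its error message.
  def utf8Hex(s: String): String =
    s.getBytes(StandardCharsets.UTF_8).map(b => "\\x" + "%02X".format(b & 0xff)).mkString

  // Characters outside the Basic Multilingual Plane are stored as surrogate pairs
  // in a Java String and take 4 bytes in UTF-8, which MySQL's 3-byte utf8 cannot hold.
  def needsUtf8mb4(s: String): Boolean =
    s.exists(c => Character.isSurrogate(c))

  def main(args: Array[String]): Unit = {
    val gift = "\uD83C\uDF81" // U+1F381, the emoji from the stack trace
    println(utf8Hex(gift))         // prints \xF0\x9F\x8E\x81
    println(needsUtf8mb4(gift))    // prints true
    println(needsUtf8mb4("nick"))  // prints false
  }
}
```

The four bytes match the `'\xF0\x9F\x8E\x81'` value in the exception, so the string leaves Java intact and is rejected only when MySQL tries to store it in a 3-byte character set.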
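If characterEncoding=utf8 is not specified when the JDBC connection is set up, the driver detects the encoding automatically and picks a matching one. Connector/J takes it from the server's character_set_server setting, which may explain why the same code behaved differently on the two machines.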
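One way to avoid relying on auto-detection is to pin the encoding in the JDBC URL. The sketch below only builds such a URL. The object name `JdbcUrl` and the host, port, and database values are hypothetical. As far as I understand the Connector/J 5.1 documentation, `characterEncoding=UTF-8` maps to utf8mb4 from 5.1.47 onward, while older versions still depend on the server's `character_set_server`, so treat this as an assumption to verify against your driver version.

```scala
// Hypothetical helper object, for illustration only.
object JdbcUrl {
  // Builds a MySQL JDBC URL that states the character encoding explicitly
  // instead of leaving it to the driver's auto-detection.
  def mysqlUrl(host: String, port: Int, db: String): String =
    s"jdbc:mysql://$host:$port/$db?useUnicode=true&characterEncoding=UTF-8"

  def main(args: Array[String]): Unit = {
    // Placeholder host and database name; replace with real values.
    println(mysqlUrl("db.example.internal", 3306, "demo"))
  }
}
```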
3. Solution
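- Server: check that MySQL's character set is utf8mb4. In MySQL, utf8 stores at most 3 bytes per character, while an emoji takes 4 bytes.
- Client: check that the Java code is UTF-8 encoded.
- Connection: the Connection must use utf8mb4. In my case this meant issuing SET NAMES utf8mb4 on the connection before the batch write:

    import java.sql.Connection
    import org.apache.commons.dbutils.QueryRunner

    val qr: QueryRunner = new QueryRunner()
    val conn: Connection = DbInstance.getDataSource.getConnection
    // Switch this session to utf8mb4 before writing rows that may contain emoji
    qr.update(conn, "SET NAMES utf8mb4")
    qr.batch(conn, sql, params)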
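The server and connection checks in the list above can be sketched in code as well. The following is a minimal example using plain `java.sql`. The object name `CharsetCheck` and its methods are made up for illustration, and limiting the check to `character_set_client` and `character_set_connection` is my own assumption about which session variables matter for writes.

```scala
import java.sql.Connection
import scala.collection.mutable

// Hypothetical helper object, for illustration only.
object CharsetCheck {
  // Session variables assumed here to matter when writing emoji.
  val relevant: Seq[String] = Seq("character_set_client", "character_set_connection")

  // Reads the character_set_* variables as seen by this connection's session.
  def charsetVariables(conn: Connection): Map[String, String] = {
    val stmt = conn.createStatement()
    try {
      val rs = stmt.executeQuery("SHOW VARIABLES LIKE 'character_set_%'")
      val vars = mutable.Map.empty[String, String]
      while (rs.next()) vars += rs.getString(1) -> rs.getString(2)
      vars.toMap
    } finally stmt.close() // closing the statement also closes its result set
  }

  // Returns the names of relevant variables that are missing or not set to utf8mb4.
  def notUtf8mb4(vars: Map[String, String]): Seq[String] =
    relevant.filter(name => vars.get(name) != Some("utf8mb4"))
}
```

Calling `CharsetCheck.notUtf8mb4(CharsetCheck.charsetVariables(conn))` right after `SET NAMES utf8mb4` should return an empty list. A non-empty result names the variables still using another character set. The column itself (`nick_name` in the error above) also has to be utf8mb4, which this sketch does not check.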