Error message:
2021-08-06 15:18:30 : Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.sql.BatchUpdateException: Incorrect string value: '\xF0\x9F\x93\xB1\xE7\x8F...' for column 'abc993' at row 7773
at com.gbase.jdbc.PreparedStatement.executeBatchedInserts(PreparedStatement.java:1825)
at com.gbase.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1450)
at org.apache.spark.sql.execution.datasources.jdbc.CntJdbcUtilsEnhance$.savePartitionWithCnt(CntJdbcUtilsEnhance.scala:117)
at org.apache.spark.sql.execution.datasources.jdbc.CntJdbcUtilsEnhance$$anonfun$saveTableWithCnt$1.apply(CntJdbcUtilsEnhance.scala:44)
at org.apache.spark.sql.execution.datasources.jdbc.CntJdbcUtilsEnhance$$anonfun$saveTableWithCnt$1.apply(CntJdbcUtilsEnhance.scala:44)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748). Driver stacktrace:
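The escaped bytes in the message can be decoded to see which character was rejected. The following is a minimal sketch, not part of the original job: `has_4byte_chars` is a hypothetical helper name, and only the first four bytes are used because the message is truncated after `\xE7\x8F`.

```python
# First four bytes copied from the error message; the remainder is truncated there.
raw = b"\xF0\x9F\x93\xB1"

# Decode as UTF-8 to find the code point behind the rejected bytes.
ch = raw.decode("utf-8")
print(f"U+{ord(ch):04X}, {len(raw)} bytes in UTF-8")  # U+1F4F1, 4 bytes in UTF-8


def has_4byte_chars(text: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane.

    Such characters need 4 bytes in UTF-8, which is the kind of value
    the target column rejected.
    """
    return any(ord(c) > 0xFFFF for c in text)
```

A check like this could be run over a sample of the source column to confirm that 4-byte characters such as emoji are what triggers the failure.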
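Solution: this is a character set problem. The data in the Oracle table may contain emoji or other special characters; the bytes \xF0\x9F\x93\xB1 in the error are the 4-byte UTF-8 encoding of an emoji (U+1F4F1), which the target GBase column apparently cannot store. Apply Oracle's character set conversion function CONVERT to the failing column abc993 in the source query, then alias the result back to the original column name: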
CONVERT(ABC993, 'AL24UTFFSS') abc993
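To show where that fragment sits in a full source query, here is a minimal sketch that wraps selected columns in CONVERT and aliases them back. The helper `build_select`, the table name `SRC_TABLE`, and the column `ID` are hypothetical and used only for illustration; only `ABC993` and `AL24UTFFSS` come from the note above.

```python
def build_select(table, columns, convert_columns, charset="AL24UTFFSS"):
    """Build a SELECT that wraps the given columns in Oracle's CONVERT.

    Each converted column is aliased to its lowercase name so that
    downstream column names stay unchanged.
    """
    to_convert = {c.upper() for c in convert_columns}
    parts = []
    for col in columns:
        if col.upper() in to_convert:
            parts.append(f"CONVERT({col.upper()}, '{charset}') {col.lower()}")
        else:
            parts.append(col)
    return f"SELECT {', '.join(parts)} FROM {table}"


# Hypothetical table and ID column; ABC993 is the column named in the error.
query = build_select("SRC_TABLE", ["ID", "ABC993"], ["abc993"])
print(query)
```

The resulting string can be used as the source query of the extraction job. It is worth checking on a few sample rows how the converted values look, since characters that the destination character set cannot represent may come back as replacement characters.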