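Plenty of code for writing data from Spark SQL to MySQL is already available online. The data I was saving contained Chinese text, which caused an encoding problem: the import succeeded, but the Chinese characters came out garbled. This post records a fairly concise fix.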
First, here is the garbled data:
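Solution:
- First, create the table in MySQL yourself, with columns that match the fields being saved.
- Then set the character set of each column that holds Chinese text. Note that it is the columns that must be set. Changing only the table's character set to UTF8 still fails, because ALTER TABLE ... DEFAULT CHARACTER SET only changes the default for columns added later, and existing columns keep their original character set.
- Finally, adjust the Spark SQL code.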
Create the table:
CREATE TABLE Top3Goods(
area varchar(30),
product_name varchar(30),
total_clicks bigint,
city_remarks varchar(50));
# In CHANGE, the first name is the old column name and the second is the new column name
alter table Top3Goods change area area varchar(30) character set utf8;
alter table Top3Goods change product_name product_name varchar(30) character set utf8;
alter table Top3Goods change city_remarks city_remarks varchar(50) character set utf8;
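With more than a few text columns, writing these statements by hand gets repetitive. The Scala sketch below generates them from a list of (column, type) pairs. The object and method names and the column list are illustrative assumptions that mirror the Top3Goods example. The output is only meant to be pasted into a MySQL client.

```scala
// Hypothetical helper: builds one ALTER TABLE statement per text column,
// following the "change <old> <new> <type> character set utf8" pattern above.
object Utf8ColumnDdl {
  // Each pair is (column name, column type). These are copied from the
  // Top3Goods example, not read from a live database.
  def alterStatements(table: String, columns: Seq[(String, String)]): Seq[String] =
    columns.map { case (name, colType) =>
      // Old and new names are identical, so only the type and charset change.
      s"alter table $table change $name $name $colType character set utf8;"
    }

  def main(args: Array[String]): Unit = {
    val textColumns = Seq(
      "area"         -> "varchar(30)",
      "product_name" -> "varchar(30)",
      "city_remarks" -> "varchar(50)"
    )
    // Prints the three statements shown above, one per line.
    alterStatements("Top3Goods", textColumns).foreach(println)
  }
}
```

Only columns that actually hold Chinese text need to be listed. Numeric columns such as total_clicks have no character set.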
Spark SQL code: append the UTF-8 encoding settings to the end of the JDBC URL:
?useUnicode=true&characterEncoding=utf8
.mode(SaveMode.Append) means rows are appended to the existing table.
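import org.apache.spark.sql.{DataFrame, SaveMode}
// Assumes an existing SparkSession named spark and a table or temp view named tmp4
val frame: DataFrame = spark.sql("select * from tmp4")
frame.show()
// Parameters needed for the MySQL connection: URL, table name, driver class, user name, password
// project1 is the database
val url = "jdbc:mysql://192.168.67.161:3306/project1?useUnicode=true&characterEncoding=utf8"
// Target table name
val table = "Top3Goods"
// Connector/J 5.x driver class (Connector/J 8.x uses com.mysql.cj.jdbc.Driver)
val driver = "com.mysql.jdbc.Driver"
val user = "root"
val password = "123456"
// Append to the pre-created table. If the table did not exist, Spark would create it itself,
// and its text columns would get the database's default character set instead of utf8.
frame.write.format("jdbc").option("url", url).option("driver", driver)
  .option("dbtable", table).option("user", user).option("password", password)
  .mode(SaveMode.Append)
  .save()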
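On the Spark side, the fix only changes the query string on the JDBC URL. Here is a minimal sketch that assumes you build URLs from a base string. A small helper (the name withUtf8 is hypothetical) adds the two parameters without breaking URLs that already carry other options:

```scala
// Hypothetical helper: appends the encoding parameters used above to a MySQL JDBC URL.
object JdbcUrls {
  val EncodingParams = "useUnicode=true&characterEncoding=utf8"

  def withUtf8(baseUrl: String): String =
    if (baseUrl.contains("characterEncoding=")) baseUrl          // already set: leave it alone
    else if (baseUrl.contains("?")) s"$baseUrl&$EncodingParams"  // extend an existing query string
    else s"$baseUrl?$EncodingParams"                             // start a new query string

  def main(args: Array[String]): Unit = {
    // Produces the same URL as the one hard-coded in the Spark code above.
    println(withUtf8("jdbc:mysql://192.168.67.161:3306/project1"))
  }
}
```

The check for an existing characterEncoding= keeps the helper from adding a second, possibly conflicting value.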
The saved data:
Finally, here is the error you get if you change only the table's character set:
alter table Top3Goods default character set utf8;
Error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 14 in stage 21.0 failed 1 times, most recent failure: Lost task 14.0 in stage 21.0 (TID 606, LAPTOP-I491QSKF, executor driver): java.sql.BatchUpdateException: Incorrect string value: '\xE5\x8D\x8E\xE4\xB8\x9C' for column 'area' at row 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
at com.mysql.jdbc.Util.getInstance(Util.java:387)
Caused by: java.sql.SQLException: Incorrect string value: '\xE5\x8D\x8E\xE4\xB8\x9C' for column 'area' at row 1
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:959)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3870)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3806)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2470)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2617)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2550)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1751)
... 16 more
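The bytes in the message are worth decoding. They are well-formed UTF-8, which suggests the JDBC driver sent the text correctly and the area column itself could not store it. Presumably that column still used the server's default character set (latin1 on many MySQL 5.x installations; this is an assumption about the setup in this post). The short Scala check below decodes the bytes both ways:

```scala
import java.nio.charset.StandardCharsets

object DecodeErrorBytes {
  // The byte sequence quoted in "Incorrect string value: '\xE5\x8D\x8E\xE4\xB8\x9C'".
  val errorBytes: Array[Byte] = Array(0xE5, 0x8D, 0x8E, 0xE4, 0xB8, 0x9C).map(_.toByte)

  // Interpreted as UTF-8, the six bytes are two valid 3-byte characters.
  val asUtf8: String = new String(errorBytes, StandardCharsets.UTF_8)

  // Interpreted as ISO-8859-1 (one byte per character), the same bytes become mojibake.
  val asLatin1: String = new String(errorBytes, StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    println(asUtf8)          // 华东 (U+534E U+4E1C)
    println(asLatin1.length) // 6
  }
}
```

Decoded as UTF-8, the six bytes give 华东 ("East China"). Read one byte per character, they give six unrelated Latin-1 characters, which is the same kind of mismatch that produces garbled text.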