一、spark.write.mode().jdbc(): query, then write
1、Code
package com.cn.sparkSql

import java.util.Properties

import org.apache.spark.sql.{Dataset, Row, SaveMode, SparkSession}

object SparksqlWriteToMysql {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("writeToMysql")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    // MySQL connection settings
    val properties = new Properties()
    properties.setProperty("user", "root")
    properties.setProperty("password", "123456")
    val url = "jdbc:mysql://192.168.230.21:3306/test?characterEncoding=utf8&useSSL=true"

    // Read walk_info and keep only rows with newwalknum > 1
    val df: Dataset[Row] = spark.read.jdbc(url, "walk_info", properties)
      .where("newwalknum > 1")

    // Append into walk_info_tmp (created automatically if it does not exist)
    df.write.mode(SaveMode.Append).jdbc(url, "walk_info_tmp", properties)
    spark.stop()
  }
}
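A side note on the read: a parenthesized subquery can be passed as the "table" name, which guarantees the WHERE clause runs inside MySQL so only matching rows travel over JDBC. This is a sketch, not from the original listing, reusing the same url and properties names:

```scala
// Sketch: hand MySQL a subquery instead of a bare table name.
// The alias ("AS t") is required by MySQL for derived tables.
val pushdownTable = "(SELECT * FROM walk_info WHERE newwalknum > 1) AS t"

// Usage (requires a reachable MySQL, same url/properties as above):
//   val df = spark.read.jdbc(url, pushdownTable, properties)
```

Spark's JDBC source can also push simple filters down on its own, so treat this as an explicit alternative rather than a required optimization.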
2、Notes
(1) walk_info_tmp is created automatically, but the column types change:
mysql> desc walk_info;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| user | varchar(20) | YES | | NULL | |
| counttime | varchar(40) | YES | | NULL | |
| walkplace | varchar(100) | YES | | NULL | |
| newwalknum | int(20) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> desc walk_info_tmp;
+------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| user | text | YES | | NULL | |
| counttime | text | YES | | NULL | |
| walkplace | text | YES | | NULL | |
| newwalknum | int(11) | YES | | NULL | |
+------------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
(2) Inserting Chinese text during execution raises the following error:
java.sql.BatchUpdateException: Incorrect string value: '\xE6\x93\x8D\xE5\x9C\xBA...' for column 'walkplace' at row 1
Fix: convert the column to a utf8mb4 VARCHAR:
mysql> ALTER TABLE walk_info_tmp CHANGE walkplace walkplace VARCHAR(64) CHARACTER SET utf8mb4 ;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
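The ALTER TABLE workaround can often be avoided up front. One option (a sketch, not in the original article) is to tell Spark which column types to use when it creates the table, via the JDBC writer's createTableColumnTypes option, and to make the JDBC URL explicit about Unicode:

```scala
// Sketch: have Spark create VARCHAR columns instead of TEXT.
// The column list mirrors the walk_info definition shown above.
val columnTypes = "user VARCHAR(20), counttime VARCHAR(40), walkplace VARCHAR(100)"
val utf8Url = "jdbc:mysql://192.168.230.21:3306/test" +
  "?useUnicode=true&characterEncoding=utf8&useSSL=true"

// Usage (replaces the plain df.write...jdbc(...) call above):
//   df.write
//     .mode(SaveMode.Append)
//     .option("createTableColumnTypes", columnTypes)
//     .jdbc(utf8Url, "walk_info_tmp", properties)
```

Whether the VARCHAR columns end up utf8mb4 still depends on the database's default character set, so for 4-byte characters the ALTER shown above may still be necessary.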
二、spark.write.mode().jdbc(): build a DataFrame, then write
1、Code
package com.cn.sparkSql

import java.util.Properties

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}

object SparksqlSaveMysql {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("writeToMysql")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    val properties = new Properties()
    properties.setProperty("user", "root")
    properties.setProperty("password", "123456")
    val url = "jdbc:mysql://192.168.230.21:3306/test?characterEncoding=utf8&useSSL=true"

    // Build an RDD of tuples, then convert each tuple to a Row
    val rdd: RDD[(String, String, String, Int)] = spark.sparkContext.parallelize(Array(
      ("zhangSan", "2020-07-02 13:44:43", "草场北门", 2),
      ("Lisi", "2020-07-03 13:44:43", "草场南门", 5),
      ("WangWu", "2020-07-04 13:44:43", "草场西门", 8),
      ("Maliu", "2020-07-05 13:44:43", "草场东门", 3)
    ))
    val rdd2: RDD[Row] = rdd.map(e => Row(e._1, e._2, e._3, e._4))

    // Explicit schema matching the tuple layout
    val schema = StructType(List(
      StructField("user", StringType, true),
      StructField("counttime", StringType, true),
      StructField("walkplace", StringType, true),
      StructField("newwalknum", IntegerType, true)
    ))
    val df: DataFrame = spark.createDataFrame(rdd2, schema)

    df.write.mode(SaveMode.Append).jdbc(url, "walk_info_test", properties)
    spark.stop()
  }
}
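The Row-plus-StructType dance above can be shortened with a case class and toDF, letting Spark infer the schema. A sketch under the same assumptions (local master, same MySQL url and properties):

```scala
// Sketch: schema inference via a case class instead of an explicit StructType.
case class Walk(user: String, counttime: String, walkplace: String, newwalknum: Int)

val rows = Seq(
  Walk("zhangSan", "2020-07-02 13:44:43", "草场北门", 2),
  Walk("Lisi", "2020-07-03 13:44:43", "草场南门", 5)
)

// Usage (inside main, with the SparkSession from the listing above):
//   import spark.implicits._
//   val df = rows.toDF()
//   df.write.mode(SaveMode.Append).jdbc(url, "walk_info_test", properties)
```

The inferred column names come from the case-class field names, so they match the walk_info_test layout shown below.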
2、Notes
(1) walk_info_test is created automatically:
mysql> desc walk_info_test;
+------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| user | text | YES | | NULL | |
| counttime | text | YES | | NULL | |
| walkplace | text | YES | | NULL | |
| newwalknum | int(11) | YES | | NULL | |
+------------+---------+------+-----+---------+-------+
4 rows in set (0.01 sec)
(2) Execution fails with the same error as the first approach:
java.sql.BatchUpdateException: Incorrect string value: '\xE6\x93\x8D\xE5\x9C\xBA...' for column 'walkplace' at row 1
Fix:
mysql> ALTER TABLE walk_info_test CHANGE walkplace walkplace VARCHAR(200) CHARACTER SET utf8mb4 ;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> desc walk_info_test;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| user | text | YES | | NULL | |
| counttime | text | YES | | NULL | |
| walkplace | varchar(200) | YES | | NULL | |
| newwalknum | int(11) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> select * from walk_info_test;
Empty set (0.00 sec)
三、What the save modes mean
SaveMode.ErrorIfExists (the default): if the table already exists in the database, an exception is thrown immediately and no data is written;
SaveMode.Append: if the table already exists, rows are appended to it; if not, the table is created first and the data then inserted;
SaveMode.Overwrite: the existing table and all of its data are dropped, the table is recreated, and the new data is inserted;
SaveMode.Ignore: if the table does not exist, it is created and the data written; if it already exists, the write is silently skipped with no error.
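One related writer option worth knowing — an addition not in the original text, based on the standard Spark JDBC writer options — is truncate. Combined with SaveMode.Overwrite, it empties the existing table instead of dropping and recreating it, so manually adjusted column types (like the utf8mb4 VARCHAR above) survive the next run:

```scala
// Sketch: "truncate" only takes effect together with SaveMode.Overwrite.
// Usage (with the df/url/properties from the listings above):
//   df.write
//     .mode(SaveMode.Overwrite)
//     .option("truncate", "true")
//     .jdbc(url, "walk_info_tmp", properties)
val truncateOption = ("truncate", "true")
```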
四、Caveats
(1) If the save mode is set to SaveMode.Overwrite, the fix above cannot stick: even after applying it, the next run drops and recreates the same table, and the same error comes back;
(2) If you create the MySQL table manually in advance, mind the correspondence between Spark SQL schema types and MySQL column types, as shown in the figure below:
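The figure referenced above is not reproduced here. As a rough guide — based on the default JDBC type mappings, so treat it as an assumption to verify against your Spark version — the common Spark SQL types map to MySQL as follows:

```scala
// Sketch: commonly observed defaults when Spark auto-creates a MySQL table.
val sparkToMysql = Map(
  "StringType"    -> "TEXT",      // as seen in the desc output above
  "IntegerType"   -> "INT(11)",   // as seen in the desc output above
  "LongType"      -> "BIGINT",
  "DoubleType"    -> "DOUBLE PRECISION",
  "FloatType"     -> "REAL",
  "BooleanType"   -> "BIT(1)",
  "TimestampType" -> "TIMESTAMP",
  "DateType"      -> "DATE"
)
```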