Two ways to write data into MySQL with Spark SQL

I. Read over JDBC, then write back with spark.write.mode().jdbc()

1. Code

package com.cn.sparkSql

import java.util.Properties

import org.apache.spark.sql.{Dataset, Row, SaveMode, SparkSession}

object SparksqlWriteToMysql {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("writeToMysql")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    val properties = new Properties()
    properties.setProperty("user","root")
    properties.setProperty("password","123456")
    val url="jdbc:mysql://192.168.230.21:3306/test?characterEncoding=utf8&useSSL=true"

    // Load the source table over JDBC and keep only the rows we want
    val df: Dataset[Row] = spark.read.jdbc(url, "walk_info", properties)
      .where("newwalknum > 1")

    // Append the filtered rows into walk_info_tmp (created automatically if absent)
    df.write.mode(SaveMode.Append).jdbc(url, "walk_info_tmp", properties)

    spark.stop()
  }
}

2. Notes

(1) walk_info_tmp is created automatically, but the column types differ from the source table:

mysql> desc walk_info;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| user       | varchar(20)  | YES  |     | NULL    |       |
| counttime  | varchar(40)  | YES  |     | NULL    |       |
| walkplace  | varchar(100) | YES  |     | NULL    |       |
| newwalknum | int(20)      | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> desc walk_info_tmp;
+------------+---------+------+-----+---------+-------+
| Field      | Type    | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| user       | text    | YES  |     | NULL    |       |
| counttime  | text    | YES  |     | NULL    |       |
| walkplace  | text    | YES  |     | NULL    |       |
| newwalknum | int(11) | YES  |     | NULL    |       |
+------------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
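If the TEXT / int(11) defaults are undesirable, the JDBC writer's createTableColumnTypes option (available since Spark 2.2) lets you pin the column types Spark uses when it auto-creates the table. A minimal sketch, reusing the df, url, and properties from the code above:

```scala
// Sketch: control the column types Spark uses when creating the target table,
// instead of letting every string column default to TEXT.
// Assumes `df`, `url`, and `properties` are defined as in the example above.
df.write
  .mode(SaveMode.Append)
  .option("createTableColumnTypes",
    "user VARCHAR(20), counttime VARCHAR(40), walkplace VARCHAR(100), newwalknum INT")
  .jdbc(url, "walk_info_tmp", properties)
```

The option only affects table creation; if walk_info_tmp already exists, it is ignored.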

(2) Inserting Chinese text fails at runtime with the following error:

java.sql.BatchUpdateException: Incorrect string value: '\xE6\x93\x8D\xE5\x9C\xBA...' for column 'walkplace' at row 1

Solution: change the offending column to a utf8mb4 VARCHAR:

mysql>  ALTER TABLE walk_info_tmp CHANGE walkplace walkplace VARCHAR(64) CHARACTER SET utf8mb4 ;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

II. Build a DataFrame by hand, then write with spark.write.mode().jdbc()

1. Code

package com.cn.sparkSql

import java.util.Properties

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}

object SparksqlSaveMysql {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("writeToMysql")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    val properties = new Properties()
    properties.setProperty("user","root")
    properties.setProperty("password","123456")
    val url="jdbc:mysql://192.168.230.21:3306/test?characterEncoding=utf8&useSSL=true"

    val rdd: RDD[(String, String, String, Int)] = spark.sparkContext.parallelize(Array(("zhangSan", "2020-07-02 13:44:43", "草场北门", 2),
      ("Lisi", "2020-07-03 13:44:43", "草场南门", 5),
      ("WangWu", "2020-07-04 13:44:43", "草场西门", 8),
      ("Maliu", "2020-07-05 13:44:43", "草场东门", 3)
    ))
    // Convert the tuples to Rows so they can be paired with an explicit schema
    val rdd2: RDD[Row] = rdd.map(e => {
      Row(e._1, e._2, e._3, e._4)
    })

    // Schema for the four columns; these types drive the auto-created MySQL table
    val schema = StructType {
      List(
        StructField("user", StringType, true),
        StructField("counttime", StringType, true),
        StructField("walkplace", StringType, true),
        StructField("newwalknum", IntegerType, true)
      )
    }

    val df: DataFrame = spark.createDataFrame(rdd2, schema)
    df.write.mode(SaveMode.Append).jdbc(url, "walk_info_test", properties)
    spark.stop()
  }
}
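The RDD[Row]-plus-StructType dance can be shortened with Spark's implicits: a Seq of tuples converts straight to a DataFrame with named columns, and the types are inferred from the tuple. A sketch under the same assumptions (a running SparkSession named spark, and the same url and properties):

```scala
// Sketch: build the same DataFrame with toDF instead of an explicit schema.
import spark.implicits._  // requires a SparkSession instance named `spark` in scope

val df = Seq(
  ("zhangSan", "2020-07-02 13:44:43", "草场北门", 2),
  ("Lisi",     "2020-07-03 13:44:43", "草场南门", 5)
).toDF("user", "counttime", "walkplace", "newwalknum")

df.write.mode(SaveMode.Append).jdbc(url, "walk_info_test", properties)
```

toDF infers StringType and IntegerType from the tuple elements, so the auto-created table ends up with the same column types as the explicit-schema version.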

2. Notes

(1) walk_info_test is created automatically:

mysql> desc walk_info_test;
+------------+---------+------+-----+---------+-------+
| Field      | Type    | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| user       | text    | YES  |     | NULL    |       |
| counttime  | text    | YES  |     | NULL    |       |
| walkplace  | text    | YES  |     | NULL    |       |
| newwalknum | int(11) | YES  |     | NULL    |       |
+------------+---------+------+-----+---------+-------+
4 rows in set (0.01 sec)

(2) The run fails with the same error as the first approach:

java.sql.BatchUpdateException: Incorrect string value: '\xE6\x93\x8D\xE5\x9C\xBA...' for column 'walkplace' at row 1

Solution:

mysql> ALTER TABLE walk_info_test CHANGE walkplace walkplace VARCHAR(200) CHARACTER SET utf8mb4 ;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> desc walk_info_test;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| user       | text         | YES  |     | NULL    |       |
| counttime  | text         | YES  |     | NULL    |       |
| walkplace  | varchar(200) | YES  |     | NULL    |       |
| newwalknum | int(11)      | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> select * from walk_info_test;
Empty set (0.00 sec)

III. What the save modes mean

SaveMode.ErrorIfExists (the default): if the table already exists, an exception is thrown and nothing is written;
SaveMode.Append: if the table exists, rows are appended to it; if not, the table is created first and the data then inserted;
SaveMode.Overwrite: the existing table and all of its data are dropped, the table is recreated, and the new data is inserted;
SaveMode.Ignore: if the table does not exist, it is created and the data written; if it does exist, the write is silently skipped without error.
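The same modes can also be passed as strings, which is convenient when the mode comes from configuration. A sketch, assuming df, url, and properties as in the examples above:

```scala
// Sketch: SaveMode values and the string equivalents accepted by .mode(String).
df.write.mode("error").jdbc(url, "t", properties)      // SaveMode.ErrorIfExists (default)
df.write.mode("append").jdbc(url, "t", properties)     // SaveMode.Append
df.write.mode("overwrite").jdbc(url, "t", properties)  // SaveMode.Overwrite
df.write.mode("ignore").jdbc(url, "t", properties)     // SaveMode.Ignore
```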

IV. Caveats

(1) With SaveMode.Overwrite the fix above no longer helps: every run drops and recreates the table, wiping out the ALTER TABLE charset change and reproducing the same error;
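One way around this (assuming Spark 2.1+ and a table whose schema does not change between runs) is the JDBC writer's truncate option: with Overwrite it issues TRUNCATE TABLE instead of DROP + CREATE, so a manually applied charset fix survives. A sketch:

```scala
// Sketch: Overwrite via TRUNCATE so the table definition (including the
// utf8mb4 column fixed with ALTER TABLE) is preserved rather than recreated.
df.write
  .mode(SaveMode.Overwrite)
  .option("truncate", "true")  // truncate instead of drop/create (Spark 2.1+)
  .jdbc(url, "walk_info_tmp", properties)
```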

(2) If you create the MySQL table manually ahead of time, pay attention to the correspondence between the data types in the Spark SQL schema and the MySQL column types.
