Spark 2.2 (8): Spark SQL Reading and Writing MySQL

This article shows how to read from and write to MySQL with Spark SQL, covering the connection configuration, an example requirement, and a complete code implementation. In practice, create the target table in advance and repartition the DataFrame according to your workload before saving, to avoid the overhead of default TEXT columns and efficiency loss from data skew.

1. Configuration

# MySQL database configuration
mysql.driver=com.mysql.jdbc.Driver
mysql.url=jdbc:mysql://localhost:3306/test?serverTimezone=UTC
mysql.user=******
mysql.password=******

2. Example Requirement

1. Sample data in the MySQL score table:
mysql> select * from score;
+-------+-------------+---------------+-------------+---------------+
| st_id | st_name     | chinese_score | match_score | english_score |
+-------+-------------+---------------+-------------+---------------+
|     1 | SunHui      |            84 |          91 |            71 |
|     2 | ZhouWen     |            91 |        74.5 |            94 |
|     4 | ZhengShuang |            86 |          73 |            79 |
|     5 | LiuHua      |            89 |          86 |            95 |
|     6 | ZhaoSi      |            79 |          88 |            81 |
|     7 | ZhangSan    |            85 |          96 |          92.5 |
|     8 | WeiZai      |            76 |          79 |            82 |
|     9 | WuFei       |            77 |          89 |            92 |
|    10 | ChenYe      |            89 |        76.5 |            84 |
|    11 | LiSi        |            81 |          90 |            93 |
|    12 | LiuChang    |            76 |          92 |            97 |
|    13 | WuHao       |            86 |          92 |            97 |
|    14 | HuangXin    |            87 |          76 |            98 |
|    15 | HuKai       |            73 |          89 |            90 |
|    16 | LiuBin      |            77 |          79 |            94 |
|    17 | XiaLe       |            76 |          89 |            84 |
|    18 | WangGang    |            76 |          94 |            87 |
+-------+-------------+---------------+-------------+---------------+
2. Requirement: compute the sum of each student's three subject scores and write the result to the total_score table.

3. Complete Code Implementation

package apply

import java.io.FileInputStream
import java.sql.{Connection, DriverManager}
import java.util.Properties

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

/**
  * @Program: apply
  * @Author: H.w
  * @Date: 2020/10/16
  * @description: Reading from and writing to MySQL with Spark
  */

object OperateDataFrameToMySQL {

  // Reduce console log output to errors only
  Logger.getLogger("org").setLevel(Level.ERROR)

  val spark = SparkSession
    .builder()
    .appName("")
    .master("local[*]")
    .getOrCreate()

  /**
    * Read a MySQL table into a DataFrame
    * @param spark active SparkSession
    * @param tableName name of the table to read
    * @param proPath path to the properties file
    * @return DataFrame backed by the JDBC source
    */
  def readMySQLTable(spark: SparkSession, tableName: String, proPath: String): DataFrame = {
    val properties = getProperties(proPath)
    spark.read
      .format("jdbc")
      .option("url", properties.getProperty("mysql.url"))
      .option("driver", properties.getProperty("mysql.driver"))
      .option("user", properties.getProperty("mysql.user"))
      .option("password", properties.getProperty("mysql.password"))
      .option("dbtable", tableName)
      .load()

  }

  /**
    * Save a DataFrame as a MySQL table
    * @param dataFrame data to save
    * @param tableName name of the target table
    * @param saveMode save mode
    * @param proPath path to the properties file
    */
  def saveAsMysqlTable(dataFrame: DataFrame, tableName: String, saveMode: SaveMode, proPath: String): Unit = {
    val properties = getProperties(proPath)
    // The keys in the properties file differ from the keys Spark expects,
    // so build a new Properties object using Spark's key names.
    val prop = new Properties()
    prop.setProperty("url", properties.getProperty("mysql.url"))
    prop.setProperty("driver", properties.getProperty("mysql.driver"))
    prop.setProperty("user", properties.getProperty("mysql.user"))
    prop.setProperty("password", properties.getProperty("mysql.password"))
    // When saveMode is Overwrite, the table must be cleared before writing.
    // There are two ways to do that:
    //   1. TRUNCATE: empty the table but keep its schema, then write.
    //   2. DROP: drop the table, recreate it, then write.
    // TRUNCATE is used here so the pre-created column types survive.
    if (saveMode == SaveMode.Overwrite) {
      var conn: Connection = null
      try {
        conn = DriverManager.getConnection(
          prop.getProperty("url"),
          prop.getProperty("user"),
          prop.getProperty("password")
        )
        val stmt = conn.createStatement()
        stmt.execute(s"truncate table $tableName")
      } catch {
        case e: Exception =>
          println("MySQL Error:")
          e.printStackTrace()
      } finally {
        if (conn != null) conn.close()
      }
    }
    dataFrame.write.mode(SaveMode.Append).jdbc(prop.getProperty("url"), tableName, prop)
  }

  /**
    * Load connection settings from a properties file
    * @param proPath path to the properties file
    * @return the loaded Properties
    */
  def getProperties(proPath: String): Properties = {
    val properties = new Properties()
    properties.load(new FileInputStream(proPath))
    properties
  }

  def main(args: Array[String]): Unit = {

    val proPath = "src/main/resources/job.properties"
    val scoreDF: DataFrame = readMySQLTable(spark, "score", proPath)

    scoreDF.createOrReplaceTempView("score")
    val totalScoreDF = spark.sql(
      "SELECT st_id, st_name, chinese_score + match_score + english_score AS total_score FROM score")
    saveAsMysqlTable(totalScoreDF, "total_score", SaveMode.Append, proPath)

  }

}
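
As written, readMySQLTable pulls the entire table through a single JDBC connection. For larger tables, Spark's JDBC source also supports partitioned reads via its standard partitionColumn/lowerBound/upperBound/numPartitions options. A minimal sketch, reusing getProperties from the code above; the bounds here are assumptions taken from the sample st_id range:

def readMySQLTableParallel(spark: SparkSession, tableName: String, proPath: String): DataFrame = {
  val properties = getProperties(proPath)
  spark.read
    .format("jdbc")
    .option("url", properties.getProperty("mysql.url"))
    .option("driver", properties.getProperty("mysql.driver"))
    .option("user", properties.getProperty("mysql.user"))
    .option("password", properties.getProperty("mysql.password"))
    .option("dbtable", tableName)
    .option("partitionColumn", "st_id") // must be a numeric column
    .option("lowerBound", "1")          // assumed from the sample data
    .option("upperBound", "18")         // assumed from the sample data
    .option("numPartitions", "4")       // four ranges, four parallel queries
    .load()
}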

4. Output

mysql> select * from total_score;
+-------+-------------+-------------+
| st_id | st_name     | total_score |
+-------+-------------+-------------+
|     1 | SunHui      |         246 |
|     2 | ZhouWen     |       259.5 |
|     4 | ZhengShuang |         238 |
|     5 | LiuHua      |         270 |
|     6 | ZhaoSi      |         248 |
|     7 | ZhangSan    |       273.5 |
|     8 | WeiZai      |         237 |
|     9 | WuFei       |         258 |
|    10 | ChenYe      |       249.5 |
|    11 | LiSi        |         264 |
|    12 | LiuChang    |         265 |
|    13 | WuHao       |         275 |
|    14 | HuangXin    |         261 |
|    15 | HuKai       |         252 |
|    16 | LiuBin      |         250 |
|    17 | XiaLe       |         249 |
|    18 | WangGang    |         257 |
+-------+-------------+-------------+

5. Summary

1. Create the target table in advance whenever possible. Otherwise the JDBC writer falls back to default column types, and the resulting TEXT columns waste resources. Compare the two schemas below (a pre-creation sketch follows them):

mysql> desc score;
+---------------+-------------+------+-----+---------+-------+
| Field         | Type        | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+-------+
| st_id         | int(11)     | NO   | PRI | NULL    |       |
| st_name       | varchar(20) | YES  |     | NULL    |       |
| chinese_score | double      | YES  |     | NULL    |       |
| match_score   | double      | YES  |     | NULL    |       |
| english_score | double      | YES  |     | NULL    |       |
+---------------+-------------+------+-----+---------+-------+
5 rows in set (0.07 sec)

mysql> desc total_score;
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| st_id       | int(11) | YES  |     | NULL    |       |
| st_name     | text    | YES  |     | NULL    |       |
| total_score | double  | YES  |     | NULL    |       |
+-------------+---------+------+-----+---------+-------+
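
A minimal sketch of pre-creating the table over a plain JDBC connection, reusing getProperties from the code above; the column widths are assumptions based on the sample data:

def createTotalScoreTable(proPath: String): Unit = {
  val p = getProperties(proPath)
  val conn = DriverManager.getConnection(
    p.getProperty("mysql.url"), p.getProperty("mysql.user"), p.getProperty("mysql.password"))
  try {
    // Explicit column types, so Spark's JDBC writer never falls back to TEXT
    conn.createStatement().execute(
      """CREATE TABLE IF NOT EXISTS total_score (
        |  st_id       INT,
        |  st_name     VARCHAR(20),
        |  total_score DOUBLE
        |)""".stripMargin)
  } finally {
    conn.close()
  }
}

Alternatively, Spark 2.2 and later accept a createTableColumnTypes JDBC option (e.g. .option("createTableColumnTypes", "st_name VARCHAR(20)")) that overrides the default type mapping when Spark creates the table itself.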

2. The built-in writer saves data partition by partition and opens one MySQL connection per partition, so the simplest optimization is to repartition the DataFrame before saving. Watch out for data skew, though, or the repartition may not improve efficiency; see the sketch below.
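
A minimal sketch of that optimization; the partition count of 8 is an assumed value to tune against your data volume and MySQL's connection limit:

// Bound the number of concurrent MySQL connections by fixing the
// partition count before the write. A skewed key distribution can still
// leave a few partitions doing most of the work, so check partition
// sizes when keys are unevenly distributed.
val balancedDF = totalScoreDF.repartition(8)
saveAsMysqlTable(balancedDF, "total_score", SaveMode.Append, proPath)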
