Spark applications can connect to a MySQL database over JDBC to read and save data.
1. Driver error
If the MySQL connector driver jar is missing from the spark/jars directory, Spark reports a driver-not-found error (e.g. `java.sql.SQLException: No suitable driver`).
Solution:
cp /export/software/mysql-connector-java-8.0.13.jar /export/server/spark/jars  # copy the driver into the Spark installation directory
cd /export/server/spark/bin/
./pyspark --jars /export/server/spark/jars/mysql-connector-java-8.0.13.jar --driver-class-path /export/server/spark/jars/mysql-connector-java-8.0.13.jar  # pass the --jars and --driver-class-path options at startup
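Alternatively, the driver can be registered once in Spark's defaults file instead of being passed on every launch (a configuration sketch; the jar path matches the one copied above, and jars already inside spark/jars are also picked up automatically):

```
# /export/server/spark/conf/spark-defaults.conf
spark.driver.extraClassPath    /export/server/spark/jars/mysql-connector-java-8.0.13.jar
spark.executor.extraClassPath  /export/server/spark/jars/mysql-connector-java-8.0.13.jar
```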
2. Reading data from the database
df = spark.read.format("jdbc").\
option("url", "jdbc:mysql://Master:3306/bigdata").\
option("driver", "com.mysql.cj.jdbc.Driver").\
option("dbtable", "province_total_sale").\
option("user", "root").\
option("password", "root").\
load()
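The options above can be collected into a plain dict first and passed in one call, which keeps connection details in one place (a sketch; the helper name `jdbc_read_options`, the host `Master`, and the credentials are this example's assumptions):

```python
def jdbc_read_options(host, db, table, user, password):
    # Assemble the option map consumed by spark.read.format("jdbc")
    return {
        "url": f"jdbc:mysql://{host}:3306/{db}",
        "dbtable": table,
        "user": user,
        "password": password,
        # explicit driver class for the MySQL 8.x connector
        "driver": "com.mysql.cj.jdbc.Driver",
    }

opts = jdbc_read_options("Master", "bigdata", "province_total_sale", "root", "root")
# df = spark.read.format("jdbc").options(**opts).load()
```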
3. Saving data to the database
Method 1:
props = {}
props["user"] = "root"
props["password"] = "root"
province_sale_df.write.jdbc("jdbc:mysql://Master:3306/bigdata?useSSL=false&useUnicode=true&characterEncoding=utf8&createDatabaseIfNotExist=true", "province_total_sale", "overwrite", props)
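The long connection string above is easier to maintain if the query parameters are built programmatically (a sketch; the helper name `mysql_jdbc_url` is this example's assumption, and the parameters are standard MySQL connector options):

```python
from urllib.parse import urlencode

def mysql_jdbc_url(host, db, **params):
    # Append connector options (useSSL, characterEncoding, ...) as a query string
    query = urlencode(params)
    return f"jdbc:mysql://{host}:3306/{db}" + (f"?{query}" if query else "")

url = mysql_jdbc_url("Master", "bigdata",
                     useSSL="false", useUnicode="true",
                     characterEncoding="utf8", createDatabaseIfNotExist="true")
```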
Method 2:
province_sale_df.write.mode("overwrite").format("jdbc").\
option("url", "jdbc:mysql://Master:3306/bigdata?useSSL=false&useUnicode=true&characterEncoding=utf8").\
option("dbtable", "province_total_sale").\
option("user", "root").\
option("password", "root").\
save()  # the character set is controlled by characterEncoding in the URL; there is no separate "encoding" JDBC option
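`mode("overwrite")` above is one of the save modes `DataFrameWriter` accepts; they differ only in how an existing table is handled (a plain-Python summary for reference; the mode strings are the ones Spark recognizes, and `"error"` also has the alias `"errorifexists"`):

```python
# DataFrameWriter save modes and their effect on an existing target table
SAVE_MODES = {
    "overwrite": "replace the existing table's data, then write",
    "append":    "insert new rows, keep existing data",
    "ignore":    "do nothing if the table already exists",
    "error":     "raise an error if the table already exists (the default)",
}
```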
4. Saving to the Hive data warehouse
province_sale_df.write.mode("overwrite").saveAsTable("default.province_total_sale", "parquet")  # table province_total_sale in the default database, stored as Parquet; requires a Hive-enabled SparkSession