First, add the MySQL JDBC driver jar to the environment (one way to do this is sketched below). Connection details used throughout:
Database port: 3306
User: root
Password: root
Database: spark
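One way to make the connector jar visible to PySpark is to point spark.jars at it when building the SparkSession. This is only a minimal sketch: the jar file name and path below are assumptions, so substitute the location of the connector you actually downloaded (placing the jar under $SPARK_HOME/jars, or passing --jars to spark-submit, also works).

from pyspark.sql import SparkSession

# Hypothetical path to the MySQL connector jar; adjust to your environment
spark = SparkSession.builder.\
    appName("jdbc-setup").\
    master("local[*]").\
    config("spark.jars", "/path/to/mysql-connector-java-8.0.28.jar").\
    getOrCreate()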
Approach 1: read from, and append to, an existing table
SQL statements to create the database and table:
CREATE database spark;
use spark;
create table student (id int(4), name char(20), gender char(4), age int(4));
alter table student change id id int auto_increment primary key;
insert into student values(1,'Xueqian','F',23);
insert into student values(2,'Weiliang','M',24);
select * from student;
from pyspark.sql.types import *
from pyspark.sql import SparkSession, Row

if __name__ == '__main__':
    # 0. Build the SparkSession execution entry point
    spark = SparkSession.builder.\
        appName("test").\
        master("local[*]").\
        config("spark.sql.shuffle.partitions", 2).\
        getOrCreate()
    sc = spark.sparkContext

    # Connect to the local MySQL database and read the student table
    jdbcDF = spark.read.format("jdbc").\
        option("url", "jdbc:mysql://localhost:3306/spark?serverTimezone=UTC&useSSL=false&useUnicode=true").\
        option("driver", "com.mysql.cj.jdbc.Driver").\
        option("dbtable", "student").\
        option("user", "root").\
        option("password", "root").\
        load()
    jdbcDF.show()

    # Write data to the local database
    studentRDD = spark.sparkContext.parallelize(["3 Rongcheng M 26", "4 Guanhua M 27"]).map(lambda line: line.split(" "))
    # Define the schema for the rows to append (id is omitted because it auto-increments)
    schema = StructType([StructField("name", StringType(), True),
                         StructField("gender", StringType(), True),
                         StructField("age", IntegerType(), True)])
    # Build Row objects that match the schema, i.e. bind the data to the schema
    rowRDD = studentRDD.map(lambda p: Row(p[1].strip(), p[2].strip(), int(p[3])))
    studentDF = spark.createDataFrame(rowRDD, schema)
    prop = {}
    prop['user'] = 'root'
    prop['password'] = 'root'
    prop['driver'] = "com.mysql.cj.jdbc.Driver"
    studentDF.write.jdbc("jdbc:mysql://localhost:3306/spark?serverTimezone=UTC&useSSL=false&useUnicode=true", 'student', 'append', prop)
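To confirm that the two rows were appended, the table can be read back with the same connection settings. A small usage sketch that reuses the prop dictionary defined above:

# Read the student table back to verify the appended rows
jdbcDF2 = spark.read.jdbc(
    "jdbc:mysql://localhost:3306/spark?serverTimezone=UTC&useSSL=false&useUnicode=true",
    "student",
    properties=prop)
jdbcDF2.show()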
Approach 2: read data from a file and write it into a table
# coding:utf8
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, IntegerType

if __name__ == '__main__':
    # 0. Build the SparkSession execution entry point
    spark = SparkSession.builder.\
        appName("test").\
        master("local[*]").\
        config("spark.sql.shuffle.partitions", 2).\
        getOrCreate()
    sc = spark.sparkContext

    # 1. Read the dataset from a tab-separated file
    schema = StructType().add("user_id", StringType(), nullable=True). \
        add("movie_id", IntegerType(), nullable=True). \
        add("rank", IntegerType(), nullable=True). \
        add("ts", StringType(), nullable=True)
    df = spark.read.format("csv"). \
        option("sep", "\t"). \
        option("header", False). \
        option("encoding", "utf-8"). \
        schema(schema=schema). \
        load("../data/input/sql/u.data")

    # 2. Write df out to the MySQL database
    df.write.mode("overwrite").\
        format("jdbc").\
        option("url", "jdbc:mysql://localhost:3306/spark?serverTimezone=UTC&useSSL=false&useUnicode=true").\
        option("dbtable", "movie_data"). \
        option("driver", "com.mysql.cj.jdbc.Driver").\
        option("user", "root").\
        option("password", "root").\
        save()

    # 3. Read the student table back through JDBC
    df2 = spark.read.format("jdbc"). \
        option("url", "jdbc:mysql://localhost:3306/spark?serverTimezone=UTC&useSSL=false&useUnicode=true"). \
        option("dbtable", "student"). \
        option("driver", "com.mysql.cj.jdbc.Driver"). \
        option("user", "root"). \
        option("password", "root"). \
        load()
    df2.printSchema()
    df2.show()

    """
    A JDBC write creates the target table automatically if it does not exist,
    because the DataFrame carries schema information: the StructType records
    each field's name, type, and whether nulls are allowed.
    """
If the following error appears:
py4j.protocol.Py4JJavaError: An error occurred while calling o46.load.
: java.sql.SQLException: Unable to load authentication plugin 'caching_sha2_password'.
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:868)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:864)
    at
This happens because com.mysql.jdbc.Driver is the driver class shipped with mysql-connector-java 5, while com.mysql.cj.jdbc.Driver is the class used by mysql-connector-java 6 and later. With a newer connector jar you therefore need to add:
option("driver", "com.mysql.cj.jdbc.Driver")