(1)在MySQL数据库中新建数据库sparktest,再建表employee,包含下列两行数据;
表1 employee表原有数据
id | name | gender | age |
1 | Alice | F | 22 |
2 | John | M | 25 |
create database sparktest;
use sparktest;
create table employee (id int(4), name char(20), gender char(4), age int(4));
insert into employee values(1,'Alice','F',22);
insert into employee values(2,'John','M',25);
select * from employee
(2)配置Spark通过JDBC连接数据库MySQL,编程实现利用DataFrame插入下列数据到MySQL,最后打印出age的最大值和age的总和。
表2 employee表新增数据
id | name | gender | age |
3 | Mary | F | 26 |
4 | Tom | M | 23 |
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import Row
from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import StringType
from pyspark.sql.types import IntegerType
sc = SparkContext( 'local', 'test')
spark=SQLContext(sc)
jdbcDF=spark.read.format("jdbc").option("url","jdbc:mysql://localhost:3306/sparktest").option("driver","com.mysql.cj.jdbc.Driver").option("dbtable","employee").option("user", "root").option("password", "Gh123456/").load()
jdbcDF.filter(jdbcDF.age>20).collect()
studentRDD = sc.parallelize(["3 Mary F 26","4 Tom M 23"]).map(lambda line : line.split(" "))
schema = StructType([StructField("id",IntegerType(),True),StructField("name", StringType(), True),StructField("gender", StringType(), True),StructField("age",IntegerType(), True)])
rowRDD = studentRDD.map(lambda p : Row(int(p[0]),p[1].strip(), p[2].strip(),int(p[3])))
employeeDF = spark.createDataFrame(rowRDD, schema)
prop = {}
prop['user'] = 'root'
prop['password'] = 'Gh123456/'
prop['driver'] = "com.mysql.cj,jdbc.Driver"
employeeDF.write.jdbc("jdbc:mysql://localhost:3306/sparktest",'employee','append', prop)
jdbcDF.collect()
jdbcDF.agg({"age": "max"}).show()
jdbcDF.agg({"age": "sum"}).show()
总结
- 通过对比MySQL8.0和MySQL5.7了解了mysql8.0更改了validate_password_policy相关的配置名称,这使得修改密码时。密码设置必须要大小写字母数字和特殊符号(,/';:等),不然不能配置成功,如果想使用简单的密码,可以修改MySQL8.0的密码策略。
- 编程中也遇到很多问题,如:在安装mysql时报错源……的GPG密钥已安装,但不适用于此安装包。请检查源的公钥URL是否配置正确。解决办法是输入如下命令:具体可以参考我的另一篇博客
记一次Linux安装mysql报错问题:失败的软件包是:mysql-community-common-8.0.30-1.el7.x86_64
rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2022
yum install -y mysql-community-server