When connecting to MySQL from PySpark, the following error appears:
com.mysql.cj.exceptions.InvalidConnectionAttributeException: The server time zone value 'PDT' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support.
In short, the MySQL JDBC driver used by Spark cannot resolve the server time zone abbreviation "PDT".
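The abbreviation is the root of the problem: the driver wants a full IANA zone name, while "PDT" is a bare abbreviation that does not name a single zone. The distinction can be seen with Python's standard zoneinfo module (a sketch for illustration, not part of the original steps):

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def is_valid_zone(name: str) -> bool:
    """Return True if `name` resolves as a full IANA time zone key."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, KeyError, ValueError):
        return False

print(is_valid_zone("Asia/Shanghai"))  # full IANA name -> True
print(is_valid_zone("PDT"))            # bare abbreviation -> False
```

Full names like Asia/Shanghai resolve; bare abbreviations like PDT do not, which is exactly what the JDBC error is complaining about.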
Solution: change the system time zone to Asia/Shanghai, restart MySQL, then connect over JDBC again.
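As the error message itself suggests, an alternative fix that leaves the system clock alone is to give the JDBC driver an explicit serverTimezone in the connection URL. A minimal sketch (the base URL is the one used in step 4 below; appending the property as a query string is standard Connector/J usage):

```python
# Pin the time zone on the driver side instead of changing the OS setting.
# serverTimezone must be a full IANA zone name, not an abbreviation like PDT.
base_url = "jdbc:mysql://localhost:3306/spark"
url = base_url + "?serverTimezone=Asia/Shanghai"
print(url)
```

This url would then be passed to .option("url", ...) in place of the plain one. The rest of this post takes the other route and fixes the time zone on the server side.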
1. First, check MySQL's current time zone
(in the mysql shell)
show variables like '%time_zone%';
You can see that MySQL's time zone follows the operating system and currently resolves to PDT (US Pacific Daylight Time), so we need to change the system time zone.
2. Change the system time zone to Asia/Shanghai
(back in the terminal)
timedatectl    # show the current time zone
sudo timedatectl set-timezone Asia/Shanghai    # change the system time zone
Run timedatectl again and confirm the system time zone abbreviation is now CST (China Standard Time, UTC+8).
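Why CST? The Asia/Shanghai zone reports the abbreviation CST (China Standard Time). You can confirm the mapping with a Python one-off that sets the TZ variable for the current process only (an illustration; the real timedatectl command above changes the zone system-wide):

```python
import os
import time

# Point this process at Asia/Shanghai; timedatectl does this system-wide.
os.environ["TZ"] = "Asia/Shanghai"
time.tzset()  # re-read TZ (Unix only)

print(time.tzname[0])  # abbreviation the C library now reports -> CST
```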
3. Restart MySQL
service mysqld restart
Then check MySQL's time zone again:
(in the mysql shell)
show variables like '%time_zone%';
4. Connect to MySQL again
(in pyspark)
jdbcDF = spark.read \
    .format("jdbc") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("url", "jdbc:mysql://localhost:3306/spark") \
    .option("dbtable", "student") \
    .option("user", "root") \
    .option("password", "hadoop") \
    .load()
Note: this reads the student table in the spark database; if MySQL has no student table, create the spark database and its student table first.
Check the table contents:
jdbcDF.show()
The time zones now agree and the connection succeeds!