Solving the pyspark MySQL write error: An error occurred while calling o45.jdbc.: scala.MatchError: null

When I tried to connect to MySQL from pySpark and write a simple Spark DataFrame to a MySQL table, the write failed with:

py4j.protocol.Py4JJavaError: An error occurred while calling o45.jdbc.: scala.MatchError: null

(1) Error message:

Fri Jul 13 16:22:56 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Traceback (most recent call last):
  File "/Users/a6/Downloads/speiyou_di/hive/log_task/111.py", line 47, in <module>
    df1.write.mode("append").jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})
  File "/Library/Python/2.7/site-packages/pyspark/sql/readwriter.py", line 765, in jdbc
    self._jwrite.mode(mode).jdbc(url, table, jprop)
  File "/Library/Python/2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Library/Python/2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Library/Python/2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o45.jdbc.
: scala.MatchError: null
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
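
As an aside, the WARN line at the top of the log is unrelated to the failure. As the message itself suggests, it can be silenced by appending useSSL=false to the JDBC connection string, for example:

url = "jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456&useSSL=false"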

(2) Failing code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
# Set SPARK_HOME so pyspark can locate the Spark installation
import os
os.environ["SPARK_HOME"] = "/Users/a6/Applications/spark-2.1.0-bin-hadoop2.6"

from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext(appName="pyspark mysql demo")
sqlContext = SQLContext(sc)

# Create a JDBC connection and read the existing data

# Local test
dataframe_mysql=sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark_db").option("dbtable", "test_person").option("user", "root").option("password", "yyz!123456").load()

# Print the data
print "\nstep 1: dataframe_mysql.collect()\n", dataframe_mysql.collect()
dataframe_mysql.registerTempTable("temp_table")
dataframe_mysql.show()
print dataframe_mysql.count()

print "step 2、 准备待写入的数据"

from pyspark.sql.types import *

# User-defined schema for the rows we are about to insert.
schema = StructType([StructField("name", StringType()), StructField("age", IntegerType())])

# Build the DataFrame from a list of dicts, using the schema above.
d = [{'name': 'Alice1', 'age': 1}, {'name': 'tome1', 'age': 20}]
df1 = sqlContext.createDataFrame(d, schema)

# Display the contents of the dataframe.
df1.show()

# Display the schema of the dataframe.
df1.printSchema()

print "step3、写入数据"

# 本地测试
#  出错代码A
df1.write.mode("append").jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})

# 正确代码B
#df1.write.jdbc(mode="overwrite", url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})

print "step4、写入成功,读取验证数据"
df1.show()

# Local test
dataframe_mysql=sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark_db").option("dbtable", "test_person").option("user", "root").option("password", "yyz!123456").load()

# Print the data
print "dataframe_mysql.collect()\n",dataframe_mysql.collect()

print "step 5、 所有执行成功"

(3) Solution

Replace failing code A with correct code B and the script runs successfully. Comparing the two lines, the only change is that the save mode is passed directly to jdbc() via its mode keyword argument rather than being set beforehand with .mode("append"). The traceback shows why this matters in Spark 2.1.0: the Python wrapper (readwriter.py, line 765) always executes self._jwrite.mode(mode).jdbc(...), using its own mode parameter, which defaults to None. That None reaches the JVM as a null save mode and silently overwrites the "append" set earlier, so JdbcRelationProvider.createRelation pattern-matches on null and throws scala.MatchError: null.
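
An equivalent workaround, assuming the same local spark_db/test_person setup as in the script above, is to bypass the jdbc() wrapper entirely and go through the generic save() path, which does honor the mode set with .mode(...):

# A sketch of the generic JDBC write path; connection details as in the script above.
df1.write \
    .format("jdbc") \
    .option("url", "jdbc:mysql://localhost:3306/spark_db") \
    .option("dbtable", "test_person") \
    .option("user", "root") \
    .option("password", "yyz!123456") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .mode("append") \
    .save()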

(4) Reproducing the error scenario

First, create the spark_db database locally and create the test_person table with one seed row, as follows:

create database spark_db;

CREATE TABLE `test_person` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

insert into test_person(name,age) values('yyz',18);
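
Note also that the MySQL JDBC driver (mysql-connector-java) must be on Spark's classpath for both the read and the write. One way to supply it from a standalone script, assuming a hypothetical path to the connector jar, is to set PYSPARK_SUBMIT_ARGS before the SparkContext is created:

import os
# Hypothetical jar location; point this at your own mysql-connector-java jar.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars /path/to/mysql-connector-java-5.1.46.jar pyspark-shell"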

Reference: https://stackoverflow.com/questions/49391933/pyspark-jdbc-write-error-an-error-occurred-while-calling-o43-jdbc-scala-matc
