Reading MySQL Data with PySpark

This article explains how to resolve the `java.lang.ClassNotFoundException: com.mysql.jdbc.Driver` error in PySpark so that MySQL data can be read successfully. Two solutions are given: setting the `PYSPARK_SUBMIT_ARGS` environment variable via `os.environ` in Python code, or adding the dependency to `spark-defaults.conf`.


I have the following test code:

from pyspark import SparkContext, SQLContext

sc = SparkContext('local')
sqlContext = SQLContext(sc)
print('Created spark context!')

if __name__ == '__main__':
    df = sqlContext.read.format("jdbc").options(
        url="jdbc:mysql://localhost/mysql",
        driver="com.mysql.jdbc.Driver",
        dbtable="users",
        user="user",
        password="****",
        properties={"driver": 'com.mysql.jdbc.Driver'}  # redundant: driver is already set above
    ).load()
    print(df)

When I run it, I get the following error:

java.lang.ClassNotFoundException: com.mysql.jdbc.Driver

In Scala, this is solved by adding the mysql-connector-java .jar to the project.

However, in Python I have no idea how to tell the pyspark module to link against the mysql-connector .jar.

I have seen this solved with examples like

spark-submit --packages mysql:mysql-connector-java:5.1.39 testfile.py

But I don't want that, since it forces me to run my script in a weird way. I would like an all-Python solution, or to copy a file somewhere, or to add something to the path.

Solution

You can pass arguments to spark-submit through the `PYSPARK_SUBMIT_ARGS` environment variable, as long as you set it before the `SparkConf` is initialized and the `SparkContext` is created:

import os
from pyspark import SparkConf, SparkContext

# Must be set before the SparkContext is created,
# so that the JVM is launched with the extra package.
SUBMIT_ARGS = "--packages mysql:mysql-connector-java:5.1.39 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = SparkConf()
sc = SparkContext(conf=conf)
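
Putting the two pieces together, a minimal sketch of the full flow might look like this (assuming the same connection details as in the question: a local MySQL database named mysql with a users table, and connector version 5.1.39):

import os
from pyspark import SparkConf, SparkContext, SQLContext

# Pull in the MySQL JDBC driver before the JVM starts;
# "pyspark-shell" must stay at the end of the argument string.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages mysql:mysql-connector-java:5.1.39 pyspark-shell"
)

sc = SparkContext(conf=SparkConf())
sqlContext = SQLContext(sc)

df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost/mysql",
    driver="com.mysql.jdbc.Driver",
    dbtable="users",
    user="user",
    password="****",
).load()
df.show()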

or you can add them to your $SPARK_HOME/conf/spark-defaults.conf
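
For instance, either of the following lines in $SPARK_HOME/conf/spark-defaults.conf makes the driver available to every job (the local .jar path is a placeholder; adjust it to wherever your connector actually lives):

# Fetch the connector from Maven Central at startup
spark.jars.packages  mysql:mysql-connector-java:5.1.39

# Or point at a local copy of the jar (example path)
spark.jars  /path/to/mysql-connector-java-5.1.39.jar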
