A Spark Example Written in Python: the Error and the Fix

For setting up the Spark runtime environment, see: https://blog.csdn.net/max_cola/article/details/78902597

The corresponding environment variables:

#java
export JAVA_HOME=/usr/local/jdk1.8.0_181  
export PATH=$JAVA_HOME/bin:$PATH
#python
export PYTHON_HOME=/usr/local/python3
export PATH=$PYTHON_HOME/bin:$PATH
#spark
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
#add spark to python
export PYTHONPATH=/usr/local/spark/python
#add pyspark to jupyter
export PYSPARK_PYTHON=/usr/local/python3/bin/python3 # two Python versions are installed, so PYSPARK_PYTHON must be set explicitly; otherwise pyspark jobs will fail
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --allow-root'
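
After sourcing these variables, a quick sanity check from the driver side is to confirm which interpreter runs and what is on the module search path. A minimal sketch; the expected values in the comments are assumptions taken from the setup above:

# sanity-check the environment above; run with python3
import os, sys
print(sys.executable)                  # expected: /usr/local/python3/bin/python3
print(os.environ.get("PYTHONPATH"))    # expected to contain /usr/local/spark/python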

The Spark example written in Python:

# -*- coding: utf-8 -*-
from __future__ import print_function
from pyspark import *
if __name__ == '__main__':
    sc = SparkContext("local[4]")
    sc.setLogLevel("WARN")
    rdd = sc.parallelize("hello Pyspark world".split(" "))
    # rdd already holds individual words, so map each word straight to (word, 1)
    counts = rdd \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    counts.foreach(print)
    sc.stop()
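
The script is saved as test1.py (the filename in the traceback below). Once the py4j issue described next is fixed, python3 test1.py should print the word counts in some nondeterministic order, roughly:

('hello', 1)
('Pyspark', 1)
('world', 1)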

The following error appears:

Traceback (most recent call last):
  File "test1.py", line 3, in <module>
    from pyspark import *
  File "/usr/local/spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File "/usr/local/spark/python/pyspark/context.py", line 29, in <module>
    from py4j.protocol import Py4JError
ImportError: No module named py4j.protocol

Solution: pyspark imports py4j, but Spark ships py4j only as a source zip under /usr/local/spark/python/lib, which is not on the PYTHONPATH set above. Unpacking it into the interpreter's site-packages makes it importable:

#go to the interpreter's site-packages directory
cd /usr/local/python3/lib/python3.6/site-packages

#copy the py4j source package over
cp /usr/local/spark/python/lib/py4j-0.10.7-src.zip ./

#unzip it in place
unzip py4j-0.10.7-src.zip
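
Alternatively, the zip does not have to be copied or unpacked at all: Python can import packages straight from a source zip, so it is enough to put the zip on the search path. A minimal sketch, assuming the same Spark layout and py4j version as above; it must run before anything imports pyspark:

# make py4j importable without touching site-packages (the path is an assumption from this setup)
import sys
sys.path.insert(0, "/usr/local/spark/python/lib/py4j-0.10.7-src.zip")
from pyspark import SparkContext  # now resolves py4j from the zip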
