【Problem】
The cluster was built with Cloudera Manager, running Spark 1.6.1, with Python 3.6.3 installed via Anaconda. Launching pyspark fails with the following error:
[root@slave01 ~]# pyspark
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.7.3-1.cdh5.7.3.p0.5/lib/spark/python/pyspark/shell.py", line 30, in <module>
import pyspark
File "/opt/cloudera/parcels/CDH-5.7.3-1.cdh5.7.3.p0.5/lib/spark/python/pyspark/__init__.py", line 41, in <module>
from pyspark.context import SparkContext
File "/opt/cloudera/parcels/CDH-5.7.3-1.cdh5.7.3.p0.5/lib/spark/python/pyspark/context.py", line 33, in <module>
from pyspark.java_gateway import launch_gateway
File "/opt/cloudera/parcels/CDH-5.7.3-1.cdh5.7.3.p0.5/lib/spark/python/pyspark/java_gateway.py", line 31, in <module>
from py4j.java_gateway import java_import, JavaGateway, GatewayClient
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
File "/opt/cloudera/parcels/CDH-5.7.3-1.cdh5.7.3.p0.5/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 18, in <module>
File "/root/anaconda3/lib/python3.6/pydoc.py", line 59, in <module>
import inspect
File "/root/anaconda3/lib/python3.6/inspect.py", line 361, in <module>
Attribute = namedtuple('Attribute', 'name kind defining_class object')
File "/opt/cloudera/parcels/CDH-5.7.3-1.cdh5.7.3.p0.5/lib/spark/python/pyspark/serializers.py", line 381, in namedtuple
cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
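The traceback shows the root cause: at import time, PySpark's serializers.py replaces collections.namedtuple with a copy made via types.FunctionType so that namedtuple instances can be pickled. As I understand it, that copy carries over __defaults__ but not __kwdefaults__, and in Python 3.6 namedtuple's verbose and rename parameters (plus the new module parameter) became keyword-only, so the copied function loses their defaults. The sketch below reproduces the same pitfall with an illustrative function of my own (original/copied are not PySpark names):

```python
import types

def original(name, *, greeting="hello"):
    return f"{greeting}, {name}"

# Copy the function the way Spark 1.x copies namedtuple: __defaults__ is
# carried over, but __kwdefaults__ (keyword-only defaults) is not.
copied = types.FunctionType(
    original.__code__,
    original.__globals__,
    original.__name__,
    original.__defaults__,
    original.__closure__,
)

print(original("world"))  # fine: the keyword-only default still applies
try:
    copied("world")       # the copy has lost the default for 'greeting'
except TypeError as exc:
    print(exc)
```

This is the same shape of failure as the TypeError above: a required keyword-only argument with a lost default.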
【Solution】
Spark versions below 2.1 do not support Python 3.6 (the incompatibility was only fixed upstream in SPARK-19019), so the fix is to run pyspark against a Python version lower than 3.6, e.g. 3.5. After switching the interpreter, the problem is resolved.
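Rather than downgrading the base Anaconda install, one low-risk option is a separate conda environment and pointing PySpark at it. A sketch, assuming Anaconda lives at /root/anaconda3 as in the traceback (the env name py35 is my own choice):

```shell
# Create a Python 3.5 environment (run once, interactively):
#   conda create -n py35 python=3.5
# Then point PySpark at that interpreter, e.g. in /etc/profile or spark-env.sh:
export PYSPARK_PYTHON=/root/anaconda3/envs/py35/bin/python
export PYSPARK_DRIVER_PYTHON=/root/anaconda3/envs/py35/bin/python
```

With these variables exported, pyspark starts the 3.5 interpreter instead of the default python on PATH.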