PySpark: rdd.foreach(print) raises NameError

This article explains how to fix PySpark problems caused by a Python 2 interpreter, including the NameError raised by `rdd.foreach(print)` and the stray `u` prefixes in `collect()` output. The fixes are: temporarily importing `print_function`, permanently upgrading to Python 3, and pointing PySpark's environment variables at Python 3.

Contents

Cause of the error

How to check whether this is your error

Quick fix

Permanent fix


Cause of the error

Most likely pyspark is running on a bundled or system Python 2 interpreter; the problem goes away once the Python version that pyspark uses is upgraded.

Besides the NameError from `rdd.foreach(print)`, another symptom is:

when calling `rdd.collect()`, the strings come back with stray `u` prefixes:

[(u'DataStructure', 5), (u'Music', 1), (u'Algorithm', 5), (u'DataBase', 5)]
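A minimal session that reproduces both symptoms when the pyspark shell itself runs on Python 2 (the sample data here are made up for illustration):

rdd = sc.parallelize([("DataStructure", 5), ("Music", 1)])   # sc is the SparkContext created by the shell
rdd.foreach(print)    # fails under Python 2, where print is a statement rather than a function
rdd.collect()         # elements come back as (u'DataStructure', 5), (u'Music', 1) on Python 2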

How to check whether this is your error

When pyspark starts, the banner it prints shows which Python version it is running on.

In my case the banner showed Python 2.7.5, i.e. Python 2.
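If the banner has already scrolled past, the version can also be checked from inside the running session; this works the same way on Python 2 and Python 3:

import sys
print(sys.version)    # e.g. starts with 2.7.5 when the shell is still on Python 2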

Quick fix

Each time you start pyspark, run one statement first (`from __future__ import print_function`) and `print` becomes a function for the rest of the session:

 from __future__ import print_function
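A typical session then looks like this; after the import, `rdd.foreach(print)` works even on Python 2 (toy data only):

from __future__ import print_function
rdd = sc.parallelize(["DataBase", "Algorithm"])   # sc is the SparkContext created by the shell
rdd.foreach(print)                                # now prints each element instead of raising an error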

Permanent fix

1. Install Python 3

Reference: centos安装python3详细教程_知行合一-CSDN博客_centos安装python3

A 3.x version has to be installed manually; downloads are available at https://www.python.org/ftp/python/

(1) First find where the system Python lives:

whereis python

Python 2.7 is installed under /usr/bin by default, so switch to /usr/bin/:

cd /usr/bin/

List the details of the python-related files:

ll python*

The listing shows that python is a symlink to python2, and python2 in turn points to python2.7. So we can install Python 3, repoint python at it, and leave python2 pointing at python2.7; the two versions then coexist.
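On a stock CentOS 7 system the listing typically looks something like the following (owners, sizes and dates elided):

lrwxrwxrwx. 1 root root ... python -> python2
lrwxrwxrwx. 1 root root ... python2 -> python2.7
-rwxr-xr-x. 1 root root ... python2.7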

(2) Before downloading the Python 3 package, install the dependencies needed to build it:

yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make

After running the command above, the libraries needed to compile Python 3 are in place.

(3) CentOS 7 does not ship with pip, so add the EPEL repository first:

yum -y install epel-release

(4) Install pip:

yum install python-pip

(5) Install wget:

yum -y install wget

(6) Download the Python 3 source tarball with wget, or download it elsewhere and upload it to the server; if the network is fast you can fetch it directly:

wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz

(7) Unpack the source tarball:

xz -d Python-3.6.8.tar.xz
tar -xf Python-3.6.8.tar

(8) Enter the unpacked directory and run the following commands to configure, build and install:

cd Python-3.6.8

./configure --prefix=/usr/local/python3 --enable-optimizations

make && make install

(9) Install the zlib and zlib-devel dependencies if they are not already present (step (2) normally covers them):

yum install zlib zlib-devel

(10) If no errors were reported, the installation succeeded and a python3 directory now exists under /usr/local/.

(11) Add a new symlink; first back up the existing one:

mv /usr/bin/python /usr/bin/python.bak

(12) Create the symlink pointing python at Python 3:

ln -s /usr/local/python3/bin/python3.6 /usr/bin/python

(13) Check that the new interpreter is now the default:

python -V
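If the install and symlinks went well, the two interpreters now report different versions (your exact versions may differ):

python -V      # Python 3.6.8
python2 -V     # Python 2.7.5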

(14) Fix the yum configuration; yum itself needs Python 2 to run and will break otherwise:

vi /usr/bin/yum

(15) Change the first line from #! /usr/bin/python to:

#! /usr/bin/python2

(16) One more file needs the same change:

vi /usr/libexec/urlgrabber-ext-down

(17) Again change the first line from #! /usr/bin/python to:

#! /usr/bin/python2
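If you prefer not to open the files in an editor, the same change can be made with a one-liner; this assumes the first line of both files is exactly #! /usr/bin/python:

sed -i '1s|^#! */usr/bin/python$|#!/usr/bin/python2|' /usr/bin/yum /usr/libexec/urlgrabber-ext-down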

(18) To start Python 2:

python2

(19) To start Python 3:

python

After these changes, python starts Python 3.6 while python2 still starts Python 2.7.

2. Point pyspark at the Python 3 version

Reference: pyspark设置python的版本_abc_321a的博客-CSDN博客_spark指定python版本

(1) Edit the spark-env.sh file and append export PYSPARK_PYTHON=/usr/local/python3/bin/python3.6 at the end:

cd /home/hadoop/softs/spark-2.4.7/conf   # spark-env.sh lives in this directory
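A minimal way to append the setting, assuming the conf directory above and the Python 3 install location from part 1:

cp spark-env.sh.template spark-env.sh                                         # only if spark-env.sh does not exist yet
echo 'export PYSPARK_PYTHON=/usr/local/python3/bin/python3.6' >> spark-env.sh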

(2) Edit the pyspark launcher script in the bin directory of the Spark installation:

cd /home/hadoop/softs/spark-2.4.7/bin
vi pyspark

Change the two places where the script references the bare python interpreter so that they read python3; a rough sketch of the result is shown below.
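The original screenshots are not reproduced here. As a rough sketch only: in Spark 2.4.x one of the two spots is the block in bin/pyspark that picks the default executor interpreter, which after the edit reads roughly as follows (exact wording varies between releases); the other occurrence of the bare python interpreter in the same script gets the same python to python3 change.

if [[ -z "$PYSPARK_PYTHON" ]]; then
  PYSPARK_PYTHON=python3      # was: PYSPARK_PYTHON=python
fi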

(3) Start pyspark again; the banner now shows the Python 3 version.

3. Re-run the statements that used to fail; they should now work, for example:
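A quick check in the new shell, using the same kind of toy data as before:

rdd = sc.parallelize([("DataStructure", 5), ("Music", 1)])
rdd.foreach(print)    # prints the tuples, no more error
rdd.collect()         # [('DataStructure', 5), ('Music', 1)] with no u prefixes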

If anything here infringes your rights, please contact me and I will remove it.
