17/05/09 02:06:54 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.3.134, executor 2): org.apache.spark.SparkException:
Error from python worker:
/usr/local/bin/python: can't decompress data; zlib not available
PYTHONPATH was:
/home/orient/spark-2.1.0/python/lib/pyspark.zip:/home/orient/spark-2.1.0/python/lib/py4j-0.10.4-src.zip:/home/orient/spark-2.1.0/jars/spark-core_2.11-2.1.0.jar:/home/orient/spark-2.1.0/python/lib/py4j-0.10.4-src.zip:/home/orient/spark-2.1.0/python:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:166)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:65)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Running a Python script on the Spark cluster fails with the error above. Recompiling Python as suggested in the previous article does not help. Although the two errors look similar, the fixes are different: here the executors' Python was built without the zlib module, so it cannot decompress the pyspark.zip archive on the PYTHONPATH.
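The dependency is easy to demonstrate: reading a deflate-compressed zip (which is how archives like pyspark.zip are packed) goes through zlib, so a Python built without it fails at exactly this step. A minimal illustration, not part of the original fix:

```python
import io
import zipfile

# Build a deflate-compressed zip in memory, the same compression
# used inside archives such as pyspark.zip
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("mod.py", "x = 1\n")

# Reading the member back decompresses it via zlib; on a Python
# built without zlib, this is where "zlib not available" surfaces
with zipfile.ZipFile(buf) as zf:
    print(zf.read("mod.py").decode())
```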
Solution
Install the missing zlib dependency:
yum install zlib zlib-devel
Edit the Modules/Setup file in the Python source tree:
vim Modules/Setup
Find the following line and remove the leading # to uncomment it:
#zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
Then rebuild and reinstall Python: ./configure && make && make install
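After the rebuild, a quick round-trip run with the freshly installed interpreter confirms the module is actually linked in (a minimal check, not from the original post):

```python
import zlib

data = b"pyspark.zip members are deflate-compressed"
packed = zlib.compress(data)
# If the rebuild picked up zlib, compress/decompress round-trips cleanly
assert zlib.decompress(packed) == data
print("zlib available, linked against version", zlib.ZLIB_VERSION)
```

If the import itself raises ModuleNotFoundError, the zlib line in Modules/Setup was not uncommented before rebuilding.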
Alternatively, per the original English instructions:
Edit Modules/Setup in the Python source tree and uncomment the line:
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
Then change to the Modules/zlib directory and run:
./configure
make
make install