PySpark Java gateway problem: Java gateway process exited before sending the driver its port number



When running the most basic PySpark code, such as reading a text file, the following error appeared:


Java gateway process exited before sending the driver its port number args = ('Java gateway process exited before sending the driver its port number',) message = 'Java gateway process exited before sending the driver its port number'

I searched all over Baidu and found a few suggested fixes: change the installation path so it contains no special characters,

or add the following at the top of the code:

  import os
  os.environ['SPARK_HOME'] = r"D:\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7"

None of these worked. But looking at that snippet, I suddenly realized that the SPARK_HOME in my run configuration was set incorrectly.

My Spark is the spark-2.3.0-bin-hadoop2.7.tgz downloaded from the official site. After extracting it, I set things up in PyCharm: to the left of the run button in the top-right corner, open Edit Configurations, and under Environment variables add SPARK_HOME and PYTHONPATH. SPARK_HOME must point to the directory Spark was actually extracted into, and in my case the archive extracted to D:\spark-2.2.0-bin-hadoop2.7\spark-2.2.0-bin-hadoop2.7, so the spark folder name appears twice. Entering only the first level triggers the Java gateway error; with the full nested path filled in, the program runs correctly. The code is attached below.
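Since the real trap here is the same-named nested directory, the correct level can also be resolved programmatically before setting SPARK_HOME. A minimal sketch, where resolve_spark_home is my own hypothetical helper (not a Spark or PyCharm API), which checks for Spark's bin folder and descends one level if the archive unpacked into a subfolder of the same name:

```python
import os

def resolve_spark_home(path):
    """Return the directory that actually contains Spark's bin folder.

    Archives like spark-2.2.0-bin-hadoop2.7.tgz are often extracted into
    a same-named subfolder, so the real install may be one level deeper.
    """
    if os.path.isdir(os.path.join(path, "bin")):
        return path
    nested = os.path.join(path, os.path.basename(path.rstrip("\\/")))
    if os.path.isdir(os.path.join(nested, "bin")):
        return nested
    raise FileNotFoundError("no Spark 'bin' directory found under %s" % path)
```

Used like this, it would pick the inner folder automatically: os.environ["SPARK_HOME"] = resolve_spark_home(r"D:\spark-2.2.0-bin-hadoop2.7").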

from pyspark import SparkContext

# Count the lines containing 'a' and the lines containing 'b' in a local text file.
logFile = "zhuce.txt"
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

The output after running:

C:\Users\Administrator\PycharmProjects\pyspark\venv\Scripts\python.exe C:/Users/Administrator/PycharmProjects/pyspark/venv/test.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/05/16 20:21:15 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
 at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
 at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
 at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
 at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
 at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
 at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
 at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
 at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
 at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
 at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
 at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
 at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2430)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2430)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)
 at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
 at py4j.Gateway.invoke(Gateway.java:236)
 at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
 at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
 at py4j.GatewayConnection.run(GatewayConnection.java:214)
 at java.lang.Thread.run(Thread.java:748)
18/05/16 20:21:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Lines with a: 2, lines with b: 0

Process finished with exit code 0
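The ERROR about winutils.exe in the log above is harmless here (the job still finishes and prints the counts), but on Windows it can be silenced by pointing HADOOP_HOME at a folder containing bin\winutils.exe before the SparkContext is created. A sketch, assuming winutils.exe has been downloaded into D:\hadoop\bin (a path of my own choosing):

```python
import os

# Hypothetical folder that contains bin\winutils.exe;
# replace with wherever winutils.exe was actually placed.
hadoop_home = r"D:\hadoop"

# Hadoop builds the winutils path from HADOOP_HOME; when it is unset,
# the path becomes the "null\bin\winutils.exe" seen in the log above.
os.environ["HADOOP_HOME"] = hadoop_home
os.environ["PATH"] = os.path.join(hadoop_home, "bin") + os.pathsep + os.environ.get("PATH", "")
```

These two lines must run before the SparkContext is constructed, since Hadoop's Shell class reads HADOOP_HOME once at class-load time.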


