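Download and installation tutorials are easy to find on Baidu, for example: "Spark Introduction and Installation Explained (CentOS 7)".
This post focuses on two bugs:
1. I no longer remember the exact error name.
The cause was a JDK version that was too old; Spark 2.4 requires JDK 1.8 or later.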
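The version requirement above can be checked before launching Spark. The following is a minimal sketch, not part of the original setup: the function names `parse_java_version`, `is_java8_or_newer`, and `installed_java_banner` are made up for illustration, and the parser assumes the usual banner shape printed by `java -version` (for example `java version "1.8.0_201"`).

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from the banner printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in: %r" % banner)
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def is_java8_or_newer(banner):
    """Return True if the banner reports JDK 1.8 or a later release."""
    major, minor = parse_java_version(banner)
    # Legacy scheme: "1.8.0_201" means Java 8.
    # Newer scheme: "11.0.2" means Java 11.
    return major >= 9 or (major == 1 and minor >= 8)


def installed_java_banner():
    """Run `java -version` and return its banner text."""
    # `java -version` writes the banner to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stderr
```

Calling `is_java8_or_newer(installed_java_banner())` on the machine that will run Spark should return True before you start `./bin/pyspark`. The sketch only checks the lower bound stated above.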
2. Starting pyspark fails with the following output:
[root@centos spark-2.4.0-bin-hadoop2.7]# ./bin/pyspark
Python 3.7.0 (default, Feb 27 2019, 17:29:18)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-23)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:714)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf$$anonfun$getOption$1.apply(SparkConf.scala:388)
at org.apache.spark.SparkConf$$anonfun$getOption$1.apply(SparkConf.scala:388)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.SparkConf.getOption(SparkConf.scala:388)
at org.apache.spark.SparkConf.get(SparkConf.scala:250)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopConfigurations(SparkHadoopUtil.scala:463)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:436)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit$$anonfun$2.apply(SparkSubmit.scala:334)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: centos: centos: 未知的名称或服务
at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:946)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:939)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:939)
at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:996)
at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:996)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:996)
at org.apache.spark.internal.config.package$.<init>(package.scala:296)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
... 18 more
Caused by: java.net.UnknownHostException: centos: 未知的名称或服务
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
... 27 more
Traceback (most recent call last):
File "/usr/local/spark/spark-2.4.0-bin-hadoop2.7/python/pyspark/shell.py", line 38, in <module>
SparkContext._ensure_initialized()
File "/usr/local/spark/spark-2.4.0-bin-hadoop2.7/python/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/usr/local/spark/spark-2.4.0-bin-hadoop2.7/python/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
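One line in the output already states the cause: Caused by: java.net.UnknownHostException: centos: 未知的名称或服务 (the Chinese text means "Name or service not known"). The machine's own hostname, centos, cannot be resolved to an IP address.
So the fix is to edit the hosts file (/etc/hosts) so that the hostname maps to an IP address, as described in the article "Resolving java.net.UnknownHostException: hostname: hostname: Name or service not known".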
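To confirm the diagnosis and see what the hosts entry might look like, here is a small sketch. The helper names `hostname_resolves` and `hosts_entry` are hypothetical, and mapping the hostname to 127.0.0.1 is an assumption that suits a single-machine setup; on a cluster you would more likely use the machine's real network address.

```python
import socket


def hostname_resolves(hostname):
    """Return True if the hostname can be resolved to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False


def hosts_entry(hostname, ip="127.0.0.1"):
    """Build a line for /etc/hosts that maps the hostname to the given IP."""
    return "{}\t{}".format(ip, hostname)


def suggest_fix():
    """Return the hosts line to add, or None if the hostname already resolves."""
    hostname = socket.gethostname()
    if hostname_resolves(hostname):
        return None
    return hosts_entry(hostname)
```

On the machine from the log above, `suggest_fix()` would be expected to return a line mapping centos to 127.0.0.1. After appending that line to /etc/hosts as root, `hostname_resolves("centos")` should return True and `./bin/pyspark` should get past this error.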