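Install Scala, Spark (the prebuilt "bin with Hadoop" package), and Hadoop.
Starting Spark after installation keeps failing with the error below, possibly because of a problem in the Spark configuration files.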
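One frequently reported cause of `Invalid Spark URL` is a machine hostname containing characters (such as an underscore) that the JVM's URI parser does not accept as a host. The host in the log below is obscured, so whether that applies here is only a guess. The following is a minimal sketch for checking the local hostname; `looks_like_valid_host` is a hypothetical helper, and its regular expression is only an approximation of the hostname rules, not Spark's own validation.

```python
import re
import socket

# One hostname label: letters, digits and hyphens, not starting or ending
# with a hyphen. This approximates RFC 1123 and is an assumption, not the
# exact rule the JVM applies.
_HOST_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")


def looks_like_valid_host(name: str) -> bool:
    """Hypothetical helper: True if every dot-separated label is well formed."""
    if not name:
        return False
    return all(_HOST_LABEL.match(label) for label in name.split("."))


if __name__ == "__main__":
    host = socket.gethostname()
    if looks_like_valid_host(host):
        print(f"{host!r} looks acceptable; the cause is probably elsewhere.")
    else:
        print(f"{host!r} may be rejected as a URL host; "
              "consider setting SPARK_LOCAL_HOSTNAME=localhost before starting pyspark.")
```

If the check flags the hostname, setting the `SPARK_LOCAL_HOSTNAME` environment variable to `localhost` (for example `$env:SPARK_LOCAL_HOSTNAME = "localhost"` in PowerShell) before launching `pyspark` is a commonly suggested workaround; renaming the machine is the alternative.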
PS C:\BigData\spark-2.4.3-bin-hadoop2.7\bin> pyspark
Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Warning:
This Python interpreter is in a conda environment, but the environment has
not been activated. Libraries may fail to load. To activate this environment
please see https://conda.io/activation
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/08/01 16:14:48 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://[email protected]:51129
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
at org.apache.spark.executor.Executor.<init>(Executor.scala:184)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:127)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
19/08/01 16:14:48 ERROR Utils: Uncaught exception in thread Thread-3
java.lang.NullPointerException
at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSche