Development environment
Spark 2.4.4 with Scala 2.12.10. Note that the prebuilt spark-2.4.4-bin-hadoop2.7 package installed below is itself compiled against Scala 2.11, as the _2.11 jar names in its logs show.
Java installation
bash-3.2$ java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
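If JAVA_HOME is not pinned, the JVM picked up at runtime can differ from the one java -version reports (the Scala and Spark output further down shows Java 11.0.2 being used). A minimal sketch for .bash_profile on macOS, assuming a local JDK 8 that /usr/libexec/java_home can resolve:
# pin the JDK so scala and spark-shell use the same JVM as the shell
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
export PATH=$JAVA_HOME/bin:$PATH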
Scala installation
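Download
The URL below follows the standard Lightbend layout for Scala 2.12.10 and should be verified against the official download page:
curl -LO https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz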
Extract
tar -zxvf scala-2.12.10.tgz
Configure
vi .bash_profile
export SCALA_HOME=/Users/linhongzheng/software/scala-2.12.10
export PATH=$PATH:$SCALA_HOME/bin
Apply
source .bash_profile
Verify
bash-3.2$ scala
Welcome to Scala 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 11.0.2).
Type in expressions for evaluation. Or try :help.
scala> 1+2
res0: Int = 3
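Optionally, a small standalone program confirms that scalac works as well (Hello.scala is a hypothetical file name used only for this check):
vi Hello.scala
object Hello {
  // sums 1..10 and prints the result; expected output: sum = 55
  def main(args: Array[String]): Unit = {
    println(s"sum = ${(1 to 10).sum}")
  }
}
scalac Hello.scala
scala Hello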
Spark installation
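Download
The URL below follows the standard Apache archive layout for Spark 2.4.4 and should be verified against the official downloads page:
curl -LO https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz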
Extract
tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
Configure
vi .bash_profile
export SPARK_HOME=/Users/linhongzheng/software/spark-2.4.4-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
Apply
source .bash_profile
Verify
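Before the bundled example, a quick interactive check with spark-shell also works. Below is a minimal sketch of the same Monte Carlo Pi estimate that SparkPi runs (sc is the SparkContext the shell provides; the printed value will vary from run to run):
spark-shell
val n = 100000
// count random points in the unit square that fall inside the unit circle
val inside = sc.parallelize(1 to n).map { _ =>
  val x = math.random * 2 - 1
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * inside / n}")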
Run the SparkPi example from the official documentation
bash-3.2$ $SPARK_HOME/bin/run-example SparkPi 10
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/linhongzheng/software/spark-2.4.4-bin-hadoop2.7/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/11/11 09:53:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/11/11 09:53:25 INFO SparkContext: Running Spark version 2.4.4
19/11/11 09:53:25 INFO SparkContext: Submitted application: Spark Pi
19/11/11 09:53:25 INFO SecurityManager: Changing view acls to: linhongzheng
19/11/11 09:53:25 INFO SecurityManager: Changing modify acls to: linhongzheng
19/11/11 09:53:25 INFO SecurityManager: Changing view acls groups to:
19/11/11 09:53:25 INFO SecurityManager: Changing modify acls groups to:
19/11/11 09:53:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(linhongzheng); groups with view permissions: Set(); users with modify permissions: Set(linhongzheng); groups with modify permissions: Set()
19/11/11 09:53:25 INFO Utils: Successfully started service 'sparkDriver' on port 58107.
19/11/11 09:53:25 INFO SparkEnv: Registering MapOutputTracker
19/11/11 09:53:25 INFO SparkEnv: Registering BlockManagerMaster
19/11/11 09:53:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/11/11 09:53:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/11/11 09:53:25 INFO DiskBlockManager: Created local directory at /private/var/folders/rq/kl4ybn097tzbfbwr91cwv_640000gn/T/blockmgr-f239aac2-9b34-490c-a65e-ec96ef9b1087
19/11/11 09:53:25 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/11/11 09:53:25 INFO SparkEnv: Registering OutputCommitCoordinator
19/11/11 09:53:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/11/11 09:53:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.16.0.12:4040
19/11/11 09:53:25 INFO SparkContext: Added JAR file:///Users/linhongzheng/software/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar at spark://172.16.0.12:58107/jars/spark-examples_2.11-2.4.4.jar with timestamp 1573437205747
19/11/11 09:53:25 INFO SparkContext: Added JAR file:///Users/linhongzheng/software/spark-2.4.4-bin-hadoop2.7/examples/jars/scopt_2.11-3.7.0.jar at spark://172.16.0.12:58107/jars/scopt_2.11-3.7.0.jar with timestamp 1573437205747
19/11/11 09:53:25 INFO Executor: Starting executor ID driver on host localhost
19/11/11 09:53:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58110.
19/11/11 09:53:25 INFO NettyBlockTransferService: Server created on 172.16.0.12:58110
19/11/11 09:53:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/11/11 09:53:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.16.0.12, 58110, None)
19/11/11 09:53:25 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.0.12:58110 with 434.4 MB RAM, BlockManagerId(driver, 172.16.0.12, 58110, None)
19/11/11 09:53:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.16.0.12, 58110, None)
19/11/11 09:53:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.16.0.12, 58110, None)
19/11/11 09:53:26 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
19/11/11 09:53:26 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
19/11/11 09:53:26 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
19/11/11 09:53:26 INFO DAGScheduler: Parents of final stage: List()
19/11/11 09:53:26 INFO DAGScheduler: Missing parents: List()
19/11/11 09:53:26 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
19/11/11 09:53:26 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 434.4 MB)
19/11/11 09:53:26 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 434.4 MB)
19/11/11 09:53:26 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.0.12:58110 (size: 1256.0 B, free: 434.4 MB)
19/11/11 09:53:26 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
19/11/11 09:53:26 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD