3. Spark cluster HdfsTest

A Spark-based test.

Run HdfsTest against 50 GB of data on HDFS and observe how the job is divided into tasks and spread across executors. The relevant driver log excerpts are shown below.
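The exact submit command is not included in the log. As a minimal sketch, a run like this is typically launched on YARN in client mode (matching the `YarnClientSchedulerBackend` lines below); the jar path, input path, and executor sizing here are assumptions, not values taken from this cluster:

```shell
# Sketch only: paths, executor count, and memory sizes are assumed values.
# HdfsTest ships with Spark in the examples jar and takes the input file as its argument.
/opt/spark/bin/spark-submit \
  --class org.apache.spark.examples.HdfsTest \
  --master yarn \
  --deploy-mode client \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 4g \
  /opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar \
  hdfs:///benchmark/50g-input
```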

 

 

```
19/12/27 17:27:37 INFO spark.SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1164
19/12/27 17:27:37 INFO scheduler.DAGScheduler: Submitting 410 missing tasks from ResultStage 9 (MapPartitionsRDD[4] at map at HdfsTest.scala:37) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
19/12/27 17:27:37 INFO cluster.YarnScheduler: Adding task set 9.0 with 410 tasks
```
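The 410 tasks line up roughly with the number of HDFS input splits: at the default 128 MB block size, 50 GB works out to about 50 × 1024 / 128 ≈ 400 blocks, and Spark schedules one task per split in this stage. A quick way to check the split count (the input path below is a placeholder, not taken from the log):

```shell
# Placeholder path; substitute the actual test file.
# Each "blk_" line reported by fsck is one HDFS block, i.e. one input split / one task.
hdfs fsck /benchmark/50g-input -files -blocks | grep -c "blk_"
```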

 

```
19/12/27 17:02:33 INFO cluster.YarnScheduler: Removed TaskSet 11.0, whose tasks have all completed, from pool
19/12/27 17:02:33 INFO scheduler.DAGScheduler: ResultStage 11 (sum at HdfsTest.scala:45) finished in 27.827 s
19/12/27 17:02:33 INFO scheduler.DAGScheduler: Job 11 finished: sum at HdfsTest.scala:45, took 27.851985 s
Returned length(s) of: 1373955.0
19/12/27 17:02:33 INFO server.AbstractConnector: Stopped Spark@7072bc39{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/12/27 17:02:33 INFO ui.SparkUI: Stopped Spark web UI at http://hyt-bigdata01:4040
19/12/27 17:02:33 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
19/12/27 17:02:33 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
19/12/27 17:02:33 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
19/12/27 17:02:33 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
19/12/27 17:02:33 INFO cluster.YarnClientSchedulerBackend: Stopped
19/12/27 17:02:33 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/27 17:02:33 INFO memory.MemoryStore: MemoryStore cleared
19/12/27 17:02:33 INFO storage.BlockManager: BlockManager stopped
19/12/27 17:02:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/12/27 17:02:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/27 17:02:33 INFO spark.SparkContext: Successfully stopped SparkContext
19/12/27 17:02:33 INFO util.ShutdownHookManager: Shutdown hook called
19/12/27 17:02:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-829dbd76-4b21-4e8b-b7e6-77c6ec66223e
19/12/27 17:02:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0e1437db-71e5-4915-b5c1-a1dca5c709df
[root@hyt-bigdata01 jars]#
```

 
The following are the steps to deploy a Spark 3.x cluster:

1. Download the Spark distribution:

```shell
wget https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
```

2. Extract and install Spark:

```shell
tar -zxvf spark-3.1.2-bin-hadoop3.2.tgz -C /opt
mv /opt/spark-3.1.2-bin-hadoop3.2/ /opt/spark
```

3. Configure the Spark cluster:

- On every node, edit the Spark configuration file `/opt/spark/conf/spark-env.sh` and set the following environment variables, where `<master-node-ip>` is the IP address of the Spark master node:

```shell
export SPARK_HOME=/opt/spark
export JAVA_HOME=/path/to/java
export HADOOP_HOME=/path/to/hadoop
export SPARK_MASTER_HOST=<master-node-ip>
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g
```

- On the Spark master node, edit the `/opt/spark/conf/slaves` file and add the IP address of every worker node, one per line. (In Spark 3.x this file is named `workers`; `slaves` is the legacy name.)

4. Start the Spark cluster:

- On the master node, start the Spark master:

```shell
/opt/spark/sbin/start-master.sh
```

- On each worker node, start a Spark worker, where `<master-node-ip>` is the IP address of the Spark master node:

```shell
/opt/spark/sbin/start-worker.sh spark://<master-node-ip>:7077
```

5. Verify the deployment:

- Open the master's web UI at `http://<master-node-ip>:8080` in a browser and confirm that the master and all worker nodes have started and registered successfully.

- On the master node, launch the Spark shell as a smoke test and confirm it can connect to the cluster and run jobs:

```shell
/opt/spark/bin/spark-shell --master spark://<master-node-ip>:7077
```

These are the steps for deploying a Spark 3.x cluster; adjust the configuration to match your environment. Once the cluster is up, the HdfsTest run from the beginning of this section can also be submitted against it, as sketched below.
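A hedged sketch of that resubmission against the standalone master instead of YARN; the jar and input paths are assumptions carried over from the earlier example, not values from the original post:

```shell
# Sketch only: jar and input paths are assumed; replace <master-node-ip> with the master's IP.
/opt/spark/bin/spark-submit \
  --class org.apache.spark.examples.HdfsTest \
  --master spark://<master-node-ip>:7077 \
  --total-executor-cores 8 \
  --executor-memory 2g \
  /opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar \
  hdfs:///benchmark/50g-input
```

In standalone mode, parallelism is set with `--total-executor-cores` and `--executor-memory` rather than `--num-executors`, so the task/executor split will differ from the YARN log shown above.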