Kubernetes and Big Data, Part 1: Running Spark with the Kubernetes Scheduler


I. Introduction

Starting with version 2.3.0, Spark supports using Kubernetes as a native resource scheduler. Spark currently supports the following four cluster managers:

  • Standalone Deploy Mode
  • Apache Mesos
  • Hadoop YARN
  • Kubernetes

Using Kubernetes as a native scheduler is still an experimental feature at this point, and it comes with the following prerequisites (a quick check is sketched after the list):

  • Spark 2.3+
  • Kubernetes 1.6+
  • Permissions to list, create, edit, and delete pods in the cluster
  • DNS configured in the Kubernetes cluster
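
These requirements can be checked from the client machine. A minimal sketch, assuming kubectl is already configured for the target cluster and that the DNS addon carries the conventional k8s-app=kube-dns label:

kubectl version --short                               # server version should be 1.6+
kubectl get pods -n kube-system -l k8s-app=kube-dns   # DNS addon pods should be Running
kubectl auth can-i create pods                        # pod permissions for the current context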

Submission works the same way as with the other cluster managers: the job is handed over via spark-submit, the only difference being that the master URL points at the Kubernetes API server, which lets the Kubernetes scheduler place the Spark workload. After submission, a driver pod is started first; the driver then talks to Kubernetes and launches a set of executor pods to run the tasks. When the job completes, all executor pods are deleted, while the driver pod is kept in the Completed state, consuming no CPU or memory; all logs and results can be retrieved from the driver pod.
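
For example, after a job finishes, the leftover driver pod can be located and its log retrieved. A minimal sketch, assuming the spark-role labels that Spark on Kubernetes attaches to the pods it creates:

kubectl get pods -l spark-role=driver                    # completed driver pods
kubectl logs <driver-pod-name> | grep "Pi is roughly"    # <driver-pod-name> is a placeholder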

Reposted from https://blog.csdn.net/cloudvtech

II. Running a Spark Job on Kubernetes

1. Download spark-2.3.0-bin-hadoop2.7.tgz

wget http://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
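
Unpack the archive before moving on; running a checksum over the download is a sensible extra check (compare the value manually with the one published on the Apache archive site):

sha512sum spark-2.3.0-bin-hadoop2.7.tgz   # compare with the published checksum
tar -xzf spark-2.3.0-bin-hadoop2.7.tgz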

2. Build the Docker image

cd spark-2.3.0-bin-hadoop2.7

docker build -t 192.168.56.10:5000/spark:2.3.0 -f kubernetes/dockerfiles/spark/Dockerfile .

docker push 192.168.56.10:5000/spark:2.3.0
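
The distribution also ships a helper script, bin/docker-image-tool.sh, that wraps the same build and push steps. An equivalent invocation would look like this (a sketch, assuming the same private registry at 192.168.56.10:5000 accepts pushes):

./bin/docker-image-tool.sh -r 192.168.56.10:5000 -t 2.3.0 build   # builds 192.168.56.10:5000/spark:2.3.0
./bin/docker-image-tool.sh -r 192.168.56.10:5000 -t 2.3.0 push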

3. Create the service account and cluster role binding for Spark

kubectl create serviceaccount spark

kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
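
To confirm the binding took effect, ask the API server whether the new service account may manage pods, using kubectl auth can-i with impersonation:

kubectl auth can-i create pods --as=system:serviceaccount:default:spark   # expect: yes
kubectl auth can-i delete pods --as=system:serviceaccount:default:spark   # expect: yes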

4. Submit the SparkPi example job

# --master:                  address of the Kubernetes API server
# --name:                    prefix for the driver and executor pod names
# spark.executor.instances:  number of executor pods
# ...serviceAccountName:     the Kubernetes service account created in step 3
# ...container.image:        the Docker image used by both driver and executors
# trailing argument:         location of the application jar inside the image
bin/spark-submit \
    --master k8s://https://192.168.56.10:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=192.168.56.10:5000/spark:2.3.0 \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
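
Note that the local:// scheme in the last argument refers to a path inside the container image, not on the machine running spark-submit; the example jar is already baked into the image built in step 2.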

5. Job execution

[root@k8s-install-node ~]# kubectl get pods --all-namespaces -o wide | grep spark | grep -v Completed

default       spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver   1/1       Running     0          7s        10.244.61.202    k8s-01             <none>

default       spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-exec-1   1/1       Running     0          2s        10.244.165.201   k8s-03             <none>

default       spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-exec-2   1/1       Running     0          2s        10.244.179.10    k8s-02             <none>

Driver log:

kubectl logs spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=driver
+ '[' -z driver ']'
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ sed 's/[^=]*=\(.*\)/\1/g'
+ grep SPARK_JAVA_OPT_
+ env
+ readarray -t SPARK_JAVA_OPTS
+ '[' -n /opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar'
+ '[' -n '' ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
+ exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.driver.port=7078 -Dspark.kubernetes.executor.podNamePrefix=spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8 -Dspark.kubernetes.container.image=192.168.56.10:5000/spark:2.3.0 -Dspark.app.id=spark-8f8a389851e7483e8de1850eb1418856 -Dspark.executor.instances=2 -Dspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar,/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark -Dspark.submit.deployMode=cluster -Dspark.app.name=spark-pi -Dspark.driver.host=spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc -Dspark.driver.blockManager.port=7079 -Dspark.kubernetes.driver.pod.name=spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver -Dspark.master=k8s://https://192.168.56.10:6443 -cp ':/opt/spark/jars/*:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar:/opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar' -Xms1g -Xmx1g -Dspark.driver.bindAddress=10.244.61.202 org.apache.spark.examples.SparkPi
2018-09-06 10:38:20 INFO  SparkContext:54 - Running Spark version 2.3.0
2018-09-06 10:38:20 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-09-06 10:38:20 INFO  SparkContext:54 - Submitted application: Spark Pi
2018-09-06 10:38:20 INFO  SecurityManager:54 - Changing view acls to: root
2018-09-06 10:38:20 INFO  SecurityManager:54 - Changing modify acls to: root
2018-09-06 10:38:20 INFO  SecurityManager:54 - Changing view acls groups to: 
2018-09-06 10:38:20 INFO  SecurityManager:54 - Changing modify acls groups to: 
2018-09-06 10:38:20 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2018-09-06 10:38:21 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 7078.
2018-09-06 10:38:21 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-09-06 10:38:21 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-09-06 10:38:21 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-09-06 10:38:21 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-09-06 10:38:21 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-ea53142c-5b84-4958-b2de-187cbad3f64b
2018-09-06 10:38:21 INFO  MemoryStore:54 - MemoryStore started with capacity 408.9 MB
2018-09-06 10:38:21 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2018-09-06 10:38:21 INFO  log:192 - Logging initialized @1888ms
2018-09-06 10:38:21 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
2018-09-06 10:38:21 INFO  Server:414 - Started @1978ms
2018-09-06 10:38:21 INFO  AbstractConnector:278 - Started ServerConnector@7b573144{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-06 10:38:21 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@34645867{/jobs,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4fcee388{/jobs/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6f80fafe{/jobs/job,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@f9879ac{/jobs/job/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@37f21974{/stages,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5f4d427e{/stages/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6e521c1e{/stages/stage,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@303e3593{/stages/stage/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4ef27d66{/stages/pool,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@362a019c{/stages/pool/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d9bec4d{/storage,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5c48c0c0{/storage/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10c8f62{/storage/rdd,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@674c583e{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@25f7391e{/environment,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3f23a3a0{/environment/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5ab14cb9{/executors,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5fb97279{/executors/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@439a8f59{/executors/threadDump,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@61861a29{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31024624{/static,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d25e6bb{/,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@ce5a68e{/api,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7c041b41{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7f69d591{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-09-06 10:38:21 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc:4040
2018-09-06 10:38:21 INFO  SparkContext:54 - Added JAR /opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar at spark://spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.3.0.jar with timestamp 1536230301601
2018-09-06 10:38:21 WARN  KubernetesClusterManager:66 - The executor's init-container config map is not specified. Executors will therefore not attempt to fetch remote or submitted dependencies.
2018-09-06 10:38:21 WARN  KubernetesClusterManager:66 - The executor's init-container config map key is not specified. Executors will therefore not attempt to fetch remote or submitted dependencies.
2018-09-06 10:38:22 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
2018-09-06 10:38:22 INFO  NettyBlockTransferService:54 - Server created on spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc:7079
2018-09-06 10:38:22 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-09-06 10:38:22 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc, 7079, None)
2018-09-06 10:38:22 INFO  BlockManagerMasterEndpoint:54 - Registering block manager spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc:7079 with 408.9 MB RAM, BlockManagerId(driver, spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc, 7079, None)
2018-09-06 10:38:22 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc, 7079, None)
2018-09-06 10:38:22 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc, 7079, None)
2018-09-06 10:38:22 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16073fa8{/metrics/json,null,AVAILABLE,@Spark}
2018-09-06 10:38:23 INFO  KubernetesClusterSchedulerBackend:54 - Requesting a new executor, total executors is now 0
2018-09-06 10:38:23 INFO  KubernetesClusterSchedulerBackend:54 - Requesting a new executor, total executors is now 0
2018-09-06 10:38:25 INFO  KubernetesClusterSchedulerBackend:54 - Executor pod spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-exec-2 ready, launched at k8s-02 as IP 10.244.179.10.
2018-09-06 10:38:25 INFO  KubernetesClusterSchedulerBackend:54 - Executor pod spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-exec-1 ready, launched at k8s-03 as IP 10.244.165.201.
2018-09-06 10:38:26 INFO  KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.244.179.10:47982) with ID 2
2018-09-06 10:38:27 INFO  KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.244.165.201:51378) with ID 1
2018-09-06 10:38:27 INFO  KubernetesClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
2018-09-06 10:38:27 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 10.244.179.10:44581 with 408.9 MB RAM, BlockManagerId(2, 10.244.179.10, 44581, None)
2018-09-06 10:38:27 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 10.244.165.201:46685 with 408.9 MB RAM, BlockManagerId(1, 10.244.165.201, 46685, None)
2018-09-06 10:38:27 INFO  SparkContext:54 - Starting job: reduce at SparkPi.scala:38
2018-09-06 10:38:27 INFO  DAGScheduler:54 - Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
2018-09-06 10:38:27 INFO  DAGScheduler:54 - Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
2018-09-06 10:38:27 INFO  DAGScheduler:54 - Parents of final stage: List()
2018-09-06 10:38:27 INFO  DAGScheduler:54 - Missing parents: List()
2018-09-06 10:38:27 INFO  DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
2018-09-06 10:38:27 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 408.9 MB)
2018-09-06 10:38:27 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 1181.0 B, free 408.9 MB)
2018-09-06 10:38:27 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc:7079 (size: 1181.0 B, free: 408.9 MB)
2018-09-06 10:38:27 INFO  SparkContext:54 - Created broadcast 0 from broadcast at DAGScheduler.scala:1039
2018-09-06 10:38:27 INFO  DAGScheduler:54 - Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
2018-09-06 10:38:27 INFO  TaskSchedulerImpl:54 - Adding task set 0.0 with 2 tasks
2018-09-06 10:38:27 INFO  TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, 10.244.165.201, executor 1, partition 0, PROCESS_LOCAL, 7865 bytes)
2018-09-06 10:38:27 INFO  TaskSetManager:54 - Starting task 1.0 in stage 0.0 (TID 1, 10.244.179.10, executor 2, partition 1, PROCESS_LOCAL, 7865 bytes)
2018-09-06 10:38:28 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 10.244.165.201:46685 (size: 1181.0 B, free: 408.9 MB)
2018-09-06 10:38:28 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on 10.244.179.10:44581 (size: 1181.0 B, free: 408.9 MB)
2018-09-06 10:38:28 INFO  TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 615 ms on 10.244.165.201 (executor 1) (1/2)
2018-09-06 10:38:28 INFO  TaskSetManager:54 - Finished task 1.0 in stage 0.0 (TID 1) in 613 ms on 10.244.179.10 (executor 2) (2/2)
2018-09-06 10:38:28 INFO  TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool 
2018-09-06 10:38:28 INFO  DAGScheduler:54 - ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.872 s
2018-09-06 10:38:28 INFO  DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 0.939725 s
Pi is roughly 3.145395726978635
2018-09-06 10:38:28 INFO  AbstractConnector:318 - Stopped Spark@7b573144{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-06 10:38:28 INFO  SparkUI:54 - Stopped Spark web UI at http://spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver-svc.default.svc:4040
2018-09-06 10:38:28 INFO  KubernetesClusterSchedulerBackend:54 - Shutting down all executors
2018-09-06 10:38:28 INFO  KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:54 - Asking each executor to shut down
2018-09-06 10:38:28 INFO  KubernetesClusterSchedulerBackend:54 - Closing kubernetes client
2018-09-06 10:38:28 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-09-06 10:38:28 INFO  MemoryStore:54 - MemoryStore cleared
2018-09-06 10:38:28 INFO  BlockManager:54 - BlockManager stopped
2018-09-06 10:38:28 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2018-09-06 10:38:28 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-09-06 10:38:28 INFO  SparkContext:54 - Successfully stopped SparkContext
2018-09-06 10:38:28 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-09-06 10:38:28 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-4d2effad-dbde-4076-b357-ad581cec98d6
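
Once the log has been inspected, the completed driver pod can be deleted by hand, since it is not cleaned up automatically (pod name taken from this run):

kubectl delete pod spark-pi-5b0a5f65b7f832929c83a6c2aa4346a8-driver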

Reposted from https://blog.csdn.net/cloudvtech
