Spark on kubernetes环境搭建
实验要求
键 | 值 |
---|---|
kubernetes集群 | 版本1.15 |
spark | 版本 2.4.5 |
私建docker仓库 | 已经被kubernetes集群的docker信任 |
jdk | 1.8 |
scala | 2.12 |
构建kubernetes
- 单机测试的话,可以使用minikube
- 为每一台kubernetes节点,下载jdk和Scala,并配置JAVA_HOME和SCALA_HOME
搭建私有docker镜像仓库
- 准备一台单独的设备安装docker,或者,在kubernetes集群中寻找一台设备
- 配置insecure docker仓库,docker镜像仓库本身,以及每一台kubernetes集群中的节点都需要配置
# 编辑/etc/docker/daemon.json,内容如下 { "insecure-registries": ["${设备的外网IP}:5000"] }
- 重启docker,docker镜像仓库所在的设备和kubernetes的每一台设备都需要重启docker
systemctl restart docker.service
- 启动docker 私有仓库
docker run -d --restart=always --name registry-test -p 5000:5000 registry:2
构建spark docker images
- 在一台已经安装了docker的设备上下载spark
wget http://apache.communilink.net/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
- build spark docker镜像
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz cd spark-2.4.5-bin-hadoop2.7 # ./bin/docker-image-tool.sh -r ${你的镜像仓库地址} -t ${你的镜像tag} build ./bin/docker-image-tool.sh -r 10.211.55.24:5000 -t v1 build
- 将build好的image导入到自建的docker镜像仓库中
# 方式一 docker push ${你的镜像} # 方式二 ./bin/docker-image-tool.sh -r 10.211.55.24:5000 -t v1 push # 方式三 docker save 10.211.55.24:5000/spark:v1 > tmp.tar scp tmp.tar ${你的镜像仓库}/${path} ssh ${你的镜像仓库} docker load < 10.211.55.24:5000/spark:v1 docker push 10.211.55.24:5000/spark:v1
为spark程序创建serviceaccount和对应的rbac
- 为spark应用程序构建运行需要的角色和账户
kubectl create serviceaccount spark kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
运行spark的example sparkPi
- 在kubernetes master上运行spark提交脚本
- 在kubernetes master运行proxy
kubectl proxy
- spark运行脚本
bin/spark-submit \ --master k8s://http://127.0.0.1:8001 \ --deploy-mode cluster \ --name spark-pi \ --class org.apache.spark.examples.SparkPi \ --conf spark.executor.instances=5 \ --conf spark.kubernetes.container.image=${你的镜像仓库IP地址}/apache/spark:v1 \ --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ /opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
可能会遇到的问题
- spark无法与kubernetes apiserver连接,或者应该是java无法与apiserver连接
- 报错
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:212) at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571) at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:215) at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:134) at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:350) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:759) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:738) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:69) at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140) at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$1.apply(KubernetesClientApplication.scala:140) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2542) at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:328) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1614) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052) at sun.security.ssl.Handshaker.process_record(Handshaker.java:987) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397) at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:319) at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:283) at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135) at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:112) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:200) ... 4 more Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397) at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302) at sun.security.validator.Validator.validate(Validator.java:260) at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229) at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1596) ... 39 more Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392) ... 45 more
- 原因
kubernetes集群是自建CA的,kubernetes的证书java是不认的 - 解决方案
- 目前有一种流传的解决方案:将kubernetes的ca证书导入到java的证书信任链中,我目前还没能成功
- 另一种解决方案是,使用kubectl proxy开一个http服务器,然后使用http服务器的url做spark-submit的–master
- 报错
- 权限问题
- 报错
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794) at org.apache.spark.SparkContext.<init>(SparkContext.scala:493) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935) at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-1573016872026-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1573016872026-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default". at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:415) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:313) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:296) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:801) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:218) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:185) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55) at scala.Option.map(Option.scala:146) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55) at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89) at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788) ... 20 more
- 解决方案
为spark应用程序创建serviceaccount和rbac
- 报错
- 示例程序的java class找不到
- 原因
运行在kubernetes的spark程序,找local的jar包是在镜像内部找的 - 解决方案
将jar包放到http服务器或者hdfs中
- 原因
目前尚未解决的问题
- 目前运行spark指定master只能用kubectl proxy运行一个http服务器,然后spark的–master指定http的url;尚不能直接在–master处指定kubernetes的https url
- 目前运行的sparkPi的jar包是直接构建到spark的镜像内部的,脚本中写的jar包的脚本是jar包在镜像内部的路径
- 目前尚不能从http server或者hdfs获取jar包
- 目前还在考虑如何让kubernetes访问到我的数据存储系统