Two ways to deploy Spark on Kubernetes
- Option 1: use Kubernetes as the cluster manager, analogous to Mesos and YARN; search GitHub for "running-on-kubernetes" for usage details. However, this mode was still immature at the time of writing (experimental in Spark 2.4) and is not recommended for production.
- Option 2: standalone mode. Even without a cluster manager, the scripts under sbin make a standalone cluster easy to deploy, and running it on k8s brings several benefits: first, better machine utilization (server load is typically high during the day and idle at night, which is exactly when batch jobs can run); second, one-command scale-out and one-command upgrades; third, reuse of the monitoring and log collection already built on the k8s cluster.
We therefore deploy the cluster the second way; Spark was at version 2.4.x at the time of writing. Suppose we are building a Spark cluster for an e-commerce project called ecc, planned as follows:
Namespace | RBAC account | Master | Worker | PVC |
---|---|---|---|---|
ecc-spark-cluster | spark-cdp | ecc-spark-master | ecc-spark-worker | ecc-spark-pvc |
1. Spark RBAC configuration
Create the RBAC service account and grant it resource permissions (see the Kubernetes documentation on pod service accounts for background):
cat > ecc-spark-rbac.yaml << EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: ecc-spark-cluster
  labels:
    name: ecc-spark-cluster
---
# Create the spark-cdp service account in this namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-cdp
  namespace: ecc-spark-cluster
---
# Create a role with resource permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-cdp
  namespace: ecc-spark-cluster
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - services
      - secrets
    verbs:
      - create
      - get
      - delete
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - create
      - get
      - delete
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - resourcequotas
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - update
      - patch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - create
      - get
      - update
      - delete
  - apiGroups:
      - admissionregistration.k8s.io
    resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    verbs:
      - create
      - get
      - update
      - delete
  - apiGroups:
      - sparkoperator.k8s.io
    resources:
      - sparkapplications
      - scheduledsparkapplications
      - sparkapplications/status
      - scheduledsparkapplications/status
    verbs:
      - '*'
  - apiGroups:
      - scheduling.volcano.sh
    resources:
      - podgroups
      - queues
      - queues/status
    verbs:
      - get
      - list
      - watch
      - create
      - delete
      - update
---
# Bind the role to the spark-cdp service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-cdp
  namespace: ecc-spark-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-cdp
subjects:
  - kind: ServiceAccount
    name: spark-cdp
    namespace: ecc-spark-cluster
EOF
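The manifest can then be applied and sanity-checked with kubectl (a sketch; assumes a working kubeconfig pointed at the target cluster):

```shell
# Create the namespace, service account, role, and role binding
kubectl apply -f ecc-spark-rbac.yaml

# Confirm the objects exist
kubectl get serviceaccount spark-cdp -n ecc-spark-cluster
kubectl get role,rolebinding -n ecc-spark-cluster

# Verify the account can manage pods, impersonating the service account
kubectl auth can-i create pods -n ecc-spark-cluster \
  --as=system:serviceaccount:ecc-spark-cluster:spark-cdp
```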
2. Spark PV and PVC
- Create the PV
Mount NFS and define the PV's access modes (accessModes) and storage capacity (capacity):
cat >ecc-spark-pv.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  # PersistentVolumes are cluster-scoped, so no namespace is set here
  name: ecc-spark-pv-static
spec:
  capacity:
    storage: 10Gi
  accessModes:
    # Three access modes exist: ReadWriteOnce, ReadOnlyMany, ReadWriteMany
    - ReadWriteOnce
  nfs:
    path: /data/nfs
    server: 192.168.0.135
EOF
- Create the PVC
cat >ecc-spark-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ecc-spark-pvc-static
  namespace: ecc-spark-cluster
spec:
  accessModes:
    # Must match the PV's access mode
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
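Applying both manifests should leave the claim in the Bound state (a sketch; the NFS server at 192.168.0.135 must be reachable and exporting /data/nfs):

```shell
kubectl apply -f ecc-spark-pv.yaml
kubectl apply -f ecc-spark-pvc.yaml

# STATUS should show Bound once the claim matches the PV's
# capacity and access mode
kubectl get pv ecc-spark-pv-static
kubectl get pvc ecc-spark-pvc-static -n ecc-spark-cluster
```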
3. Creating the Spark master
The Spark master consists of two parts: the main body, a ReplicationController saved as ecc-spark-master.yaml, and a Service that exposes the master's port 7077 to the workers.
(1) Tunable parameters
- replica count (replicas);
- args: the log output path;
- the PVC to mount (ecc-spark-pvc) and its mount path inside the container (/opt/usrjars);
- resources: CPU/memory requests and limits.
cat >ecc-spark-master.yaml <<EOF
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: ecc-spark-cluster
spec:
  replicas: 1
  selector:
    component: ecc-spark-master
  template:
    metadata:
      labels:
        component: ecc-spark-master
    spec:
      serviceAccountName: spark-cdp
      securityContext: {}
      dnsPolicy: ClusterFirst
      hostname: ecc-spark-master
      containers:
        - name: ecc-spark-master
          image: acpimgehub.com.cn/eccp_dev/spark:202107_YD_0714_v3.2
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh"]
          args: ["-c","sh /opt/spark/sbin/start-master.sh && tail -f /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-*"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          volumeMounts:
            - mountPath: /opt/usrjars/
              name: ecc-spark-pvc
          livenessProbe:
            failureThreshold: 9
            initialDelaySeconds: 2
            periodSeconds: 15
            successThreshold: 1
            tcpSocket:
              port: 7077
            timeoutSeconds: 10
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
      volumes:
        - name: ecc-spark-pvc
          persistentVolumeClaim:
            claimName: ecc-spark-pvc-static
---
kind: Service
apiVersion: v1
metadata:
  name: ecc-spark-master
  namespace: ecc-spark-cluster
spec:
  ports:
    - port: 7077
      targetPort: 7077
      name: spark
    - port: 8080
      targetPort: 8080
      name: http
  selector:
    component: ecc-spark-master
EOF
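After applying, the master pod should come up and pass its TCP liveness probe on 7077; the web UI on 8080 can be checked with a port-forward (commands are a sketch, assuming a working kubeconfig):

```shell
kubectl apply -f ecc-spark-master.yaml

# Wait until the pod is Running and 1/1 ready
kubectl get pods -n ecc-spark-cluster -l component=ecc-spark-master -w

# Tail the master log to confirm it has started and bound port 7077
kubectl logs -n ecc-spark-cluster -l component=ecc-spark-master -f

# Forward the master UI to localhost:8080 for a quick look
kubectl port-forward svc/ecc-spark-master 8080:8080 -n ecc-spark-cluster
```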
4. Creating the Spark workers
The worker startup script must be passed the master's address. Thanks to Kubernetes DNS and the Service defined above, the master is reachable at ecc-spark-master.ecc-spark-cluster:7077.
(1) Tunable parameters
- replica count (replicas): setting replicas to N starts N workers;
- args: the log output path;
- the PVC to mount (ecc-spark-pvc) and its mount path inside the container (/opt/usrjars);
- resources: CPU/memory requests and limits.
cat >ecc-spark-worker.yaml <<EOF
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
  namespace: ecc-spark-cluster
spec:
  replicas: 2
  selector:
    component: ecc-spark-worker
  template:
    metadata:
      labels:
        component: ecc-spark-worker
    spec:
      serviceAccountName: spark-cdp
      securityContext: {}
      dnsPolicy: ClusterFirst
      hostname: ecc-spark-worker
      containers:
        - name: ecc-spark-worker
          image: acpimgehub.com.cn/eccp_dev/spark:202107_YD_0714_v3.2
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh"]
          args: ["-c","sh /opt/spark/sbin/start-slave.sh spark://ecc-spark-master.ecc-spark-cluster:7077;tail -f /opt/spark/logs/spark--org.apache.spark.deploy.worker.Worker*"]
          ports:
            - containerPort: 8081
          volumeMounts:
            - mountPath: /opt/usrjars/
              name: ecc-spark-pvc
          resources:
            requests:
              cpu: "200m"
              memory: "128Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
      volumes:
        - name: ecc-spark-pvc
          persistentVolumeClaim:
            claimName: ecc-spark-pvc-static
EOF
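Once applied, each worker should register with the master, and scaling is a single command since the workers are stateless (a sketch):

```shell
kubectl apply -f ecc-spark-worker.yaml

# Both workers should appear, and the master UI should list them as ALIVE
kubectl get pods -n ecc-spark-cluster -l component=ecc-spark-worker

# One-command scale-out: grow from 2 to 4 workers
kubectl scale rc spark-worker-controller --replicas=4 -n ecc-spark-cluster
```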
5. Spark UI proxy
The Spark UI proxy service (image elsonrodriguez/spark-ui-proxy:1.0) is indispensable inside the k8s cluster. It serves the master's management page and lets you reach each worker's UI through the master (every worker has its own UI address with a randomly assigned pod IP that is only reachable inside the cluster). The proxy fetches the requested UI page from inside the cluster and returns it to us, so only the proxy's one address needs to be exposed.
cat >ecc-spark-ui-proxy.yaml <<EOF
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-ui-proxy-controller
  namespace: ecc-spark-cluster
spec:
  replicas: 1
  selector:
    component: ecc-spark-ui-proxy
  template:
    metadata:
      labels:
        component: ecc-spark-ui-proxy
    spec:
      containers:
        - name: ecc-spark-ui-proxy
          image: elsonrodriguez/spark-ui-proxy:1.0
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
          args:
            - ecc-spark-master:8080
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 120
            timeoutSeconds: 5
---
kind: Service
apiVersion: v1
metadata:
  name: ecc-spark-ui-proxy
  namespace: ecc-spark-cluster
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      nodePort: 32180
  selector:
    component: ecc-spark-ui-proxy
EOF
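After applying, the master UI becomes reachable from outside the cluster through the NodePort (a sketch; any node's IP works):

```shell
kubectl apply -f ecc-spark-ui-proxy.yaml

# Pick any node's address and open http://<node-ip>:32180 in a browser
kubectl get nodes -o wide
```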
With that, the whole Spark cluster is up, and the management page is reachable through port 32180 exposed by the cluster.
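As a smoke test, a job can be submitted from inside the master pod with spark-submit. The example below runs SparkPi against the standalone master (a sketch: the exact examples-jar filename depends on the image; spark-examples_2.11-2.4.0.jar is an assumption based on Spark 2.4.x with Scala 2.11):

```shell
# Run SparkPi from the master pod against the standalone cluster
kubectl exec -n ecc-spark-cluster -it \
  $(kubectl get pods -n ecc-spark-cluster -l component=ecc-spark-master -o name) -- \
  /opt/spark/bin/spark-submit \
    --master spark://ecc-spark-master.ecc-spark-cluster:7077 \
    --class org.apache.spark.examples.SparkPi \
    /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar 100
```

If the job prints an approximation of pi and shows up as FINISHED in the master UI, the cluster is working end to end.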