Deploying a Spark Cluster on Kubernetes: Configuration

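Spark is a next-generation distributed in-memory computing framework and a top-level Apache open-source project. Compared with the Hadoop MapReduce framework, Spark keeps intermediate results in memory, which makes it roughly 10 to 100 times faster on many workloads. It also provides a richer set of operators and uses Resilient Distributed Datasets (RDDs) to support iterative computation, so it is well suited to data mining and machine learning algorithms and greatly improves development efficiency. Compared with deploying on physical machines, deploying a Spark cluster on a Kubernetes cluster has the following advantages: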

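  Fast deployment: to install a Spark cluster on the order of 1,000 nodes, you only set the worker replica count (replicas=1000) on the Kubernetes cluster and deploy it in one step, provided the cluster has enough capacity.
  Fast upgrades: upgrading Spark only requires switching to a new Spark image and recreating the pods.
  Elastic scaling: to scale out or in, you only change the worker replica count (replicas), and Kubernetes adds or removes worker pods accordingly.
  Consistency: every Kubernetes node runs the same Spark environment and version, because all pods start from the same image.
  High availability: if a node or pod running Spark fails, Kubernetes automatically creates replacement pods (on other nodes if necessary), and Spark reschedules the affected tasks onto the available workers.
  Strong isolation: with resource requests, limits, and namespace quotas, Spark can run in the same cluster as web-service applications, which improves machine utilization and thus lowers server costs.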
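The quota-based isolation mentioned in the list above can be sketched as a Kubernetes ResourceQuota on the spark-cluster namespace that all manifests in this article use. The object name spark-cluster-quota and the numbers are placeholder assumptions, not part of the original setup: 4 CPUs of requests and 20 pods comfortably cover the seven pods defined below (1 master, 4 workers, 1 UI proxy, 1 Zeppelin, each requesting 100m, i.e. 0.7 CPU in total).

```yaml
# Hypothetical quota for the spark-cluster namespace.
# The values are placeholders; size them to your own nodes.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-cluster-quota
  namespace: spark-cluster
spec:
  hard:
    # Upper bound on the sum of CPU requests of all pods in the namespace.
    requests.cpu: "4"
    # Upper bound on the number of pods in the namespace.
    pods: "20"
```

Only requests.cpu and pods are capped on purpose: once a quota tracks a compute resource such as requests.memory or limits.cpu, pods that do not declare that field are rejected, and the manifests in this article only set requests.cpu. Raise both values before scaling the workers toward the 1,000-replica example.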

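The configuration file for creating the Spark cluster is shown below. Note that the nodePort values it uses (8080 and 8081, and 8079 for Zeppelin later on) fall outside Kubernetes' default NodePort range of 30000-32767: the API server rejects them unless kube-apiserver was started with a wider --service-node-port-range. Otherwise, pick ports inside the default range.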

spark-cluster.yaml

# ================================= Spark Master =================================
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-master-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: index.docker.io/caicloud/spark:1.5.2
          env:
            - name: TZ
              value: Asia/Shanghai
          command: ["/start-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m

# ================================= Master-Service =================================
---
kind: Service
apiVersion: v1
metadata:
  name: spark-master
  namespace: spark-cluster
spec:
  type: NodePort
  ports:
    - port: 7077
      targetPort: 7077
      name: spark
    - port: 8080
      targetPort: 8080
      nodePort: 8080
      name: http
  selector:
    component: spark-master


# ================================= Spark Workers =================================
---
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-worker-controller
  namespace: spark-cluster
spec:
  replicas: 4
  selector:
    component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: index.docker.io/caicloud/spark:1.5.2
          env:
            - name: TZ
              value: Asia/Shanghai
          command: ["/start-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m
# ================================= Spark UI Proxy =================================
---
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-ui-proxy-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: spark-ui-proxy
  template:
    metadata:
      labels:
        component: spark-ui-proxy
    spec:
      containers:
        - name: spark-ui-proxy
          image: elsonrodriguez/spark-ui-proxy:1.0
          env:
            - name: TZ
              value: Asia/Shanghai
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
          args:
            - spark-master:8080
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 15
            timeoutSeconds: 60
# =============================== Spark UI Proxy Service ===============================
---
kind: Service
apiVersion: v1
metadata:
  name: spark-ui-proxy-service
  namespace: spark-cluster
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      nodePort: 8081
  selector:
    component: spark-ui-proxy
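Every object above is created in the spark-cluster namespace, which is not created automatically, so applying the file fails until it exists. The first document below is a minimal Namespace manifest that can be applied first or placed at the top of spark-cluster.yaml. The second document is an optional sketch, not part of the original setup: a ReplicationController only uses its template for new pods, so changing the image does not touch running workers. Rewriting the worker as an apps/v1 Deployment (the name spark-worker is chosen here) makes an image change trigger a rolling update. It is meant to replace the spark-worker-controller section, not run alongside it.

```yaml
# Namespace used by all manifests in this article; create it first.
apiVersion: v1
kind: Namespace
metadata:
  name: spark-cluster
---
# Sketch: the Spark worker as a Deployment instead of a ReplicationController.
# Editing the image tag below rolls the worker pods over to the new version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
  namespace: spark-cluster
spec:
  replicas: 4
  selector:
    # apps/v1 requires matchLabels, which must match the template labels.
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: index.docker.io/caicloud/spark:1.5.2
          env:
            - name: TZ
              value: Asia/Shanghai
          command: ["/start-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m
```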

Creating Zeppelin (optional)

zeppelin.yaml

kind: ReplicationController
apiVersion: v1
metadata:
  name: zeppelin-controller
  namespace: spark-cluster
spec:
  replicas: 1
  selector:
    component: zeppelin
  template:
    metadata:
      labels:
        component: zeppelin
    spec:
      containers:
        - name: zeppelin
          image: apache/zeppelin:0.9.0
          ports:
            - containerPort: 8080
          env:
            - name: TZ
              value: Asia/Shanghai
          resources:
            requests:
              cpu: 100m

zeppelin-svc.yaml

kind: Service
apiVersion: v1
metadata:
  name: zeppelin
  namespace: spark-cluster
spec:
  type: NodePort
  ports:
    - port: 8079
      targetPort: 8080
      nodePort: 8079
  selector:
    component: zeppelin

Reference: https://github.com/kubernetes/examples/tree/master/staging/spark
