Building Spark on K8s Images


Building the Spark on K8s Base Image

Background

This is the base image for running Spark on K8s jobs; it is the image specified as the base for the executor pods.

Build Steps

  • git clone the desired Spark version (say 3.0.1). After cloning, run the following commands to build a runnable distribution that includes the kubernetes module:

     ## make Spark 3.x compatible with the CDH Hadoop version and resolve the conflicts
    git cherry-pick 8e8afb3a3468aa743d13e23e10e77e94b772b2ed
    
    ./dev/make-distribution.sh --name 2.6.0-cdh5.13.1  --pip --tgz -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes -Dhadoop.version=2.6.0-cdh5.13.1 -DskipTests
    
  • Install and add the required jars and native libraries
    To support lzoCodec, install the native-lzo library (for LZO support).
    Copy the files from the directory containing libhadoop.so to assembly/target/scala-2.12/jars/hadoop_native.
    Copy the files from the directory containing libgplcompression.so to assembly/target/scala-2.12/jars/native.
    Copy hadoop-lzo-0.4.15-cdh5.13.1.jar to assembly/target/scala-2.12/jars.
    (A shell sketch of these copy steps appears at the end of this list.)
    Then configure the environment variables:

    ENV SPARK_DIST_CLASSPATH=$SPARK_HOME/jars/native:$SPARK_HOME/jars/hadoop_native
    ENV LD_LIBRARY_PATH=$SPARK_HOME/jars/native:$SPARK_HOME/jars/hadoop_native
    ENV JAVA_LIBRARY_PATH=$SPARK_HOME/jars/native:$SPARK_HOME/jars/hadoop_native
    
    
  • Modify the image's Dockerfile to the following:

    # distribution, the docker build command should be invoked from the top level directory
    # of the Spark distribution. E.g.:
    # docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
    
    RUN set -ex && \
        sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
        apt-get update && \
        ln -s /lib /lib64 && \
        apt install -y bash tini libc6 libpam-modules krb5-user libnss3 && \
        apt-get install liblzo2-dev -y && \
        mkdir -p /opt/spark && \
        mkdir -p /opt/spark/examples && \
        mkdir -p /opt/spark/work-dir && \
        touch /opt/spark/RELEASE && \
        rm /bin/sh && \
        ln -sv /bin/bash /bin/sh && \
        echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
        chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
        rm -rf /var/cache/apt/*
    
    COPY jars /opt/spark/jars
    COPY bin /opt/spark/bin
    COPY sbin /opt/spark/sbin
    COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
    COPY examples /opt/spark/examples
    COPY kubernetes/tests /opt/spark/tests
    COPY data /opt/spark/data
    
    ENV SPARK_HOME /opt/spark
    
    ENV SPARK_DIST_CLASSPATH=$SPARK_HOME/jars/native:$SPARK_HOME/jars/hadoop_native
    
    ENV LD_LIBRARY_PATH=$SPARK_HOME/jars/native:$SPARK_HOME/jars/hadoop_native
    
    ENV JAVA_LIBRARY_PATH=$SPARK_HOME/jars/native:$SPARK_HOME/jars/hadoop_native
    
    RUN ln -s $SPARK_HOME/jars/hadoop-lzo-0.4.15-cdh5.13.1.jar $SPARK_HOME/jars/hadoop-lzo.jar
    WORKDIR /opt/spark/work-dir
    RUN chmod g+w /opt/spark/work-dir
    
    ENTRYPOINT [ "/opt/entrypoint.sh" ]
    
  • Build the image with Kubernetes support by running the commands below, then push the retagged image to your registry (a push sketch appears at the end of this list):

    ./bin/docker-image-tool.sh -t spark-on-k8s-v3.0.1-cdh-2.6.0-5.13.1 build
    ## retag the image for your registry as needed
    docker tag spark:spark-on-k8s-v3.0.1-cdh-2.6.0-5.13.1 xxx.xxx.xxx/xxx/spark-on-k8s:v3.0.1-cdh-2.6.0-5.13.1
    
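A minimal shell sketch of the copy steps from the second bullet above. The /path/to/... source directories are placeholders (in the same spirit as the /path/to/spark paths used later in this post); point them at wherever libhadoop.so, libgplcompression.so and the hadoop-lzo jar actually live on your build machine, and run this from the root of the Spark source tree after make-distribution.sh has finished:

    ## create the target directories inside the built distribution
    mkdir -p assembly/target/scala-2.12/jars/hadoop_native assembly/target/scala-2.12/jars/native
    ## files from the directory containing libhadoop.so
    cp /path/to/hadoop/native/* assembly/target/scala-2.12/jars/hadoop_native/
    ## files from the directory containing libgplcompression.so
    cp /path/to/gplcompression/native/* assembly/target/scala-2.12/jars/native/
    ## the LZO codec jar
    cp /path/to/hadoop-lzo-0.4.15-cdh5.13.1.jar assembly/target/scala-2.12/jars/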

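After tagging, the image still has to be pushed so the cluster nodes can pull it. A hypothetical push step, assuming you have already run docker login against the target registry; the repository name is the same placeholder used in the tag command above:

    docker push xxx.xxx.xxx/xxx/spark-on-k8s:v3.0.1-cdh-2.6.0-5.13.1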
Building the Job Image

Background

This is the driver-side image used to run Spark on K8s jobs.

Build Steps

  • Extend the image according to the job's requirements
    Note that for a Spark on K8s client-mode driver, the Dockerfile must configure:

    ENV HADOOP_CONF_DIR=/opt/hadoop/conf
    
    RUN printf '\nexport SPARK_LOCAL_HOSTNAME=${POD_IP}\n' >> /path/to/spark/conf/spark-env.sh
    

    This way executors will not fail to connect to the driver during driver–executor communication. Otherwise the driver uses its pod name as its host, and executors cannot reach that host directly; for details, see the comparison of spark on k8s and the spark on k8s operator.

  • Configure spark-defaults.conf
    In /path/to/spark/conf/spark-defaults.conf, set the following (a sketch for creating the service account referenced here appears at the end of this list):

    spark.kubernetes.namespace                              dev
    spark.kubernetes.authenticate.driver.serviceAccountName lijiahong
    spark.kubernetes.authenticate.serviceAccountName        lijiahong
    ## note: this is the spark-on-k8s base image built earlier; when running in cluster mode, the driver and executor images can be configured separately
    ## spark.kubernetes.driver.container.image 
    ## spark.kubernetes.executor.container.image	
    spark.kubernetes.container.image                        xxx.xxx.xxx/xxx/spark-on-k8s:v3.0.1-cdh-2.6.0-5.13.1
    spark.kubernetes.container.image.pullSecrets            regsecret
    spark.kubernetes.file.upload.path                       hdfs://tmp
    spark.kubernetes.container.image.pullPolicy             Always
    
  • Build the image:

    docker build -f Dockerfile --pull -t "xxx/xxx/spark-on-k8s:xxx" .
    
  • Set POD_IP when submitting the job
    For example, with the following YAML (a client-mode spark-submit sketch run from inside this pod appears at the end of this list):

    apiVersion: v1
    kind: Pod
    metadata:
      name: spark-on-k8s-demo
      labels:
        name: spark-on-k8s-demo
    spec:
      containers:
      - name: spark-on-k8s-demo
        image: xxx/xxx/spark-on-k8s:xxx
        imagePullPolicy: Always
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: NODE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
      imagePullSecrets:
      - name: regsecret
      restartPolicy: Never
    
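The service account named in spark-defaults.conf has to exist in the target namespace and be allowed to manage executor pods. A minimal sketch following the approach in the Spark on Kubernetes documentation, using the namespace and account name from the configuration above:

    kubectl create serviceaccount lijiahong -n dev
    kubectl create clusterrolebinding lijiahong-role --clusterrole=edit \
      --serviceaccount=dev:lijiahong --namespace=dev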

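With the pod above running and the settings from spark-defaults.conf picked up, a client-mode submission from inside the driver pod might look like the sketch below. The API server address, Spark install path, example jar and class are all placeholders or assumptions; adjust them to your environment.

    ## run inside the driver pod, e.g. after:
    ## kubectl exec -it spark-on-k8s-demo -n dev -- bash
    /path/to/spark/bin/spark-submit \
      --master k8s://https://<apiserver-host>:<port> \
      --deploy-mode client \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      /path/to/spark/examples/jars/spark-examples_2.12-3.0.1.jar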
At this point, the Spark on K8s image build is complete.
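To confirm that the executor pods really run on the base image built above, you can watch the pods in the job's namespace while an application is running and print the image of one executor pod (the pod name below is an example):

    kubectl get pods -n dev -w
    kubectl get pod <executor-pod-name> -n dev -o jsonpath='{.spec.containers[0].image}'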
