jupyterhub+k8s+spark/yarn


1. If the production cluster already runs Spark on YARN, integration is straightforward (the container connects to the existing Spark-on-YARN cluster).
2. Build a custom image

    2.1 Install Python 3.7 on the worker machines and symlink it to /opt/conda/bin/python (so executors resolve the same PYSPARK_PYTHON path as the driver)

FROM jupyter/all-spark-notebook:2ce7c06a61a1


ENV HADOOP_HOME /usr/local/hadoop
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_CONF_HOME /usr/local/hadoop/etc/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
ENV PYSPARK_PYTHON /opt/conda/bin/python
ENV PYSPARK_DRIVER_PYTHON /opt/conda/bin/python

USER root

COPY hadoop  /usr/local/hadoop

# spark-defaults.conf
RUN echo "spark.driver.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.master=yarn" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.yarn.jars=hdfs://192.168.56.103:9000/spark/jars/*" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.eventLog.dir=hdfs://192.168.56.103:9000/spark/logs" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.hadoop.yarn.timeline-service.enabled=false" >> /usr/local/spark/conf/spark-defaults.conf && \
chown -R $NB_USER:users /usr/local/spark/conf/spark-defaults.conf 

RUN jupyter toree install --sys-prefix --spark_opts="--master yarn --deploy-mode cluster --driver-memory 512m \
 --executor-memory 512m --executor-cores 1 --driver-java-options -Dhdp.version=2.5.3.0-37 --conf spark.hadoop.yarn.timeline-service.enabled=false"

RUN chown jovyan -R /home/jovyan/.local
COPY slaves  /usr/local/spark/conf
COPY spark-env.sh  /usr/local/spark/conf

USER $NB_USER
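The RUN step above appends Properties-style lines (both `key=value` and `key value` forms, which Spark's loader accepts) to spark-defaults.conf. A quick sanity check — a sketch runnable anywhere with Python; the file contents below are reproduced from the Dockerfile, and the parser itself is only illustrative:

```python
# Sketch: parse spark-defaults.conf-style text ("key=value" or "key value"
# lines) and verify the settings the Dockerfile above appends to the image.

SAMPLE = """\
spark.driver.extraJavaOptions -Dhdp.version=2.5.3.0-37
spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37
spark.master=yarn
spark.yarn.jars=hdfs://192.168.56.103:9000/spark/jars/*
spark.eventLog.dir=hdfs://192.168.56.103:9000/spark/logs
spark.hadoop.yarn.timeline-service.enabled=false
"""

def parse_spark_defaults(text):
    """Return a {key: value} dict from spark-defaults.conf-style text."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # If the first whitespace-delimited token contains '=', the line is
        # "key=value"; otherwise it is "key value".
        if "=" in line.split(None, 1)[0]:
            key, _, value = line.partition("=")
        else:
            key, _, value = line.partition(" ")
        conf[key.strip()] = value.strip()
    return conf

conf = parse_spark_defaults(SAMPLE)
print(conf["spark.master"])  # yarn
```

Running this against the generated file inside the container is an easy way to catch a mangled echo line before launching a notebook.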

3. Test (check the applications on the YARN cluster) (Scala/Python)
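One simple smoke test is to submit the SparkPi example to YARN from inside the container and then confirm it shows up in the ResourceManager's application list. The sketch below only builds the spark-submit command, mirroring the options baked into the image; the examples-jar path is an assumption (stock Spark distributions ship it under examples/jars), so adjust it to your build before running:

```python
# Sketch: assemble a spark-submit command for the SparkPi example, using the
# same YARN options the image configures. The jar filename is a placeholder.
import shlex

def sparkpi_submit_cmd(spark_home="/usr/local/spark",
                       hdp_version="2.5.3.0-37"):
    return [
        f"{spark_home}/bin/spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--driver-memory", "512m",
        "--executor-memory", "512m",
        "--executor-cores", "1",
        "--conf", f"spark.driver.extraJavaOptions=-Dhdp.version={hdp_version}",
        "--conf", "spark.hadoop.yarn.timeline-service.enabled=false",
        "--class", "org.apache.spark.examples.SparkPi",
        # Placeholder jar name -- match it to the Spark version in the image:
        f"{spark_home}/examples/jars/spark-examples.jar",
        "100",
    ]

cmd = sparkpi_submit_cmd()
print(" ".join(shlex.quote(c) for c in cmd))
# Inside the container:
#   subprocess.run(cmd, check=True)
# then list the application:
#   /usr/local/hadoop/bin/yarn application -list
```

If the job reaches the RUNNING/FINISHED state in `yarn application -list`, the notebook image is correctly wired to the cluster.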

 

 
