Problems (and fixes) when running jupyter-notebook in YARN mode

This post records the problems I hit while trying to run Jupyter Notebook in YARN mode, and how they were resolved. It first covers setting up a Spark cluster across two virtual machines, including configuring the slaves file and updating environment variables. It then walks through the errors that appeared during startup, such as insufficient memory and core-count problems, and the attempts to fix them by adding memory and processors. Even after those adjustments, new errors appeared. Finally, through log analysis and configuration changes, including reformatting the NameNode and adjusting the HDFS configuration, Jupyter Notebook was successfully started and run.
(Abstract auto-generated by CSDN.)


I had previously only run single-machine jobs with pyspark in a VM, and now wanted to try distributed computation.
Before starting I read books and blog posts, but kept running into one problem after another and could not get it working. Here is a record of the process.
There are two virtual machines in total: one acting as master, one as slave1.

  1. Install Spark on slave1
    slave1 already had Hadoop installed and could run Hadoop cluster jobs successfully, so I won't repeat that part here.
    Copy the Spark installation directory from master over to slave1.
    (1) Go into the spark/conf directory, copy slaves.template to slaves, and add slave1 to it.
    (screenshot: the edited slaves file)

(2) Add the paths to /etc/profile
(screenshot: the environment variables added to /etc/profile)

Steps (1) and (2) must be done on both master and slave1.
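The two steps above amount to roughly the following. This is a sketch, not the author's exact commands: the install location /hadoop/spark is inferred from the log paths later in this post, and the variable names are the usual Spark conventions.

```shell
# (1) register the worker in spark/conf/slaves (path assumed)
cd /hadoop/spark/conf
cp slaves.template slaves
echo "slave1" >> slaves

# (2) append the environment variables to /etc/profile, then reload it
cat >> /etc/profile <<'EOF'
export SPARK_HOME=/hadoop/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
EOF
source /etc/profile
```

Run the same two steps on both machines so that master and slave1 agree on where Spark lives.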

  2. Install Anaconda on slave1
    You can copy master's Anaconda directory straight over with scp, then edit /etc/profile accordingly; the screenshot above already shows the lines to add.
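A sketch of that copy step; the Anaconda install path here is an assumption, since the post does not state it:

```shell
# copy the whole Anaconda tree from master to slave1 (path is hypothetical)
scp -r /root/anaconda3 root@slave1:/root/

# then add the same PATH entry to slave1's /etc/profile and reload
echo 'export PATH=/root/anaconda3/bin:$PATH' >> /etc/profile
source /etc/profile
```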

  3. Start everything up — this is where the problems began
    Run start-all.sh in the master terminal; jps shows the daemons on both master and slave1 started normally.
    Then, in the master terminal, run:
    HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop PYSPARK_DRIVER_PYTHON="jupyter" PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=yarn-client pyspark
    According to the references, if HADOOP_CONF_DIR is not configured in spark-env.sh, it has to be supplied on the command line as above. Jupyter Notebook did start, but when I typed sc.master in a notebook to check which mode it was running in, it threw a pile of errors:

[root@master home]# HADOOP_CONF_IR=/hadoop/hadoop/etc/hadoop PYSPARK_DRIVER_PYTHON="jupyter" PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark

[I 18:58:24.475 NotebookApp] [nb_conda_kernels] enabled, 2 kernels found
[I 18:58:25.101 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 18:58:25.101 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
[I 18:58:25.163 NotebookApp] [nb_anacondacloud] enabled
[I 18:58:25.167 NotebookApp] [nb_conda] enabled
[I 18:58:25.167 NotebookApp] Serving notebooks from local directory: /home
[I 18:58:25.167 NotebookApp] 0 active kernels
[I 18:58:25.168 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 18:58:25.168 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 18:58:33.844 NotebookApp] Kernel started: c15aabde-b441-45f2-b78d-9933e6534c27
Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:263)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:240)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /hadoop/spark/python/pyspark/shell.py:
[I 19:00:33.829 NotebookApp] Saving file at /Untitled2.ipynb
[I 19:00:57.754 NotebookApp] Creating new notebook in 
[I 19:00:59.174 NotebookApp] Kernel started: ebfbdfd5-2343-4149-9fef-28877967d6c6
Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:263)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:240)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[IPKernelApp] WARNING | Unknown error in handling PYTHONSTARTUP file /hadoop/spark/python/pyspark/shell.py:
[I 19:01:12.315 NotebookApp] Saving file at /Untitled3.ipynb
^C[I 19:01:15.971 NotebookApp] interrupted
Serving notebooks from local directory: /home
2 active kernels
The Jupyter Notebook is running at: http://localhost:8888/
Shutdown this notebook server (y/[n])? y
[C 19:01:17.674 NotebookApp] Shutdown confirmed
[I 19:01:17.675 NotebookApp] Shutting down kernels
[I 19:01:18.189 NotebookApp] Kernel shutdown: ebfbdfd5-2343-4149-9fef-28877967d6c6
[I 19:01:18.190 NotebookApp] Kernel shutdown: c15aabde-b441-45f2-b78d-9933e6534c27

The log shows the cause directly:

Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

(Note that the command actually typed above reads HADOOP_CONF_IR — with the missing D, the variable was never set, which is exactly what this exception complains about.)
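Typing the variables inline on every launch is error-prone, as the missing `D` shows. If you drive PySpark from a Python script instead, one way to guard against this is to set and verify the variables through `os.environ` before launching. The values below are the ones used in this post; treat this as an illustrative sketch, not the author's method:

```python
import os

# Environment PySpark needs for yarn-client mode (values from this post).
env = {
    "HADOOP_CONF_DIR": "/hadoop/hadoop/etc/hadoop",
    "PYSPARK_DRIVER_PYTHON": "jupyter",
    "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
}
os.environ.update(env)

# A typo like HADOOP_CONF_IR is caught here, before pyspark ever starts.
missing = [k for k in ("HADOOP_CONF_DIR",) if k not in os.environ]
assert not missing, f"missing required variables: {missing}"
print("environment ok")
```

With the variables verified, launching `pyspark` (e.g. via `subprocess`) would inherit them, so the 'yarn-client' validation in SparkSubmitArguments passes.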

So I configured spark-env.sh:
(screenshot: HADOOP_CONF_DIR added to spark-env.sh)
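The screenshot is not recoverable here, but given the error message, the lines added to spark-env.sh are presumably along these lines (the exact file contents are an assumption):

```shell
# /hadoop/spark/conf/spark-env.sh -- point Spark at the Hadoop/YARN configs
export HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop
export YARN_CONF_DIR=/hadoop/hadoop/etc/hadoop
```

Either variable satisfies the check in SparkSubmitArguments; setting both is harmless.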
Then I ran it again:

[root@master conf]# HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 19:15:28.816 NotebookApp] [nb_conda_kernels] enabled, 2 kernels found
[I 19:15:28.923 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 19:15:28.923 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
[I 19:15:28.986 NotebookApp] [nb_anacondacloud] enabled
[I 19:15:28.989 NotebookApp] [nb_conda] enabled
[I 19:15:28.990 NotebookApp] Serving notebooks from local directory: /hadoop/spark/conf
[I 19:15:28.990 NotebookApp] 0 active kernels
[I 19:15:28.990 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 19:15:28.990 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 19:15:44.862 NotebookApp] Creating new notebook in 
[I 19:15:45.742 NotebookApp] Kernel started: 98d8605a-804a-47af-83fb-2efc8b5a3d60
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/11/20 19:15:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/11/20 19:15:51 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[W 19:15:55.943 NotebookApp] Timeout waiting for kernel_info reply from 98d8605a-804a-47af-83fb-2efc8b5a3d60
18/11/20 19:16:11 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
18/11/20 19:16:11 ERROR client.TransportClient: Failed to send RPC 7790789781121901013 to /192.168.127.131:55928: java.nio.channels.ClosedChannelExc
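When YARN kills the application master at launch like this (on small VMs it is often a memory limit, as the abstract above suggests), the real cause is usually in the YARN container logs rather than in the driver output. A way to dig it out, assuming the application id shown in the ResourceManager web UI (the id below is hypothetical):

```shell
# list recent applications, including failed/killed ones
yarn application -list -appStates ALL

# fetch the aggregated container logs for the failed application
yarn logs -applicationId application_1542700000000_0001
```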