Running a Python-written Spark application on a cluster (process notes)

Start Hadoop

root@master:/usr/local/hadoop-2.7.5/sbin# ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [master]

master: starting namenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-namenode-master.out

slave02: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-datanode-slave02.out

slave01: starting datanode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-datanode-slave01.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.5/logs/hadoop-root-secondarynamenode-master.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-resourcemanager-master.out

slave02: starting nodemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-nodemanager-slave02.out

slave01: starting nodemanager, logging to /usr/local/hadoop-2.7.5/logs/yarn-root-nodemanager-slave01.out

root@master:/usr/local/hadoop-2.7.5/sbin#

Start Spark

root@master:/usr/local/spark/sbin# ./start-all.sh

starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out

slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave01.out

slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave02.out

slave01: failed to launch org.apache.spark.deploy.worker.Worker:

slave01: full log in /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave01.out

slave02: failed to launch org.apache.spark.deploy.worker.Worker:

slave02: full log in /usr/local/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave02.out

root@master:/usr/local/spark/sbin#

Check the running status

(Note that although the Spark startup above reported "failed to launch ... Worker" for slave01 and slave02, the jps output below shows the Worker process is in fact running on both slaves.)

root@master:/usr/local/spark/sbin# jps

3042 Master

3124 Jps

2565 NameNode

565 ResourceManager

2758 SecondaryNameNode

 

root@slave01:/usr/bin# jps

1152 Jps

922 NodeManager

812 DataNode

1084 Worker

 

root@slave02:/usr/local/spark/python/lib# jps

993 Worker

721 DataNode

1061 Jps

831 NodeManager

View the web UI


On the host machine (that is, in the virtual machine where Docker is installed), open a browser and go to the master's IP on port 8080; at this point the host is able to reach the containers running inside Docker.

Run the Python program

root@master:~/pysparkfile# python3 text.py

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel).

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/spark/jars/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

18/04/22 07:50:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Lines with a: 61, Lines with b: 27
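
The text.py script itself is not shown in the log, but judging from the output it is most likely a variant of the line-count example from the Spark quick start, counting the lines of a text file that contain the letters "a" and "b". A minimal sketch consistent with that output (the input path, the app name, and the local master are assumptions, not taken from the original script):

from pyspark import SparkContext

# Assumed input file; the file actually read by text.py is not shown in the log.
logFile = "file:///usr/local/spark/README.md"

# "local" is an assumption, consistent with the job not showing up in the cluster web UI.
sc = SparkContext("local", "text")

logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda line: "a" in line).count()
numBs = logData.filter(lambda line: "b" in line).count()

print("Lines with a: %i, Lines with b: %i" % (numAs, numBs))
sc.stop()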

Checking the web UI again, nothing has changed.
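
The most likely reason nothing appears in the web UI is that the script ran with a local master, so the application was never registered with the standalone cluster. A sketch of pointing the same program at the cluster instead, assuming the standalone master listens on the default spark://master:7077:

from pyspark import SparkConf, SparkContext

# Registering with the standalone master makes the application show up
# under "Running Applications" on master:8080.
# spark://master:7077 is the standalone default and an assumption about this setup.
conf = SparkConf().setAppName("text").setMaster("spark://master:7077")
sc = SparkContext(conf=conf)

logData = sc.textFile("file:///usr/local/spark/README.md").cache()  # assumed input path
numAs = logData.filter(lambda line: "a" in line).count()
numBs = logData.filter(lambda line: "b" in line).count()
print("Lines with a: %i, Lines with b: %i" % (numAs, numBs))
sc.stop()

The same effect can also be had without touching the code by submitting it with spark-submit --master spark://master:7077 text.py.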

Start pyspark

root@master:/usr/local/spark# pyspark

/us
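
Once the pyspark prompt comes up, the shell has already created a SparkContext named sc, so a quick interactive check might look like the following (the values are illustrative, not taken from the original session):

>>> rdd = sc.parallelize(range(1, 101))   # sc is created automatically by the pyspark shell
>>> rdd.map(lambda x: x * 2).take(5)
[2, 4, 6, 8, 10]
>>> rdd.sum()
5050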

