Where do the output messages go when submitting a Python file to Spark with spark-submit?

I'm trying out the spark-submit command to submit my Python app to a cluster (3 machine cluster on AWS-EMR).

Surprisingly, I cannot see any of the intended output from the task. I then simplified my app to only print out some fixed strings, but I still didn't see any of those printed messages. I'm attaching the app and command below. Hope someone could help me find the reason. Many thanks!

submit-test.py:

import sys

from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="sparkSubmitTest")
    for item in range(50):
        print("I love this game!")
    sc.stop()

Command I used is:

./spark/bin/spark-submit --master yarn-cluster ./submit-test.py

Output I got is below:

[hadoop@ip-172-31-34-124 ~]$ ./spark/bin/spark-submit --master yarn-cluster ./submit-test.py
15/08/04 23:50:25 INFO client.RMProxy: Connecting to ResourceManager at /172.31.34.124:9022
15/08/04 23:50:25 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/08/04 23:50:25 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (11520 MB per container)
15/08/04 23:50:25 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/04 23:50:25 INFO yarn.Client: Setting up container launch context for our AM
15/08/04 23:50:25 INFO yarn.Client: Preparing resources for our AM container
15/08/04 23:50:25 INFO yarn.Client: Uploading resource file:/home/hadoop/.versions/spark-1.3.1.e/lib/spark-assembly-1.3.1-hadoop2.4.0.jar -> hdfs://172.31.34.124:9000/user/hadoop/.sparkStaging/application_1438724051797_0007/spark-assembly-1.3.1-hadoop2.4.0.jar
15/08/04 23:50:26 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500
15/08/04 23:50:26 INFO metrics.MetricsSaver: Created MetricsSaver j-2LU0EQ3JH58CK:i-048c1ded:SparkSubmit:24928 period:60 /mnt/var/em/raw/i-048c1ded_20150804_SparkSubmit_24928_raw.bin
15/08/04 23:50:27 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 1053 raw values into 1 aggregated values, total 1
15/08/04 23:50:27 INFO yarn.Client: Uploading resource file:/home/hadoop/submit-test.py -> hdfs://172.31.34.124:9000/user/hadoop/.sparkStaging/application_1438724051797_0007/submit-test.py
15/08/04 23:50:27 INFO yarn.Client: Setting up the launch environment for our AM container
15/08/04 23:50:27 INFO spark.SecurityManager: Changing view acls to: hadoop
15/08/04 23:50:27 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/08/04 23:50:27 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/08/04 23:50:27 INFO yarn.Client: Submitting application 7 to ResourceManager
15/08/04 23:50:27 INFO impl.YarnClientImpl: Submitted application application_1438724051797_0007
15/08/04 23:50:28 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:28 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1438732227551
     final status: UNDEFINED
     tracking URL: http://172.31.34.124:9046/proxy/application_1438724051797_0007/
     user: hadoop
15/08/04 23:50:29 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:30 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:31 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:32 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:33 INFO yarn.Client: Application report for application_1438724051797_0007 (state: ACCEPTED)
15/08/04 23:50:34 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:34 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: ip-172-31-39-205.ec2.internal
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1438732227551
     final status: UNDEFINED
     tracking URL: http://172.31.34.124:9046/proxy/application_1438724051797_0007/
     user: hadoop
15/08/04 23:50:35 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:36 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:37 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:38 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:39 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:40 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:41 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:42 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:43 INFO yarn.Client: Application report for application_1438724051797_0007 (state: RUNNING)
15/08/04 23:50:44 INFO yarn.Client: Application report for application_1438724051797_0007 (state: FINISHED)
15/08/04 23:50:44 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: ip-172-31-39-205.ec2.internal
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1438732227551
     final status: SUCCEEDED
     tracking URL: http://172.31.34.124:9046/proxy/application_1438724051797_0007/
     user: hadoop

2 Answers

#1 (score: 0)

Posting my answer here, since I didn't find it anywhere else.

I first tried yarn logs -applicationId application_xxxx, but was told that "Log aggregation has not completed or is not enabled".
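The "not enabled" part of that message usually means YARN log aggregation is switched off on the cluster. If you control the configuration, it can be enabled in yarn-site.xml (a sketch; the property name is from the Hadoop docs, and the NodeManagers need a restart afterwards):

```xml
<!-- yarn-site.xml: enable log aggregation so "yarn logs" can fetch
     container stdout/stderr after the application finishes -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```

With that in place, re-running the job and then yarn logs -applicationId application_1438724051797_0007 should print the aggregated container logs, stdout included.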

Here are the steps to dig out the printed messages:

1. Follow the tracking URL printed at the end of the run, http://172.31.34.124:9046/proxy/application_1438724051797_0007/ (a reverse SSH tunnel and proxy need to be set up to reach it).
2. On the application overview page, find the ApplicationMaster node: ip-172-31-41-6.ec2.internal:9035.
3. Go back to the AWS EMR cluster list and find the public DNS for that node.
4. SSH from the driver node into the ApplicationMaster node, using the same key pair.
5. cd /var/log/hadoop/userlogs/application_1438796304215_0005/container_1438796304215_0005_01_000001 (always choose the first container).
6. cat stdout

As you can see, this is very convoluted. You are probably better off writing the output to a file hosted on S3.
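In a real Spark job that usually means calling rdd.saveAsTextFile("s3://...") on an RDD of results instead of printing from the driver (the bucket path would be your own). The same pattern stripped down to plain Python, writing to a local file so the shape is clear, looks like this:

```python
# Sketch: accumulate the messages and write them to a file, instead of
# relying on print(), whose output lands in the YARN container logs in
# cluster mode. The output path here is just an example; a real job
# would hand an s3:// URI to rdd.saveAsTextFile() instead.
messages = ["I love this game!" for _ in range(50)]

out_path = "/tmp/submit-test-output.txt"
with open(out_path, "w") as f:
    f.write("\n".join(messages) + "\n")
```

The point of the pattern is that the output location is explicit and reachable after the job finishes, rather than buried in whichever container happened to run the driver.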

#2 (score: -4)

Another quick and dirty thing you can do is pipe your command output to a text file using the tee command:

./spark/bin/spark-submit --master yarn-cluster ./submit-test.py | tee temp_output.file
