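(This blog is mainly written for my own, very inexperienced, future self.)
The nodes are in the cloud, and I spent quite a while fighting internal vs. external IP addresses and permission issues.
Environment: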
spark:2.1.2
hadoop:2.7.5
java:1.8
IDE:idea
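Detours along the way:
-- Running from IDEA never succeeded: in Spark standalone mode the job kept reporting that it could not acquire resources (still unresolved).
Each node has 8 cores and 32 GB of memory. Even when requesting only 1 core and 1024 MB of memory per node, I still got: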
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
-- spark-shell cannot be used in cluster deploy mode:
Error: Cluster deploy mode is not applicable to Spark shells.
-- Since Spark 2.0 the "yarn-cluster" master string is deprecated; use master "yarn" together with an explicit deploy mode instead:
WARN SparkConf: spark.master yarn-cluster is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode. // OR
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
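
The migration those warnings ask for can be sketched roughly as follows. This is only an illustration: the helper names `normalize_master` and `spark_submit_args` and the file name `app.jar` are made up for this note, and the only Spark facts assumed are the `--master` and `--deploy-mode` options of spark-submit and that client is the default deploy mode.

```python
# Hypothetical helpers (not part of Spark): map the legacy master strings
# onto the "yarn" + deploy mode form that Spark 2.0+ asks for.
LEGACY_MASTERS = {
    "yarn-cluster": ("yarn", "cluster"),
    "yarn-client": ("yarn", "client"),
}


def normalize_master(master, deploy_mode=None):
    """Return (master, deploy_mode) without the deprecated yarn-* spelling."""
    if master in LEGACY_MASTERS:
        new_master, implied_mode = LEGACY_MASTERS[master]
        if deploy_mode is not None and deploy_mode != implied_mode:
            raise ValueError(
                "master %r implies deploy mode %r, but %r was requested"
                % (master, implied_mode, deploy_mode)
            )
        return new_master, implied_mode
    # spark-submit falls back to client mode when no deploy mode is given.
    return master, deploy_mode or "client"


def spark_submit_args(master, app, deploy_mode=None):
    """Build a spark-submit argument list using the non-deprecated form."""
    new_master, mode = normalize_master(master, deploy_mode)
    return ["spark-submit", "--master", new_master, "--deploy-mode", mode, app]


if __name__ == "__main__":
    # "app.jar" is a placeholder application file.
    print(" ".join(spark_submit_args("yarn-cluster", "app.jar")))
```

For spark-shell only the client form applies, since the shell refuses cluster deploy mode as noted above.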
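-- It is better to store the Spark jars on HDFS and point spark.yarn.jars or spark.yarn.archive at them, so they do not have to be uploaded on every submission: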
WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
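
One way to prepare such an archive, as a rough sketch: zip the jars so that they sit in the root of the archive, which is the layout spark.yarn.archive expects. The function name `build_spark_archive` and the file name `spark-libs.zip` are my own placeholders, and the jars directory is assumed to be the `jars` folder under SPARK_HOME.

```python
import zipfile
from pathlib import Path


def build_spark_archive(jars_dir, archive_path):
    """Zip every *.jar in jars_dir into archive_path, flat at the archive root.

    Returns the list of jar file names that were added.
    """
    jars = sorted(Path(jars_dir).glob("*.jar"))
    # Jars are already compressed, so store them without recompressing.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for jar in jars:
            # arcname=jar.name drops the directory part, keeping jars at the root.
            zf.write(jar, arcname=jar.name)
    return [jar.name for jar in jars]


if __name__ == "__main__":
    import os

    # Assumes SPARK_HOME is set; "spark-libs.zip" is a placeholder name.
    spark_home = os.environ.get("SPARK_HOME", ".")
    added = build_spark_archive(os.path.join(spark_home, "jars"), "spark-libs.zip")
    print("archived %d jars" % len(added))
```

The resulting zip can then be uploaded once (for example with `hdfs dfs -put`) and its HDFS location set as spark.yarn.archive in spark-defaults.conf; the exact HDFS path is up to you.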
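-- $ netstat -nltp   # list listening ports; my shaky grasp of the basics really hurts...
-- HDFS listens on port 8020 of node1's internal IP, so connecting to port 8020 on node1's external IP cannot reach HDFS: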
*18/03/15 17:43:32 INFO Client: Retrying connect to server: 101.198.186.9/101.198.186.9:8020. Already tried 15 time(s); maxRetries= |
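
A quick way to check from the client side which address actually accepts connections, before blaming Spark or HDFS, is a plain TCP probe. This is a minimal sketch using only the Python standard library; the function name `can_connect` is made up, and the host name in the usage part assumes `node1` resolves to the internal IP on the machine where you run it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: "node1" resolves to the internal IP here.
    # Repeat the probe with the external IP to compare the two.
    print("node1:8020 reachable:", can_connect("node1", 8020))
```

If the internal address answers and the external one does not, fs.defaultFS (and anything else pointing at the NameNode) should use the address that is reachable from where the driver runs.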