Environment: Spark 2.1.1, Hadoop 2.6.0
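1. When submitting to a YARN cluster, conf.setMaster("spark://master:7077") does not need to be set in the code; the master is supplied on the command line with --master yarn. Values set directly on SparkConf take precedence over spark-submit flags, so a hard-coded standalone master URL is best left out.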
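2. spark-submit commands for YARN: many of the examples found online are written for Spark standalone or local mode. For the YARN-specific options, see the official documentation: http://spark.apache.org/docs/2.1.1/running-on-yarn.html
-- cluster mode (first with the application jar on HDFS, then with a local jar)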
spark-submit --class com.tgh.spark.WordCount --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 1g --executor-cores 1 hdfs://hadoop0:9000/tmp/sparktest-1.0-SNAPSHOT.jar hdfs://hadoop0:9000/test/wordcount.txt
spark-submit --class com.tgh.spark.WordCount --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 1g --executor-cores 1 /Users/tamir/Desktop/workspace/SparkTest/target/sparktest-1.0-SNAPSHOT.jar hdfs://hadoop0:9000/test/wordcount.txt
-- client mode
spark-submit --class com.tgh.spark.WordCount --master yarn --deploy-mode client --driver-memory 2g --executor-memory 1g --executor-cores 1 /Users/tamir/Desktop/workspace/SparkTest/target/sparktest-1.0-SNAPSHOT.jar hdfs://hadoop0:9000/test/wordcount.txt
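The cluster and client commands above differ only in the --deploy-mode value and the jar location, so they can be generated from one small wrapper. The following is a minimal sketch, not part of the original setup: build_submit_cmd is a hypothetical helper name, and it only prints the command (a dry run) rather than executing spark-submit. The class name, memory settings, and paths are copied from the commands above.

```shell
# Sketch: assemble the spark-submit command line for either YARN deploy mode.
# build_submit_cmd is a hypothetical helper; it prints the command instead of running it.
build_submit_cmd() {
  mode="$1"    # "cluster" or "client"
  jar="$2"     # application jar (local path or hdfs:// URL)
  input="$3"   # argument passed to WordCount

  # Reject anything other than the two YARN deploy modes.
  case "$mode" in
    cluster|client) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac

  echo spark-submit \
    --class com.tgh.spark.WordCount \
    --master yarn \
    --deploy-mode "$mode" \
    --driver-memory 2g \
    --executor-memory 1g \
    --executor-cores 1 \
    "$jar" "$input"
}

# Dry run for cluster mode with the jar on HDFS.
build_submit_cmd cluster \
  hdfs://hadoop0:9000/tmp/sparktest-1.0-SNAPSHOT.jar \
  hdfs://hadoop0:9000/test/wordcount.txt
```

To actually submit, the printed line can be pasted into a terminal, or the echo can be dropped once the output looks right.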
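3. In Spark standalone mode, the cluster status can be viewed at http://master:28080/. The port can be specified in SPARK_HOME/conf/spark-defaults.conf with
"spark.ui.port 28080", or in spark-env.sh with "export SPARK_MASTER_WEBUI_PORT=28080". If both are set, SPARK_MASTER_WEBUI_PORT takes precedence. Note that the Spark documentation describes spark.ui.port as the port of each application's own UI (default 4040) and SPARK_MASTER_WEBUI_PORT as the master web UI port (default 8080), so the spark-env.sh setting is the one that governs this page.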
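As a quick check of which address to open, the master web UI URL can be derived from the environment. This is a minimal sketch under stated assumptions: master_ui_url is a hypothetical helper, the host name defaults to "master" as in the URL above, and 8080 is used as the fallback because it is Spark's documented default for the master web UI.

```shell
# Sketch: print the standalone master web UI URL.
# master_ui_url is a hypothetical helper, not a Spark command.
master_ui_url() {
  host="${1:-master}"                       # master host name (assumed default)
  port="${SPARK_MASTER_WEBUI_PORT:-8080}"   # falls back to Spark's default master UI port
  echo "http://${host}:${port}/"
}

# With the value exported in spark-env.sh, the URL matches the one above.
(SPARK_MASTER_WEBUI_PORT=28080; master_ui_url master)
```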
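4. On a YARN cluster, jobs can be viewed in the ResourceManager web UI at http://localhost:8088/. The address is configured in HADOOP_HOME/etc/hadoop/yarn-site.xml (the web UI itself is controlled by yarn.resourcemanager.webapp.address, which defaults to ${yarn.resourcemanager.hostname}:8088). Part of my local configuration is attached below: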
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
<