Reference: http://blog.leanote.com/post/du00/单机搭建基于Hadoop的Spark环境
1. Download the Spark binary distribution from the official site.
2. After extracting it, rename conf/spark-env.sh.template to conf/spark-env.sh.
3. Configure: add a line such as HADOOP_CONF_DIR=/usr/java/hadoop-2.6.2/etc/hadoop (pointing into the Hadoop installation directory).
4. Submit a test job in yarn-cluster mode:
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 1 \
  lib/spark-examples*.jar \
  10
In yarn-cluster mode the driver runs inside YARN, so the result ends up in the application logs rather than on your console. To print the result directly to the screen, submit in yarn-client mode:
bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 1 lib/spark-examples*.jar 10
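SparkPi estimates π by Monte Carlo sampling: it throws random points into the unit square and counts how many land inside the quarter circle. The same idea in a minimal plain-Python sketch (no Spark required; the sample count and seed here are arbitrary choices for illustration):

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling points in the unit square and
    counting the fraction that fall inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of square = pi/4.
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # close to 3.14
```

The argument `10` passed to SparkPi plays a similar role: it controls how many sampling partitions (and thus how many samples) the job uses.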
Error:
If the job fails with an error like "is running beyond virtual memory limits. Current usage: 104.4 MB of 1 GB physical memory used; 2.2 G ...", YARN has killed the container for exceeding its virtual memory limit.
Solution:
Add the following properties to yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
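The numbers in the error are easy to check: a container's virtual-memory cap is its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, whose YARN default is 2.1. A quick arithmetic sketch (the helper function name is made up for illustration):

```python
def vmem_limit_gb(pmem_gb: float, vmem_pmem_ratio: float) -> float:
    """Virtual-memory cap YARN enforces for a container:
    physical allocation multiplied by the vmem-pmem ratio."""
    return pmem_gb * vmem_pmem_ratio

# With the default ratio of 2.1, a 1 GB container may use at most
# 2.1 GB of virtual memory, so a JVM touching ~2.2 GB gets killed.
print(vmem_limit_gb(1.0, 2.1))  # 2.1
# Raising the ratio to 4 gives the same 1 GB container 4 GB of headroom.
print(vmem_limit_gb(1.0, 4.0))  # 4.0
```

Disabling the check (vmem-check-enabled=false) removes the limit entirely; raising the ratio keeps the check but widens the margin. Either change alone is enough to stop this particular failure.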