Spark 2.0.0 Install and Examples

1. Download and extract Scala 2.11.8
[root@sht-sgmhadoopnn-01 hadoop]# wget http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
[root@sht-sgmhadoopnn-01 hadoop]# tar xzvf scala-2.11.8.tgz
[root@sht-sgmhadoopnn-01 hadoop]# mv scala-2.11.8 scala
[root@sht-sgmhadoopnn-01 hadoop]#

2. Copy the scala directory to the other machines in the cluster
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala root@sht-sgmhadoopnn-02:/hadoop/
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala root@sht-sgmhadoopdn-01:/hadoop/
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala root@sht-sgmhadoopdn-02:/hadoop/
[root@sht-sgmhadoopnn-01 hadoop]# scp -r scala root@sht-sgmhadoopdn-03:/hadoop/

3. Configure the environment variables on every machine in the cluster and apply them
### Append the following two lines to the end of the file
[root@sht-sgmhadoopnn-01 hadoop]# vi /etc/profile
export SCALA_HOME=/hadoop/scala
export PATH=$SCALA_HOME/bin:$PATH

[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopnn-02:/etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopdn-01:/etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopdn-02:/etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopdn-03:/etc/profile


[root@sht-sgmhadoopnn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopnn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-03 hadoop]# source /etc/profile

 

---------------------------------------------------------------------------------------------------------------------
1. Download and extract Spark 2.0.0
[root@sht-sgmhadoopnn-01 hadoop]# wget http://apache.website-solution.net/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
[root@sht-sgmhadoopnn-01 hadoop]# tar xzvf spark-2.0.0-bin-hadoop2.7.tgz
[root@sht-sgmhadoopnn-01 hadoop]# mv spark-2.0.0-bin-hadoop2.7 spark

 

2. Configure spark-env.sh
[root@sht-sgmhadoopnn-01 conf]# pwd
/hadoop/spark/conf
[root@sht-sgmhadoopnn-01 conf]# cp spark-env.sh.template spark-env.sh
[root@sht-sgmhadoopnn-01 conf]#

### Add the following 6 lines
[root@sht-sgmhadoopnn-01 conf]# vi spark-env.sh

export SCALA_HOME=/hadoop/scala                    # Scala installation directory
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera    # JDK used by Spark
export SPARK_MASTER_IP=172.16.101.55               # IP address of the Spark master node
export SPARK_WORKER_MEMORY=1g                      # memory each worker can hand out to executors
export SPARK_PID_DIR=/hadoop/pid                   # directory for the daemons' PID files
export HADOOP_CONF_DIR=/hadoop/hadoop/etc/hadoop   # lets Spark pick up the HDFS/YARN configuration

3. Configure the slaves file
[root@sht-sgmhadoopnn-01 conf]# cp slaves.template slaves
[root@sht-sgmhadoopnn-01 conf]# vi slaves

sht-sgmhadoopdn-01
sht-sgmhadoopdn-02
sht-sgmhadoopdn-03

4. Copy the spark directory to the machines listed in the slaves file
[root@sht-sgmhadoopnn-01 hadoop]# scp -r spark root@sht-sgmhadoopdn-01:/hadoop/
[root@sht-sgmhadoopnn-01 hadoop]# scp -r spark root@sht-sgmhadoopdn-02:/hadoop/
[root@sht-sgmhadoopnn-01 hadoop]# scp -r spark root@sht-sgmhadoopdn-03:/hadoop/


5. Configure the environment variables on every machine in the cluster and apply them
[root@sht-sgmhadoopnn-01 hadoop]# vi /etc/profile
export SPARK_HOME=/hadoop/spark
export PATH=$SPARK_HOME/bin:$PATH

[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopnn-02:/etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopdn-01:/etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopdn-02:/etc/profile
[root@sht-sgmhadoopnn-01 hadoop]# scp -r /etc/profile root@sht-sgmhadoopdn-03:/etc/profile


[root@sht-sgmhadoopnn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopnn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-01 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-02 hadoop]# source /etc/profile
[root@sht-sgmhadoopdn-03 hadoop]# source /etc/profile


6. Start Spark (use Spark's own sbin/start-all.sh under /hadoop/spark/sbin, not Hadoop's start-all.sh)
[root@sht-sgmhadoopnn-01 sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-sht-sgmhadoopnn-01.out
sht-sgmhadoopdn-01: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-sht-sgmhadoopdn-01.telenav.cn.out
sht-sgmhadoopdn-02: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-sht-sgmhadoopdn-02.telenav.cn.out
sht-sgmhadoopdn-03: starting org.apache.spark.deploy.worker.Worker, logging to /hadoop/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-sht-sgmhadoopdn-03.telenav.cn.out
[root@sht-sgmhadoopnn-01 sbin]#


7. Check the web UI (Spark master on port 8080)
http://sht-sgmhadoopnn-01:8080/

[root@sht-sgmhadoopnn-01 sbin]# jps
27169 HMaster
26233 NameNode
26641 ResourceManager
2312 Jps
26542 DFSZKFailoverController
2092 Master
27303 RunJar
26989 JobHistoryServer

[root@sht-sgmhadoopdn-01 ~]# jps
19907 Worker
2086 jar
17265 DataNode
17486 NodeManager
20055 Jps
17377 JournalNode
17697 HRegionServer
3671 QuorumPeerMain


8. Run the WordCount example
[root@sht-sgmhadoopnn-01 hadoop]# vi wordcount.txt
hello abc 123
abc hadoop hello hdfs
spark yarn
123 abc hello hdfs spark
wjp wjp abc hello

[root@sht-sgmhadoopnn-01 bin]# spark-shell
scala>
scala> val textfile = sc.textFile("file:///hadoop/wordcount.txt")
textfile: org.apache.spark.rdd.RDD[String] = file:///hadoop/wordcount.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val count=textfile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26

scala> count.collect()
res0: Array[(String, Int)] = Array((hello,4), (123,2), (yarn,1), (abc,4), (wjp,2), (spark,2), (hadoop,1), (hdfs,2))

scala>
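
Spark 2.0 also exposes the same word count through the Dataset API via the built-in `spark` session (a SparkSession). A minimal sketch, assuming it is run in the same spark-shell session against the same local file:

scala> // Dataset-based variant of the word count above
scala> val words = spark.read.textFile("file:///hadoop/wordcount.txt").flatMap(_.split(" "))
scala> val dsCounts = words.groupByKey(identity).count()
scala> dsCounts.show()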

### To read from HDFS instead of the local filesystem, pass an hdfs:// URI, e.g. val file = sc.textFile("hdfs://172.16.101.56:8020/wordcount.txt")

 


val file = sc.textFile("hdfs://namenode:8020/path/to/input")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://namenode:8020/output")
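
The same job can also be packaged as a standalone application and launched with spark-submit, exactly like the SparkPi examples below. A minimal sketch (the WordCount object name and the argument handling are illustrative, not from the original post), built against Spark 2.0.0 / Scala 2.11:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit (--master ...), not hard-coded here
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // args(0) = input path, args(1) = output path (local file:// or hdfs:// URIs)
    val counts = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile(args(1))
    spark.stop()
  }
}

Once packaged into a jar, it would be submitted with --class WordCount plus the input and output paths as application arguments.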

--------------------------------------------------------------------------------------------------------------------------------------------------------


a. Run in local mode with two threads
#[root@sht-sgmhadoopdn-01 ~]# ./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"

[root@sht-sgmhadoopnn-01 spark]# ./bin/run-example --master local[2] SparkPi 10


b. Run in Spark Standalone cluster mode
[root@sht-sgmhadoopnn-01 spark]# ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://sht-sgmhadoopnn-01:7077 \
examples/jars/spark-examples_2.11-2.0.0.jar \
100


c. Note: Spark on YARN supports two deploy modes, yarn-cluster and yarn-client (their differences are covered in detail in a separate post).
Broadly speaking, yarn-cluster suits production jobs, while yarn-client suits interactive use and debugging, i.e. when you want to see the application's output quickly. With spark-submit, client mode can also be selected via --master yarn --deploy-mode client; yarn-client and yarn-cluster are the older shorthand forms.

# Run on YARN in yarn-cluster mode
[root@sht-sgmhadoopnn-01 spark]# ./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
./examples/jars/spark-examples_2.11-2.0.0.jar \
10

 

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
$SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar \
10

Source: ITPUB blog, http://blog.itpub.net/30089851/viewspace-2122819/ (please credit the source when reposting).
