- Machine configuration
  The entire cluster is deployed on VM virtual machines.
  | Hostname | Roles | Configuration |
  | --- | --- | --- |
  | centos01 | NameNode, JournalNode, Master, ResourceManager, QuorumPeerMain | 2 GB RAM, 1 core, 20 GB disk |
  | centos02 | Worker, NodeManager, DataNode, QuorumPeerMain, JournalNode | 1 GB RAM, 1 core, 20 GB disk |
  | centos03 | Worker, NodeManager, DataNode, QuorumPeerMain, JournalNode | 1 GB RAM, 1 core, 20 GB disk |
- Version information

  | Software | Version |
  | --- | --- |
  | JDK | jdk1.7.0_55 |
  | Zookeeper | zookeeper-3.4.6 |
  | Hadoop | hadoop-2.6.0 |
  | Scala | scala-2.10.4 |
  | Spark | spark-1.4.1-bin-hadoop2.6 |

- Deployment process
- For the Hadoop deployment process, see http://blog.csdn.net/eric_sunah/article/details/43966593; make sure the Hadoop cluster starts successfully before continuing.
- Configure Scala; perform the operations below on every node.
- Configure Spark
- Download Spark: http://www.apache.org/dyn/closer.cgi/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
- Configure the following environment variables on every node:
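The Scala installation steps themselves are not spelled out in this post; a minimal sketch under the /opt/cloud layout used throughout (the tarball name and the assumption that it has already been downloaded to the current directory are mine):

```shell
# Hypothetical sketch: install Scala under /opt/cloud on one node.
# Assumes scala-2.10.4.tgz has already been downloaded to the current directory.
INSTALL_DIR=${INSTALL_DIR:-/opt/cloud}   # target directory from this post's layout
TARBALL=scala-2.10.4.tgz
if [ -f "$TARBALL" ]; then
  mkdir -p "$INSTALL_DIR"
  tar -xzf "$TARBALL" -C "$INSTALL_DIR"  # yields $INSTALL_DIR/scala-2.10.4
  "$INSTALL_DIR/scala-2.10.4/bin/scala" -version   # sanity check
else
  echo "download scala-2.10.4.tgz first"
fi
```

Repeat on each node (or extract once and copy the directory over), then set SCALA_HOME as shown in the environment variables below.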
export JAVA_HOME=/opt/cloud/jdk1.7.0_55
export SCALA_HOME=/opt/cloud/scala-2.10.4
export SPARK_HOME=/opt/cloud/spark-1.4.1-bin-hadoop2.6
export PATH=$JAVA_HOME/bin:$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
- On centos01, edit /opt/cloud/spark-1.4.1-bin-hadoop2.6/conf/slaves so that it contains:
centos02
centos03
- On centos01, append the following to /opt/cloud/spark-1.4.1-bin-hadoop2.6/conf/spark-env.sh:
export JAVA_HOME=/opt/cloud/jdk1.7.0_55
export SCALA_HOME=/opt/cloud/scala-2.10.4
export SPARK_HOME=/opt/cloud/spark-1.4.1-bin-hadoop2.6
export HADOOP_HOME=/opt/cloud/hadoop-2.6.0
export HADOOP_CONF_DIR=/opt/cloud/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=centos01
export SPARK_WORKER_MEMORY=1g
- Copy the configuration files from centos01 to centos02 and centos03.
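One way to copy the configuration to the workers is a small scp loop (a sketch; it assumes passwordless SSH between the nodes, which start-all.sh also relies on, and defaults to a dry run that only prints the commands):

```shell
# Sketch: push the Spark conf directory from centos01 to each worker node.
# Assumes passwordless SSH. DRY_RUN=1 (the default here) only prints commands.
SPARK_CONF=/opt/cloud/spark-1.4.1-bin-hadoop2.6/conf
WORKERS="centos02 centos03"
for host in $WORKERS; do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo scp -r "$SPARK_CONF" "$host:$SPARK_CONF"
  else
    scp -r "$SPARK_CONF" "$host:$SPARK_CONF"
  fi
done
```

Run it with DRY_RUN=0 on centos01 once the slaves and spark-env.sh files are final.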
- On centos01, run: /opt/cloud/spark-1.4.1-bin-hadoop2.6/sbin/start-all.sh
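After start-all.sh, a quick way to confirm the daemons came up is `jps` on each node; the expected process names follow from the role table above (the helper function here is my own sketch, not part of Spark):

```shell
# Sketch: check whether an expected Spark daemon appears in this node's jps output.
# On centos01 look for "Master"; on centos02/centos03 look for "Worker".
check_daemon() {
  name=$1
  if jps 2>/dev/null | grep -q "$name"; then
    echo "$name is running"
  else
    echo "$name NOT found"
  fi
}
# e.g. on centos01:  check_daemon Master
# e.g. on centos02:  check_daemon Worker
```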
- Testing and verification
- URL check: visit http://centos01:8080/
- Functional check
- Run bin/spark-shell; once it is up, you can visit http://centos01:4040/jobs/
- Upload a test file: ./hadoop fs -put /opt/cloud/spark-1.4.1-bin-hadoop2.6/README.md /data/
- Enter the following code, line by line, in spark-shell:
- val file = sc.textFile("hdfs://centos01:9000/data/README.md")
- val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
- count.collect
- View the successfully executed job at http://centos01:4040/jobs/
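The three spark-shell lines above are the classic word count: flatMap splits each line into words, map pairs every word with 1, and reduceByKey sums the counts per word. As a local sanity check of what to expect, the same pipeline can be approximated with standard Unix tools (sample.txt here is a stand-in for README.md, since the real file lives in HDFS):

```shell
# Local approximation of the Spark word count:
#   flatMap(split on spaces) -> map(word, 1) -> reduceByKey(_ + _)
# sample.txt stands in for README.md (assumption: any small text file works).
printf 'apache spark\nspark on hadoop\n' > sample.txt
tr -s ' ' '\n' < sample.txt   # flatMap: one word per line
sort sample.txt > /dev/null   # (sort groups equal keys, like the shuffle)
tr -s ' ' '\n' < sample.txt | sort | uniq -c | sort -rn   # reduceByKey: count per word
```

On this sample input the top line of the output is `2 spark`, matching what `count.collect` would report for the pair ("spark", 2).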
Spark Getting Started, Part 1: Installing and Configuring Spark 1.4 on CentOS 6.5