Installing a Spark Cluster

Installing Hadoop

Change the hostname

vi /etc/sysconfig/network
HOSTNAME=master
Reboot: shutdown -r now

Configure hosts

vi /etc/hosts
Add:
192.168.1.2 master
192.168.1.3 slave1
192.168.1.4 slave2
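An optional quick check that the names resolve from each node (assuming the IPs above match your machines):
ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2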

Install the JDK

Check whether OpenJDK is already installed:
rpm -qa | grep java
If output like the following is shown:
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Uninstall OpenJDK:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Download the JDK RPM package, e.g. jdk-8u65-linux-x64.rpm
Install it: rpm -ivh jdk-8u65-linux-x64.rpm
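A quick sanity check that the Oracle JDK is now in place (the /usr/java/jdk1.8.0_65 path is the one used later for JAVA_HOME):
java -version
ls /usr/java/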

Create the hadoop runtime account

sudo groupadd hadoop
sudo useradd hadoop -g hadoop -G root
sudo passwd hadoop

Configure passwordless SSH login

Generate the public/private key pair on the master node:
ssh-keygen -t rsa
cd ~/.ssh/
cat id_rsa.pub>>authorized_keys
chmod 600 authorized_keys
ssh master
You should see something like:
Last login: Thu Dec 3 12:19:24 2015 from 192.168.50.221
exit
Copy the public key to slave1 and slave2:
scp id_rsa.pub root@slave1:/home/
ssh slave1
cd ~/.ssh
mv /home/id_rsa.pub ~/.ssh/
cat id_rsa.pub>>authorized_keys
chmod 600 authorized_keys
exit
ssh slave1
Repeat the same steps for slave2; a sketch follows below.
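For reference, the slave2 sequence mirrors the slave1 steps above (same assumptions about the root login and home directory layout):
scp id_rsa.pub root@slave2:/home/
ssh slave2
cd ~/.ssh
mv /home/id_rsa.pub ~/.ssh/
cat id_rsa.pub>>authorized_keys
chmod 600 authorized_keys
exit
ssh slave2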

Troubleshooting:

1. When logging in via ssh, the error "Agent admitted failure to sign using the key" appears.
Fix: correcting the permissions on authorized_keys resolves it.
2. Passwordless ssh login still fails even after the firewall is turned off.
Fix: set SELinux to disabled (see the snippet after the links below).
More on these errors: http://jingyan.baidu.com/article/0f5fb099e2b6236d8334eaf9.html
http://bbs.chinaunix.net/thread-4175272-1-1.html
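A sketch of the SELinux change mentioned in item 2 above (assuming a CentOS/RHEL-style system where SELinux is configured in /etc/selinux/config):
vi /etc/selinux/config
# set: SELINUX=disabled
setenforce 0   # switch to permissive for the current session; disabled takes full effect after reboot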

Install Hadoop

Download the Hadoop package
tar -xzvf hadoop-2.6.2.tar.gz
vi /etc/profile
Add:
export JAVA_HOME=/usr/java/jdk1.8.0_65
export HADOOP_HOME=/opt/spark/hadoop-2.6.2
export PATH=$HADOOP_HOME/bin:$PATH
Then run source /etc/profile to make the environment variables take effect.
Give the hadoop user ownership of /opt/spark/hadoop-2.6.2:
chown -R hadoop:hadoop /opt/spark/hadoop-2.6.2
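A quick check that the environment variables and ownership took effect (assuming the paths above):
hadoop version
echo $HADOOP_HOME
ls -ld /opt/spark/hadoop-2.6.2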

Edit the Hadoop configuration files

Edit core-site.xml under hadoop-2.6.2/etc/hadoop and add the following inside the <configuration> element:
<property>
<!-- default HDFS path -->
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<!-- temporary directory -->
<name>hadoop.tmp.dir</name>
<value>/opt/spark/hadoop-2.6.2/hadooptmp</value>
</property>
Edit hdfs-site.xml under hadoop-2.6.2/etc/hadoop:
<property>
<!-- NameNode metadata storage directory -->
<name>dfs.name.dir</name>
<value>/opt/spark/hadoop-2.6.2/hadoopname</value>
<final>true</final>
</property>
<property>
<!-- DataNode data storage directory -->
<name>dfs.data.dir</name>
<value>/opt/spark/hadoop-2.6.2/hadoopdata</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
</property>
Edit etc/hadoop/mapred-site.xml (if only mapred-site.xml.template exists, copy it to mapred-site.xml first):
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.2:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Edit etc/hadoop/yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<!-- ResourceManager port -->
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<!-- scheduler port -->
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<!-- resource-tracker port -->
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<!-- ResourceManager admin port -->
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<!-- ResourceManager web UI port, for monitoring job resource scheduling -->
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
Edit etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_65
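Note that the steps so far only touch the master; for the cluster to work, the same Hadoop directory and configuration normally need to exist on the slaves as well. A rough sketch, assuming the same /opt/spark layout on every node:
scp -r /opt/spark/hadoop-2.6.2 root@slave1:/opt/spark/
scp -r /opt/spark/hadoop-2.6.2 root@slave2:/opt/spark/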
Format the NameNode: hadoop namenode -format
Seeing "/hadoopname has been successfully formatted" means the format succeeded.
Start the daemons: ./sbin/start-all.sh
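After start-all.sh, jps can be used to check which daemons came up; with this layout you would roughly expect NameNode, SecondaryNameNode and ResourceManager on the master, and DataNode plus NodeManager on the slaves (exact process lists can vary):
jps              # on the master
ssh slave1 jps   # on each slave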

Error: org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool
Fix: hdfs-site.xml configures dfs.namenode.name.dir; on the master, that directory contains a current folder with a VERSION file whose contents include clusterID=CID-8e201022-6faa-440a-b61c-290e4ccfb006. core-site.xml configures hadoop.tmp.dir; on the slave, that directory contains dfs/data/current, which also holds a VERSION file with its own clusterID. The two cluster IDs do not match, and that mismatch causes this failure. Delete the stale data on the slave, restart, and the problem goes away. A quick way to compare the two IDs is sketched below.
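A quick way to compare the two cluster IDs described above (the paths follow the directories configured earlier in this guide; adjust them if your layout differs):
# on the master (dfs.name.dir)
grep clusterID /opt/spark/hadoop-2.6.2/hadoopname/current/VERSION
# on the affected slave (under hadoop.tmp.dir)
grep clusterID /opt/spark/hadoop-2.6.2/hadooptmp/dfs/data/current/VERSION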

Install Spark

Download Spark
Extract it: tar -xzvf spark-1.5.2-bin-hadoop2.6.tgz
Update the environment variables (/etc/profile):
export SPARK_HOME=/opt/spark/spark-1.5.2-bin-hadoop2.6
export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH
Edit Spark's env file conf/spark-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_65
export HADOOP_HOME=/opt/spark/hadoop-2.6.2
export SPARK_HOME=/opt/spark/spark-1.5.2-bin-hadoop2.6
export SPARK_JAR=/opt/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar
Edit conf/slaves:
master
slave1
slave2
Start Spark: ./sbin/start-master.sh
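To confirm the cluster is up (note that ./sbin/start-all.sh would also start the workers listed in conf/slaves), the standalone master web UI usually listens on port 8080, and a short spark-shell session can run a trivial job; a sketch assuming the default master port 7077:
jps                                    # should show a Master process (plus Workers where started)
# Web UI: http://master:8080
spark-shell --master spark://master:7077
# inside the shell, e.g.: sc.parallelize(1 to 100).sum()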
Configure Spark logging: in the Spark conf directory, turn log4j.properties.template into log4j.properties and edit it as follows:
log4j.rootCategory=INFO, console,FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.Threshold=DEBUG
log4j.appender.FILE.file=/home/hadoop/spark.log
log4j.appender.FILE.DatePattern='.'yyyy-MM-dd
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=[%-5p] [%d{yyyy-MM-dd HH:mm:ss}] [%C{1}:%M:%L] %m%n
# spark
log4j.logger.org.apache.spark=INFO


