Detailed Steps to Set Up a Hadoop 2.6 + YARN + Spark 1.6 Big Data Cluster

Three nodes are used, and each node must go through all of the steps below:


Configure /etc/hosts:
192.168.3.61 namenode1
192.168.3.62 datanode2
192.168.3.63 datanode3
Because the number of machines is limited, the DataNode and SecondaryNameNode are both placed on datanode2 here; in a real production environment they should be kept on separate machines.
The hostname must not be localhost (127.0.0.1); it should be set to the machine's own hostname. The hostname is a node's unique identity in the cluster, and /etc/hosts must be kept consistent with /etc/sysconfig/network.
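For example, on the first node the hostname can be set and persisted like this (a minimal sketch for a CentOS 6-style system, which is what /etc/sysconfig/network implies):
hostname namenode1                 # takes effect immediately for the current session
vim /etc/sysconfig/network         # set HOSTNAME=namenode1 so it survives a reboot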


Configure passwordless SSH login
Preparation:
    1. Check the local sshd configuration file (requires root):
  $ vim /etc/ssh/sshd_config
  Find the following lines and remove the leading "#" comment character:
  RSAAuthentication yes
  PubkeyAuthentication yes
  AuthorizedKeysFile      .ssh/authorized_keys
   2. If the configuration file was modified, restart the sshd service (requires root):
  $ /sbin/service sshd restart
Configuring passwordless SSH login takes 3 steps:
    1. Generate the public/private key pair
    2. Append the public keys to the authorized_keys file and fix its permissions
    3. Test
ssh-keygen -t rsa
scp id_rsa.pub 192.168.3.61:/root/.ssh/t62.pub    # copy each slave node's public key to the master node
cat /home/id_rsa.pub >> ~/.ssh/authorized_keys    # append every node's public key to the authorized_keys file
scp authorized_keys root@192.168.3.63:/root/.ssh/authorized_keys    # copy the authorized_keys file back to every node
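Step 2 above also calls for fixing permissions: sshd (with the default StrictModes) rejects an authorized_keys file or .ssh directory that is writable by others. A minimal sketch of the usual fix, run on each node:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh datanode2 date                 # step 3: should print the date without prompting for a password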


Configure the Java, Maven, and Scala environment:
tar zxvf /soft/jdk-8u65-linux-x64.tar.gz -C /work/poa/


export JAVA_HOME=/work/poa/jdk1.8.0_65
export CLASS_PATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin


tar zxvf /soft/apache-maven-3.3.9-bin.tar.gz -C /work/poa/


export MAVEN_HOME=/work/poa/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin


tar zxvf /soft/scala-2.10.6.tgz -C /work/poa/


export SCALA_HOME=/work/poa/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
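Assuming the exports above were added to /etc/profile (as is done for protobuf below), the three installations can be verified after reloading it:
source /etc/profile
java -version                      # should report 1.8.0_65
mvn -version                       # should report Apache Maven 3.3.9
scala -version                     # should report Scala 2.10.6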


Hadoop uses Protocol Buffers for its communication, so protobuf-2.5.0.tar.gz needs to be downloaded and installed (Baidu Cloud download link: http://pan.baidu.com/s/1pJlZubT).
Install protobuf:
Since it depends on a C compiler and a C++ compiler, first run: yum install gcc and yum install gcc-c++
https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
# tar zxvf /soft/protobuf-2.5.0.tar.gz -C /work/poa/
In the protobuf root directory:
# ./configure --prefix=/work/poa/protobuf-2.5.0
# make && make install


# vim /etc/profile
export PROTO_HOME=/work/poa/protobuf-2.5.0
export PATH=$PATH:$PROTO_HOME/bin


# source /etc/profile


# vim /etc/ld.so.conf
/work/poa/protobuf-2.5.0/lib


# /sbin/ldconfig
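To confirm the installation, protoc should now be on the PATH and report the expected version:
# protoc --version
libprotoc 2.5.0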


Set up Hadoop:
tar zxvf /soft/hadoop-2.6.0.tar.gz -C /work/poa/




export HADOOP_HOME=/work/poa/hadoop-2.6.0
export HADOOP_PID_DIR=/data/hadoop/pids
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


rm -rf /data/hadoop/
mkdir -p /data/hadoop/{pids,storage}
mkdir -p /data/hadoop/storage/{hdfs,tmp}
mkdir -p /data/hadoop/storage/hdfs/{name,data}


vim /work/poa/hadoop-2.6.0/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode1:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/data/hadoop/storage/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.native.lib</name>
        <value>true</value>
    </property>
</configuration>


vim /work/poa/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>datanode2:9000</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/hadoop/storage/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/hadoop/storage/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
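Note that the Hadoop 2.6 tarball ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template before editing:
cp /work/poa/hadoop-2.6.0/etc/hadoop/mapred-site.xml.template /work/poa/hadoop-2.6.0/etc/hadoop/mapred-site.xml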


vim /work/poa/hadoop-2.6.0/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>namenode1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>namenode1:19888</value>
    </property>
</configuration>


vim /work/poa/hadoop-2.6.0/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>namenode1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>namenode1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>namenode1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>namenode1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>namenode1:80</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>259200</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>3600</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://namenode1:19888/jobhistory/logs</value>
    </property>
</configuration>


vim /work/poa/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
vim /work/poa/hadoop-2.6.0/etc/hadoop/mapred-env.sh
vim /work/poa/hadoop-2.6.0/etc/hadoop/yarn-env.sh
Configure hadoop-env.sh, mapred-env.sh, and yarn-env.sh (add the following at the top of each file; if everything already runs fine without it, this can be skipped):
export JAVA_HOME=/work/poa/jdk1.8.0_65
export CLASS_PATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/work/poa/hadoop-2.6.0
export HADOOP_PID_DIR=/data/hadoop/pids
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin




Configure the data (slave) nodes:
vim /work/poa/hadoop-2.6.0/etc/hadoop/slaves
datanode2
datanode3


A quick Hadoop test
cd /work/poa/hadoop-2.6.0
When starting the cluster for the first time, do the following (run on the master NameNode):
hdfs namenode -format
start-dfs.sh
start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver     # job history server (log tracking service)
hdfs dfs -put /test.txt /
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /test.txt /out
hdfs dfs -ls /out
hdfs dfs -cat /out/part-r-00000
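To confirm the daemons came up, jps on namenode1 should list NameNode and ResourceManager, while datanode2/datanode3 should list DataNode and NodeManager (plus SecondaryNameNode on datanode2). The web UIs can be checked as well (the ResourceManager port is the one configured in yarn-site.xml above):
jps
# NameNode web UI:         http://namenode1:50070
# ResourceManager web UI:  http://namenode1:80
# JobHistory web UI:       http://namenode1:19888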


Configure Spark:
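Assuming the pre-built Spark 1.6.0 package for Hadoop 2.6 was downloaded to /soft/ like the other tarballs, extract it first:
tar zxvf /soft/spark-1.6.0-bin-hadoop2.6.tgz -C /work/poa/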
export SPARK_HOME=/work/poa/spark-1.6.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin


Spark event log configuration:
vim spark-defaults.conf     # copy from spark-defaults.conf.template under $SPARK_HOME/conf if the file does not exist
 spark.eventLog.enabled           true
 spark.eventLog.dir               hdfs://namenode1:9000/directory
 spark.history.fs.logDirectory    hdfs://namenode1:9000/directory
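The HDFS event-log directory referenced above must exist before jobs are run or the history server is started; it can be created once from any node:
hdfs dfs -mkdir /directory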


$SPARK_HOME/sbin/start-history-server.sh     # used to track Spark job records


# vim spark-env.sh    (on every node)
export JAVA_HOME=/work/poa/jdk1.8.0_65
export HADOOP_HOME=/work/poa/hadoop-2.6.0
export SCALA_HOME=/work/poa/scala-2.10.6
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=namenode1
SPARK_LOCAL_DIRS=/work/poa/spark-1.6.0-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G




## List of worker node hostnames
# vim slaves
datanode2
datanode3
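With spark-env.sh and slaves distributed to every node, the standalone master and workers can be started from namenode1 (only needed if jobs will also be submitted to the standalone master rather than to YARN, as in the test below):
$SPARK_HOME/sbin/start-all.sh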


Spark jobs are submitted with ./bin/spark-submit.
Test example:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 1g \
    --executor-cores 1 \
    lib/spark-examples-1.6.0-hadoop2.6.0.jar \
    10
View the logs:
yarn logs -applicationId application_1460456359574_0002
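The application ID comes from the spark-submit output or the ResourceManager web UI; finished applications can also be listed from the command line:
yarn application -list -appStates ALL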