Building a Cluster with CentOS 7.0 + Hadoop 2.7

Preparation

Prepare the environment

Download links:

VM12: https://pan.baidu.com/s/1hsvkHe8 (password: ycan)

CentOS 7.0: https://pan.baidu.com/s/1nvUmu05 (password: ktqh)

jdk1.8: https://pan.baidu.com/s/1bo69W67 (password: 3vol)

hadoop2.7.3: https://pan.baidu.com/s/1qYiJgT2 (password: d96k)

Prepare three machines: one master and two slaves.

    192.168.122.128 master
    192.168.122.129 slave1
    192.168.122.130 slave2

Add the IPs and hostnames to the hosts file on all three machines:

vi /etc/hosts
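
Equivalently, the entries can be appended non-interactively on each machine; a minimal sketch, assuming the IPs listed above:

cat >> /etc/hosts <<'EOF'
192.168.122.128 master
192.168.122.129 slave1
192.168.122.130 slave2
EOF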

Passwordless SSH login (as root; you could also create a dedicated hadoop user)

  1. Check that the clocks are consistent (time synchronization)

    date

    If the clocks are out of sync, see "Time synchronization with NTP on CentOS 7".

  2. Stop the firewall

    systemctl stop firewalld.service
  3. Disable the firewall at boot

    systemctl disable firewalld.service
  4. Generate SSH keys

    ssh-keygen  -t   rsa   -P  ''

    Check that the keys were generated successfully (there should be two files: id_rsa and id_rsa.pub)

    ls    /root/.ssh/
  5. Create the authorized_keys file

    1. On master, create an authorized_keys file and check that it was created

      touch  /root/.ssh/authorized_keys
      ls   /root/.ssh/
    2. Copy the contents of id_rsa.pub from all three machines into authorized_keys (one way to do this is sketched after this list)

      For example, the RSA public keys generated in this walkthrough are:

      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuFQVKFAn/hodZUwj9xSybiKIP2XJHqxZfgNz7ADSFwEORMAkKpUyJ51fM/S+uW9oNSlSG/TyfvSSme1fE6xut0t4iPLV2dp9Ia+dPDs9ub3XEyvId1ADMNhO3SveuMVNPpJ50PiBnmqgHQ1OuPMopgfgRFWmbodLmz0gtmJZ6KubI3P90Do44X1+TJdX+eRECFomefayj23x+/xBVLxKXQH7+vNVn4vIM8JIWFFT8XEBN2+HKxCqEN4yilTIk+X5Ov10sfJcQhNlivThV+t9AeBH/T6J4bLOrdiQYNTnMuN+Ii5tNc7fKpzdaCmlJmzaxzESrXQRtu+7C3areZYe9 root@master
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDMQ8YA7APRnS5Rt3y5OIJexL2A6a0KR+MhLpMInnGMzEcpItryIU8FBPV3fmsKdhzr99pxryLSxQibvHxQo1Kx2FUN1HUTW4fftZsum+ddGY6w+/iQefjbddrmUzaZUxhsHuqCBb80UclfbR7BcRv1FQDelyig2FU9U28LjU9iTvwdEzttdBq433GL/2lDC1xw2tidWkc0CfjACprzjJ16vzb88awm8VOTp5ExylD7gT8sXmAsmAr3W8FsilKFKCrLwCEop3/r+6g8eIDM53XOt7UciK/FJAyCarKbUexeEfBqpzeilW1wcHd/5DiLJgCZ2fJhJnI+3xQKGv9xdYoR root@slave1
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKuppRgfn5Tx7ST3C17jfguukTaaJVJdWbFziVm/jbwU57o4CmN7CuTzI4VvEnVeVsKTN8S5+rxC3hBIMjuJbVopR8vjHLSd7ysByUiFUusg7RPmJRMlZ0LwWMJCUm9E/xIoq9zNGr38u0yKNjS27PYf8PLgYQx2qHUGbla3KlSX5i81hxyeF/sHqfn6F+RQ/BAxVziu7atDTZF+RojYfiw087Zp/57Th6ouSPIeObTeYkJjFFENavsCcDwbqUnMyndDoPbCqV/f0494HSFZWPX8KUVfWnnJ1HQWp37vgZV8uU59OMLibYCD6t/p4Qfvp0/CCgFW8a6XoYXwYcm/tl root@slave2
    3. Copy authorized_keys from master to slave1 and slave2

      scp /root/.ssh/authorized_keys root@192.168.122.129:/root/.ssh
      scp /root/.ssh/authorized_keys root@192.168.122.130:/root/.ssh
    4. Check that the copy succeeded

    5. Test that the machines can log in to one another without a password (use exit to log out)

      ssh master
      ssh slave1
      ssh slave2
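
One way to carry out steps 5.2 and 5.3 without pasting the keys by hand is sketched below; it assumes the hostnames from /etc/hosts and that root SSH logins with a password are still allowed:

# on master: gather the public keys of all three machines into authorized_keys
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@slave1 'cat /root/.ssh/id_rsa.pub' >> /root/.ssh/authorized_keys
ssh root@slave2 'cat /root/.ssh/id_rsa.pub' >> /root/.ssh/authorized_keys

# distribute the combined file to the slaves (step 5.3)
scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/

These ssh and scp commands still prompt for the slaves' passwords, because passwordless login is not in place yet.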

Installing the JDK

1. Copy the local JDK archive to the cluster

Connect to the cluster with SecureCRT, type rz in the terminal, and select the JDK file to upload.

In this example, create a new /opt/java directory and copy the file into it.

2. Extract the JDK archive with tar

cd /opt/java 
tar -zxvf jdk-8u60-linux-x64.tar.gz 
ls 

3. Edit the /etc/profile configuration file

vi /etc/profile

Add the Java environment variables:

export JAVA_HOME=/opt/java/jdk1.8.0_60
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin

Apply the edited file:

source /etc/profile

Check the Java version:

java -version

4. Supplement (possible issue: the environment variables do not take effect)

For licensing reasons, most Linux distributions ship with OpenJDK preinstalled and already have OpenJDK's java command on the PATH. After installing the Sun JDK there are therefore two JDKs on the system, OpenJDK and the Sun JDK. How do we make the Sun JDK replace the original OpenJDK?

  1. Finding the cause

    Find where the java command lives:

    whereis java

    Output:

    java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java /opt/java/jdk1.8.0_60/bin/java /usr/share/man/man1/java.1.gz

    Here /opt/java/jdk1.8.0_60/bin/java is the Sun JDK we installed, while /usr/bin/java is where the system's default java command lives. Digging further:

    ls -la /usr/bin/java

    Output:

    lrwxrwxrwx. 1 root root 22 Nov 23 23:44 /usr/bin/java -> /etc/alternatives/java

    Go into the /etc/alternatives directory:

    cd /etc/alternatives
    ls -la

    Output (one of the entries):

    lrwxrwxrwx.   1 root root   70 Nov 23 23:44 java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java

    Here is the cause: the system's default java points at OpenJDK's java command, which is why the environment variables configured in /etc/profile do not take effect.
    The next step is to point this java symlink at our Sun JDK binary, /opt/java/jdk1.8.0_60/bin/java.

  2. Fix the problem by changing the java symlink

    Check the current default java configuration:

    update-alternatives --display java

    Output (excerpt):

    java - status is auto.
    The link currently points to /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
    /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64/jre/bin/java - priority 1700091

    This shows that the system's default java is OpenJDK's (note the priority).

  3. Configure the alternative to point at our installed Sun JDK (a similar command for javac is sketched after this list)

    update-alternatives  --install /usr/bin/java java /opt/java/jdk1.8.0_60/bin/java 170130
    update-alternatives --config java

    Output:

    
    There are 3 programs which provide 'java'.

      Selection    Command
    -----------------------------------------------

      1           /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91-2.6.2.3.el7.x86_64/jre/bin/java
    *+ 2           /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
      3           /opt/java/jdk1.8.0_60/bin/java

    Press Enter to keep the current selection[+], or type a selection number:

    Choose the number of the JDK we installed; in this example that is 3.

    Because the priority we configured is lower than OpenJDK's, manual selection is needed. If the configured priority were higher than OpenJDK's, no manual choice would be required: the system automatically selects the highest-priority alternative as the default.

    Check the Java version:

    java -version
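
    If javac is also needed on the PATH (not strictly required for this walkthrough), the same alternatives mechanism can register it; a minimal sketch, assuming the same JDK path:

    update-alternatives --install /usr/bin/javac javac /opt/java/jdk1.8.0_60/bin/javac 170130
    update-alternatives --config javac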

Installing Hadoop

1. Copy the local Hadoop archive to the cluster (see the JDK copy method above)

In this example, create a new /opt/hadoop directory.

2. Extract with tar
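
For example, assuming the archive was uploaded as hadoop-2.7.3.tar.gz into /opt/hadoop:

cd /opt/hadoop
tar -zxvf hadoop-2.7.3.tar.gz
ls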

3. Configure the environment

Add the Hadoop environment variables to /etc/profile, then apply the change with source /etc/profile as in the JDK step:

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

4. Create the Hadoop working directories

mkdir  /root/hadoop  
mkdir  /root/hadoop/tmp  
mkdir  /root/hadoop/var  
mkdir  /root/hadoop/dfs  
mkdir  /root/hadoop/dfs/name  
mkdir  /root/hadoop/dfs/data  
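
Equivalently, the same directories can be created with a single command using brace expansion:

mkdir -p /root/hadoop/{tmp,var,dfs/name,dfs/data}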

5. Edit the configuration files under etc/hadoop in the Hadoop directory

In this example the directory is /opt/hadoop/hadoop-2.7.3/etc/hadoop

  1. Edit core-site.xml

    Add within the <configuration> tag:

    <configuration>
    <property>
         <name>hadoop.tmp.dir</name>
         <value>/root/hadoop/tmp</value>
         <description>Abase for other temporary directories.</description>
        </property>
       <property>
         <name>fs.default.name</name>
         <value>hdfs://master:9000</value>
        </property>
    </configuration>

  2. Edit hadoop-env.sh

    export   JAVA_HOME=${JAVA_HOME}

    Change it to:

    export JAVA_HOME=/opt/java/jdk1.8.0_60

    The path is wherever the JDK was installed.

  3. Edit hdfs-site.xml

    Add within the <configuration> tag:

    <property>
      <name>dfs.name.dir</name>
      <value>/root/hadoop/dfs/name</value>
      <description>Path on the local filesystem where theNameNode stores the namespace and transactions logs persistently.</description>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/root/hadoop/dfs/data</value>
      <description>Comma separated list of paths on the localfilesystem of a DataNode where it should store its blocks.</description>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
         <name>dfs.permissions</name>
         <value>true</value>
         <description>need not permissions</description>
    </property>

    Note: with dfs.permissions set to false, files can be created on DFS without any permission checks. That is convenient, but it also makes accidental deletion easier, so set it to true, or simply remove this property node altogether, since the default is already true.

  4. Create and edit mapred-site.xml

    Make a copy of the mapred-site.xml.template template:

    cp mapred-site.xml.template mapred-site.xml
    ls

    Edit it:

    vi mapred-site.xml

    Add within the <configuration> tag:

    <property>
       <name>mapred.job.tracker</name>
       <value>master:49001</value>
    </property>
    <property>
         <name>mapred.local.dir</name>
          <value>/root/hadoop/var</value>
    </property>
    <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
    </property>

  5. Edit the slaves file

    Replace the localhost entry with:

    slave1
    slave2

  6. Edit yarn-site.xml

    Add within the <configuration> tag:

    <property>
           <name>yarn.resourcemanager.hostname</name>
           <value>master</value>
      </property>
      <property>
           <description>The address of the applications manager interface in the RM.</description>
           <name>yarn.resourcemanager.address</name>
           <value>${yarn.resourcemanager.hostname}:8032</value>
      </property>
      <property>
           <description>The address of the scheduler interface.</description>
           <name>yarn.resourcemanager.scheduler.address</name>
           <value>${yarn.resourcemanager.hostname}:8030</value>
      </property>
      <property>
           <description>The http address of the RM web application.</description>
           <name>yarn.resourcemanager.webapp.address</name>
           <value>${yarn.resourcemanager.hostname}:8088</value>
      </property>
      <property>
           <description>The https adddress of the RM web application.</description>
           <name>yarn.resourcemanager.webapp.https.address</name>
           <value>${yarn.resourcemanager.hostname}:8090</value>
      </property>
      <property>
           <name>yarn.resourcemanager.resource-tracker.address</name>
           <value>${yarn.resourcemanager.hostname}:8031</value>
      </property>
      <property>
           <description>The address of the RM admin interface.</description>
           <name>yarn.resourcemanager.admin.address</name>
           <value>${yarn.resourcemanager.hostname}:8033</value>
      </property>
      <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
      </property>
      <property>
           <name>yarn.scheduler.maximum-allocation-mb</name>
           <value>2048</value>
           <description>Maximum memory allocation per container, in MB; the default is 8192 MB.</description>
      </property>
      <property>
           <name>yarn.nodemanager.vmem-pmem-ratio</name>
           <value>2.1</value>
      </property>
      <property>
           <name>yarn.nodemanager.resource.memory-mb</name>
           <value>2048</value>
    </property>
      <property>
           <name>yarn.nodemanager.vmem-check-enabled</name>
           <value>false</value>
    </property>

    Note: yarn.nodemanager.vmem-check-enabled disables the virtual-memory check. This is very useful if you are running on virtual machines and makes later steps less error-prone; on a physical machine with plenty of memory you can drop this setting. Once these files are edited on master, the installation and configuration also need to reach the slaves; see the sketch below.
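
For start-all.sh to bring up the daemons on the slaves, the same JDK, Hadoop installation, configuration, and working directories must exist on slave1 and slave2 as well. A minimal sketch, assuming identical paths on every machine:

scp -r /opt/java root@slave1:/opt/
scp -r /opt/java root@slave2:/opt/
scp -r /opt/hadoop root@slave1:/opt/
scp -r /opt/hadoop root@slave2:/opt/
scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
ssh root@slave1 'mkdir -p /root/hadoop/{tmp,var,dfs/name,dfs/data}'
ssh root@slave2 'mkdir -p /root/hadoop/{tmp,var,dfs/name,dfs/data}'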

Starting Hadoop

1. Run initialization on the namenode

Since master is the namenode and slave1 and slave2 are datanodes, initialization only needs to be run on the namenode, that is, format HDFS:

cd /opt/hadoop/hadoop-2.7.3/bin
./hadoop namenode -format

Since the Hadoop environment variables were configured earlier, you can simply run:

hadoop namenode -format

After the output scrolls by, you will see some configuration information. Check whether a current folder containing several files now exists under /root/hadoop/dfs/name:

cd /root/hadoop/dfs/name
ls
cd current
ls

2. Run the start command on the namenode

Execute the start script:

cd /opt/hadoop/hadoop-2.7.3/sbin/
./start-all.sh
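
To confirm that the daemons started, jps can be run on each node; a rough sketch of what to expect with the roles used here:

# on master: NameNode, SecondaryNameNode and ResourceManager should be listed
jps
# on each slave: DataNode and NodeManager should be listed
ssh slave1
jps
exit
ssh slave2
jps
exit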

Testing Hadoop

Open the following URLs:

http://192.168.122.128:50070
http://192.168.122.128:8088

You should land on the Hadoop web pages.

At this point, the whole Hadoop environment has been built successfully.
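
Another quick check from the command line is the HDFS report, which should list both datanodes:

hdfs dfsadmin -report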

WordCount Test

Write two files locally:

cd /opt
mkdir file
cd file
echo "hello world" >> file1.txt
echo "hello hadoop" >> file2.txt
ls
more file1.txt
more file2.txt

Create the input directory on the Hadoop cluster (the output directory is created by the job itself):

hadoop fs -mkdir -p /test/hadoop/input

Upload the files to the cluster:

hadoop fs -put /opt/file/file*.txt /test/hadoop/input
hadoop fs -ls /test/hadoop/input

Run the MapReduce test:

hadoop jar /opt/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /test/hadoop/input /test/hadoop/output

Note: this invokes the wordcount program in the MapReduce examples jar under the share directory; the last two arguments are the input and output paths.
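
As a side note, running the examples jar without a program name prints the list of bundled example programs (wordcount, grep, pi, and so on), which is a quick way to confirm the jar path is correct:

hadoop jar /opt/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar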

Result:

17/11/29 14:33:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.122.128:8032
17/11/29 14:33:03 INFO input.FileInputFormat: Total input paths to process : 2
17/11/29 14:33:03 INFO mapreduce.JobSubmitter: number of splits:2
17/11/29 14:33:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1511935947803_0003
17/11/29 14:33:04 INFO impl.YarnClientImpl: Submitted application application_1511935947803_0003
17/11/29 14:33:04 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1511935947803_0003/
17/11/29 14:33:04 INFO mapreduce.Job: Running job: job_1511935947803_0003
17/11/29 14:33:14 INFO mapreduce.Job: Job job_1511935947803_0003 running in uber mode : false
17/11/29 14:33:14 INFO mapreduce.Job:  map 0% reduce 0%
17/11/29 14:33:29 INFO mapreduce.Job:  map 50% reduce 0%
17/11/29 14:33:30 INFO mapreduce.Job:  map 100% reduce 0%
17/11/29 14:33:38 INFO mapreduce.Job:  map 100% reduce 100%
17/11/29 14:33:38 INFO mapreduce.Job: Job job_1511935947803_0003 completed successfully
17/11/29 14:33:39 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=55
                FILE: Number of bytes written=355783
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=247
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=27082
                Total time spent by all reduces in occupied slots (ms)=6022
                Total time spent by all map tasks (ms)=27082
                Total time spent by all reduce tasks (ms)=6022
                Total vcore-milliseconds taken by all map tasks=27082
                Total vcore-milliseconds taken by all reduce tasks=6022
                Total megabyte-milliseconds taken by all map tasks=27731968
                Total megabyte-milliseconds taken by all reduce tasks=6166528
        Map-Reduce Framework
                Map input records=2
                Map output records=4
                Map output bytes=41
                Map output materialized bytes=61
                Input split bytes=222
                Combine input records=4
                Combine output records=4
                Reduce input groups=3
                Reduce shuffle bytes=61
                Reduce input records=4
                Reduce output records=3
                Spilled Records=8
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=1132
                CPU time spent (ms)=4410
                Physical memory (bytes) snapshot=500752384
                Virtual memory (bytes) snapshot=6232047616
                Total committed heap usage (bytes)=307437568
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=25
        File Output Format Counters 
                Bytes Written=25

Job ID: job_1511935947803_0003

Test results:

hadoop fs -ls /test/hadoop/output
Found 2 items
-rw-r--r--   2 root supergroup          0 2017-11-29 14:33 /test/hadoop/output/_SUCCESS
-rw-r--r--   2 root supergroup         25 2017-11-29 14:33 /test/hadoop/output/part-r-00000
hadoop fs -cat /test/hadoop/output/part-r-00000
hadoop  1
hello   2
world   1

In addition, the result of the submitted job can be seen at http://192.168.122.128:8088/cluster/app.

At this point, the WordCount test has succeeded.
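
Note that re-running the job with the same output path fails, because the output directory must not already exist; remove it first:

hadoop fs -rm -r /test/hadoop/output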

Problems encountered

The test web pages would not open

After finishing the setup, the test web pages would not open. I assumed a configuration mistake and spent a long time troubleshooting before finally asking a friend what was wrong: "Did you turn off your proxy?" Disabling the proxy made the test succeed, a mix of relief and exasperation. Alternatively, keep the proxy and add the cluster's IP range to the no-proxy rules.

hadoop: command not found…

Add the Hadoop environment variables to /etc/profile:

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.3
export  PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
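
The change only takes effect in new login shells; to apply it to the current session, reload the profile:

source /etc/profile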

Reference: Installing a Hadoop cluster on Linux (CentOS7 + hadoop-2.8.0)

Reference: How to replace the default OpenJDK with a self-installed Sun JDK on Linux
