[Ops] Hadoop Cluster Setup


1. Basic Information

  • Version: 2.7.3
  • Machines: three nodes
  • Account: hadoop
  • Source path: /opt/software/hadoop-2.7.3.tar.gz
  • Target path: /opt/hadoop -> /opt/hadoop-2.7.3
  • Dependency: ZooKeeper

2. Installation

1). Switch to the hadoop account and extract Hadoop into the target directory with tar -zxvf (here ${version} is 2.7.3, per the basic information above):
[root@test opt]# su hadoop
[hadoop@test opt]$ cd /opt/software
[hadoop@test software]$ tar -zxvf hadoop-${version}.tar.gz -C /opt
[hadoop@test software]$ cd /opt
[hadoop@test opt]$ ln -s /opt/hadoop-${version} /opt/hadoop
2). Create the tmpdir directory:
[hadoop@test opt]$ cd /opt/hadoop
[hadoop@test hadoop]$ mkdir -p tmpdir
3). Configure hadoop-env.sh:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ mkdir -p /opt/hadoop/pids
[hadoop@test hadoop]$ vim hadoop-env.sh

Add the following to hadoop-env.sh:

export JAVA_HOME=/opt/java
export HADOOP_PID_DIR=/opt/hadoop/pids
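The two exports above can be appended idempotently so that re-running the setup does not duplicate lines. A minimal sketch, assuming a POSIX shell; `append_once` is a hypothetical helper (not part of Hadoop), and the temp file stands in for hadoop-env.sh:

```shell
# Hypothetical helper: append a config line only if it is not already present,
# so re-running the setup script stays idempotent.
append_once() {
  line="$1"; file="$2"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demonstrated on a temporary file; on a real node the target would be
# /opt/hadoop/etc/hadoop/hadoop-env.sh.
conf=$(mktemp)
append_once 'export JAVA_HOME=/opt/java' "$conf"
append_once 'export HADOOP_PID_DIR=/opt/hadoop/pids' "$conf"
append_once 'export JAVA_HOME=/opt/java' "$conf"   # duplicate call is a no-op
cat "$conf"
```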
4). Configure mapred-env.sh:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-env.sh

Add the following to mapred-env.sh:
export JAVA_HOME=/opt/java

5). Configure core-site.xml:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim core-site.xml

Add the following to core-site.xml:

<configuration>
    <property>
        <!-- Temporary working directory for the NameNode -->
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmpdir</value>
    </property>
    <property>
        <!-- HDFS entry point: tells clients which host the NameNode runs on and which port it listens on -->
        <name>fs.defaultFS</name>
        <value>hdfs://test:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>
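To eyeball the result, the configured property names can be listed with sed. A sketch run against an inline copy of the file; on a live node, point `conf` at /opt/hadoop/etc/hadoop/core-site.xml instead:

```shell
# Sketch: extract the <name> entries from core-site.xml with sed.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property><name>hadoop.tmp.dir</name><value>/opt/hadoop/tmpdir</value></property>
    <property><name>fs.defaultFS</name><value>hdfs://test:8020</value></property>
    <property><name>io.file.buffer.size</name><value>131072</value></property>
    <property><name>fs.trash.interval</name><value>1440</value></property>
</configuration>
EOF
names=$(sed -n 's|.*<name>\(.*\)</name>.*|\1|p' "$conf")
echo "$names"
```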
6). Configure hdfs-site.xml:

If Ranger has not been installed, the following block must be commented out in this file (it references a Ranger class that will not be on the classpath):

<property>
    <name>dfs.namenode.inode.attributes.provider.class</name>
    <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim hdfs-site.xml

Add the following to hdfs-site.xml:

<configuration>
    <property>
        <!-- Replication factor; normally no larger than the number of DataNodes -->
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/data/datanode</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>test:50090</value>
    </property>
</configuration>
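It is common practice (an assumption here, not a step the original lists) to pre-create the NameNode/DataNode directories referenced above so they exist with the right ownership before the first start; `hadoop namenode -format` will also create the NameNode directory itself. A sketch against a temporary root so it runs anywhere; on a real node `root` would be /opt/hadoop:

```shell
# Sketch: pre-create the directories named by dfs.namenode.name.dir and
# dfs.datanode.data.dir so they are writable by the hadoop user.
root=$(mktemp -d)
mkdir -p "$root/data/namenode" "$root/data/datanode"
chmod 755 "$root/data"
ls "$root/data"
```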
7). Configure mapred-site.xml:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-site.xml

Add the following to mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>test:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>test:19888</value>
    </property>
</configuration>
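One pitfall worth noting: the stock Hadoop 2.7.x tarball ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template first. A sketch demonstrated in a temp dir (the `touch` stands in for the shipped template); on a real node `d` would be /opt/hadoop/etc/hadoop:

```shell
# Sketch: create mapred-site.xml from the shipped template if it is missing.
d=$(mktemp -d)
touch "$d/mapred-site.xml.template"   # stand-in for the file shipped in the tarball
[ -f "$d/mapred-site.xml" ] || cp "$d/mapred-site.xml.template" "$d/mapred-site.xml"
ls "$d"
```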
8). Configure yarn-site.xml:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim yarn-site.xml

Add the following to yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>test:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>test:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>test:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>test:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>test:8088</value>
    </property>
<!-- Site specific YARN configuration properties -->
</configuration>
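A hypothetical pre-flight check (not part of the original procedure): the ResourceManager ports configured above should be free before start-yarn.sh runs. The sketch uses `ss` when available and otherwise just reports the ports as unchecked-free:

```shell
# Sketch: check whether the YARN ports configured above are already in use.
result=$(for port in 8030 8031 8032 8033 8088; do
  if command -v ss >/dev/null 2>&1 && ss -ltn 2>/dev/null | grep -q ":$port "; then
    echo "port $port already in use"
  else
    echo "port $port free"
  fi
done)
echo "$result"
```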
9). Configure Hadoop's runtime environment variables (editing /etc/profile requires root privileges):
[hadoop@test hadoop]$ vim /etc/profile
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH

After saving, run source /etc/profile to apply the change:

[hadoop@test hadoop]$ source /etc/profile
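A quick sanity check that the PATH change took effect; the exports mirror the /etc/profile lines above, so this sketch runs anywhere:

```shell
# Sketch: confirm that $HADOOP_HOME/bin ended up on the PATH.
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) msg="hadoop bin on PATH" ;;
  *)                      msg="hadoop bin missing from PATH" ;;
esac
echo "$msg"
```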
10). Edit the slaves file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop
[hadoop@test hadoop]$ vim slaves

Add the DataNode hostnames to the slaves file:

test2
test3
11). From test, copy hadoop-2.7.3 to hadoop@test2 and hadoop@test3, set the environment variables on each machine as in step 9, and create the symlink on each remote machine (the ln -s must run on test2 and test3, not on test):
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test2:/opt/
[hadoop@test hadoop]$ ssh hadoop@test2 "ln -s /opt/hadoop-${version} /opt/hadoop"
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test3:/opt/
[hadoop@test hadoop]$ ssh hadoop@test3 "ln -s /opt/hadoop-${version} /opt/hadoop"
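The copy-and-link step can also be written as a loop over the worker hosts. This sketch is a dry run: the commands are echoed rather than executed, so it runs anywhere; on a real cluster with passwordless SSH between the nodes, drop the `echo`:

```shell
# Dry-run sketch: generate the scp/ssh commands for each worker host.
version=2.7.3
cmds=$(for host in test2 test3; do
  echo "scp -r /opt/hadoop-${version} hadoop@${host}:/opt/"
  echo "ssh hadoop@${host} 'ln -s /opt/hadoop-${version} /opt/hadoop'"
done)
echo "$cmds"
```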
12). Format the NameNode (only needed before the first start!), start Hadoop, and start the JobHistory service:
# Format the NameNode; only needed before the first start!
[hadoop@test hadoop]$ hadoop namenode -format
# Start the cluster
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/start-all.sh
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver

start-all.sh starts both the DFS and YARN modules (it wraps start-dfs.sh and start-yarn.sh), so DFS and YARN can also be started individually.
Note: if a DataNode fails to start, check whether tmpdir contains stale data from a previous run; if so, delete that directory, and delete it on the other two machines as well.

13). Check the services on each machine by running jps on test, test2, and test3:
[hadoop@test ~]$ jps
24429 Jps
22898 ResourceManager
24383 JobHistoryServer
22722 SecondaryNameNode
22488 NameNode
[hadoop@test2 ~]$ jps
7650 DataNode
7788 NodeManager
8018 Jps
[hadoop@test3 ~]$ jps
28407 Jps
28038 DataNode
28178 NodeManager

If all three machines show the output above, the Hadoop cluster's services are working normally.
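The daemon check can be scripted instead of eyeballed. A sketch run against a captured sample so it executes anywhere; on a live node, replace `$sample` with `"$(jps)"` and adjust the daemon list per host (master vs. worker):

```shell
# Sketch: verify that the expected daemons appear in jps output.
sample="22488 NameNode
22722 SecondaryNameNode
22898 ResourceManager
24383 JobHistoryServer"
missing=""
for daemon in NameNode SecondaryNameNode ResourceManager JobHistoryServer; do
  echo "$sample" | grep -q "$daemon" || missing="$missing $daemon"
done
echo "missing:${missing:- none}"
```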

Access the Hadoop web UI by opening the following address in a browser: http://172.24.5.173:8088

Run a simple MapReduce job to verify that the cluster is installed correctly:

[hadoop@test ~]$ cd /opt/hadoop/share/hadoop/mapreduce
[hadoop@test mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 4
Number of Maps  = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/04/06 09:36:47 INFO client.RMProxy: Connecting to ResourceManager at test/172.24.5.173:8032
17/04/06 09:36:47 INFO input.FileInputFormat: Total input paths to process : 2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: number of splits:2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491470782060_0001
17/04/06 09:36:48 INFO impl.YarnClientImpl: Submitted application application_1491470782060_0001
17/04/06 09:36:48 INFO mapreduce.Job: The url to track the job: http://test:8088/proxy/application_1491470782060_0001/
17/04/06 09:36:48 INFO mapreduce.Job: Running job: job_1491470782060_0001
17/04/06 09:36:56 INFO mapreduce.Job: Job job_1491470782060_0001 running in uber mode : false
17/04/06 09:36:56 INFO mapreduce.Job:  map 0% reduce 0%
17/04/06 09:37:00 INFO mapreduce.Job:  map 50% reduce 0%
17/04/06 09:37:02 INFO mapreduce.Job:  map 100% reduce 0%
17/04/06 09:37:08 INFO mapreduce.Job:  map 100% reduce 100%
17/04/06 09:37:08 INFO mapreduce.Job: Job job_1491470782060_0001 completed successfully
17/04/06 09:37:08 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=357588
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=554
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=6118
        Total time spent by all reduces in occupied slots (ms)=4004
        Total time spent by all map tasks (ms)=6118
        Total time spent by all reduce tasks (ms)=4004
        Total vcore-milliseconds taken by all map tasks=6118
        Total vcore-milliseconds taken by all reduce tasks=4004
        Total megabyte-milliseconds taken by all map tasks=6264832
        Total megabyte-milliseconds taken by all reduce tasks=4100096
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=318
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=213
        CPU time spent (ms)=2340
        Physical memory (bytes) snapshot=713646080
        Virtual memory (bytes) snapshot=6332133376
        Total committed heap usage (bytes)=546308096
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 20.744 seconds
Estimated value of Pi is 3.50000000000000000000

Q&A

Q: stop-all.sh cannot stop the Hadoop cluster?
A: By default the daemons' PID files live in /tmp, which is cleaned periodically; once the PID files are gone, the stop scripts cannot find the processes. Setting HADOOP_PID_DIR (as in step 3) avoids this.

Q: The NameNode fails to start.
A: The hostname in the fs.defaultFS value in core-site.xml must not contain underscores!
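A quick guard against the underscore pitfall above. A sketch against an inline fragment so it runs anywhere; on a live node, point `conf` at /opt/hadoop/etc/hadoop/core-site.xml:

```shell
# Sketch: extract the hostname from the fs.defaultFS value and reject underscores.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://test:8020</value>
</property>
EOF
host=$(sed -n 's|.*<value>hdfs://\([^:<]*\).*|\1|p' "$conf")
case "$host" in
  *_*) echo "invalid: hostname '$host' contains an underscore" ;;
  *)   echo "ok: hostname '$host'" ;;
esac
```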

Hadoop Core Elements

  • node
    • NameNode
      Stores the filesystem metadata.
  • manage
    • NodeManager

      1. Manages the compute resources of a single node.
      2. Keeps in contact with the ResourceManager (the cluster-wide manager) and the ApplicationMaster (the per-application master process).
      3. Manages container lifecycles and monitors each container's resource usage (memory, CPU), tracks node health, and manages logs.
