Setting Up a Fully Distributed Hadoop 3.2.0 Cluster on CentOS 7

Articles in this series

Setting up a fully distributed Hadoop 3.2.0 cluster on CentOS 7

Setting up Hive 3.1.1 on CentOS 7

Setting up a fully distributed Spark 2.4.3 cluster on CentOS 7

Setting up a fully distributed HBase 2.1.5 cluster on CentOS 7

Setting up a fully distributed Storm 2.0.0 cluster on CentOS 7

 

Contents

I. Deployment layout of the three servers

II. Directory planning

III. Install required software and apply necessary settings

3.1 Install Java (must be 1.8 or later)

3.2 Change the hostname (important!)

3.3 Configure the hosts file

3.4 Set up passwordless SSH login

3.4.1 Generate a key pair on centos48

3.4.2 Distribute the centos48 public key to centos48, centos49, and centos50

3.4.3 Generate a key pair on centos49 and distribute its public key to centos48, centos49, and centos50

3.4.4 Generate a key pair on centos50 and distribute its public key to centos48, centos49, and centos50

IV. Install Hadoop on centos48

1. Download Hadoop 3.2.0

2. Configure core-site.xml (under /usr/local/hadoop-3.2.0/etc/hadoop)

3. Configure hdfs-site.xml

4. Configure workers

5. Configure yarn-site.xml

6. Configure mapred-site.xml

7. Copy the configured Hadoop package to the other hosts

8. Format the NameNode

V. Start the Hadoop cluster

1. Set environment variables (/etc/profile, on all three machines)

2. Start Hadoop

3. View the HDFS web UI

4. View the YARN web UI

VI. Hadoop MapReduce examples


I. Deployment layout of the three servers

hostname    centos48           centos49       centos50
IP          10.0.0.48          10.0.0.49      10.0.0.50
HDFS        NameNode                          SecondaryNameNode
HDFS        DataNode           DataNode       DataNode
YARN        ResourceManager
YARN        NodeManager        NodeManager    NodeManager
YARN                                          HistoryServer

II. Directory planning

# Hadoop temporary directory (hadoop.tmp.dir)
/var/hadoopdata/tmp

# Directory where the NameNode stores metadata (dfs.namenode.name.dir)
/var/data/hadoop/hdfs/name

# Directory where the DataNodes store block data (dfs.datanode.data.dir)
/var/data/hadoop/hdfs/data
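
Creating these directories up front is not strictly required (running as root, Hadoop creates them when the NameNode is formatted and the DataNodes first start), but doing so makes the layout explicit. A minimal sketch, run on each of centos48, centos49, and centos50:

mkdir -p /var/hadoopdata/tmp
mkdir -p /var/data/hadoop/hdfs/name
mkdir -p /var/data/hadoop/hdfs/data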

 

III. Install required software and apply necessary settings

3.1 Install Java (must be 1.8 or later)

The installation packages can be downloaded from my Baidu Netdisk share: https://pan.baidu.com/s/10GPuELlBQyyIGLmFX1byGw
After downloading, install them:

[root@centos48 ~]# yum localinstall ./jre-8u131-linux-x64.rpm
[root@centos48 ~]# yum localinstall ./jdk-8u131-linux-x64.rpm

After installation, edit /etc/profile and append the following:

export JAVA_HOME=/usr/java/jdk1.8.0_131
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH
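
To make the new variables take effect in the current shell and confirm the JDK is on the PATH:

source /etc/profile
java -version     # should report java version "1.8.0_131"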

3.2 Change the hostname (important!)

Set the hostnames of 10.0.0.48, 10.0.0.49, and 10.0.0.50 to centos48, centos49, and centos50 respectively.
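
On CentOS 7 the change can be made persistent across reboots with hostnamectl; a sketch, with each command run on its corresponding host:

hostnamectl set-hostname centos48    # on 10.0.0.48
hostnamectl set-hostname centos49    # on 10.0.0.49
hostnamectl set-hostname centos50    # on 10.0.0.50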

3.3 Configure the hosts file

vi /etc/hosts

Append the following entries:

10.0.0.48 centos48
10.0.0.49 centos49
10.0.0.50 centos50
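
The same entries are needed on all three machines. One option (a sketch, assuming the other hosts have no custom /etc/hosts entries worth preserving) is to copy the file from centos48; scp will prompt for the root password, since passwordless SSH is only set up in the next step:

scp /etc/hosts root@centos49:/etc/hosts
scp /etc/hosts root@centos50:/etc/hosts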

3.4 Set up passwordless SSH login

3.4.1 Generate a key pair on centos48

ssh-keygen -t rsa

Press Enter at every prompt to accept the defaults.

3.4.2 Distribute the centos48 public key to centos48, centos49, and centos50

ssh-copy-id centos48
ssh-copy-id centos49
ssh-copy-id centos50
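
To verify, each of the following should print the remote hostname without asking for a password:

ssh centos48 hostname
ssh centos49 hostname
ssh centos50 hostname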

3.4.3 Generate a key pair on centos49 and distribute its public key to centos48, centos49, and centos50

      Same steps as 3.4.1 and 3.4.2.

3.4.4 Generate a key pair on centos50 and distribute its public key to centos48, centos49, and centos50

      Same steps as 3.4.1 and 3.4.2.

 

IV. Install Hadoop on centos48

1. Download Hadoop 3.2.0

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz

The tarball is also available in my Baidu Netdisk share: https://pan.baidu.com/s/10GPuELlBQyyIGLmFX1byGw

Extract it to /usr/local/hadoop-3.2.0.
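
For example (the tarball already contains a top-level hadoop-3.2.0 directory):

tar -zxvf hadoop-3.2.0.tar.gz -C /usr/local/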

1.1 Edit the Hadoop environment settings (/usr/local/hadoop-3.2.0/etc/hadoop/hadoop-env.sh), adding the following:

export JAVA_HOME=/usr/java/jdk1.8.0_131

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

export HADOOP_PID_DIR=/var/hadoopdata/pids
export HADOOP_LOG_DIR=/var/hadoopdata/logs

2. Configure core-site.xml (under /usr/local/hadoop-3.2.0/etc/hadoop)

<configuration>

<property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos48:8020</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoopdata/tmp</value>
</property>

</configuration>

3. Configure hdfs-site.xml

<configuration>
   <property>
             <name>dfs.namenode.secondary.http-address</name>
             <value>centos50:50090</value>
   </property>
   <property>
             <name>dfs.replication</name>
             <value>2</value>
   </property>
   <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:/var/data/hadoop/hdfs/name</value>
   </property>
   <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:/var/data/hadoop/hdfs/data</value>
   </property>

<property>
  <name>dfs.namenode.http-address</name>
  <value>centos48:8084</value>
</property>

</configuration>

Note that 8084 here is the port the HDFS web UI listens on.

4. Configure workers

centos48
centos49
centos50

5. Configure yarn-site.xml

<configuration>
    <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.localizer.address</name>
      <value>0.0.0.0:8140</value>
   </property>
   <property>
       <name>yarn.resourcemanager.hostname</name>
       <value>centos48</value>
   </property>
   <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
   </property>
   <property>
       <name>yarn.log-aggregation.retain-seconds</name>
       <value>604800</value>
   </property>
   <property>
       <name>yarn.log.server.url</name>
       <value>http://centos50:19888/jobhistory/logs</value>
   </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>centos48:3206</value>
  </property>


</configuration>

Note that 3206 is the port for the YARN web UI.

6. Configure mapred-site.xml

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>

   <property>
       <name>yarn.app.mapreduce.am.env</name>
       <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.0</value>
   </property>
   <property>
       <name>mapreduce.map.env</name>
       <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.0</value>
   </property>
   <property>
       <name>mapreduce.reduce.env</name>
       <value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.0</value>
   </property>

   <property>
       <name>mapreduce.jobhistory.address</name>
       <value>centos50:10020</value>
   </property>
   <property>
       <name>mapreduce.jobhistory.webapp.address</name>
       <value>centos50:19888</value>
   </property>
</configuration>

7. Copy the configured Hadoop package to the other hosts

[root@centos48 local]# scp -r ./hadoop-3.2.0 root@centos49:/usr/local/
[root@centos48 local]# scp -r ./hadoop-3.2.0 root@centos50:/usr/local/

8. Format the NameNode

(At this point the Hadoop bin directory may not yet be on the PATH; either set the environment variables described in section V first, or invoke /usr/local/hadoop-3.2.0/bin/hdfs by its full path.)

[root@centos48 local]# hdfs namenode -format

All of the XML configuration files above are also available in my Baidu Netdisk share: https://pan.baidu.com/s/10GPuELlBQyyIGLmFX1byGw

V. Start the Hadoop cluster

1. Set environment variables (/etc/profile, on all three machines)

export HADOOP_HOME=/usr/local/hadoop-3.2.0
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
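
Reload the profile and confirm the hadoop command is picked up:

source /etc/profile
hadoop version     # should report Hadoop 3.2.0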

2. Start Hadoop

Run the following on centos48:

[root@centos48 local]# start-all.sh
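
After startup, running jps on each node shows which daemons are up. Based on the layout in section I, roughly the following should appear (process IDs omitted):

jps

# on centos48: NameNode, DataNode, ResourceManager, NodeManager
# on centos49: DataNode, NodeManager
# on centos50: SecondaryNameNode, DataNode, NodeManager

Note that start-all.sh does not start the MapReduce JobHistoryServer; to run it on centos50 as planned in section I, start it there separately:

mapred --daemon start historyserver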

3. View the HDFS web UI

http://10.0.0.48:8084


 

4. View the YARN web UI

http://10.0.0.48:3206   


VI. Hadoop MapReduce examples

To list the examples shipped in /usr/local/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar, run the jar without arguments:

[root@centos48 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.2.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

Prepare the input and output directories in HDFS:

hdfs dfs -mkdir /input

hdfs dfs -mkdir /output
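
If you do not have a sample text file handy, any plain-text file will do; for instance, you could copy one of the text files from the Hadoop distribution (a hypothetical choice):

cp /usr/local/hadoop-3.2.0/LICENSE.txt ./wordcount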

Upload a text file to HDFS; here the file is assumed to be named wordcount:

hdfs dfs -put wordcount /input/wordcount

Run the MapReduce job:

hadoop jar hadoop-mapreduce-examples-3.2.0.jar wordcount /input/wordcount /output/wordcount-result

[root@centos48 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.2.0.jar wordcount /input/wordcount /output/wordcount-result
2019-10-15 15:38:38,332 INFO client.RMProxy: Connecting to ResourceManager at centos48/10.0.0.48:8032
2019-10-15 15:38:50,432 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1571122204786_0003
2019-10-15 15:38:53,780 INFO input.FileInputFormat: Total input files to process : 1
2019-10-15 15:38:56,023 INFO mapreduce.JobSubmitter: number of splits:1
2019-10-15 15:38:56,413 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-10-15 15:38:58,531 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1571122204786_0003
2019-10-15 15:38:58,533 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-10-15 15:39:01,142 INFO conf.Configuration: resource-types.xml not found
2019-10-15 15:39:01,142 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-10-15 15:39:04,160 INFO impl.YarnClientImpl: Submitted application application_1571122204786_0003
2019-10-15 15:39:04,322 INFO mapreduce.Job: The url to track the job: http://centos48:3306/proxy/application_1571122204786_0003/
2019-10-15 15:39:04,323 INFO mapreduce.Job: Running job: job_1571122204786_0003
2019-10-15 15:39:18,489 INFO mapreduce.Job: Job job_1571122204786_0003 running in uber mode : false
2019-10-15 15:39:18,491 INFO mapreduce.Job:  map 0% reduce 0%
2019-10-15 15:39:59,257 INFO mapreduce.Job:  map 100% reduce 0%
2019-10-15 15:40:05,419 INFO mapreduce.Job:  map 100% reduce 100%
2019-10-15 15:40:05,429 INFO mapreduce.Job: Job job_1571122204786_0003 completed successfully
2019-10-15 15:40:05,526 INFO mapreduce.Job: Counters: 54

Check the output files:

[root@centos48 mapreduce]# hdfs dfs -ls /output/wordcount-result
Found 2 items
-rw-r--r--   2 root supergroup          0 2019-10-15 15:40 /output/wordcount-result/_SUCCESS
-rw-r--r--   2 root supergroup       4565 2019-10-15 15:40 /output/wordcount-result/part-r-00000
[root@centos48 mapreduce]# hdfs dfs -cat /output/wordcount-result/part-r-00000
!(*this	1
"AS	1
"StringPiece"	1
"as_string().c_str()"	1
"const	2
"string"	1
"string".	2
"this"	1
"x"	1
#define	4
#elif	1
#endif	3

 
