Links to the other articles in this series:
Setting up a fully distributed Hadoop 3.2.0 cluster on CentOS 7
Setting up a fully distributed Spark 2.4.3 cluster on CentOS 7
Setting up a fully distributed HBase 2.1.5 cluster on CentOS 7
Setting up a fully distributed Storm 2.0.0 cluster on CentOS 7
I. Deployment layout across the three servers

| hostname | centos48 | centos49 | centos50 |
| --- | --- | --- | --- |
| ip | 10.0.0.48 | 10.0.0.49 | 10.0.0.50 |
| HDFS | NameNode | | SecondaryNameNode |
| HDFS | DataNode | DataNode | DataNode |
| YARN | ResourceManager | | |
| YARN | NodeManager | NodeManager | NodeManager |
| YARN | | | HistoryServer |
II. Directory planning
# Hadoop temporary directory (hadoop.tmp.dir)
/var/hadoopdata/tmp
# NameNode metadata directory (dfs.namenode.name.dir)
/var/data/hadoop/hdfs/name
# DataNode data directory (dfs.datanode.data.dir)
/var/data/hadoop/hdfs/data
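These directories can be created up front on each node. The sketch below builds the tree under a scratch root so it can be run anywhere; on the real nodes you would drop $ROOT and run as root so the paths land directly under /. The pids and logs directories are included because hadoop-env.sh (section IV) points HADOOP_PID_DIR and HADOOP_LOG_DIR at them.

```shell
# Build the planned directory tree. $ROOT is a scratch prefix used only so
# this sketch runs without root; omit it on the real cluster nodes.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/var/hadoopdata/tmp" \
         "$ROOT/var/hadoopdata/pids" \
         "$ROOT/var/hadoopdata/logs" \
         "$ROOT/var/data/hadoop/hdfs/name" \
         "$ROOT/var/data/hadoop/hdfs/data"
# show the resulting tree relative to the scratch root
find "$ROOT" -mindepth 1 -type d | sed "s|^$ROOT||" | sort
```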
III. Install prerequisite software and required settings
3.1 Install Java (must be 1.8 or later)
The installers can be downloaded from my Baidu Netdisk: https://pan.baidu.com/s/10GPuELlBQyyIGLmFX1byGw
After downloading, install them:
[root@centos48 ~]# yum localinstall ./jre-8u131-linux-x64.rpm
[root@centos48 ~]# yum localinstall ./jdk-8u131-linux-x64.rpm
After installation, add the following to /etc/profile:
export JAVA_HOME=/usr/java/jdk1.8.0_131
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH
3.2 Set the hostname (important!)
Set the hostnames of 10.0.0.48, 10.0.0.49 and 10.0.0.50 to centos48, centos49 and centos50 respectively. On CentOS 7, hostnamectl set-hostname <name> makes the change persistent across reboots; the bare hostname command only lasts for the current session.
3.3 Configure the hosts file
vi /etc/hosts
Append the following entries:
10.0.0.48 centos48
10.0.0.49 centos49
10.0.0.50 centos50
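Since the three entries follow a single pattern, they can also be generated with a small loop whose output matches the lines above (append it to /etc/hosts on each node):

```shell
# emit one "ip hostname" line per node; the last octet doubles as the
# hostname suffix in this cluster's naming scheme
for i in 48 49 50; do
  echo "10.0.0.$i centos$i"
done
```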
3.4 Set up passwordless SSH login
3.4.1 Generate a key pair on centos48
ssh-keygen -t rsa
Press Enter at every prompt to accept the defaults.
3.4.2 Copy the centos48 public key to centos48, centos49 and centos50
ssh-copy-id centos48
ssh-copy-id centos49
ssh-copy-id centos50
3.4.3 Generate a key pair on centos49 and copy its public key to centos48, centos49 and centos50
Same steps as 3.4.1 and 3.4.2.
3.4.4 Generate a key pair on centos50 and copy its public key to centos48, centos49 and centos50
Same steps as 3.4.1 and 3.4.2.
IV. Install Hadoop on centos48
1. Download Hadoop 3.2.0
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
It can also be downloaded from my Baidu Netdisk: https://pan.baidu.com/s/10GPuELlBQyyIGLmFX1byGw
Extract it to /usr/local/hadoop-3.2.0:
tar -xzf hadoop-3.2.0.tar.gz -C /usr/local/
1.1 Edit the Hadoop environment script (/usr/local/hadoop-3.2.0/etc/hadoop/hadoop-env.sh)
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_PID_DIR=/var/hadoopdata/pids
export HADOOP_LOG_DIR=/var/hadoopdata/logs
2. Configure core-site.xml (under /usr/local/hadoop-3.2.0/etc/hadoop)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://centos48:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoopdata/tmp</value>
</property>
</configuration>
3. Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>centos50:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/var/data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/var/data/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>centos48:8084</value>
</property>
</configuration>
Note: 8084 here is the listening port of the HDFS web UI.
4. Configure the workers file
centos48
centos49
centos50
5. Configure yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>0.0.0.0:8140</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>centos48</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://centos50:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>centos48:3206</value>
</property>
</configuration>
Note: 3206 is the port of the YARN web UI.
6. Configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.0</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.0</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/usr/local/hadoop-3.2.0</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>centos50:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>centos50:19888</value>
</property>
</configuration>
7. Copy the configured Hadoop directory to the other hosts
[root@centos48 local]# scp -r ./hadoop-3.2.0 root@centos49:/usr/local/
[root@centos48 local]# scp -r ./hadoop-3.2.0 root@centos50:/usr/local/
8. Format the NameNode
[root@centos48 local]# hdfs namenode -format
All of the XML files configured above are also available in my Baidu Netdisk: https://pan.baidu.com/s/10GPuELlBQyyIGLmFX1byGw
V. Start the Hadoop cluster
1. Set the environment variables (in /etc/profile, on all three machines)
export HADOOP_HOME=/usr/local/hadoop-3.2.0
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
2. Start Hadoop
Run on centos48:
[root@centos48 local]# start-all.sh
3. Open the HDFS web UI (http://centos48:8084, the port set in hdfs-site.xml above)
4. Open the YARN web UI (http://centos48:3206, the port set in yarn-site.xml above)
VI. MapReduce on Hadoop
List the example programs bundled in /usr/local/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar:
[root@centos48 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.2.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
Prepare the input directory in HDFS (hdfs dfs replaces the deprecated hadoop dfs form):
hdfs dfs -mkdir /input
hdfs dfs -mkdir /output
Upload a file to HDFS; here the file is assumed to be named wordcount:
hdfs dfs -put wordcount /input/wordcount
Run the MapReduce job:
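Any plain-text file works as the wordcount input. If none is at hand, a tiny sample could be created like this before the upload (the contents below are arbitrary, chosen only to make the demo reproducible):

```shell
# write a two-line sample file named wordcount in the current directory;
# this is the local file that the -put command above ships into HDFS
cat > wordcount <<'EOF'
hello hadoop
hello yarn hello hdfs
EOF
wc -w wordcount   # word count of the sample input
```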
[root@centos48 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.2.0.jar wordcount /input/wordcount /output/wordcount-result
2019-10-15 15:38:38,332 INFO client.RMProxy: Connecting to ResourceManager at centos48/10.0.0.48:8032
2019-10-15 15:38:50,432 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1571122204786_0003
2019-10-15 15:38:53,780 INFO input.FileInputFormat: Total input files to process : 1
2019-10-15 15:38:56,023 INFO mapreduce.JobSubmitter: number of splits:1
2019-10-15 15:38:56,413 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-10-15 15:38:58,531 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1571122204786_0003
2019-10-15 15:38:58,533 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-10-15 15:39:01,142 INFO conf.Configuration: resource-types.xml not found
2019-10-15 15:39:01,142 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-10-15 15:39:04,160 INFO impl.YarnClientImpl: Submitted application application_1571122204786_0003
2019-10-15 15:39:04,322 INFO mapreduce.Job: The url to track the job: http://centos48:3306/proxy/application_1571122204786_0003/
2019-10-15 15:39:04,323 INFO mapreduce.Job: Running job: job_1571122204786_0003
2019-10-15 15:39:18,489 INFO mapreduce.Job: Job job_1571122204786_0003 running in uber mode : false
2019-10-15 15:39:18,491 INFO mapreduce.Job: map 0% reduce 0%
2019-10-15 15:39:59,257 INFO mapreduce.Job: map 100% reduce 0%
2019-10-15 15:40:05,419 INFO mapreduce.Job: map 100% reduce 100%
2019-10-15 15:40:05,429 INFO mapreduce.Job: Job job_1571122204786_0003 completed successfully
2019-10-15 15:40:05,526 INFO mapreduce.Job: Counters: 54
Inspect the output files:
[root@centos48 mapreduce]# hdfs dfs -ls /output/wordcount-result
Found 2 items
-rw-r--r-- 2 root supergroup 0 2019-10-15 15:40 /output/wordcount-result/_SUCCESS
-rw-r--r-- 2 root supergroup 4565 2019-10-15 15:40 /output/wordcount-result/part-r-00000
[root@centos48 mapreduce]# hdfs dfs -cat /output/wordcount-result/part-r-00000
!(*this 1
"AS 1
"StringPiece" 1
"as_string().c_str()" 1
"const 2
"string" 1
"string". 2
"this" 1
"x" 1
#define 4
#elif 1
#endif 3
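The part-r-00000 file holds tab-separated "word&lt;TAB&gt;count" pairs. To list the most frequent words you can sort on the numeric second field. The pipeline below simulates this with three lines taken from the output above; on the cluster you would feed it hdfs dfs -cat /output/wordcount-result/part-r-00000 instead of printf:

```shell
# sort wordcount output by descending count; printf stands in for
# `hdfs dfs -cat /output/wordcount-result/part-r-00000` so this sketch
# runs without a cluster
printf '#define\t4\n#endif\t3\n"AS\t1\n' \
  | sort -t "$(printf '\t')" -k2,2nr \
  | head -3
```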