Table of Contents
- 1. Basic information
- 2. Installation steps
- 1) Switch to the hadoop account and extract Hadoop into the target directory with tar -zxvf
- 2) Create the tmpdir directory
- 3) Configure hadoop-env.sh
- 4) Configure mapred-env.sh
- 5) Configure core-site.xml
- 6) Configure hdfs-site.xml
- 7) Configure mapred-site.xml
- 8) Configure yarn-site.xml
- 9) Configure the Hadoop environment variables
- 10) Edit the slaves file
- 11) From test, copy hadoop-2.7.3 to hadoop@test2 and hadoop@test3, set the environment variables as in step 9, and run the commands below
- 12) Format the NameNode (only needed before the first start!), start Hadoop, and start the JobHistory service
- 13) Check the services on each machine: run jps on test, test2, and test3
- Q&A
- Hadoop core components
1. Basic information
- Version: 2.7.3
- Machines: three hosts (test, test2, test3)
- Account: hadoop
- Source path: /opt/software/hadoop-2.7.3.tar.gz
- Target path: /opt/hadoop -> /opt/hadoop-2.7.3
- Dependency: zookeeper
2. Installation steps
1) Switch to the hadoop account and extract Hadoop into the target directory with tar -zxvf:
[root@test opt]# su hadoop
[hadoop@test opt]$ cd /opt/software
[hadoop@test software]$ tar -zxvf hadoop-${version}.tar.gz -C /opt
[hadoop@test software]$ cd /opt
[hadoop@test opt]$ ln -s /opt/hadoop-${version} /opt/hadoop
2) Create the tmpdir directory:
[hadoop@test opt]$ cd /opt/hadoop
[hadoop@test hadoop]$ mkdir -p tmpdir
3) Configure hadoop-env.sh:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ mkdir -p /opt/hadoop/pids
[hadoop@test hadoop]$ vim hadoop-env.sh
Add the following to hadoop-env.sh:
export JAVA_HOME=/opt/java
export HADOOP_PID_DIR=/opt/hadoop/pids
4) Configure mapred-env.sh:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-env.sh
Add the following to mapred-env.sh:
export JAVA_HOME=/opt/java
5) Configure core-site.xml:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim core-site.xml
Add the following to core-site.xml:
<configuration>
  <property>
    <!-- Temporary working directory for the NameNode and other daemons -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmpdir</value>
  </property>
  <property>
    <!-- HDFS entry point: tells clients which host the NameNode runs on and which port it listens on -->
    <name>fs.defaultFS</name>
    <value>hdfs://test:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
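After saving the file, it is easy to sanity-check a value by reading it back out. Below is a minimal helper of my own (not a Hadoop tool), which assumes the layout used in these files, with each `<name>` and `<value>` pair on consecutive lines:

```shell
# get_prop: read one property value out of a *-site.xml.
# Illustrative sketch only; it relies on <name> and <value> sitting on
# consecutive lines, as in the config files in this guide.
get_prop() {
  sed -n "/<name>$2<\/name>/{n;s:.*<value>\(.*\)</value>.*:\1:p;}" "$1"
}

# Try it on a fragment shaped like the core-site.xml above:
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://test:8020</value>
  </property>
</configuration>
EOF
get_prop /tmp/core-site-sample.xml fs.defaultFS   # prints hdfs://test:8020
```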
6) Configure hdfs-site.xml:
If Ranger is not installed, the following block must be commented out (or removed) from this file:
<property>
  <name>dfs.namenode.inode.attributes.provider.class</name>
  <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim hdfs-site.xml
Add the following to hdfs-site.xml:
<configuration>
  <property>
    <!-- Replication factor; normally at most the number of DataNodes -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>test:50090</value>
  </property>
</configuration>
7) Configure mapred-site.xml:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-site.xml
Add the following to mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>test:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>test:19888</value>
  </property>
</configuration>
8) Configure yarn-site.xml:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim yarn-site.xml
Add the following to yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>test:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>test:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>test:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>test:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>test:8088</value>
  </property>
</configuration>
9) Configure the Hadoop environment variables
[hadoop@test hadoop]$ vim /etc/profile
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
Then run source /etc/profile to make the settings take effect:
[hadoop@test hadoop]$ source /etc/profile
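A quick way to confirm the step worked is to check that $HADOOP_HOME/bin really ended up on PATH. This is a minimal pure-shell sketch; on the real machine you would simply run `hadoop version`:

```shell
# Sanity check after sourcing /etc/profile: is $HADOOP_HOME/bin on PATH?
HADOOP_HOME=/opt/hadoop
PATH=$HADOOP_HOME/bin:$PATH
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok: $HADOOP_HOME/bin" ;;
  *)                      echo "PATH is missing $HADOOP_HOME/bin" ;;
esac   # prints: PATH ok: /opt/hadoop/bin
```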
10) Edit the slaves file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop
[hadoop@test hadoop]$ vim slaves
Add the DataNode hostnames to slaves:
test2
test3
11) From test, copy hadoop-2.7.3 to hadoop@test2 and hadoop@test3, set the environment variables on both machines as in step 9, and create the symlink on each remote host:
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test2:/opt/
[hadoop@test hadoop]$ ssh hadoop@test2 "ln -s /opt/hadoop-${version} /opt/hadoop"
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test3:/opt/
[hadoop@test hadoop]$ ssh hadoop@test3 "ln -s /opt/hadoop-${version} /opt/hadoop"
12) Format the NameNode (only needed before the first start!), start Hadoop, and start the JobHistory service:
# Format the NameNode -- only required before the very first start!
[hadoop@test hadoop]$ hadoop namenode -format
# Start the cluster
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/start-all.sh
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver
start-all.sh simply runs start-dfs.sh and start-yarn.sh, so HDFS and YARN can also be started separately.
Note: if a DataNode fails to start, check whether tmpdir still holds stale data from a previous run; delete that directory on the other two machines as well.
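The cleanup in the note above can be scripted. This is a hedged sketch using the hostnames from this document; the helper only prints the per-host commands (a dry run), so you can inspect them before running them by hand:

```shell
# Build the cleanup commands for wiping stale tmpdir contents on each
# node before re-formatting the NameNode. Dry run: the commands are
# printed, not executed.
cleanup_cmds() {
  for host in "$@"; do
    echo "ssh hadoop@${host} rm -rf /opt/hadoop/tmpdir/*"
  done
}
cleanup_cmds test test2 test3
```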
13) Check the services on each machine: run jps on test, test2, and test3:
[hadoop@test ~]$ jps
24429 Jps
22898 ResourceManager
24383 JobHistoryServer
22722 SecondaryNameNode
22488 NameNode
[hadoop@test2 ~]$ jps
7650 DataNode
7788 NodeManager
8018 Jps
[hadoop@test3 ~]$ jps
28407 Jps
28038 DataNode
28178 NodeManager
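The listings above can also be checked mechanically. Below is a hypothetical helper (not part of Hadoop) that takes a jps listing and the daemon names a host is supposed to run:

```shell
# check_daemons: verify that a jps listing contains every expected
# daemon name. Illustrative helper; daemon names match the listings
# in this guide.
check_daemons() {
  listing=$1; shift
  for d in "$@"; do
    case "$listing" in
      *"$d"*) ;;
      *) echo "missing: $d"; return 1 ;;
    esac
  done
  echo "all expected daemons running"
}

# On a live host you would pass "$(jps)"; here we reuse the sample
# listing from test2 above:
sample="7650 DataNode
7788 NodeManager
8018 Jps"
check_daemons "$sample" DataNode NodeManager   # prints: all expected daemons running
```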
If all three machines show the processes above, the Hadoop cluster is working correctly.
Open the YARN web UI in a browser: http://172.24.5.173:8088
Run a simple MapReduce job to verify the installation:
[hadoop@test mapreduce]$ cd /opt/hadoop/share/hadoop/mapreduce
[hadoop@test mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 4
Number of Maps = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/04/06 09:36:47 INFO client.RMProxy: Connecting to ResourceManager at test/172.24.5.173:8032
17/04/06 09:36:47 INFO input.FileInputFormat: Total input paths to process : 2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: number of splits:2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491470782060_0001
17/04/06 09:36:48 INFO impl.YarnClientImpl: Submitted application application_1491470782060_0001
17/04/06 09:36:48 INFO mapreduce.Job: The url to track the job: http://test:8088/proxy/application_1491470782060_0001/
17/04/06 09:36:48 INFO mapreduce.Job: Running job: job_1491470782060_0001
17/04/06 09:36:56 INFO mapreduce.Job: Job job_1491470782060_0001 running in uber mode : false
17/04/06 09:36:56 INFO mapreduce.Job: map 0% reduce 0%
17/04/06 09:37:00 INFO mapreduce.Job: map 50% reduce 0%
17/04/06 09:37:02 INFO mapreduce.Job: map 100% reduce 0%
17/04/06 09:37:08 INFO mapreduce.Job: map 100% reduce 100%
17/04/06 09:37:08 INFO mapreduce.Job: Job job_1491470782060_0001 completed successfully
17/04/06 09:37:08 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=357588
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=554
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=6118
		Total time spent by all reduces in occupied slots (ms)=4004
		Total time spent by all map tasks (ms)=6118
		Total time spent by all reduce tasks (ms)=4004
		Total vcore-milliseconds taken by all map tasks=6118
		Total vcore-milliseconds taken by all reduce tasks=4004
		Total megabyte-milliseconds taken by all map tasks=6264832
		Total megabyte-milliseconds taken by all reduce tasks=4100096
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=318
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=213
		CPU time spent (ms)=2340
		Physical memory (bytes) snapshot=713646080
		Virtual memory (bytes) snapshot=6332133376
		Total committed heap usage (bytes)=546308096
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236
	File Output Format Counters
		Bytes Written=97
Job Finished in 20.744 seconds
Estimated value of Pi is 3.50000000000000000000
Q&A
Q: stop-all.sh cannot stop the Hadoop cluster?
A: By default the daemons keep their PID files under /tmp, which is cleaned periodically; once the PID files are gone, the stop scripts cannot find the processes. Pointing HADOOP_PID_DIR at a persistent directory (as done in step 3) avoids this.
Q: The NameNode fails to start?
A: The NameNode hostname in core-site.xml (the value of fs.defaultFS) must not contain underscores!
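The second pitfall can be guarded against with a small check. This is an illustrative sketch only; `name_node` below is a made-up hostname used to show the failure case:

```shell
# check_fs_default: warn if the hostname in an fs.defaultFS URI
# contains an underscore (which prevents the NameNode from starting).
check_fs_default() {
  host=${1#hdfs://}   # strip the scheme
  host=${host%%:*}    # strip the port
  case "$host" in
    *_*) echo "bad hostname: $host" ;;
    *)   echo "ok: $host" ;;
  esac
}

check_fs_default hdfs://test:8020        # prints: ok: test
check_fs_default hdfs://name_node:8020   # prints: bad hostname: name_node
```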
Hadoop core components
- NameNode
  - Stores the HDFS metadata.
- NodeManager
  1. Manages the compute resources of a single node.
  2. Communicates with the ResourceManager (the cluster-wide manager) and the ApplicationMaster (the per-application master process).
  3. Manages the container lifecycle and monitors each container's resource usage (memory and CPU), tracks node health, and manages logs.