Note: to keep the test from hitting unexpected problems, configure every parameter exactly as in the samples; change them only if you understand the principles behind the settings.
Environment and software:
windows7 64-bit
xshell5
vmware12
centOS7
jdk-8u152-linux-x64.rpm
hadoop-2.6.5.tar.gz
Install VMware 12
Create a virtual machine in VMware 12
- VMware configuration
- Virtual Network Editor, configure VMnet8 as follows:
1 Subnet IP: 192.168.2.0
2 Subnet mask: 255.255.255.0
3 NAT settings, gateway IP: 192.168.2.2, subnet mask: 255.255.255.0
- Virtual machine
- Name it master; the path is a Windows path, pick a disk with plenty of free space
- Hardware
1 CPU: 2 processors, 2 cores each
2 Memory: 1GB
3 Disk: 60GB
4 Network: NAT mode, enabled
5 Everything else that can be removed or disabled, disable
- OS installation and configuration
1 Unless noted otherwise, keep the CentOS 7 defaults
2 Network: enable it and give IPv4 a static configuration: IP 192.168.2.11, DNS 8.8.8.8, gateway 192.168.2.2
3 Set the root password; you may also create an ordinary user and password, although it will not be used
Configuration
Install xshell on Windows
Create one ssh session and one sftp session for the master VM; for convenience, log in directly with the root user name and root password
1 Send the files
Connect over sftp
Send the jdk and hadoop files to master (mind the Windows-side file paths)
put jdk-8u152-linux-x64.rpm
put hadoop-2.6.5.tar.gz
You can disconnect the sftp session once the transfer is done
2 Install JDK and HADOOP
Connect over ssh
1 Install the JDK
rpm -ivh jdk-8u152-linux-x64.rpm
2 Install hadoop
tar -xzvf hadoop-2.6.5.tar.gz
mv hadoop-2.6.5 hadoop
mv hadoop /usr/
3 Configure the JAVA and HADOOP environment variables
3.1 Install the vim editor
yum -y install vim #-y answers the confirmation prompts automatically
3.2 Configure the environment variables
vim /etc/profile
#the file already has plenty of content; append the following at the very end
export JAVA_HOME=/usr/java/jdk1.8.0_152
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#press i to enter insert mode; when finished press Esc and type ":wq" to save and quit
source /etc/profile #reload the environment variables
env | grep java #verify the JAVA environment variables
env | grep hadoop #verify the HADOOP environment variables
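The mechanism behind the two steps above (append exports, then source the file) can be sketched in isolation. In the snippet below a throwaway temp file stands in for the real /etc/profile, with the same export lines used in this guide:

```shell
# Sketch: a temp file stands in for /etc/profile; sourcing it exports the variables.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_152
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. "$profile"        # same mechanism as `source /etc/profile`
echo "$JAVA_HOME"   # -> /usr/java/jdk1.8.0_152
echo "$HADOOP_HOME" # -> /usr/hadoop
```

Note that `source` only affects the current shell; new login shells pick up /etc/profile automatically, which is why the reload is only needed once here.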
3.3 Configure the hosts file
vim /etc/hosts
#edit it into the following state
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#comment out the ::1 line
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#add the cluster IPs
192.168.2.11 m
192.168.2.12 n1
192.168.2.13 n2
192.168.2.14 n3
# save and quit when done
3.4 Configure hadoop
#configuration template:
<configuration>
<property>
<name></name>
<value></value>
</property>
</configuration>
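All five files below follow the `<property>`/`<name>`/`<value>` pattern shown in the template. As a quick sanity check of that structure, the name/value pairs can be pulled out of such a file with sed; a sketch on a hypothetical sample file (not one of the real configs):

```shell
# Sketch: extract <name>/<value> pairs from a hadoop-style site file.
cat > /tmp/sample-site.xml <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://m/</value>
</property>
</configuration>
EOF
# print each name and each value on its own line
sed -n 's:.*<name>\(.*\)</name>.*:\1:p; s:.*<value>\(.*\)</value>.*:\1:p' /tmp/sample-site.xml
# -> fs.defaultFS
#    hdfs://m/
```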
#1 core
vim /usr/hadoop/etc/hadoop/core-site.xml
#sample configuration
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://m/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/data/tmp</value>
</property>
</configuration>
#2 hdfs
vim /usr/hadoop/etc/hadoop/hdfs-site.xml
#sample configuration
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hadoop/data/nd</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hadoop/data/dd</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>n1:50090</value>
</property>
</configuration>
#3 mapred
vim /usr/hadoop/etc/hadoop/mapred-site.xml
#sample configuration
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
#4 yarn
vim /usr/hadoop/etc/hadoop/yarn-site.xml
#sample configuration
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>m</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
</configuration>
#5 slaves
vim /usr/hadoop/etc/hadoop/slaves
#sample configuration
n1
n2
n3
#6 hadoop-env.sh
vim /usr/hadoop/etc/hadoop/hadoop-env.sh
#change the JAVA_HOME line in it to the absolute path
export JAVA_HOME=/usr/java/jdk1.8.0_152
#7 Create a shell script in the bin directory that initializes the data directories
vim /usr/hadoop/bin/myini
#--------------------script content below↓
#!/bin/bash
#reset the hadoop data and log directories
rm -rf /usr/hadoop/data
rm -rf /usr/hadoop/logs
mkdir -p /usr/hadoop/data/tmp
mkdir -p /usr/hadoop/data/nd
mkdir -p /usr/hadoop/data/dd
#--------------------script content above↑
chmod +x /usr/hadoop/bin/myini #make it executable so it can be invoked by name later
3.5 Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
4 Clone the virtual machine
Shut master down
Make three full clones of the VM, named node1, node2 and node3
Start each one in turn and set its IP address
vim /etc/sysconfig/network-scripts/ifcfg-ens33
#on node1 change IPADDR to 192.168.2.12
#on node2 change IPADDR to 192.168.2.13
#on node3 change IPADDR to 192.168.2.14
#save, quit, and restart the network
service network restart
5 Set up passwordless login
Use xshell to create sessions for n1, n2 and n3 (all as root), connect to all 4 VMs at once, and switch on xshell's send-to-all-sessions mode
Run the following on every machine in turn:
ssh-keygen -t rsa
#press Enter at each prompt until the key pair is generated (a randomart image is printed)
ssh-copy-id m
#type yes, then enter the root password
ssh-copy-id n1
#type yes, then enter the root password
ssh-copy-id n2
#type yes, then enter the root password
ssh-copy-id n3
#type yes, then enter the root password
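The repeated Enter-pressing can be avoided: ssh-keygen accepts the passphrase and output file on the command line. A sketch that generates a throwaway key pair in a temp directory (so it does not overwrite anything under ~/.ssh); this assumes the OpenSSH client tools are installed, which CentOS 7 ships by default:

```shell
# Sketch: non-interactive key generation into a temp dir.
# -N '' = empty passphrase, -q = no prompts or randomart, -f = output path.
tmp=$(mktemp -d)
ssh-keygen -t rsa -N '' -q -f "$tmp/id_rsa"
ls "$tmp"   # -> id_rsa  id_rsa.pub
```

On the real machines you would drop -f (keeping the default ~/.ssh/id_rsa) and still run ssh-copy-id exactly as above.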
Test hadoop
1 Reboot all virtual machines
#on all machines
reboot
2 Format HDFS
#on all machines
myini
#the commands below run on master only
hdfs namenode -format
3 Start hadoop
start-dfs.sh
start-yarn.sh
#after startup, `jps` on master should list NameNode and ResourceManager, each node should list DataNode and NodeManager, and n1 additionally SecondaryNameNode
4 Access hadoop and yarn from a browser on Windows
http://192.168.2.11:50070
http://192.168.2.11:8088
5 Test mapreduce
# prepare a test input file
mkdir input
vim input/a
# type a few words in vim, for example:
a
b c
a b c
# upload the test input to HDFS
hadoop fs -put input /
# run the test
hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount hdfs:///input hdfs:///output
Output like the following means the test succeeded:
18/01/03 02:31:51 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/01/03 02:31:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/01/03 02:31:51 INFO input.FileInputFormat: Total input paths to process : 1
18/01/03 02:31:51 INFO mapreduce.JobSubmitter: number of splits:1
18/01/03 02:31:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local861712628_0001
18/01/03 02:31:52 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/01/03 02:31:52 INFO mapreduce.Job: Running job: job_local861712628_0001
18/01/03 02:31:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/01/03 02:31:52 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/01/03 02:31:52 INFO mapred.LocalJobRunner: Waiting for map tasks
18/01/03 02:31:52 INFO mapred.LocalJobRunner: Starting task: attempt_local861712628_0001_m_000000_0
18/01/03 02:31:52 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/01/03 02:31:52 INFO mapred.MapTask: Processing split: hdfs://m/input/a:0+13
18/01/03 02:31:53 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/01/03 02:31:53 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/01/03 02:31:53 INFO mapred.MapTask: soft limit at 83886080
18/01/03 02:31:53 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/01/03 02:31:53 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/01/03 02:31:53 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/01/03 02:31:53 INFO mapred.LocalJobRunner:
18/01/03 02:31:53 INFO mapred.MapTask: Starting flush of map output
18/01/03 02:31:53 INFO mapred.MapTask: Spilling map output
18/01/03 02:31:53 INFO mapred.MapTask: bufstart = 0; bufend = 36; bufvoid = 104857600
18/01/03 02:31:53 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
18/01/03 02:31:53 INFO mapred.MapTask: Finished spill 0
18/01/03 02:31:53 INFO mapred.Task: Task:attempt_local861712628_0001_m_000000_0 is done. And is in the process of committing
18/01/03 02:31:53 INFO mapreduce.Job: Job job_local861712628_0001 running in uber mode : false
18/01/03 02:31:53 INFO mapreduce.Job: map 0% reduce 0%
18/01/03 02:31:53 INFO mapred.LocalJobRunner: map
18/01/03 02:31:53 INFO mapred.Task: Task 'attempt_local861712628_0001_m_000000_0' done.
18/01/03 02:31:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local861712628_0001_m_000000_0
18/01/03 02:31:53 INFO mapred.LocalJobRunner: map task executor complete.
18/01/03 02:31:53 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/01/03 02:31:53 INFO mapred.LocalJobRunner: Starting task: attempt_local861712628_0001_r_000000_0
18/01/03 02:31:53 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/01/03 02:31:53 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@511c375c
18/01/03 02:31:53 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/01/03 02:31:53 INFO reduce.EventFetcher: attempt_local861712628_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/01/03 02:31:53 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local861712628_0001_m_000000_0 decomp: 26 len: 30 to MEMORY
18/01/03 02:31:53 INFO reduce.InMemoryMapOutput: Read 26 bytes from map-output for attempt_local861712628_0001_m_000000_0
18/01/03 02:31:53 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 26, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->26
18/01/03 02:31:53 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/01/03 02:31:53 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/03 02:31:53 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/01/03 02:31:53 INFO mapred.Merger: Merging 1 sorted segments
18/01/03 02:31:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 22 bytes
18/01/03 02:31:53 INFO reduce.MergeManagerImpl: Merged 1 segments, 26 bytes to disk to satisfy reduce memory limit
18/01/03 02:31:53 INFO reduce.MergeManagerImpl: Merging 1 files, 30 bytes from disk
18/01/03 02:31:53 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/01/03 02:31:53 INFO mapred.Merger: Merging 1 sorted segments
18/01/03 02:31:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 22 bytes
18/01/03 02:31:53 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/03 02:31:54 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/01/03 02:31:54 INFO mapred.Task: Task:attempt_local861712628_0001_r_000000_0 is done. And is in the process of committing
18/01/03 02:31:54 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/03 02:31:54 INFO mapred.Task: Task attempt_local861712628_0001_r_000000_0 is allowed to commit now
18/01/03 02:31:54 INFO output.FileOutputCommitter: Saved output of task 'attempt_local861712628_0001_r_000000_0' to hdfs://m/output/_temporary/0/task_local861712628_0001_r_000000
18/01/03 02:31:54 INFO mapred.LocalJobRunner: reduce > reduce
18/01/03 02:31:54 INFO mapred.Task: Task 'attempt_local861712628_0001_r_000000_0' done.
18/01/03 02:31:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local861712628_0001_r_000000_0
18/01/03 02:31:54 INFO mapred.LocalJobRunner: reduce task executor complete.
18/01/03 02:31:54 INFO mapreduce.Job: map 100% reduce 100%
18/01/03 02:31:54 INFO mapreduce.Job: Job job_local861712628_0001 completed successfully
18/01/03 02:31:54 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=585780
FILE: Number of bytes written=1103682
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=26
HDFS: Number of bytes written=12
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=36
Map output materialized bytes=30
Input split bytes=81
Combine input records=6
Combine output records=3
Reduce input groups=3
Reduce shuffle bytes=30
Reduce input records=3
Reduce output records=3
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=61
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=242360320
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=13
File Output Format Counters
Bytes Written=12
6 View the output file
hadoop fs -cat /output/part-r-00000
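For the three input lines used above, wordcount should count each word twice. A plain shell pipeline can preview the expected part-r-00000 content locally (this only simulates the result; it does not read HDFS):

```shell
# Simulate wordcount on the sample input: one "word<TAB>count" line per word.
printf 'a\nb c\na b c\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}'
# -> a	2
#    b	2
#    c	2
```

This matches the counters in the log above (Map output records=6, Reduce output records=3).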