1. System Initialization
I) Software preparation
1) VMware-workstation-full-8.0.4-744019.exe
(license key: NF64A-DMJ92-TZNQO-9VCXK-AAJJT)
2) ubuntu-11.10-server-i386.iso
3) hadoop-1.0.3.tar.gz (1.0.3)
4) SSH Secure Shell
II) Install VMware
III) Install Red Hat
IV) Install SSH Secure Shell
This lets you log in to the virtual machine from the host.
V) Set the hostname and hosts file
vi /etc/sysconfig/network
Change: HOSTNAME=redhat1
$> hostname redhat1   (takes effect without a reboot)
$> hostname   (shows the new hostname)
vi /etc/hosts
Change:
127.0.0.1 localhost
192.168.229.128 redhat1
#192.168.229.129 redhat2
#192.168.229.130 redhat3
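The host-name mapping above can be staged with a short script before editing /etc/hosts by hand. A minimal sketch, writing to a temporary file (/tmp/hosts.demo is a made-up path) rather than the real /etc/hosts:

```shell
#!/bin/sh
# Sketch: stage the cluster's /etc/hosts entries in a temp file.
# /tmp/hosts.demo is a placeholder; the real target is /etc/hosts (as root).
HOSTS_FILE=/tmp/hosts.demo
cat > "$HOSTS_FILE" <<'EOF'
127.0.0.1 localhost
192.168.229.128 redhat1
#192.168.229.129 redhat2
#192.168.229.130 redhat3
EOF
echo "staged $(grep -cv '^#' "$HOSTS_FILE") active entries in $HOSTS_FILE"
```

Appending the same lines to /etc/hosts on every node gives each machine a consistent name-to-IP mapping.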
VI) Install SSH and set up passwordless SSH
1) sudo apt-get install ssh   (Ubuntu)
   yum install ssh   (CentOS)
2) Generate a key pair: ssh-keygen -t rsa
Press Enter at every prompt; the files are saved in /root/.ssh.
3) Enter the .ssh directory and run:
cp id_rsa.pub authorized_keys
ssh localhost
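The steps above boil down to making authorized_keys contain your own public key, with permissions sshd accepts. A sketch that simulates the file handling in a scratch directory (the key text is a dummy placeholder, not output of ssh-keygen):

```shell
#!/bin/sh
# Sketch of the passwordless-SSH file setup, simulated under /tmp/sshdemo.
# In real use the directory is /root/.ssh and id_rsa.pub comes from ssh-keygen -t rsa.
SSH_DIR=/tmp/sshdemo
mkdir -p "$SSH_DIR"
echo "ssh-rsa AAAAdummykey root@redhat1" > "$SSH_DIR/id_rsa.pub"   # placeholder, not a real key
cp "$SSH_DIR/id_rsa.pub" "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"   # sshd ignores key files with loose permissions
ls -l "$SSH_DIR/authorized_keys"
```

The chmod steps matter: sshd silently falls back to password login if .ssh or authorized_keys is group- or world-writable.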
VII) Install and configure the JDK
1. ./jdk-6u26-linux-i586.bin
Run java, javac, and java -version; version information should appear.
2. Install vim to make editing files easier: apt-get install vim
(strongly recommended, since plain vi is inconvenient to use)
3. Configure the Java environment:
1) vim /etc/profile   (edit the profile file)
2) Append the following at the end of the file:
export JAVA_HOME=/usr/java/jdk1.6.0_26
export JRE_HOME=/usr/java/jdk1.6.0_26/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
3) When finished, save and quit with :wq
4) source /etc/profile
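The effect of the profile edits can be tried without touching /etc/profile. A sketch that writes the same exports to a temporary file (/tmp/profile.demo, a made-up path) and sources it:

```shell
#!/bin/sh
# Sketch: stage the Java environment exports in a temp profile and source it.
# /usr/java/jdk1.6.0_26 is the install path used in this guide.
PROFILE=/tmp/profile.demo
cat > "$PROFILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.6.0_26
export JRE_HOME=/usr/java/jdk1.6.0_26/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
EOF
. "$PROFILE"
echo "JAVA_HOME=$JAVA_HOME"
```

Once the output looks right, the same lines go at the end of /etc/profile, followed by source /etc/profile.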
VIII) Install Hadoop
1. tar -zvxf hadoop-1.0.3.tar.gz
mv hadoop-1.0.3 hadoop
2. Edit conf/hadoop-env.sh under the Hadoop install directory:
vim conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_26
3. reboot to restart the machine
Run hadoop version; if version information appears, the installation is complete.
2. Three Operating Modes
* Standalone mode
Standalone mode is Hadoop's default. When the Hadoop package is first unpacked, Hadoop knows nothing about the hardware environment and conservatively falls back to a minimal configuration, in which all three XML configuration files are empty.
With empty configuration files, Hadoop runs entirely locally. Because it never needs to talk to other nodes, standalone mode does not use HDFS and does not start any Hadoop daemons (no .sh scripts such as start-all.sh need to run). The mode is mainly for developing and debugging the application logic of MapReduce programs without the extra complexity of interacting with daemons.
With no configuration at all (not even the JRE setting in hadoop-env.sh), running bin/hadoop jar hadoop-examples-x.y.z.jar wordcount input output executes in standalone mode, with local files for both input and output.
[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# cd input
[root@localhost input]# ls
[root@localhost input]# echo "hello lsr" > test1.txt
[root@localhost input]# ls
test1.txt
[root@localhost input]# echo "hello hadoop" > test2.txt
[root@localhost input]# ls
test1.txt test2.txt
[root@localhost input]# cd ../
[root@localhost hadoop]# bin/hadoop jar hadoop-examples-1.0.3.jar wordcount input output
12/10/23 10:34:03 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 10:34:03 INFO input.FileInputFormat: Total input paths to process : 2
12/10/23 10:34:03 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/23 10:34:03 INFO mapred.JobClient: Running job: job_local_0001
12/10/23 10:34:04 INFO util.ProcessTree: setsid exited with exit code 0
12/10/23 10:34:04 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@cd5f8b
12/10/23 10:34:04 INFO mapred.MapTask: io.sort.mb = 100
12/10/23 10:34:04 INFO mapred.JobClient: map 0% reduce 0%
12/10/23 10:34:09 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/23 10:34:09 INFO mapred.MapTask: record buffer = 262144/327680
12/10/23 10:34:09 INFO mapred.MapTask: Starting flush of map output
12/10/23 10:34:09 INFO mapred.MapTask: Finished spill 0
12/10/23 10:34:09 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/23 10:34:10 INFO mapred.LocalJobRunner:
12/10/23 10:34:10 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/10/23 10:34:10 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c4fe76
12/10/23 10:34:10 INFO mapred.MapTask: io.sort.mb = 100
12/10/23 10:34:10 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/23 10:34:10 INFO mapred.MapTask: record buffer = 262144/327680
12/10/23 10:34:10 INFO mapred.MapTask: Starting flush of map output
12/10/23 10:34:10 INFO mapred.MapTask: Finished spill 0
12/10/23 10:34:10 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/10/23 10:34:10 INFO mapred.JobClient: map 100% reduce 0%
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/10/23 10:34:13 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e28b9
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Merger: Merging 2 sorted segments
12/10/23 10:34:13 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 51 bytes
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/23 10:34:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
12/10/23 10:34:16 INFO mapred.LocalJobRunner: reduce > reduce
12/10/23 10:34:16 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/10/23 10:34:16 INFO mapred.JobClient: map 100% reduce 100%
12/10/23 10:34:16 INFO mapred.JobClient: Job complete: job_local_0001
12/10/23 10:34:16 INFO mapred.JobClient: Counters: 20
12/10/23 10:34:16 INFO mapred.JobClient:   File Output Format Counters
12/10/23 10:34:16 INFO mapred.JobClient:     Bytes Written=35
12/10/23 10:34:16 INFO mapred.JobClient:   FileSystemCounters
12/10/23 10:34:16 INFO mapred.JobClient:     FILE_BYTES_READ=428713
12/10/23 10:34:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=528398
12/10/23 10:34:16 INFO mapred.JobClient:   File Input Format Counters
12/10/23 10:34:16 INFO mapred.JobClient:     Bytes Read=23
12/10/23 10:34:16 INFO mapred.JobClient:   Map-Reduce Framework
12/10/23 10:34:16 INFO mapred.JobClient:     Map output materialized bytes=59
12/10/23 10:34:16 INFO mapred.JobClient:     Map input records=2
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/10/23 10:34:16 INFO mapred.JobClient:     Spilled Records=8
12/10/23 10:34:16 INFO mapred.JobClient:     Map output bytes=39
12/10/23 10:34:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=548130816
12/10/23 10:34:16 INFO mapred.JobClient:     CPU time spent (ms)=0
12/10/23 10:34:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=200
12/10/23 10:34:16 INFO mapred.JobClient:     Combine input records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce input records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce input groups=3
12/10/23 10:34:16 INFO mapred.JobClient:     Combine output records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce output records=3
12/10/23 10:34:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/10/23 10:34:16 INFO mapred.JobClient:     Map output records=4
[root@localhost hadoop]# cd output
[root@localhost output]# ls
part-r-00000 _SUCCESS
[root@localhost output]# cat part-r-00000
hadoop 1
hello 2
lsr 1
[root@localhost output]#
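For intuition, the counts in part-r-00000 can be reproduced with plain coreutils over the same two files. A sketch using a temp directory (/tmp/wc.demo, a made-up path), not Hadoop itself:

```shell
#!/bin/sh
# Sketch: what the wordcount example computes, done with a shell pipeline.
DIR=/tmp/wc.demo
mkdir -p "$DIR"
echo "hello lsr" > "$DIR/test1.txt"
echo "hello hadoop" > "$DIR/test2.txt"
# split into one word per line, then count occurrences of each distinct word
cat "$DIR"/*.txt | tr ' ' '\n' | sort | uniq -c | sort -k2
```

The pipeline prints the same counts as part-r-00000 (hadoop 1, hello 2, lsr 1); the map phase corresponds to the word splitting, the shuffle to sort, and the reduce to uniq -c.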
* Pseudo-distributed mode
1) Unpack the archive
tar xzvf hadoop-1.0.3.tar.gz
2) Edit the configuration files
1. hadoop-env.sh
Add:
export JAVA_HOME=/soft/java/jdk1.6.0_13
2. core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ubuntu:9000</value>
</property>
</configuration>
3. hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/hadoopdata/tmp</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/hadoopdata/fs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoop/hadoopdata/fs/data</value>
</property>
</configuration>
4. mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ubuntu:9001</value>
</property>
</configuration>
3) Format the file system
bin/hadoop namenode -format
4) Start the daemons
root@ubuntu:/_nosql/hadoop# bin/start-all.sh
root@ubuntu:/_nosql/hadoop# jps
9117 JobTracker
8927 DataNode
9228 TaskTracker
9266 Jps
9037 SecondaryNameNode
8814 NameNode
5) Verify the installation
http://ubuntu:50030   (MapReduce web UI)
http://ubuntu:50070   (HDFS web UI)
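Beyond the two web pages, the jps listing can also be checked mechanically. A sketch that scans a captured jps output for the five expected daemons (here fed with the sample listing above; on a live node set JPS_OUT="$(jps)"):

```shell
#!/bin/sh
# Sketch: verify all five pseudo-distributed daemons appear in jps output.
# JPS_OUT holds the sample listing from above; on a real node use JPS_OUT="$(jps)".
JPS_OUT="9117 JobTracker
8927 DataNode
9228 TaskTracker
9266 Jps
9037 SecondaryNameNode
8814 NameNode"
missing=0
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  # match the daemon name at end of line, so NameNode does not match SecondaryNameNode
  echo "$JPS_OUT" | grep -q " $d\$" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"
```

If any daemon is missing, the usual first stop is its log file under the Hadoop logs/ directory.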
* Fully distributed (cluster) mode
Node layout:
Node type             Node IP        Node hostname
master node           192.168.40.4   master
slave nodes           192.168.40.5   slave1
                      192.168.40.6   slave2
                      192.168.40.7   slave3
secondaryName node    192.168.40.4
Configuration steps:
I. Install one virtual machine following the System Initialization section of this chapter.
II. Clone three more virtual machines with VMware (the clone function in the manager).
III. Set the hostname and hosts file on each virtual machine
vi /etc/sysconfig/network
Change: HOSTNAME=master
$> hostname master   (takes effect without a reboot)
$> hostname   (shows the new hostname)
vi /etc/hosts
Change:
127.0.0.1 localhost
192.168.40.4 master
192.168.40.5 slave1
192.168.40.6 slave2
192.168.40.7 slave3
IV. Configure and test SSH on all four virtual machines
(1) Generate a key and configure passwordless SSH login (on the master host)
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(2) Copy the authorized_keys file to each slave host
scp authorized_keys slave1:~/.ssh/
scp authorized_keys slave2:~/.ssh/
scp authorized_keys slave3:~/.ssh/
(3) Check that master can log in to the slaves without a password
ssh slave1   (run on master; if login succeeds the setup works; exit returns from slave1 to master)
V. Disable the firewall on all four virtual machines
# stop the firewall (it comes back up after a reboot)
service iptables stop
# start the firewall
service iptables start
# keep the firewall off across reboots
chkconfig iptables off
VI. Configure Hadoop
1. hadoop-env.sh
Add:
export JAVA_HOME=/_work/jdk
2. core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
3. hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/_work/hadoop/hadoopdata/tmp</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/_work/hadoop/hadoopdata/fs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/_work/hadoop/hadoopdata/fs/data</value>
</property>
</configuration>
4. mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
5. masters
6. slaves
slave1
slave2
slave3
VII. Start Hadoop
hadoop namenode -format
start-all.sh, or run start-dfs.sh followed by start-mapred.sh
VIII. Test
On the master node:
[root@master ~]# jps
25429 SecondaryNameNode
25500 JobTracker
25201 NameNode
18474 Jps
On a slave node:
[root@slave1 ~]# jps
4469 TaskTracker
4388 DataNode
29622 Jps
hadoop fs -ls /
hadoop fs -mkdir /newDir
[root@slave1 hadoop-0.20.2]# hadoop jar hadoop-0.20.2-examples.jar pi 4 2
Number of Maps = 4
Samples per Map = 2
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
12/05/20 09:45:19 INFO mapred.FileInputFormat: Total input paths to process : 4
12/05/20 09:45:19 INFO mapred.JobClient: Running job: job_201205190417_0005
12/05/20 09:45:20 INFO mapred.JobClient: map 0% reduce 0%
12/05/20 09:45:30 INFO mapred.JobClient: map 50% reduce 0%
12/05/20 09:45:31 INFO mapred.JobClient: map 100% reduce 0%
12/05/20 09:45:45 INFO mapred.JobClient: map 100% reduce 100%
12/05/20 09:45:47 INFO mapred.JobClient: Job complete: job_201205190417_0005
12/05/20 09:45:47 INFO mapred.JobClient: Counters: 18
12/05/20 09:45:47 INFO mapred.JobClient:   Job Counters
12/05/20 09:45:47 INFO mapred.JobClient:     Launched reduce tasks=1
12/05/20 09:45:47 INFO mapred.JobClient:     Launched map tasks=4
12/05/20 09:45:47 INFO mapred.JobClient:     Data-local map tasks=4
12/05/20 09:45:47 INFO mapred.JobClient:   FileSystemCounters
12/05/20 09:45:47 INFO mapred.JobClient:     FILE_BYTES_READ=94
12/05/20 09:45:47 INFO mapred.JobClient:     HDFS_BYTES_READ=472
12/05/20 09:45:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=334
12/05/20 09:45:47 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/05/20 09:45:47 INFO mapred.JobClient:   Map-Reduce Framework
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce input groups=8
12/05/20 09:45:47 INFO mapred.JobClient:     Combine output records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Map input records=4
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce shuffle bytes=112
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce output records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Spilled Records=16
12/05/20 09:45:47 INFO mapred.JobClient:     Map output bytes=72
12/05/20 09:45:47 INFO mapred.JobClient:     Map input bytes=96
12/05/20 09:45:47 INFO mapred.JobClient:     Combine input records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Map output records=8
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce input records=8
Job Finished in 28.952 seconds
Estimated value of Pi is 3.50000000000000000000
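About the 3.5: the pi example scatters sample points over the unit square and estimates pi as 4 × (points inside the circle) / (total points); with 4 maps × 2 samples = 8 points, an estimate of 3.5 means 7 of the 8 points landed inside. Hadoop's sampler uses a deterministic Halton sequence; a sketch of the same idea with ordinary random points in awk (not Hadoop's exact code):

```shell
#!/bin/sh
# Sketch: Monte Carlo estimate of pi, the same idea as the Hadoop pi example
# (which uses a deterministic Halton sequence instead of rand()).
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    # count points inside the quarter circle of radius 1
    if (x * x + y * y <= 1) inside++
  }
  printf "pi ~ %.4f\n", 4 * inside / n
}'
```

With only 8 samples the estimate is necessarily coarse, which is why the job above printed 3.50 rather than something near 3.14; rerunning it with more maps or more samples per map tightens the result.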
Notes:
Deploying the NameNode and SecondaryNameNode on separate machines:
1. The masters file:
Put the hostname or IP of the machine that should run the SecondaryNameNode in the masters file.
Note: the masters file does not determine the NameNode; it determines the SecondaryNameNode (what determines the NameNode is the fs.default.name parameter in core-site.xml).
2. Add one parameter to hdfs-site.xml:
<property>
<name>dfs.http.address</name>
<value>(NameNode hostname or IP):50070</value>
</property>
3. Add two parameters to core-site.xml (the defaults are usually fine):
fs.checkpoint.period sets how often a checkpoint of the HDFS namespace is taken; the default is one hour.
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
</property>
fs.checkpoint.size sets the edit-log size that triggers a checkpoint; the default is 64 MB.
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
4. Verify
Run jps on the target machine to check that the process has started.
Go to the fs.checkpoint.dir configured in hdfs-site.xml (the {fs.default.dir}/dfs/namesecondary folder), inspect the files with ll, then cd current and run ll again.
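Put together, the settings from points 2 and 3 look like the fragments below (a sketch; "master" stands in for whatever host actually runs the NameNode):

```xml
<!-- hdfs-site.xml: point the SecondaryNameNode at the NameNode's HTTP address -->
<property>
<name>dfs.http.address</name>
<!-- "master" is a placeholder for the NameNode's hostname or IP -->
<value>master:50070</value>
</property>

<!-- core-site.xml: checkpoint interval and size (default values shown) -->
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
```

Each fragment goes inside the existing <configuration> element of its file on the SecondaryNameNode host.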