1. Download hadoop-2.6.0.tar.gz
This article uses Hadoop 2.6.
2. Extract the tarball
# tar -zxvf hadoop-2.6.0.tar.gz
3. Configure environment variables
$ vim .bash_profile
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
$ source .bash_profile
Verify the configuration:
$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /home/hadoop/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
4. Standalone Hadoop
No extra configuration is needed for this mode; run the following from the installation directory ($HADOOP_HOME).
$ mkdir input
$ cp ./etc/hadoop/* ./input
Run the bundled grep example, which counts occurrences of strings matching the regular expression 'dfs[a-z.]+' in the input files:
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep ./input ./output 'dfs[a-z.]+'
View the results:
$ cat ./output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.file
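Note that MapReduce will not overwrite an existing output directory; if you want to re-run the example, remove ./output first:
$ rm -r ./output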
5. Pseudo-distributed configuration
1. Configure hadoop-env.sh
Set JAVA_HOME explicitly (adjust the path to your JDK):
export JAVA_HOME=/home/hadoop/java/jdk1.7.0_40
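If you are unsure where your JDK lives, one quick way to find it (a sketch, assuming java is on the PATH and GNU coreutils is available) is to resolve the symlink and drop the trailing /bin/java from the result:
$ readlink -f $(which java)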
2. Configure core-site.xml; the <property> elements below go inside the file's <configuration> element:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop-2.6.0/tmp</value>
</property>
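fs.defaultFS above uses the hostname hadoop, which assumes the machine resolves that name to itself. If it does not, map it in /etc/hosts (192.168.1.100 below is a placeholder; substitute your server's IP), or simply use hdfs://localhost:9000 instead:
# echo '192.168.1.100 hadoop' >> /etc/hosts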
3. Configure hdfs-site.xml, again inside <configuration>:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.6.0/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.6.0/tmp/dfs/data</value>
</property>
4. Format the NameNode
$ hdfs namenode -format
On success the output includes a line ending in "has been successfully formatted".
5. Start HDFS
$ start-dfs.sh
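start-dfs.sh launches each daemon over SSH, including to the local machine, so it will prompt for a password repeatedly unless passwordless SSH is set up. A typical setup for the hadoop user:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys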
6. Check the daemons
$ jps
15051 Jps
10256 SecondaryNameNode
9976 NameNode
10098 DataNode
7. View through a browser (replace ip with the server's address):
http://ip:50070
8. Notes:
1) If formatting fails, delete the directories auto-generated under /home/hadoop/hadoop-2.6.0/tmp before formatting again.
2) If the web UI on port 50070 cannot be reached, stop the firewall: service iptables stop (on systems using firewalld, systemctl stop firewalld).
3) For other problems, check the logs under /home/hadoop/hadoop-2.6.0/logs.
9. Run the example in pseudo-distributed mode
$ hdfs dfs -ls
If this reports "ls: '.': No such file or directory", the user's HDFS home directory does not exist yet (relative HDFS paths resolve to /user/<username>); create it:
$ hdfs dfs -mkdir -p /user/hadoop
Create the input directory:
$ hdfs dfs -mkdir input
Copy the files under etc/hadoop/ into input on HDFS as data for the example:
$ hdfs dfs -put ~/hadoop-2.6.0/etc/hadoop/* input
Do not create the output directory in advance: MapReduce creates it itself, and the job aborts if it already exists.
Run the job:
$ hadoop jar ~/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
View the results:
$ hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
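As in standalone mode, the output directory must not exist when the job starts; remove it before re-running:
$ hdfs dfs -rm -r output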
10. Start YARN
YARN was split out of MapReduce in Hadoop 2 and is responsible for resource management and job scheduling. Once the configuration below is in place, MapReduce jobs are submitted to YARN instead of running locally, so the YARN daemons must be running.
Stop the HDFS daemons first:
$ stop-dfs.sh
Configure mapred-site.xml in etc/hadoop/ (only a template ships with the distribution):
$ mv mapred-site.xml.template mapred-site.xml
$ vim mapred-site.xml
Add inside <configuration>:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Configure yarn-site.xml; the mapreduce_shuffle auxiliary service is what lets reducers fetch map output from the NodeManagers. Add inside <configuration>:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Start the services:
$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver
Verify the daemons:
$ jps
15276 Jps
10256 SecondaryNameNode
10801 JobHistoryServer
9976 NameNode
10098 DataNode
10390 ResourceManager
10482 NodeManager
Visit http://ip:8088 (the ResourceManager web UI) to view job information; the JobHistoryServer web UI is at http://ip:19888 by default.
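To shut everything down when finished:
$ stop-yarn.sh
$ stop-dfs.sh
$ mr-jobhistory-daemon.sh stop historyserver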