First set up passwordless SSH login and a Java 1.8.0 environment; both are covered in earlier posts.
(1) Download the Hadoop 2.6.0 tarball from the official site.
(2) Extract it into /usr/local:
tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local
(3) Put the hadoop directory into the hadoop group (e.g. chown -R hadoop:hadoop /usr/local/hadoop-2.6.0).
(4) Configure the Hadoop runtime environment by editing the profile:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
Save the file and apply it with source /etc/profile.
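The PATH wiring above can be sanity-checked in a throwaway shell before touching /etc/profile — a minimal sketch that re-exports the same variables and confirms Hadoop's bin directory lands on PATH (the /usr/local/hadoop-2.6.0 install path is assumed):

```shell
# Re-create the profile exports and verify the PATH ordering.
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
# The hadoop bin dir should now appear as one PATH entry:
echo "$PATH" | tr ':' '\n' | grep hadoop
```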
(5) Add the Java environment to yarn-env.sh:
export JAVA_HOME=/usr/local/jdk1.8.0_60
(6) Edit the Hadoop configuration files — five in total:
(1) core-site.xml, (2) hdfs-site.xml, (3) mapred-site.xml, (4) yarn-site.xml, (5) slaves
(1) core-site.xml (first create the temp directory: mkdir /usr/local/hadoop-2.6.0/tmp)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop-2.6.0/tmp</value>
</property>
</configuration>
(2) hdfs-site.xml (first create the data directories: mkdir -p /usr/local/hadoop-2.6.0/data/namespace /usr/local/hadoop-2.6.0/data/dataspace)
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-2.6.0/data/namespace</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-2.6.0/data/dataspace</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:50090</value>
<final>true</final>
<description>The secondary namenode http server address and port</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
<final>true</final>
<description>Enable WebHDFS (REST API) in Namenodes and Datanodes</description>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<final>true</final>
<description>Disable permission checking in HDFS</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<final>true</final>
<description>Default block replication</description>
</property>
</configuration>
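The two directories named in dfs.namenode.name.dir and dfs.datanode.data.dir must exist before HDFS is formatted. A sketch of creating them in one command — run here against a temp directory so it works without root; on the cluster HADOOP_DATA would be /usr/local/hadoop-2.6.0/data:

```shell
# On the real cluster: HADOOP_DATA=/usr/local/hadoop-2.6.0/data
HADOOP_DATA=$(mktemp -d)
# -p also creates the intermediate "data" level on the real path.
mkdir -p "$HADOOP_DATA/namespace" "$HADOOP_DATA/dataspace"
ls "$HADOOP_DATA"
```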
(3) mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
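One detail worth noting: the Hadoop 2.x tarball ships only mapred-site.xml.template, so the file has to be copied before it can be edited. A runnable sketch against a scratch conf directory (on the cluster the directory is $HADOOP_HOME/etc/hadoop):

```shell
# A scratch dir stands in for $HADOOP_HOME/etc/hadoop here.
conf=$(mktemp -d)
printf '<configuration>\n</configuration>\n' > "$conf/mapred-site.xml.template"
# The copy step the tarball requires before editing mapred-site.xml:
cp "$conf/mapred-site.xml.template" "$conf/mapred-site.xml"
ls "$conf"
```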
(4) yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node2:8030</value>
<final>true</final>
<description>The address of the scheduler interface</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node2:8031</value>
<final>true</final>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node2:8032</value>
<final>true</final>
<description>The address of the applications manager interface in the RM</description>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node2:8033</value>
<final>true</final>
<description>The address of the RM admin interface</description>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node2:8088</value>
<final>true</final>
<description>The http address of the RM web application</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<final>true</final>
</property>
</configuration>
(5) slaves
localhost
node1
node2
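The slaves file is simply one worker hostname per line, so it can be generated in one step. Sketched here against a scratch directory — on the cluster the target is $HADOOP_HOME/etc/hadoop/slaves:

```shell
conf_dir=$(mktemp -d)   # stands in for $HADOOP_HOME/etc/hadoop
cat > "$conf_dir/slaves" <<'EOF'
localhost
node1
node2
EOF
wc -l < "$conf_dir/slaves"   # → 3
```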
(7) Copy the configured directory to node1 and node2 with scp:
scp -r /usr/local/hadoop-2.6.0 node1:/usr/local/
scp -r /usr/local/hadoop-2.6.0 node2:/usr/local/
(8) On both machines, fix the hadoop directory permissions and add the hadoop user group.
(9) Test whether Hadoop is installed and configured correctly.
First format HDFS on the master: hdfs namenode -format (the older hadoop namenode -format form still works but prints a deprecation warning).
(10) Start HDFS: ./sbin/start-dfs.sh
(11) Start YARN: ./sbin/start-yarn.sh (run it on node2, since yarn-site.xml places the ResourceManager there)
(12) Check the Hadoop daemons on each node with jps.
(13) Check the Hadoop web UIs (NameNode at http://master:50070, ResourceManager at http://node2:8088).
(14) Test wordcount.
Create two text files, file1 and file2, under /home/hadoop, containing hello xiakeann and bye xiakeann respectively.
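Creating the two sample files can be scripted; a sketch using a scratch directory (on the cluster the files go under /home/hadoop):

```shell
work=$(mktemp -d)   # stands in for /home/hadoop
echo "hello xiakeann" > "$work/file1"
echo "bye xiakeann"  > "$work/file2"
cat "$work/file1" "$work/file2"
```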
(15) Create an input directory on HDFS and copy the two files into it:
hadoop fs -mkdir /input
hadoop fs -put /home/hadoop/file1 /input
hadoop fs -put /home/hadoop/file2 /input
(16) Create an output directory on HDFS:
hadoop fs -mkdir /output
(wordcount will create the /output/wordcount result directory itself, and fails if that directory already exists)
(17) Run the Hadoop wordcount example.
Note: the wordcount example lives in hadoop-mapreduce-examples-2.6.0.jar, under share/hadoop/mapreduce inside the hadoop-2.6.0 directory.
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount
The results land in the /output/wordcount/ directory.
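For these two input files the expected counts can be previewed with plain Unix tools, which compute the same word-to-count mapping the MapReduce job produces (a local sketch; Hadoop itself is not needed):

```shell
# One word per line, sort, then count duplicates.
printf 'hello xiakeann\nbye xiakeann\n' | tr ' ' '\n' | sort | uniq -c
# bye and hello appear once each; xiakeann appears twice.
```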
<img src="https://img-blog.csdn.net/20161117201122913?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="wordcount job output" />
(18) View the word counts:
hadoop fs -cat /output/wordcount/part-r-00000
<img src="https://img-blog.csdn.net/20161117201229244?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQv/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="wordcount results" />
There are plenty of articles online about building a Hadoop cluster; keep searching when you hit a problem and it can always be solved in the end. The most tedious issues for me were permissions — in the end I simply ran everything as root rather than keep fixing them. But standing up a cluster is really nothing by itself. The real goal of learning Hadoop is to understand which workloads suit it and to understand its design ideas, so you can apply them flexibly in your own programs.