I have been learning Hadoop development for a while now, moving from the initial basics into actual development work. Below are the problems I ran into along the way; I hope they are useful to others.
Hadoop installation and configuration comes in three modes: standalone, pseudo-distributed, and fully distributed (cluster).
1. First, download a recent Hadoop release from an Apache mirror (http://mirror.bit.edu.cn/apache/hadoop/common/).
2. Unpack it: tar -zxvf hadoop-x.y.z.tar.gz (your Hadoop version) -C /home/chianyu (your target directory).
3. Install Hadoop in single-node pseudo-distributed mode; official guide: http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html
Notes:
Use the same Hadoop installation path on every machine in the cluster; use the same system user name on each machine to make passwordless SSH login easier; and give each host a distinct hostname so the machines can reach one another.
Prerequisites:
sudo apt-get install ssh
sudo apt-get install rsync
ssh localhost
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa   # generate a key pair to enable passwordless access
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost   # should now log in without a password
Next, edit the configuration files:
1./hadoop-1.0.3/conf/core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/chianyu/hadoopdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
2./hadoop-1.0.3/conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
3./hadoop-1.0.3/conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
Once the configuration is done, start Hadoop. Before starting, set up the environment variables: in a terminal run sudo vim /etc/profile and add:
export JAVA_HOME=/home/chianyu/jdk1.6.0_33
export HADOOP_HOME=/home/chianyu/hadoop-1.0.3
export PATH=$HADOOP_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Run source /etc/profile to make the changes take effect.
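One step worth calling out before the first start (not shown above): on Hadoop 1.x, HDFS must be formatted once before the daemons are started, or the NameNode will not come up. A minimal sketch:

```shell
# Format HDFS exactly once, before the very first start-all.sh.
# WARNING: re-formatting wipes any existing HDFS metadata.
hadoop namenode -format
```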
Next, start Hadoop with start-all.sh. After startup, the terminal shows output like this:
starting namenode, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-namenode-chenxiaobian-Vostro-260s.out
localhost: starting datanode, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-datanode-chenxiaobian-Vostro-260s.out
localhost: starting secondarynamenode, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-secondarynamenode-chenxiaobian-Vostro-260s.out
starting jobtracker, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-jobtracker-chenxiaobian-Vostro-260s.out
localhost: starting tasktracker, logging to /home/chianyu/hadoop-1.0.3/libexec/../logs/hadoop-chianyu-tasktracker-chenxiaobian-Vostro-260s.out
Logs and server status can be viewed at:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
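That check can also be scripted; a sketch, assuming curl is installed and the daemons run on localhost:

```shell
# Probe the two web UIs; curl exits non-zero if the port is not listening.
curl -s http://localhost:50070/ >/dev/null && echo "NameNode UI up"
curl -s http://localhost:50030/ >/dev/null && echo "JobTracker UI up"
```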
Use the jps command to check Hadoop's five daemons (the Jps entry is the jps tool itself):
6753 JobTracker
6679 SecondaryNameNode
6983 Jps
6499 DataNode
6333 NameNode
6921 TaskTracker
If all five daemons appear, the pseudo-distributed setup was successful.
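To automate that check, a small shell helper (hypothetical, not part of Hadoop) can scan jps output for the expected daemon names:

```shell
# check_daemons: read `jps` output on stdin and report whether all five
# Hadoop 1.x daemons are present. Returns non-zero if any is missing.
check_daemons() {
  local out missing=0
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    printf '%s\n' "$out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all daemons running"
}
# usage: jps | check_daemons
```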
4. Hadoop cluster configuration:
NameNode (master): 192.168.1.118
1. /hadoop-1.0.3/conf/core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/chianyu/hadoopdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.118:9000</value> <!-- use the IP address directly -->
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>excludes</value> <!-- supports adding/removing DataNode nodes dynamically -->
</property>
2. /hadoop-1.0.3/conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/chianyu/hdfs/nameDir</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/chianyu/hdfs/dataDir</value>
</property>
3. /hadoop-1.0.3/conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>192.168.1.118:9001</value> <!-- use the IP address directly -->
</property>
4. /hadoop-1.0.3/conf/masters: 192.168.1.118
5. /hadoop-1.0.3/conf/slaves: 192.168.1.129
DataNode (slave): 192.168.1.129
Simply copy configuration files 1, 2, and 3 from the NameNode; files 4 and 5 (masters and slaves) are only needed on the master.
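Copying the three files by hand is error-prone; a small sketch using scp (assumes passwordless SSH, the IPs above, and identical install paths on both machines, as noted earlier):

```shell
# Copy the three shared config files from the master to the slave node.
SLAVE=192.168.1.129   # the DataNode's IP, from the configuration above
for f in core-site.xml hdfs-site.xml mapred-site.xml; do
  scp "$HADOOP_HOME/conf/$f" "$SLAVE:$HADOOP_HOME/conf/"
done
```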
Logs and server status can be viewed at:
NameNode - http://host:50070/
JobTracker - http://host:50030/
Use the jps command to check the five Hadoop daemons and which host each runs on:
2051 NameNode            (master)
2238 DataNode            (slave)
2429 SecondaryNameNode   (master)
2514 JobTracker          (master)
2707 TaskTracker         (slave)
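As a final smoke test, one of the example jobs bundled with the release can be run against the cluster (the jar name below matches the 1.0.3 release; adjust it for your version):

```shell
# Estimate pi with 2 map tasks and 10 samples each; exercises both HDFS and MapReduce.
hadoop jar "$HADOOP_HOME/hadoop-examples-1.0.3.jar" pi 2 10

# A simple HDFS round-trip also confirms a DataNode is reachable:
hadoop fs -mkdir /smoketest
hadoop fs -put "$HADOOP_HOME/conf/core-site.xml" /smoketest/
hadoop fs -ls /smoketest
```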