Installing and running Hadoop on Ubuntu 12.04

I. Overview

Before walking through the installation, here is how Hadoop defines the roles of the nodes in a cluster.

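Hadoop assigns each host one of two roles from each of three points of view. First, at the most basic level, hosts are either Master or Slave. Second, from the HDFS point of view, hosts are either NameNode or DataNode (in a distributed file system, directory management is critical; whoever manages the directory tree is effectively the master, and the NameNode is that manager). Third, from the MapReduce point of view, hosts are either JobTracker or TaskTracker (a Job is usually split into several Tasks, which makes the relationship between the two easy to see).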

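Hadoop is available as the official Apache release and as the Cloudera distribution, which is a commercial version of Hadoop. This article covers installing the official release.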

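Hadoop can run in three modes: standalone, pseudo-distributed, and fully distributed. At first glance the first two do not show off the advantages of cloud computing, but they make testing and debugging programs much easier, so they are still well worth having.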

The official Hadoop release can be downloaded from: http://www.apache.org/dyn/closer.cgi/hadoop/core/

Download hadoop-2.5.1.tar.gz and extract it. The rest of this article assumes Hadoop is extracted under /home/tang/.

(1) Standalone mode configuration

Standalone mode requires no configuration. In this mode Hadoop runs as a single Java process, which is why it is often used for debugging.

(2) Pseudo-distributed Hadoop configuration

Pseudo-distributed Hadoop can be thought of as a cluster with only one node. That node is both Master and Slave, both NameNode and DataNode, and both JobTracker and TaskTracker.

II. Installing Hadoop

1. Download a stable release package from the Apache Hadoop releases page (http://hadoop.apache.org/releases.html) and extract it.

$ tar xzvf hadoop-2.5.1.tar.gz

2. Install Java and configure the Java environment; see "Installing the JDK on Ubuntu 12.04".

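3. Edit ~/.bashrc to create an environment variable pointing at the Hadoop installation directory and to add the Hadoop executable directories to PATH, then reload the file in the current shell.

export HADOOP_INSTALL=/home/tang/hadoop-2.5.1
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

$ source ~/.bashrc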

4. Run the following command to test the installation.

$ hadoop version   # if the Hadoop version is printed, the installation succeeded
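
If `hadoop version` reports "command not found", the usual cause is that the two directories from step 3 never made it onto PATH. The sketch below checks for them. The helper names `path_contains` and `check_hadoop_path` are made up for this illustration and are not part of Hadoop.

```shell
# path_contains DIR PATH_VALUE
# Succeeds when DIR is one of the colon-separated entries of PATH_VALUE.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_hadoop_path INSTALL_DIR PATH_VALUE
# Hypothetical helper: reports whether INSTALL_DIR/bin and INSTALL_DIR/sbin
# (the two directories added in step 3) appear in PATH_VALUE.
check_hadoop_path() {
    for sub in bin sbin; do
        if path_contains "$1/$sub" "$2"; then
            echo "ok: $1/$sub"
        else
            echo "missing: $1/$sub"
        fi
    done
}
```

In a shell where ~/.bashrc has been reloaded, it could be called as `check_hadoop_path "$HADOOP_INSTALL" "$PATH"`.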

III. Pseudo-distributed configuration

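Configuring pseudo-distributed mode is also simple: go to the etc/hadoop folder under the installation directory and edit the configuration files there.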

1. Edit hadoop-env.sh to specify where the JDK is installed.

export JAVA_HOME=/usr/lib/jvm/jdk
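
The Hadoop scripts look for the Java launcher at $JAVA_HOME/bin/java, so it is worth confirming that the path above really contains one before going further. A minimal sketch follows; `check_java_home` is a name made up for this example, and /usr/lib/jvm/jdk is simply the path used in this article, which may differ on your machine.

```shell
# check_java_home DIR
# Hypothetical helper: verifies that DIR/bin/java exists and is executable.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1/bin/java is executable"
    else
        echo "error: no executable found at $1/bin/java"
        return 1
    fi
}
```

For the setting used here, the call would be `check_java_home /usr/lib/jvm/jdk`.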

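2. Edit core-site.xml to configure the HDFS address and port, as well as Hadoop's tmp folder. (In Hadoop 2.x, fs.default.name still works but is the deprecated name of fs.defaultFS.)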

<configuration>
        <property>
             <name>hadoop.tmp.dir</name>
             <value>file:/home/tang/hadoop-2.5.1/tmp</value>
             <description>Abase for other temporary directories.</description>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

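3. Edit hdfs-site.xml to configure the block replication factor and the local directories where the NameNode and DataNode store their data. With only one DataNode in pseudo-distributed mode, the replication factor is set to 1; a higher value cannot be satisfied and leaves blocks under-replicated.

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>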
        <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:/home/tang/hadoop-2.5.1/tmp/dfs/name</value>
        </property>
        <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:/home/tang/hadoop-2.5.1/tmp/dfs/data</value>
        </property>
</configuration>

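4. Edit mapred-site.xml (create it by copying mapred-site.xml.template, since Hadoop does not read the .template file) to configure the JobTracker address and port. Note that mapred.job.tracker is a Hadoop 1.x property; Hadoop 2.x replaces the JobTracker with YARN, so this setting does not affect the HDFS steps that follow.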

<configuration> 
     <property> 
         <name>mapred.job.tracker</name> 
         <value>localhost:9001</value> 
     </property> 
</configuration>
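
After editing the three files, a quick way to confirm what was actually saved is to read a property back out. The sketch below does this with awk. `get_property` is a name made up for this illustration, and it assumes the simple layout used above, with each <name> and <value> element written on a single line; it is not a general XML parser.

```shell
# get_property FILE NAME
# Hypothetical helper: prints the <value> that follows <name>NAME</name> in FILE.
# Prints nothing when the property is not present.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name, stripped of its tags.
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # First value after the wanted name: strip the tags and stop.
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}
```

For example, `get_property etc/hadoop/core-site.xml fs.default.name` should print hdfs://localhost:9000 for the file shown in step 2. Once Hadoop itself is on PATH, `hdfs getconf -confKey fs.default.name` reports the value Hadoop has actually loaded.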

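5. Configure SSH. Pseudo-distributed mode requires the Hadoop daemons to be running, and the start scripts launch them by connecting to each host over SSH, so SSH must be installed and working first.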

$ sudo apt-get install ssh
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost   # on success, no password is required
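
If `ssh localhost` still asks for a password after these commands, one thing to check is file permissions: with its default StrictModes setting, sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by other users. The sketch below tightens them. `secure_ssh_dir` is a name made up for this example, and 700/600 are the conventional modes rather than something taken from the original steps.

```shell
# secure_ssh_dir DIR
# Hypothetical helper: restricts DIR to its owner (mode 700) and, if present,
# DIR/authorized_keys to owner read/write only (mode 600).
secure_ssh_dir() {
    chmod 700 "$1" || return 1
    if [ -f "$1/authorized_keys" ]; then
        chmod 600 "$1/authorized_keys" || return 1
    fi
}
```

It would be called as `secure_ssh_dir ~/.ssh`. Afterwards, `ssh -o BatchMode=yes localhost true` is a convenient test, because BatchMode makes ssh fail instead of prompting when key authentication does not work.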

IV. Starting Hadoop

Before starting Hadoop for the first time, format its file system, HDFS. From the Hadoop directory, run:

$ hadoop namenode -format

Start HDFS.

$ start-dfs.sh

To verify that HDFS started successfully, open http://localhost:50070 in a browser.
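
Besides the web page, the JDK's `jps` tool lists running Java processes, so it can show whether the three daemons started by start-dfs.sh (NameNode, DataNode, SecondaryNameNode) are alive. The sketch below filters that listing. `missing_daemons` is a name made up for this illustration; it reads `jps` output on standard input so that it can also be tried on saved output.

```shell
# missing_daemons
# Hypothetical helper: reads `jps` output ("<pid> <class name>" per line) on
# stdin and prints the name of each expected HDFS daemon that is not listed.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Compare the whole second field so that SecondaryNameNode
        # is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Typical use is `jps | missing_daemons`; no output means all three daemons are running.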


V. Errors and solutions

1. Errors when running start-dfs.sh

sed: -e expression #1, char 6: unknown option to `s'
have: ssh: Could not resolve hostname have: Name or service not known
loaded: ssh: Could not resolve hostname loaded: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
which: ssh: Could not resolve hostname which: Name or service not known
You: ssh: Could not resolve hostname You: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
might: ssh: Could not resolve hostname might: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
Java: ssh: Could not resolve hostname Java: Name or service not known
disabled: ssh: Could not resolve hostname disabled: Name or service not known
The: ssh: Could not resolve hostname The: Name or service not known
will: ssh: Could not resolve hostname will: Name or service not known
VM: ssh: Could not resolve hostname VM: Name or service not known
try: ssh: Could not resolve hostname try: Name or service not known
the: ssh: Could not resolve hostname the: Name or service not known
fix: ssh: Could not resolve hostname fix: Name or service not known
stack: ssh: Could not resolve hostname stack: Name or service not known
Server: ssh: Could not resolve hostname Server: Name or service not known
guard: ssh: Could not resolve hostname guard: Name or service not known
to: ssh: Could not resolve hostname to: Name or service not known
64-Bit: ssh: Could not resolve hostname 64-Bit: Name or service not known
you: ssh: Could not resolve hostname you: Name or service not known
highly: ssh: Could not resolve hostname highly: Name or service not known
-c: Unknown cipher type 'cd'
fix: ssh: Could not resolve hostname fix: Name or service not known
recommended: ssh: Could not resolve hostname recommended: Name or service not known
that: ssh: Could not resolve hostname that: Name or service not known
now.: ssh: Could not resolve hostname now.: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
the: ssh: Could not resolve hostname the: Name or service not known
link: ssh: Could not resolve hostname link: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
or: ssh: Could not resolve hostname or: Name or service not known
noexecstack'.: ssh: Could not resolve hostname noexecstack'.: Name or service not known
with: ssh: Could not resolve hostname with: Name or service not known
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is fa:8e:e6:aa:3f:f9:f0:0b:48:6e:40:ca:c2:f1:02:21.
Are you sure you want to continue connecting (yes/no)? it: ssh: connect to host it port 22: Connection refused
HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Name or service not known
warning:: ssh: Could not resolve hostname warning:: Name or service not known

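Solution:

The "hostnames" in this output (Java, HotSpot(TM), 64-Bit, Server, VM, warning:, and so on) are the words of a JVM warning about the native library, which the start script has evidently treated as a list of hosts. Exporting the native library directory as an environment variable resolved it: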

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native

Reference: http://www.geedoo.info/tag/hadoop
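
To see how a warning can turn into bogus hostnames, the sketch below imitates the failure under one assumption: that the start script captures the output of a command and loops over it word by word. `print_namenodes` and `split_into_hosts` are stand-ins made up for this example and are not Hadoop code; the warning text is shortened to the words visible in the log above.

```shell
# print_namenodes
# Hypothetical stand-in for a command that should print only "localhost"
# but whose output is preceded by a JVM warning line.
print_namenodes() {
    echo "Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library"
    echo "localhost"
}

# split_into_hosts
# Iterates over the captured output word by word, the way a script looping
# over a host list would, and prints one "host" per line.
split_into_hosts() {
    for host in $(print_namenodes); do
        echo "$host"
    done
}
```

Running `split_into_hosts` prints every word of the warning as a separate "host" before the real one, localhost, which matches the pattern of "Could not resolve hostname" errors shown above.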



2. The NameNode does not start.

root@master:/usr/local/hadoop/hadoop-2.6.0# sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-master.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is dc:f6:ec:22:ca:5c:5d:b2:e1:cb:0f:77:ce:f9:57:07.
Are you sure you want to continue connecting (yes/no)? ^C0.0.0.0: Host key verification failed.
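Solution: disable the firewall with the command below. Note also that in the log above the start was interrupted with ^C at the host-key prompt; answering yes at that prompt lets SSH record the key for 0.0.0.0 and continue.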

$ sudo ufw disable


 
