Hadoop Distributed Environment Setup

1. Prepare four machines, all running Ubuntu 12.04 LTS, with the roles assigned as follows:

ip               hostname    role
192.168.0.101    bg01        secondarynamenode, jobtracker, namenode
192.168.0.102    bg02        tasktracker, datanode
192.168.0.103    bg03        tasktracker, datanode
192.168.0.104    bg04        tasktracker, datanode
2. PuTTY into each remote machine and edit the hosts file:

    sudo vim /etc/hosts

127.0.0.1        localhost
192.168.0.101    bg01
192.168.0.102    bg02
192.168.0.103    bg03
192.168.0.104    bg04

Then run sudo vim /etc/hostname on each machine and change the hostname to that machine's own name, e.g. bg01 on the first machine, and so on.
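A quick sanity check, assuming the /etc/hosts entries above are in place on every machine:

hostname          # should print this machine's own name, e.g. bg01
ping -c 1 bg02    # each hostname should resolve to its 192.168.0.x address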

3. ssh into each machine and create the working directory:

sudo mkdir /usr/local/bg

sudo chmod 777 -R /usr/local/bg

4. Install the JDK. The details are omitted here since it is straightforward. (It must be installed on every machine, because each node runs its own Java daemons, which you will later check with jps.)

Download the JDK tarball and extract it into the bg directory created above.

Configure the JDK environment variables. There are many ways to do this; here is one of them:

sudo vim /etc/profile

# set java environment
export PATH=/usr/local/bin:$PATH
export JAVA_HOME=/usr/local/bg/jdk1.7.0_60
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$PATH

Make it take effect immediately: source /etc/profile
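To verify the JDK is picked up (paths as configured above):

source /etc/profile
echo $JAVA_HOME    # should print /usr/local/bg/jdk1.7.0_60
java -version      # should report version 1.7.0_60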

5. Configure passwordless ssh login:

For the details, see http://www.cnblogs.com/jdksummer/articles/2521550.html

Make sure each of the four machines can ssh into every other one without a password. In essence, append the public key generated on each of the four machines to authorized_keys on every machine, so that every machine holds the keys of all the others; a sketch follows below.
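A minimal sketch of the key exchange, assuming the default RSA key locations (the linked article covers the full walkthrough):

# on each of bg01..bg04, generate a key pair with an empty passphrase
ssh-keygen -t rsa -P ""
# from each machine, append its public key to authorized_keys on the others, e.g. from bg01:
ssh-copy-id bg02    # or: cat ~/.ssh/id_rsa.pub | ssh bg02 'cat >> ~/.ssh/authorized_keys'
ssh-copy-id bg03
ssh-copy-id bg04
ssh bg02            # should now log in without asking for a password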

6. Install Hadoop

Download hadoop-1.2.1.tar.gz

Extract it into the bg directory mentioned above.

Update the environment variables:

sudo vim /etc/profile

# set hadoop environment
export HADOOP_HOME=/usr/local/bg/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin

source /etc/profile
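To verify Hadoop is on the PATH (version as assumed above):

hadoop version    # should report Hadoop 1.2.1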

Edit hadoop-env.sh under Hadoop's conf directory:

export JAVA_HOME=/usr/local/bg/jdk1.7.0_60

Create the data storage directory (the HDFS name/data directories configured below will live under it):

sudo mkdir -p /usr/local/bg/storage/hadoop

sudo chmod 777 -R /usr/local/bg/storage
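The same directory is needed on the other machines too, since the configuration below points to it on every node. Because /usr/local/bg was made world-writable in step 3 and passwordless ssh is already set up, one way to create it remotely:

for host in bg02 bg03 bg04; do
  ssh $host 'mkdir -p /usr/local/bg/storage/hadoop'
done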

Edit hdfs-site.xml under conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/bg/storage/hadoop/name</value>
    <description>
      Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
    </description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/bg/storage/hadoop/data</value>
    <description>
      Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
    </description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>bg01:50070</value>
    <description>
      The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>
      If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
    </description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>
      Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
Edit core-site.xml under conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/bg/storage/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://bg01:9000</value>
    <description>
      The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
    </description>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
    <description>
      The number of seconds between two periodic checkpoints.
    </description>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
    <description>
      The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
    </description>
  </property>
</configuration>
Edit mapred-site.xml under conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>bg01:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
</configuration>
Edit conf/masters so that it contains: bg01. This file actually determines which node runs the secondarynamenode.

Edit conf/slaves so that it contains:

bg02
bg03
bg04

This file determines which nodes run the datanode and tasktracker daemons.

Note: the configuration is the same on all four machines, so you can copy this hadoop directory to the corresponding location on the other machines (the Hadoop environment variables in /etc/profile need to be set on all four machines as well). For copying you can use scp: scp -r xxx-src bg02:xxx-dis; a sketch follows below.
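A sketch of the copy step, assuming the directory layout used above and the passwordless ssh from step 5 (run on bg01 after the configuration is finished):

for host in bg02 bg03 bg04; do
  scp -r /usr/local/bg/hadoop-1.2.1 $host:/usr/local/bg/
done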

7. ssh into bg01

Run hadoop namenode -format. Note that you only need to format once; formatting again will wipe all your data and lead to a string of problems such as datanodes failing to start. If you run into those problems, search Baidu/Google.

start-all.sh

jps

You should see the three processes secondarynamenode, jobtracker, and namenode.

ssh into the other machines; there jps should show only the two processes tasktracker and datanode (if any are missing, the configuration is probably wrong). A rough picture of the expected output is shown below.
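Roughly what jps shows on each role; the PIDs are only illustrative:

# on bg01
12001 NameNode
12087 SecondaryNameNode
12164 JobTracker
# on bg02/bg03/bg04
4301 DataNode
4388 TaskTracker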

8. You can visit http://bg01:50070 to check the live nodes, and http://bg01:50030 for the jobtracker web UI.

At this point the installation is complete.
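As a final check you can run one of the example jobs that ships with Hadoop 1.2.1 from bg01 (the pi arguments below are arbitrary):

hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar pi 10 100
hadoop fs -ls /    # browse HDFS to confirm the filesystem is up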
