Hadoop Distributed Environment Setup

1. Prepare four machines, all running Ubuntu 12.04 LTS, with the roles assigned as follows:

ip               hostname    role
192.168.0.101    bg01        secondarynamenode, jobtracker, namenode
192.168.0.102    bg02        tasktracker, datanode
192.168.0.103    bg03        tasktracker, datanode
192.168.0.104    bg04        tasktracker, datanode
2. PuTTY into each remote machine and edit the hosts file:

    sudo vim /etc/hosts

127.0.0.1        localhost
192.168.0.101    bg01
192.168.0.102    bg02
192.168.0.103    bg03
192.168.0.104    bg04

Then run sudo vim /etc/hostname on each machine and change the hostname to that machine's own name, e.g. bg01 on the first machine, and so on.
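A quick sanity check, assuming the /etc/hosts entries above are in place on every machine:

hostname          # should print this machine's own name, e.g. bg01
ping -c 1 bg02    # each hostname should resolve to its 192.168.0.x address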

3. ssh into each machine and create the working directory:

sudo mkdir /usr/local/bg

sudo chmod 777 -R /usr/local/bg

4. Install the JDK. The details are omitted here since it is straightforward. (It must be installed on every machine, because each node runs its own Java daemons, which you will later check with jps.)

Download the JDK tarball and extract it into the bg directory created above.

Configure the JDK environment variables. There are many ways to do this; here is one of them:

sudo vim /etc/profile

# set java environment
export PATH=/usr/local/bin:$PATH
export JAVA_HOME=/usr/local/bg/jdk1.7.0_60
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$PATH

Make it take effect immediately: source /etc/profile
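To verify the JDK is picked up (paths as configured above):

source /etc/profile
echo $JAVA_HOME    # should print /usr/local/bg/jdk1.7.0_60
java -version      # should report version 1.7.0_60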

5. Configure passwordless ssh login:

For the details, see http://www.cnblogs.com/jdksummer/articles/2521550.html

Make sure each of the four machines can ssh into every other one without a password. In essence, append the public key generated on each of the four machines to authorized_keys on every machine, so that every machine holds the keys of all the others; a sketch follows below.
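A minimal sketch of the key exchange, assuming the default RSA key locations (the linked article covers the full walkthrough):

# on each of bg01..bg04, generate a key pair with an empty passphrase
ssh-keygen -t rsa -P ""
# from each machine, append its public key to authorized_keys on the others, e.g. from bg01:
ssh-copy-id bg02    # or: cat ~/.ssh/id_rsa.pub | ssh bg02 'cat >> ~/.ssh/authorized_keys'
ssh-copy-id bg03
ssh-copy-id bg04
ssh bg02            # should now log in without asking for a password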

6. Install Hadoop

Download hadoop-1.2.1.tar.gz

Extract it into the bg directory mentioned above.

Update the environment variables:

sudo vim /etc/profile

# set hadoop environment
export HADOOP_HOME=/usr/local/bg/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin

source /etc/profile
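To verify Hadoop is on the PATH (version as assumed above):

hadoop version    # should report Hadoop 1.2.1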

Edit hadoop-env.sh under Hadoop's conf directory:

export JAVA_HOME=/usr/local/bg/jdk1.7.0_60

Create the data storage directory (the HDFS name/data directories configured below will live under it):

sudo mkdir -p /usr/local/bg/storage/hadoop

sudo chmod 777 -R /usr/local/bg/storage
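The same directory is needed on the other machines too, since the configuration below points to it on every node. Because /usr/local/bg was made world-writable in step 3 and passwordless ssh is already set up, one way to create it remotely:

for host in bg02 bg03 bg04; do
  ssh $host 'mkdir -p /usr/local/bg/storage/hadoop'
done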

Edit hdfs-site.xml under conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/bg/storage/hadoop/name</value>
    <description>
      Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
    </description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/bg/storage/hadoop/data</value>
    <description>
      Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
    </description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>bg01:50070</value>
    <description>
      The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>
      If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
    </description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>
      Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>
Edit core-site.xml under conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/bg/storage/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://bg01:9000</value>
    <description>
      The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
    </description>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
    <description>
      The number of seconds between two periodic checkpoints.
    </description>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
    <description>
      The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
    </description>
  </property>
</configuration>
Edit mapred-site.xml under conf:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>bg01:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
</configuration>
Edit conf/masters so that it contains: bg01. This file actually determines which node runs the secondarynamenode.

Edit conf/slaves so that it contains:

bg02
bg03
bg04

This file determines which nodes run the datanode and tasktracker daemons.

Note: the configuration is the same on all four machines, so you can copy this hadoop directory to the corresponding location on the other machines (the Hadoop environment variables in /etc/profile need to be set on all four machines as well). For copying you can use scp: scp -r xxx-src bg02:xxx-dis; a sketch follows below.
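A sketch of the copy step, assuming the directory layout used above and the passwordless ssh from step 5 (run on bg01 after the configuration is finished):

for host in bg02 bg03 bg04; do
  scp -r /usr/local/bg/hadoop-1.2.1 $host:/usr/local/bg/
done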

7. ssh into bg01

Run hadoop namenode -format. Note that you only need to format once; formatting again will wipe all your data and lead to a string of problems such as datanodes failing to start. If you run into those problems, search Baidu/Google.

start-all.sh

jps

You should see the three processes secondarynamenode, jobtracker, and namenode.

ssh into the other machines; there jps should show only the two processes tasktracker and datanode (if any are missing, the configuration is probably wrong). A rough picture of the expected output is shown below.
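Roughly what jps shows on each role; the PIDs are only illustrative:

# on bg01
12001 NameNode
12087 SecondaryNameNode
12164 JobTracker
# on bg02/bg03/bg04
4301 DataNode
4388 TaskTracker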

8. You can visit http://bg01:50070 to check the live nodes, and http://bg01:50030 for the jobtracker web UI.

At this point the installation is complete.
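As a final check you can run one of the example jobs that ships with Hadoop 1.2.1 from bg01 (the pi arguments below are arbitrary):

hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar pi 10 100
hadoop fs -ls /    # browse HDFS to confirm the filesystem is up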
