Hadoop 2.2 Fully Distributed Setup -- Preparation

Environment: CentOS 6.4, 64-bit

1. Disable SELinux and stop the firewall (required on all nodes)

  Disable SELinux:

$vim /etc/sysconfig/selinux
Set the SELINUX property to: SELINUX=disabled

  Stop the firewall:

$service iptables stop  #stop the firewall now; this does not survive a reboot
$chkconfig iptables off  #disable the firewall at boot, i.e. permanently
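
Note that the edit to /etc/sysconfig/selinux only takes effect after a reboot; to turn SELinux off for the running session as well, and to verify both settings:

$setenforce 0  #switch SELinux to permissive immediately, no reboot needed
$getenforce  #should print Permissive (or Disabled after a reboot)
$service iptables status  #should report that the firewall is not running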

2. Change the hostname (required on all nodes)

$vim /etc/sysconfig/network  #changes to this file take effect only after a reboot
$hostname master  #set the hostname for the current session, no reboot needed
In /etc/sysconfig/network, set the HOSTNAME property to the desired name: HOSTNAME=master
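
A quick check that both the persistent and the running hostname are set (shown here for the master node; use slave1, slave2, ... on the other machines):

$grep HOSTNAME /etc/sysconfig/network  #should print HOSTNAME=master
$hostname  #should print master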

3. Edit /etc/hosts (this file must be distributed and kept identical on all nodes)

$vim /etc/hosts
Each line holds one entry, in the format [IP address <space or tab> hostname], e.g.:
192.168.1.1 master
192.168.1.2 slave1
192.168.1.3 slave2
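
Once the file is correct on master, pushing it out keeps all nodes consistent; a sketch using the hostnames above (scp still prompts for a password here, since key-based login is only set up in step 4):

$scp /etc/hosts root@slave1:/etc/hosts
$scp /etc/hosts root@slave2:/etc/hosts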

4. SSH (required on all nodes)

$vim /etc/ssh/sshd_config  #edit the sshd config
Remove the leading # from the line: #AuthorizedKeysFile      .ssh/authorized_keys
$service sshd restart  #restart the ssh service
$ssh-keygen  #generate the public/private key pair ssh needs; on a first run, just press Enter at every prompt; if it has been run before, the session looks like this:

    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):  #press Enter
    /root/.ssh/id_rsa already exists.  #this line appears only if the command was run before
    Overwrite (y/n)?  #type y and press Enter
    Enter passphrase (empty for no passphrase):  #press Enter
    Enter same passphrase again:  #press Enter
    The output ends with something like:

    The key's randomart image is:
    +--[ RSA 2048]----+
    |          o.     |
    |         . + .   |
    |        E * *    |
    |       . B = +   |
    |        S = O .  |
    |         . * B   |
    |            *    |
    |                 |
    |                 |
    +-----------------+

$cd ~/.ssh/

$ll

total 16
-rw-r--r-- 1 root root  395 Apr 27 05:36 authorized_keys  #collection of public keys allowed to log in without a password
-rw------- 1 root root 1675 Apr 28 20:43 id_rsa  #private key
-rw-r--r-- 1 root root  395 Apr 28 20:43 id_rsa.pub  #public key
-rw-r--r-- 1 root root 1579 Apr 27 05:37 known_hosts  #fingerprints of known hosts; if a machine is reinstalled, its fingerprint changes and ssh will refuse to connect -- the stale entry must be deleted from this file!
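
If ssh does refuse to connect after a host was reinstalled, the stale fingerprint can be removed without editing known_hosts by hand, e.g.:

$ssh-keygen -R slave1  #delete the old entry for slave1 from known_hosts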

$cat id_rsa.pub  >> authorized_keys  #authorize this node's own public key

#Copy the other machines' id_rsa.pub files to this node with scp, then append them to authorized_keys, e.g.:

#log in to the slave1 machine

After running ssh-keygen there to generate its key pair:

$scp  ~/.ssh/id_rsa.pub root@master:~/.ssh/id_rsa.slave1.pub

Back on the master host:

$cat ~/.ssh/id_rsa.slave1.pub  >> ~/.ssh/authorized_keys

Repeat this on every other machine, so that every node's id_rsa.pub ends up appended to master's authorized_keys.
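
For passwordless login to work in every direction, the collected authorized_keys must also be pushed back to each node, and sshd ignores the file if its permissions are too open. A sketch, run on master with the hostnames from step 3:

$chmod 600 ~/.ssh/authorized_keys
$scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
$scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
$ssh slave1 hostname  #should print slave1 without prompting for a password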

5. Unpack Hadoop

$tar -xzvf  hadoop-2.2.0.tar.gz
$cd hadoop-2.2.0/etc/hadoop  #enter Hadoop's configuration file directory
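
Optional but convenient: exporting HADOOP_HOME makes the hdfs and start-*.sh commands in the later steps runnable from any directory. A sketch that assumes the tarball was unpacked to /usr/local/hadoop-2.2.0 (adjust to your actual path):

$vim /etc/profile  #append the two lines below, then run: source /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin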

6. Edit hadoop-env.sh (this file must be distributed and kept identical on all nodes)

$vim hadoop-env.sh
Change:
export JAVA_HOME=${JAVA_HOME} so that it points at the actual JDK directory,
e.g.: export JAVA_HOME=/usr/java/jdk1.6.0_31

7. Edit core-site.xml (this file must be distributed and kept identical on all nodes)

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020/</value>
        <description>The name of the default file system. A URI whose
            scheme and authority determine the FileSystem implementation. The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class. The uri's authority is used to
            determine the host, port, etc. for a filesystem.
        </description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
        <description>The size of buffer for use in sequence files.
            The size of this buffer should probably be a multiple of hardware
            page size (4096 on Intel x86), and it determines how much data is
            buffered during read and write operations.
        </description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoopneed/</value>
        <description>A base for other temporary directories.</description>
    </property>
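
Since hadoop.tmp.dir above points at /usr/local/hadoopneed/, and the name/data directories in the next step resolve underneath it via ${hadoop.tmp.dir}, create the directory on every node before the first start:

$mkdir -p /usr/local/hadoopneed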
8. Edit hdfs-site.xml (this file must be distributed and kept identical on all nodes)

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/name</value>
        <description>Determines where on the local filesystem the DFS name
            node
            should store the name table(fsimage). If this is a comma-delimited list
            of directories then the name table is replicated in all of the
            directories, for redundancy.
        </description>
    </property>
    <property>
        <name>dfs.namenode.hosts</name>
        <value></value>
        <description>If necessary, use these files to control the list of
            allowable datanodes. </description>
    </property>
    <property>
        <name>dfs.namenode.hosts.exclude</name>
        <value></value>
        <description>If necessary, use these files to control the list of
            excluded datanodes. </description>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
        <description>The number of server threads for the namenode.
        </description>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/data</value>
        <description>Determines where on the local filesystem an DFS data node
            should store its blocks. If this is a comma-delimited
            list of directories, then data will be stored in all named
            directories, typically on different devices.
            Directories that do not exist are ignored.
        </description>
    </property>

9. Edit yarn-site.xml (this file must be distributed and kept identical on all nodes)

    <!-- Site specific YARN configuration properties -->
    <property>
        <description>Are acls enabled.</description>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
    <property>
        <description>ACL of who can be admin of the YARN cluster.
        </description>
        <name>yarn.admin.acl</name>
        <value>*</value>
    </property>
    <property>
        <description>Whether to enable log aggregation</description>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>


    <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <!-- In a fully distributed cluster this must be the ResourceManager's real
             hostname; the default 0.0.0.0 only works on a single node. -->
        <value>master</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the
            RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The class to use as the resource scheduler.</description>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    <property>
        <description>The minimum allocation for every container request at the
            RM,
            in MBs. Memory requests lower than this won't take effect,
            and the specified value will get allocated at minimum.
        </description>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <description>The maximum allocation for every container request at the
            RM,
            in MBs. Memory requests higher than this won't take effect,
            and will get capped to this value.
        </description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
    </property>
    <property>
        <description>Path to file with nodes to include.</description>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value></value>
    </property>
    <property>
        <description>Path to file with nodes to exclude.</description>
        <name>yarn.resourcemanager.nodes.exclude-path</name>
        <value></value>
    </property>


    <property>
        <description>Amount of physical memory, in MB, that can be allocated
            for containers.
        </description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <description>Ratio between virtual memory to physical memory when
            setting memory limits for containers. Container allocations are
            expressed in terms of physical memory, and virtual memory usage
            is allowed to exceed this allocation by this ratio.
        </description>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <description>List of directories to store localized files in. An
            application's localized file directory will be found in:
            ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
            Individual containers' work directories, called container_${contid},
            will
            be subdirectories of this.
        </description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nm-local-dir</value>
    </property>
    <property>
        <description>
            Where to store container logs. An application's localized log directory
            will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
            Individual containers' log directories will be below this, in
            directories
            named container_{$contid}. Each container directory will contain the files
            stderr, stdin, and syslog generated by that container.
        </description>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
    </property>
    <property>
        <description>Time in seconds to retain user logs. Only applicable if
            log aggregation is disabled
        </description>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
    </property>
    <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
    <property>
        <description>The remote log dir will be created at
            {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
        </description>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <description>the valid service name should only contain a-zA-Z0-9_ and
            can not start with numbers</description>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>




    <property>
        <description>How long to keep aggregation logs before deleting them.
            -1 disables.
            Be careful set this too small and you will spam the name node.
        </description>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
    </property>
    <property>
        <description>How long to wait between aggregated log retention checks.
            If set to 0 or a negative value then the value is computed as
            one-tenth
            of the aggregated log retention time. Be careful set this too small and
            you will spam the name node.
        </description>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
    </property>

 

10. Edit mapred-site.xml (this file must be distributed and kept identical on all nodes)
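
In the Hadoop 2.2 tarball this file usually ships only as a template; if etc/hadoop/mapred-site.xml does not exist yet, create it first:

$cp mapred-site.xml.template mapred-site.xml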

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>The runtime framework for executing MapReduce jobs.
            Can be one of local, classic or yarn.
        </description>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
        <description>Larger resource limit for maps.</description>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
        <description>Larger heap-size for child jvms of maps.</description>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
        <description>Larger resource limit for reduces.</description>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
        <description>Larger heap-size for child jvms of reduces.</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>100</value>
        <description>The total amount of buffer memory to use while sorting
            files, in megabytes. By default, gives each merge stream 1MB, which
            should minimize seeks.
        </description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
        <description>The number of streams to merge at once while sorting
            files. This determines the number of open file handles.
        </description>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>5</value>
        <description>The default number of parallel transfers run by reduce
            during the copy(shuffle) phase.
        </description>
    </property>


    <!-- jobhistory properties -->

    <property>
        <name>mapreduce.jobhistory.address</name>
        <!-- Use the JobHistory server's real hostname in a distributed cluster so
             that remote clients can reach it; 0.0.0.0 only resolves locally. -->
        <value>master:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port
        </description>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
        <description>Directory where history files are written by MapReduce jobs.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
        <description>Directory where history files are managed by the MR JobHistory Server.</description>
    </property>

 

11. Start HDFS
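
A minimal start sequence, as a sketch. It assumes the configured hadoop-2.2.0 directory has been copied to the same path on all nodes (e.g. with scp -r), that etc/hadoop/slaves on master lists slave1 and slave2 one per line (so start-dfs.sh knows where to start DataNodes), and that the commands are run from the hadoop-2.2.0 directory on master:

$bin/hdfs namenode -format  #first start only; reformatting destroys existing HDFS metadata
$sbin/start-dfs.sh  #starts the NameNode here and a DataNode on every host in the slaves file
$jps  #master should show NameNode and SecondaryNameNode; each slave should show DataNode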

12. Start YARN
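
Again a sketch, run from the hadoop-2.2.0 directory on master:

$sbin/start-yarn.sh  #starts the ResourceManager here and a NodeManager on every slave
$jps  #master should now also show ResourceManager; each slave a NodeManager

The ResourceManager web UI at http://master:8088 should then list the registered NodeManagers.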

 

 
