Hadoop 2.2 Fully Distributed Setup -- Preparation

Environment: CentOS 6.4, 64-bit

1. Disable SELinux and stop the firewall (required on all nodes)

  Disable SELinux:

$vim /etc/sysconfig/selinux
Set the SELINUX property to: SELINUX=disabled

  Stop the firewall:

$service iptables stop  #stop the firewall now; this does not survive a reboot
$chkconfig iptables off  #disable the firewall at boot, i.e. permanently
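
Note that the edit to /etc/sysconfig/selinux only takes effect after a reboot; to turn SELinux off for the running session as well, and to verify both settings:

$setenforce 0  #switch SELinux to permissive immediately, no reboot needed
$getenforce  #should print Permissive (or Disabled after a reboot)
$service iptables status  #should report that the firewall is not running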

2. Change the hostname (required on all nodes)

$vim /etc/sysconfig/network  #changes to this file take effect only after a reboot
$hostname master  #set the hostname for the current session, no reboot needed
In /etc/sysconfig/network, set the HOSTNAME property to the desired name: HOSTNAME=master
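
A quick check that both the persistent and the running hostname are set (shown here for the master node; use slave1, slave2, ... on the other machines):

$grep HOSTNAME /etc/sysconfig/network  #should print HOSTNAME=master
$hostname  #should print master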

3. Edit /etc/hosts (this file must be distributed and kept identical on all nodes)

$vim /etc/hosts
Each line holds one entry, in the format [IP address <space or tab> hostname], e.g.:
192.168.1.1 master
192.168.1.2 slave1
192.168.1.3 slave2
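
Once the file is correct on master, pushing it out keeps all nodes consistent; a sketch using the hostnames above (scp still prompts for a password here, since key-based login is only set up in step 4):

$scp /etc/hosts root@slave1:/etc/hosts
$scp /etc/hosts root@slave2:/etc/hosts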

4. SSH (required on all nodes)

$vim /etc/ssh/sshd_config  #edit the sshd config
Remove the leading # from the line: #AuthorizedKeysFile      .ssh/authorized_keys
$service sshd restart  #restart the ssh service
$ssh-keygen  #generate the public/private key pair ssh needs; on a first run, just press Enter at every prompt; if it has been run before, the session looks like this:

    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):  #press Enter
    /root/.ssh/id_rsa already exists.  #this line appears only if the command was run before
    Overwrite (y/n)?  #type y and press Enter
    Enter passphrase (empty for no passphrase):  #press Enter
    Enter same passphrase again:  #press Enter
    The output ends with something like:

    The key's randomart image is:
    +--[ RSA 2048]----+
    |          o.     |
    |         . + .   |
    |        E * *    |
    |       . B = +   |
    |        S = O .  |
    |         . * B   |
    |            *    |
    |                 |
    |                 |
    +-----------------+

$cd ~/.ssh/

$ll

total 16
-rw-r--r-- 1 root root  395 Apr 27 05:36 authorized_keys  #collection of public keys allowed to log in without a password
-rw------- 1 root root 1675 Apr 28 20:43 id_rsa  #private key
-rw-r--r-- 1 root root  395 Apr 28 20:43 id_rsa.pub  #public key
-rw-r--r-- 1 root root 1579 Apr 27 05:37 known_hosts  #fingerprints of known hosts; if a machine is reinstalled, its fingerprint changes and ssh will refuse to connect -- the stale entry must be deleted from this file!
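
If ssh does refuse to connect after a host was reinstalled, the stale fingerprint can be removed without editing known_hosts by hand, e.g.:

$ssh-keygen -R slave1  #delete the old entry for slave1 from known_hosts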

$cat id_rsa.pub  >> authorized_keys  #authorize this node's own public key

#Copy the other machines' id_rsa.pub files to this node with scp, then append them to authorized_keys, e.g.:

#log in to the slave1 machine

After running ssh-keygen there to generate its key pair:

$scp  ~/.ssh/id_rsa.pub root@master:~/.ssh/id_rsa.slave1.pub

Back on the master host:

$cat ~/.ssh/id_rsa.slave1.pub  >> ~/.ssh/authorized_keys

Repeat this on every other machine, so that every node's id_rsa.pub ends up appended to master's authorized_keys.
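
For passwordless login to work in every direction, the collected authorized_keys must also be pushed back to each node, and sshd ignores the file if its permissions are too open. A sketch, run on master with the hostnames from step 3:

$chmod 600 ~/.ssh/authorized_keys
$scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
$scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
$ssh slave1 hostname  #should print slave1 without prompting for a password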

5. Unpack Hadoop

$tar -xzvf  hadoop-2.2.0.tar.gz
$cd hadoop-2.2.0/etc/hadoop  #enter Hadoop's configuration file directory
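
Optional but convenient: exporting HADOOP_HOME makes the hdfs and start-*.sh commands in the later steps runnable from any directory. A sketch that assumes the tarball was unpacked to /usr/local/hadoop-2.2.0 (adjust to your actual path):

$vim /etc/profile  #append the two lines below, then run: source /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin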

6. Edit hadoop-env.sh (this file must be distributed and kept identical on all nodes)

$vim hadoop-env.sh
Change:
export JAVA_HOME=${JAVA_HOME} so that it points at the actual JDK directory,
e.g.: export JAVA_HOME=/usr/java/jdk1.6.0_31

7. Edit core-site.xml (this file must be distributed and kept identical on all nodes)

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020/</value>
        <description>The name of the default file system. A URI whose
            scheme and authority determine the FileSystem implementation. The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class. The uri's authority is used to
            determine the host, port, etc. for a filesystem.
        </description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
        <description>The size of buffer for use in sequence files.
            The size of this buffer should probably be a multiple of hardware
            page size (4096 on Intel x86), and it determines how much data is
            buffered during read and write operations.
        </description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoopneed/</value>
        <description>A base for other temporary directories.</description>
    </property>
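
Since hadoop.tmp.dir above points at /usr/local/hadoopneed/, and the name/data directories in the next step resolve underneath it via ${hadoop.tmp.dir}, create the directory on every node before the first start:

$mkdir -p /usr/local/hadoopneed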
8. Edit hdfs-site.xml (this file must be distributed and kept identical on all nodes)

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/name</value>
        <description>Determines where on the local filesystem the DFS name
            node
            should store the name table(fsimage). If this is a comma-delimited list
            of directories then the name table is replicated in all of the
            directories, for redundancy.
        </description>
    </property>
    <property>
        <name>dfs.namenode.hosts</name>
        <value></value>
        <description>If necessary, use these files to control the list of
            allowable datanodes. </description>
    </property>
    <property>
        <name>dfs.namenode.hosts.exclude</name>
        <value></value>
        <description>If necessary, use these files to control the list of
            excluded datanodes. </description>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
        <description>The number of server threads for the namenode.
        </description>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/data</value>
        <description>Determines where on the local filesystem an DFS data node
            should store its blocks. If this is a comma-delimited
            list of directories, then data will be stored in all named
            directories, typically on different devices.
            Directories that do not exist are ignored.
        </description>
    </property>

9. Edit yarn-site.xml (this file must be distributed and kept identical on all nodes)

    <!-- Site specific YARN configuration properties -->
    <property>
        <description>Are acls enabled.</description>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
    <property>
        <description>ACL of who can be admin of the YARN cluster.
        </description>
        <name>yarn.admin.acl</name>
        <value>*</value>
    </property>
    <property>
        <description>Whether to enable log aggregation</description>
        <name>yarn.log-aggregation-enable</name>
        <value>false</value>
    </property>


    <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <!-- In a fully distributed cluster this must be the ResourceManager's real
             hostname; the default 0.0.0.0 only works on a single node. -->
        <value>master</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the
            RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The class to use as the resource scheduler.</description>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    <property>
        <description>The minimum allocation for every container request at the
            RM,
            in MBs. Memory requests lower than this won't take effect,
            and the specified value will get allocated at minimum.
        </description>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <description>The maximum allocation for every container request at the
            RM,
            in MBs. Memory requests higher than this won't take effect,
            and will get capped to this value.
        </description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
    </property>
    <property>
        <description>Path to file with nodes to include.</description>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value></value>
    </property>
    <property>
        <description>Path to file with nodes to exclude.</description>
        <name>yarn.resourcemanager.nodes.exclude-path</name>
        <value></value>
    </property>


    <property>
        <description>Amount of physical memory, in MB, that can be allocated
            for containers.
        </description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <description>Ratio between virtual memory to physical memory when
            setting memory limits for containers. Container allocations are
            expressed in terms of physical memory, and virtual memory usage
            is allowed to exceed this allocation by this ratio.
        </description>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <description>List of directories to store localized files in. An
            application's localized file directory will be found in:
            ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
            Individual containers' work directories, called container_${contid},
            will
            be subdirectories of this.
        </description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nm-local-dir</value>
    </property>
    <property>
        <description>
            Where to store container logs. An application's localized log directory
            will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
            Individual containers' log directories will be below this, in
            directories
            named container_{$contid}. Each container directory will contain the files
            stderr, stdin, and syslog generated by that container.
        </description>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
    </property>
    <property>
        <description>Time in seconds to retain user logs. Only applicable if
            log aggregation is disabled
        </description>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
    </property>
    <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
    <property>
        <description>The remote log dir will be created at
            {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
        </description>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <description>the valid service name should only contain a-zA-Z0-9_ and
            can not start with numbers</description>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>




    <property>
        <description>How long to keep aggregation logs before deleting them.
            -1 disables.
            Be careful set this too small and you will spam the name node.
        </description>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
    </property>
    <property>
        <description>How long to wait between aggregated log retention checks.
            If set to 0 or a negative value then the value is computed as
            one-tenth
            of the aggregated log retention time. Be careful set this too small and
            you will spam the name node.
        </description>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
    </property>

 

10. Edit mapred-site.xml (this file must be distributed and kept identical on all nodes)
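
In the Hadoop 2.2 tarball this file usually ships only as a template; if etc/hadoop/mapred-site.xml does not exist yet, create it first:

$cp mapred-site.xml.template mapred-site.xml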

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>The runtime framework for executing MapReduce jobs.
            Can be one of local, classic or yarn.
        </description>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
        <description>Larger resource limit for maps.</description>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
        <description>Larger heap-size for child jvms of maps.</description>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
        <description>Larger resource limit for reduces.</description>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
        <description>Larger heap-size for child jvms of reduces.</description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>100</value>
        <description>The total amount of buffer memory to use while sorting
            files, in megabytes. By default, gives each merge stream 1MB, which
            should minimize seeks.
        </description>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
        <description>The number of streams to merge at once while sorting
            files. This determines the number of open file handles.
        </description>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>5</value>
        <description>The default number of parallel transfers run by reduce
            during the copy(shuffle) phase.
        </description>
    </property>


    <!-- jobhistory properties -->

    <property>
        <name>mapreduce.jobhistory.address</name>
        <!-- Use the JobHistory server's real hostname in a distributed cluster so
             that remote clients can reach it; 0.0.0.0 only resolves locally. -->
        <value>master:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port
        </description>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
        <description>Directory where history files are written by MapReduce jobs.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
        <description>Directory where history files are managed by the MR JobHistory Server.</description>
    </property>

 

11. Start HDFS
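
A minimal start sequence, as a sketch. It assumes the configured hadoop-2.2.0 directory has been copied to the same path on all nodes (e.g. with scp -r), that etc/hadoop/slaves on master lists slave1 and slave2 one per line (so start-dfs.sh knows where to start DataNodes), and that the commands are run from the hadoop-2.2.0 directory on master:

$bin/hdfs namenode -format  #first start only; reformatting destroys existing HDFS metadata
$sbin/start-dfs.sh  #starts the NameNode here and a DataNode on every host in the slaves file
$jps  #master should show NameNode and SecondaryNameNode; each slave should show DataNode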

12. Start YARN
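
Again a sketch, run from the hadoop-2.2.0 directory on master:

$sbin/start-yarn.sh  #starts the ResourceManager here and a NodeManager on every slave
$jps  #master should now also show ResourceManager; each slave a NodeManager

The ResourceManager web UI at http://master:8088 should then list the registered NodeManagers.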

 

 
