Hadoop Learning Series (Part 1)

Latest recommended article published on 2024-07-20 17:52:45

weixin_33892359

Original link: https://my.oschina.net/odetteisgorgeous/blog/3055648

ENVIRONMENT PREPARATION

standalone installation

Configure the environment variables: open /etc/profile and add

export HADOOP_HOME=/opt/hadoop-3.1.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile

Test:

Standalone example from the official website (run from the Hadoop installation directory, $HADOOP_HOME):

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
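
To make clear what this job computes, here is a rough local equivalent in Python. It is not the Hadoop implementation: it only finds every match of the regular expression in a set of text lines and counts how often each distinct match occurs. The function name count_matches and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter

def count_matches(lines, pattern):
    """Count how often each distinct regex match occurs across all lines."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        counts.update(regex.findall(line))
    return counts

# Hypothetical lines resembling Hadoop XML configuration files.
sample = [
    "<name>dfs.replication</name>",
    "<name>dfs.namenode.name.dir</name>",
    "<description>see dfs.replication</description>",
]

counts = count_matches(sample, r"dfs[a-z.]+")
for match, n in counts.most_common():
    print(n, match)
```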

MapReduce usage

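In the weather example, the input data is processed by the map method, which outputs year and temperature. The reduce method takes the output generated by map as its input and returns the final output. The maps can feed zero, one, or multiple reduces.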
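
The flow described above can be sketched in plain Python, without Hadoop. The record format (a "year,temperature" line), the sample data, and the function names map_record, shuffle, and reduce_max are assumptions for illustration; taking the maximum temperature per year is just one possible reduce step.

```python
from collections import defaultdict

def map_record(line):
    """Map step: turn one raw input line into a (year, temperature) pair."""
    year, temp = line.split(",")
    return year, int(temp)

def shuffle(pairs):
    """Group map output by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for year, temp in pairs:
        grouped[year].append(temp)
    return grouped

def reduce_max(year, temps):
    """Reduce step: collapse all temperatures of one year into a single value."""
    return year, max(temps)

# Hypothetical sample input, not real weather data.
records = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]

mapped = [map_record(line) for line in records]
result = dict(reduce_max(y, t) for y, t in shuffle(mapped).items())
print(result)
```

With zero reduces, the mapped pairs would simply be written out as the final output.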

Write start.sh

hdfs

Why is the default HDFS block size 128 MB?

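So that the time to seek to a block (about 10 ms) is about 1% of the transfer time. At a typical transfer rate of around 100 MB/s, that means a block that takes about 1 s to transfer, i.e. roughly 100 MB, and the 128 MB default is close to that.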
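
The arithmetic behind this rule of thumb can be checked in a few lines. The 10 ms seek time and the 100 MB/s transfer rate come from the text above; the target ratio is a parameter of this sketch. With seek time at 1% of transfer time the block comes out near 100 MB, whereas a 10% ratio would give only about 10 MB.

```python
def block_size_mb(seek_ms, transfer_mb_per_s, seek_fraction):
    """Block size (MB) at which seek time is `seek_fraction` of transfer time."""
    transfer_s = (seek_ms / 1000.0) / seek_fraction
    return transfer_s * transfer_mb_per_s

one_percent = block_size_mb(10, 100, 0.01)
ten_percent = block_size_mb(10, 100, 0.10)
print(round(one_percent), round(ten_percent))
```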

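Benefits of using blocks in HDFS

Data can be distributed across all the machines in the cluster
Simpler data storage: a block stores only data, not file metadata (the metadata lives on the namenode)
Easy replication, which improves fault tolerance and availability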
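
As a rough illustration of these points, the sketch below splits a file into fixed-size blocks and spreads the replicas over several nodes. The round-robin placement, the node names, and the function names are assumptions for illustration only; real HDFS uses a rack-aware placement policy. The block size of 128 MB and the replication factor of 3 are the HDFS defaults.

```python
import math

BLOCK_MB = 128       # default HDFS block size
REPLICATION = 3      # default HDFS replication factor

def split_into_blocks(file_mb, block_mb=BLOCK_MB):
    """Return the size of each block; only the last block may be smaller."""
    count = math.ceil(file_mb / block_mb)
    return [min(block_mb, file_mb - i * block_mb) for i in range(count)]

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Simplified round-robin placement (assumes replication <= len(nodes))."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]   # hypothetical datanodes
blocks = split_into_blocks(300)                # a 300 MB file
placement = place_replicas(len(blocks), nodes)
print(blocks)
print(placement)
```

Because every block has replicas on three different nodes, losing any single node still leaves two copies of each block.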

HDFS high availability

A standby namenode acts as the backup that takes over when the active namenode goes down, so the HDFS service keeps running

How a file is read in HDFS

Writing a file to HDFS

YARN

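YARN runs applications inside containers, and allocates resources such as CPU and memory to them

YARN versus MapReduce 1

MapReduce 1 is designed the same way as lts!

YARN has better stability, higher availability, better resource utilization, and supports multitenancy

Comparison of cluster resource utilization under YARN's different scheduling modes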

FIFO Scheduler

Capacity Scheduler

Example 4-1. A basic configuration file for the Capacity Scheduler
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>
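
To see what these percentages mean for the cluster as a whole, the following sketch reads the capacity properties and multiplies them down the queue hierarchy. It is not part of YARN: the function names and the embedded, condensed copy of the configuration are assumptions for illustration, and the maximum-capacity setting is left out because it only limits how far a queue may grow beyond its share.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity."

# Condensed copy of the capacity properties from Example 4-1.
CONFIG = """<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>prod,dev</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.queues</name><value>eng,science</value></property>
  <property><name>yarn.scheduler.capacity.root.prod.capacity</name><value>40</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.capacity</name><value>60</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.eng.capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.science.capacity</name><value>50</value></property>
</configuration>"""

def load_properties(xml_text):
    """Turn <property><name>/<value> pairs into a plain dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

def absolute_capacities(props, queue="root", share=100.0, out=None):
    """Return each leaf queue's share of the whole cluster, in percent."""
    out = {} if out is None else out
    children = props.get(PREFIX + queue + ".queues")
    if children is None:          # no child queues configured: this is a leaf
        out[queue] = share
        return out
    for child in children.split(","):
        path = queue + "." + child
        pct = float(props[PREFIX + path + ".capacity"])
        absolute_capacities(props, path, share * pct / 100.0, out)
    return out

caps = absolute_capacities(load_properties(CONFIG))
print(caps)
```

Under this reading, prod is guaranteed 40% of the cluster, while eng and science each get half of dev's 60%, that is 30% each.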

Fair Scheduler

Example 4-2. An allocation file for the Fair Scheduler
<?xml version="1.0"?>
<allocations>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
  <queue name="prod">
    <weight>40</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
  <queue name="dev">
    <weight>60</weight>
    <queue name="eng" />
    <queue name="science" />
  </queue>
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="dev.eng" />
  </queuePlacementPolicy>
</allocations>
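
The rules in queuePlacementPolicy are tried in order until one matches. The sketch below mimics that logic for the three rules in this file. It is a simplification, not the Fair Scheduler's code: the function name place_app, the queue names written without the root prefix, and the sample groups are assumptions for illustration.

```python
# Leaf queues defined in the allocation file above (names without "root.").
EXISTING_QUEUES = {"prod", "dev.eng", "dev.science"}

def place_app(requested_queue, primary_group, existing=EXISTING_QUEUES):
    """Apply the rules in order: specified, primaryGroup, then default."""
    # Rule "specified" with create="false": matches only an existing queue.
    if requested_queue is not None and requested_queue in existing:
        return requested_queue
    # Rule "primaryGroup" with create="false": queue named after the user's group.
    if primary_group in existing:
        return primary_group
    # Rule "default": catch-all queue.
    return "dev.eng"

print(place_app("prod", "staff"))      # explicitly requested queue exists
print(place_app(None, "prod"))         # falls through to the primary group
print(place_app("missing", "staff"))   # nothing matches, default queue
```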