005 A brief introduction to Hadoop; installing and using standalone Hadoop; an overview of HDFS; the HDFS file read/write flow; Yarn concepts; the Yarn workflow

Article link: https://blog.csdn.net/C_time/article/details/90379061

From the official site, http://hadoop.apache.org/:
Welcome to Apache™ Hadoop®!
What Is Apache Hadoop?
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
In other words: open-source software (or a platform) for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these modules:

Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

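Installing standalone Hadoop
Drag the Hadoop archive into the shell client to upload it to the home directory.
Extract it into /usr/local.

Change into that directory and check that the archive was extracted.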
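The extract-and-check step can be sketched as follows. This is a minimal sketch, not the exact commands from the original session: the archive name hadoop-2.7.1.tar.gz is an assumption (only the extracted directory name appears in this post), and a scratch directory stands in for the real home directory and /usr/local so that it can be tried without root.

```shell
# Scratch tree standing in for the real filesystem (assumption for the demo).
work="$(mktemp -d)"
mkdir -p "$work/home/hadoop-2.7.1/bin" "$work/usr/local"

# Build a tiny stand-in archive; the real one would be the downloaded Hadoop tarball.
# The name hadoop-2.7.1.tar.gz is assumed, not taken from the post.
tar -czf "$work/home/hadoop-2.7.1.tar.gz" -C "$work/home" hadoop-2.7.1

# Extract into the target directory; -C selects where the files land.
tar -xzf "$work/home/hadoop-2.7.1.tar.gz" -C "$work/usr/local"

# Check that the extracted directory is there.
ls "$work/usr/local"
```

On a real machine the same tar -xzf call would point at the uploaded archive and use -C /usr/local.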

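It is there, so next look inside the hadoop directory.

bin: the binaries directory; everything in it is an executable.

etc: the software configuration directory.
Under etc there is one more level, hadoop.
That directory holds almost nothing but configuration files,
including six or seven that you will use often.

There are also include, lib and libexec.
These are mostly Hadoop's libraries and extension files.

LICENSE, NOTICE and README are the standard files that ship with any software release.

sbin also holds executable commands,
but these are the ones used to start and stop Hadoop services.
Several of them are used often as well.

share contains doc, which holds the documentation and user guides,
and hadoop, which holds example and test jars along with source code.

That is enough about the directory layout for now.

Once the archive is extracted, Hadoop is effectively installed.
One step remains, though:
setting the environment variables.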
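vi /etc/profile
Press Shift+G to jump to the last line,
then press o to open a new line below it and add:
HADOOP_HOME=/usr/local/hadoop-2.7.1/
(The line :cd /usr/local/hadoop-2.7.1/ was only typed so that Tab completion could supply the path for the line above;
delete it once you are done.)
Then append $HADOOP_HOME/bin:$HADOOP_HOME/sbin: to the end of PATH.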
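The new settings do not take effect until you run source /etc/profile (the path must not end with a slash).

For some reason the source command did not work for me, possibly because I typed it with a trailing slash (source /etc/profile/),
but the Hadoop environment variables did take effect.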
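The profile change can be sketched as below. This is a hedged sketch rather than the exact file from the post: the two export lines are what would sit at the end of /etc/profile, but they are written to a scratch file (the variable name profile_snippet is hypothetical) so the example can be tried without root.

```shell
# Scratch file standing in for /etc/profile (assumption for the demo).
profile_snippet="$(mktemp)"

# The lines that would be appended to the end of the profile.
# No space after "$" and no trailing slash on HADOOP_HOME, to avoid "//" in PATH.
cat > "$profile_snippet" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

# Load the settings into the current shell. "." is the portable spelling of "source".
# The argument is a file path; writing it with a trailing slash makes the shell
# look for a directory and the command fails.
. "$profile_snippet"

echo "$HADOOP_HOME"
```

After this, hadoop and the scripts in sbin can be found through PATH in the current shell; new login shells read the profile on their own.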

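Next, run hadoop version.

The output shows that the Java environment variable has not been set yet.
Let's set it:
vi ./etc/hadoop/hadoop-env.sh
Find JAVA_HOME and change it to the directory where Java is installed.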
Save and quit.
Run hadoop version once more.

This time it works.
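The same edit can be made without opening an editor. The sketch below is only an illustration: it works on a scratch copy instead of the real etc/hadoop/hadoop-env.sh, the starting line is modelled on what Hadoop 2.x ships, and the JDK path /usr/local/jdk1.8.0 is a hypothetical placeholder for wherever Java is installed.

```shell
# Scratch stand-in for etc/hadoop/hadoop-env.sh (assumption for the demo).
env_file="$(mktemp)"
printf '%s\n' '# The java implementation to use.' 'export JAVA_HOME=${JAVA_HOME}' > "$env_file"

# Hypothetical JDK location; replace with the real Java directory.
java_dir=/usr/local/jdk1.8.0

# Rewrite the JAVA_HOME line. Writing to a new file and moving it back
# avoids depending on the non-portable "sed -i" flag.
sed "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_dir}|" "$env_file" > "$env_file.new"
mv "$env_file.new" "$env_file"

# Show the resulting line.
grep '^export JAVA_HOME=' "$env_file"
```

Hard-coding the path in hadoop-env.sh means Hadoop's scripts find Java even when JAVA_HOME is not exported in the calling shell.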
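To summarize,
installing standalone Hadoop comes down to extracting the archive, setting HADOOP_HOME and PATH, and setting JAVA_HOME in hadoop-env.sh.

Now for an example.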

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
  $ cat output/*

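These commands are the standalone example from the official site (its jar name carries version 2.9.2; in this installation the jar is hadoop-mapreduce-examples-2.7.1.jar).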
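It creates a directory named input
and copies every .xml file from etc/hadoop/ into it.
The official example then runs grep over those files; here we run wordcount instead, which counts how many times each word occurs across the files.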

There are 8 files here.
The command:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /home/input/ /home/output
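To get a feel for what the wordcount job computes, the same counts can be approximated locally with standard tools. This is a rough sketch, not Hadoop's implementation: it assumes words are split on whitespace only, uses two tiny made-up input files in a scratch directory, and prints word and count separated by a tab, the shape found in part-r-00000.

```shell
# Scratch input directory with two small made-up files.
workdir="$(mktemp -d)"
mkdir "$workdir/input"
printf 'hadoop yarn hdfs\nhdfs hadoop\n' > "$workdir/input/a.txt"
printf 'yarn hadoop\n' > "$workdir/input/b.txt"

# Put one word per line, drop empty lines, sort so equal words are adjacent,
# count each run, then print "word<TAB>count".
counts="$(cat "$workdir"/input/* \
  | tr -s '[:space:]' '\n' \
  | sed '/^$/d' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}')"

printf '%s\n' "$counts"
# prints:
# hadoop	3
# hdfs	2
# yarn	2
```

The sort step plays the role of the shuffle between map and reduce: it brings identical keys together so that they can be counted in one pass.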

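Note that the output directory must not be created beforehand.

Hadoop refuses to run if it already exists, which keeps new results from being mixed with existing content and becoming inaccurate.
Below is what is printed while the job runs; I will not go through it line by line now and will come back to it later.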
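Look at the result in output.
There are two files.
The second one, the empty _SUCCESS marker, indicates that this job succeeded.

Use more /home/output/part-r-00000 to see the detailed results.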
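The two checks around a job run, that the output directory is absent before and that the success marker is present after, can be sketched like this. The helper names output_dir_is_free and job_succeeded are hypothetical, and a scratch directory with hand-made files stands in for a real job's output.

```shell
# Hypothetical helpers, not part of Hadoop.
# True when nothing exists yet at the intended output path.
output_dir_is_free() { [ ! -e "$1" ]; }
# True when the job left its _SUCCESS marker in the output directory.
job_succeeded() { [ -f "$1/_SUCCESS" ]; }

scratch="$(mktemp -d)"
out="$scratch/output"

# Before the run: the path must be free.
if output_dir_is_free "$out"; then before="free"; else before="exists"; fi

# Stand-in for a finished job: create the two files by hand.
mkdir "$out"
: > "$out/_SUCCESS"
printf 'hadoop\t3\n' > "$out/part-r-00000"

# After the run: look for the marker before reading the part file.
if job_succeeded "$out"; then after="succeeded"; else after="unknown"; fi

echo "$before $after"
```

With a real job, the same checks would be pointed at /home/output before and after the hadoop jar command.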

OK,
that is all for installing and using standalone Hadoop.

Let's see what is under the root directory:
hdfs dfs -ls /
So
let's take a look at HDFS.
Hadoop has three core components and four modules,
and HDFS is one of those modules.

(Write once, read many times.)

The diagram comes from the official site:
https://hadoop.apache.org/old/#Getting+Started
http://hadoop.apache.org/docs/current/
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
Rack 1 and Rack 2 are racks; each rack can hold multiple servers.
In Hadoop 1 the default block size is 64 MB.
In Hadoop 2 it is 128 MB.
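The effect of the block size is easy to work out by hand: a file is cut into ceil(file size / block size) blocks. The sketch below does that arithmetic; the function name blocks_needed and the 300 MB file are made up for illustration.

```shell
# Hypothetical helper: number of blocks for a file of $1 bytes with a block size of $2 MB.
blocks_needed() {
  file_bytes=$1
  block_bytes=$(( $2 * 1024 * 1024 ))
  # Integer ceiling division: add (divisor - 1) before dividing.
  echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

# A made-up 300 MB file.
file_bytes=$(( 300 * 1024 * 1024 ))

blocks_needed "$file_bytes" 64    # prints 5 (Hadoop 1 default)
blocks_needed "$file_bytes" 128   # prints 3 (Hadoop 2 default)
```

The last block only holds the remainder of the file (44 MB in both cases here), so a larger block size mainly reduces the number of blocks that have to be tracked.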

Yarn sits in the middle layer and handles global resource management.
It was introduced in Hadoop 2.x; 1.x did not have it.
The three service layers of cloud computing:
IaaS: Infrastructure as a Service
PaaS: Platform as a Service
SaaS: Software as a Service

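Yarn manages resources globally and schedules tasks,
which makes it a bit like an operating system.
You could call Yarn a small operating system,
or simply a piece of software.
In cloud computing terms, Yarn belongs to the platform layer.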