Hadoop纵览之(一)Hadoop简介与集群搭建

最新推荐文章于 2019-11-22 16:15:12 发布

丨VivaLaVida丨

最新推荐文章于 2019-11-22 16:15:12 发布

阅读量167

点赞数

文章标签： Haoop简介 Hadoop集群搭建流程

本文链接：https://blog.csdn.net/qq_35954433/article/details/85262973

版权

Hadoop历史

雏形开始于2002年的Apache的Nutch，Nutch是一个开源Java 实现的搜索引擎。它提供了我们运行自己的搜索引擎所需的全部工具。包括全文搜索和Web爬虫。

随后在2003年Google发表了一篇技术学术论文谷歌文件系统（GFS）。GFS也就是google File System，google公司为了存储海量搜索数据而设计的专用文件系统。

2004年Nutch创始人Doug Cutting基于Google的GFS论文实现了分布式文件存储系统名为NDFS。

2004年Google又发表了一篇技术学术论文MapReduce。MapReduce是一种编程模型，用于大规模数据集（大于1TB）的并行分析运算。

2005年Doug Cutting又基于MapReduce，在Nutch搜索引擎实现了该功能。

2006年，Yahoo雇用了Doug Cutting，Doug Cutting将NDFS和MapReduce升级命名为Hadoop，Yahoo开建了一个独立的团队给Goug Cutting专门研究发展Hadoop。
不得不说Google和Yahoo对Hadoop的贡献功不可没。*

Hadoop核心

Hadoop的核心就是HDFS和MapReduce，而两者只是理论基础，不是具体可使用的高级应用，Hadoop旗下有很多经典子项目，比如HBase、Hive等，这些都是基于HDFS和MapReduce发展出来的。要想了解Hadoop，就必须知道HDFS和MapReduce是什么。

Hadoop概念

Apache Hadoop
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Hadoop 是一个能够对大量数据进行分布式处理的软件框架。具有可靠、高效、可伸缩的特点。

Hadoop中Modules

在这里插入图片描述
What is Apache Hadoop Ozone?
官网:
Ozone is a new subproject of Apache Hadoop. It provides an object store semantic for Hadoop.
It uses Hadoop Distributed Data Storage (HDDS) for storage layer. HDDS is another new subporoject of Apache Hadoop.
Ozone is an Object store for Hadoop built using Hadoop Distributed Data Store
WIKI：
https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Contributor+Guide:
Ozone is a distributed key-value store that can efficiently manage both small and large files alike. While HDFS provides POSIX-like semantics, Ozone looks and behaves like an Object Store.
Hadoop模块之一一个分布式的键值对存储框架能够同时对大小文件进行高效的管理

Hadoop中相关工程(生态圈)

在这里插入图片描述
Hadoop集群搭建
hadoop有三种运行模式：
独立(本地)模式：
无需运行任何守护进程，所有程序都在同一个JVM上执行，适合开发阶段；
伪分布模式：
hadoop守护进程运行在本地机器上，模拟一个小规模的集群。

全分布式模式：
hadoop守护进程运行在一个集群上。

*准备 :

安装linux
修改主机名(非必须为了便于区分)
静态IP
时间同步
-----------------------------集群时间同步--------------------
1.手工的改 date –s “2016-01-05”
2.启动service NTP
检查ntpd是否安装
配置ntpserver vi /etc/ntp.conf
启动ntpserver上的ntpd服务(注:ntpclient坚决不能启动ntpd服务)
在ntpclient上运行:ntpdate hadoop01

网络调试
关闭防火墙

iptables -F
chkconfig iptables off
service iptables save
等

*安装
1.安装Jdk
2.安装Hadoop
3.配置SSH
4.配置hadoop- env.sh (指定JDK目录)
5.配置hadoop- core-site.xm
6.配置hadoop- hdfs-site.xm
7.配置hadoop- mapred-site.xml
8.配置hadoop- yarn-site.xml
9.配置slaves
10.scp发送hadoop文件到其他机器上

启动/关闭Hadoop集群
hadoop namenode -format
start-dfs.sh
start-yarn.sh

stop-yarn.sh
stop-dfs.sh

丨VivaLaVida丨

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
**Hadoop纵览之(一)Hadoop简介与集群搭建**

Hadoop历史雏形开始于2002年的Apache的Nutch，Nutch是一个开源Java 实现的搜索引擎。它提供了我们运行自己的搜索引擎所需的全部工具。包括全文搜索和Web爬虫。随后在2003年Google发表了一篇技术学术论文谷歌文件系统（GFS）。GFS也就是google File System，google公司为了存储海量搜索数据而设计的专用文件系统。2004年Nutch创始人Do...
复制链接

扫一扫