The previous two posts reviewed Hive, Hadoop, and related topics.
Next, let's continue learning about Hadoop's other services! (Sorry, to keep my English from getting rusty, the rest of this post is in English.)
In this post, we continue our tour of the Hadoop ecosystem.
Beyond the major components of Hadoop, namely HDFS, MapReduce, and YARN, there are many tools and solutions that supplement or support these core components.
- HDFS: the Hadoop Distributed File System, which can store different types of large data sets (structured, semi-structured, and unstructured data).
- YARN: Yet Another Resource Negotiator; it manages all processing activities by allocating resources and scheduling tasks.
- MapReduce: a programming model for batch data processing.
- Pig: has two parts, the Pig Latin language (which has an SQL-like command structure) and the Pig runtime, analogous to Java and the JVM.
- Spark: performs in-memory computation to speed up data processing compared with MapReduce.
- Hive: query-based data processing using an SQL-like language (HiveQL).
- HBase: a NoSQL database, accessed via a Java API or shell commands; reference
- Mahout: renowned for machine learning; it provides an environment for creating machine learning applications.
- Solr & Lucene: searching and indexing in the Hadoop ecosystem.
- ZooKeeper: the coordinator of Hadoop jobs; it coordinates the various services in a distributed environment.
- Oozie: works as a clock-and-alarm service inside the Hadoop ecosystem, i.e. a job scheduler.
- Sqoop: a tool designed to transfer data between HDFS and relational database servers, importing data from relational databases into HDFS and vice versa. It is written in Java; reference
- Flume: ingests unstructured and semi-structured data into HDFS, such as log data from a web server.
- Kafka: a distributed streaming platform and a distributed commit-log service; I recommend this article (reference) since it clearly explains how Kafka works!
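
The MapReduce model listed above can be illustrated with a small, self-contained sketch that simulates the map, shuffle, and reduce phases in plain Python (no Hadoop required; the function names and sample documents are my own, for illustration only):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["hadoop stores data", "spark and hadoop process data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["hadoop"])  # 2
print(counts["data"])    # 2
```

In real Hadoop the map and reduce functions run in parallel on many nodes and the shuffle moves data across the network, but the data flow is the same as in this toy version.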
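
Kafka's core abstraction, the distributed commit log, can also be sketched in a few lines of Python: producers append records to an ordered, append-only log, and each consumer tracks its own read offset independently. (The class and variable names below are illustrative and are not Kafka's actual API; a real Kafka log is partitioned and replicated across brokers.)

```python
class CommitLog:
    """A toy, single-partition commit log: an append-only list of records."""

    def __init__(self):
        self.records = []

    def append(self, record):
        """Producers append; the record's position in the log is its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset):
        """Consumers read everything from their own offset onward."""
        return self.records[offset:]

log = CommitLog()
log.append("user signed up")
log.append("user clicked buy")

# Two independent consumers: each tracks its own offset, so one consumer
# reading a record does not remove it for the other.
consumer_a = log.read(0)   # sees both records
consumer_b = log.read(1)   # sees only the second record
print(len(consumer_a))     # 2
print(consumer_b)          # ['user clicked buy']
```

Because records are never removed on read, many consumers can replay the same stream at their own pace, which is what makes Kafka useful both as a messaging system and as a durable log.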