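[b][size=x-large]Introduction to Hadoop[/size][/b]
Hadoop is an open-source implementation of the distributed computing framework that Google described in its GFS and MapReduce papers. It provides distributed storage and distributed computation, built mainly on HDFS and MapReduce.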
[b][size=large]HDFS[/size][/b]
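HDFS consists of one NameNode and multiple DataNodes. The NameNode holds the file system's metadata and acts as the nerve center of a Hadoop cluster, while the DataNodes store the actual data.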
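The split between metadata and data can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the real HDFS API: the class names mirror the HDFS roles, but `allocate`, `locate`, `put`, `get`, the 4-byte block size, and the round-robin placement are all hypothetical, and replication is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode: stores block contents only."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Toy NameNode: stores metadata only, never file contents."""
    datanode_names: list
    files: dict = field(default_factory=dict)   # path -> [(block_id, datanode name)]
    next_block_id: int = 0

    def allocate(self, path, num_blocks):
        # Round-robin placement is an assumption made for this sketch.
        placements = []
        for _ in range(num_blocks):
            target = self.datanode_names[self.next_block_id % len(self.datanode_names)]
            placements.append((self.next_block_id, target))
            self.next_block_id += 1
        self.files[path] = placements
        return placements

    def locate(self, path):
        return self.files[path]


def put(namenode, datanodes, path, data, block_size=4):
    # The client asks the NameNode where blocks go, then writes to DataNodes.
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for (block_id, node_name), chunk in zip(namenode.allocate(path, len(chunks)), chunks):
        datanodes[node_name].blocks[block_id] = chunk


def get(namenode, datanodes, path):
    # The client looks up block locations, then reads from the DataNodes.
    return b"".join(datanodes[name].blocks[bid] for bid, name in namenode.locate(path))


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
namenode = NameNode(list(datanodes))
put(namenode, datanodes, "/demo.txt", b"hello hadoop")
restored = get(namenode, datanodes, "/demo.txt")
```

Note that the toy NameNode never touches the bytes of the file. It only records which block lives on which DataNode, which is why losing it would make the stored blocks unusable.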
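[b][size=large]MapReduce: Distributed Computing[/size][/b]
A client submits a job to the JobTracker, which splits the job into multiple subtasks, each handled by a TaskTracker. TaskTrackers run on the same machines as the DataNodes, so each subtask computes on data that is stored locally.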
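The division of labor between the JobTracker and the TaskTrackers can be sketched with a toy word count. This is a simplified illustration, not Hadoop's actual MapReduce API: `run_map`, `run_word_count`, and the sample lines are hypothetical, everything runs in one process, and the shuffle phase and separate reduce tasks of real MapReduce are collapsed into a single merge.

```python
from collections import Counter


class TaskTracker:
    """Toy TaskTracker co-located with the data it processes."""

    def __init__(self, name, local_lines):
        self.name = name
        # Stands in for the blocks held by the DataNode on the same machine.
        self.local_lines = local_lines

    def run_map(self):
        # Subtask: count words using only this node's local data.
        counts = Counter()
        for line in self.local_lines:
            counts.update(line.split())
        return counts


class JobTracker:
    """Toy JobTracker: one subtask per TaskTracker, then a merge."""

    def __init__(self, trackers):
        self.trackers = trackers

    def run_word_count(self):
        partials = [tracker.run_map() for tracker in self.trackers]
        total = Counter()
        for partial in partials:
            total.update(partial)  # adds counts from each subtask
        return total


trackers = [
    TaskTracker("tt1", ["hadoop hdfs", "hadoop mapreduce"]),
    TaskTracker("tt2", ["hdfs datanode"]),
]
result = JobTracker(trackers).run_word_count()
```

Each tracker reads only its own `local_lines`, which is the "local data, local computation" idea: only the small partial counts travel to the merge step, not the input data.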
[b][size=x-large]Hadoop Subprojects[/size][/b]
[b][size=large]Avro™:[/size][/b]
A data serialization system.
[b][size=large]Cassandra™:[/size][/b]
A scalable multi-master database with no single points of failure.
[b][size=large]Chukwa™: [/size][/b]
A data collection system for managing large distributed systems.
[b][size=large]HBase™:[/size][/b]
A scalable, distributed database that supports structured data storage for large tables.
[b][size=large]Hive™: [/size][/b]
A data warehouse infrastructure that provides data summarization and ad hoc querying.
[b][size=large]Mahout™: [/size][/b]
A scalable machine learning and data mining library.
[b][size=large]Pig™: [/size][/b]
A high-level data-flow language and execution framework for parallel computation.
[b][size=large]ZooKeeper™:[/size][/b]
A high-performance coordination service for distributed applications.