Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.[1] It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Hadoop is a top-level Apache project built and used by a global community of contributors,[2] written in the Java programming language. Yahoo! has been the largest contributor[3] to the project, and uses Hadoop extensively across its businesses.[4]
Hadoop was created by Doug Cutting,[5] who named it after his son's toy elephant.[6] It was originally developed to support distribution for the Nutch search engine project.[7]
http://hadoop.apache.org/
- What Is Apache Hadoop?
- Who Uses Hadoop?
- News
- March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation Awards
- January 2011 - ZooKeeper Graduates
- September 2010 - Hive and Pig Graduate
- May 2010 - Avro and HBase Graduate
- July 2009 - New Hadoop Subprojects
- March 2009 - ApacheCon EU
- November 2008 - ApacheCon US
- July 2008 - Hadoop Wins Terabyte Sort Benchmark
What Is Apache Hadoop?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
The project includes these subprojects:
- Hadoop Common: The common utilities that support the other Hadoop subprojects.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
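The programming model behind Hadoop MapReduce can be illustrated without a cluster at all. The sketch below is a minimal, in-memory word count in plain Java: the user supplies a map function (emit a `(word, 1)` pair per word) and a reduce function (sum the counts per word), while the framework's shuffle phase, which groups mapped pairs by key, is simulated here with a stream `groupingBy`. The class and method names are illustrative, not part of the Hadoop API; a real Hadoop job would instead extend the `Mapper` and `Reducer` classes from the Hadoop libraries.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Reduce phase: sum all counts emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello world");

        // Shuffle phase (simulated): group mapped pairs by key,
        // as the framework would do between the map and reduce tasks.
        Map<String, List<Integer>> grouped = input.stream()
            .flatMap(WordCountSketch::map)
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Reduce each group and print the final word counts.
        grouped.forEach((word, counts) ->
            System.out.println(word + "\t" + reduce(word, counts)));
    }
}
```

On a real cluster, the same map and reduce logic would run in parallel across many machines, with HDFS supplying the input splits and the framework handling the shuffle and any task failures.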
Other Hadoop-related projects at Apache include:
- Avro™: A data serialization system.
- Cassandra™: A scalable multi-master database with no single points of failure.
- Chukwa™: A data collection system for managing large distributed systems.
- HBase™: A scalable, distributed database that supports structured data storage for large tables.
- Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout™: A scalable machine learning and data mining library.
- Pig™: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper™: A high-performance coordination service for distributed applications.
Who Uses Hadoop?
A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page.
Hadoop is not merely a distributed file system for storage; it is a framework designed to run distributed applications on large clusters built from commodity hardware.
[Figure: the Hadoop architecture]