Hadoop CDH3 vs. Apache 0.20: a comparison

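Hadoop currently has two open-source versions: the Apache release, and a version that Cloudera has optimized on top of the Apache release, known as CDH3.

The two versions compare as follows: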

 

| CDH3 version | Apache version | Description |
|---|---|---|
| Hadoop Common | Hadoop Common | The common utilities that support the other Hadoop subprojects. |
| Hadoop Distributed File System (HDFS) | Hadoop Distributed File System (HDFS) | A distributed file system that provides high-throughput access to application data. |
| Hadoop MapReduce | Hadoop MapReduce | A software framework for distributed processing of large data sets on compute clusters. |
| Flume | | A distributed, reliable, and available service for efficiently moving large amounts of data as the data is produced. |
| Sqoop | | A tool that imports data from relational databases into Hadoop clusters. |
| Hue | | A graphical user interface to work with CDH. |
| Pig | Pig | A high-level data-flow language and execution framework for parallel computation. Enables you to analyze large amounts of data using Pig's query language, Pig Latin. |
| Hive | Hive | A data warehouse infrastructure that provides data summarization and ad hoc querying. Built on top of Hadoop, it lets you access your data using Hive QL, a language similar to SQL. |
| HBase | HBase | A scalable, distributed database that supports structured data storage for large tables. Provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS). |
| ZooKeeper | ZooKeeper | A high-performance, highly reliable and available service that provides coordination between distributed processes. |
| Oozie | | A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs. |
| Whirr | | Provides a fast way to run cloud services. |
| Snappy | | A compression/decompression library. |
| | Avro | A data serialization system. |
| | Cassandra | A scalable multi-master database with no single points of failure. |
| | Chukwa | A data collection system for managing large distributed systems. |
| | Mahout | A scalable machine learning and data mining library. |

In theory, CDH3 should support all of the components and subprojects of the Apache version.

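The similarities and differences between the two Hadoop versions are as follows:

System

Starting with CDH3b3, the hadoop.job.ugi parameter is no longer supported; use the UserGroupInformation.doAs() method instead.

For other incompatible changes, see: https://ccp.cloudera.com/display/CDHDOC/Incompatible+Changes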

 

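Installation

Cloudera CDH3 is based on the stable Hadoop 0.20.2 release and integrates many patches.

CDH is available both as rpm packages and as a tarball (Cloudera recommends the rpm route), whereas Hadoop 0.20.2 can only be installed from a tarball.

Cloudera CDH3 sets the JAVA_HOME environment variable automatically; with Apache Hadoop it has to be configured by hand.

Apache Hadoop manages the cluster with the start-dfs.sh/stop-dfs.sh and start-all.sh/stop-all.sh scripts. CDH starts and stops services by running the /etc/init.d/hadoop-0.20-* scripts as root. These scripts only manage the local server, so if you want something like start-all.sh/stop-all.sh you have to write your own script.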
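A cluster-wide start/stop wrapper of the kind described above could be sketched roughly as follows. This is only an illustration: the host names and the role-to-host mapping are hypothetical, root SSH access to every node is assumed, and the daemon names are assumed to follow the /etc/init.d/hadoop-0.20-* pattern. The function only builds the commands; actually running them is left to the caller.

```python
import shlex

# Hypothetical role-to-host mapping; replace with the real cluster layout.
CLUSTER = {
    "namenode": ["master1"],
    "secondarynamenode": ["master1"],
    "jobtracker": ["master1"],
    "datanode": ["worker1", "worker2"],
    "tasktracker": ["worker1", "worker2"],
}

# HDFS daemons come up before the MapReduce daemons; stopping uses the reverse order.
START_ORDER = ["namenode", "secondarynamenode", "datanode", "jobtracker", "tasktracker"]


def build_commands(action, cluster=CLUSTER):
    """Return one ssh command (as an argument list) per daemon per host."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    order = START_ORDER if action == "start" else list(reversed(START_ORDER))
    commands = []
    for role in order:
        for host in cluster.get(role, []):
            # Assumed init script name: /etc/init.d/hadoop-0.20-<role>
            remote = f"/etc/init.d/hadoop-0.20-{role} {action}"
            commands.append(["ssh", f"root@{host}", remote])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands("start"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the commands for real, each list can be passed to subprocess.run(cmd, check=True). Keeping command generation separate from execution makes it easy to review the plan first.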

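After a successful install, Cloudera CDH adds two users: hdfs (for the HDFS file system) and mapred (for MapReduce). With Apache Hadoop the usual practice is to add a single hadoop user that does everything.

Cloudera CDH switches between multiple configuration directories through alternatives, whereas Apache Hadoop keeps its configuration files only under $HADOOP_HOME/conf.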
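As a rough illustration of the alternatives mechanism, the sketch below builds the commands that would register and inspect a custom configuration directory. The paths (/etc/hadoop-0.20/conf, conf.empty), the alternatives group name hadoop-0.20-conf, and the priority 50 are assumptions based on the usual CDH3 package layout, not something stated in this article; verify them on your own system. On Debian/Ubuntu the tool is update-alternatives rather than alternatives.

```python
def conf_switch_commands(conf_name, priority=50, tool="alternatives"):
    """Build (but do not run) the commands for activating a custom config dir.

    All paths and the group name 'hadoop-0.20-conf' are assumed defaults.
    """
    link = "/etc/hadoop-0.20/conf"                   # symlink that Hadoop reads
    target = f"/etc/hadoop-0.20/conf.{conf_name}"    # the new config directory
    return [
        # Start from a copy of the empty template configuration.
        ["cp", "-r", "/etc/hadoop-0.20/conf.empty", target],
        # Register the copy; the highest priority wins in automatic mode.
        [tool, "--install", link, "hadoop-0.20-conf", target, str(priority)],
        # Show which configuration directory is currently active.
        [tool, "--display", "hadoop-0.20-conf"],
    ]


if __name__ == "__main__":
    for cmd in conf_switch_commands("my_cluster"):
        print(" ".join(cmd))
```

The commands need root privileges when actually run, for example via sudo.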

 

Eclipse plugin

Cloudera CDH does not ship an Eclipse plugin by default, so you have to build it yourself, and its plugin is not compatible with the Apache Hadoop plugin.

 

Security

CDH3 supports Kerberos authentication, whereas Apache Hadoop relies on rudimentary authentication that simply matches user names.
