Tips for Cassandra - 1

1.  Tips and summaries from Dominic Williams's 'HBase vs Cassandra: why we moved':


[1] Bloodline

HBase: derived from Google's Bigtable and Google File System (GFS) designs.

Cassandra: also draws on Bigtable for its data model, combined with a data storage system inspired by Amazon's Dynamo.

So HBase is more suitable for data warehousing and large-scale data processing and analysis (e.g. indexing the web).

And Cassandra is more suitable for real-time transaction processing and serving interactive data.



[2] CAP:

The CAP theorem, proposed by Eric Brewer, covers Consistency, Availability and tolerance to network Partitions; a distributed system cannot fully guarantee all three at once. HBase chose CP, and Cassandra chose AP.


The CAP theorem only applies to a single distributed algorithm, but there is no reason why you can't design a single system where for any given operation, the underlying algorithm and thus the trade-off achieved is selectable.

It's possible to offer differing degrees of balance between consistency, availability and tolerance to partition. This is Cassandra.


The beauty of Cassandra is that you can choose the trade-offs you want on a case by case basis such that they best match the requirements of the particular operation you are performing. 


e.g. if I specify consistency level 'ALL' when reading from Cassandra, a zero-tolerance requirement for stale data is met. If I simply want maximum possible performance, I can read from Cassandra with level 'ONE'. Typically I use consistency level 'QUORUM', which requires that a majority of the nodes in the replication factor agree.
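
As a rough illustration of the three levels above, the sketch below computes how many replicas have to respond at each level for a given replication factor. The function names (required_acks, overlaps) are hypothetical and are not part of any Cassandra client API; only the level names come from the text. The second function checks the commonly cited rule that a read is guaranteed to see the latest write when the read and write replica counts add up to more than the replication factor.

```python
def required_acks(level, replication_factor):
    """Number of replicas that must respond at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def overlaps(read_level, write_level, replication_factor):
    """True when every read set must share a replica with every write set."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, 3))
    print("QUORUM read + QUORUM write overlap:", overlaps("QUORUM", "QUORUM", 3))
    print("ONE read + ONE write overlap:", overlaps("ONE", "ONE", 3))
```

With a replication factor of 3, QUORUM reads plus QUORUM writes overlap, while ONE plus ONE does not. That is the trade-off being selected per operation.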


<Self Note> Flexible Design


[3] Monolithic

Cassandra is monolithic: it comes as a single Java process to be run on each node. An HBase solution is modular and really comprises several parts: the database process itself (which may run in several modes), Hadoop HDFS, and ZooKeeper.


<Self Note> Lightweight


[4] Gossip

Cassandra uses a P2P communication protocol called 'Gossip' and has no master nodes or region servers. HBase, by contrast, has particular nodes or entities taking on a coordinating role.

Benefits of the P2P architecture:

  (1) Adding a new node is easier.

  (2) Debugging is more progressive and repeatable when things go wrong.

  (3) Load is balanced better.

  (4) Tolerance to partitions is better.

  (5) Performance scales smoothly.

  ...
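
To make the idea of masterless, peer-to-peer state exchange concrete, here is a toy simulation. It is a simplified sketch and not Cassandra's actual Gossip implementation: every name in it (gossip_round, rounds_to_converge) is hypothetical, each node starts out knowing only its own version number, and in every round each node merges its view with one randomly chosen peer. No node plays a coordinating role, yet every node ends up with the full cluster view.

```python
import random


def gossip_round(states, rng):
    """Each node picks one random peer; the two merge their views."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        keys = states[node].keys() | states[peer].keys()
        # Keep the highest version seen for every known node.
        merged = {
            k: max(states[node].get(k, 0), states[peer].get(k, 0))
            for k in keys
        }
        states[node] = dict(merged)
        states[peer] = dict(merged)


def rounds_to_converge(n_nodes, seed=0, max_rounds=100):
    """Rounds until every node knows about every other node, or None."""
    rng = random.Random(seed)
    # Initially each node only knows its own entry (version 1).
    states = {i: {i: 1} for i in range(n_nodes)}
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(view) == n_nodes for view in states.values()):
            return round_no
    return None


if __name__ == "__main__":
    for size in (2, 8, 32):
        print(size, "nodes converged after", rounds_to_converge(size), "rounds")
```

Adding a node in this model only means adding one more entry to the dictionary of states. There is no master or routing table to update, which is the point of benefits (1) and (5) above.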


<Self Note> Routing table vs. P2P architecture.


[5] Lock and modularity

HBase can provide row locking because a region server acts as the gateway to each row. Cassandra can't do this easily, because all of its nodes are equal.

Cassandra implements the Bigtable data model, but on a design where data storage is distributed over symmetric nodes, and it manages to execute that purpose better, as indicated for example by its selectable CAP trade-offs.


[6] Map Reduce

One thing Cassandra can't do well yet is MapReduce!

MapReduce and related systems such as Pig and Hive work well with HBase because it stores its data in Hadoop HDFS, which is the platform these systems were primarily designed to work with. If you need to do that kind of data crunching and analysis, HBase may currently be your best option.


Cassandra gets Hadoop support in 0.6, so its MapReduce integration may be about to get a whole lot better.









 

 
