Tips for Cassandra - 1

最新推荐文章于 2024-07-18 09:07:20 发布

colbyfeng

最新推荐文章于 2024-07-18 09:07:20 发布

阅读量209

点赞数

分类专栏： NoSQL 文章标签： cassandra hbase mapreduce processing performance algorithm

本文链接：https://blog.csdn.net/colbyfeng/article/details/5555721

版权

NoSQL 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

1. Tips and Summaries from Dominic Williams 'HBase vs Cassandra: why we moved':

[1] Bloodline

HBase: Derived from Google Big Table and File System Designs.

Cassandra: Big Table Also and a system inspired by Dynamo for storing data.

So HBase more suitable for data warehousing and large scale data processing and analysis (e.g: index the web)

And Cassandra more suitable for real time transaction processing and the serving of interactive data.

[2] CAP:

CAP provided by Eric Brewer: Consistency, Availability and Tolerance to network partitions. Hbase chose CP, and Cassandra chose AP.

The CAP theorem only applies to a single distributed algorithm, but there is no reason why you can't design a single system where for any given operation, the underlying algorithm and thus the trade-off achieved is selectable.

It's possible to offer differing degrees of balance between consistency, availability and tolerance to partition. This is Cassandra.

The beauty of Cassandra is that you can choose the trade-offs you want on a case by case basis such that they best match the requirements of the particular operation you are performing.

e.g: if specify consistency level 'ALL' from Cassandra, the zero tolerance is met. if simply want maximum possible performance, I can read from Cassandra with level 'ONE'. Typically use consistency level “QUORUM” – that a majority of nodes in the replication factor agree.

<Self Note> Flexible Design

[3] Monolithic

Cassandra comes as a single Java process to be run per node like monolithic, and HBase solution is really comprised of serveral parts with modular: DB process may run in several modes, hadoop HDFS, and a Zookeeper.

<Self Note> Lightly Weight

[4] Gossip

Cassandra, which uses P2P communication protocol called 'Gossip', has no master nodes or region servers like in HBase, which has particular node or entity talking on a coordination role.

Benefits of P2P Arch:

(1) Add new node easier.

(2) Debug more progressive and repeatable when things go wrong.

(3) Load balance better.

(4) Tolerance to partition better.

(5) Performance scale smoothly

...

<Self Note> Routing table VS P2P Arch.

[5] Lock and modularity

Row lock by HBase with using region server as gateway, which Cassandra can't because of all equal nodes.

Cassandra implements the BigTable data model but uses a design where data storage is distributed over symmetric nodes and manages to implement a design that executes that purpose better – as indicated for example by its selectable CAP tradeoffs.

[6] Map Reduce

Cassandra can’t do well yet is MapReduce!

MapReduce and related systems such as Pig and Hive work well with HBase because it uses hadoop HDFS to store its data, which is the platform these systems were primarily designed to work with. If you need to do that kind of data crunching and analysis, HBase may currently be your best option.

Cassandra has hadoop support in 0.6, so its MapReduce integration may be about to get a whole load better.

colbyfeng

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Tips for Cassandra - 1

1. Tips and Summaries from Dominic Williams HBase vs Cassandra: why we moved:[1] BloodlineHBase: Derived from Google Big Table and File System Designs.
复制链接

扫一扫