Understanding the Architecture: About partitioners

A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
Basically, a partitioner is a hash function for computing the token (its hash) of a row key. Each row of
data is uniquely identified by a row key and distributed across the cluster by the value of the token.
Both the Murmur3Partitioner and RandomPartitioner use tokens to help assign equal portions of data
to each node and evenly distribute data from all the tables throughout the ring or other grouping, such
as a keyspace. This is true even if the tables use different row keys, such as usernames or timestamps.
Moreover, the read and write requests to the cluster are also evenly distributed and load balancing is
simplified because each part of the hash range receives an equal number of rows on average. For more
detailed information, see Consistent hashing on page 14.
Cassandra offers the following partitioners:
• Murmur3Partitioner (default): uniformly distributes data across the cluster based on MurmurHash
hash values.
• RandomPartitioner: uniformly distributes data across the cluster based on MD5 hash values.
• ByteOrderedPartitioner: keeps an ordered distribution of data lexically by key bytes.
The Murmur3Partitioner is the default partitioning strategy for new Cassandra clusters and the right
choice for new clusters in almost all cases.
Set the partitioner in the cassandra.yaml file:
• Murmur3Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
• RandomPartitioner: org.apache.cassandra.dht.RandomPartitioner
• ByteOrderedPartitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
Note:  If using virtual nodes, you do not need to calculate the tokens. If not using virtual nodes, you
must calculate the tokens to assign to the initial_token parameter in the cassandra.yaml file.

See Generating tokens on page 163 and use the method for the type of partitioner you are using.
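
For illustration, a minimal cassandra.yaml fragment for a new cluster using the default partitioner with
virtual nodes might look like the following sketch (the num_tokens value shown is the commonly suggested
default, not a requirement; initial_token is commented out because it is only set when virtual nodes are
not used):

    partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    num_tokens: 256
    # initial_token:    # set to a calculated token only when not using virtual nodes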

About the Murmur3Partitioner
The Murmur3Partitioner provides faster hashing and better performance than the previous default
partitioner (the RandomPartitioner).
You can use the Murmur3Partitioner only for new clusters; you cannot change the partitioner in existing
clusters. If you are switching to the 1.2 cassandra.yaml, be sure to change the partitioner setting to
match the partitioner your existing cluster already uses.
The Murmur3Partitioner uses the MurmurHash function. This hashing function creates a 64-bit hash
value of the row key. The possible range of hash values is from -2^63 to +2^63-1.
When using the Murmur3Partitioner, you can page through all rows using the token function in a CQL 3
query.
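For instance, a sketch of token-based paging in CQL 3, assuming a hypothetical users table with partition
key user_id and an arbitrary page size of 100 (the table, column, and the 'jsmith' key are illustrative):

    -- first page
    SELECT user_id, name FROM users LIMIT 100;
    -- next page: continue after the last partition key returned above
    SELECT user_id, name FROM users WHERE token(user_id) > token('jsmith') LIMIT 100;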
About the RandomPartitioner
The default partitioner prior to Cassandra 1.2.
Although it is no longer the default, you can still use the RandomPartitioner in version 1.2, even when
using virtual nodes. However, if you don't use virtual nodes, you must calculate the tokens, as described
in Generating tokens.

The RandomPartitioner distributes data evenly across the nodes using an MD5 hash value of the row key.
The possible range of hash values is from 0 to 2^127-1.
When using the RandomPartitioner, you can page through all rows using the token function in a CQL 3
query.

About the ByteOrderedPartitioner
Use for ordered partitioning
Cassandra provides the ByteOrderedPartitioner for ordered partitioning. This partitioner orders rows
lexically by key bytes. You calculate tokens by looking at the actual values of your row key data and using
a hexadecimal representation of the leading character(s) in a key. For example, if you wanted to partition
rows alphabetically, you could assign an A token using its hexadecimal representation of 41.
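As an illustrative sketch only (not a recommended layout), a three-node cluster split alphabetically might
assign each node an initial_token taken from the hexadecimal value of the first letter it should own; each
line below goes in the cassandra.yaml of the corresponding node:

    initial_token: 41    # node 1: keys starting at 'A' (0x41)
    initial_token: 4a    # node 2: keys starting at 'J' (0x4a)
    initial_token: 53    # node 3: keys starting at 'S' (0x53)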
Using the ordered partitioner allows ordered scans by primary key. This means you can scan rows as
though you were moving a cursor through a traditional index. For example, if your application has user
names as the row key, you can scan rows for users whose names fall between Jake and Joe. This type
of query is not possible using randomly partitioned row keys because the keys are stored in the order of
their MD5 hash (not sequentially).
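A hypothetical sketch of such a scan in CQL 3, assuming a users table whose partition key is username and
a cluster running the ByteOrderedPartitioner (with an ordered partitioner, token order matches the lexical
order of the keys, so the token function expresses a key range):

    SELECT * FROM users
    WHERE token(username) >= token('Jake')
      AND token(username) <= token('Joe');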
Although having the capability to do range scans on rows sounds like a desirable feature of ordered
partitioners, there are ways to achieve the same functionality using table indexes.
Using an ordered partitioner is not recommended for the following reasons:
Difficult load balancing
More administrative overhead is required to load balance the cluster. An ordered partitioner requires
administrators to manually calculate partition ranges (formerly token ranges) based on their estimates of
the row key distribution. In practice, this requires actively moving node tokens around to accommodate
the actual distribution of data once it is loaded.
Sequential writes can cause hot spots
If your application tends to write or update a sequential block of rows at a time, then the writes are not
distributed across the cluster; they all go to one node. This is frequently a problem for applications dealing
with timestamped data.
Uneven load balancing for multiple tables
If your application has multiple tables, chances are that those tables have different row keys and different
distributions of data. An ordered partitioner that is balanced for one table may cause hot spots and uneven
distribution for another table in the same cluster.
