What’s new in Cassandra 1.1: Flexible data file placement

Apache Cassandra is designed from the ground up to work well on spinning disks, but it can also leverage the high IOPS of SSDs. (Don’t miss the video and slides about using Cassandra with SSDs from our solutions architect.)

Suppose you have two column families in the same keyspace, “App”: one named “Logs”, whose data is written once and read infrequently, and one named “UserData”, whose data is accessed frequently. You may want to put the frequently accessed column family on an SSD to boost IO performance. At first, it looks like you can achieve this by mounting the SSD on the appropriate data directory, but then you realize that Cassandra stores all column family data files for a keyspace under a single directory, like below:

/var/lib/cassandra/data/App/Logs-hc-1-Data.db
/var/lib/cassandra/data/App/Logs-hc-1-Index.db
...
/var/lib/cassandra/data/App/UserData-hc-1-Data.db
/var/lib/cassandra/data/App/UserData-hc-1-Index.db
...

Until now, you could only use a separate disk per keyspace, not per column family.

More control over data files

In version 1.1, CASSANDRA-2749 changes the way Cassandra stores data files by using separate column family directories within each keyspace directory. In 1.1, the above data files will instead be stored like this:

/var/lib/cassandra/data/App/Logs/App-Logs-hc-1-Data.db
/var/lib/cassandra/data/App/Logs/App-Logs-hc-1-Index.db
...
/var/lib/cassandra/data/App/UserData/App-UserData-hc-1-Data.db
/var/lib/cassandra/data/App/UserData/App-UserData-hc-1-Index.db
...

This allows you to mount an SSD on a particular directory (in this case UserData) to boost performance for a particular column family. You may notice that the file name format has also changed to include the keyspace name at the beginning. This makes it easy to tell which keyspace and column family a file belongs to when streaming or bulk loading.
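To illustrate why the keyspace prefix helps, here is a minimal Python sketch that recovers the keyspace and column family from a 1.1-style file name alone. The function name `parse_data_file_name` and the field labels (`version`, `generation`, `component`) are my own choices for illustration, not Cassandra's API, and the sketch assumes that neither the keyspace nor the column family name contains a hyphen. It only handles the five-part names shown above and rejects anything else.

```python
from typing import NamedTuple


class DataFileName(NamedTuple):
    """Pieces of a 1.1-style data file name (field labels are illustrative)."""
    keyspace: str
    column_family: str
    version: str
    generation: int
    component: str


def parse_data_file_name(name: str) -> DataFileName:
    """Split a name such as 'App-UserData-hc-1-Data.db' into its parts.

    Assumes keyspace and column family names contain no hyphens.
    """
    stem, _, extension = name.rpartition(".")
    if extension != "db":
        raise ValueError(f"not a .db data file: {name!r}")
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected file name format: {name!r}")
    keyspace, column_family, version, generation, component = parts
    return DataFileName(keyspace, column_family, version, int(generation), component)
```

For example, `parse_data_file_name("App-UserData-hc-1-Data.db")` yields keyspace `App` and column family `UserData` without looking at the directory the file sits in, which is the situation you are in when a file arrives by streaming or bulk loading.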

What about upgrading?

Do you need to manually move all pre-1.1 data files to the new directory structure before upgrading to 1.1? No. Immediately after Cassandra 1.1 starts, it checks whether the old directory structure is present and, if so, migrates all data files (including backups and snapshots) to the new directory structure. So just upgrade as you always do (don’t forget to read NEWS.txt first), and you get more control over data files for free.
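To make the old-to-new mapping concrete, the following Python sketch moves the files of one keyspace directory from the pre-1.1 layout into the 1.1 layout shown earlier. It is not Cassandra's migration code and should not be run against a real data directory: the function name `migrate_keyspace_dir` is hypothetical, it ignores backups and snapshots, and it assumes column family names contain no hyphens.

```python
import shutil
from pathlib import Path
from typing import List


def migrate_keyspace_dir(keyspace_dir: Path) -> List[Path]:
    """Move '<CF>-...db' files into '<CF>/<Keyspace>-<CF>-...db'.

    Illustration only: handles top-level .db files, nothing else.
    """
    keyspace = keyspace_dir.name
    moved = []
    # Take a snapshot of the listing first, since we create subdirectories below.
    for old_file in sorted(keyspace_dir.iterdir()):
        if not old_file.is_file() or old_file.suffix != ".db":
            continue
        # Old names start with the column family, e.g. 'Logs-hc-1-Data.db'.
        column_family = old_file.name.split("-", 1)[0]
        target_dir = keyspace_dir / column_family
        target_dir.mkdir(exist_ok=True)
        # New names gain the keyspace as a prefix.
        target = target_dir / f"{keyspace}-{old_file.name}"
        shutil.move(str(old_file), str(target))
        moved.append(target)
    return moved
```

Run against a scratch directory named `App` that contains `Logs-hc-1-Data.db`, the sketch leaves that file at `App/Logs/App-Logs-hc-1-Data.db`, matching the listing above.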

Conclusion

Starting with Cassandra 1.1, data files are stored inside their own column family directory, which lets you control which column family goes on which disk. Upgrading to the new directory structure is done automatically, so no extra upgrade steps are required. The beta2 version of 1.1 is available for download, so feel free to try it out! Feedback is always appreciated.
