Cassandra ABC

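Summary of Cassandra's features:
1. It offers a SQL-like query language, CQL, and ships with the cqlsh tool. Equality queries are supported, range queries are only weakly supported, and joins are not supported at all.
2. Although Cassandra is a NoSQL database, it puts more emphasis on schema than MongoDB does, yet it still falls short of MongoDB ...
3. Cassandra is simple to install and configure; HBase is considerably more complex to configure by comparison.
4. Linear scalability and high availability (including high availability across multiple data centers).
5. Flexibility: if performance matters most, set the consistency level lower; if data integrity matters most, set it higher.
6. Cassandra suits storing large volumes of data, and its write throughput is very high, much higher than that of an RDBMS or HBase. In addition, it offers SQL-like statements, so it is quick to pick up.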
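
The trade-off in point 5 is usually reasoned about with quorum arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is only an illustration of that arithmetic, not driver code. The helper names are my own, and the replication factor of 3 is an assumption.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority (the QUORUM level)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor for this illustration
LEVELS = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

for write_name, w in LEVELS.items():
    for read_name, r in LEVELS.items():
        verdict = "overlap guaranteed" if reads_see_latest_write(RF, w, r) else "may read stale data"
        print(f"write={write_name:<6} read={read_name:<6} -> {verdict}")
```

Lower levels such as ONE touch fewer replicas per request and are therefore faster, while QUORUM on both sides is the common middle ground.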

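Cassandra only fits specific scenarios. Here are the use cases I have listed:
1. [Not recommended] Transactional systems. These clearly need full ACID support, so a traditional RDBMS is still the better choice.
2. [Not recommended] Other OLTP systems. If a NoSQL database is a must, use MongoDB instead. MongoDB ...
3. [Not recommended] General-purpose data analysis systems (BI-style applications). Cassandra does not support joins and has only weak support for range queries, so it is a poor fit for the vast majority of OLAP scenarios (at least when used on its own).
4. [Recommended] Scenarios where it replaces Redis, for example as a cache server, or in place of Redis for group-buying/flash-sale workloads, relying on the TTL feature for data.
5. [Recommended] Certain scenarios where it works alongside an RDBMS. For example, when individual records are very large, store the large records in Cassandra and keep each record's key query fields in the RDBMS, which then provides the rich range and aggregate queries.
6. [Recommended] User-profile databases. A user profile carries a great many tags and only needs equality queries, and Cassandra can accommodate a very large number of columns.
7. [Recommended] Other scenarios where it replaces HBase. HBase has to run on top of a Hadoop cluster and is complex to manage, whereas a Cassandra cluster is simple to set up. It can replace the HBase + Phoenix combination, and its write performance is better.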

For a Cassandra vs. MongoDB comparison, see: https://scalegrid.io/blog/cassandra-vs-mongodb/#

 

Very good learning material, covering day-to-day operations, upgrades, data migration, and architecture
http://zqhxuyuan.github.io/tags/cassandra

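Building a low-cost, highly available Cassandra cluster across multiple availability zones: http://chuansong.me/n/840485751454 . Useful for understanding Cassandra's high availability under different replication factors, data center layouts, and write strategies.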

Cassandra Note  https://chenhm.com/slides/cassandra/cassandra.html#true-columns

A very good series: https://www.flyml.net/2016/10/30/some-comments-on-column-family-database/

http://www.csdn.net/article/2014-10-24/2822278-how-to-bulida-spark-and-cassandra-based-high-performance-data-pipeline/2
[Wendi] Xu Peng: Building a high-performance data analytics platform with Spark + Cassandra (Part 1)

 

https://killrvideo.github.io/ is a complete Cassandra + C# sample web application and also a complete microservices case study, covering how to use docker + etcd and the full design of the Cassandra data model.

Spark machine learning notes, including several public datasets
http://blog.csdn.net/u013719780/article/details/51768720

Learn Apache Cassandra by Example with CDM( Cassandra Dataset Manager)
http://thelastpickle.com/blog/2016/09/21/learn-cassandra-by-example-with-cdm.html

Commonly used public datasets for machine learning: http://blog.csdn.net/u013719780/article/details/51768720
=========================================
Use cases
=========================================
www.tuicool.com/articles/RjUjUrB
https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis
http://stackoverflow.com/questions/2634955/when-not-to-use-cassandra
http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
http://www.edureka.co/blog/why-learn-cassandra-with-hadoop/


=========================================
Tutorials
=========================================
Chinese documentation
http://pimin.net/tags/Cassandra

http://rustyrazorblade.com/2015/08/migrating-from-mysql-to-cassandra-using-spark/
http://rustyrazorblade.com/2016/05/working-relationally-with-cassandra/

Spark SQL + Cassandra 
https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/spark/sparkTOC.html

http://www.planetcassandra.org/blog/the-new-analytics-toolbox-with-apache-spark-going-beyond-hadoop/

Cassandra Tutorial
https://intellipaat.com/tutorial/cassandra-tutorial/

Cassandra + PySpark DataFrames revisted
http://rustyrazorblade.com/2015/07/cassandra-pyspark-dataframes-revisted/


Developing with Python
http://datastax.github.io/python-driver/getting_started.html

http://slides.com/amberdoctor/getting-started-with-cassandra-python#/6

python on cassandra  
http://yyri.blog.163.com/blog/static/148943951201221983458871/

Working with Cassandra from Python
http://www.cnblogs.com/zhfan/p/4181529.html

http://pycon-2012-notes.readthedocs.io/en/latest/apache_cassandra_and_python.html
 


=========================================
Model
=========================================
Cassandra Modeling for Real-Time Analytics
http://www.datasciencecentral.com/profiles/blogs/cassandra-modeling-for-real-time-analytics


http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
http://www.datastax.com/dev/blog/thrift-to-cql3

cql3 for DataStax 2.0 & 2.1  https://docs.datastax.com/en/cql/3.1/pdf/cql31.pdf
Cassandra By Example: Data Modelling with CQL3  http://www.slideshare.net/jericevans/cassandra-by-example-data-modelling-with-cql3
Primary Key(Partitioning Key, clustering Key) http://www.planetcassandra.org/blog/primary-keys-in-cql/

http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/

http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/



=========================================
Data migration
=========================================
www.tuicool.com/articles/ie67Vn
http://www.svds.com/flexible-data-architecture-with-spark-cassandra-and-impala/
http://www.codeproject.com/Articles/279947/Migration-of-Relational-Data-structure-to-Cassandr

http://wiki.apache.org/cassandra/FAQ
http://wiki.apache.org/cassandra/Operations


=========================================
Internals
=========================================
Consistent hashing explained
http://www.cnblogs.com/haippy/archive/2011/12/10/2282943.html
https://my.oschina.net/xianggao/blog/394545
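
To make the idea behind the two articles above concrete, here is a minimal consistent-hash ring. It is a sketch under stated assumptions, not Cassandra's actual partitioner: MD5 is used only for illustration, the node names and the `HashRing` class are hypothetical, and replication to additional nodes is not modeled.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes  # virtual tokens per physical node
        self.ring = []        # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (token(f"{node}#{i}"), node))

    def owner(self, key):
        if not self.ring:
            raise ValueError("ring has no nodes")
        # First ring entry whose token is >= the key's token, wrapping around.
        idx = bisect.bisect_left(self.ring, (token(key),))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])  # hypothetical node names
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
print("every moved key now lives on node-d:", all(after[k] == "node-d" for k in moved))
```

The point of the exercise is the last line: adding a node only takes keys away from existing nodes and never shuffles keys between the old ones, which is what makes scaling out cheap.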


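The following article is about LevelDB, but many of Cassandra's concepts are similar, especially storage structures such as the in-memory tables and the log:
Data Analysis and Processing, Part 2 (LevelDB Implementation Internals)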
http://www.cnblogs.com/haippy/archive/2011/12/04/2276064.html

[Translation] A brief overview of the Cassandra architecture
http://www.cnblogs.com/hxdong/archive/2013/06/16/3135455.html
Reclaiming disk space after delete operations: http://www.sestevez.com/range-tombstones/
The gc_grace_seconds (GCGraceSeconds) parameter: the default setting is very conservative, at 10 days
Setting a compaction strategy for a table: https://www.instaclustr.com/blog/2016/01/27/apache-cassandra-compaction/
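
As a rough illustration of why deleted data does not free space right away, the sketch below computes when a tombstone's grace period ends under the 10-day default (864000 seconds). It models only the time window. Whether a compaction actually drops the tombstone depends on further conditions that are not modeled here, and the function names and the delete timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 10 days = 864000 seconds


def purgeable_after(deleted_at: datetime, gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Earliest moment at which a compaction may consider dropping the tombstone."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


def grace_period_elapsed(deleted_at: datetime, now: datetime,
                         gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True once the tombstone is older than gc_grace_seconds."""
    return now >= purgeable_after(deleted_at, gc_grace_seconds)


deleted_at = datetime(2016, 1, 1, tzinfo=timezone.utc)  # hypothetical delete time
for days in (1, 9, 10, 11):
    now = deleted_at + timedelta(days=days)
    print(f"{days:>2} days after delete: grace period elapsed = {grace_period_elapsed(deleted_at, now)}")
```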

Data modeling: http://www.devx.com/dbzone/cassandra-for-sql-developers.html
Data modeling: https://academy.datastax.com/resources/getting-started-time-series-data-modeling

Restrictions on SELECT statements: http://mechanics.flite.com/blog/2013/11/05/breaking-down-the-cql-where-clause/

Data modeling: http://rustyrazorblade.com/2015/08/migrating-from-mysql-to-cassandra-using-spark/
http://rustyrazorblade.com/2016/05/working-relationally-with-cassandra/
Cassandra + PySpark DataFrames revisted
http://rustyrazorblade.com/2015/07/cassandra-pyspark-dataframes-revisted/



=========================================
Installation
=========================================

Cassandra download
https://academy.datastax.com/planet-cassandra//cassandra/

Cassandra deployment and installation
http://dongxicheng.org/nosql/cassandra-install/


Adding a new data center to a Cassandra cluster
http://openwares.net/database/cassandra_add_new_datacenter.html



Strategies in Cassandra (partitioning, replication, consistency, storage, and so on)
http://dongxicheng.org/nosql/cassandra-strategy/

The Cassandra 0.7 configuration file explained in detail
http://www.cnblogs.com/gpcuster/archive/2010/11/12/1875388.html

A single-cluster Cassandra experiment with 2 nodes
http://blog.fens.me/cassandra-clustor/


Dynamically adding and removing Cassandra nodes
http://www.codes51.com/article/detail_430313.html



=========================================
Exercises
=========================================
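cqlsh> CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> USE mykeyspace;

-- Breaking Down the CQL Where Clause
-- http://mechanics.flite.com/blog/2013/11/05/breaking-down-the-cql-where-clause/
-- Composite partition key (weatherstation_id, date); event_time is the clustering column.
CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id, date), event_time)
);

-- Single-column partition key weatherstation_id; event_time is the clustering column.
CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);

-- Restricting only event_time is rejected unless ALLOW FILTERING is added,
-- so the full partition key is supplied first (the values are example data).
SELECT *
FROM temperature_by_day
WHERE weatherstation_id = '1234ABCD'
  AND date = '2013-04-03'
  AND event_time = '2013-04-03 06:00:00';

cqlsh> CREATE TABLE mytable (a INT PRIMARY KEY, b INT, c INT, d INT);
cqlsh> INSERT INTO mytable (a, b, c, d) VALUES (1, 2, 3, 4);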
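
The WHERE-clause rule that the temperature_by_day table is meant to exercise can be modeled in a few lines. This is a deliberately simplified sketch that only covers equality restrictions and ignores secondary indexes, IN, range operators, and ALLOW FILTERING. The function name is my own, and the column names follow the weather-station tables.

```python
def equality_query_allowed(partition_key, clustering_key, restricted):
    """Simplified check: may these columns appear as `col = value` in a WHERE clause?"""
    restricted = set(restricted)
    # Every partition key column must be restricted so the partition can be located.
    if not set(partition_key) <= restricted:
        return False
    remaining = restricted - set(partition_key)
    # Clustering columns may only be restricted as a prefix, in declaration order.
    for column in clustering_key:
        if column in remaining:
            remaining.discard(column)
        else:
            break
    # Anything left over is a skipped clustering column or a regular column.
    return not remaining


PARTITION_KEY = ("weatherstation_id", "date")
CLUSTERING_KEY = ("event_time",)

for columns in (
    {"event_time"},
    {"weatherstation_id"},
    {"weatherstation_id", "date"},
    {"weatherstation_id", "date", "event_time"},
    {"weatherstation_id", "date", "temperature"},
):
    allowed = equality_query_allowed(PARTITION_KEY, CLUSTERING_KEY, columns)
    print(sorted(columns), "->", "allowed" if allowed else "rejected")
```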


