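Hadoop (AP): Introduction to Cassandra

Introduction

Apache Cassandra is an open-source distributed NoSQL database system. It was originally developed at Facebook to store simply structured data such as inbox messages, and it combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008; thanks to its good scalability, it was then adopted by well-known Web 2.0 sites such as Digg and Twitter and became a popular solution for distributed structured data storage.
It is an open-source, distributed, decentralized, horizontally scalable, highly available NoSQL database of the key-value type.
Official documentation: http://cassandra.apache.org/doc/latest/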

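Versions

Apache Cassandra: the latest version at the time of writing is 3.0.13.
DataStax Community Edition (based on Apache Cassandra; recommended):
- Supports quick installation via yum and rpm
- Easy to install and upgrade with yum
- Directories and environment are configured automatically
- Integrates easily with OpsCenter
CQL development tool: DevCenter
Operations tool: OpsCenter
DataStax is the company that commercializes Cassandra.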

System architecture

The architecture combines two designs: Amazon's Dynamo and Google's BigTable.

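Consistency

What does consistency mean for a database? A read always returns the most recently written result.
Cassandra is eventually consistent (weak consistency): after a successful write, a read does not necessarily return the latest data, but after a short period (milliseconds, longer across data centers) all replicas converge to the same value.
Cassandra chooses eventual consistency to optimize write performance, and it supports consistency levels such as ONE, QUORUM, and ALL.
Cassandra's consistency is tunable: when the number of nodes that must acknowledge a write equals the number of replicas, that is, at level ALL, the write is considered strongly consistent.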
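
The consistency levels above can be turned into simple arithmetic. The following is a minimal sketch, not Cassandra code: the function names are made up for illustration, and it relies on the commonly cited rule that a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor (R + W > RF).

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at the given level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, writing at ALL and reading at ONE passes the check, as does QUORUM with QUORUM, while ONE with ONE does not.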

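CAP theorem

The CAP theorem states that a distributed system can fully guarantee only two of the following three properties:
- Consistent: consistency; every read returns the latest data
- Available: availability; clients can always read and write data
- Partition Tolerant: partition tolerance; the data is spread across multiple machines, and the system keeps serving requests even if a machine fails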
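
The trade-off between consistency and availability can be made concrete by asking which consistency levels can still be served when some replicas are down. This is an illustrative sketch only; `satisfiable_levels` is a hypothetical helper, and it considers just the three levels ONE, QUORUM, and ALL for a single replica set.

```python
from typing import List


def satisfiable_levels(replication_factor: int, replicas_down: int) -> List[str]:
    """Consistency levels that can still be met with the replicas left alive."""
    alive = replication_factor - replicas_down
    needed = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    # A level is satisfiable when enough replicas remain to acknowledge it.
    return [level for level, count in needed.items() if alive >= count]
```

With three replicas and one of them down, requests at ONE and QUORUM still succeed but ALL does not: the stricter the consistency requirement, the less failure the system tolerates while staying available.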

Programming driver

The DataStax Java Driver for Apache Cassandra is a Java driver for Apache Cassandra. It supports Cassandra Query Language version 3 (CQL3) and Cassandra's binary protocol. Its main modules are:
- driver-core: the core layer
- driver-mapping: object mapping
- driver-extras: optional features of the Java driver
- driver-examples
- driver-tests
Driver documentation: http://docs.datastax.com/en/developer/java-driver/3.0/

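Comparison with relational databases

Feature               Cassandra                                                                     Relational database
Horizontal scaling    Yes
High availability     Yes
Query method          CQL (SQL-like), API
Consistency           Tunable consistency
Transaction support   1.x supports row-level transactions; 2.x supports lightweight transactions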

Cassandra overview and data model design: https://wenku.baidu.com/view/8eaabe6987c24028915fc386.html

DB-Engines database ranking: https://db-engines.com/en/ranking
HBase ranks 15th.

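Query language (CQL)

If you are unfamiliar with a command, enter the cqlsh command-line shell and type help to see the supported commands.

Data types

CQL is an SQL-like query language that supports a fairly rich set of data types, as follows.
cql_type ::= native_type | collection_type | user_defined_type | tuple_type | custom_type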

native_type

native_type ::=  ASCII
                 | BIGINT
                 | BLOB
                 | BOOLEAN
                 | COUNTER
                 | DATE
                 | DECIMAL
                 | DOUBLE
                 | DURATION
                 | FLOAT
                 | INET
                 | INT
                 | SMALLINT
                 | TEXT
                 | TIME
                 | TIMESTAMP
                 | TIMEUUID
                 | TINYINT
                 | UUID
                 | VARCHAR
                 | VARINT

The mapping between database data types and Java data types is as follows.

CQL3 data type        Getter name      Java type
ascii                 getString        java.lang.String
bigint                getLong          long
blob                  getBytes         java.nio.ByteBuffer
boolean               getBool          boolean
counter               getLong          long
date                  getDate          LocalDate
decimal               getDecimal       java.math.BigDecimal
double                getDouble        double
float                 getFloat         float
inet                  getInet          java.net.InetAddress
int                   getInt           int
list                  getList          java.util.List
map                   getMap           java.util.Map
set                   getSet           java.util.Set
smallint              getShort         short
text                  getString        java.lang.String
time                  getTime          long
timestamp             getTimestamp     java.util.Date
timeuuid              getUUID          java.util.UUID
tinyint               getByte          byte
tuple                 getTupleValue    TupleValue
user-defined types    getUDTValue      UDTValue
uuid                  getUUID          java.util.UUID
varchar               getString        java.lang.String
varint                getVarint        java.math.BigInteger

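Creating a keyspace

Cassandra's storage abstractions parallel those of a relational database: a keyspace corresponds to a database or schema, and a column family corresponds to a table.
For example:
CREATE KEYSPACE iotstp WITH replication = {'class': 'SimpleStrategy','replication_factor': 1};
Other operations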

ALTER KEYSPACE iotstp WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 4};
use iotstp;
DROP KEYSPACE iotstp;

To list the existing keyspaces, use the command desc keyspaces
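
When keyspaces are created from a script, the statement can be assembled from its two variable parts, the name and the replication factor. The helper below is a hypothetical sketch (it is not part of any Cassandra driver) that only builds the CQL text for SimpleStrategy; the name check is a deliberately rough assumption, stricter than real CQL identifier rules.

```python
def create_keyspace_cql(name: str, replication_factor: int,
                        if_not_exists: bool = True) -> str:
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy."""
    # Rough sanity check so that arbitrary text is not pasted into CQL.
    if not name.isidentifier():
        raise ValueError(f"invalid keyspace name: {name!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    guard = "IF NOT EXISTS " if if_not_exists else ""
    return (
        f"CREATE KEYSPACE {guard}{name} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )
```

Calling create_keyspace_cql("iotstp", 1, if_not_exists=False) produces the same statement as the CREATE KEYSPACE example in this section, apart from spacing.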

Creating a table

Example:

CREATE TABLE IF NOT EXISTS iotstp.user (
    id timeuuid,
    tenant_id timeuuid,
    email text,
    additional_info text,
    PRIMARY KEY (id, tenant_id)
);
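
In PRIMARY KEY (id, tenant_id), the first element (id) is the partition key, which decides where a row is stored, and the remaining columns (tenant_id) are clustering columns, which order rows inside a partition. A composite partition key is written with an extra pair of parentheses, as in PRIMARY KEY ((a, b), c). The sketch below splits such a definition; `split_primary_key` is a hypothetical helper that assumes plain, unquoted column names.

```python
from typing import List, Tuple


def split_primary_key(definition: str) -> Tuple[List[str], List[str]]:
    """Split the text inside PRIMARY KEY (...) into
    (partition key columns, clustering columns)."""
    text = definition.strip()
    if text.startswith("("):
        # Composite partition key, e.g. "(a, b), c, d".
        close = text.index(")")
        partition = [col.strip() for col in text[1:close].split(",")]
        rest = text[close + 1:]
    else:
        # Single-column partition key, e.g. "id, tenant_id".
        first, _, rest = text.partition(",")
        partition = [first.strip()]
    clustering = [col.strip() for col in rest.split(",") if col.strip()]
    return partition, clustering
```

For the iotstp.user table, split_primary_key("id, tenant_id") yields id as the partition key and tenant_id as the only clustering column.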

Modification examples:

alter_table_statement   ::=  ALTER TABLE table_name alter_table_instruction
alter_table_instruction ::=  ADD column_name cql_type ( ',' column_name cql_type )*
                             | DROP column_name ( column_name )*
                             | WITH options
ALTER TABLE iotstp.user ADD address varchar;

ALTER TABLE iotstp.user
       WITH comment = 'A most excellent and useful table'
       AND read_repair_chance = 0.2;

Dropping a table

drop_table_statement ::=  DROP TABLE [ IF EXISTS ] table_name

Truncating a table

truncate_statement ::=  TRUNCATE [ TABLE ] table_name

To list the existing tables, use the command desc tables

Data operations

SELECT queries

select_statement ::=  SELECT [ JSON | DISTINCT ] ( select_clause | '*' )
                      FROM table_name
                      [ WHERE where_clause ]
                      [ GROUP BY group_by_clause ]
                      [ ORDER BY ordering_clause ]
                      [ PER PARTITION LIMIT (integer | bind_marker) ]
                      [ LIMIT (integer | bind_marker) ]
                      [ ALLOW FILTERING ]
select_clause    ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )*
selector         ::=  column_name
                      | term
                      | CAST '(' selector AS cql_type ')'
                      | function_name '(' [ selector ( ',' selector )* ] ')'
                      | COUNT '(' '*' ')'
where_clause     ::=  relation ( AND relation )*
relation         ::=  column_name operator term
                      | '(' column_name ( ',' column_name )* ')' operator tuple_literal
                      | TOKEN '(' column_name ( ',' column_name )* ')' operator term
operator         ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
group_by_clause  ::=  column_name ( ',' column_name )*
ordering_clause  ::=  column_name [ ASC | DESC ] ( ',' column_name [ ASC | DESC ] )*
SELECT name, occupation FROM users WHERE userid IN (199, 200, 207);
SELECT JSON name, occupation FROM users WHERE userid = 199;
SELECT name AS user_name, occupation AS user_occupation FROM users;

SELECT time, value
FROM events
WHERE event_type = 'myEvent'
  AND time > '2011-02-03'
  AND time <= '2012-01-01';

SELECT COUNT (*) AS user_count FROM users;
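
The grammar offers two limits: PER PARTITION LIMIT caps the rows returned from each partition, while LIMIT caps the total number of rows. The following in-memory sketch illustrates the difference; it is not how Cassandra executes a query, and `apply_limits` plus the sample rows are made up for illustration. It assumes the rows arrive already grouped by partition key.

```python
from itertools import groupby, islice
from typing import Dict, List, Optional


def apply_limits(rows: List[Dict], partition_key: str,
                 per_partition_limit: Optional[int] = None,
                 limit: Optional[int] = None) -> List[Dict]:
    """Apply PER PARTITION LIMIT, then LIMIT, to rows grouped by partition."""
    result = []
    for _, partition_rows in groupby(rows, key=lambda row: row[partition_key]):
        # islice(..., None) keeps every row of the partition.
        result.extend(islice(partition_rows, per_partition_limit))
    # A slice ending at None keeps every row.
    return result[:limit]


events = [
    {"event_type": "a", "time": 1},
    {"event_type": "a", "time": 2},
    {"event_type": "a", "time": 3},
    {"event_type": "b", "time": 1},
    {"event_type": "b", "time": 2},
]
```

With per_partition_limit=2 the sample returns two rows for each event_type (four in total); adding limit=3 trims that result to its first three rows.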

INSERT

insert_statement ::=  INSERT INTO table_name ( names_values | json_clause )
                      [ IF NOT EXISTS ]
                      [ USING update_parameter ( AND update_parameter )* ]
names_values     ::=  names VALUES tuple_literal
json_clause      ::=  JSON string [ DEFAULT ( NULL | UNSET ) ]
names            ::=  '(' column_name ( ',' column_name )* ')'
INSERT INTO NerdMovies (movie, director, main_actor, year)
                VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
      USING TTL 86400;
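
USING TTL 86400 gives the inserted values a time to live of 86400 seconds (one day), after which they expire. The arithmetic can be sketched as follows; the function names are hypothetical and the write time is passed in explicitly instead of being read from a cluster.

```python
from datetime import datetime, timedelta, timezone


def expires_at(written_at: datetime, ttl_seconds: int) -> datetime:
    """Moment at which a value written with the given TTL expires."""
    return written_at + timedelta(seconds=ttl_seconds)


def is_expired(written_at: datetime, ttl_seconds: int, now: datetime) -> bool:
    """True once the TTL has fully elapsed."""
    return now >= expires_at(written_at, ttl_seconds)


written = datetime(2017, 5, 25, 11, 22, 38, tzinfo=timezone.utc)
```

A value written at `written` with TTL 86400 is still readable one hour later and expires exactly 24 hours after the write.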

INSERT INTO NerdMovies JSON '{"movie": "Serenity",
                              "director": "Joss Whedon",
                              "year": 2005}';

UPDATE

update_statement ::=  UPDATE table_name
                      [ USING update_parameter ( AND update_parameter )* ]
                      SET assignment ( ',' assignment )*
                      WHERE where_clause
                      [ IF ( EXISTS | condition ( AND condition )*) ]
update_parameter ::=  ( TIMESTAMP | TTL ) ( integer | bind_marker )
assignment       ::=  simple_selection '=' term
                     | column_name '=' column_name ( '+' | '-' ) term
                     | column_name '=' list_literal '+' column_name
simple_selection ::=  column_name
                     | column_name '[' term ']'
                     | column_name '.' field_name
condition        ::=  simple_selection operator term
UPDATE NerdMovies USING TTL 400
   SET director   = 'Joss Whedon',
       main_actor = 'Nathan Fillion',
       year       = 2005
 WHERE movie = 'Serenity';

UPDATE UserActions
   SET total = total + 2
   WHERE user = B70DE1D0-9908-4AE3-BE34-5573E5B09F14
     AND action = 'click';
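
The TIMESTAMP update parameter matters because Cassandra reconciles conflicting writes to the same column by write timestamp: the value with the newer timestamp wins, regardless of the order in which the writes arrive. The sketch below models that rule for a single row held in a dictionary. It is a simplification under stated assumptions: `apply_write` is a hypothetical name, and writes with equal timestamps simply keep the existing value here, which does not reproduce the real tie-breaking rule.

```python
from typing import Any, Dict, Tuple

Row = Dict[str, Tuple[Any, int]]  # column name -> (value, write timestamp)


def apply_write(row: Row, column: str, value: Any, timestamp: int) -> None:
    """Store the value only if it is newer than what the row already holds."""
    current = row.get(column)
    if current is None or timestamp > current[1]:
        row[column] = (value, timestamp)


movie: Row = {}
apply_write(movie, "director", "Joss Whedon", timestamp=10)
apply_write(movie, "director", "Someone Else", timestamp=5)  # older, ignored
```

After these two writes the row still holds the value written with timestamp 10, because the second write carries an older timestamp.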

DELETE

delete_statement ::=  DELETE [ simple_selection ( ',' simple_selection )* ]
                      FROM table_name
                      [ USING update_parameter ( AND update_parameter )* ]
                      WHERE where_clause
                      [ IF ( EXISTS | condition ( AND condition )*) ]
DELETE FROM NerdMovies USING TIMESTAMP 1240003134
 WHERE movie = 'Serenity';

DELETE phone FROM Users
 WHERE userid IN (C73DE1D3-AF08-40F3-B124-3FF3E5109F22, B70DE1D0-9908-4AE3-BE34-5573E5B09F14);

Updates and deletes are supported only by primary key; that is, the WHERE clause must include the primary key columns.
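
An application can check this rule before sending an UPDATE or DELETE. The sketch below is a simplified illustration: `missing_key_columns` is a hypothetical helper, and it assumes every primary key column is required, which is stricter than what Cassandra accepts for some DELETE statements (for example, deleting a whole partition by its partition key alone).

```python
from typing import Iterable, List


def missing_key_columns(primary_key: Iterable[str],
                        where_columns: Iterable[str]) -> List[str]:
    """Primary key columns that the WHERE clause does not restrict."""
    restricted = set(where_columns)
    return [column for column in primary_key if column not in restricted]
```

For a table whose primary key is (id, tenant_id), a WHERE clause that restricts only id leaves tenant_id missing, and a clause that restricts only email leaves both key columns missing.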

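Materialized views

Installation

DataStax Community Edition

If you use the free DataStax Community Edition, the download address is https://academy.datastax.com/planet-cassandra//cassandra/; I downloaded the Windows 3.0.9 version.

Configuration

After Cassandra is installed, its configuration files are in the conf directory under the installation directory, for example F:\Program Files\DataStax Community\apache-cassandra\conf on my machine. The configuration file is named cassandra.yaml; the two parts below matter most.
- Main runtime properties
a) cluster_name: the cluster name; all nodes of the same cluster must use the same name
b) seeds: the seed nodes; a comma-separated list of the IP addresses of the nodes that other nodes contact when joining the cluster (it does not have to list every machine)
c) storage_port: the port used for connections between Cassandra servers; it usually does not need to be changed, but make sure no firewall blocks it
d) listen_address: the address that Cassandra servers in the cluster use to communicate with each other; if left blank, the server's hostname is used
e) native_transport_port: the port of the CQL native transport, on which CQL clients communicate with the server
- Changing the location of directories
a) data_file_directories: one or more directories where data files are stored
b) commitlog_directory: the directory where commit log files are stored
c) saved_caches_directory: the directory where saved caches are stored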
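
Two of the rules above, the shared cluster_name and the presence of seed nodes, are easy to verify before starting a cluster. The sketch below assumes the relevant settings of each node have already been read from cassandra.yaml into a dictionary; `check_cluster` and `effective_ports` are hypothetical helpers, and 7000 and 9042 are the default values of storage_port and native_transport_port.

```python
from typing import Dict, List, Tuple

DEFAULT_STORAGE_PORT = 7000            # default storage_port
DEFAULT_NATIVE_TRANSPORT_PORT = 9042   # default native_transport_port


def effective_ports(node: Dict[str, object]) -> Tuple[object, object]:
    """(storage_port, native_transport_port), falling back to the defaults."""
    return (
        node.get("storage_port", DEFAULT_STORAGE_PORT),
        node.get("native_transport_port", DEFAULT_NATIVE_TRANSPORT_PORT),
    )


def check_cluster(nodes: List[Dict[str, object]]) -> List[str]:
    """Return human-readable problems found in the node settings."""
    problems = []
    # All nodes of one cluster must share the same cluster_name.
    names = sorted({str(node["cluster_name"]) for node in nodes})
    if len(names) > 1:
        problems.append("cluster_name differs between nodes: " + ", ".join(names))
    # Every node needs at least one seed to contact.
    for node in nodes:
        seeds = [s.strip() for s in str(node.get("seeds", "")).split(",") if s.strip()]
        if not seeds:
            problems.append(f"{node['listen_address']}: no seeds configured")
    return problems
```

An empty list from check_cluster means both rules hold for the given nodes; it does not validate anything else in the configuration.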

Apache

Download address: http://cassandra.apache.org/download/

Cassandra: Introduction and Usage

May 25, 2017, 11:22:38


