http://wayneshawn.github.io/2015/04/07/Cassandra-get-started/
Online Resources
Cassandra Getting Started
- 2010-07-15 Distributed Key-Value Storage System: Getting Started with Cassandra
- 2015-03-25 Apache Cassandra Wiki
- DataStax Documentation
- Cassandra 2.x Chinese tutorial series (blog)
Python Cassandra-driver
Single-Node Cassandra Walkthrough
1. Start Cassandra
If the environment variables are not set, change into Cassandra's bin directory first.
[root@server1 bin]# ./cassandra -f
Without the -f option, Cassandra runs as a daemon process in the background.
2. Connect to the local Cassandra with cqlsh
[root@server1 bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use mykeyspace ;
cqlsh:mykeyspace> create table users( name text primary key, age int, email text );
cqlsh:mykeyspace> insert into users(name, age, email) values('wayne', 21, 'leon_sin@126.com');
cqlsh:mykeyspace> insert into users(name, age, email) values('kerr', 22, 'singleon@126.com');
cqlsh:mykeyspace>
cqlsh:mykeyspace> select * from users;
name | age | email
--------+-----+-------------------
kerr | 22 | singleon@126.com
lambda | 20 | 2270891001@qq.com
wayne | 21 | leon_sin@126.com
CQL stands for Cassandra Query Language.
3. Cassandra-driver example: cassandraDriverTest.py
from __future__ import print_function  # lets print() behave the same on Python 2 and 3

from cassandra.cluster import Cluster

# Connect to the local node and select the keyspace
cluster = Cluster()
session = cluster.connect('mykeyspace')

# 1. Use %s for all types of arguments
# 2. The second argument should be a sequence; a one-element tuple is ('blah',)
session.execute('INSERT INTO users(name, age, email) VALUES(%s, %s, %s)', ('shawn', 21, 'shawn@163.com'))

rows = session.execute('SELECT name, age, email FROM users')
for (name, age, email) in rows:
    print(name, age, email)

# Close the connections when done
cluster.shutdown()
4. Stop the Cassandra process
Use ps -ef|grep cassandra to find its process ID, then kill it.
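One detail worth knowing about ps -ef|grep cassandra is that the grep process itself also matches. The following is a minimal sketch, not part of the original steps: a hypothetical helper named cassandra_pids that extracts PIDs from ps -ef text and skips the grep line. It assumes the usual eight-column ps -ef layout (UID PID PPID C STIME TTY TIME CMD).

```python
def cassandra_pids(ps_output, keyword="cassandra"):
    """Return PIDs from `ps -ef` text whose command mentions `keyword`.

    Assumes the usual 8-column layout: UID PID PPID C STIME TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header row and anything too short to be a process line
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        command = fields[7]
        # The grep used for searching matches its own command line; ignore it
        if command.startswith("grep"):
            continue
        if keyword in command.lower():
            pids.append(int(fields[1]))
    return pids
```

The text could come from subprocess.check_output(["ps", "-ef"]), and each returned PID could then be passed to os.kill(pid, signal.SIGTERM), which is what a plain kill pid sends.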
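A Simple Two-Node Cassandra Cluster Configuration
References
- Initializing a multiple node cluster (single data center)
- Simple Cassandra cluster configuration
0. Test environment
VMware 9.0.2, CentOS 6.5 64-bit, Cassandra 2.0.13
1. Assume Cassandra is already installed on both of the following systems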
node0 192.168.56.100 (seed)
node1 192.168.56.201
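2. Change the firewall settings, or simply turn the firewall off
On CentOS, run $setup to open the settings screen (a graphical interface), where the firewall can be turned off.
3. Stop the Cassandra process and clear the data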
$ps -ef|grep cassandra
$kill pid
$rm -rf /var/lib/cassandra/data/system/*
4. Edit /conf/cassandra.yaml
node0:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.100"
listen_address: 192.168.56.100
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
node1:
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.100"
listen_address: 192.168.56.201
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
5. Edit /conf/cassandra-rackdc.properties
For example:
# indicate the rack and dc for this node
dc=DC1
rack=RAC1
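6. Start Cassandra
In my experiment, node0's hostname is master and node1's hostname is slave1. They are named this way because I originally set the machines up by following a Hadoop cluster configuration tutorial. For a Cassandra cluster built on VMware, what matters is having two virtual machines that can ping each other.
Start Cassandra on node0 first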
[root@master bin]# ./cassandra
Then start Cassandra on node1
[root@slave1 bin]# ./cassandra
7. Check that the ring is running
Every node listed should have the status UN (Up, Normal).
[root@master bin]# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.56.201 74.89 KB 256 100.0% e6121751-682e-4833-8de7-718eac08e718 RAC1
UN 192.168.56.100 105.21 KB 256 100.0% a153a679-5add-4995-adbf-
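The UN check can also be scripted. The following is a rough sketch under stated assumptions: node_states and all_up_normal are hypothetical helper names, and the parser relies only on the layout shown above, where each node row starts with a two-letter code (Up/Down followed by Normal/Leaving/Joining/Moving) and then the address.

```python
import re

# Matches node rows such as "UN  192.168.56.201  74.89 KB ..."
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output):
    """Map each node address to its two-letter status code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(1)
    return states


def all_up_normal(status_output):
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_output)
    return bool(states) and all(code == "UN" for code in states.values())
```

Header lines such as "Datacenter: DC1" or "Status=Up/Down" do not match the pattern, so only the node rows are counted.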
8. Test
In the earlier single-node test, I had already inserted 4 records into the users table of mykeyspace.
Now we insert a fifth record on node0.
[root@master bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use mykeyspace ;
cqlsh:mykeyspace> select * from users;
name | age | email
--------+-----+-------------------
kerr | 22 | singleon@126.com
lambda | 20 | 2270891001@qq.com
wayne | 21 | leon_sin@126.com
shawn | 21 | shawn@163.com
(4 rows)
cqlsh:mykeyspace> insert into users(name, age, email) values('slave', 40, 'zwxx@126.com');
cqlsh:mykeyspace> select * from users;
name | age | email
--------+-----+-------------------
slave | 40 | zwxx@126.com
kerr | 22 | singleon@126.com
lambda | 20 | 2270891001@qq.com
wayne | 21 | leon_sin@126.com
shawn | 21 | shawn@163.com
(5 rows)
cqlsh:mykeyspace>
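Next, we query from node1. Because node1 was created from master with VMware's clone feature and then modified accordingly, node1 initially also had the 4 records in the users table. Now we verify that one more record has been added.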
[root@slave1 bin]# ./cassandra-cli -h 192.168.56.201
Connected to: "Test Cluster" on 192.168.56.201/9160
Welcome to Cassandra CLI version 2.0.13
The CLI is deprecated and will be removed in Cassandra 3.0. Consider migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@mykeyspace]
[default@mykeyspace] list users;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: slave
=> (name=, value=, timestamp=1428733896613000)
=> (name=age, value=00000028, timestamp=1428733896613000)
=> (name=email, value=7a777878403132362e636f6d, timestamp=1428733896613000)
-------------------
RowKey: kerr
=> (name=, value=, timestamp=1428733672723000)
=> (name=age, value=00000016, timestamp=1428733672723000)
=> (name=email, value=73696e676c656f6e403132362e636f6d, timestamp=1428733672723000)
-------------------
RowKey: lambda
=> (name=, value=, timestamp=1428414359621000)
=> (name=age, value=00000014, timestamp=1428414359621000)
=> (name=email, value=323237303839313030314071712e636f6d, timestamp=1428414359621000)
-------------------
RowKey: wayne
=> (name=, value=, timestamp=1428733660801000)
=> (name=age, value=00000015, timestamp=1428733660801000)
=> (name=email, value=6c656f6e5f73696e403132362e636f6d, timestamp=1428733660801000)
-------------------
RowKey: shawn
=> (name=, value=, timestamp=1428417278072000)
=> (name=age, value=00000015, timestamp=1428417278072000)
=> (name=email, value=736861776e403136332e636f6d, timestamp=1428417278072000)
5 Rows Returned.
Elapsed time: 572 msec(s).
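cassandra-cli prints the column values as raw hex, so the listing above is hard to read at a glance. As a small sketch (Python 3, with hypothetical helper names), the values can be decoded on the assumption that a CQL int is a 4-byte big-endian signed integer and a CQL text value is UTF-8:

```python
import struct


def decode_int(hex_value):
    """Decode a CQL int shown as hex (4-byte big-endian signed integer)."""
    return struct.unpack(">i", bytes.fromhex(hex_value))[0]


def decode_text(hex_value):
    """Decode a CQL text value shown as hex (UTF-8 bytes)."""
    return bytes.fromhex(hex_value).decode("utf-8")


# Values taken from the 'slave' row in the listing above
print(decode_int("00000028"))                   # 40
print(decode_text("7a777878403132362e636f6d"))  # zwxx@126.com
```

The decoded values match what cqlsh showed for the record inserted on node0.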
Running cassandraDriverTest.py also shows the newly added record 'slave':
[Kerr@slave1 ~]$ python cassandraDriverTest.py
slave 40 zwxx@126.com
kerr 22 singleon@126.com
lambda 20 2270891001@qq.com
wayne 21 leon_sin@126.com
shawn 21 shawn@163.com
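Address Issues in Multi-Node Cassandra Configuration
Scenario: a 3-node Cassandra cluster with IPs 172.16.37.17, 172.16.37.18, and 172.16.37.19 (the seed is 172.16.37.18). If Cassandra is started only on .18 and .19, can node .17 connect to the database with Cassandra-driver and run queries? (The nodes can all ping one another.)
Configuration 1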
IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

cassandra-driver test program on node .17
from __future__ import print_function  # lets print() behave the same on Python 2 and 3

from cassandra.cluster import Cluster

# Contact points are the two nodes where Cassandra is running
cluster = Cluster(['c37b18', 'c37b19'])
session = cluster.connect('lsflog')
res = session.execute('SELECT * FROM jcleanlog')
print(res)

Result:
    session = cluster.connect('lsflog')
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 756, in connect
    self.control_connection.connect()
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1867, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1902, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'c37b18': error(111, "Tried connecting to [('172.16.37.18', 9042)]. Last error: Connection refused"), 'c37b19': error(111, "Tried connecting to [('172.16.37.19', 9042)]. Last error: Connection refused")})
Background (added 2015-05-13)
broadcast_rpc_address
- The broadcast_rpc_address should be an IP address that drivers/clients can connect to. (link)
- RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If left blank, this will be set to the value of rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must be set. (/conf/cassandra.yaml)
- If broadcast_rpc_address is not set, it defaults to the configured rpc_address.
rpc_address
- unset: Resolves the address using the hostname configuration of the node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.
- 0.0.0.0: Listens on all configured interfaces, but you must set the broadcast_rpc_address to a value other than 0.0.0.0.
- IP address
- hostname
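The interaction between the two settings can be written down as a small rule check. This is only an illustrative sketch of the rules quoted above, not Cassandra's own code; effective_broadcast_rpc_address is a hypothetical function name.

```python
def effective_broadcast_rpc_address(rpc_address, broadcast_rpc_address=None):
    """Return the address a node would advertise to drivers, per the quoted rules."""
    # Rule: broadcast_rpc_address can never be 0.0.0.0
    if broadcast_rpc_address == "0.0.0.0":
        raise ValueError("broadcast_rpc_address cannot be 0.0.0.0")
    # An explicit broadcast_rpc_address wins
    if broadcast_rpc_address:
        return broadcast_rpc_address
    # Rule: when rpc_address is 0.0.0.0, broadcast_rpc_address must be set
    if rpc_address == "0.0.0.0":
        raise ValueError("rpc_address is 0.0.0.0, so broadcast_rpc_address must be set")
    # Rule: if left blank, it falls back to rpc_address
    return rpc_address
```

With rpc_address: localhost and nothing else set, the function returns localhost, an address that only makes sense on the node itself. That would be consistent with the Connection refused on port 9042 seen from node .17 under Configuration 1.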
Cassandra Port Usage
(link)
- 7199 - JMX (was 8080 pre Cassandra 0.8.xx)
- 7000 - Internode communication (not used if TLS enabled)
- 7001 - TLS Internode communication (used if TLS enabled)
- 9160 - Thrift client API
- 9042 - CQL native transport port
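When a client reports Connection refused, it can help to test these ports directly from the client machine before looking at the driver. Below is a minimal sketch using only the standard library (Python 3); port_is_open and check_ports are hypothetical helper names, and the port table simply mirrors the list above.

```python
import socket

# Port numbers and labels copied from the list above
CASSANDRA_PORTS = {
    7199: "JMX",
    7000: "Internode communication",
    7001: "TLS internode communication",
    9160: "Thrift client API",
    9042: "CQL native transport",
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


def check_ports(host, ports=CASSANDRA_PORTS):
    """Return {port: reachable} for every port in `ports`."""
    return {port: port_is_open(host, port) for port in ports}
```

For instance, check_ports("172.16.37.18") run from node .17 would show whether 9042 and 7199 accept remote connections at all. A False only says the TCP connection failed; it does not distinguish a firewall from a service bound to localhost.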
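Using nodetool
- Running ./nodetool <-h node2-ip> from node1 fails with Connection refused
So far I can only use ./nodetool status on a node where Cassandra is running.
For example, when I specify -h 172.16.37.18 from node .17, it fails with: Failed to connect to '172.16.37.18:7199' - ConnectException: 'Connection refused'.
Notably, on node .18 itself:
- ./nodetool status works
- ./nodetool -h 172.16.37.18 status fails with Connection refused
- ./nodetool -h localhost status works
This appears to be related to the JMX settings.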
stackoverflow problem1
/conf/cassandra-env.sh contains the following:
# jmx: metrics and administration interface
#
# add this if you're having trouble connecting:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
#
# see
# https://blogs.oracle.com/jmxetc/entry/troubleshooting_connection_problems_in_jconsole
# for more on configuring JMX through firewalls, etc. (Short version:
# get it working with no firewall first.)
#
# Cassandra ships with JMX accessible *only* from localhost.
# To enable remote JMX connections, uncomment lines below
# with authentication and/or ssl enabled. See https://wiki.apache.org/cassandra/JmxSecurity
#
LOCAL_JMX=yes

if [ "$LOCAL_JMX" = "yes" ]; then
  JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
else
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
fi
Note the statement above: JMX accessible *only* from localhost.
I tried commenting out LOCAL_JMX=yes and also the subsequent statements that require authentication, but it still fails with: Error: Password file not found: /etc/cassandra/jmxremote.password
I still need to read more of the JMX documentation.
Configuration 2
IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.18
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.17
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.19
endpoint_snitch: GossipingPropertyFileSnitch

Result of running the cassandra-driver test program on node .17:
[Row(job_id=1, event_time=2, idx=0)]