缓存数据库介绍
NoSQL(NoSQL = Not Only SQL ),意即“不仅仅是SQL”,泛指非关系型的数据库,随着互联网web2.0网站的兴起,传统的关系数据库在应付web2.0网站,特别是超大规模和高并发的SNS类型的web2.0纯动态网站已经显得力不从心,暴露了很多难以克服的问题,而非关系型的数据库则由于其本身的特点得到了非常迅速的发展。NoSQL数据库的产生就是为了解决大规模数据集合多重数据种类带来的挑战,尤其是大数据应用难题。
NoSQL数据库的四大分类
这一类数据库主要会使用到一个哈希表,这个表中有一个特定的键和一个指针指向特定的数据。Key/value模型对于IT系统来说的优势在于简单、易部署。但是如果DBA只对部分值进行查询或更新的时候,Key/value就显得效率低下了。[3] 举例如:Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB.
列存储数据库。
这部分数据库通常是用来应对分布式存储的海量数据。键仍然存在,但是它们的特点是指向了多个列。这些列是由列家族来安排的。如:Cassandra, HBase, Riak.
文档型数据库
文档型数据库的灵感是来自于Lotus Notes办公软件的,而且它同第一种键值存储相类似。该类型的数据模型是版本化的文档,半结构化的文档以特定的格式存储,比如JSON。文档型数据库可 以看作是键值数据库的升级版,允许之间嵌套键值。而且文档型数据库比键值数据库的查询效率更高。如:CouchDB, MongoDb. 国内也有文档型数据库SequoiaDB,已经开源。
图形(Graph)数据库
图形结构的数据库同其他行列以及刚性结构的SQL数据库不同,它是使用灵活的图形模型,并且能够扩展到多个服务器上。NoSQL数据库没有标准的查询语言(SQL),因此进行数据库查询需要制定数据模型。许多NoSQL数据库都有REST式的数据接口或者查询API。[2] 如:Neo4J, InfoGrid, Infinite Graph.
因此,我们总结NoSQL数据库在以下的这几种情况下比较适用:1、数据模型比较简单;2、需要灵活性更强的IT系统;3、对数据库性能要求较高;4、不需要高度的数据一致性;5、对于给定key,比较容易映射复杂值的环境。
NoSQL数据库的四大分类表格分析
分类 | Examples举例 | 典型应用场景 | 数据模型 | 优点 | 缺点 |
---|---|---|---|---|---|
键值(key-value)[3] | Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB | 内容缓存,主要用于处理大量数据的高访问负载,也用于一些日志系统等等。[3] | Key 指向 Value 的键值对,通常用hash table来实现[3] | 查找速度快 | 数据无结构化,通常只被当作字符串或者二进制数据[3] |
列存储数据库[3] | Cassandra, HBase, Riak | 分布式的文件系统 | 以列簇式存储,将同一列数据存在一起 | 查找速度快,可扩展性强,更容易进行分布式扩展 | 功能相对局限 |
文档型数据库[3] | CouchDB, MongoDb | Web应用(与Key-Value类似,Value是结构化的,不同的是数据库能够了解Value的内容) | Key-Value对应的键值对,Value为结构化数据 | 数据结构要求不严格,表结构可变,不需要像关系型数据库一样需要预先定义表结构 | 查询性能不高,而且缺乏统一的查询语法。 |
图形(Graph)数据库[3] | Neo4J, InfoGrid, Infinite Graph | 社交网络,推荐系统等。专注于构建关系图谱 | 图结构 | 利用图结构相关算法。比如最短路径寻址,N度关系查找等 | 很多时候需要对整个图做计算才能得出需要的信息,而且这种结构不太好做分布式的集群方案。[3] |
redis
介绍
redis是业界主流的key-value nosql 数据库之一。和Memcached类似,它支持存储的value类型相对更多,包括string(字符串)、list(链表)、set(集合)、zset(sorted set --有序集合)和hash(哈希类型)。这些数据类型都支持push/pop、add/remove及取交集并集和差集及更丰富的操作,而且这些操作都是原子性的。在此基础上,redis支持各种不同方式的排序。与memcached一样,为了保证效率,数据都是缓存在内存中。区别的是redis会周期性的把更新的数据写入磁盘或者把修改操作写入追加的记录文件,并且在此基础上实现了master-slave(主从)同步。
Redis优点
-
异常快速 : Redis是非常快的,每秒可以执行大约110000设置操作,81000个/每秒的读取操作。
-
支持丰富的数据类型 : Redis支持最大多数开发人员已经知道如列表,集合,可排序集合,哈希等数据类型。
这使得在应用中很容易解决的各种问题,因为我们知道哪些问题处理使用哪种数据类型更好解决。
-
操作都是原子的 : 所有 Redis 的操作都是原子,从而确保当两个客户同时访问 Redis 服务器得到的是更新后的值(最新值)。
-
MultiUtility工具:Redis是一个多功能实用工具,可以在很多如:缓存,消息传递队列中使用(Redis原生支持发布/订阅),在应用程序中,如:Web应用程序会话,网站页面点击数等任何短暂的数据;
安装Redis环境
要在 Ubuntu 上安装 Redis,打开终端,然后输入以下命令:
$sudo apt-get update
$sudo apt-get install redis-server
这将在您的计算机上安装Redis
启动 Redis
$redis-server
查看 redis 是否还在运行
$redis-cli
这将打开一个 Redis 提示符,如下图所示:
redis 127.0.0.1:6379>
在上面的提示信息中:127.0.0.1 是本机的IP地址,6379是 Redis 服务器运行的端口。现在输入 PING 命令,如下图所示:
redis 127.0.0.1:6379> ping
PONG
这说明现在你已经成功地在计算机上安装了 Redis。
Python操作Redis
1 2 3 4 5 6 7 |
|
在Ubuntu上安装Redis桌面管理器
要在Ubuntu 上安装 Redis桌面管理,可以从 http://redisdesktop.com/download 下载包并安装它。
Redis 桌面管理器会给你用户界面来管理 Redis 键和数据。
Redis API使用
redis-py 的API的使用可以分类为:
- 连接方式
- 连接池
- 操作
- String 操作
- Hash 操作
- List 操作
- Set 操作
- Sort Set 操作
- 管道
- 发布订阅
连接方式
1、操作模式
redis-py提供两个类Redis和StrictRedis用于实现Redis的命令,StrictRedis用于实现大部分官方的命令,并使用官方的语法和命令,Redis是StrictRedis的子类,用于向后兼容旧版本的redis-py。
1 2 3 4 5 |
|
2、连接池
redis-py使用connection pool来管理对一个redis server的所有连接,避免每次建立、释放连接的开销。默认,每个Redis实例都会维护一个自己的连接池。可以直接建立一个连接池,然后作为参数Redis,这样就可以实现多个Redis实例共享一个连接池。
操作
1. String操作
redis中的String在在内存中按照一个name对应一个value来存储。如图:
set(name, value, ex=None, px=None, nx=False, xx=False)
1 2 3 4 5 6 |
|
# EX 和 PX 可以同时出现,但后面给出的选项会覆盖前面给出的选项
redis 127.0.0.1:6379> SET key "value" EX 1000 PX 5000000
OK
redis 127.0.0.1:6379> TTL key
(integer) 4993 # 这是 PX 参数设置的值
redis 127.0.0.1:6379> SET another-key "value" PX 5000000 EX 1000
OK
redis 127.0.0.1:6379> TTL another-key
(integer) 997 # 这是 EX 参数设置的值
get(name)
返回 key 所关联的字符串值。
如果 key 不存在那么返回特殊值 nil 。
setnx(name, value)
将 key 的值设为 value ,当且仅当 key 不存在。
若给定的 key 已经存在,则 SETNX 不做任何动作。
redis> EXISTS job # job 不存在
(integer) 0
redis> SETNX job "programmer" # job 设置成功
(integer) 1
redis> SETNX job "code-farmer" # 尝试覆盖 job ,失败
(integer) 0
redis> GET job # 没有被覆盖
"programmer"
setex(name, time,value)
将值 value 关联到 key ,并将 key 的生存时间设为 seconds (以秒为单位)。
如果 key 已经存在, SETEX 命令将覆写旧值。
# key 已经存在时,SETEX 覆盖旧值
redis> SET cd "timeless"
OK
redis> SETEX cd 3000 "goodbye my love"
OK
psetex(name, time_ms, value)
这个命令和 SETEX 命令相似,但它以毫秒为单位设置 key 的生存时间,而不是像 SETEX 命令那样,以秒为单位。
redis> PSETEX mykey 1000 "Hello"
OK
redis> PTTL mykey
(integer) 999
redis> GET mykey
"Hello"
mset(*args, **kwargs)
批量设置值
MSET key value [key value ...]
同时设置一个或多个 key-value 对。
redis> MSET date "2012.3.30" time "11:00 a.m." weather "sunny"
OK
redis> MGET date time weather
1) "2012.3.30"
2) "11:00 a.m."
3) "sunny"
mget(keys, *args)
返回所有(一个或多个)给定 key 的值。
redis> MGET redis mongodb
getrange(key, start, end)
返回 key 中字符串值的子字符串,字符串的截取范围由 start 和 end 两个偏移量决定(包括 start 和 end 在内)。
负数偏移量表示从字符串最后开始计数, -1 表示最后一个字符, -2 表示倒数第二个,以此类推
redis> SET greeting "hello, my friend"
OK
redis> GETRANGE greeting 0 4 # 返回索引0-4的字符,包括4。
"hello"
redis> GETRANGE greeting -1 -5 # 不支持回绕操作
""
redis> GETRANGE greeting -3 -1 # 负数索引
"end"
redis> GETRANGE greeting 0 -1 # 从第一个到最后一个
"hello, my friend"
redis> GETRANGE greeting 0 1008611 # 值域范围不超过实际字符串,超过部分自动被符略
"hello, my friend"
setrange(name, offset, value)
1 2 3 4 |
|
redis> SET greeting "hello world"
OK
redis> SETRANGE greeting 6 "Redis"
(integer) 11
redis> GET greeting
"hello Redis"
# 对空字符串/不存在的 key 进行 SETRANGE
redis> EXISTS empty_string
(integer) 0
redis> SETRANGE empty_string 5 "Redis!" # 对不存在的 key 使用 SETRANGE
(integer) 11
redis> GET empty_string # 空白处被"\x00"填充
"\x00\x00\x00\x00\x00Redis!"
redis> SET greeting "hello world"
OK
redis> SETRANGE greeting 6 "Redis"
(integer) 11
redis> GET greeting
"hello Redis"
# 对空字符串/不存在的 key 进行 SETRANGE
redis> EXISTS empty_string
(integer) 0
redis> SETRANGE empty_string 5 "Redis!" # 对不存在的 key 使用 SETRANGE
(integer) 11
redis> GET empty_string # 空白处被"\x00"填充
"\x00\x00\x00\x00\x00Redis!"
setbit(name, offset, value)
redis> SET greeting "hello world"
OK
redis> SETRANGE greeting 6 "Redis"
(integer) 11
redis> GET greeting
"hello Redis"
# 对空字符串/不存在的 key 进行 SETRANGE
redis> EXISTS empty_string
(integer) 0
redis> SETRANGE empty_string 5 "Redis!" # 对不存在的 key 使用 SETRANGE
(integer) 11
redis> GET empty_string # 空白处被"\x00"填充
"\x00\x00\x00\x00\x00Redis!"
|
getbit(name, offset)
1 |
|
redis> SET greeting "hello world"
OK
redis> SETRANGE greeting 6 "Redis"
(integer) 11
redis> GET greeting
"hello Redis"
# 对空字符串/不存在的 key 进行 SETRANGE
redis> EXISTS empty_string
(integer) 0
redis> SETRANGE empty_string 5 "Redis!" # 对不存在的 key 使用 SETRANGE
(integer) 11
redis> GET empty_string # 空白处被"\x00"填充
"\x00\x00\x00\x00\x00Redis!"
bitcount(key, start=None, end=None)
1 2 3 4 5 |
|
strlen(name)
1 |
|
# 获取字符串的长度
redis> SET mykey "Hello world"
OK
redis> STRLEN mykey
(integer) 11
incr(self, name, amount=1)
1 2 3 4 5 6 7 |
|
redis> SET page_view 20
OK
redis> INCR page_view
(integer) 21
redis> GET page_view # 数字值在 Redis 中以字符串的形式保存
"21"
incrbyfloat(self, name, amount=1.0)
1 2 3 4 5 |
|
redis> SET mykey 10.50
OK
redis> INCRBYFLOAT mykey 0.1
"10.6"
decr(self, name, amount=1)
1 2 3 4 5 |
|
redis> SET failure_times 10
OK
redis> DECR failure_times
(integer) 9
append(key, value)
1 2 3 4 5 |
|
redis> EXISTS myphone # 确保 myphone 不存在
(integer) 0
redis> APPEND myphone "nokia" # 对不存在的 key 进行 APPEND ,等同于 SET myphone "nokia"
(integer) 5 # 字符长度
2. Hash操作
hash表现形式上有些像pyhton中的dict,可以存储一组关联性较强的数据 , redis中Hash在内存中的存储格式如下图:
hset(name, key, value)
1 2 3 4 5 6 7 8 9 |
|
redis> HSET website google "www.g.cn" # 设置一个新域
(integer) 1
redis> HSET website google "www.google.com" # 覆盖一个旧域
(integer) 0
hmset(name, mapping)
1 2 3 4 5 6 7 8 |
|
127.0.0.1:6379> hmset go k1 2 k2 3
OK
127.0.0.1:6379> hget go k1
"2"
hget(name,key)
1 |
|
hmget(name, keys, *args)
1 2 3 4 5 6 7 8 9 10 11 |
|
hmget go k1 k2
hgetall(name)
1 |
|
hlen(name)
1 |
|
hkeys(name)
1 |
|
hvals(name)
1 |
|
hexists(name, key)
1 |
|
hdel(name,*keys)
1 |
|
hincrby(name, key, amount=1)
1 2 3 4 5 |
|
127.0.0.1:6379> hincrby count page 200 (integer) 200 127.0.0.1:6379> hget count page "200" 127.0.0.1:6379> hincrby count var 200 (integer) 200 127.0.0.1:6379> hget count var "200" 127.0.0.1:6379> hincrby count var -50 (integer) 150 127.0.0.1:6379> hget count var "150"
hincrbyfloat(name, key, amount=1.0)
1 2 3 4 5 6 7 8 |
|
# 值和增量都是普通小数
redis> HSET mykey field 10.50
(integer) 1
redis> HINCRBYFLOAT mykey field 0.1
"10.6"
# 值和增量都是指数符号
redis> HSET mykey field 5.0e3
(integer) 0
redis> HINCRBYFLOAT mykey field 2.0e2
"5200"
# 对不存在的键执行 HINCRBYFLOAT
redis> EXISTS price
(integer) 0
redis> HINCRBYFLOAT price milk 3.5
"3.5"
redis> HGETALL price
1) "milk"
2) "3.5"
# 对不存在的域进行 HINCRBYFLOAT
redis> HGETALL price
1) "milk"
2) "3.5"
redis> HINCRBYFLOAT price coffee 4.5 # 新增 coffee 域
"4.5"
redis> HGETALL price
1) "milk"
2) "3.5"
3) "coffee"
4) "4.5"
hscan(name, cursor=0, match=None, count=None)
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
hscan_iter(name, match=None, count=None)
1 2 3 4 5 6 7 8 9 |
|
3. list
List操作,redis中的List在在内存中按照一个name对应一个List来存储。如图:
lpush(name,values)
1 2 3 4 5 6 7 8 |
|
redis> LPUSH languages python
(integer) 1
# 加入重复元素
redis> LPUSH languages python
(integer) 2
redis> LRANGE languages 0 -1 # 列表允许重复元素
1) "python"
2) "python"
# 加入多个元素
redis> LPUSH mylist a b c
(integer) 3
redis> LRANGE mylist 0 -1
1) "c"
2) "b"
3) "a"
lpushx(name,value)
1 2 3 4 |
|
llen(name)
1 |
|
linsert(name, where, refvalue, value))
1 2 3 4 5 6 7 |
|
lset(name, index, value)
1 2 3 4 5 6 |
|
>>lset my 0 3
lrem(name, value, num)
1 2 3 4 5 6 7 8 |
|
redis> LRANGE greet 0 4 # 查看所有元素
1) "morning"
2) "hello"
3) "morning"
4) "hello"
5) "morning"
redis> LREM greet 2 morning # 移除从表头到表尾,最先发现的两个 morning
(integer) 2 # 两个元素被移除
lpop(name)
1 2 3 4 |
|
lindex(name, index)
1 |
|
lrange(name, start, end)
1 2 3 4 5 |
|
>>rpush var ship a b c 1 2 3
>>lrange var 0 2
>>1) "lisp"
>>2) "a"
>>3) "b"
ltrim(name, start, end)
1 2 3 4 5 |
|
rpoplpush(src, dst)
1 2 3 4 |
|
blpop(keys, timeout)
1 2 3 4 5 6 7 8 |
|
redis> LPUSH command "update system..." # 为command列表增加一个值
(integer) 1
redis> LPUSH request "visit page" # 为request列表增加一个值
(integer) 1
redis> BLPOP job command 0 # job 列表为空,被跳过,紧接着 command 列表的第一个元素被弹出。
1) "command" # 弹出元素所属的列表
2) "update system..." # 弹出元素所属的值
brpoplpush(src, dst, timeout=0)
1 2 3 4 5 6 |
|
4.set集合操作
Set操作,Set集合就是不允许重复的列表
sadd(name,values)
1
#
name对应的集合中添加元素
# 添加单个元素
redis> SADD bbs "discuz.net"
(integer) 1
# 添加重复元素
redis> SADD bbs "discuz.net"
(integer) 0
# 添加多个元素
redis> SADD bbs "tianya.cn" "groups.google.com"
(integer) 2
redis> SMEMBERS bbs
1) "discuz.net"
2) "groups.google.com"
3) "tianya.cn"
scard(name)
1 |
|
sdiff(keys, *args)
1 |
|
redis> SMEMBERS peter's_movies
1) "bet man"
2) "start war"
3) "2012"
redis> SMEMBERS joe's_movies
1) "hi, lady"
2) "Fast Five"
3) "2012"
redis> SDIFF peter's_movies joe's_movies
1) "bet man"
2) "start war"
sdiffstore(dest, keys, *args)
1 |
|
sinter(keys, *args)
1 |
|
sinterstore(dest, keys, *args)
1 |
|
sismember(name, value)
1 |
|
smembers(name)
1 |
|
smove(src, dst, value)
1 |
|
spop(name)
1 |
|
srandmember(name, numbers)
1 |
|
srem(name, values)
1 |
|
sunion(keys, *args)
1 |
|
sunionstore(dest,keys, *args)
1 |
|
sscan(name, cursor=0, match=None, count=None)
sscan_iter(name, match=None, count=None)
1 |
|
有序集合,在集合的基础上,为每元素排序;元素的排序需要根据另外一个值来进行比较,所以,对于有序集合,每一个元素有两个值,即:值和分数,分数专门用来做排序。
zadd(name, *args, **kwargs)
1 2 3 4 5 |
|
zcard(name)
1 |
|
zcount(name, min, max)
1 |
|
zincrby(name, value, amount)
1 |
|
zrange( name, start, end, desc=False, withscores=False, score_cast_func=float)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
zrank(name, value)
1 2 3 4 |
|
zrem(name, values)
1 2 3 |
|
zremrangebyrank(name, min, max)
1 |
|
zremrangebyscore(name, min, max)
1 |
|
zscore(name, value)
1 |
|
zinterstore(dest, keys, aggregate=None)
1 2 |
|
zunionstore(dest, keys, aggregate=None)
1 2 |
|
zscan(name, cursor=0, match=None, count=None, score_cast_func=float)
zscan_iter(name, match=None, count=None,score_cast_func=float)
1 |
|
其他常用操作
delete(*names)
1
#
根据删除redis中的任意数据类型
exists(name)
1
#
检测redis的name是否存在
keys(pattern='*')
1
2
3
4
5
6
7
#
根据模型获取redis的name
#
更多:
#
KEYS * 匹配数据库中所有 key 。
#
KEYS h?llo 匹配 hello , hallo 和 hxllo 等。
#
KEYS h*llo 匹配 hllo 和 heeeeello 等。
#
KEYS h[ae]llo 匹配 hello 和 hallo ,但不匹配 hillo
expire(name ,time)
1
#
为某个redis的某个name设置超时时间
rename(src, dst)
1
#
对redis的name重命名为
move(name, db))
1
#
将redis的某个值移动到指定的db下
randomkey()
1
#
随机获取一个redis的name(不删除)
type(name)
1
#
获取name对应值的类型
scan(cursor=0, match=None, count=None)
scan_iter(match=None, count=None)
1
#
同字符串操作,用于增量迭代获取key
管道
redis-py默认在执行每次请求都会创建(连接池申请连接)和断开(归还连接池)一次连接操作,如果想要在一次请求中指定多个命令,则可以使用pipline实现一次请求指定多个命令,并且默认情况下一次pipline 是原子性操作。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
pool = redis.ConnectionPool(host = '10.211.55.4'
= 6379 )
|
发布订阅
发布者:服务器
订阅者:Dashboad和数据处理
Demo如下:
import redis
class RedisHelper:
def __init__(self):
self.__conn = redis.Redis(host='10.211.55.4')
self.chan_sub = 'fm104.5'
self.chan_pub = 'fm104.5'
def public(self, msg):
self.__conn.publish(self.chan_pub, msg)
return True
def subscribe(self):
pub = self.__conn.pubsub()
pub.subscribe(self.chan_sub)
pub.parse_response()
return pub
订阅者:
1 2 3 4 5 6 7 8 9 10 11 |
|
发布者:
1 2 3 4 5 6 7 |
|
更多参见:https://github.com/andymccurdy/redis-py/
http://doc.redisfans.com/
什么时候用关系型数据库,什么时候 用NoSQL?
Go for legacy relational databases (RDBMS) when:
- The data is well structured, and lends itself to a tabular arrangement (rows and columns) in a relational database. Typical examples: bank account info, customer order info, customer info, employee info, department info etc etc.
- Another aspect of the above point is : schema oriented data model. When you design a data model (tables, relationships etc) for a potential use of RDBMS, you need to come up with a well defined schema: there will be these many tables, each table having a known set of columns that store data in known typed format (CHAR, NUMBER, BLOB etc).
- Very Important: Consider whether the data is transactional in nature. In other words, whether the data will be stored, accessed and updated in the context of transactions providing the ACID semantics or is it okay to compromise some/all of these properties.
- Correctness is also important and any compromise is _unacceptable_. This stems from the fact that in most NoSQL databases, consistency is traded off in favor of performance and scalability (points on NoSQL databases are elaborated below).
- There is no strong/compelling need for a scale out architecture ; a database that linearly scales out (horizontal scaling) to multiple nodes in a cluster.
- The use case is not for “high speed data ingestion”.
- If the client applications are expecting to quickly stream large amounts of data in/out of the database then relational database may not be a good choice since they are not really designed for scaling write heavy workloads.
- In order to achieve ACID properties, lots of additional background work is done especially in writer (INSERT, UPDATE, DELETE) code paths. This definitely affects performance.
- The use case is not for “storing enormous amounts of data in the range of petabytes”.
Go for NoSQL databases when:
- There is no fixed (and predetermined) schema that data fits in:
- Scalability, Performance (high throughput and low operation latency), Continuous Availability are very important requirements to be met by the underlying architecture of database.
- Good choice for “High Speed Data Ingestion”. Such applications (for example IoT style) which generate millions of data points in a second and need a database capable of providing extreme write scalability.
- The inherent ability to horizontally scale allows to store large amounts of data across commodity servers in the cluster. They usually use low cost resources, and are able to linearly add compute and storage power as the demand grows.
source page https://www.quora.com/When-should-you-use-NoSQL-vs-regular-RDBMS
附赠redis性能测试
准备环境:
因为找不到可用的1000M网络机器,使用一根直通线将两台笔记本连起来组成1000M Ethernet网。没错,是直通线现在网卡都能自适应交叉线、直通线,速度不受影响,用了一段时间机器也没出问题。
服务端:T420 i5-2520M(2.5G)/8G ubuntu 11.10
客户端:Acer i5-2430M(2.4G)/4G mint 11
redis版本:2.6.9
测试脚本:./redis-benchmark -h xx -p xx -t set -q -r 1000 -l -d 20
长度 | 速度/sec | 带宽(MByte/s) 发送+接收 | CPU | CPU Detail |
20Byte | 17w | 24M+12M | 98.00% | Cpu0 : 21.0%us, 40.7%sy, 0.0%ni, 4.3%id, 0.0%wa, 0.0%hi, 34.0%si, 0.0%st |
100Byte | 17w | 37M+12M | 97.00% | Cpu0 : 20.3%us, 37.9%sy, 0.0%ni, 7.0%id, 0.0%wa, 0.0%hi, 34.9%si, 0.0%st |
512Byte | 12w | 76M+9M | 87.00% | Cpu0 : 20.9%us, 33.2%sy, 0.0%ni, 25.6%id, 0.0%wa, 0.0%hi, 20.3%si, 0.0%st |
1K | 9w | 94M+8M | 81.00% | Cpu0 : 19.9%us, 30.2%sy, 0.0%ni, 34.2%id, 0.0%wa, 0.0%hi, 15.6%si, 0.0%st |
2K | 5w | 105M+6M | 77.00% | Cpu0 : 18.0%us, 32.0%sy, 0.0%ni, 34.7%id, 0.0%wa, 0.0%hi, 15.3%si, 0.0%st |
5K | 2.2w | 119M+3.2M | 77.00% | Cpu0 : 22.5%us, 32.8%sy, 0.0%ni, 32.8%id, 0.0%wa, 0.0%hi, 11.9%si, 0.0%st |
10K | 1.1w | 119M+1.7M | 70.00% | Cpu0 : 18.2%us, 29.8%sy, 0.0%ni, 42.7%id, 0.0%wa, 0.0%hi, 9.3%si, 0.0%st |
20K | 0.57w | 120M+1M | 58.00% | Cpu0 : 17.8%us, 26.4%sy, 0.0%ni, 46.2%id, 0.0%wa, 0.0%hi, 9.6%si, 0.0%st |
value 在1K以上时,1000M网卡轻松的被跑慢,而且redis-server cpu连一个核心都没占用到,可见redis高效,redis的服务也不需要太高配置,瓶颈在网卡速度。
整理看redis的us都在20%左右,用户层代码资源占用比例都很小。
RabbitMQ队列
安装 http://www.rabbitmq.com/install-standalone-mac.html
安装python rabbitMQ module
1 2 3 4 5 6 7 |
|
实现最简单的队列通信
send端
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
body =
) print (
)
|
receive端
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
def
print (
% body)
print (
)
|
Work Queues
在这种模式下,RabbitMQ会默认把p发的消息依次分发给各个消费者(c),跟负载均衡差不多
消息提供者代码
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
message =
.join(sys.argv[ 1 :]) or
% time.time()
delivery_mode = 2 ,
print (
% message)
|
消费者代码
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
import
def
print (
% body)
print (
)
print (
)
|
此时,先启动消息生产者,然后再分别启动3个消费者,通过生产者多发送几条消息,你会发现,这几条消息会被依次分配到各个消费者身上
Doing a task can take a few seconds. You may wonder what happens if one of the consumers starts a long task and dies with it only partly done. With our current code once RabbitMQ delivers message to the customer it immediately removes it from memory. In this case, if you kill a worker we will lose the message it was just processing. We'll also lose all the messages that were dispatched to this particular worker but were not yet handled.
But we don't want to lose any tasks. If a worker dies, we'd like the task to be delivered to another worker.
In order to make sure a message is never lost, RabbitMQ supports message acknowledgments. An ack(nowledgement) is sent back from the consumer to tell RabbitMQ that a particular message had been received, processed and that RabbitMQ is free to delete it.
If a consumer dies (its channel is closed, connection is closed, or TCP connection is lost) without sending an ack, RabbitMQ will understand that a message wasn't processed fully and will re-queue it. If there are other consumers online at the same time, it will then quickly redeliver it to another consumer. That way you can be sure that no message is lost, even if the workers occasionally die.
There aren't any message timeouts; RabbitMQ will redeliver the message when the consumer dies. It's fine even if processing a message takes a very, very long time.
Message acknowledgments are turned on by default. In previous examples we explicitly turned them off via the no_ack=True flag. It's time to remove this flag and send a proper acknowledgment from the worker, once we're done with a task.
1 2 3 4 5 6 7 8 | def
print
% (body,)
'.'
print
|
Using this code we can be sure that even if you kill a worker using CTRL+C while it was processing a message, nothing will be lost. Soon after the worker dies all unacknowledged messages will be redelivered
消息持久化
We have learned how to make sure that even if the consumer dies, the task isn't lost(by default, if wanna disable use no_ack=True). But our tasks will still be lost if RabbitMQ server stops.
When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to. Two things are required to make sure that messages aren't lost: we need to mark both the queue and messages as durable.
First, we need to make sure that RabbitMQ will never lose our queue. In order to do so, we need to declare it as durable:
1 | channel.queue_declare(queue = 'hello'
= True ) |
Although this command is correct by itself, it won't work in our setup. That's because we've already defined a queue called hello which is not durable. RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries to do that. But there is a quick workaround - let's declare a queue with different name, for exampletask_queue:
1 | channel.queue_declare(queue = 'task_queue'
= True ) |
This queue_declare change needs to be applied to both the producer and consumer code.
At that point we're sure that the task_queue queue won't be lost even if RabbitMQ restarts. Now we need to mark our messages as persistent - by supplying a delivery_mode property with a value 2.
1 2 3 4 5 6 |
delivery_mode = 2 ,
|
消息公平分发
如果Rabbit只管按顺序把消息发到各个消费者身上,不考虑消费者负载的话,很可能出现,一个机器配置不高的消费者那里堆积了很多消息处理不完,同时配置高的消费者却一直很轻松。为解决此问题,可以在各个消费者端,配置perfetch=1,意思就是告诉RabbitMQ在我这个消费者当前消息还没处理完的时候就不要再给我发新消息了。
1 |
|
带消息持久化+公平分发的完整代码
生产者端
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
channel.queue_declare(queue = 'task_queue'
= True )
message =
.join(sys.argv[ 1 :]) or
delivery_mode = 2 ,
print (
% message)
|
消费者端
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
channel.queue_declare(queue = 'task_queue'
= True ) print (
)
def
print (
% body)
print (
)
|
Publish\Subscribe(消息发布\订阅)
之前的例子都基本都是1对1的消息发送和接收,即消息只能发送到指定的queue里,但有些时候你想让你的消息被所有的Queue收到,类似广播的效果,这时候就要用到exchange了,
An exchange is a very simple thing. On one side it receives messages from producers and the other side it pushes them to queues. The exchange must know exactly what to do with a message it receives. Should it be appended to a particular queue? Should it be appended to many queues? Or should it get discarded. The rules for that are defined by the exchange type.
Exchange在定义的时候是有类型的,以决定到底是哪些Queue符合条件,可以接收消息
fanout: 所有bind到此exchange的queue都可以接收消息
direct: 通过routingKey和exchange决定的那个唯一的queue可以接收消息
topic:所有符合routingKey(此时可以是一个表达式)的routingKey所bind的queue可以接收消息
表达式符号说明:#代表一个或多个字符,*代表任何字符
例:#.a会匹配a.a,aa.a,aaa.a等
*.a会匹配a.a,b.a,c.a等
注:使用RoutingKey为#,Exchange Type为topic的时候相当于使用fanout
headers: 通过headers 来决定把消息发给哪些queue
消息publisher
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
message =
.join(sys.argv[ 1 :]) or
print (
% message)
|
消息subscriber
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
__author__ =
print (
)
def
print (
% body)
|
有选择的接收消息(exchange type=direct)
RabbitMQ还支持根据关键字发送,即:队列绑定关键字,发送者将数据根据关键字发送到消息exchange,exchange根据 关键字 判定应该将数据发送至指定队列。
publisher
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
severity = sys.argv[ 1 ] if len
1 else 'info' message =
.join(sys.argv[ 2 :]) or
print (
%
|
subscriber
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
sys.stderr.write(
% sys.argv[ 0 ])
print (
)
def
print (
%
|
更细致的消息过滤
Although using the direct exchange improved our system, it still has limitations - it can't do routing based on multiple criteria.
In our logging system we might want to subscribe to not only logs based on severity, but also based on the source which emitted the log. You might know this concept from the syslog unix tool, which routes logs based on both severity (info/warn/crit...) and facility (auth/cron/kern...).
That would give us a lot of flexibility - we may want to listen to just critical errors coming from 'cron' but also all logs from 'kern'.
publisher
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
routing_key = sys.argv[ 1 ] if len
1 else 'anonymous.info' message =
.join(sys.argv[ 2 :]) or
print (
%
|
subscriber
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
sys.stderr.write(
% sys.argv[ 0 ])
print (
)
def
print (
%
|
To receive all the logs run:
python receive_logs_topic.py "#"
To receive all logs from the facility "kern":
python receive_logs_topic.py "kern.*"
Or if you want to hear only about "critical" logs:
python receive_logs_topic.py "*.critical"
You can create multiple bindings:
python receive_logs_topic.py "kern.*" "*.critical"
And to emit a log with a routing key "kern.critical" type:
python emit_log_topic.py "kern.critical" "A critical kernel error"
Remote procedure call (RPC)
To illustrate how an RPC service could be used we're going to create a simple client class. It's going to expose a method named call which sends an RPC request and blocks until the answer is received:
1 2 3 |
print (
% result) |
RPC server
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
def
print (
% n)
= 'rpc_queue' )
print (
)
|
RPC client
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
self .channel.basic_consume( self
= True ,
def on_response( self
def call( self
print (
)
print (
% response) |