A Roaming Guide to CouchDB

zuroc

Category: Essays. Tags: CouchDB, Python, MySQL, Erlang, data mining

[size=medium]
==== Startup ====
balin couchdb # ./utils/run
The available options are:

-h display a short help message and exit
-V display version information and exit
-a FILE add configuration FILE to chain
-A DIR add configuration DIR to chain
-n reset configuration file chain (including system default)
-c print configuration file chain and exit
-i use the interactive Erlang shell
-b spawn as a background process
-p FILE set the background PID FILE (overrides system default)
-r SECONDS respawn background process after SECONDS (defaults to no respawn)
-o FILE redirect background stdout to FILE (defaults to $STDOUT_FILE)
-e FILE redirect background stderr to FILE (defaults to $STDERR_FILE)
-s display the status of the background process
-k kill the background process, will respawn if needed
-d shutdown the background process

=== Configuration ===
balin couchdb # vi etc/couchdb/local_dev.ini
The port number and similar settings can be specified here.
The commonly used ones are:

[httpd]
port = 12345
bind_address = 0.0.0.0

[admins]
username = password
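
As a small sketch of how these settings relate to the URLs used in the rest of this guide, the snippet below parses an ini fragment like the one above with Python's standard configparser module and derives the base URL. The sample text, the helper name base_url, and the use of Python 3 are assumptions made for illustration; CouchDB reads the file itself and does not need this code.

```python
import configparser

# Sample text mirroring the [httpd] section shown above (illustrative values).
SAMPLE_INI = """
[httpd]
port = 12345
bind_address = 0.0.0.0
"""

def base_url(ini_text, host):
    """Return the HTTP base URL implied by the [httpd] port setting.

    host is the address clients use to reach the server. A bind_address of
    0.0.0.0 only means "listen on all interfaces", so it is not used here.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    port = parser.getint("httpd", "port")
    return "http://%s:%d/" % (host, port)

print(base_url(SAMPLE_INI, "123.123.123.123"))
```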

=== Usage ===
http://123.123.123.123:12345/_utils/
Databases can be created from this page.

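===== Using it from Python =====
http://123.123.123.123:12345/_utils/database.html?python-tests

The examples below operate on this database. A few functions in the Python library, such as update([...]), do not work, and neither does username/password authentication, so the library may need some fixes...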

from couchdb import client
from couchdb.client import Document
server = client.Server('http://123.123.123.123:12345/')

# Open the database
db = server['python-tests']

# Create a document
doc_id = db.create({'type': 'Person', 'name': 'John Doe'})

# Fetch a document; the doc interface works like a dict
doc = db[doc_id]

# _rev is the revision, _id is a UUID
doc.items()
[(u'_rev', u'1-2963977070'),
(u'_id', u'4a36f238f4facbe08762b1a958cef39e'),
(u'type', u'Person'),
(u'name', u'John Doe')]

# You can also choose the primary key yourself
db['JohnDoe'] = {'type': 'person', 'name': 'John Doe'}

db['JohnDoe'].items()
[(u'_rev', u'1-2744716443'),
(u'_id', u'JohnDoe'),
(u'type', u'person'),
(u'name', u'John Doe')]

# Update
badman = db['JohnDoe']
badman['age'] = 1234
db['JohnDoe'] = badman

# Delete; db.delete(doc) also works
del db['JohnDoe']

# Iterate over all documents
for row in db.view('_all_docs'):
    print(row.id)

# Show database info
db.info()
{u'compact_running': False,
u'db_name': u'python-tests',
u'disk_size': 24381,
u'doc_count': 13,
u'doc_del_count': 0,
u'instance_start_time': u'1241518867280531',
u'purge_seq': 0,
u'update_seq': 21}

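# Documents can have binary attachments; upload them with put_attachment

# Query: map_fun is a JavaScript function and emit is emit(key, value). Both key and value may be null
# The web page has a "Select view" query box where searches can be tested directly
# It seems unicode strings are required, otherwise nothing is found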

db['/logo/xxx1.jpg']={"type":"logo","size":1}
db['/logo/xxx2.jpg']={"type":"logo","size":2}
db['/logo/xxx3.jpg']={"type":"logo","size":3}
db['/logo/xxx4.jpg']={"type":"logo","size":4}

map_fun = u'''
function(doc) {
if (doc.type=='logo')
emit(doc._id, doc.size);
}
'''

for row in db.query(map_fun):
    print(row)
Output:
<Row id=u'logo/xxx1.jpg', key=u'logo/xxx1.jpg', value=1>
<Row id=u'logo/xxx2.jpg', key=u'logo/xxx2.jpg', value=2>
<Row id=u'logo/xxx3.jpg', key=u'logo/xxx3.jpg', value=3>
<Row id=u'logo/xxx4.jpg', key=u'logo/xxx4.jpg', value=4>
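
To make the map step concrete, here is a pure-Python sketch that mimics what the JavaScript map function does to the documents above. It is not how CouchDB runs views internally; the helper names map_logo and run_map are made up for illustration. It applies the map function to every document, collects the emitted (key, value) pairs, and orders the rows by key.

```python
def map_logo(doc):
    """Python stand-in for the JavaScript map function above."""
    if doc.get("type") == "logo":
        yield doc["_id"], doc["size"]   # plays the role of emit(key, value)

def run_map(docs, map_fun):
    """Apply map_fun to each doc and return the rows ordered by key, then id."""
    rows = []
    for doc in docs:
        for key, value in map_fun(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    return sorted(rows, key=lambda row: (row["key"], row["id"]))

# Four logo documents in arbitrary order plus one document the map ignores.
docs = [
    {"_id": "logo/xxx%d.jpg" % n, "type": "logo", "size": n} for n in (3, 1, 4, 2)
] + [{"_id": "JohnDoe", "type": "person", "name": "John Doe"}]

rows = run_map(docs, map_logo)
for row in rows:
    print(row)
```

Documents for which the map function emits nothing, such as the person document here, simply do not appear in the result.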

We can also add a reduce function,
for example:

reduce_fun = u'''
function(keys, values, rereduce) {
return sum(values)
}
'''
for row in db.query(map_fun, reduce_fun):
    print(row)
Output:
<Row key=None, value=10>

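The rereduce argument of the reduce function has the following meaning:

1. rereduce is false

* keys is an array whose elements are [key, id], where key is the key emitted by the map function and id is the ID of the corresponding document
* values is an array whose elements are the values emitted by the map function
* for example, reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)

2. rereduce is true

* keys is null
* values is an array whose elements are the results returned by earlier reduce calls
* for example, reduce(null, [intermediate result 1, intermediate result 2, intermediate result 3], true)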
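
The two calling conventions can be sketched in plain Python. This is only a simulation under stated assumptions: simulate_view_reduce and count_reduce are hypothetical names, and the chunk size of 2 is arbitrary, since the real server decides how rows are grouped. The sketch shows why a sum can ignore rereduce while a count cannot.

```python
def count_reduce(keys, values, rereduce):
    """Count rows. Partial results are counts, so rereduce must add them up."""
    if rereduce:
        return sum(values)      # values are counts from earlier reduce calls
    return len(values)          # values are raw values emitted by map

def sum_reduce(keys, values, rereduce):
    """Sum values. Summing partial sums is still a sum, so rereduce is ignored."""
    return sum(values)

def simulate_view_reduce(rows, reduce_fun, chunk_size=2):
    """Reduce rows chunk by chunk, then rereduce the partial results.

    rows is a list of (key, doc_id, value) tuples.
    """
    partials = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        keys = [[key, doc_id] for key, doc_id, _ in chunk]
        values = [value for _, _, value in chunk]
        partials.append(reduce_fun(keys, values, False))
    return reduce_fun(None, partials, True)

rows = [("logo/xxx%d.jpg" % n, "logo/xxx%d.jpg" % n, n) for n in (1, 2, 3, 4)]

print(simulate_view_reduce(rows, sum_reduce))    # total size of the four logos
print(simulate_view_reduce(rows, count_reduce))  # number of rows
```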

Here are some map/reduce demo examples that are fairly easy to follow:
http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html

==== Creating Views ====

A view can be thought of as an index, although this index is not updated in real time...

Continuing the example above:

db["_design/test"]={
"views":
{
"all": {
"map": "function(doc) { if (doc.type == 'logo') emit(null, doc) }"
},
"size_large_than_2": {
"map": "function(doc) { if (doc.size && parseInt(doc.size)>2) emit(null,doc) }"
},
"total_size": {
"map": "function(doc) { emit(null,parseInt(doc.size)) }",
"reduce": "function(keys,values) { return sum(values) }"
}
}
}

Then refresh
http://123.123.123.123:12345/_utils/database.html?python-tests

You can see that test now appears under Select view.

The view can also be accessed at
http://123.123.123.123:12345/python-tests/_design/test/_view/all
Parameters such as limit can be added:
http://123.123.123.123:12345/python-tests/_design/test/_view/all?limit=2
Click it to take a look:
http://123.123.123.123:12345/python-tests/_design/test/_view/all?limit=2&skip=1
This makes pagination possible; however, note the following (http://stackoverflow.com/questions/312163/pagination-in-couchdb):
"""A simpler method of doing this is to use the skip parameter to work
out the starting document for the page, however this method should be
used with caution. The skip parameter simply causes the internal
engine to not return entries that it is iterating over. While this
gives the desired behaviour it is much slower than finding the first
document for the page by key. The more documents that are skipped, the
slower the request will be."""
So it is best to use skip together with startkey and the related parameters described below.
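
One common way to page by key instead of by skip is to request one row more than the page size and use that extra row as the starting point of the next page. The sketch below simulates this on an in-memory sorted list, so it runs without a server. The helpers fetch and pages are hypothetical stand-ins; fetch plays the role of a view query with startkey, startkey_docid and limit.

```python
import bisect

def fetch(rows, startkey=None, startkey_docid="", limit=None):
    """Stand-in for a view query over rows sorted by (key, doc_id)."""
    start = 0
    if startkey is not None:
        start = bisect.bisect_left(rows, (startkey, startkey_docid))
    end = len(rows) if limit is None else start + limit
    return rows[start:end]

def pages(rows, page_size):
    """Yield pages by asking for page_size + 1 rows each time.

    The extra row is not shown; it only tells us where the next page starts.
    """
    startkey, docid = None, ""
    while True:
        batch = fetch(rows, startkey, docid, limit=page_size + 1)
        yield batch[:page_size]
        if len(batch) <= page_size:
            return
        startkey, docid = batch[page_size]

# Five rows keyed by size, each a (key, doc_id) tuple.
rows = sorted((size, "logo/xxx%d.jpg" % size) for size in range(1, 6))
for page in pages(rows, 2):
    print(page)
```

Each request starts directly at a known key, so the cost does not grow with the page number the way it does with a large skip.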

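Similar parameters include:

Sort order: descending=false
Start and end: startkey="abc"&endkey="abcZZZZZZZZZ"
A document ID can also be used: startkey_docid=null

group=true is somewhat more complicated to use; it is for merging (grouping) results. See: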
http://jchrisa.net/drl/_design/sofa/_show/post/markov_chains_using_couchdb_s_g

Keys can also be complex, for example:
The query startkey=["foo"]&endkey=["foo",{}] will match most array
keys with "foo" in the first element, such as ["foo","bar"] and
["foo",["bar","baz"]]. However it will not match
["foo",{"an":"object"}]
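
Because key parameters such as startkey and endkey are JSON values, they have to be JSON-encoded and then URL-escaped when a query URL is built by hand. The sketch below shows one way to do that with the Python 3 standard library. The helper name view_url is hypothetical, and the all view from this guide emits null keys, so the key range here only illustrates the encoding, not a meaningful query.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Parameters whose values are JSON documents rather than plain strings.
JSON_PARAMS = ("key", "startkey", "endkey")

def view_url(base, db, design, view, **params):
    """Build a view URL, JSON-encoding key parameters and booleans."""
    query = []
    for name, value in sorted(params.items()):
        if name in JSON_PARAMS or isinstance(value, bool):
            value = json.dumps(value, separators=(",", ":"))
        query.append((name, value))
    url = "%s/%s/_design/%s/_view/%s" % (base.rstrip("/"), db, design, view)
    return url + ("?" + urlencode(query) if query else "")

url = view_url("http://123.123.123.123:12345/", "python-tests", "test", "all",
               startkey=["foo"], endkey=["foo", {}], limit=2, descending=False)
print(url)

# Decoding the query string again recovers the original JSON text.
print(parse_qs(urlsplit(url).query)["endkey"][0])
```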

Click the links to take a look.

From Python the view can be accessed like this:

for row in db.view('_design/test/_view/all'):
    print(row.id)

Output:
logo/xxx1.jpg
logo/xxx2.jpg
logo/xxx3.jpg
logo/xxx4.jpg

Another example:
for row in db.view('_design/test/_view/size_large_than_2'):
    print(row)

<Row id=u'logo/xxx3.jpg', key=None, value={u'_rev': u'1-3347158087', u'_id': u'logo/xxx3.jpg', u'type': u'logo', u'size': 3}>
<Row id=u'logo/xxx4.jpg', key=None, value={u'_rev': u'1-1107796651', u'_id': u'logo/xxx4.jpg', u'type': u'logo', u'size': 4}>

==== Online resources ====

Here is an introduction in Chinese that is useful as background:
http://hi.baidu.com/freeway2000/blog/item/8f76ed11f26bc8c1a6ef3f53.html

CouchDB: The Definitive Guide
http://books.couchdb.org/relax/

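=== Notes ===
1.
According to benchmarks found online, CouchDB is
4 times slower than MySQL at writes
50 times slower than MySQL at building indexes

2.
CouchDB only appends data and never deletes it in place
so it needs periodic compaction,
a copy-then-delete process similar to garbage collection,
which requires reserving a lot of disk space

3.
Indexes are not real-time
so you may see stale data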
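
The periodic compaction mentioned in note 2 is triggered over HTTP. As a minimal sketch, assuming the standard POST /{db}/_compact endpoint with a JSON content type, the snippet below only builds the request so it can be inspected without a server. The helper name compact_request is hypothetical, and actually sending the request typically requires admin rights.

```python
import urllib.request

def compact_request(base, db):
    """Build (but do not send) the request asking CouchDB to compact db."""
    url = "%s/%s/_compact" % (base.rstrip("/"), db)
    return urllib.request.Request(
        url,
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = compact_request("http://123.123.123.123:12345/", "python-tests")
print(req.get_method(), req.full_url)

# To actually trigger compaction against a reachable server:
# urllib.request.urlopen(req)
```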

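My personal view:
Judged on performance alone, CouchDB is indeed far from ideal.
But CouchDB can present data through views: whatever you need, you create a view for it.
This free-form approach to indexing suits quite a few applications.
Persisting query results as views
can greatly reduce the repeated, similar queries of the traditional approach.

As an example,
take a friends' broadcast feed:
create one view per person,
and maybe that would work...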
[/size]
