Tokyo Cabinet乱贴(未整理,仅供自己做笔记)

最新推荐文章于 2021-03-03 20:38:16 发布

zuroc

最新推荐文章于 2021-03-03 20:38:16 发布

阅读量1.7k

点赞数

分类专栏：随笔文章标签： Python C C++ C# TokyoCabinet

随笔专栏收录该内容

262 篇文章 0 订阅

订阅专栏

[size=large]网上关于Tokyo Cabinet的简介(http://www.162cm.com/archives/681.html)
项目主页:http://tokyocabinet.sourceforge.net/

Tokyo Cabinet 是一个DBM的实现。这里的数据库由一系列key-value对的记录构成。key和value都可以是任意长度的字节序列,既可以是二进制也可以是字符串。这里没有数据类型和数据表的概念。

当做为Hash表数据库使用时，每个key必须是不同的,因此无法存储两个key相同的值。提供了以下访问方法:提供key,value参数来存储，按 key删除记录，按key来读取记录，另外，遍历key也被支持，虽然顺序是任意的不能被保证。这些方法跟Unix标准的DBM,例如GDBM,NDBM 等等是相同的，但是比它们的性能要好得多（因此可以替代它们)

当按B+树来存储时，拥用相同key的记录也能被存储。像hash表一样的读取，存储，删除函数也都有提供。记录按照用户提供的比较函数来存储。可以采用顺序或倒序的游标来读取每一条记录。依照这个原理，向前的字符串匹配搜索和整数区间搜索也实现了。另外，B＋树的事务也是可用的。

As for database of fixed-length array, records are stored with unique natural numbers. It is impossible to store two or more records with a key overlaps. Moreover, the length of each record is limited by the specified length. Provided operations are the same as ones of hash database.
对于定长的数组，记录按自然数来标记存储。不能存储key相同的两条或更多记录。另外，每条记录的长度受到限制。读取方法和hash表的一样。

Tokyo Cabinet是用C写的，同时提供c,perl,ruby,java的API。Tokyo Cabinet在提供了POSIX和C99的平台上都可用，它以GNU Lesser Public License协议发布。
_________________________________________

囧,没有python,网上有一个python bind,不过不好使.还是自己封装靠谱

先看看文档和 example
_________________________________________

文档节选

Tokyo Cabinet runs very fast.

...

1 million records is 1.5 seconds for hash database, and 2.2 seconds for B+ tree database. Moreover, the size of database of Tokyo Cabinet is very small. For example, overhead for a record is 16 bytes for hash database, and 5 bytes for B+ tree database.

...

The database size can be up to 8EB (9.22e18 bytes).

....

Due to this simple structure, fixed-length database works faster than hash database, and its concurrency in multi-thread environment is prominent.

...

Every operation for database is encapsulated and published as lucid methods as `open' (connect), `close' (disconnect), `put' (insert), `out' (remove), `get' (retrieve)

...

That is, while a writing thread is operating the database, other reading threads and writing threads are blocked. However, while a reading thread is operating the database, reading threads are not blocked.

在
tcbmttest.c tcfmttest.c tchmttest.c tcumttest.c
中有多线程测试的代码,可以看看是如何加锁的

_______________________________________

Tokyo Cabinet - The Abstract Database API

使用什么数据库取决于open时数据库的名字

`name' specifies the name of the database.

If it is "*", the database will be an on-memory database.

If its suffix is ".tch", the database will be a hash database.

If its suffix is ".tcb", the database will be a B+ tree database.

If its suffix is ".tcf", the database will be a fixed-length database.

Otherwise, this function fails.

Tuning parameters can trail the name, separated by "#".

Each parameter is composed of the name and the number, separated by "=".

On-memory database supports "bnum", "capnum", and "capsiz". Hash database supports "mode", "bnum", "apow", "fpow", "opts", "rcnum", and "xmsiz". B+ tree database supports "mode", "lmemb", "nmemb", "bnum", "apow", "fpow", "opts", "lcnum", "ncnum", and "xmsiz". Fixed-length database supports "mode", "width", and "limsiz". "capnum" specifies the capacity number of records. "capsiz" specifies the capacity size of using memory. Records spilled the capacity are removed by the storing order. "mode" can contain "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, and "f" of non-blocking lock. The default mode is relevant to "wc". "opts" can contains "l" of large option, "d" of Deflate option, "b" of BZIP2 option, and "t" of TCBS option. For example, "casket.tch#bnum=1000000#opts=ld" means that the name of the database file is "casket.tch", and the bucket number is 1000000, and the options are large and Deflate.

---------------------
#include <tcutil.h>
#include <tcadb.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>

int main(int argc, char **argv){

TCADB *adb;
char *key, *value;

/* create the object */
adb = tcadbnew();

/* open the database */
if(!tcadbopen(adb, "casket.tch")){
fprintf(stderr, "open error\n");
}

/* store records */
if(!tcadbput2(adb, "foo", "hop") ||
!tcadbput2(adb, "bar", "step") ||
!tcadbput2(adb, "baz", "jump")){
fprintf(stderr, "put error\n");
}

/* retrieve records */
value = tcadbget2(adb, "foo");
if(value){
printf("%s\n", value);
free(value);
} else {
fprintf(stderr, "get error\n");
}

/* traverse records */
tcadbiterinit(adb);
while((key = tcadbiternext2(adb)) != NULL){
value = tcadbget2(adb, key);
if(value){
printf("%s:%s\n", key, value);
free(value);
}
free(key);
}

/* close the database */
if(!tcadbclose(adb)){
fprintf(stderr, "close error\n");
}

/* delete the object */
tcadbdel(adb);

return 0;
}

___________________________________________

python的anydbm接口

anydbm.open(filename[, flag[, mode]])

Open the database file filename and return a corresponding object.

If the database file already exists, the whichdb module is used to determine its type and the appropriate module is used; if it does not exist, the first module listed above that can be imported is used.

The optional flag argument can be 'r' to open an existing database for reading only, 'w' to open an existing database for reading and writing, 'c' to create the database if it doesn’t exist, or 'n', which will always create a new empty database. If not specified, the default value is 'r'.

The optional mode argument is the Unix mode of the file, used only when the database has to be created. It defaults to octal 0666 (and will be modified by the prevailing umask).

exception anydbm.error
A tuple containing the exceptions that can be raised by each of the supported modules, with a unique exception also named anydbm.error as the first item — the latter is used when anydbm.error is raised.

代码演示

import anydbm

# Open database, creating it if necessary.
db = anydbm.open('cache', 'c')

# Record some values
db['www.python.org'] = 'Python Website'
db['www.cnn.com'] = 'Cable News Network'

# Loop through contents. Other dictionary methods
# such as .keys(), .values() also work.
for k, v in db.iteritems():
print k, '\t', v

# Storing a non-string key or value will raise an exception (most
# likely a TypeError).
db['www.yahoo.com'] = 4

# Close when done.
db.close()

___________________________________________

Python gdbm封装的源代码

/* DBM module using dictionary interface */
/* Author: Anthony Baxter, after dbmmodule.c */
/* Doc strings: Mitch Chapman */

#include "Python.h"

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "gdbm.h"

#if defined(WIN32) && !defined(__CYGWIN__)
#include "gdbmerrno.h"
extern const char * gdbm_strerror(gdbm_error);
#endif

PyDoc_STRVAR(gdbmmodule__doc__,
"This module provides an interface to the GNU DBM (GDBM) library.\n\
\n\
This module is quite similar to the dbm module, but uses GDBM instead to\n\
provide some additional functionality. Please note that the file formats\n\
created by GDBM and dbm are incompatible. \n\
\n\
GDBM objects behave like mappings (dictionaries), except that keys and\n\
values are always strings. Printing a GDBM object doesn't print the\n\
keys and values, and the items() and values() methods are not\n\
supported.");

typedef struct {
PyObject_HEAD
int di_size; /* -1 means recompute */
GDBM_FILE di_dbm;
} dbmobject;

static PyTypeObject Dbmtype;

#define is_dbmobject(v) ((v)->ob_type == &Dbmtype)
#define check_dbmobject_open(v) if ((v)->di_dbm == NULL) \
{ PyErr_SetString(DbmError, "GDBM object has already been closed"); \
return NULL; }

static PyObject *DbmError;

PyDoc_STRVAR(gdbm_object__doc__,
"This object represents a GDBM database.\n\
GDBM objects behave like mappings (dictionaries), except that keys and\n\
values are always strings. Printing a GDBM object doesn't print the\n\
keys and values, and the items() and values() methods are not\n\
supported.\n\
\n\
GDBM objects also support additional operations such as firstkey,\n\
nextkey, reorganize, and sync.");

static PyObject *
newdbmobject(char *file, int flags, int mode)
{
dbmobject *dp;

dp = PyObject_New(dbmobject, &Dbmtype);
if (dp == NULL)
return NULL;
dp->di_size = -1;
errno = 0;
if ((dp->di_dbm = gdbm_open(file, 0, flags, mode, NULL)) == 0) {
if (errno != 0)
PyErr_SetFromErrno(DbmError);
else
PyErr_SetString(DbmError, gdbm_strerror(gdbm_errno));
Py_DECREF(dp);
return NULL;
}
return (PyObject *)dp;
}

/* Methods */

static void
dbm_dealloc(register dbmobject *dp)
{
if (dp->di_dbm)
gdbm_close(dp->di_dbm);
PyObject_Del(dp);
}

static Py_ssize_t
dbm_length(dbmobject *dp)
{
if (dp->di_dbm == NULL) {
PyErr_SetString(DbmError, "GDBM object has already been closed");
return -1;
}
if (dp->di_size < 0) {
datum key,okey;
int size;
okey.dsize=0;
okey.dptr=NULL;

size = 0;
for (key=gdbm_firstkey(dp->di_dbm); key.dptr;
key = gdbm_nextkey(dp->di_dbm,okey)) {
size++;
if(okey.dsize) free(okey.dptr);
okey=key;
}
dp->di_size = size;
}
return dp->di_size;
}

static PyObject *
dbm_subscript(dbmobject *dp, register PyObject *key)
{
PyObject *v;
datum drec, krec;

if (!PyArg_Parse(key, "s#", &krec.dptr, &krec.dsize) )
return NULL;

if (dp->di_dbm == NULL) {
PyErr_SetString(DbmError,
"GDBM object has already been closed");
return NULL;
}
drec = gdbm_fetch(dp->di_dbm, krec);
if (drec.dptr == 0) {
PyErr_SetString(PyExc_KeyError,
PyString_AS_STRING((PyStringObject *)key));
return NULL;
}
v = PyString_FromStringAndSize(drec.dptr, drec.dsize);
free(drec.dptr);
return v;
}

static int
dbm_ass_sub(dbmobject *dp, PyObject *v, PyObject *w)
{
datum krec, drec;

if (!PyArg_Parse(v, "s#", &krec.dptr, &krec.dsize) ) {
PyErr_SetString(PyExc_TypeError,
"gdbm mappings have string indices only");
return -1;
}
if (dp->di_dbm == NULL) {
PyErr_SetString(DbmError,
"GDBM object has already been closed");
return -1;
}
dp->di_size = -1;
if (w == NULL) {
if (gdbm_delete(dp->di_dbm, krec) < 0) {
PyErr_SetString(PyExc_KeyError,
PyString_AS_STRING((PyStringObject *)v));
return -1;
}
}
else {
if (!PyArg_Parse(w, "s#", &drec.dptr, &drec.dsize)) {
PyErr_SetString(PyExc_TypeError,
"gdbm mappings have string elements only");
return -1;
}
errno = 0;
if (gdbm_store(dp->di_dbm, krec, drec, GDBM_REPLACE) < 0) {
if (errno != 0)
PyErr_SetFromErrno(DbmError);
else
PyErr_SetString(DbmError,
gdbm_strerror(gdbm_errno));
return -1;
}
}
return 0;
}

static PyMappingMethods dbm_as_mapping = {
(lenfunc)dbm_length, /*mp_length*/
(binaryfunc)dbm_subscript, /*mp_subscript*/
(objobjargproc)dbm_ass_sub, /*mp_ass_subscript*/
};

PyDoc_STRVAR(dbm_close__doc__,
"close() -> None\n\
Closes the database.");

static PyObject *
dbm_close(register dbmobject *dp, PyObject *unused)
{
if (dp->di_dbm)
gdbm_close(dp->di_dbm);
dp->di_dbm = NULL;
Py_INCREF(Py_None);
return Py_None;
}

PyDoc_STRVAR(dbm_keys__doc__,
"keys() -> list_of_keys\n\
Get a list of all keys in the database.");

static PyObject *
dbm_keys(register dbmobject *dp, PyObject *unused)
{
register PyObject *v, *item;
datum key, nextkey;
int err;

if (dp == NULL || !is_dbmobject(dp)) {
PyErr_BadInternalCall();
return NULL;
}
check_dbmobject_open(dp);

v = PyList_New(0);
if (v == NULL)
return NULL;

key = gdbm_firstkey(dp->di_dbm);
while (key.dptr) {
item = PyString_FromStringAndSize(key.dptr, key.dsize);
if (item == NULL) {
free(key.dptr);
Py_DECREF(v);
return NULL;
}
err = PyList_Append(v, item);
Py_DECREF(item);
if (err != 0) {
free(key.dptr);
Py_DECREF(v);
return NULL;
}
nextkey = gdbm_nextkey(dp->di_dbm, key);
free(key.dptr);
key = nextkey;
}
return v;
}

PyDoc_STRVAR(dbm_has_key__doc__,
"has_key(key) -> boolean\n\
Find out whether or not the database contains a given key.");

static PyObject *
dbm_has_key(register dbmobject *dp, PyObject *args)
{
datum key;

if (!PyArg_ParseTuple(args, "s#:has_key", &key.dptr, &key.dsize))
return NULL;
check_dbmobject_open(dp);
return PyInt_FromLong((long) gdbm_exists(dp->di_dbm, key));
}

PyDoc_STRVAR(dbm_firstkey__doc__,
"firstkey() -> key\n\
It's possible to loop over every key in the database using this method\n\
and the nextkey() method. The traversal is ordered by GDBM's internal\n\
hash values, and won't be sorted by the key values. This method\n\
returns the starting key.");

static PyObject *
dbm_firstkey(register dbmobject *dp, PyObject *unused)
{
register PyObject *v;
datum key;

check_dbmobject_open(dp);
key = gdbm_firstkey(dp->di_dbm);
if (key.dptr) {
v = PyString_FromStringAndSize(key.dptr, key.dsize);
free(key.dptr);
return v;
}
else {
Py_INCREF(Py_None);
return Py_None;
}
}

PyDoc_STRVAR(dbm_nextkey__doc__,
"nextkey(key) -> next_key\n\
Returns the key that follows key in the traversal.\n\
The following code prints every key in the database db, without having\n\
to create a list in memory that contains them all:\n\
\n\
k = db.firstkey()\n\
while k != None:\n\
print k\n\
k = db.nextkey(k)");

static PyObject *
dbm_nextkey(register dbmobject *dp, PyObject *args)
{
register PyObject *v;
datum key, nextkey;

if (!PyArg_ParseTuple(args, "s#:nextkey", &key.dptr, &key.dsize))
return NULL;
check_dbmobject_open(dp);
nextkey = gdbm_nextkey(dp->di_dbm, key);
if (nextkey.dptr) {
v = PyString_FromStringAndSize(nextkey.dptr, nextkey.dsize);
free(nextkey.dptr);
return v;
}
else {
Py_INCREF(Py_None);
return Py_None;
}
}

PyDoc_STRVAR(dbm_reorganize__doc__,
"reorganize() -> None\n\
If you have carried out a lot of deletions and would like to shrink\n\
the space used by the GDBM file, this routine will reorganize the\n\
database. GDBM will not shorten the length of a database file except\n\
by using this reorganization; otherwise, deleted file space will be\n\
kept and reused as new (key,value) pairs are added.");

static PyObject *
dbm_reorganize(register dbmobject *dp, PyObject *unused)
{
check_dbmobject_open(dp);
errno = 0;
if (gdbm_reorganize(dp->di_dbm) < 0) {
if (errno != 0)
PyErr_SetFromErrno(DbmError);
else
PyErr_SetString(DbmError, gdbm_strerror(gdbm_errno));
return NULL;
}
Py_INCREF(Py_None);
return Py_None;
}

PyDoc_STRVAR(dbm_sync__doc__,
"sync() -> None\n\
When the database has been opened in fast mode, this method forces\n\
any unwritten data to be written to the disk.");

static PyObject *
dbm_sync(register dbmobject *dp, PyObject *unused)
{
check_dbmobject_open(dp);
gdbm_sync(dp->di_dbm);
Py_INCREF(Py_None);
return Py_None;
}

static PyMethodDef dbm_methods[] = {
{"close", (PyCFunction)dbm_close, METH_NOARGS, dbm_close__doc__},
{"keys", (PyCFunction)dbm_keys, METH_NOARGS, dbm_keys__doc__},
{"has_key", (PyCFunction)dbm_has_key, METH_VARARGS, dbm_has_key__doc__},
{"firstkey", (PyCFunction)dbm_firstkey,METH_NOARGS, dbm_firstkey__doc__},
{"nextkey", (PyCFunction)dbm_nextkey, METH_VARARGS, dbm_nextkey__doc__},
{"reorganize",(PyCFunction)dbm_reorganize,METH_NOARGS, dbm_reorganize__doc__},
{"sync", (PyCFunction)dbm_sync, METH_NOARGS, dbm_sync__doc__},
{NULL, NULL} /* sentinel */
};

static PyObject *
dbm_getattr(dbmobject *dp, char *name)
{
return Py_FindMethod(dbm_methods, (PyObject *)dp, name);
}

static PyTypeObject Dbmtype = {
PyObject_HEAD_INIT(0)
0,
"gdbm.gdbm",
sizeof(dbmobject),
0,
(destructor)dbm_dealloc, /*tp_dealloc*/
0, /*tp_print*/
(getattrfunc)dbm_getattr, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
&dbm_as_mapping, /*tp_as_mapping*/
0, /*tp_hash*/
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
0, /*tp_xxx4*/
gdbm_object__doc__, /*tp_doc*/
};

/* ----------------------------------------------------------------- */

PyDoc_STRVAR(dbmopen__doc__,
"open(filename, [flags, [mode]]) -> dbm_object\n\
Open a dbm database and return a dbm object. The filename argument is\n\
the name of the database file.\n\
\n\
The optional flags argument can be 'r' (to open an existing database\n\
for reading only -- default), 'w' (to open an existing database for\n\
reading and writing), 'c' (which creates the database if it doesn't\n\
exist), or 'n' (which always creates a new empty database).\n\
\n\
Some versions of gdbm support additional flags which must be\n\
appended to one of the flags described above. The module constant\n\
'open_flags' is a string of valid additional flags. The 'f' flag\n\
opens the database in fast mode; altered data will not automatically\n\
be written to the disk after every change. This results in faster\n\
writes to the database, but may result in an inconsistent database\n\
if the program crashes while the database is still open. Use the\n\
sync() method to force any unwritten data to be written to the disk.\n\
The 's' flag causes all database operations to be synchronized to\n\
disk. The 'u' flag disables locking of the database file.\n\
\n\
The optional mode argument is the Unix mode of the file, used only\n\
when the database has to be created. It defaults to octal 0666. ");

static PyObject *
dbmopen(PyObject *self, PyObject *args)
{
char *name;
char *flags = "r";
int iflags;
int mode = 0666;

if (!PyArg_ParseTuple(args, "s|si:open", &name, &flags, &mode))
return NULL;
switch (flags[0]) {
case 'r':
iflags = GDBM_READER;
break;
case 'w':
iflags = GDBM_WRITER;
break;
case 'c':
iflags = GDBM_WRCREAT;
break;
case 'n':
iflags = GDBM_NEWDB;
break;
default:
PyErr_SetString(DbmError,
"First flag must be one of 'r', 'w', 'c' or 'n'");
return NULL;
}
for (flags++; *flags != '\0'; flags++) {
char buf[40];
switch (*flags) {
#ifdef GDBM_FAST
case 'f':
iflags |= GDBM_FAST;
break;
#endif
#ifdef GDBM_SYNC
case 's':
iflags |= GDBM_SYNC;
break;
#endif
#ifdef GDBM_NOLOCK
case 'u':
iflags |= GDBM_NOLOCK;
break;
#endif
default:
PyOS_snprintf(buf, sizeof(buf), "Flag '%c' is not supported.",
*flags);
PyErr_SetString(DbmError, buf);
return NULL;
}
}

return newdbmobject(name, iflags, mode);
}

static char dbmmodule_open_flags[] = "rwcn"
#ifdef GDBM_FAST
"f"
#endif
#ifdef GDBM_SYNC
"s"
#endif
#ifdef GDBM_NOLOCK
"u"
#endif
;

static PyMethodDef dbmmodule_methods[] = {
{ "open", (PyCFunction)dbmopen, METH_VARARGS, dbmopen__doc__},
{ 0, 0 },
};

PyMODINIT_FUNC
initgdbm(void) {
PyObject *m, *d, *s;

Dbmtype.ob_type = &PyType_Type;
m = Py_InitModule4("gdbm", dbmmodule_methods,
gdbmmodule__doc__, (PyObject *)NULL,
PYTHON_API_VERSION);
if (m == NULL)
return;
d = PyModule_GetDict(m);
DbmError = PyErr_NewException("gdbm.error", NULL, NULL);
if (DbmError != NULL) {
PyDict_SetItemString(d, "error", DbmError);
s = PyString_FromString(dbmmodule_open_flags);
PyDict_SetItemString(d, "open_flags", s);
Py_DECREF(s);
}
}

[/size]

zuroc

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Tokyo Cabinet乱贴(未整理,仅供自己做笔记)

[size=large]网上关于Tokyo Cabinet的简介(http://www.162cm.com/archives/681.html)项目主页:http://tokyocabinet.sourceforge.net/Tokyo Cabinet 是一个DBM的实现。这里的数据库由一系列key-value对的记录构成。key和value都可以是任意长度的字节序列,既可以是二进制也...
复制链接

扫一扫