NoSQL(MongoDB,Riak,CouchDB,Redis)


cuchou5321

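This article introduces the most popular NoSQL databases today, covering each one's strengths, weaknesses, and suitable use cases.
The original was written by Kristof Kovacs, a technical architect from Germany: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Kristof Kovacs appears to be a software architect working on machinery-related systems.
I have been reading a lot of English-language material recently and translated this article along the way. Some passages may be rendered inaccurately, so corrections are welcome.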

While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.)

But the differences between NoSQL databases are much bigger than they ever were between one SQL database and another. This means that software architects bear a bigger responsibility to choose the appropriate one for a project right at the beginning.

In this light, here is a comparison of Cassandra, MongoDB, CouchDB, Redis, Riak, Couchbase (ex-Membase), Hypertable, ElasticSearch, Accumulo, VoltDB, Kyoto Tycoon, Scalaris, Neo4j and HBase:

The most popular ones

MongoDB (2.2)

Written in: C++
Main point: Retains some friendly properties of SQL. (Query, index)
License: AGPL (Drivers: Apache)
Protocol: Custom, binary (BSON)
Master/slave replication (auto failover with replica sets)
Sharding built-in
Queries are javascript expressions
Run arbitrary javascript functions server-side
Better update-in-place than CouchDB
Uses memory mapped files for data storage
Performance over features
Journaling (with --journal) is best turned on
On 32-bit systems, limited to ~2.5 GB
An empty database takes up 192 MB
GridFS to store big data + metadata (not actually an FS)
Has geospatial indexing
Data center aware

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.
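
To make the "dynamic queries" point concrete, here is a minimal sketch in plain Python of filtering schemaless documents with an ad-hoc query. It is a toy in-memory model, not MongoDB's implementation or driver API: the `matches` and `find` helpers and the sample data are hypothetical, and only the `$gt`/`$lt` operator names are borrowed from MongoDB's query syntax.

```python
# Toy in-memory model of ad-hoc queries over schemaless documents.
# Hypothetical helpers; this does not talk to a MongoDB server.

def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, cond in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 10}
            for op, operand in cond.items():
                if op not in ("$gt", "$lt"):
                    raise ValueError(f"unsupported operator: {op}")
                if op == "$gt" and not value > operand:
                    return False
                if op == "$lt" and not value < operand:
                    return False
        elif value != cond:  # plain equality
            return False
    return True


def find(collection, query):
    """Return all documents in collection that match query."""
    return [doc for doc in collection if matches(doc, query)]


# Documents need not share the same fields (no predefined columns).
products = [
    {"name": "lamp", "price": 30, "color": "red"},
    {"name": "desk", "price": 120},
    {"name": "pen", "price": 2, "tags": ["office"]},
]

cheap = find(products, {"price": {"$lt": 50}})
print([p["name"] for p in cheap])
```

The query is just data built at run time, which is the contrast with pre-defined map/reduce views. A real server would also consult an index instead of scanning every document as this sketch does.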


Riak (V1.2)

Written in: Erlang & C, some JavaScript
Main point: Fault tolerance
License: Apache
Protocol: HTTP/REST or custom binary
Stores blobs
Tunable trade-offs for distribution and replication
Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
Map/reduce in JavaScript or Erlang
Links & link walking: use it as a graph database
Secondary indices: but only one at once
Large object support (Luwak)
Comes in "open source" and "enterprise" editions
Full-text search, indexing, querying with Riak Search
In the process of migrating the storing backend from "Bitcask" to Google's "LevelDB"
Masterless multi-site replication and SNMP monitoring are commercially licensed

Best used: If you want Dynamo-like data storage, but there is no way you're going to deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.

For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.


CouchDB (V1.2)

Written in: Erlang
Main point: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bi-directional (!) replication,
continuous or ad-hoc,
with conflict detection,
thus, master-master replication. (!)
MVCC - write operations do not block reads
Previous versions of documents are available
Crash-only (reliable) design
Needs compacting from time to time
Views: embedded map/reduce
Formatting views: lists & shows
Server-side document validation possible
Authentication possible
Real-time updates via '_changes' (!)
Attachment handling
thus, CouchApps (standalone js apps)

Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
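
The "Views: embedded map/reduce" idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, assuming a map function that emits (key, value) pairs per document and a reduce function applied per key; `build_view`, `by_customer`, and the sample documents are hypothetical, and real CouchDB views are stored on the server and updated incrementally.

```python
from collections import defaultdict


def build_view(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Views are ordered by key, so sort before reducing.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


def by_customer(doc):
    # Emit one row per order document; other document types emit nothing.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


docs = [
    {"_id": "1", "type": "order", "customer": "alice", "total": 40},
    {"_id": "2", "type": "order", "customer": "bob", "total": 15},
    {"_id": "3", "type": "note", "text": "call back"},
    {"_id": "4", "type": "order", "customer": "alice", "total": 10},
]

totals = build_view(docs, by_customer, sum)
print(totals)
```

Because the map and reduce functions are fixed ahead of time, this style suits the "pre-defined queries" use case named above rather than ad-hoc querying.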


Redis (V2.8)

Written in: C
Main point: Blazing fast
License: BSD
Protocol: Telnet-like, binary safe
Disk-backed in-memory database,
Dataset size limited to computer RAM (but can span multiple machines' RAM with clustering)
Master-slave replication, automatic failover
Simple values or data structures by keys
but complex operations like ZREVRANGEBYSCORE.
INCR & co (good for rate limiting or statistics)
Bit operations (for example to implement bloom filters)
Has sets (also union/diff/inter)
Has lists (also a queue; blocking pop)
Has hashes (objects of multiple fields)
Sorted sets (high score table, good for range queries)
Lua scripting capabilities (!)
Has transactions (!)
Values can be set to expire (as in a cache)
Pub/Sub lets one implement messaging

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.
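
As an illustration of why INCR together with expiring values is "good for rate limiting", here is a minimal fixed-window limiter. INCR and EXPIRE are real Redis commands, but everything else is an assumption made for this sketch: `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server, and the explicit `now` argument (which the real commands do not take) keeps the behaviour deterministic.

```python
class FakeRedis:
    """Tiny in-memory stand-in for the two commands used below."""

    def __init__(self):
        self._data = {}     # key -> integer counter
        self._expires = {}  # key -> absolute expiry time in seconds

    def _purge(self, key, now):
        # Drop the key if its time-to-live has elapsed.
        if key in self._expires and now >= self._expires[key]:
            self._data.pop(key, None)
            self._expires.pop(key, None)

    def incr(self, key, now):
        # Like INCR: a missing key counts as 0 before incrementing.
        self._purge(key, now)
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def expire(self, key, seconds, now):
        # Like EXPIRE: set a time-to-live on an existing key.
        if key in self._data:
            self._expires[key] = now + seconds


def allow(store, client, limit, window, now):
    """Fixed-window limiter: at most `limit` calls per `window` seconds."""
    key = f"rate:{client}"
    count = store.incr(key, now)
    if count == 1:
        # First call in this window starts the countdown.
        store.expire(key, window, now)
    return count <= limit


store = FakeRedis()
results = [allow(store, "alice", limit=3, window=60, now=t) for t in (0, 1, 2, 3)]
print(results)
print(allow(store, "alice", limit=3, window=60, now=61))
```

The fourth call inside the window is rejected, and once the key expires the counter starts again from one.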


Clones of Google's Bigtable

HBase (V0.92.0)

Written in: Java
Main point: Billions of rows X millions of columns
License: Apache
Protocol: HTTP/REST (also Thrift)
Modeled after Google's BigTable
Uses Hadoop's HDFS as storage
Map/reduce with Hadoop
Query predicate push down via server side scan and get filters
Optimizations for real time queries
A high performance Thrift gateway
HTTP supports XML, Protobuf, and binary
JRuby-based (JIRB) shell
Rolling restart for configuration changes and minor upgrades
Random access performance is like MySQL
A cluster consists of several different types of nodes

Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
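
The "scan and get filters" item can be pictured with a small in-memory sketch: rows are kept sorted by key, a scan selects a contiguous key range, and a filter is applied to each row before it is returned. This is a toy model under those assumptions, not the HBase client API; `SortedTable`, its methods, and the row-key layout are hypothetical.

```python
import bisect


class SortedTable:
    """Rows kept sorted by key, so a key range is one contiguous slice."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan(self, start, stop, row_filter=None):
        """Yield (key, columns) for start <= key < stop that pass row_filter."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            columns = self._rows[key]
            if row_filter is None or row_filter(columns):
                yield key, columns


logs = SortedTable()
logs.put("web1#2014-01-01", {"status": "200"})
logs.put("web2#2014-01-01", {"status": "500"})
logs.put("web1#2014-01-02", {"status": "500"})
logs.put("web1#2014-01-03", {"status": "200", "bytes": "512"})  # extra column

# "$" is the character right after "#", so this range covers every "web1#" key.
errors = list(logs.scan("web1#", "web1$", lambda c: c["status"] == "500"))
print(errors)
```

Putting the host name first in the row key is what makes "all rows for web1" a single range scan, and rows are free to carry different columns.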


Cassandra (1.2)

Written in: Java
Main point: Best of BigTable and Dynamo
License: Apache
Protocol: Thrift & custom binary CQL3
Tunable trade-offs for distribution and replication (N, R, W)
Querying by column, range of keys (Requires indices on anything that you want to search on)
BigTable-like features: columns, column families
Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)
Data can have expiration (set on INSERT)
Writes can be much faster than reads (when reads are disk-bound)
Map/reduce possible with Apache Hadoop
All nodes are similar, as opposed to Hadoop/HBase
Very good and reliable cross-datacenter replication

Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is data analysis.
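
The "(N, R, W)" trade-off can be worked through with a little arithmetic. In the usual Dynamo-style reading, which is assumed here, N is the number of replicas holding a value, W is how many must acknowledge a write, and R is how many are consulted on a read; a read is guaranteed to reach at least one replica that saw the latest write when R + W > N. The helper names below are hypothetical.

```python
def overlaps(n, r, w):
    """True if every read set must intersect every write set."""
    return r + w > n


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


n = 3
settings = {
    # name: (w, r)
    "fast writes (W=1, R=1)": (1, 1),
    "quorum both ways": (quorum(n), quorum(n)),
    "write all, read one": (n, 1),
}

for name, (w, r) in settings.items():
    print(name, overlaps(n, r, w))
```

Lowering W makes writes cheaper at the cost of possibly stale reads, which fits the write-heavy workloads described above.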


Neo4j (V1.5M02)

Written in: Java
Main point: Graph database - connected data
License: GPL, some features AGPL/commercial
Protocol: HTTP/REST (or embedding in Java)
Standalone, or embeddable into Java applications
Full ACID conformity (including durable data)
Both nodes and relationships can have metadata
Integrated pattern-matching-based query language ("Cypher")
Also the "Gremlin" graph traversal language can be used
Indexing of nodes and relationships
Nice self-contained web admin
Advanced path-finding with multiple algorithms
Indexing of keys and relationships
Optimized for reads
Has transactions (in the Java API)
Scriptable in Groovy
Online backup, advanced monitoring and High Availability are AGPL/commercial licensed

Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

For example: For searching routes in social relations, public transport links, road maps, or network topologies.
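
Route searching over connected data can be sketched with a breadth-first search. This is a generic textbook algorithm in plain Python, not Neo4j's API or one of its built-in path-finding algorithms; the `shortest_path` helper and the sample social graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a fewest-hops path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is not reachable from start


# Adjacency list: who knows whom.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}

print(shortest_path(friends, "ann", "eve"))
```

In a relational schema the same question needs one self-join per hop, which is why this kind of data is singled out as a graph-database use case.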


Hypertable (0.9.6.5)

Written in: C++
Main point: A faster, smaller HBase
License: GPL 2.0
Protocol: Thrift, C++ library, or HQL shell
Implements Google's BigTable design
Runs on Hadoop's HDFS
Uses its own, "SQL-like" language, HQL
Can search by key, by cell, or for values in column families.
Search can be limited to key/column ranges.
Sponsored by Baidu
Retains the last N historical values
Tables are in namespaces
Map/reduce with Hadoop

Best used: If you need a better HBase.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables is a requirement.
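
The "Retains the last N historical values" item can be pictured as each cell holding a bounded list of timestamped versions. The sketch below is a toy in-memory model under that assumption, not Hypertable's storage format or API; `VersionedCells` and its methods are hypothetical, and writes are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class VersionedCells:
    """Each (row, column) cell keeps only its newest max_versions values."""

    def __init__(self, max_versions):
        # A bounded deque silently drops the oldest entry when full.
        self._cells = defaultdict(lambda: deque(maxlen=max_versions))

    def put(self, row, column, value, timestamp):
        self._cells[(row, column)].append((timestamp, value))

    def get(self, row, column):
        """Return the latest value, or None if the cell is empty."""
        versions = self._cells.get((row, column))
        return versions[-1][1] if versions else None

    def history(self, row, column):
        """Return retained (timestamp, value) pairs, newest first."""
        return list(reversed(self._cells.get((row, column), ())))


table = VersionedCells(max_versions=2)
table.put("page1", "title", "draft", timestamp=1)
table.put("page1", "title", "review", timestamp=2)
table.put("page1", "title", "final", timestamp=3)

print(table.get("page1", "title"))
print(table.history("page1", "title"))
```

With N set to 2, the third write pushes the oldest version out, so only the two most recent values remain readable.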


Source: ITPUB Blog, http://blog.itpub.net/12798004/viewspace-1148914/. If you repost this article, please credit the source; otherwise legal action may be pursued.
