NoSQL(MongoDB,Riak,CouchDB,Redis)

本文来自主要介绍目前最为流行NOSQL 数据库,介绍了每个NOSQL数据库的优点,缺点,和适用的场景。
本文是来自德国的一位技术架构师写的,http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis,从 Kristof Kovacs技术文章上分析
Kristof Kovacs应该是位做机械相关的软件架构师。
最近一直在读英文资料,顺便翻译了下,有可能有些地方翻译的不准确,还请多多指教。
bb
While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.)
关系型数据库(sql database)是非常有用的工具,sql 数据库垄断了10多年了,但这局面即将被打破。这只是时间问题:关系数据库不能适应需求的所有情况。
(话虽这么说,关系数据库永远是最好的关系型数据库)

But, the differences between NoSQL databases are much bigger than ever was between one SQL database and another. This means that it is a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning.

但是,NoSQL数据库的不同远超过了关系数据库(sql database)和其他数据库。这意味着软件架构师在项目开始时有更大的需求空间选择好一个适合的 NoSQL数据库。
In this light, here is a comparison of Cassandra, Mongodb, CouchDB, Redis, Riak, Couchbase (ex-Membase), Hypertable, ElasticSearch, Accumulo, VoltDB, Kyoto Tycoon, Scalaris, Neo4j and HBase:
针对这种情况,这里对 Cassandra, Mongodb, CouchDB, Redis, Riak, Couchbase (ex-Membase), Hypertable, ElasticSearch, Accumulo, VoltDB, Kyoto Tycoon, Scalaris, Neo4j 和 HBase进行了比较:

The most popular ones

MongoDB (2.2)

  • Written in: C++
  • Main point: Retains some friendly properties of SQL. (Query, index)
  • License: AGPL (Drivers: Apache)
  • Protocol: Custom, binary (BSON)
  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are javascript expressions
  • Run arbitrary javascript functions server-side
  • Better update-in-place than CouchDB
  • Uses memory mapped files for data storage
  • Performance over features
  • Journaling (with --journal) is best turned on
  • On 32bit systems, limited to ~2.5Gb
  • An empty database takes up 192Mb
  • GridFS to store big data + metadata (not actually an FS)
  • Has geospatial indexing
  • Data center aware

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.


?使用语言:C++
?主要特点:保留了SQL一些友好的特性(查询,索引)。
?许可: AGPL(发起者: Apache)
?通讯协议: Custom, binary( BSON)(译注:没使用过该协议)
?Master/slave主从复制(支持自动故障转移与恢复)
?分片机制
?支持 javascript表达式查询
?可在服务器端执行任意javascript 函数
?update-in-place比CouchDB更好
?使用内存映射文件的数据存储
?性能性比功能性强
?最好打开日志功能(可修改参数journal)
?在32位操作系统上,数据库大小限制在约2.5Gb
?一个空数据库大约占192MB
?采用 GridFS存储大数据和元数据(不是真正的NF文件系统)
?有索引(译注:翻译不准)
?有数据中心意思(译注:翻译不准)
最佳的应用场景:适用于需要动态查询支持.如果你需要使用索引而不是 map/reduce功能;如果您需要对大数据库有良好的性能要求,
如果您需要使用CouchDB但数据改变太频繁而快速占满磁盘空间。

例如: Riak (V1.2)

  • Written in: Erlang & C, some JavaScript
  • Main point: Fault tolerance
  • License: Apache
  • Protocol: HTTP/REST or custom binary
  • Stores blobs
  • Tunable trade-offs for distribution and replication
  • Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
  • Map/reduce in JavaScript or Erlang
  • Links & link walking: use it as a graph database
  • Secondary indices: but only one at once
  • Large object support (Luwak)
  • Comes in "open source" and "enterprise" editions
  • Full-text search, indexing, querying with Riak Search
  • In the process of migrating the storing backend from "Bitcask" to Google's "LevelDB"
  • Masterless multi-site replication replication and SNMP monitoring are commercially licensed

Best used: If you want something Dynamo-like data storage, but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.

For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.

?使用语言:Erlang&C,以及一些Javascript
?主要特点:具备容错能力
?许可: Apache
?通讯协议: HTTP/REST或者 custom binary
?存储集中

?可调谐的权衡分配和复制
?JavaScript or Erlang在操作前或操作后进行验证和安全支持。
?在JavaScript或Erlang中进行 Map/reduce管理
?连接及连接遍历:可作为图形数据库使用
?Secondary indices: but only one at once
?支持大数据对象
?提供开源版和企业版
?支持全文本搜索,索引,环型查询
?在迁移的过程中,存储后端可从“bitcask“到google的“LevelDB”
?支持Masterless多站点复制的复制和SNMP监控商业许可
最佳的应用场景:如果你想使用动态数据存储,但没有方式处理膨胀及复杂性的情况。如果你需要很好的单站点的可扩展性,可用性和容错性,但是你已经准备支付多站点复制。
例如:销售站点的数据搜集,工厂的控制系统;对宕机有严格要求的,适用于易于更新的 web服务器。

 

 

CouchDB (V1.2)

  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via '_changes' (!)
  • Attachment handling
  • thus, CouchApps (standalone js apps)

Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

?使用语言: Erlang
?主要特点:DB一致性、易于使用
?许可: Apache
?通讯协议: HTTP/REST
?双向数据复制
?持续进行或临时处理
?冲突检查
?master-master复制
?MVCC – 写操作不阻塞读
?文件之前的版本可用
?Crash-only(可靠的)设计
?实时的进行数据压缩
?视图:嵌入式map/reduce
?格式化视图:列表显示
?支持服务器端验证
?支持认证
?支持实时更新
?支持附件处理
?thus, CouchApps (standalone js apps)

最佳的应用场景:适用于数据变化较少,执行预定义查询的应用程序。适用于需要数据版本支持的应用程序。

例如: CRM、CMS系统。 master-master复制对于多站点部署是非常简单。

Redis (V2.8)

  • Written in: C
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like, binary safe
  • Disk-backed in-memory database,
  • Dataset size limited to computer RAM (but can span multiple machines' RAM with clustering)
  • Master-slave replication, automatic failover
  • Simple values or data structures by keys
  • but complex operations like ZREVRANGEBYSCORE.
  • INCR & co (good for rate limiting or statistics)
  • Bit operations (for example to implement bloom filters)
  • Has sets (also union/diff/inter)
  • Has lists (also a queue; blocking pop)
  • Has hashes (objects of multiple fields)
  • Sorted sets (high score table, good for range queries)
  • Lua scripting capabilities (!)
  • Has transactions (!)
  • Values can be set to expire (as in a cache)
  • Pub/Sub lets one implement messaging

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.

?使用语言:C
?主要特点:运行非常快
?许可: BSD
?通讯协议: Telnet-like, binary safe
?有硬盘存储支持的内存数据库
?数据集的大小限制为计算机RAM(但可以跨多个机器的内存和聚类)
?主从复制,自动故障转移
?简单的值、键数据结构
?但也支持复杂操作,例如 ZREVRANGEBYSCORE
?INCR & co (适合计算极限值或统计数据)
?支持位操作
?支持 sets(同时也支持 union/diff/inter)
?支持列表(同时也支持队列、阻塞式pop操作)
?支持哈希表(带有多个属性的对象)
?支持排序
?支持事务
?可将数据设置成过期数据
?Pub/Sub允许用户实现消息机制
 

最佳应用场景:适用于数据变化快且数据库较小的应用程序(数据常在内存处理的)。

例如:股票价格、数据分析、实时数据搜集、实时通讯。


 

Clones of Google's Bigtable

HBase (V0.92.0)

  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Jruby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes

Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.


?使用语言: Java
?主要特点:支持数十亿、数百万以上的列
?许可: Apache
?通讯协议:HTTP/REST
?Modeled after Google's BigTable

?使用类似 Hadoop's HDFS 进行存储
?Map/reduce with Hadoop
?实现谓词在server端扫描及过滤
?对实时查询进行优化
?支持 HTTP、XML、Protobuf、binary
?基于 Jruby( JIRB)的shell
?实现滚动式配置和升级
?随机访问性能类似MySQL
?一个集群包含几种不同类型的节点
 

最佳的应用场景:适用于非常大的表,并且需要实时访问的场合。

例如: 搜索引擎。分析日志数据。任何需要巨大的二维表的要求

Cassandra (1.2)

  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Thrift & custom binary CQL3
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys (Requires indices on anything that you want to search on)
  • BigTable-like features: columns, column families
  • Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)
  • Data can have expiration (set on INSERT)
  • Writes can be much faster than reads (when reads are disk-bound)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Very good and reliable cross-datacenter replication

Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is data analysis.

?使用语言: Java
?主要特点:对大表格支持得最好
?许可: Apache
?通讯协议: Thrift & custom binary CQL3
?可调节的分发及复制(N, R, W)
?查询列范围内的键值
?类似大表格的特点:列,某个列集合
?Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)

?数据可以设置有效期
?写操作比读操作更快
?所有的节点都是相似的,而不像Hadoop/HBase
?很好的和可靠的跨数据中心的复制 

最佳的应用场景:写操作多过读操作,如果每个系统组建都必须用 Java编写。
例如:银行业,金融业(虽然对于金融交易不是必须的,但这些产业对数据库的要求会比它们更大)写比读更快。

Neo4j (V1.5M02)

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability is AGPL/commercial licensed

Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

For example: For searching routes in social relations, public transport links, road maps, or network topologies.

  • 所用语言: Java
  • 特点:基于关系的图形数据库
  • 使用许可: GPL,其中一些特性使用 AGPL/商业许可
  • 协议: HTTP/REST(或嵌入在 Java中)
  • 可独立使用或嵌入到 Java应用程序
  • 图形的节点和边都可以带有元数据
  • 很好的自带web管理功能
  • 使用多种算法支持路径搜索
  • 使用键值和关系进行索引
  • 为读操作进行优化
  • 支持事务(用 Java api)
  • 使用 Gremlin图形遍历语言
  • 支持 Groovy脚本
  • 支持在线备份,高级监控及高可靠性支持使用 AGPL/商业许可

 

最佳应用场景:适用于图形一类数据。这是 Neo4j与其他nosql数据库的最显著区别

例如:社会关系,公共交通网络,地图及网络拓谱


 

Hypertable (0.9.6.5)

  • Written in: C++
  • Main point: A faster, smaller HBase
  • License: GPL 2.0
  • Protocol: Thrift, C++ library, or HQL shell
  • Implements Google's BigTable design
  • Run on Hadoop's HDFS
  • Uses its own, "SQL-like" language, HQL
  • Can search by key, by cell, or for values in column families.
  • Search can be limited to key/column ranges.
  • Sponsored by Baidu
  • Retains the last N historical values
  • Tables are in namespaces
  • Map/reduce with Hadoop

Best used: If you need a better HBase.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.

?使用语言: C++
?主要特点:小的,非常快
?许可: GPL 2.0
?通讯协议: Thrift, C++ library, or HQL shell
?实现了谷歌的Bigtable的设计
?运行在Run on Hadoop's HDFS

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/12798004/viewspace-1148914/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/12798004/viewspace-1148914/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值