8 Nosql Databases Compare


 

1. CouchDB

  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via '_changes' (!)
  • Attachment handling
  • thus, CouchApps (standalone js apps)

Best used:For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example:CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.


2. Redis


  • Written in: C
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like, binary safe
  • Disk-backed in-memory database,
  • Dataset size limited to computer RAM (but can span multiple machines' RAM with clustering)
  • Master-slave replication, automatic failover
  • Simple values or data structures by keys
  • but complex operations like ZREVRANGEBYSCORE.
  • INCR & co (good for rate limiting or statistics)
  • Bit operations (for example to implement bloom filters)
  • Has sets (also union/diff/inter)
  • Has lists (also a queue; blocking pop)
  • Has hashes (objects of multiple fields)
  • Sorted sets (high score table, good for range queries)
  • Lua scripting capabilities (!)
  • Has transactions (!)
  • Values can be set to expire (as in a cache)
  • Pub/Sub lets one implement messaging

Best used:For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example:To store real-time stock prices. Real-time analytics. Leaderboards. Real-time communication. And wherever you used memcached before.


3. MongoDB

  • Written in: C++
  • Main point: Retains some friendly properties of SQL. (Query, index)
  • License: AGPL (Drivers: Apache)
  • Protocol: Custom, binary (BSON)
  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are javascript expressions
  • Run arbitrary javascript functions server-side
  • Better update-in-place than CouchDB
  • Uses memory mapped files for data storage
  • Performance over features
  • Journaling (with --journal) is best turned on
  • On 32bit systems, limited to ~2.5Gb
  • An empty database takes up 192Mb
  • GridFS to store big data + metadata (not actually an FS)
  • Has geospatial indexing
  • Data center aware

Best used:If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example:For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.

 

4. Riak


  • Written in: Erlang & C, some JavaScript
  • Main point: Fault tolerance
  • License: Apache
  • Protocol: HTTP/REST or custom binary
  • Stores blobs
  • Tunable trade-offs for distribution and replication
  • Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
  • Map/reduce in JavaScript or Erlang
  • Links & link walking: use it as a graph database
  • Secondary indices: but only one at once
  • Large object support (Luwak)
  • Comes in "open source" and "enterprise" editions
  • Full-text search, indexing, querying with Riak Search
  • In the process of migrating the storing backend from "Bitcask" to Google's "LevelDB"
  • Masterless multi-site replication replication and SNMP monitoring are commercially licensed

Best used:If you want something Dynamo-like data storage, but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.

For example:Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.


5. Membase


  • Written in: Erlang & C
  • Main point: Memcache compatible, but with persistence and clustering
  • License: Apache
  • Protocol: memcached + extensions
  • Very fast (200k+/sec) access of data by key
  • Persistence to disk
  • All nodes are identical (master-master replication)
  • Provides memcached-style in-memory caching buckets, too
  • Write de-duplication to reduce IO
  • Friendly cluster-management web GUI
  • Connection proxy for connection pooling and multiplexing (Moxi)
  • Incremental map/reduce
  • Cross-datacenter replication

Best used:Any application where low-latency data access, high concurrency support and high availability is a requirement.

For example:Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga).


6. Neo4j

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability is AGPL/commercial licensed

Best used:For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

For example:For searching routes in social relations, public transport links, road maps, or network topologies.

 

7. Cassandra


  • Written in: Java
  • Main point: Store huge datasets in "almost" SQL
  • License: Apache
  • Protocol: CQL3 & Thrift
  • CQL3 is very similar SQL, but with some limitations that come from the scalability (most notably: no JOINs, no aggregate functions.)
  • CQL3 is now the official interface. Don't look at Thrift, unless you're working on a legacy app. This way, you can live without understanding ColumnFamilies, SuperColumns, etc.
  • Querying by key, or key range (secondary indices are also available)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Data can have expiration (set on INSERT).
  • Writes can be much faster than reads (when reads are disk-bound)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Very good and reliable cross-datacenter replication
  • Distributed counter datatype.
  • You can write triggers in Java.

Best used:When you need to store data so huge that it doesn't fit on server, but still want a friendly familiar interface to it.

For example:Web analytics, to count hits by hour, by browser, by IP, etc. Transaction logging. Data collection from huge sensor arrays.

 

8. HBase

  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Jruby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes

Best used:Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example:Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.

Of course, all these systems have much more features than what's listed here. I only wanted to list the key points that I base my decisions on. Also, development of all are very fast, so things are bound to change.

 


转载:http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值