Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vsHBase vs Couchbase vs Neo4j vs Hypertable vsElast

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vsHBase vs Couchbase vs Neo4j vs Hypertable vsElasticSearch vs Accumulo vs VoltDB vs Scalariscomparison

(Yes it's a long title, since people kept asking me to write about this and that too :) I do when it has a point.)

While SQL databases are insanely useful tools, their monopoly in the last decades is coming to an end. And it's just time: I can't even count the things that were forced into relational databases, but never really fitted them. (That being said, relational databases will always be the best for the stuff that has relations.)

But, the differences between NoSQL databases are much bigger than ever was between one SQL database and another. This means that it is a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning.

In this light, here is a comparison of CassandraMongodbCouchDBRedisRiakCouchbase (ex-Membase)HypertableElasticSearchAccumuloVoltDBKyoto TycoonScalarisNeo4j and HBase:

The most popular ones

MongoDB (2.2)

  • Written in: C++
  • Main point: Retains some friendly properties of SQL. (Query, index)
  • License: AGPL (Drivers: Apache)
  • Protocol: Custom, binary (BSON)
  • Master/slave replication (auto failover with replica sets)
  • Sharding built-in
  • Queries are javascript expressions
  • Run arbitrary javascript functions server-side
  • Better update-in-place than CouchDB
  • Uses memory mapped files for data storage
  • Performance over features
  • Journaling (with --journal) is best turned on
  • On 32bit systems, limited to ~2.5Gb
  • An empty database takes up 192Mb
  • GridFS to store big data + metadata (not actually an FS)
  • Has geospatial indexing
  • Data center aware

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.

Riak (V1.2)

  • Written in: Erlang & C, some JavaScript
  • Main point: Fault tolerance
  • License: Apache
  • Protocol: HTTP/REST or custom binary
  • Stores blobs
  • Tunable trade-offs for distribution and replication
  • Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.
  • Map/reduce in JavaScript or Erlang
  • Links & link walking: use it as a graph database
  • Secondary indices: but only one at once
  • Large object support (Luwak)
  • Comes in "open source" and "enterprise" editions
  • Full-text search, indexing, querying with Riak Search
  • In the process of migrating the storing backend from "Bitcask" to Google's "LevelDB"
  • Masterless multi-site replication replication and SNMP monitoring are commercially licensed

Best used: If you want something Dynamo-like data storage, but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you're ready to pay for multi-site replication.

For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt. Could be used as a well-update-able web server.

CouchDB (V1.2)

  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication,
  • continuous or ad-hoc,
  • with conflict detection,
  • thus, master-master replication. (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via '_changes' (!)
  • Attachment handling
  • thus, CouchApps (standalone js apps)

Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.

Redis (V2.4)

  • Written in: C/C++
  • Main point: Blazing fast
  • License: BSD
  • Protocol: Telnet-like
  • Disk-backed in-memory database,
  • Currently without disk-swap (VM and Diskstore were abandoned)
  • Master-slave replication
  • Simple values or hash tables by keys,
  • but complex operations like ZREVRANGEBYSCORE.
  • INCR & co (good for rate limiting or statistics)
  • Has sets (also union/diff/inter)
  • Has lists (also a queue; blocking pop)
  • Has hashes (objects of multiple fields)
  • Sorted sets (high score table, good for range queries)
  • Redis has transactions (!)
  • Values can be set to expire (as in a cache)
  • Pub/Sub lets one implement messaging (!)

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.

Clones of Google's Bigtable

HBase (V0.92.0)

  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Jruby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes

Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.

Cassandra (1.2)

  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Thrift & custom binary CQL3
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys (Requires indices on anything that you want to search on)
  • BigTable-like features: columns, column families
  • Can be used as a distributed hash-table, with an "SQL-like" language, CQL (but no JOIN!)
  • Data can have expiration (set on INSERT)
  • Writes can be much faster than reads (when reads are disk-bound)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Cross-datacenter replication

Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is data analysis.

Hypertable (0.9.6.5)

  • Written in: C++
  • Main point: A faster, smaller HBase
  • License: GPL 2.0
  • Protocol: Thrift, C++ library, or HQL shell
  • Implements Google's BigTable design
  • Run on Hadoop's HDFS
  • Uses its own, "SQL-like" language, HQL
  • Can search by key, by cell, or for values in column families.
  • Search can be limited to key/column ranges.
  • Sponsored by Baidu
  • Retains the last N historical values
  • Tables are in namespaces
  • Map/reduce with Hadoop

Best used: If you need a better HBase.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.

Accumulo (1.4)

  • Written in: Java and C++
  • Main point: A BigTable with Cell-level security
  • License: Apache
  • Protocol: Thrift
  • Another BigTable clone, also runs of top of Hadoop
  • Cell-level security
  • Bigger rows than memory are allowed
  • Keeps a memory map outside Java, in C++ STL
  • Map/reduce using Hadoop's facitlities (ZooKeeper & co)
  • Some server-side programming

Best used: If you need a different HBase.

For example: Same as HBase, since it's basically a replacement: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.

Special-purpose

Neo4j (V1.5M02)

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability is AGPL/commercial licensed

Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.

For example: For searching routes in social relations, public transport links, road maps, or network topologies.

ElasticSearch (0.20.1)

  • Written in: Java
  • Main point: Advanced Search
  • License: Apache
  • Protocol: JSON over HTTP (Plugins: Thrift, memcached)
  • Stores JSON documents
  • Has versioning
  • Parent and children documents
  • Documents can time out
  • Very versatile and sophisticated querying, scriptable
  • Write consistency: one, quorum or all
  • Sorting by score (!)
  • Geo distance sorting
  • Fuzzy searches (approximate date, etc) (!)
  • Asynchronous replication
  • Atomic, scripted updates (good for counters, etc)
  • Can maintain automatic "stats groups" (good for debugging)
  • Still depends very much on only one developer (kimchy).

Best used: When you have objects with (flexible) fields, and you need "advanced search" functionality.

For example: A dating service that handles age difference, geographic location, tastes and dislikes, etc. Or a leaderboard system that depends on many variables.

The "long tail"
(Not widely known, but definitely worthy ones)

Couchbase (ex-Membase) (2.0)

  • Written in: Erlang & C
  • Main point: Memcache compatible, but with persistence and clustering
  • License: Apache
  • Protocol: memcached + extensions
  • Very fast (200k+/sec) access of data by key
  • Persistence to disk
  • All nodes are identical (master-master replication)
  • Provides memcached-style in-memory caching buckets, too
  • Write de-duplication to reduce IO
  • Friendly cluster-management web GUI
  • Connection proxy for connection pooling and multiplexing (Moxi)
  • Incremental map/reduce
  • Cross-datacenter replication

Best used: Any application where low-latency data access, high concurrency support and high availability is a requirement.

For example: Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. Zynga).

Scalaris (0.5)

  • Written in: Erlang
  • Main point: Distributed P2P key-value store
  • License: Apache
  • Protocol: Proprietary & JSON-RPC
  • In-memory (disk when using Tokyo Cabinet as a backend)
  • Uses YAWS as a web server
  • Has transactions (an adapted Paxos commit)
  • Consistent, distributed write operations
  • From CAP, values Consistency over Availability (in case of network partitioning, only the bigger partition works)

Best used: If you like Erlang and wanted to use Mnesia or DETS or ETS, but you need something that is accessible from more languages (and scales much better than ETS or DETS).

For example: In an Erlang-based system when you want to give access to the DB to Python, Ruby or Java programmers.

VoltDB (2.8.4.1)

  • Written in: Java
  • Main point: Fast transactions and repidly changing data
  • License: GPL 3
  • Protocol: Proprietary
  • In-memory relational database.
  • Can export data into Hadoop
  • Supports ANSI SQL
  • Stored procedures in Java
  • Cross-datacenter replication

Best used: Where you need to act fast on massive amounts of incoming data.

For example: Point-of-sales data analysis. Factory control systems.

Kyoto Tycoon (0.9.56)

  • Written in: C++
  • Main point: A lightweight network DBM
  • License: GPL
  • Protocol: HTTP (TSV-RPC or REST)
  • Based on Kyoto Cabinet, Tokyo Cabinet's successor
  • Multitudes of storage backends: Hash, Tree, Dir, etc (everything from Kyoto Cabinet)
  • Kyoto Cabinet can do 1M+ insert/select operations per sec (but Tycoon does less because of overhead)
  • Lua on the server side
  • Language bindings for C, Java, Python, Ruby, Perl, Lua, etc
  • Uses the "visitor" pattern
  • Hot backup, asynchronous replication
  • background snapshot of in-memory databases
  • Auto expiration (can be used as a cache server)

Best used: When you want to choose the backend storage algorithm engine very precisely. When speed is of the essence.

For example: Caching server. Stock prices. Analytics. Real-time data collection. Real-time communication. And wherever you used memcached before.

Of course, all these systems have much more features than what's listed here. I only wanted to list the key points that I base my decisions on. Also, development of all are very fast, so things are bound to change.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
资源包主要包含以下内容: ASP项目源码:每个资源包中都包含完整的ASP项目源码,这些源码采用了经典的ASP技术开发,结构清晰、注释详细,帮助用户轻松理解整个项目的逻辑和实现方式。通过这些源码,用户可以学习到ASP的基本语法、服务器端脚本编写方法、数据库操作、用户权限管理等关键技术。 数据库设计文件:为了方便用户更好地理解系统的后台逻辑,每个项目中都附带了完整的数据库设计文件。这些文件通常包括数据库结构图、数据表设计文档,以及示例数据SQL脚本。用户可以通过这些文件快速搭建项目所需的数据库环境,并了解各个数据表之间的关系和作用。 详细的开发文档:每个资源包都附有详细的开发文档,文档内容包括项目背景介绍、功能模块说明、系统流程图、用户界面设计以及关键代码解析等。这些文档为用户提供了深入的学习材料,使得即便是从零开始的开发者也能逐步掌握项目开发的全过程。 项目演示与使用指南:为帮助用户更好地理解和使用这些ASP项目,每个资源包中都包含项目的演示文件和使用指南。演示文件通常以视频或图文形式展示项目的主要功能和操作流程,使用指南则详细说明了如何配置开发环境、部署项目以及常见问题的解决方法。 毕业设计参考:对于正在准备毕业设计的学生来说,这些资源包是绝佳的参考材料。每个项目不仅功能完善、结构清晰,还符合常见的毕业设计要求和标准。通过这些项目,学生可以学习到如何从零开始构建一个完整的Web系统,并积累丰富的项目经验。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值