A comment Edmond Lau gave on Quora that sums things up well; click the link to see more comments.
The main problems that NoSQL databases aim to solve typically revolve around issues of scale. When data no longer fits on a single MySQL server or when a single machine can no longer handle the query load, some strategy for sharding and replication is required.
The pitch behind most NoSQL databases like Cassandra, HBase, Voldemort and others is that because they were designed from the ground up to be distributed and to handle large data volumes, they can provide some combination of the following benefits that a simple installation of MySQL or Postgres can't easily offer:
- Automatic sharding of data. New data gets automatically assigned to the appropriate node.
- Automatic replication of data. Multiple nodes each store a copy of the data, up to a certain configured replication factor.
- Schema-less data for simpler migrations. Schema changes for large tables can take a long time and lock the tables, blocking any writes. A database with only a loosely defined schema (like Cassandra and HBase's column families) or none at all in key/value stores should make this easier.
- Automatic scalability by adding new nodes. Adding new nodes automatically re-partitions the data for load balancing purposes.
- Multiple nodes that can accept writes. Unlike a standard MySQL master/slave setup, multiple nodes in a NoSQL database can accept updates, thereby supporting much higher query throughput.
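To make the sharding, replication, and re-partitioning bullets concrete, here is a minimal sketch of one common technique, a consistent-hash ring with virtual nodes. It is an illustration only, not the actual implementation of Cassandra, HBase, or Voldemort; the `HashRing` class, the node names, and the `vnodes` and `replication_factor` parameters are all assumptions made up for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer token (MD5 used only for spreading keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: keys map to nodes by position on a ring."""

    def __init__(self, nodes=(), vnodes=64, replication_factor=2):
        self.vnodes = vnodes                      # tokens placed per physical node
        self.replication_factor = replication_factor
        self._ring = []                           # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place the node's tokens on the ring; only nearby keys change owner."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def replicas(self, key: str) -> list:
        """Return the distinct nodes that store `key`, primary owner first."""
        if not self._ring:
            return []
        distinct = {node for _, node in self._ring}
        want = min(self.replication_factor, len(distinct))
        # First token clockwise from the key's position on the ring.
        start = bisect.bisect_right(self._ring, (_hash(key), ""))
        owners = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in owners:
                owners.append(node)
                if len(owners) == want:
                    break
        return owners


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.replicas("user:42"))   # two distinct nodes, primary first
    ring.add_node("node-d")           # scale out by adding a node
    print(ring.replicas("user:42"))
```

In this sketch, adding a node only reassigns the keys that fall just before the new node's tokens, which is the property that lets a cluster grow without reshuffling all of its data.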
A few other key/value stores that are often lumped into the NoSQL category, like Redis and Tokyo Cabinet, aim less at distributed scalability and instead optimize for high-performance lookups, at the cost of no longer supporting relational queries.
In practice, the level and reliability of support for each of these benefits varies from system to system. Facebook, FriendFeed, Ning, and other companies have demonstrated that scaling capabilities can often be built within the application layer on top of standard relational databases like MySQL; moreover, failures with Cassandra at Digg [1] and the decision not to use Cassandra for the primary data store at Twitter [2] suggest that the MySQL-based systems may, at least for now, be more robust than many of the newer, unvetted systems. The outcome of Facebook's decision to use HBase for its new messages product may change this landscape.
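As a rough illustration of what "scaling within the application layer" can mean, the sketch below routes each user to one of several independent relational databases by taking the user id modulo the shard count. It does not describe how Facebook, FriendFeed, or Ning actually built their systems; the `ShardedUserStore` class and the `users` table are hypothetical, and in-memory SQLite databases stand in for separate MySQL servers so that the example runs on its own.

```python
import sqlite3


class ShardedUserStore:
    """Hypothetical application-level sharding over several SQL databases."""

    def __init__(self, num_shards: int = 4):
        # Each in-memory SQLite connection stands in for one MySQL server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
            )

    def _shard_for(self, user_id: int) -> sqlite3.Connection:
        # The routing rule lives in the application, not in the database.
        return self.shards[user_id % len(self.shards)]

    def put(self, user_id: int, name: str) -> None:
        conn = self._shard_for(user_id)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )

    def get(self, user_id: int):
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


if __name__ == "__main__":
    store = ShardedUserStore(num_shards=4)
    store.put(7, "alice")
    store.put(8, "bob")
    print(store.get(7), store.get(8))
```

A fixed modulo rule like this is simple but makes adding shards painful, since most keys would change shard; that trade-off is part of what the automatic re-partitioning in NoSQL systems is meant to avoid.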
[1] http://gigaom.com/2010/09/08/dig...
[2] http://engineering.twitter.com/2...