A guy on Quora who went from loving MongoDB to hating it

macyang

Some people who talk about leaving MongoDB:

Anonymous: http://pastebin.com/raw.php?i=FD...

Zopyx: http://www.zopyx.de/blog/goodbye...
Bump: http://devblog.bu.mp/from-mongod...
Urban Airship: http://blog.schmichael.com/2011/...
Shareaholic: http://blog.shareaholic.com/2012...
DigiDoc: http://svs.io/post/31724990463/w...
Etsy: a talk about a project that failed to get off the ground with MongoDB ("Why MongoDB Never Worked Out at Etsy", mcfunley.com)
Aphyr's talk: MongoDB 2.4.3 is not consistent, even in the mode where it should be. Here's a long post about it ("Call me maybe: MongoDB"). When we talk about product maturity and quality below, this is exactly the type of issue that comes up.

Their arguments center around a few core themes:

Product Maturity: poor QA, sub-systems repeatedly breaking, driver inconsistencies, scattered and incomplete documentation, complex node management.
Design Decisions: single write lock, memory-mapped files mean the server has little control over its own performance, replication has no proxying (causing ridiculous connection numbers), sharding is possible but very complex to implement because of several special nodes, sharding is unreliable, lots of pitfalls and gotchas.
Wrong Trade-Offs: in a few cases, people started using MongoDB without grasping some key differences between MongoDB and a traditional RDBMS: joins, audit trails, aggregation, schema management, queries into nested objects... MongoDB makes some things easy in exchange for extra work elsewhere, but it's not always clear what these trade-offs are.

I have worked with MongoDB in production settings for about 2 years and am currently the top-rated MongoDB answerer on Stack Overflow, so I have seen several of these limitations firsthand.
http://stackoverflow.com/tags/mo...

The key thing to understand about MongoDB is that it's not a magic bullet. It has significant tradeoffs like everything else.

Instead, I think of MongoDB as a "twist" on a typical RDBMS. It's not a Dynamo DB, it's not a Bigtable DB, it's not a Key-Value DB. It's really a hybrid database with features from a few different places.

It has secondary indexes and complex queries like SQL DBs. It implements replication similar to an SQL DB. It has a Map/Reduce/Aggregation framework like a Big-Table DB. It has auto-sharding features that are halfway between some SQL implementations and Key-Value DBs.

It has strong consistency like DynamoDB or SimpleDB, but it does not have MVCC (details http://blog.mongodb.org/post/523... ). It puts highly configurable write concerns front and center, but the first two modes, "fire & forget" and "safe", do not actually wait for anything to be committed to disk.
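The difference between a "fire & forget" write and an acknowledged one can be sketched without a server. This is a toy Python model and nothing more. `FakeCollection`, its `insert(..., acknowledged=...)` flag and the local `DuplicateKeyError` class are hypothetical names, not a driver API. In real drivers the choice is expressed as a write concern, for example `w=0` (unacknowledged) versus `w=1`, with `j=True` if you also want to wait for the journal.

```python
# Toy model only: FakeCollection, insert(..., acknowledged=...) and this local
# DuplicateKeyError are hypothetical names, not part of any MongoDB driver.


class DuplicateKeyError(Exception):
    """Raised when a document reuses an existing _id."""


class FakeCollection:
    def __init__(self) -> None:
        self._docs: dict = {}

    def _apply(self, doc: dict) -> None:
        # "Server-side" work: enforce the unique _id index, then store the doc.
        if doc["_id"] in self._docs:
            raise DuplicateKeyError(doc["_id"])
        self._docs[doc["_id"]] = doc

    def insert(self, doc: dict, acknowledged: bool = True) -> bool:
        """Return True once an acknowledged write succeeds.

        For a fire-and-forget write, return False: the client never hears
        back, so it cannot tell whether the write succeeded or failed.
        """
        if acknowledged:
            self._apply(doc)  # errors propagate back to the caller
            return True
        try:
            self._apply(doc)
        except DuplicateKeyError:
            pass  # the failure happens "on the server" and is silently lost
        return False

    def count(self) -> int:
        return len(self._docs)


coll = FakeCollection()
coll.insert({"_id": 1, "name": "first"})
coll.insert({"_id": 1, "name": "dup"}, acknowledged=False)  # no error reaches us
try:
    coll.insert({"_id": 1, "name": "dup"})  # acknowledged: the error surfaces
except DuplicateKeyError as exc:
    print("caught duplicate key:", exc)  # prints "caught duplicate key: 1"
print(coll.count())  # prints 1
```

Even the acknowledged path in this sketch says nothing about disk durability. That gap is the paragraph's point about "safe" mode.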

At the end of the day, MongoDB kind of lives in its own little niche. It makes a lot of unique trade-offs that must be understood to use it effectively.

Also, what did they move to?

Generally something more specific to their goals.

If you end up needing to run lots of Map/Reduce, you probably end up running Hadoop (or their new Hadoop plug-in). MongoDB is not designed to match the speed of Hadoop's Map/Reduce.
If you shard heavily and only use Key/Value look-ups, then Riak is probably easier to manage on a large scale. In fact, the Bump post says exactly this: "We decided to move to Riak because it offers better operational qualities than MongoDB... Nagios will email us instead of page us..."
If you're using MongoDB heavily as a cache, maybe you end up using Membase / Redis / HBase.
If you start using MongoDB as a Queue, you will eventually want to look at real queuing systems. RabbitMQ, ActiveMQ, ZeroMQ, etc.
If you start using MongoDB to do search, you will eventually find that things like Solr & Sphinx are better suited to resolve the full spectrum of search queries.

The key here is that MongoDB is really a set of trade-offs. Many of the databases above are very specific in what they do. MongoDB is less specific but is serviceable for handling many different cases.

However, once you get to a certain scale, MongoDB will underperform the specialized solution. In fact, I'm seeing this at my day job, where we are actively moving several sub-systems off MongoDB and onto better-suited products.

Ref: http://www.quora.com/MongoDB/Which-companies-have-moved-away-from-MongoDB-and-why

First, I think you should take a look at people leaving MongoDB and why:
Gaëtan Voyer-Perrault's answer to MongoDB: Which companies have moved away from MongoDB and why?

This will give you some idea about the trade-offs of MongoDB.

Quoting myself:

I think of MongoDB as a "twist" on a typical RDBMS. It's not a Dynamo DB, it's not a Bigtable DB, it's not a Key-Value DB. It's really a hybrid database with features from a few different places.

It has secondary indexes and complex queries like SQL DBs. It implements replication similar to an SQL DB. It has a Map/Reduce/Aggregation framework like a Big-Table DB. It has auto-sharding features that are halfway between some SQL implementations and Key-Value DBs.

So what are the advantages? A quick list:

Replication and Sharding are relatively easy to implement (though managing Shards is still very painful).
Storing objects allows you to nest children and reduce queries / space requirements. I can get an Order and its OrderDetails and its OrderShipping information in a single query with one index look-up.
MongoDB has a lot of flexibility for "durability" of data, so you can increase write throughput by being more tolerant of data loss.
MongoDB has extra atomic update operations such as "$inc" & "$push". The latter operates on arrays, so a single update can modify an array inside a document. This is a good fit for certain types of real-time reporting data.
MongoDB supports indexing on arrays. This can be a nice way to access smaller sets of data (it fails to scale with sharding, though).
MongoDB requires very few management commands from the developer. You don't need to create tables or manage column definitions or stored procedures. That said, your code has to be very responsible when it comes to inserting data, or you may just be inserting garbage.
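The points above about embedded documents, `$inc`/`$push` and array indexes can be made concrete with a toy Python model that needs no server. The order layout and the `apply_update` and `build_multikey_index` helpers are hypothetical. They only mimic the *shape* of MongoDB's embedded documents, update operators, and an index on an array field (one index entry per array element). They do not model atomicity or performance.

```python
import copy

# Hypothetical order document: details and shipping are embedded in the order,
# so a single lookup by _id returns everything without a join.
order = {
    "_id": "order-1001",
    "user_id": "u42",
    "details": [{"sku": "A-1", "qty": 2, "price": 9.99}],
    "shipping": {"carrier": "UPS", "status": "pending"},
    "item_count": 2,
}


def apply_update(doc: dict, update: dict) -> dict:
    """Apply a tiny, top-level-only subset of "$inc" and "$push" to a copy of doc.

    This mimics the shape of MongoDB update operators only; the server applies
    them atomically to one document, which a plain dict copy does not model.
    """
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)
    return result


def build_multikey_index(docs: list, array_field: str, key: str) -> dict:
    """Index an array field the way a multikey index does: one entry per element."""
    index: dict = {}
    for doc in docs:
        for element in doc.get(array_field, []):
            index.setdefault(element[key], set()).add(doc["_id"])
    return index


updated = apply_update(order, {
    "$inc": {"item_count": 1},
    "$push": {"details": {"sku": "B-7", "qty": 1, "price": 4.50}},
})
print(updated["item_count"], len(updated["details"]))  # prints "3 2"
print(sorted(build_multikey_index([updated], "details", "sku")))  # prints "['A-1', 'B-7']"
```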

Of course, these are all trade-offs:

Flexible durability means that it can be easy to lose data. For 2+ years the default driver setting would ignore many errors including basic things like "duplicate key".
Flexible schema means that anyone with access can put garbage into your DB. Your code has to be very defensive when pulling data back out.
"Nested documents" means that you don't get any joins. If you want to join Orders and Users, you have to do this manually in your code.
Replication and Sharding are supported, but not all features are performant with Sharding.
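Here is a minimal sketch of what "join manually in your code" plus "be very defensive when pulling data back out" can look like. It uses plain Python lists standing in for two collections. The data and the `join_orders_with_users` helper are hypothetical. With a real database you would typically fetch the referenced users with a single `$in` query rather than one query per order.

```python
# Hypothetical data: in MongoDB these would be separate "users" and "orders"
# collections; plain lists keep the sketch runnable without a server.
users = [
    {"_id": "u1", "name": "Ann"},
    {"_id": "u2", "name": "Bo"},
]
orders = [
    {"_id": "o1", "user_id": "u1", "total": 30},
    {"_id": "o2", "user_id": "u2", "total": 12},
    {"_id": "o3", "user_id": "u9", "total": 5},  # dangling reference
    {"_id": "o4", "total": "7"},                # garbage: no user_id, string total
]


def join_orders_with_users(orders: list, users: list) -> list:
    """Client-side join: build a lookup table once, then attach each user."""
    users_by_id = {u["_id"]: u for u in users}
    joined = []
    for order in orders:
        # Flexible schema: nothing guarantees these fields exist or have the
        # expected type, so read defensively instead of indexing blindly.
        user = users_by_id.get(order.get("user_id"))
        total = order.get("total")
        if user is None or not isinstance(total, (int, float)):
            continue  # skip documents we cannot trust
        joined.append({"order": order["_id"], "user": user["name"], "total": total})
    return joined


print(join_orders_with_users(orders, users))
# prints [{'order': 'o1', 'user': 'Ann', 'total': 30}, {'order': 'o2', 'user': 'Bo', 'total': 12}]
```

Building the dictionary first keeps the join linear in the number of documents instead of a nested loop. Skipping untrusted documents is only one policy; logging or repairing them are alternatives.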

Overall, MongoDB is a nice tool to have in your toolbox, but it's just one of many non-relational DBs that can help you solve problems.

Ref: http://www.quora.com/MongoDB/What-are-the-advantages-of-running-MongoDB-compared-to-e-g-MySQL

If you partition your data at the application level, MySQL scalability isn't an issue. Facebook reported [1] running 1800 MySQL servers with just two DBAs in 2008. You can't do joins across partitions, but the NoSQL databases don't allow this anyway. Facebook hasn't confirmed using Cassandra as the primary source for any data, and it seems like inbox search might be their only use of it. [2]
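Partitioning "at the application level" usually means the application picks a server from a partition key before it queries. Below is a minimal Python sketch of that routing. It assumes hypothetical shard names and uses in-memory lists in place of per-shard MySQL tables. It shows why per-user queries stay on one server while anything spanning users (such as a join) would have to touch several.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate MySQL server.
SHARDS = ["db0", "db1", "db2", "db3"]

# In-memory stand-ins for the per-shard tables.
tables = {name: [] for name in SHARDS}


def shard_for(user_id: int) -> str:
    """Map a partition key to a shard. md5 is used rather than Python's
    built-in hash(), which is randomized per process for strings."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def insert_post(user_id: int, text: str) -> None:
    # All of one user's rows land on the same shard.
    tables[shard_for(user_id)].append({"user_id": user_id, "text": text})


def posts_for_user(user_id: int) -> list:
    # A per-user query touches exactly one shard.
    return [row for row in tables[shard_for(user_id)] if row["user_id"] == user_id]


insert_post(42, "first answer")
insert_post(42, "second answer")
insert_post(7, "a question")
print(shard_for(42), len(posts_for_user(42)))
```

Plain modulo keeps the sketch short, but adding a shard remaps most keys. A lookup directory or consistent hashing is the usual way around that.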
These distributed databases like Cassandra, MongoDB, and CouchDB [3] aren't actually very scalable or stable. Twitter apparently has been trying to move from MySQL to Cassandra for over a year. When someone reports using one of these systems as their primary data store for over 1000 machines for over a year, I'll reconsider my opinion on this.

<< Update as of August 2011: after I wrote this, foursquare reported an 11-hour downtime because of MongoDB. [4] Separately, a friend's startup that was going through explosive growth tried to switch to MongoDB and gave up after a month due to instability. Twitter gave up on the Cassandra migration. [5] Facebook is moving away from Cassandra. [6] HBase is getting better but is still risky if you don't have people around with a deep understanding of it. [7] >>
The primary online data store for an application is the worst place to take a risk with new technology. If you lose your database or there's corruption, it's a disaster that could be impossible to recover from. If you're not the developer of one of these new databases, and you're one of a very small number of companies using them at scale in production, you're at the mercy of the developer to fix bugs and handle scalability issues as they come up.
You can actually get pretty far on a single MySQL database and not even have to worry about partitioning at the application level. You can "scale up" to a machine with lots of cores and tons of RAM, plus a replica. If you have a layer of memcached servers in front of the databases (which are easy to scale out), then the database basically only has to worry about writes. You can also use S3 or some other distributed hash table to take the largest objects out of rows in the database. There's no need to burden yourself with making a system scale more than 10x further than it needs to, as long as you're confident that you'll be able to scale it as you grow.
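The "memcached in front of the database" arrangement is commonly called cache-aside: reads go to the cache first and only misses reach the database. Here is a minimal Python sketch of it. It assumes an in-memory `sqlite3` database as a stand-in for MySQL and a plain dict as a stand-in for memcached; no real memcached client is used. The table, key format and the `get_user_name`/`rename_user` helpers are hypothetical.

```python
import sqlite3

# Stand-ins: an in-memory SQLite database plays the MySQL server and a plain
# dict plays memcached. Table, key format and helper names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'Ann')")
db.commit()
cache: dict = {}
stats = {"db_reads": 0}


def get_user_name(user_id: int):
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    cache[key] = row[0]
    return row[0]


def rename_user(user_id: int, new_name: str) -> None:
    """Writes go to the DB; the now-stale cache entry is deleted."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    db.commit()
    cache.pop(f"user:{user_id}", None)


for _ in range(3):
    get_user_name(1)
print(stats["db_reads"])  # prints 1: only the first read reached the database
rename_user(1, "Annie")
print(get_user_name(1), stats["db_reads"])  # prints "Annie 2"
```

Deleting the cache entry on write, rather than updating it, keeps the sketch simple. A production setup would also need expiry times and care around races between concurrent reads and writes.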
Many of the problems created by manually partitioning the data over a large number of MySQL machines can be mitigated by creating a layer below the application and above MySQL that automatically distributes data. FriendFeed described a good example implementation of this [8].
Personally, I believe the relational data model is the "right" way to structure most of the data for an application like Quora (and for most user-generated content sites). Schemas allow the data to persist in a typed manner across lots of new versions of the application as it's developed, they serve as documentation, and prevent a lot of bugs. And SQL lets you move the computation to the data as necessary rather than having to fetch a ton of data and post-process it in the application everywhere. I think the "NoSQL" fad will end when someone finally implements a distributed relational database with relaxed semantics.
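Two claims from the paragraph above can be illustrated in a few lines of Python. The standard-library `sqlite3` module stands in for MySQL, and the table and column names are hypothetical. First, the `GROUP BY` runs inside the database, so only one summary row per question comes back instead of every answer. Second, schema constraints reject a malformed row instead of storing it. SQLite enforces column types loosely, so the sketch relies on `NOT NULL` and `CHECK` rather than on the declared types.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The schema documents and enforces the shape of the data.
db.execute("""
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL,
        upvotes INTEGER NOT NULL CHECK (upvotes >= 0)
    )
""")
db.executemany(
    "INSERT INTO answers (question_id, upvotes) VALUES (?, ?)",
    [(1, 10), (1, 3), (2, 7), (2, 0), (2, 5)],
)

# Computation runs next to the data: one row per question comes back.
totals = db.execute(
    "SELECT question_id, COUNT(*), SUM(upvotes) FROM answers "
    "GROUP BY question_id ORDER BY question_id"
).fetchall()
print(totals)  # prints [(1, 2, 13), (2, 3, 12)]

# The schema also rejects bad rows instead of storing garbage.
try:
    db.execute("INSERT INTO answers (question_id, upvotes) VALUES (3, -1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```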

Ref: http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use-MySQL-as-the-data-store-instead-of-NoSQLs-such-as-Cassandra-MongoDB-or-CouchDB