数据库的全文检索性能对比

http://stackoverflow.com/questions/1037991/how-much-more-performant-is-postgres-than-mysql-on-fulltext-search

I ran benchmarks a few years ago on large datasets and found that :

  • MySQL FULLTEXT

Is pretty slow. Another drawback is that it forces MyISAM on you which brings a lot of problems. Also index updates are quite slow once the index reaches a certain size : when you insert a new row, a substantial part of the index is re-generated, sometimes a few hundred megabytes of index are rewritten just to insert a forum post. In other words, it's OK for a small forum with a few MBytes of posts, but there is a reason Wikipedia doesn't use it...

  • PostgreSQL fulltext

Is about 10-100x faster than MySQL fulltext, is a lot more powerful, gist is fast on inserts/updates, no problem with locks, in other words it's a totally decent solution.

However searches get slow when the data set is larger than RAM because of MVCC, postgres needs to check the visibility of rows by hitting the heap. Note this may change in a future version. If your query returns 10 rows, no problem. However, if you want to SELECT WHERE (fulltext query) ORDER BY date LIMIT 10 and the fulltext matches 10.000 rows, it can get pretty slow. Still faster than MySQL but not the performance you'd want.

  • Xapian : I tested this, there are also Lucene and Sphinx which have good reputation.

Xapian does not have to conform to the same restrictions as a database, so it can make a lot more opimizations. For instance, it's a single-writer multiple-reader concurrency model, so you'll need some sort of update queue to update your index in the background. It also has its own on-disk format. The result is that it is incredibly fast, even when the dataset is much larger than RAM, and especially on complicated queries matching lots of rows, with sorts, and returning only the most relevant ones.

The index is huge too, it probably contains lots of duplicated stuff. The consequence is that it doesn't need to seek to retrieve the stuff.

Basically once Postgres started to hit the IO-seek wall, MySQL was long dead, and Xapian kept blazing fast.

But it is not as nicely integrated in the database, so it is more work to use. It is only worth it if you have a huge dataset. If this is your case, try it, it's amazing. If your dataset fits in RAM, postgres will just work with a lot less hassle. Also if you want to combine fulltext and database queries, well, integration becomes important.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值