Apache Solr vs Elasticsearch

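Solr and Elasticsearch are the two most popular open-source search engines in the industry. Both are built on top of the open-source Apache Lucene library, and their core functionality is very similar. They differ greatly, however, in ease of deployment, scalability, and other features.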

1. Apache Solr

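Over the past decade Solr has grown substantially and built up a broad user base. Solr provides distributed indexing, sharding, replication, load balancing, and automatic failover and recovery. Many Internet giants, such as Netflix, eBay, Instagram and Amazon (CloudSearch), use Solr.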

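Key features of Solr:

  • Full-text indexing
  • Highlighting
  • Faceted search
  • Real-time indexing
  • Dynamic clustering
  • Database integration
  • NoSQL features and rich document handling (e.g. Word and PDF files)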
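Several of the features listed above, such as full-text queries, faceting and highlighting, are driven by plain HTTP query parameters on Solr's `/select` handler. The sketch below only builds such a URL and does not contact a server. The host, the `products` collection and the field names are hypothetical placeholders. `q`, `rows`, `wt`, `facet`, `facet.field`, `hl` and `hl.fl` are standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_solr_select_url(base_url, collection, query, facet_fields, highlight_fields, rows=10):
    """Build a Solr /select URL that asks for facet counts and highlighted snippets."""
    params = [
        ("q", query),                            # the full-text query
        ("rows", rows),                          # number of documents to return
        ("wt", "json"),                          # response format
        ("facet", "true"),                       # turn on faceting
        ("hl", "true"),                          # turn on highlighting
        ("hl.fl", ",".join(highlight_fields)),   # fields to highlight
    ]
    # facet.field is repeated once per field to facet on
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and fields, for illustration only
url = build_solr_select_url(
    "http://localhost:8983", "products", "title:laptop",
    ["brand", "category"], ["title", "description"],
)
print(url)
print(parse_qs(urlsplit(url).query))
```

To request several facet fields, you repeat `facet.field` once per field. A comma-separated value does not work for this parameter.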

2. Elasticsearch

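Elasticsearch appeared a few years after Solr. It is a distributed, multi-tenant full-text search engine that you use through a REST API with schema-free JSON documents. No schema has to be defined in advance, whereas Solr traditionally requires a predefined schema. Elasticsearch scales out as a near-real-time search engine. One of its key features is multi-tenancy: data can be split into separate indexes by purpose, and several indexes can be operated on at the same time.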
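The sketch below illustrates the REST and multi-index points. It builds a search request but does not send it. The request targets two indexes at once by listing them, comma-separated, in the URL path, and it carries a JSON query body. The index names and the `message` field are hypothetical. The `_search` endpoint and the `match` query are part of the Elasticsearch REST API.

```python
import json


def build_multi_index_search(indices, field, text, size=10):
    """Return (method, path, body) for a single search across several indexes."""
    # Several index names are joined with commas in the URL path.
    path = "/" + ",".join(indices) + "/_search"
    # The query itself is a plain JSON document.
    body = {"size": size, "query": {"match": {field: text}}}
    return "POST", path, json.dumps(body)


# Hypothetical index and field names
method, path, body = build_multi_index_search(
    ["logs-2018.05", "logs-2018.06"], "message", "timeout"
)
print(method, path)
print(body)
```

Elasticsearch also accepts wildcards in the index part of the path, for example `logs-*`. This is a common way to query a family of time-based indexes together.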

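The figure above shows Google search interest in the two projects. Since 2013 Elasticsearch has attracted far more interest than Solr, but that does not mean Apache Solr is dead. Although many people would disagree, Solr is still one of the most popular search engines and has strong support from its open-source community.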

Performance comparison

One large Internet company reportedly ran tests in production after moving its search engine from Solr to Elasticsearch. Average query speed improved 50-fold.

3. Differences in features

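To be honest, I have not studied some of these features myself. The comparison below is taken from Kelvin Tan's Solr vs Elasticsearch comparison.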

API

| Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
| --- | --- | --- |
| Format | XML, CSV, JSON | JSON |
| HTTP REST API | | |
| Binary API | SolrJ | TransportClient, Thrift (through a plugin) |
| JMX support | | ES-specific stats are exposed through the REST API |
| Official client libraries | Java | Java, Groovy, PHP, Ruby, Perl, Python, .NET, Javascript |
| Community client libraries | PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure | Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x |
| 3rd-party product integration (open-source) | Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) | Drupal, Django, Symfony2, Wordpress, CouchBase |
| 3rd-party product integration (commercial) | DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR | SearchBlox, Hortonworks Data Platform, MapR etc. |
| Output | JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java | JSON, XML/HTML (via plugin) |

Infrastructure

| Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
| --- | --- | --- |
| Master-slave replication | | Not an issue because shards are replicated across nodes. |
| Integrated snapshot and restore | Filesystem | Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories |
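In the Elasticsearch column of the snapshot row, you first register a filesystem repository, then take snapshots into it and restore from it. The sketch below only lists the three REST calls as `(method, path, body)` tuples and sends nothing. The repository name, location, snapshot name and index names are hypothetical. Before the registration call can succeed, the location must also be listed under `path.repo` in `elasticsearch.yml`.

```python
import json


def snapshot_requests(repo, location, snapshot, indices):
    """Return (method, path, body) tuples that register an fs repository,
    snapshot the given indices into it, and restore that snapshot."""
    register = ("PUT", f"/_snapshot/{repo}",
                {"type": "fs", "settings": {"location": location}})
    create = ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
              {"indices": ",".join(indices)})
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", None)
    return [register, create, restore]


# Hypothetical names and paths, for illustration only
reqs = snapshot_requests("my_backup", "/mount/backups/my_backup",
                         "snapshot_1", ["logs-2018.05", "products"])
for method, path, body in reqs:
    print(method, path, json.dumps(body) if body is not None else "")
```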

Indexing

| Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
| --- | --- | --- |
| Data Import | DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File | [DEPRECATED in 2.x] Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia |
| ID field for updates and deduplication | | |
| DocValues | | |
| Partial Doc Updates | with stored fields | with _source field |
| Custom Analyzers and Tokenizers | | |
| Per-field analyzer chain | | |
| Per-doc/query analyzer chain | | |
| Index-time synonyms | | Supports Solr and Wordnet synonym format |
| Query-time synonyms | Solr 6 provides proper multi-word synonyms via SynonymGraphFilter | Synonym Graph Token Filter is in beta in ES 6.2 |
| Multiple indexes | | |
| Near-Realtime Search/Indexing | | |
| Complex documents | | |
| Schemaless | | |
| Multiple document types per schema | One set of fields per schema, one schema per core | |
| Online schema changes | Schemaless mode or via dynamic fields. | Only backward-compatible changes. |
| Apache Tika integration | | |
| Dynamic fields | | |
| Field copying | | via multi-fields |
| Hash-based deduplication | | Murmur plugin or ER plugin |
| Index-time sorting | | |
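The synonym rows refer to the Solr synonym file format, which Elasticsearch can also read. In that format, each non-comment line is one of two things:

- a comma-separated list of equivalent terms, or
- an explicit mapping written as `a, b => c`.

The parser below is a simplified sketch. It assumes there are no escaped commas or `=>` inside terms, and it ignores multi-word tokenization. The `expand` flag mirrors two behaviours for equivalent terms: either each term maps to all of the others, or they all collapse onto the first term. The sample rules are made up for illustration.

```python
def parse_solr_synonyms(text, expand=True):
    """Parse Solr-format synonym rules into a {term: [replacement terms]} dict."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and '#' comments are ignored
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            # Explicit mapping: every left-hand term is replaced by the right-hand terms
            lhs, rhs = line.split("=>", 1)
            sources = [t.strip() for t in lhs.split(",") if t.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            # Equivalent terms: expand to all of them, or collapse onto the first one
            terms = [t.strip() for t in line.split(",") if t.strip()]
            sources = terms
            targets = terms if expand else terms[:1]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:  # rules for the same term are merged
                    bucket.append(target)
    return mapping


rules = """
# sample synonym rules
couch, sofa, divan
teh => the
i-pod, i pod => ipod
"""

expanded = parse_solr_synonyms(rules)
collapsed = parse_solr_synonyms(rules, expand=False)
print(expanded["sofa"])
print(collapsed["divan"])
```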

Searching

| Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
| --- | --- | --- |
| Lucene Query parsing | | |
| Structured Query DSL | JSON Query DSL is new in Solr 7.x | |
| Span queries | via SOLR-2703 | |
| Spatial/geo search | | |
| Multi-point spatial search | | |
| Faceting | | Top N term accuracy can be controlled with shard_size |
| Advanced Faceting | New Analytics component and JSON faceting API | |
| Geo-distance Faceting | | |
| Pivot Facets | | |
| More Like This | | |
| Boosting by functions | | |
| Boosting using scripting languages | | |
| Push Queries | Via Streaming Expressions | Percolation. Distributed percolation supported in 1.0 |
| Field collapsing/Results grouping | | |
| Query Re-Ranking | | via Rescoring or a plugin |
| Index-based Spellcheck | | Phrase Suggester |
| Wordlist-based Spellcheck | | |
| Autocomplete | | |
| Document-oriented Autocomplete | Solr suggester returns phrases, not documents. | |
| Learning to Rank | | Via https://github.com/o19s/elasticsearch-learning-to-rank |
| Query elevation | | workaround |
| Intra-index joins | via parent-child query | via has_children and top_children queries |
| Inter-index joins | Joined index has to be single-shard and replicated across all nodes. | |
| Resultset Scrolling | | via scan search type |
| Filter queries | | also supports filtering by native scripts |
| Filter execution order | local params and cache property | |
| Alternative QueryParsers | DisMax, eDisMax | query_string, dis_max, match, multi_match etc. |
| Negative boosting | But awkward: involves positively boosting the inverse set of negatively-boosted documents. | |
| Search across multiple indexes | It can search across multiple compatible collections | |
| Result highlighting | | |
| Custom Similarity | | |
| Searcher warming on index reload | | Warmers API |
| Term Vectors API | | |
| SQL queries | Via Parallel SQL. SolrCloud only | |
| Distributed Map/Reduce processing | Via Streaming Expressions. SolrCloud only | |
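The faceting row says that top-N term accuracy can be controlled with `shard_size`. This follows from how distributed facets are computed: each shard returns only its local top terms, and the coordinating node merges those partial lists. The simulation below is a simplified model of that merge, not Elasticsearch's actual code, and the shard contents are made up. `terms_agg_body` shows where `size` and `shard_size` go in a `terms` aggregation request; the `brand` field name is hypothetical.

```python
from collections import Counter


def terms_agg_body(field, size, shard_size):
    """Search body for an Elasticsearch terms aggregation with an explicit shard_size."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "aggs": {
            "top_terms": {"terms": {"field": field, "size": size, "shard_size": shard_size}}
        },
    }


def distributed_top_terms(shards, size, shard_size):
    """Simplified model of a distributed terms facet.

    Each shard reports only its local top `shard_size` terms; the coordinator
    sums those partial counts and keeps the global top `size`.
    """
    merged = Counter()
    for shard_counts in shards:
        for term, count in Counter(shard_counts).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Made-up data: "c" is the most frequent term overall (9 + 9 = 18),
# but it is only second on each individual shard.
shards = [{"a": 10, "c": 9}, {"b": 10, "c": 9}]
print(distributed_top_terms(shards, size=1, shard_size=1))  # "c" is missed
print(distributed_top_terms(shards, size=1, shard_size=2))  # [('c', 18)]
print(terms_agg_body("brand", size=1, shard_size=2))
```

With `shard_size=1`, both shards drop term `c`, even though it has the highest global count. Raising `shard_size` buys accuracy at the cost of more data sent from each shard to the coordinating node.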

Distributed

| Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
| --- | --- | --- |
| Self-contained cluster | Depends on separate ZooKeeper server | Only Elasticsearch nodes |
| Automatic node discovery | ZooKeeper | internal Zen Discovery or ZooKeeper |
| Partition tolerance | The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function. | Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes is set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. |
| Automatic failover | If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards. | |
| Automatic leader election | | |
| Shard replication | | |
| Sharding | | |
| Automatic shard rebalancing | Solr Autoscaling is new in Solr 7. | It can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes, and it can be configured not to assign the same shard and its replicas to nodes with the same tags. |
| Change # of shards | Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime. | Each index has 5 shards by default. The number of primary shards cannot be changed once the index is created. Replicas can be increased anytime. The Shrink Index API lets you reindex the index into a new index with fewer shards. |
| Shard splitting | | You can use the Index Splitting API to create a new index with the primary shards split. |
| Relocate shards and replicas | Can be done by creating a shard replica on the desired node and then removing the shard from the source node | Can move shards and replicas to any node in the cluster on demand |
| Control shard routing | shards or _route_ parameter | routing parameter |
| Pluggable shard/replica assignment | New Autoscaling API replaces the old rule-based replica assignment | Probabilistic shard balancing with Tempest plugin |
| Avoid duplicate indexing on replicas | Solr 7 provides 3 kinds of replica types: NRT (default and the pre-Solr 7 behavior), tlog and pull. Non-SolrCloud master-slave replication can be achieved with tlog replica types. | |
| Consistency | Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. There is no check for downed replicas; they will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they have finished replicating the index. | Replication between nodes is synchronous by default, so ES is consistent by default, but it can be set to asynchronous on a per-document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available. |
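Two rules from the distributed table can be checked with a little arithmetic.

The quorum rule for `discovery.zen.minimum_master_nodes` is N // 2 + 1 (integer division). Elasticsearch's own documentation counts N as the number of master-eligible nodes.

The fixed number of primary shards follows from document routing. The simplified form documented by Elasticsearch picks a shard as `hash(_routing) % number_of_primary_shards`, so changing the divisor would send existing documents to different shards.

The sketch below uses `zlib.crc32` as a stand-in for the Murmur3 hash that Elasticsearch actually uses. The shard numbers it produces are therefore illustrative only, and the `doc-N` ids are made up.

```python
import zlib


def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size: strictly more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def route(routing_value: str, num_primary_shards: int) -> int:
    """Simplified routing: hash(_routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in here; Elasticsearch uses Murmur3.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "master-eligible nodes -> minimum_master_nodes =", minimum_master_nodes(n))

    # Hypothetical document ids; by default _routing is the document id
    doc_ids = [f"doc-{i}" for i in range(100)]
    moved = sum(route(d, 5) != route(d, 6) for d in doc_ids)
    print(moved, "of", len(doc_ids),
          "documents would map to a different shard with 6 primaries instead of 5")
```

Because almost every document would have to move, Elasticsearch offers the Shrink and Split APIs, which create a new index, rather than changing the primary shard count in place.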

4. Summary

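  • Solr uses ZooKeeper for distributed coordination, while Elasticsearch has its own built-in distributed coordination;
  • Solr supports more data formats, while Elasticsearch supports only JSON;
  • Solr offers more features out of the box, while Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins;
  • Solr performs better than Elasticsearch in traditional search applications, but is noticeably less efficient than Elasticsearch for real-time search applications.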