The Well-Configured Solr Instance: how to tune a Solr instance for the best performance
Configuring solrconfig.xml
The settings in solrconfig.xml have a large effect on how Solr works.
The file configures the following:
request handlers, which process the requests to Solr, such as requests to add documents to the index or requests to return results for a query
listeners, processes that "listen" for particular query-related events; listeners can be used to trigger the execution of special code, such as invoking some common queries to warm up caches
the Request Dispatcher for managing HTTP communications
the Admin Web interface
parameters related to replication and duplication (these parameters are covered in detail in Legacy Scaling and Distribution)
Main topics covered:
DataDir and DirectoryFactory in SolrConfig
Lib Directives in SolrConfig
Schema Factory Definition in SolrConfig
IndexConfig in SolrConfig
RequestHandlers and SearchComponents in SolrConfig
InitParams in SolrConfig
UpdateHandlers in SolrConfig
Query Settings in SolrConfig
RequestDispatcher in SolrConfig
Update Request Processors
Codec Factory
Substituting Properties in Solr Config Files
solrconfig.xml supports property substitution, so that values can be set dynamically:
${propertyname[:option default value]}
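A property resolves to the value supplied at run time if there is one, otherwise to its default value; if neither exists, Solr reports an error.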
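The resolution rule can be sketched in a few lines of Python. This is only an illustration of the rule, not Solr's implementation: the function name substitute and the regular expression are assumptions made for the example, and the lockType line is the one from the sample solrconfig.xml.

```python
import re

# Matches ${name} or ${name:default}; the name may not contain ':' or '}'.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text, properties):
    """Replace each ${name[:default]} in text using the properties mapping."""

    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]   # a value supplied at run time wins
        if default is not None:
            return default            # otherwise fall back to the default
        raise KeyError(f"no value and no default for property {name!r}")

    return _PLACEHOLDER.sub(resolve, text)


line = "<lockType>${solr.lock.type:native}</lockType>"
print(substitute(line, {}))                          # uses the default
print(substitute(line, {"solr.lock.type": "none"}))  # uses the supplied value
```

Note that an empty default such as ${solr.ulog.dir:} is still a default: in this sketch it resolves to the empty string rather than raising an error.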
There are several ways to specify these properties:
JVM System Properties
Any JVM System properties, usually specified using the -D flag when starting the JVM, can be used as variables in any XML configuration file in Solr.
For example, in the sample solrconfig.xml files, you will see this value which defines the locking type to use:
<lockType>${solr.lock.type:native}</lockType>
Which means the lock type defaults to "native", but when starting Solr you could override this using a JVM system property by launching Solr with:
bin/solr start -Dsolr.lock.type=none
In general, any Java system property that you want to set can be passed through the bin/solr script using the standard -Dproperty=value syntax.
Alternatively, you can add common system properties to the SOLR_OPTS environment variable defined in the Solr include file (bin/solr.in.sh). For more information about how the Solr include file works, refer to Taking Solr to Production.
In short, there are two ways to set such a property:
pass it on the command line at startup, or
set it in the Solr include file.
solrcore.properties
If the configuration directory for a Solr core contains a file named solrcore.properties, that file can contain any arbitrary user-defined property names and values using the Java standard properties file format, and those properties can be used as variables in the XML configuration files for that Solr core.
For example, the following solrcore.properties file could be created in the conf/ directory of a collection using one of the example configurations, to override the lockType used.
#conf/solrcore.properties
solr.lock.type=none
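The second approach, then, is a solrcore.properties file.
By default the file is named solrcore.properties and lives in conf/; a different name and location can be specified in core.properties.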
User defined properties from core.properties
For example, consider the following core.properties file:
#core.properties
name=collection2
my.custom.prop=edismax
The my.custom.prop property can then be used as a variable, such as in solrconfig.xml:
<requestHandler name="/select">
  <lst name="defaults">
    <str name="defType">${my.custom.prop}</str>
  </lst>
</requestHandler>
Implicit Core Properties
Implicitly defined core properties:
All implicit properties use the solr.core. name prefix, and reflect the runtime value of the equivalent core.properties property:
solr.core.name
solr.core.config
solr.core.schema
solr.core.dataDir
solr.core.transient
solr.core.loadOnStartup
DataDir and DirectoryFactory in SolrConfig
Specifying a Location for Index Data with the dataDir Parameter
The dataDir parameter specifies where the index data is stored.
<dataDir>/var/data/solr/</dataDir>
If you are using replication to replicate the Solr index (as described in Legacy Scaling and Distribution), then the <dataDir> directory should correspond to the index directory used in the replication configuration.
The path can be absolute or relative; when replication is used, it must match the replication configuration.
Specifying the DirectoryFactory For Your Index
You can force a particular implementation by specifying solr.MMapDirectoryFactory, solr.NIOFSDirectoryFactory, or solr.SimpleFSDirectoryFactory.
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
The solr.RAMDirectoryFactory is memory based, not persistent, and does not work with replication. Use this DirectoryFactory to store your index in RAM.
<directoryFactory class="org.apache.solr.core.RAMDirectoryFactory"/>
Different operating systems are best served by different directory implementations. The index can also be stored on HDFS: use solr.HdfsDirectoryFactory instead of either of the above implementations.
Lib Directives in SolrConfig
Regular expressions can be used to select the files to load:
All directories are resolved as relative to the Solr instanceDir.
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar" />
Schema Factory Definition in SolrConfig
While the "read" features of the Solr API are supported for all Schema types, support for making Schema modifications programmatically depends on the <schemaFactory/> in use.
Managed Schema Default
Solr implicitly uses a ManagedIndexSchemaFactory
An example:
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
mutable - controls whether changes may be made to the Schema data. This must be set to true to allow edits to be made with the Schema API.
managedSchemaResourceName - an optional parameter that defaults to "managed-schema", and defines a new name for the schema file that can be anything other than "schema.xml".
Classic schema.xml
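The ClassicIndexSchemaFactory disallows any programmatic changes to the Schema at run time.
<schemaFactory class="ClassicIndexSchemaFactory"/>
Run-time modification is not supported; edits to schema.xml take effect only after the core is reloaded.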
Switching from schema.xml to Managed Schema
A non-editable schema.xml can be converted to the editable (managed) mode by configuring the schema factory in solrconfig.xml.
Changing to Manually Edited schema.xml
To switch back to the manually edited mode, follow these steps:
Rename the managed-schema file to schema.xml.
Modify solrconfig.xml to replace the schemaFactory class:
  Remove any ManagedIndexSchemaFactory definition if it exists.
  Add a ClassicIndexSchemaFactory definition as shown above.
Reload the core(s).
If you are using SolrCloud, you may need to modify the files via ZooKeeper.
IndexConfig in SolrConfig
In most cases, the defaults are fine
<indexConfig> ... </indexConfig>
Parameters covered in this section:
Writing New Segments
Merging Index Segments
Compound File Segments
Index Locks
Other Indexing Settings
Writing New Segments
ramBufferSizeMB
<ramBufferSizeMB>100</ramBufferSizeMB>
maxBufferedDocs
<maxBufferedDocs>1000</maxBufferedDocs>
useCompoundFile
<useCompoundFile>false</useCompoundFile>
The settings above control how new index files are written.
Merging Index Segments
mergePolicyFactory
The default in Solr is to use a TieredMergePolicy. Other policies available are the LogByteSizeMergePolicy and LogDocMergePolicy.
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
Controlling Segment Sizes: Merge Factors
For TieredMergePolicy, this is controlled by setting the <int name="maxMergeAtOnce"> and <int name="segmentsPerTier"> options, while LogByteSizeMergePolicy has a single <int name="mergeFactor"> option (all of which default to "10").
Merging index segments speeds up searching, but it increases the time spent building the index.
Customizing Merge Policies
An example:
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
</mergePolicyFactory>
mergeScheduler
The merge scheduler controls how merges are performed.
The default, ConcurrentMergeScheduler, is multi-threaded.
The alternative, SerialMergeScheduler, runs merges serially.
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
mergedSegmentWarmer
This is useful for near-real-time search.
<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
Compound File Segments
Segments stored in the compound file format (see useCompoundFile above).
Index Locks
lockType
The lock type. The following values are supported by StandardDirectoryFactory (the default):
native
simple
single
hdfs
<lockType>native</lockType>
writeLockTimeout
The timeout for acquiring the write lock.
<writeLockTimeout>1000</writeLockTimeout>
Other Indexing Settings
The remaining parameters:
reopenReaders
deletionPolicy
infoStream
Example:
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
  <str name="maxCommitAge">1DAY</str>
</deletionPolicy>
<infoStream>false</infoStream>
RequestHandlers and SearchComponents in SolrConfig
Request Handlers
  SearchHandlers
  UpdateRequestHandlers
  ShardHandlers
  Other Request Handlers
Search Components
  Default Components
  First-Components and Last-Components
  Components
  Other Useful Components
Request Handlers
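Each request handler is mapped to a request path.
SearchHandlers
Parameters and characteristics.
UpdateRequestHandlers
ShardHandlers
Other Request Handlers
Solr does not actually have many types of request handler, only four or five at present; the two in common use are search and update.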
Search Components
Search components define the logic that is used by the SearchHandler to perform queries for users.
They correspond to the search handler.
Default Components
Unless first-components and last-components are used to change it, the default components run in this order:
query: solr.QueryComponent, described in the section Query Syntax and Parsing.
facet: solr.FacetComponent, described in the section Faceting.
mlt: solr.MoreLikeThisComponent, described in the section MoreLikeThis.
highlight: solr.HighlightComponent, described in the section Highlighting.
stats: solr.StatsComponent, described in the section The Stats Component.
debug: solr.DebugComponent, described in the section on Common Query Parameters.
expand: solr.ExpandComponent, described in the section Collapse and Expand Results.
A default component can be replaced by registering a component under the same name.
First-Components and Last-Components
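<arr name="first-components">
  <str>mycomponent</str>
</arr>
<arr name="last-components">
  <str>spellcheck</str>
</arr>
Components
If components are declared this way instead of with first-components and last-components, the default components are not run unless they are listed explicitly.
<arr name="components">
  <str>mycomponent</str>
  <str>query</str>
  <str>debug</str>
</arr>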
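The ordering rules described in these notes can be modelled with a short Python sketch. It is a simplification for illustration only: the function effective_components is hypothetical, and any special-casing Solr applies internally is not modelled.

```python
# Default component names, in the order listed in these notes.
DEFAULT_COMPONENTS = ["query", "facet", "mlt", "highlight", "stats", "debug", "expand"]


def effective_components(first=None, last=None, components=None):
    """Return the ordered component names a search handler would run (simplified)."""
    if components is not None:
        # An explicit "components" list replaces the defaults entirely.
        return list(components)
    # Otherwise the defaults are kept, wrapped by first- and last-components.
    return list(first or []) + DEFAULT_COMPONENTS + list(last or [])


print(effective_components(first=["mycomponent"], last=["spellcheck"]))
print(effective_components(components=["mycomponent", "query", "debug"]))
```

The second call shows why facet, mlt, highlight, stats and expand disappear when only "components" is configured: they are simply not in the list.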
Other Useful Components
SpellCheckComponent, described in the section Spell Checking.
TermVectorComponent, described in the section The Term Vector Component.
QueryElevationComponent, described in the section The Query Elevation Component.
TermsComponent, described in the section The Terms Component.
InitParams in SolrConfig
An <initParams> section of solrconfig.xml allows you to define request handler parameters outside of the handler configuration.
<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
  <lst name="defaults">
    <str name="df">_text_</str>
  </lst>
</initParams>
This applies one shared default configuration to all of the listed handler paths.
If we later want to change the /query request handler to search a different field by default, we could override the <initParams> by defining the parameter in the <requestHandler> section for /query.
In other words, the shared setting can be overridden for an individual path.
Wildcards
Example:
<initParams name="myParams" path="/myhandler,/root/*,/root1/**">
  <lst name="defaults">
    <str name="fl">_text_</str>
  </lst>
  <lst name="invariants">
    <str name="rows">10</str>
  </lst>
  <lst name="appends">
    <str name="df">title</str>
  </lst>
</initParams>
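The three lists in the example above behave differently: defaults apply only when the request does not set the parameter, appends are added to whatever the request sends, and invariants always win. A rough Python model of that merging follows; the function effective_params and the sample request are assumptions for illustration, not Solr code.

```python
def effective_params(request, defaults=None, invariants=None, appends=None):
    """Merge request parameters with defaults, appends and invariants.

    Every mapping goes from a parameter name to a list of values.
    """
    result = {name: list(values) for name, values in (defaults or {}).items()}
    for name, values in request.items():
        result[name] = list(values)                   # the request overrides defaults
    for name, values in (appends or {}).items():
        result.setdefault(name, []).extend(values)    # appends are added on top
    for name, values in (invariants or {}).items():
        result[name] = list(values)                   # invariants cannot be overridden
    return result


params = effective_params(
    request={"q": ["solr"], "rows": ["100"], "fl": ["id"]},
    defaults={"fl": ["_text_"]},
    invariants={"rows": ["10"]},
    appends={"df": ["title"]},
)
print(params)
```

Here the request's fl replaces the default, its rows value is discarded in favour of the invariant, and df is appended even though the request never mentioned it.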
UpdateHandlers in SolrConfig
<updateHandler class="solr.DirectUpdateHandler2"> ... </updateHandler>
Topics covered in this section:
Commits
  commit and softCommit
  autoCommit
  commitWithin
Event Listeners
Transaction Log
Commits
Data sent to Solr is not searchable until it has been committed to the index.
commit and softCommit
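commit is a hard commit: the data is fully written to disk.
softCommit makes index changes visible quickly, which enables near-real-time search, but data can be lost if the machine crashes.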
autoCommit
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>1000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
commitWithin
commitWithin is used most often with near-real-time searching, and for that reason the default is to perform a soft commit.
<commitWithin>
  <softCommit>false</softCommit>
</commitWithin>
With this configuration, when you call commitWithin as part of your update message, it will automatically perform a hard commit every time.
Event Listeners
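These can be triggered to occur after any commit (event="postCommit") or only after optimize commands (event="postOptimize").
These are the two kinds of listener event.
When an event fires, a corresponding action can be run:
RunExecutableListener
It takes a number of parameters.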
Transaction Log
A transaction log is required for the RealTime Get feature. It is configured in the updateHandler section of solrconfig.xml.
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
Some further parameters can be configured:
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">500</int>
  <int name="maxNumLogsToKeep">20</int>
  <int name="numVersionBuckets">65536</int>
</updateLog>
Query Settings in SolrConfig
The settings in this section affect the way that Solr will process and respond to queries.
<query>
  ...
</query>
Topics covered in this section:
Caches
Query Sizing and Warming
Query-Related Listeners
Caches
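Solr caches queries together with their results, so that a repeated query is answered from the cache, which speeds up searching.
When the index is reopened, the caches are warmed and refreshed.
Three implementations are available:
In Solr, there are three cache implementations: solr.search.LRUCache, solr.search.FastLRUCache, and solr.search.LFUCache.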
filterCache
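When a query uses the fq parameter, the filter and its result set are cached; the next query with the same filter hits the cache and returns quickly.
<filterCache class="solr.LRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>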
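To make the least-recently-used idea behind solr.LRUCache concrete, here is a tiny Python sketch. It is not Solr's implementation: the class LRUCacheSketch, the filter strings and the document ID sets are all made up for illustration, and only the size attribute is modelled.

```python
from collections import OrderedDict


class LRUCacheSketch:
    """A minimal least-recently-used cache keyed by filter query."""

    def __init__(self, size):
        self.size = size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)         # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # evict the least recently used entry


cache = LRUCacheSketch(size=2)
cache.put("inStock:true", {1, 4, 7})
cache.put("cat:book", {2, 4})
cache.get("inStock:true")              # touching the entry keeps it alive
cache.put("price:[0 TO 10]", {7})      # the cache is full, so "cat:book" is evicted
print(cache.get("cat:book"))
```

With size="512" as in the configuration above, the same eviction would start only once a 513th distinct filter is cached.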
queryResultCache
This cache holds the results of previous searches: ordered lists of document IDs (DocList) based on a query, a sort, and the range of documents requested.
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="128"
                  maxRamMB="1000"/>
documentCache
This cache holds Lucene Document objects (the stored fields for each document).
Since Lucene internal document IDs are transient, this cache is not auto-warmed.
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
User Defined Caches
A user-defined cache is declared like this:
<cache name="myUserCache"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"
       autowarmCount="1024"
       regenerator="org.mycompany.mypackage.MyRegenerator" />
Another possible setting for the regenerator:
regenerator="solr.NoOpRegenerator"
Query Sizing and Warming
maxBooleanClauses
The maximum number of clauses allowed in a boolean query. The value in effect depends on the last core that was initialized:
<maxBooleanClauses>1024</maxBooleanClauses>
enableLazyFieldLoading
<enableLazyFieldLoading>true</enableLazyFieldLoading>
useFilterForSortedQuery
Useful when results are not sorted by score.
<useFilterForSortedQuery>true</useFilterForSortedQuery>
queryResultWindowSize
The query result cache stores more results than the requested range, up to the given number:
<queryResultWindowSize>20</queryResultWindowSize>
queryResultMaxDocsCached
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
useColdSearcher
This setting controls whether search requests for which there is not a currently registered searcher should wait for a new searcher to warm up (false) or proceed immediately (true). When set to "false", requests will block until the searcher has warmed its caches.
<useColdSearcher>false</useColdSearcher>
maxWarmingSearchers
<maxWarmingSearchers>2</maxWarmingSearchers>
Query-Related Listeners
There are two event types:
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!--
    <lst><str name="q">solr</str><str name="sort">price asc</str></lst>
    <lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
    -->
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">static firstSearcher warming in solrconfig.xml</str></lst>
  </arr>
</listener>
RequestDispatcher in SolrConfig
Topics in this section:
handleSelect Element
requestParsers Element
httpCaching Element
handleSelect Element
This attribute exists for backward compatibility.
<requestDispatcher handleSelect="true" >
  ...
</requestDispatcher>
requestParsers Element
The <requestParsers> sub-element controls values related to parsing requests. This is an empty XML element that doesn't have any content, only attributes.
The element takes several attributes:
<requestParsers enableRemoteStreaming="true"
                multipartUploadLimitInKB="2048000"
                formdataUploadLimitInKB="2048"
                addHttpRequestToContext="false" />
httpCaching Element
<httpCaching never304="false"
             lastModFrom="openTime"
             etagSeed="Solr">
  <cacheControl>max-age=30, public</cacheControl>
</httpCaching>
cacheControl Element
Update Request Processors
Anatomy and life cycle
Configuration
Update processors in SolrCloud
Using custom chains
Update Request Processor Factories
Anatomy and life cycle
Every update request is run through a default processor chain unless you configure a chain of your own.
Each processor is created by a processor factory, and two rules apply:
An update request processor need not be thread safe because it is used by one and only one request thread and destroyed once the request is complete.
The factory class can accept configuration parameters and maintain any state that may be required between requests. The factory class must be thread-safe.
Configuration
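Chains can be configured in solrconfig.xml, in which case they are created when the configuration is loaded, or built at run time from request parameters.
A custom chain should be modelled on the default one, because some processing steps are essential.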
The default update request processor chain
In order:
LogUpdateProcessorFactory - Tracks the commands processed during this request and logs them.
DistributedUpdateProcessorFactory - Responsible for distributing update requests to the right node, e.g. routing requests to the leader of the right shard and distributing updates from the leader to each replica. This processor is activated only in SolrCloud mode.
RunUpdateProcessorFactory - Executes the update using internal Solr APIs.
Custom update request processor chain
updateRequestProcessorChain
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
If a chain such as this one does not include DistributedUpdateProcessorFactory, Solr will automatically insert it just prior to the RunUpdateProcessorFactory.
Configuring individual processors as top-level plugins
updateProcessor
<updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
  <bool name="enabled">true</bool>
  <str name="signatureField">id</str>
  <bool name="overwriteDupes">false</bool>
  <str name="fields">name,features,cat</str>
  <str name="signatureClass">solr.processor.Lookup3Signature</str>
</updateProcessor>
<updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory" name="remove_blanks"/>
These named processors can then be referenced when defining a custom chain:
updateRequestProcessorChains and updateProcessors
<updateProcessorChain name="custom" processor="remove_blanks,signature">
  <processor class="solr.RunUpdateProcessorFactory" />
</updateProcessorChain>
Update processors in SolrCloud
A critical SolrCloud functionality is the routing and distributing of requests – for update requests this routing is implemented by the DistributedUpdateRequestProcessor, and this processor is given a special status by Solr due to its important function.
In a distributed update, the processors that come before the distributed processor run on the node that receives the request. The distributed processor then routes the update to the appropriate leader node, where it is processed and logged, and from there it is distributed to the replicas.
For example, consider the "dedupe" chain which we saw in a section above. Assume that a 3 node SolrCloud cluster exists where node A hosts the leader of shard1, node B hosts the leader of shard2 and node C hosts the replica of shard2. Assume that an update request is sent to node A which forwards the update to node B (because the update belongs to shard2) which then distributes the update to its replica node C. Let's see what happens at each node:
Node A: Runs the update through the SignatureUpdateProcessor (which computes the signature and puts it in the "id" field), then LogUpdateProcessor and then DistributedUpdateProcessor. DistributedUpdateProcessor determines that the update actually belongs to node B and forwards it there. The update is not processed further. This is required because the next processor, RunUpdateProcessor, would execute the update against the local shard1 index, which would lead to duplicate data on shard1 and shard2.
Node B: Receives the update and sees that it was forwarded by another node. The update is directly sent to DistributedUpdateProcessor because it has already been through the SignatureUpdateProcessor on node A and doing the same signature computation again would be redundant. The DistributedUpdateProcessor determines that the update indeed belongs to this node, distributes it to its replica on Node C and then forwards the update further in the chain to RunUpdateProcessor.
Node C: Receives the update and sees that it was distributed by its leader. The update is directly sent to DistributedUpdateProcessor which performs some consistency checks and forwards the update further in the chain to RunUpdateProcessor.
In summary:
All processors before DistributedUpdateProcessor are only run on the first node that receives an update request, whether it be a forwarding node (e.g. node A in the above example) or a leader (e.g. node B). We call these pre-processors or just processors.
All processors after DistributedUpdateProcessor run only on the leader and the replica nodes. They are not executed on forwarding nodes. Such processors are called "post-processors".
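The node A / B / C walk-through can be condensed into a small Python model. It is only a sketch of the rules summarised above: the shortened processor names, the role labels and the function processors_run are illustrative assumptions, not Solr identifiers.

```python
# The "dedupe" chain with the distributed processor inserted before Run.
CHAIN = ["Signature", "Log", "Distributed", "Run"]


def processors_run(chain, role):
    """Return the processors executed on a node, given its role for one update.

    role is one of:
      "forwarder"    - first to receive the request, does not host the leader
      "first-leader" - first to receive the request and hosts the leader
      "leader"       - received the update from a forwarding node
      "replica"      - received the update from its leader
    """
    split = chain.index("Distributed")
    pre, post = chain[:split], chain[split + 1:]
    if role == "forwarder":
        return pre + ["Distributed"]            # stops after forwarding
    if role == "first-leader":
        return pre + ["Distributed"] + post
    if role in ("leader", "replica"):
        return ["Distributed"] + post           # pre-processors are skipped
    raise ValueError(f"unknown role: {role}")


for node, role in [("A", "forwarder"), ("B", "leader"), ("C", "replica")]:
    print(node, processors_run(CHAIN, role))
```

The signature is therefore computed exactly once (on node A), and RunUpdateProcessor touches only the indexes on nodes B and C.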
post-processors
<updateProcessorChain name="custom" processor="signature" post-processor="remove_blanks">
  <processor class="solr.RunUpdateProcessorFactory" />
</updateProcessorChain>
Using custom chains
update.chain request parameter
You can choose which update processor chain is used to process a request.
update.chain
curl "http://localhost:8983/solr/gettingstarted/update/json?update.chain=dedupe&commit=true" -H 'Content-type: application/json' -d '
[
  {
    "name" : "The Lightning Thief",
    "features" : "This is just a test",
    "cat" : ["book","hardcover"]
  },
  {
    "name" : "The Lightning Thief",
    "features" : "This is just a test",
    "cat" : ["book","hardcover"]
  }
]'
processor & post-processor request parameters
These two parameters can be used to build a processing chain dynamically.
Constructing a chain at request time
# Executing processors configured in solrconfig.xml as (pre)-processors
curl "http://localhost:8983/solr/gettingstarted/update/json?processor=remove_blanks,signature&commit=true" -H 'Content-type: application/json' -d '
[
  {
    "name" : "The Lightning Thief",
    "features" : "This is just a test",
    "cat" : ["book","hardcover"]
  },
  {
    "name" : "The Lightning Thief",
    "features" : "This is just a test",
    "cat" : ["book","hardcover"]
  }
]'
# Executing processors configured in solrconfig.xml as pre and post processors
curl "http://localhost:8983/solr/gettingstarted/update/json?processor=remove_blanks&post-processor=signature&commit=true" -H 'Content-type: application/json' -d '
[
  {
    "name" : "The Lightning Thief",
    "features" : "This is just a test",
    "cat" : ["book","hardcover"]
  },
  {
    "name" : "The Lightning Thief",
    "features" : "This is just a test",
    "cat" : ["book","hardcover"]
  }
]'
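The same requests can be prepared from Python with only the standard library. The sketch below builds the request objects without sending them; the helper build_update_request is hypothetical, and the URL, collection name and documents are taken from the curl examples in these notes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

UPDATE_URL = "http://localhost:8983/solr/gettingstarted/update/json"


def build_update_request(docs, params):
    """Build (but do not send) a JSON update request with the given parameters."""
    query = urlencode(params, safe=",")   # keep commas in processor lists readable
    return Request(
        f"{UPDATE_URL}?{query}",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


docs = [
    {"name": "The Lightning Thief", "features": "This is just a test",
     "cat": ["book", "hardcover"]},
]

# Select a chain configured in solrconfig.xml by name.
by_chain = build_update_request(docs, {"update.chain": "dedupe", "commit": "true"})

# Or construct the chain at request time from named processors.
dynamic = build_update_request(
    docs,
    {"processor": "remove_blanks", "post-processor": "signature", "commit": "true"},
)

print(by_chain.full_url)
print(dynamic.full_url)
```

Passing either object to urllib.request.urlopen would send it, which of course requires a running Solr with a gettingstarted collection.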
Configuring a custom chain as a default
There are two ways to make a custom chain the default:
This can be done by adding either "update.chain" or "processor" and "post-processor" as default parameters for a given path, which can be done either via InitParams in SolrConfig or by adding them in a "defaults" section, which is supported by all request handlers.
Examples:
InitParams
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>
defaults
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</requestHandler>
Update Request Processor Factories
The following factory classes are available; see the reference documentation for what each one does:
AddSchemaFieldsUpdateProcessorFactory
CloneFieldUpdateProcessorFactory
DefaultValueUpdateProcessorFactory
DocBasedVersionConstraintsProcessorFactory
DocExpirationUpdateProcessorFactory
IgnoreCommitOptimizeUpdateProcessorFactory
RegexpBoostProcessorFactory
SignatureUpdateProcessorFactory
StatelessScriptUpdateProcessorFactory
TimestampUpdateProcessorFactory
URLClassifyProcessorFactory
UUIDUpdateProcessorFactory
FieldMutatingUpdateProcessorFactory derived factories
ConcatFieldUpdateProcessorFactory
CountFieldValuesUpdateProcessorFactory
FieldLengthUpdateProcessorFactory
FirstFieldValueUpdateProcessorFactory
HTMLStripFieldUpdateProcessorFactory
IgnoreFieldUpdateProcessorFactory
LastFieldValueUpdateProcessorFactory
MaxFieldValueUpdateProcessorFactory
MinFieldValueUpdateProcessorFactory
ParseBooleanFieldUpdateProcessorFactory
ParseDateFieldUpdateProcessorFactory
ParseNumericFieldUpdateProcessorFactory derived classes
  ParseDoubleFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Double values.
  ParseFloatFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Float values.
  ParseIntFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Integer values.
  ParseLongFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Long values.
PreAnalyzedUpdateProcessorFactory
RegexReplaceProcessorFactory
RemoveBlankFieldUpdateProcessorFactory
TrimFieldUpdateProcessorFactory
TruncateFieldUpdateProcessorFactory
UniqFieldsUpdateProcessorFactory
Update Processor factories that can be loaded as plugins
Several factories are available as separately loaded plugins:
LangDetectLanguageIdentifierUpdateProcessorFactory (is this the one from Google?)
TikaLanguageIdentifierUpdateProcessorFactory
UIMAUpdateRequestProcessorFactory
Update Processor factories you should not modify or remove
It is best not to modify Solr's own update processor factories carelessly.
Codec Factory
Defines the codec used to write the index to disk. It is configured in solrconfig.xml; if it is not defined, Solr uses its default.
A compressionMode option:
BEST_SPEED (default) is optimized for search speed performance
BEST_COMPRESSION is optimized for disk space usage
Example:
<codecFactory class="solr.SchemaCodecFactory">
  <str name="compressionMode">BEST_COMPRESSION</str>
</codecFactory>
Solr Cores and solr.xml
In Solr, the term core is used to refer to a single index and associated transaction log and configuration files (including the solrconfig.xml and Schema files, among others).
In standalone mode, solr.xml must reside in solr_home. In SolrCloud mode, solr.xml will be loaded from Zookeeper if it exists, with fallback to solr_home.
The recommended way is to dynamically create cores/collections using the APIs
The following sections describe these options in more detail.
Format of solr.xml: Details on how to define solr.xml, including the acceptable parameters for the solr.xml file.
Defining core.properties: Details on placement of core.properties and available property options.
CoreAdmin API: Tools and commands for core administration using a REST API.
Config Sets: How to use configsets to avoid duplicating effort when defining a new core.
Format of solr.xml
This section will describe the default solr.xml file included with Solr and how to modify it for your needs. For details on how to configure core.properties, see the section Defining core.properties.
Defining solr.xml
Solr.xml Parameters
  The <solr> Element
  The <solrcloud> element
  The <logging> element
  The <logging><watcher> element
  The <shardHandlerFactory> element
Substituting JVM System Properties in solr.xml
Defining solr.xml
You can find solr.xml in your Solr Home directory or in Zookeeper. The default solr.xml file looks like this:
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
Unless -DzkHost or -DzkRun is specified at startup time, the <solrcloud> section is ignored.
Solr.xml Parameters
The <solr> Element
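This part describes the element's attributes.
The <solrcloud> element
This section is ignored unless the Solr instance is started with either -DzkRun or -DzkHost.
It holds the parameters for SolrCloud mode, including the access-control credential settings.
The <logging> element
The logging class and whether logging is enabled.
The <logging><watcher> element
Configuration of the log watcher.
The <shardHandlerFactory> element
Defines the shard handler: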
Custom shard handlers can be defined in solr.xml if you wish to create a custom shard handler.
<shardHandlerFactory name="ShardHandlerFactory" class="qualified.class.name">
Since this is a custom shard handler, sub-elements are specific to the implementation.
Substituting JVM System Properties in solr.xml
JVM system properties can be substituted in solr.xml.
The ${propertyname[:option default value]} syntax sets an optional default value.
A JVM property supplied at run time overrides that default.
<solr>
  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
Defining core.properties
core.properties is a standard Java properties file. For example:
name=my_core_name
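Because the file is plain key=value text, its content is easy to inspect programmatically. The Python sketch below handles only a small subset of the Java properties format (no escapes, line continuations or ":" separators); the function parse_properties is hypothetical and the sample content is the core.properties example shown earlier in these notes.

```python
def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    properties = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue                      # skip blank lines and comments
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


sample = """\
#core.properties
name=collection2
my.custom.prop=edismax
"""
print(parse_properties(sample))
```

Any key that is not one of the standard core properties, such as my.custom.prop here, is simply a user-defined property available for substitution.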
Placement of core.properties
core.properties is placed in the core's own directory under solr_home.
Defining core.properties Files
name
The name of the SolrCore. You'll use this name to reference the SolrCore when running commands with the CoreAdminHandler.
config
The configuration file name for a given core. The default is solrconfig.xml.
schema
The schema file name for a given core. The default is schema.xml but please note that if you are using a "managed schema" (the default behavior) then any value for this property which does not match the effective managedSchemaResourceName will be read once, backed up, and converted for managed schema use.
dataDir
The core's data directory (where indexes are stored) as either an absolute pathname, or a path relative to the value of instanceDir. This is data by default.
configSet
The name of a defined configset, if desired, to use to configure the core.
properties
The name of the properties file for this core. The value can be an absolute pathname or a path relative to the value of instanceDir.
transient
If true, the core can be unloaded if Solr reaches the transientCacheSize. The default if not specified is false. Cores are unloaded in order of least recently used first. Setting to true is not recommended in SolrCloud mode.
loadOnStartup
If true, the default if it is not specified, the core will be loaded when Solr starts. Setting to false is not recommended in SolrCloud mode.
coreNodeName
Used only in SolrCloud, this is a unique identifier for the node hosting this replica. By default a coreNodeName is generated automatically, but setting this attribute explicitly allows you to manually assign a new core to replace an existing replica. For example: when replacing a machine that has had a hardware failure by restoring from backups on a new machine with a new hostname or port.
ulogDir
The absolute or relative directory for the update log for this core (SolrCloud).
shard
The shard to assign this core to (SolrCloud).
collection
The name of the collection this core is part of (SolrCloud).
roles
Future param for SolrCloud or a way for users to mark nodes for their own use.
(I do not fully understand this one.)
Additional "user defined" properties may be specified for use as variables. For more information on how to define local properties, see the section Substituting Properties in Solr Config Files.
(User-defined properties?)
CoreAdmin API (for standalone mode)
SolrCloud users should not typically use the CoreAdmin API directly.
Each Solr node has a single CoreAdminHandler instance, which manages all the cores running in that node and is accessible at the /solr/admin/cores path.
CoreAdmin actions are executed through HTTP requests that specify an "action" request parameter.
All action names are uppercase, and are defined in depth in the sections below:
STATUS
CREATE
RELOAD
RENAME
SWAP
UNLOAD
MERGEINDEXES
SPLIT
REQUESTSTATUS
STATUS
The STATUS action returns the status of all running Solr cores, or status for only the named core.
http://localhost:8983/solr/admin/cores?action=STATUS&core=core0
Input
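core: the name of the core.
indexInfo: whether to return information about the index. It is returned by default; when there are many cores, set it to false to speed up the response.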
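A small Python helper can assemble such CoreAdmin URLs. This is a sketch only: the function core_admin_url is hypothetical, and the host, port and core name are the ones used in the example URL above.

```python
from urllib.parse import urlencode


def core_admin_url(action, base="http://localhost:8983/solr", **params):
    """Build a CoreAdmin URL; action names are upper-cased as the API expects."""
    query = urlencode({"action": action.upper(), **params})
    return f"{base}/admin/cores?{query}"


print(core_admin_url("status", core="core0"))
print(core_admin_url("status", core="core0", indexInfo="false"))
```

The resulting string can be opened in a browser or fetched with any HTTP client against a running Solr node.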
CREATE
The CREATE action creates a new core and registers it. If a Solr core with the given name already exists, it will continue to handle requests while the new core is initializing. When the new core is ready, it will take new requests and the old core will be unloaded.
In other words, creating a core whose name already exists replaces the old core.
http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path/to/dir&config=config_file_name.xml&dataDir=data
Input
name
Getting Started with Solr: Reading Notes on the Official 6.0 Documentation (Part 10)