Getting Started with Solr: Reading Notes on the Official 6.0 Documentation (Part 10)

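These are detailed reading notes on the official Solr 6.0 documentation for Solr beginners. They cover core concepts, installation and configuration, index creation, and querying, and aim to help readers pick up Solr's basic operations quickly.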
The Well-Configured Solr Instance
This part of the guide explains how to tune a Solr instance for best performance.
Configuring solrconfig.xml

The settings in solrconfig.xml have a large impact on how Solr behaves.

In solrconfig.xml you configure:
request handlers, which process the requests to Solr, such as requests to add documents to the index or requests to return results for a query
listeners, processes that "listen" for particular query-related events; listeners can be used to trigger the execution of special code, such as invoking some common queries to warm-up caches
the Request Dispatcher for managing HTTP communications
the Admin Web interface
parameters related to replication and duplication (these parameters are covered in detail in  Legacy Scaling and Distribution)

Topics covered:
DataDir and DirectoryFactory in SolrConfig
Lib Directives in SolrConfig
Schema Factory Definition in SolrConfig
IndexConfig in SolrConfig
RequestHandlers and SearchComponents in SolrConfig
InitParams in SolrConfig
UpdateHandlers in SolrConfig
Query Settings in SolrConfig
RequestDispatcher in SolrConfig
Update Request Processors
Codec Factory


Substituting Properties in Solr Config Files

solrconfig.xml supports property substitution, so values can be set dynamically:
${propertyname[:option default value]}
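The value comes from a property specified at run time or, failing that, from the default; if neither is available, Solr reports an error.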
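The resolution rule for the ${propertyname[:option default value]} syntax can be sketched in a few lines of Python. This is only an illustration of the rule, not Solr's own code; the function name substitute and the props dictionary are assumptions made for the example.

```python
import re

# Matches ${name} or ${name:default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text, props):
    """Replace ${name[:default]} placeholders using the props mapping."""

    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]          # value supplied at run time wins
        if default is not None:
            return default              # fall back to the inline default
        raise KeyError(f"no value and no default for property {name!r}")

    return _PLACEHOLDER.sub(resolve, text)


line = "<lockType>${solr.lock.type:native}</lockType>"
print(substitute(line, {}))                          # default applies
print(substitute(line, {"solr.lock.type": "none"}))  # run-time value wins
```

A placeholder with neither a supplied value nor a default raises an error in this sketch, mirroring the "report an error" case.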

There are several ways to specify these properties:
JVM System Properties
Any JVM System properties, usually specified using the -D flag when starting the JVM, can be used as variables in any XML configuration file in Solr.

For example, in the sample solrconfig.xml files, you will see this value which defines the locking type to use:
<lockType>${solr.lock.type:native}</lockType>
This means the lock type defaults to "native", but when starting Solr you could override it using a JVM
system property by launching Solr with:
bin/solr start -Dsolr.lock.type=none
In general, any Java system property that you want to set can be passed through the bin/solr script using the
standard -Dproperty=value syntax. Alternatively, you can add common system properties to the SOLR_OPTS
environment variable defined in the Solr include file (bin/solr.in.sh). For more information about how the
Solr include file works, refer to: Taking Solr to Production.


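There are two ways to set these properties:
Pass them on the command line at startup.
Set them in the Solr include file (bin/solr.in.sh).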

solrcore.properties

If the configuration directory for a Solr core contains a file named solrcore.properties that file can contain
any arbitrary user defined property names and values using the Java standard  properties file format, and those
properties can be used as variables in the XML configuration files for that Solr core.
For example, the following solrcore.properties file could be created in the conf/ directory of a collection
using one of the example configurations, to override the lockType used.
#conf/solrcore.properties
solr.lock.type=none


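The second approach uses solrcore.properties.

By default this file is named solrcore.properties and lives in conf/; a different name and location can be specified in core.properties.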

User defined properties from core.properties

For example, consider the following core.properties file:
#core.properties
name=collection2
my.custom.prop=edismax
The my.custom.prop property can then be used as a variable, such as in solrconfig.xml:
<requestHandler name="/select">
<lst name="defaults">
<str name="defType">${my.custom.prop}</str>
</lst>
</requestHandler>


Implicit Core Properties

Implicitly defined core properties:
All implicit properties use the solr.core. name prefix, and reflect the runtime value of the equivalent core.properties property:
solr.core.name
solr.core.config
solr.core.schema
solr.core.dataDir
solr.core.transient
solr.core.loadOnStartup



DataDir and DirectoryFactory in SolrConfig

Specifying a Location for Index Data with the dataDir Parameter
The dataDir parameter specifies where index data is stored.

<dataDir>/var/data/solr/</dataDir>
If you are using replication to replicate the Solr index (as described in Legacy Scaling and Distribution), then the <dataDir> directory should correspond to the index directory used in the replication configuration.

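Notes: the path can be absolute or relative, and it must line up with the replication configuration when replication is used.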

Specifying the DirectoryFactory For Your Index

You can force a particular implementation by specifying solr.MMapDirectoryFactory, solr.NIOFSDirectoryFactory, or solr.SimpleFSDirectoryFactory.
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
The solr.RAMDirectoryFactory is memory based, not persistent, and does not work with replication. Use
this DirectoryFactory to store your index in RAM.
<directoryFactory class="org.apache.solr.core.RAMDirectoryFactory"/>


The best file-system-based implementation differs by operating system. The index can also be stored on HDFS:
use solr.HdfsDirectoryFactory instead of either of the above implementations.

Lib Directives in SolrConfig

Regular expressions can be used to select jar files. All directories are resolved as relative to the Solr instanceDir:


<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar" />

Schema Factory Definition in SolrConfig

While the "read" features of the Solr API are supported for all Schema types, support for making Schema modifications programmatically depends on the <schemaFactory/> in use.

Managed Schema Default

Solr implicitly uses a ManagedIndexSchemaFactory when no <schemaFactory/> is declared in solrconfig.xml.

An example:
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

mutable - controls whether changes may be made to the Schema data. This must be set to  true to allow edits to be made with the Schema API.
managedSchemaResourceName is an optional parameter that defaults to "managed-schema", and defines a new name for the schema file that can be anything other than "schema.xml".

Classic schema.xml

ClassicIndexSchemaFactory disallows any programmatic changes to the Schema at run time.

<schemaFactory class="ClassicIndexSchemaFactory"/>

Run-time modification is not supported; edits to schema.xml take effect only after the core is reloaded.

Switching from schema.xml to Managed Schema

A hand-edited schema.xml can be converted to a managed schema that is editable through the API; the switch is configured in solrconfig.xml.

Changing to Manually Edited schema.xml

To switch back to a manually edited schema.xml:

Steps:

Rename the managed-schema file to schema.xml.
Modify solrconfig.xml to replace the schemaFactory class.
Remove any ManagedIndexSchemaFactory definition if it exists.
Add a ClassicIndexSchemaFactory definition as shown above.
Reload the core(s).
If you are using SolrCloud, you may need to modify the files via ZooKeeper.


IndexConfig in SolrConfig

In most cases, the defaults are fine
<indexConfig>
...
</indexConfig>

Parameters covered in this section:
Writing New Segments
Merging Index Segments
Compound File Segments
Index Locks
Other Indexing Settings


Writing New Segments

ramBufferSizeMB

<ramBufferSizeMB>100</ramBufferSizeMB>



maxBufferedDocs

<maxBufferedDocs>1000</maxBufferedDocs>


useCompoundFile

<useCompoundFile>false</useCompoundFile>

The settings above control how new segment files are written.

Merging Index Segments

mergePolicyFactory
The default in Solr is to use a TieredMergePolicy.
Other policies available are the LogByteSizeMergePolicy and LogDocMergePolicy.

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="segmentsPerTier">10</int>
</mergePolicyFactory>

Controlling Segment Sizes: Merge Factors

For TieredMergePolicy, this is controlled by setting the <int name="maxMergeAtOnce"> and <int name="segmentsPerTier"> options, while LogByteSizeMergePolicy has a single <int name="mergeFactor"> option (all of which default to "10").

Merging segments speeds up searches, but at the cost of longer indexing time.

Customizing Merge Policies


An example:

<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
<str name="sort">timestamp desc</str>
<str name="wrapped.prefix">inner</str>
<str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
<int name="inner.maxMergeAtOnce">10</int>
<int name="inner.segmentsPerTier">10</int>
</mergePolicyFactory>

mergeScheduler

The merge scheduler controls how merges are performed


The default, ConcurrentMergeScheduler, runs merges in separate background threads.
The alternative, SerialMergeScheduler, runs merges serially rather than in separate threads.

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>


mergedSegmentWarmer

Useful for near-real-time search.

<mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>

Compound File Segments

A compound segment stores all of a segment's index files in a single file (see useCompoundFile above).

Index Locks
lockType
The lock type. The following values are supported by StandardDirectoryFactory (the default):

native
simple
single
hdfs 

<lockType>native</lockType>


writeLockTimeout
The maximum time to wait for the write lock:
<writeLockTimeout>1000</writeLockTimeout>

Other Indexing Settings

A few other parameters:
reopenReaders  
deletionPolicy
infoStream

Example:
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
<str name="maxCommitAge">1DAY</str>
</deletionPolicy>
<infoStream>false</infoStream>


RequestHandlers and SearchComponents in SolrConfig


Request Handlers
SearchHandlers
UpdateRequestHandlers
ShardHandlers
Other Request Handlers
Search Components
Default Components
First-Components and Last-Components
Components
Other Useful Components

Request Handlers

Each request handler is mapped to a request path.

SearchHandlers

Their parameters and characteristics are described in the official documentation.

UpdateRequestHandlers

ShardHandlers

Other Request Handlers

Solr does not actually have many kinds of request handlers, only four or five at present; the two in common use are search and update.


Search Components

Search components define the logic that is used by the SearchHandler to perform queries for users.

They belong to the search handler.

Default Components

Apart from anything added through first-components and last-components, the default components run in this order:

Component Name   Class Name                   More Information
query            solr.QueryComponent          Described in the section Query Syntax and Parsing.
facet            solr.FacetComponent          Described in the section Faceting.
mlt              solr.MoreLikeThisComponent   Described in the section MoreLikeThis.
highlight        solr.HighlightComponent      Described in the section Highlighting.
stats            solr.StatsComponent          Described in the section The Stats Component.
debug            solr.DebugComponent          Described in the section on Common Query Parameters.
expand           solr.ExpandComponent         Described in the section Collapse and Expand Results.


A default component can be replaced by registering a search component under the same name.

First-Components and Last-Components


<arr name="first-components">
<str>mycomponent</str>
</arr>
<arr name="last-components">
<str>spellcheck</str>
</arr>

Components

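If you define components explicitly, rather than adding them through first-components and last-components, the default components are not run: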

<arr name="components">
<str>mycomponent</str>
<str>query</str>
<str>debug</str>
</arr>


Other Useful Components

SpellCheckComponent, described in the section  Spell Checking.
TermVectorComponent, described in the section The Term Vector Component.
QueryElevationComponent, described in the section The Query Elevation Component.
TermsComponent, described in the section The Terms Component.


InitParams in SolrConfig

An <initParams> section of solrconfig.xml allows you to define request handler parameters outside of the handler configuration.

<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">_text_</str>
</lst>
</initParams>

This applies a shared default configuration to the listed handler paths.

If we later want to change the /query request handler to search a different field by default, we could override the <initParams> by defining the parameter in the <requestHandler> section for /query
In other words, the setting can be overridden for an individual path.


Wildcards


Example:

<initParams name="myParams" path="/myhandler,/root/*,/root1/**">
<lst name="defaults">
<str name="fl">_text_</str>
</lst>
<lst name="invariants">
<str name="rows">10</str>
</lst>
<lst name="appends">
<str name="df">title</str>
</lst>
</initParams>

UpdateHandlers in SolrConfig
<updateHandler class="solr.DirectUpdateHandler2">
...
</updateHandler>


Topics covered in this section:
Commits
commit and softCommit
autoCommit
commitWithin
Event Listeners
Transaction Log


Commits

Data sent to Solr is not searchable until it has been committed to the index.

commit and softCommit

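commit is a hard commit: the data is fully flushed to disk.
softCommit makes index changes visible quickly, which enables near-real-time search, but changes that have only been soft-committed are not guaranteed to be on disk if the machine crashes.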

autoCommit

<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>1000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>


<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
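To make the two autoCommit thresholds concrete, here is a small Python sketch of the trigger logic as I understand it: a commit becomes due once maxDocs updates are pending, or once maxTime milliseconds have passed since the oldest uncommitted update. The class AutoCommitTracker is hypothetical and exists only for this illustration; Solr itself schedules commits internally and does not expose such a class.

```python
class AutoCommitTracker:
    """Toy model of the autoCommit thresholds (not Solr code)."""

    def __init__(self, max_docs, max_time_ms):
        self.max_docs = max_docs
        self.max_time_ms = max_time_ms
        self.pending = 0          # updates since the last commit
        self.oldest_ms = None     # timestamp of the oldest uncommitted update

    def add(self, now_ms):
        """Record one update; return True if a commit is now due."""
        if self.pending == 0:
            self.oldest_ms = now_ms
        self.pending += 1
        return self.due(now_ms)

    def due(self, now_ms):
        """Check both thresholds at the given time."""
        if self.pending == 0:
            return False
        return (self.pending >= self.max_docs
                or now_ms - self.oldest_ms >= self.max_time_ms)

    def commit(self):
        """Reset the counters, as a commit would."""
        self.pending = 0
        self.oldest_ms = None


tracker = AutoCommitTracker(max_docs=3, max_time_ms=1000)
print(tracker.add(0), tracker.add(10), tracker.add(20))  # third update hits maxDocs
```

Whichever threshold is reached first triggers the commit, so a small maxTime dominates under light load and maxDocs dominates during bulk indexing.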


commitWithin

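commitWithin asks for documents to be committed within a given time period. It is used mostly with near-real-time search, and for that reason the default is to perform a soft commit.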


<commitWithin>
<softCommit>false</softCommit>
</commitWithin>


With this configuration, when you call commitWithin as part of your update message, it will automatically perform a hard commit every time.


Event Listeners

These can be triggered to occur after any commit (event="postCommit") or only after optimize commands (event="postOptimize")

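Two listener events can be configured.

When an event fires, a listener can react to it:
RunExecutableListener
It takes several parameters (see the official documentation).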


Transaction Log

The RealTime Get feature requires a transaction log. It is configured in the updateHandler section of solrconfig.xml.

<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>


It accepts a few configuration parameters:
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numRecordsToKeep">500</int>
<int name="maxNumLogsToKeep">20</int>
<int name="numVersionBuckets">65536</int>
</updateLog>



Query Settings in SolrConfig

The settings in this section affect the way that Solr will process and respond to queries

<query>
...
</query>


Topics covered in this section:
Caches
Query Sizing and Warming
Query-Related Listeners



Caches

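Queries and their results are cached, so a repeated query is served from the cache, which speeds up searching.
When a new index searcher is opened, its caches are warmed from the old ones.
Three implementations are available:
In Solr, there are three cache implementations: solr.search.LRUCache, solr.search.FastLRUCache, and solr.search.LFUCache.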
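For readers new to the term, the following Python sketch shows the general least-recently-used eviction idea behind an LRU cache. It is a generic illustration built on OrderedDict, not Solr's solr.search.LRUCache implementation; the class name and the filter strings used as keys are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Generic LRU cache: evicts the entry that was used longest ago."""

    def __init__(self, size):
        self.size = size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)         # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # drop the least recently used


cache = LRUCache(size=2)
cache.put("cat:book", {1, 2})
cache.put("inStock:true", {2, 3})
cache.get("cat:book")               # touch, so "inStock:true" is now oldest
cache.put("price:[0 TO 10]", {4})   # evicts "inStock:true"
print(cache.get("inStock:true"))    # None
```

The size attribute in the cache elements below plays the same role as the size argument here: it caps the number of entries kept.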

filterCache

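When a query uses the fq parameter, the filter and its matching document set are cached, so the next query with the same filter is answered quickly.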

<filterCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="128"/>


queryResultCache

This cache holds the results of previous searches: ordered lists of document IDs (DocList) based on a query, a sort, and the range of documents requested

<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="128"
maxRamMB="1000"/>



documentCache
This cache holds Lucene Document objects (the stored fields for each document). 
Since Lucene internal document IDs are transient, this cache is not auto-warmed. 

<documentCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>



User Defined Caches

User-defined caches:

<cache name="myUserCache" class="solr.LRUCache"
size="4096"
initialSize="1024"
autowarmCount="1024"
regenerator="org.mycompany.mypackage.MyRegenerator" />


Another option for the regenerator:
regenerator="solr.NoOpRegenerator".


Query Sizing and Warming


maxBooleanClauses

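The maximum number of clauses in a boolean query. This is a global setting: when several cores are loaded, the value from the last core to be initialized applies.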

<maxBooleanClauses>1024</maxBooleanClauses>



enableLazyFieldLoading


<enableLazyFieldLoading>true</enableLazyFieldLoading>


useFilterForSortedQuery

Useful when results are not sorted by score.

<useFilterForSortedQuery>true</useFilterForSortedQuery>

queryResultWindowSize

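Caches a superset of the requested results: more document IDs are collected than the query asked for, up to the window size.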

<queryResultWindowSize>20</queryResultWindowSize>
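The arithmetic behind the window can be sketched as follows, assuming (this is my reading, not something stated in these notes) that the number of collected document IDs is rounded up to the next multiple of queryResultWindowSize. The function cached_window is hypothetical, and the sketch ignores queryResultMaxDocsCached.

```python
def cached_window(start, rows, window_size):
    """Number of leading document IDs collected for the query result cache.

    Assumes the requested upper bound (start + rows) is rounded up to the
    next multiple of window_size, with at least one full window collected.
    """
    requested = max(start + rows, 1)
    windows = -(-requested // window_size)   # ceiling division
    return windows * window_size


# A request for documents 10 through 19 with a window of 50
# would collect documents 0 through 49 under this assumption.
print(cached_window(start=10, rows=10, window_size=50))
```

With the value 20 shown above, a request for the first 10 results would cache 20 document IDs, so a follow-up request for results 10 through 19 can be served from the cache.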

queryResultMaxDocsCached


<queryResultMaxDocsCached>200</queryResultMaxDocsCached>


useColdSearcher

This setting controls whether search requests for which there is not a currently registered searcher should wait for a new searcher to warm up (false) or proceed immediately (true). When set to "false", requests will block until the searcher has warmed its caches.

<useColdSearcher>false</useColdSearcher>

maxWarmingSearchers

<maxWarmingSearchers>2</maxWarmingSearchers>

Query-Related Listeners

There are two event types, newSearcher and firstSearcher:

<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<!--
<lst><str name="q">solr</str><str name="sort">price asc</str></lst>
<lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
-->
</arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst><str name="q">static firstSearcher warming in solrconfig.xml</str></lst>
</arr>
</listener>



RequestDispatcher in SolrConfig
Topics in this section:
handleSelect Element
requestParsers Element
httpCaching Element



handleSelect Element
Kept for backward compatibility.

<requestDispatcher handleSelect="true" >
...
</requestDispatcher>


requestParsers Element

The <requestParsers> sub-element controls values related to parsing requests. This is an empty XML element that doesn't have any content, only attributes.

Its attributes:
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000"
formdataUploadLimitInKB="2048"
addHttpRequestToContext="false" />


httpCaching Element

<httpCaching never304="false"
lastModFrom="openTime"
etagSeed="Solr">
<cacheControl>max-age=30, public</cacheControl>
</httpCaching>


cacheControl Element




Update Request Processors

Anatomy and life cycle
Configuration
Update processors in SolrCloud
Using custom chains
Update Request Processor Factories



Anatomy and life cycle

Updates run through a default processor chain unless you configure your own.
Each processor is created by a processor factory, and the two must meet these requirements:
An update request processor need not be thread safe because it is used by one and only one request thread and destroyed once the request is complete.
The factory class can accept configuration parameters and maintain any state that may be required between requests. The factory class must be thread-safe.


Configuration

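Chains are either configured in solrconfig.xml and created when it is loaded, or assembled at run time from request parameters.

A custom chain should be modeled on the default one, because some processing steps are essential.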

The default update request processor chain

In order:
LogUpdateProcessorFactory - Tracks the commands processed during this request and logs them.
DistributedUpdateProcessorFactory - Responsible for distributing update requests to the right node, e.g. routing requests to the leader of the right shard and distributing updates from the leader to each replica. This processor is activated only in SolrCloud mode.
RunUpdateProcessorFactory - Executes the update using internal Solr APIs.



Custom update request processor chain

updateRequestProcessorChain
<updateRequestProcessorChain name="dedupe">
<processor class="solr.processor.SignatureUpdateProcessorFactory">
<bool name="enabled">true</bool>
<str name="signatureField">id</str>
<bool name="overwriteDupes">false</bool>
<str name="fields">name,features,cat</str>
<str name="signatureClass">solr.processor.Lookup3Signature</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

In a chain that does not include it, Solr will automatically insert DistributedUpdateProcessorFactory just prior to the RunUpdateProcessorFactory.



Configuring individual processors as top-level plugins

updateProcessor

<updateProcessor class="solr.processor.SignatureUpdateProcessorFactory"
name="signature">
<bool name="enabled">true</bool>
<str name="signatureField">id</str>
<bool name="overwriteDupes">false</bool>
<str name="fields">name,features,cat</str>
<str name="signatureClass">solr.processor.Lookup3Signature</str>
</updateProcessor>
<updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory"
name="remove_blanks"/>


The named processors can then be referenced by name when defining a chain:
updateRequestProcessorChains and updateProcessors

<updateProcessorChain name="custom" processor="remove_blanks,signature">
<processor class="solr.RunUpdateProcessorFactory" />
</updateProcessorChain>

Update processors in SolrCloud

A critical SolrCloud functionality is the routing and distributing of requests – for update requests this routing is implemented by the DistributedUpdateRequestProcessor, and this processor is given a special status by Solr due to its important function.

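In a distributed update, the processors ahead of the distributed processor run on the node that receives the request. The distributed processor then routes the update to the leader of the target shard, which processes and logs it and distributes it to the replicas.

An example: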

For example, consider the "dedupe" chain which we saw in a section above. Assume that a 3 node SolrCloud cluster exists where node A hosts the leader of shard1, node B hosts the leader of shard2 and node C hosts the replica of shard2. Assume that an update request is sent to node A which forwards the update to node B (because the update belongs to shard2) which then distributes the update to its replica node C. Let's see what happens at each node:
Node A: Runs the update through the SignatureUpdateProcessor (which computes the signature and puts it in the "id" field), then LogUpdateProcessor and then DistributedUpdateProcessor. This processor determines that the update actually belongs to node B and is forwarded to node B. The update is not processed further. This is required because the next processor which is RunUpdateProcessor will execute the update against the local shard1 index which would lead to duplicate data on shard1 and shard2.
Node B: Receives the update and sees that it was forwarded by another node. The update is directly sent to DistributedUpdateProcessor because it has already been through the SignatureUpdateProcessor on node A and doing the same signature computation again would be redundant. The DistributedUpdateProcessor determines that the update indeed belongs to this node, distributes it to its replica on Node C and then forwards the update further in the chain to RunUpdateProcessor.
Node C: Receives the update and sees that it was distributed by its leader. The update is directly sent to DistributedUpdateProcessor which performs some consistency checks and forwards the update further in the chain to RunUpdateProcessor.
In summary:
All processors before DistributedUpdateProcessor are only run on the first node that receives an update request whether it be a forwarding node (e.g. node A in the above example) or a leader (e.g. node B). We call these pre-processors or just processors.
All processors after DistributedUpdateProcessor run only on the leader and the replica nodes. They are not executed on forwarding nodes. Such processors are called "post-processors".
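The summary rule can be modeled in a few lines of Python. This is only a toy model of which processors execute on each node for the "dedupe" example; the short processor names and the function processors_run are assumptions made for the illustration and do not correspond to Solr classes.

```python
# The "dedupe" chain with the distributed processor in its automatic position.
CHAIN = ["Signature", "Log", "Distributed", "Run"]


def processors_run(chain, first_receiver, hosts_shard):
    """Return the processors executed on one node for one update.

    first_receiver: True if this node received the request from the client.
    hosts_shard:    True if this node is the leader or a replica of the
                    shard the update belongs to.
    """
    split = chain.index("Distributed")
    pre, post = chain[:split], chain[split + 1:]
    ran = []
    if first_receiver:
        ran += pre                 # pre-processors run only on the first node
    ran.append("Distributed")      # every node passes through this step
    if hosts_shard:
        ran += post                # post-processors run on leader and replicas
    return ran


print("A:", processors_run(CHAIN, first_receiver=True, hosts_shard=False))
print("B:", processors_run(CHAIN, first_receiver=False, hosts_shard=True))
print("C:", processors_run(CHAIN, first_receiver=False, hosts_shard=True))
```

Node A stops after the distributed step, while nodes B and C skip the signature and log steps, which matches the walkthrough above.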



post-processors

<updateProcessorChain name="custom" processor="signature"
post-processor="remove_blanks">
<processor class="solr.RunUpdateProcessorFactory" />
</updateProcessorChain>



Using custom chains

update.chain request parameter

You can choose which update processor chain handles a request:

update.chain

curl "http://localhost:8983/solr/gettingstarted/update/json?update.chain=dedupe&commit=true" -H 'Content-type: application/json' -d '
[
{
"name" : "The Lightning Thief",
"features" : "This is just a test",
"cat" : ["book","hardcover"]
},
{
"name" : "The Lightning Thief",
"features" : "This is just a test",
"cat" : ["book","hardcover"]
}
]'
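The same request can be prepared from Python with only the standard library. The sketch below builds the request object without sending it; build_update_request is a hypothetical helper, and the host, port, and gettingstarted collection are taken from the curl example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(collection, docs, chain,
                         base="http://localhost:8983/solr"):
    """Build (but do not send) a JSON update request that selects a chain."""
    query = urlencode({"update.chain": chain, "commit": "true"})
    return Request(
        f"{base}/{collection}/update/json?{query}",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [
    {"name": "The Lightning Thief",
     "features": "This is just a test",
     "cat": ["book", "hardcover"]},
]
req = build_update_request("gettingstarted", docs, chain="dedupe")
print(req.full_url)
```

Passing req to urllib.request.urlopen would send it to a running Solr instance; that step is left out here so the sketch runs without a server.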



processor & post-processor request parameters

These two parameters build a processor chain dynamically, at request time.


Constructing a chain at request time


# Executing processors configured in solrconfig.xml as (pre)-processors
curl "http://localhost:8983/solr/gettingstarted/update/json?processor=remove_blanks,signature&commit=true" -H 'Content-type: application/json' -d '
[
{
"name" : "The Lightning Thief",
"features" : "This is just a test",
"cat" : ["book","hardcover"]
},
{
"name" : "The Lightning Thief",
"features" : "This is just a test",
"cat" : ["book","hardcover"]
}
]'
# Executing processors configured in solrconfig.xml as pre and post processors
curl "http://localhost:8983/solr/gettingstarted/update/json?processor=remove_blanks&post-processor=signature&commit=true" -H 'Content-type: application/json' -d '
[
{
"name" : "The Lightning Thief",
"features" : "This is just a test",
"cat" : ["book","hardcover"]
},
{
"name" : "The Lightning Thief",
"features" : "This is just a test",
"cat" : ["book","hardcover"]
}
]'


Configuring a custom chain as a default

There are two ways to make a custom chain the default:

This can be done by adding either "update.chain" or "processor" and "post-processor" as default parameter for a
given path which can be done either via InitParams in SolrConfig or by adding them in a "defaults" section which
is supported by all request handlers.


Examples:
InitParams

<initParams path="/update/**">
<lst name="defaults">
<str name="update.chain">add-unknown-fields-to-the-schema</str>
</lst>
</initParams>



defaults

<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="update.chain">add-unknown-fields-to-the-schema</str>
</lst>
</requestHandler>



Update Request Processor Factories

The following factories are available; see the official documentation for what each one does:

AddSchemaFieldsUpdateProcessorFactory
CloneFieldUpdateProcessorFactory
DefaultValueUpdateProcessorFactory
DocBasedVersionConstraintsProcessorFactory
DocExpirationUpdateProcessorFactory
IgnoreCommitOptimizeUpdateProcessorFactory
RegexpBoostProcessorFactory
SignatureUpdateProcessorFactory
StatelessScriptUpdateProcessorFactory
TimestampUpdateProcessorFactory
URLClassifyProcessorFactory
UUIDUpdateProcessorFactory


FieldMutatingUpdateProcessorFactory derived factories

ConcatFieldUpdateProcessorFactory
CountFieldValuesUpdateProcessorFactory
FieldLengthUpdateProcessorFactory
FirstFieldValueUpdateProcessorFactory
HTMLStripFieldUpdateProcessorFactory
IgnoreFieldUpdateProcessorFactory
LastFieldValueUpdateProcessorFactory
MaxFieldValueUpdateProcessorFactory
MinFieldValueUpdateProcessorFactory
ParseBooleanFieldUpdateProcessorFactory
ParseDateFieldUpdateProcessorFactory
ParseNumericFieldUpdateProcessorFactory derived classes:
    ParseDoubleFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Double values.
    ParseFloatFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Float values.
    ParseIntFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Integer values.
    ParseLongFieldUpdateProcessorFactory: Attempts to mutate selected fields that have only CharSequence-typed values into Long values.


PreAnalyzedUpdateProcessorFactory
RegexReplaceProcessorFactory
RemoveBlankFieldUpdateProcessorFactory
TrimFieldUpdateProcessorFactory
TruncateFieldUpdateProcessorFactory
UniqFieldsUpdateProcessorFactory


Update Processor factories that can be loaded as plugins

Several factories ship as plugins that you can load yourself:

LangDetectLanguageIdentifierUpdateProcessorFactory (is this the one from Google?)

TikaLanguageIdentifierUpdateProcessorFactory

UIMAUpdateRequestProcessorFactory

Update Processor factories you should not modify or remove

It is best not to modify or remove these Solr update processor factories casually.


Codec Factory

Defines the codec used when writing the index to disk. It is configured in solrconfig.xml; if none is defined, Solr uses the default.
A compressionMode option:
 BEST_SPEED (default) is optimized for search speed performance
 BEST_COMPRESSION is optimized for disk space usage


Example:
<codecFactory class="solr.SchemaCodecFactory">
<str name="compressionMode">BEST_COMPRESSION</str>
</codecFactory>

Solr Cores and solr.xml

In Solr, the term core is used to refer to a single index and associated transaction log and configuration files (including the solrconfig.xml and Schema files, among others). 

In standalone mode, solr.xml must reside in solr_home. In SolrCloud mode, solr.xml will be loaded from Zookeeper if it exists, with fallback to solr_home.

The recommended way is to dynamically create cores/collections using the APIs

The following sections describe these options in more detail.
Format of solr.xml: Details on how to define solr.xml, including the acceptable parameters for the solr.xml file
Defining core.properties: Details on placement of core.properties and available property options.
CoreAdmin API: Tools and commands for core administration using a REST API.
Config Sets: How to use configsets to avoid duplicating effort when defining a new core.


Format of solr.xml

This section will describe the default solr.xml file included with Solr and how to modify it for your needs. For details on how to configure core.properties, see the section  Defining core.properties .

Defining solr.xml
Solr.xml Parameters
The <solr> Element
The <solrcloud> element
The <logging> element
The <logging><watcher> element
The <shardHandlerFactory> element
Substituting JVM System Properties in solr.xml


Defining solr.xml
You can find solr.xml in your Solr Home directory or in Zookeeper. The default solr.xml file looks like this:
<solr>
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">${jetty.port:8983}</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
</solr>

Unless the -DzkHost or -DzkRun are specified at startup time, this section is ignored.

Solr.xml Parameters

The <solr> Element

The documentation describes a handful of attributes here.

The <solrcloud> element

This section is ignored unless the solr instance is started with either -DzkRun or -DzkHost

It holds the SolrCloud-mode parameters, including the access-control credential settings.

The <logging> element

The logging class and whether logging is enabled.

The <logging><watcher> element

Configuration for the log watcher.

The <shardHandlerFactory> element

Defines the shard handler:
Custom shard handlers can be defined in solr.xml if you wish to create a custom shard handler.
<shardHandlerFactory name="ShardHandlerFactory" class="qualified.class.name">
Since this is a custom shard handler, sub-elements are specific to the implementation.



Substituting JVM System Properties in solr.xml

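solr.xml supports JVM system property substitution using the ${propertyname[:option default value]} syntax, which allows a default value.


A JVM property set at run time overrides the default: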
<solr>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
</solr>


Defining core.properties

core.properties is a standard Java properties file. Example:
name=my_core_name

Placement of core.properties

core.properties lives in the core's own directory under solr_home.

Defining core.properties Files

name

The name of the SolrCore. You'll use this name to reference the SolrCore when running
commands with the CoreAdminHandler

config

The configuration file name for a given core. The default is solrconfig.xml.

schema

The schema file name for a given core. The default is schema.xml but please note that if
you are using a "managed schema" (the default behavior) then any value for this property
which does not match the effective managedSchemaResourceName will be read once,
backed up, and converted for managed schema use.


dataDir

The core's data directory (where indexes are stored) as either an absolute pathname, or a
path relative to the value of instanceDir. This is data by default.

configSet

The name of a defined configset, if desired, to use to configure the core 

properties

The name of the properties file for this core. The value can be an absolute pathname or a
path relative to the value of instanceDir

transient

If true, the core can be unloaded if Solr reaches the transientCacheSize. The default
if not specified is false. Cores are unloaded in order of least recently used first. 
Setting to true is not recommended in SolrCloud mode.

loadOnStartup

If true, the default if it is not specified, the core will be loaded when Solr starts. Setting to false
is not recommended in SolrCloud mode.

coreNodeName

Used only in SolrCloud, this is a unique identifier for the node hosting this replica. By
default a coreNodeName is generated automatically, but setting this attribute explicitly
allows you to manually assign a new core to replace an existing replica. For example:
when replacing a machine that has had a hardware failure by restoring from backups on a
new machine with a new hostname or port.


ulogDir

The absolute or relative directory for the update log for this core (SolrCloud)

shard

The shard to assign this core to (SolrCloud)

collection

The name of the collection this core is part of (SolrCloud).

roles

Future param for SolrCloud or a way for users to mark nodes for their own use
(I don't fully understand this one.)

Additional "user defined" properties may be specified for use as variables. For more information on how to define local properties, see the section Substituting Properties in Solr Config Files.
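To show how user-defined entries sit next to the standard ones, here is a small Python sketch that reads a core.properties file and separates the two groups. It is a simplification that handles only plain key=value lines, not the full Java properties format, and the helper names and the KNOWN_KEYS set (copied from the property list above) are assumptions made for the example.

```python
# Property names described in the list above.
KNOWN_KEYS = {
    "name", "config", "schema", "dataDir", "configSet", "properties",
    "transient", "loadOnStartup", "coreNodeName", "ulogDir", "shard",
    "collection", "roles",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def user_defined(props):
    """Return only the entries that are not standard core properties."""
    return {k: v for k, v in props.items() if k not in KNOWN_KEYS}


sample = """#core.properties
name=collection2
my.custom.prop=edismax
"""
props = parse_properties(sample)
print(user_defined(props))
```

In this sample, my.custom.prop is the entry that would be available as ${my.custom.prop} in the XML configuration files.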

(These are the user-defined properties covered earlier in these notes.)

CoreAdmin API (for standalone mode)

SolrCloud users should not typically use the CoreAdmin API directly.

Each Solr node has a single CoreAdminHandler that manages all the cores running in that node and is accessible at the /solr/admin/cores path.

CoreAdmin actions are executed via HTTP requests that specify an "action" request parameter.

All action names are uppercase, and are defined in depth in the sections below:

STATUS
CREATE
RELOAD
RENAME
SWAP
UNLOAD
MERGEINDEXES
SPLIT
REQUESTSTATUS


STATUS

The STATUS action returns the status of all running Solr cores, or status for only the named core.

http://localhost:8983/solr/admin/cores?action=STATUS&core=core0

Input

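core: the name of the core.

indexInfo: whether to return index information. It is returned by default; with a large number of cores, set it to false to speed up the response.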

CREATE

The CREATE action creates a new core and registers it.

If a Solr core with the given name already exists, it will continue to handle requests while the new core is initializing. When the new core is ready, it will take new requests and the old core will be unloaded.

In other words, creating a core whose name already exists replaces the old core.

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path/to/dir&config=config_file_name.xml&dataDir=data
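CoreAdmin URLs like the ones above can be assembled programmatically. The following Python sketch builds them with the standard library; core_admin_url is a hypothetical helper, and the base URL is the one used in the examples in these notes.

```python
from urllib.parse import urlencode


def core_admin_url(action, base="http://localhost:8983/solr", **params):
    """Build a CoreAdmin URL; action names are sent in uppercase."""
    query = urlencode({"action": action.upper(), **params})
    return f"{base}/admin/cores?{query}"


print(core_admin_url("status", core="core0"))
print(core_admin_url(
    "create",
    name="coreX",
    instanceDir="path/to/dir",
    config="config_file_name.xml",
    dataDir="data",
))
```

Note that urlencode percent-encodes the slashes in instanceDir (path%2Fto%2Fdir), which is an equivalent encoding of the same parameter value.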


Input

name