Solr basic operations
Start:
.\solr.cmd start
Stop:
.\solr.cmd stop -all
Start in SolrCloud mode
bin/solr -e cloud
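The script asks how many nodes to start; for testing, 1 is enough.
Appendix
Introduction
Apache Solr is a search server from the US-based Apache Software Foundation, built on Lucene (a full-text search engine library). It supports faceted search, vertical search, highlighting of search results, and more.
Reference: http://www.cnnvd.org.cn/web/xxk/ldxqById.tag?CNNVD=CNNVD-201908-031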
The entire configuration itself can be passed as a request parameter using the dataConfig parameter rather than using a file. When configuration errors are encountered, the error message is returned in XML format. Due to security concerns, this only works if you start Solr with -Denable.dih.dataConfigParam=true.
Source:
https://github.com/apache/lucene-solr/blob/325824cd391c8e71f36f17d687f52344e50e9715/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
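A minimal Python sketch of passing the whole configuration through the dataConfig request parameter instead of a file. It only builds the request URL. The core name `db`, the host and port, and the tiny inline configuration (one JDBC entity, no script) are assumptions for illustration. As quoted above, Solr only accepts the parameter when started with -Denable.dih.dataConfigParam=true.

```python
from urllib.parse import urlencode

# Hypothetical inline DIH configuration: one JDBC entity, nothing else.
DATA_CONFIG = """<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:mem:demo" user="sa"/>
  <document>
    <entity name="item" query="select id, name from item"/>
  </document>
</dataConfig>"""


def dih_url(base, core, data_config, command="full-import"):
    """Build a /dataimport URL that carries the whole config as one query parameter."""
    params = {"command": command, "dataConfig": data_config}
    # urlencode percent-encodes the XML (<, >, quotes, spaces, newlines)
    return f"{base}/solr/{core}/dataimport?{urlencode(params)}"


url = dih_url("http://localhost:8983", "db", DATA_CONFIG)
```

The same XML could also be sent in a POST body. A GET URL is used here only because it is the easiest form to inspect.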
The DataImportHandler contains several built-in transformers. You can also write your own custom transformers, as described in the DIHCustomTransformer section of the Solr Wiki. The ScriptTransformer (described below) offers an alternative method for writing your own transformers.
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html
The solrconfig.xml file is the configuration file with the most parameters affecting Solr itself.
Reference: https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html
schema.xml is usually the first file you configure when setting up a new Solr installation.
Reference: http://www.solrtutorial.com/schema-xml.html
Explanations of some concepts:
DataSource:
A data source specifies the origin of data and its type.
Reference:
https://github.com/apache/lucene-solr/blob/325824cd391c8e71f36f17d687f52344e50e9715/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
The types of data sources available are described below.
- ContentStreamDataSource
- FieldReaderDataSource
- FileDataSource
- JdbcDataSource
- URLDataSource
Entity Processors
Entity processors extract data, transform it, and add it to a Solr index.
- XPathEntityProcessor (This processor is used when indexing XML formatted data. The data source is typically URLDataSource or FileDataSource.)
- MailEntityProcessor (The MailEntityProcessor uses the Java Mail API to index email messages using the IMAP protocol. It works by connecting to a specified mailbox using a username and password, fetching the email headers for each message, and then fetching the full email contents to construct a document (one document for each mail message).)
- TikaEntityProcessor (The TikaEntityProcessor uses Apache Tika to process incoming documents. This is similar to Uploading Data with Solr Cell using Apache Tika, but using DataImportHandler options instead.)
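XPathEntityProcessor produces one row per node matched by a forEach path and fills fields from XPath expressions. The Python analogue below illustrates that idea with the standard library's ElementTree. It is not DIH code, and the RSS-like sample document and field names are made up.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<rss><channel>
  <item><title>First</title><link>http://example.com/1</link></item>
  <item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""


def xpath_rows(xml_text, for_each, fields):
    """Yield one row (dict) per element matched by for_each.

    fields maps a Solr field name to an ElementTree path relative to that element,
    similar in spirit to <field column="..." xpath="..."/> in a DIH entity.
    """
    root = ET.fromstring(xml_text)
    for node in root.iterfind(for_each):
        yield {column: node.findtext(path) for column, path in fields.items()}


rows = list(xpath_rows(SAMPLE, "./channel/item", {"title": "title", "link": "link"}))
```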
Transformers
Transformers manipulate the fields in a document returned by an entity. A transformer can create new fields or modify existing ones. You must tell the entity which transformers your import operation will use by adding a transformer attribute, containing a comma-separated list, to the <entity> element.
The most important one here is the following:
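The following is a small Python model of the comma-separated transformer attribute: the names are split and each row passes through the named transformers one after another. The registry and the two toy transformers are hypothetical stand-ins, not Solr's built-in transformer classes. Applying them in list order is an assumption of this model.

```python
def trim_name(row):
    # toy transformer: strip whitespace from the "name" field
    row["name"] = row["name"].strip()
    return row


def upper_name(row):
    # toy transformer: add an upper-cased copy of "name"
    row["name_upper"] = row["name"].upper()
    return row


# Hypothetical registry mapping attribute names to functions.
REGISTRY = {"TrimName": trim_name, "UpperName": upper_name}


def apply_transformers(row, transformer_attr):
    """Run a row through every transformer named in a comma-separated attribute, in list order."""
    for name in (part.strip() for part in transformer_attr.split(",")):
        if name:
            row = REGISTRY[name](row)
    return row


result = apply_transformers({"id": "1", "name": "  solr "}, "TrimName, UpperName")
```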
The ScriptTransformer
The script transformer allows arbitrary transformer functions to be written in any scripting language supported by Java, such as JavaScript, JRuby, Jython, Groovy, or BeanShell. JavaScript is integrated into Java by default; you'll need to integrate other languages yourself.
Each function you write must accept a row variable (which corresponds to a Java Map<String,Object>, thus permitting get, put and remove operations), so it can modify the value of an existing field or add new fields. When it is done, the function returns the row object.
The script is inserted at the top level of the DIH configuration file, and its transformer function is called once for each row.
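A real ScriptTransformer function is script code embedded in the DIH configuration. The Python function below only mirrors its contract: it takes the row map, uses get/put/remove on it, returns the row, and is called once per row. The field names `tags_csv` and `tags` are hypothetical.

```python
def split_tags(row):
    """Python stand-in for a ScriptTransformer function (not actual DIH script code)."""
    raw = row.get("tags_csv")                  # get an existing field
    if raw is not None:
        # put a new field; a list value can feed a multi-valued Solr field
        row["tags"] = [t.strip() for t in raw.split(",") if t.strip()]
        del row["tags_csv"]                    # remove the helper field
    return row                                 # the function returns the row


rows = [{"id": "1", "tags_csv": "search, lucene"}, {"id": "2"}]
transformed = [split_tags(r) for r in rows]    # called once for each row
```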
What is a row?
A row in DataImportHandler is a Map (Map<String, Object>). In the map, the key is the name of the field and the value can be anything that is a valid Solr type. The value can also be a Collection of valid Solr types (this may get mapped to a multi-valued field). If the DataSource is an RDBMS, a single query cannot emit a multi-valued field, but one can be created by joining an entity with another: if the sub-entity returns multiple rows for one row of the parent entity, those values can go into a multi-valued field. If the data source is XML, a multi-valued field can be returned directly.
Reference:
https://cwiki.apache.org/confluence/display/solr/DataImportHandler#DataImportHandler-HttpDataSource#
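The sub-entity case above can be modelled in plain Python. Each parent row collects every value its sub-entity returns into a list, and that list is what ends up in a multi-valued field. The table contents and the field name `cat` are invented for this sketch. In DIH itself the join is written as a nested <entity> whose query references the parent's key.

```python
from collections import defaultdict

# Hypothetical query results: one parent row per item, several child rows per item.
ITEMS = [{"id": "1", "name": "iPod"}, {"id": "2", "name": "Monitor"}]
ITEM_CATEGORIES = [("1", "electronics"), ("1", "music"), ("2", "electronics")]


def join_multivalued(parents, children, field):
    """Attach all child values of each parent as one list-valued field."""
    by_parent = defaultdict(list)
    for parent_id, value in children:
        by_parent[parent_id].append(value)
    rows = []
    for parent in parents:
        row = dict(parent)
        row[field] = by_parent.get(parent["id"], [])
        rows.append(row)
    return rows


docs = join_multivalued(ITEMS, ITEM_CATEGORIES, "cat")
```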
Collections and Cores
The bin/solr script can also help you create new collections (in SolrCloud mode) or cores (in standalone mode), or delete collections.
Authentication
The bin/solr script allows enabling or disabling Basic Authentication, allowing you to configure authentication from the command line.
Currently, this script only enables Basic Authentication, and is only available when using SolrCloud mode.
Reference:
https://lucene.apache.org/solr/guide/6_6/solr-control-script-reference.html
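Once Basic Authentication is enabled, every HTTP call to Solr has to carry an Authorization header. The sketch below builds such a request with the Python standard library only. The user name, password and URL are placeholders, and the request is built but not sent.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def solr_request(url, user, password):
    """Build (but do not send) a urllib Request with Basic auth attached."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


# Placeholder credentials; urllib.request.urlopen(req) would send it to a running node.
req = solr_request("http://localhost:8983/solr/admin/info/system?wt=json", "solr", "secret")
```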
Using the Solr Admin UI
https://lucene.apache.org/solr/guide/6_6/using-the-solr-administration-user-interface.html
Solr Admin UI overview
Solr features a Web interface that makes it easy for Solr administrators and programmers to view Solr configuration details, run queries and analyze document fields in order to fine-tune a Solr configuration and access online documentation and other help.
How to configure dataimport in Solr
https://blog.csdn.net/u010367582/article/details/54095343
https://blog.csdn.net/ClementAD/article/details/47666043
https://blog.csdn.net/haiyanggeng/article/details/80562138
Tracking Solr bugs (DataImportHandler component)
https://issues.apache.org/jira/browse/SOLR-12985?jql=project%20%3D%20SOLR%20AND%20issuetype%20%3D%20Bug%20AND%20component%20%3D%20%22contrib%20-%20DataImportHandler%22
Calling Java from JavaScript
https://www.ibm.com/support/knowledgecenter/en/SSHS8R_8.0.0/com.ibm.worklight.dev.doc/devref/t_calling_java_code_from_a_javas.html
How to create a core/collection in Solr
Reference:
https://lucene.apache.org/solr/guide/6_6/running-solr.html#RunningSolr-CreateaCore
Use this command to see the help text for creating a core:
bin/solr create -help
Create a core or collection depending on whether Solr is running in standalone (core) or SolrCloud
mode (collection).
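That is, the command creates a core when Solr runs in standalone mode and a collection when it runs in SolrCloud mode.
After creating a core from the command line, it turned out that a core can also be created through an HTTP request (the CoreAdmin API):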
http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core
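The choice between a core and a collection can be sketched in Python as below. The function uses the Collections API in SolrCloud mode and the CoreAdmin API otherwise; its standalone output is exactly the URL above. It assumes the node's mode was read from /solr/admin/info/system, whose "mode" value ("solrcloud" or "std") is an assumption about that response. `numShards=1` is an arbitrary choice for the demo.

```python
from urllib.parse import urlencode


def create_url(base, name, system_info):
    """Build a create URL: a collection in SolrCloud mode, a core in standalone mode.

    system_info is the parsed JSON of /solr/admin/info/system; reading its
    "mode" key is an assumption about that response's layout.
    """
    if system_info.get("mode") == "solrcloud":
        # SolrCloud: Collections API
        params = {"action": "CREATE", "name": name, "numShards": 1}
        return f"{base}/solr/admin/collections?{urlencode(params)}"
    # standalone: CoreAdmin API (the instanceDir must already hold a configuration)
    params = {"action": "CREATE", "name": name, "instanceDir": name}
    return f"{base}/solr/admin/cores?{urlencode(params)}"
```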
For the CVE-2019-0193 vulnerability, a newly created core does not have the DataImport feature, so use the bundled example directly:
bin/solr -e dih
Delete a core:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=new_core&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
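The UNLOAD request above can be generated in Python from the core name. With `delete_files=True` it adds the three delete flags used above, so the index, data directory and instance directory are removed as well. The function name and the flag grouping are illustrative choices.

```python
from urllib.parse import urlencode


def unload_url(base, core, delete_files=True):
    """CoreAdmin UNLOAD URL; optionally also delete the index, data dir and instance dir."""
    params = {"action": "UNLOAD", "core": core}
    if delete_files:
        params.update(deleteIndex="true", deleteDataDir="true", deleteInstanceDir="true")
    return f"{base}/solr/admin/cores?{urlencode(params)}"
```

Passing `delete_files=False` only unloads the core and leaves its files on disk.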
Running Solr in SolrCloud mode
Reference:
https://lucene.apache.org/solr/guide/7_4/solr-tutorial.html
Version 7.4 is used as the demo environment:
https://archive.apache.org/dist/lucene/solr/7.4.0/solr-7.4.0.zip
This creates two nodes, which by default listen on ports 8983 and 7574 respectively.
Each node has its own logs:
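A quick way to confirm that both nodes are up is a plain TCP connection check against the two default ports. The sketch below uses only Python's socket module. It shows that something is listening on each port, not that the process is a healthy Solr node.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # default ports of the two tutorial nodes
    for port in (8983, 7574):
        print(port, "up" if is_listening("localhost", port) else "down")
```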