Apache Solr Background Knowledge

caiqiiqi

Article link: https://blog.csdn.net/caiqiiqi/article/details/100123705

Basic Solr operations

Start (on Windows, from Solr's bin directory):

.\solr.cmd start

Stop all running Solr instances:

.\solr.cmd stop -all

Start the SolrCloud example:

bin/solr -e cloud

The script asks how many nodes to start; for testing, 1 is enough.

Appendix

Introduction

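Apache Solr is a search server from the Apache Software Foundation (USA), built on Lucene (a full-text search engine). It supports faceted search, vertical search, highlighting of search results, and more.
Reference: http://www.cnnvd.org.cn/web/xxk/ldxqById.tag?CNNVD=CNNVD-201908-031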

The entire configuration itself can be passed as a request parameter using the dataConfig parameter rather than using a file. When configuration errors are encountered, the error message is returned in XML format. Due to security concerns, this only works if you start Solr with -Denable.dih.dataConfigParam=true.

Source:
https://github.com/apache/lucene-solr/blob/325824cd391c8e71f36f17d687f52344e50e9715/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
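To make the dataConfig parameter concrete, here is a small Python sketch that only builds such a request URL; nothing is sent to a server. The core name db, the /dataimport handler path and the inline configuration are assumptions for illustration, and Solr would only honour the parameter when started with -Denable.dih.dataConfigParam=true.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumptions: default host/port and a core named "db" that has a
# DataImportHandler registered at /dataimport.
BASE_URL = "http://localhost:8983/solr"
CORE = "db"

# A tiny, hypothetical inline DIH configuration. Solr only accepts it when
# started with -Denable.dih.dataConfigParam=true.
DATA_CONFIG = (
    "<dataConfig>"
    '<dataSource type="URLDataSource"/>'
    "<document>"
    '<entity name="item" processor="XPathEntityProcessor" '
    'url="http://example.com/feed.xml" forEach="/rss/channel/item">'
    '<field column="title" xpath="/rss/channel/item/title"/>'
    "</entity>"
    "</document>"
    "</dataConfig>"
)


def dih_request_url(core: str, data_config: str, command: str = "full-import") -> str:
    """Build (but do not send) a DIH request URL whose whole configuration
    travels in the dataConfig parameter instead of a config file."""
    query = urlencode({"command": command, "dataConfig": data_config})
    return f"{BASE_URL}/{core}/dataimport?{query}"


url = dih_request_url(CORE, DATA_CONFIG)
print(url)

# Round trip: decoding the query string gives back exactly the XML we put in.
params = parse_qs(urlsplit(url).query)
print(params["dataConfig"][0] == DATA_CONFIG)
```

For a long configuration, sending the same parameters as a form-encoded POST body is usually more practical than a very long GET URL.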

The DataImportHandler contains several built-in transformers. You can also write your own custom transformers, as described in the DIHCustomTransformer section of the Solr Wiki. The ScriptTransformer (described below) offers an alternative method for writing your own transformers.

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html

The solrconfig.xml file is the configuration file with the most parameters affecting Solr itself.

Reference: https://lucene.apache.org/solr/guide/6_6/configuring-solrconfig-xml.html

schema.xml is usually the first file you configure when setting up a new Solr installation.

Reference: http://www.solrtutorial.com/schema-xml.html

Explanations of some concepts:

DataSource:

A data source specifies the origin of data and its type.

Reference:
https://github.com/apache/lucene-solr/blob/325824cd391c8e71f36f17d687f52344e50e9715/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc

The available data source types are:

ContentStreamDataSource
FieldReaderDataSource
FileDataSource
JdbcDataSource
URLDataSource

Entity Processors

Entity processors extract data, transform it, and add it to a Solr index.

XPathEntityProcessor: used when indexing XML-formatted data. The data source is typically URLDataSource or FileDataSource.
MailEntityProcessor: uses the Java Mail API to index email messages over the IMAP protocol. It connects to a specified mailbox with a username and password, fetches the email headers for each message, and then fetches the full email contents to construct a document (one document per mail message).
TikaEntityProcessor: uses Apache Tika to process incoming documents. This is similar to Uploading Data with Solr Cell using Apache Tika, but uses DataImportHandler options instead.
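XPathEntityProcessor turns each node matched by forEach into one row and fills the row's columns from per-field xpath expressions. The Python sketch below imitates that idea with xml.etree.ElementTree on a made-up RSS snippet. It is a conceptual analogue, not Solr's implementation, and it uses paths relative to each record where a DIH config would use absolute xpaths such as /rss/channel/item/title.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up feed used only for illustration.
RSS = """<rss><channel>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def xpath_rows(xml_text: str, for_each: str, columns: Dict[str, str]) -> List[Dict[str, str]]:
    """Emit one row (dict) per node matched by for_each; each column is read
    from a path relative to that node. Missing or empty nodes are skipped."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall(for_each):
        row = {}
        for column, path in columns.items():
            node = record.find(path)
            if node is not None and node.text is not None:
                row[column] = node.text
        rows.append(row)
    return rows


rows = xpath_rows(RSS, "./channel/item", {"title": "title", "link": "link"})
for row in rows:
    print(row)
```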

Transformers

Transformers manipulate the fields in a document returned by an entity. A transformer can create new fields or modify existing ones. You must tell the entity which transformers your import operation will use by adding a transformer attribute, containing a comma-separated list, to the <entity> element.
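A rough mental model of the transformer attribute: the entity names its transformers, and each row passes through them in the listed order, each one free to modify existing fields or add new ones. The Python sketch below models this with plain functions; trimTitle/trim_title and addSource/add_source are hypothetical transformers invented for the example, not built-in Solr ones.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Transformer = Callable[[Row], Row]


def trim_title(row: Row) -> Row:
    """Hypothetical transformer: modify an existing field."""
    if isinstance(row.get("title"), str):
        row["title"] = row["title"].strip()
    return row


def add_source(row: Row) -> Row:
    """Hypothetical transformer: create a new field."""
    row["source"] = "rss"
    return row


# Name -> implementation, standing in for the names used in the attribute.
REGISTRY: Dict[str, Transformer] = {"trimTitle": trim_title, "addSource": add_source}


def apply_transformers(row: Row, transformer_attr: str) -> Row:
    """Run the comma-separated transformers on one row, left to right."""
    for name in transformer_attr.split(","):
        name = name.strip()
        if name:
            row = REGISTRY[name](row)
    return row


row = apply_transformers({"title": "  Hello  "}, "trimTitle,addSource")
print(row)  # {'title': 'Hello', 'source': 'rss'}
```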

The most important one here is the following:

The ScriptTransformer

The script transformer allows arbitrary transformer functions to be written in any scripting language supported by Java, such as Javascript, JRuby, Jython, Groovy, or BeanShell. Javascript is integrated into Java by default; you’ll need to integrate other languages yourself.
Each function you write must accept a row variable (which corresponds to a Java Map<String,Object>, thus permitting get,put,remove operations). Thus you can modify the value of an existing field or add new fields. The return value of the function is the returned object.
The script is inserted into the DIH configuration file at the top level and is called once for each row.

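The sketch below shows what such a configuration can look like and checks its structure with Python's xml.etree.ElementTree: the <script> element sits at the top level of <dataConfig>, and an entity opts in with transformer="script:<function name>". The entity name, pk, query and the message field are placeholders. The Python f1 at the end only mirrors what the JavaScript function does to a row, modeled as a dict standing in for Map<String,Object>.

```python
import xml.etree.ElementTree as ET

# Hypothetical DIH config: the entity, query and field names are placeholders.
DATA_CONFIG = """<dataConfig>
  <script><![CDATA[
    function f1(row) {
      row.put('message', 'Hello World!');
      return row;
    }
  ]]></script>
  <document>
    <entity name="e" pk="id" transformer="script:f1" query="select * from X"/>
  </document>
</dataConfig>"""

root = ET.fromstring(DATA_CONFIG)
script_text = root.find("script").text  # CDATA content comes back as plain text
entity = root.find("./document/entity")

# transformer="script:<name>" asks DIH to call the script function <name> per row.
kind, func_name = entity.get("transformer").split(":", 1)
print(kind, func_name, f"function {func_name}(" in script_text)


def f1(row: dict) -> dict:
    """Python mirror of the JavaScript f1: add a field, return the row."""
    row["message"] = "Hello World!"
    return row


print(f1({"id": 1}))
```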
What is a row?

A row in DataImportHandler is a Map (Map<String, Object>). In the map, the key is the name of the field and the value can be anything that is a valid Solr type. The value can also be a Collection of valid Solr types (this may get mapped to a multi-valued field). If the DataSource is an RDBMS, a query cannot emit a multivalued field, but it is possible to create one by joining an entity with another, i.e. if the sub-entity returns multiple rows for one row of the parent entity, they can go into a multivalued field. If the data source is XML, it is possible to return a multivalued field.
Reference:
https://cwiki.apache.org/confluence/display/solr/DataImportHandler#DataImportHandler-HttpDataSource#
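To illustrate the join case, here is a Python sketch in which rows are dicts and a sub-entity that returns several rows for one parent row ends up as a list value, the shape DIH would map to a multi-valued field. The tables, keys and field names are invented, and the join is done in memory for illustration rather than by DIH itself.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

# Invented parent-entity rows, e.g. from "select id, name from item".
parents: List[Row] = [{"id": 1, "name": "laptop"}, {"id": 2, "name": "phone"}]
# Invented sub-entity rows, e.g. from "select item_id, tag from item_tag".
children: List[Row] = [
    {"item_id": 1, "tag": "electronics"},
    {"item_id": 1, "tag": "computer"},
    {"item_id": 2, "tag": "electronics"},
]


def join_multivalued(parents: List[Row], children: List[Row], fk: str, column: str, target: str) -> List[Row]:
    """Attach every child value for a parent as a list under `target`,
    the shape that maps to a multi-valued Solr field."""
    values_by_parent: Dict[Any, List[Any]] = defaultdict(list)
    for child in children:
        values_by_parent[child[fk]].append(child[column])
    docs = []
    for parent in parents:
        row = dict(parent)  # copy so the input rows stay unchanged
        values = values_by_parent.get(parent["id"], [])
        if values:
            row[target] = values
        docs.append(row)
    return docs


docs = join_multivalued(parents, children, fk="item_id", column="tag", target="tags")
for doc in docs:
    print(doc)
```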

Collections and Cores

The bin/solr script can also help you create new collections (in SolrCloud mode) or cores (in standalone mode), or delete collections.

Authentication

The bin/solr script allows enabling or disabling Basic Authentication, allowing you to configure authentication from the command line.

Currently, this script only enables Basic Authentication, and is only available when using SolrCloud mode.

Reference:
https://lucene.apache.org/solr/guide/6_6/solr-control-script-reference.html
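Once Basic Authentication is enabled, HTTP clients (including plain calls such as the Core Admin URLs further down) must send an Authorization header. This Python sketch shows how that header is formed: "Basic " followed by base64("user:password"). The credentials are hypothetical. Basic Authentication only encodes the credentials, it does not encrypt them, so it is normally paired with HTTPS.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Return the HTTP header a client sends for Basic Authentication:
    'Basic ' followed by base64("user:password")."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
headers = basic_auth_header("solr", "SolrRocks")
print(headers)  # {'Authorization': 'Basic c29scjpTb2xyUm9ja3M='}
```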

Using the Solr Admin UI

https://lucene.apache.org/solr/guide/6_6/using-the-solr-administration-user-interface.html

Solr Admin UI overview

Solr features a Web interface that makes it easy for Solr administrators and programmers to view Solr configuration details, run queries and analyze document fields in order to fine-tune a Solr configuration and access online documentation and other help.

How to configure dataimport in Solr

https://blog.csdn.net/u010367582/article/details/54095343
https://blog.csdn.net/ClementAD/article/details/47666043
https://blog.csdn.net/haiyanggeng/article/details/80562138

Tracking Solr bugs

https://issues.apache.org/jira/browse/SOLR-12985?jql=project%20%3D%20SOLR%20AND%20issuetype%20%3D%20Bug%20AND%20component%20%3D%20%22contrib%20-%20DataImportHandler%22

Calling Java from JavaScript

https://www.ibm.com/support/knowledgecenter/en/SSHS8R_8.0.0/com.ibm.worklight.dev.doc/devref/t_calling_java_code_from_a_javas.html

How to create a core/collection in Solr

Reference:
https://lucene.apache.org/solr/guide/6_6/running-solr.html#RunningSolr-CreateaCore
To see the help text for creating a core, run:

bin/solr create -help

Create a core or collection depending on whether Solr is running in standalone (core) or SolrCloud mode (collection).

If you are creating a core, Solr is in standalone mode;
if you are creating a collection, Solr is in SolrCloud mode.

After creating a core from the command line, I found that a core can also be created with an HTTP request:

http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core
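The URL above is a CoreAdmin API call. This Python sketch builds it and, for comparison, a Collections API CREATE call for SolrCloud mode, mirroring the core-versus-collection split from bin/solr create -help. It only builds URLs; the host and port are assumed defaults, the numShards/replicationFactor values are example choices, and a real cluster may need further parameters (for example a config set name). For a standalone core, the instanceDir typically must already contain a conf directory, which bin/solr create sets up for you.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumed default host and port


def create_url(name: str, cloud: bool, num_shards: int = 1, replication_factor: int = 1) -> str:
    """Build a create request: a collection in SolrCloud mode, a core in standalone mode."""
    if cloud:
        # Collections API (SolrCloud mode); shard/replica counts are example values.
        params = {
            "action": "CREATE",
            "name": name,
            "numShards": num_shards,
            "replicationFactor": replication_factor,
        }
        return f"{BASE_URL}/admin/collections?{urlencode(params)}"
    # CoreAdmin API (standalone mode), same parameters as the URL above.
    params = {"action": "CREATE", "name": name, "instanceDir": name}
    return f"{BASE_URL}/admin/cores?{urlencode(params)}"


print(create_url("new_core", cloud=False))
print(create_url("new_collection", cloud=True))
```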

For CVE-2019-0193, however, a core created this way does not have the DataImport feature, so I used the bundled example directly:

bin/solr -e dih

Delete a core:

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=new_core&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
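Deletion follows the same pattern. The Python sketch below reproduces the UNLOAD URL above and, for SolrCloud mode, builds a Collections API DELETE request instead. It sends nothing, and the host and port are assumed defaults. With all three delete flags set, the core's index, data directory and instance directory are removed from disk, so the operation cannot be undone.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumed default host and port


def delete_url(name: str, cloud: bool) -> str:
    """Build a delete request: DELETE a collection in SolrCloud mode,
    UNLOAD a core (and remove its files) in standalone mode."""
    if cloud:
        params = {"action": "DELETE", "name": name}
        return f"{BASE_URL}/admin/collections?{urlencode(params)}"
    params = {
        "action": "UNLOAD",
        "core": name,
        "deleteIndex": "true",        # remove the index
        "deleteDataDir": "true",      # remove the data directory
        "deleteInstanceDir": "true",  # remove the whole instance directory
    }
    return f"{BASE_URL}/admin/cores?{urlencode(params)}"


print(delete_url("new_core", cloud=False))
print(delete_url("new_collection", cloud=True))
```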

Running Solr in SolrCloud mode

Reference:
https://lucene.apache.org/solr/guide/7_4/solr-tutorial.html
Version 7.4 is used as the demo environment:
https://archive.apache.org/dist/lucene/solr/7.4.0/solr-7.4.0.zip

Two nodes are created; by default they listen on ports 8983 and 7574 respectively.
Each node has its own logs.
