Using Solarium with SOLR for Search – Advanced

culi3182

Original article: https://www.sitepoint.com/using-solarium-solr-search-advanced/

This is the fourth and final part of a series on using Apache’s SOLR search implementation along with Solarium, a PHP library to integrate it into your application as if it were native.

In the first three parts we installed and configured SOLR and Solarium and started building an example application for searching movies. We’ve also looked at faceted search.

We’re going to wrap up the series by looking at some more advanced features of SOLR, and how to use them with Solarium.

Highlighting Results with SOLR

The Highlighting component allows you to highlight the parts of a document which have matched your search. What gets shown depends on the field: a short field such as a title will most likely be shown in its entirety with the matched words highlighted, whereas a longer field, such as a synopsis or the body of an article, is returned as snippets containing the highlighted words, much like Google’s search results.

To set up highlighting, you first need to specify the fields to include. Then, you can set a prefix and corresponding postfix for the highlighted words or phrases. So for example, to make highlighted words and phrases bold:

$hl = $query->getHighlighting();
$hl->setFields(array('title', 'synopsis'));
$hl->setSimplePrefix('<strong>');
$hl->setSimplePostfix('</strong>');

Alternatively, to add a background color:

$hl = $query->getHighlighting();
$hl->setFields(array('title', 'synopsis'));
$hl->setSimplePrefix('<span style="background:yellow;">');
$hl->setSimplePostfix('</span>');

Or you can even use per-field settings:

$hl = $query->getHighlighting();
$hl->getField('title')->setSimplePrefix('<strong>')->setSimplePostfix('</strong>');
$hl->getField('synopsis')->setSimplePrefix('<span style="background:yellow;">')->setSimplePostfix('</span>');

Once you’ve configured the highlighting component in your search implementation, there’s a little more work involved in displaying the highlights in your search results view.

First, you need to extract the highlighted document from the highlighting component by ID:

// Get the highlighting results from the result set
$highlighting = $resultset->getHighlighting();
$highlightedDoc = $highlighting->getResult($document->id);

Now, you can access all the highlighted fields by iterating through them, as properties of the highlighted document:

if($highlightedDoc){
    foreach($highlightedDoc as $field => $highlight) {
        echo implode(' (...) ', $highlight) . '<br/>';
    }
}

Or, you can use getField():

if($highlightedDoc){
    $highlightedTitle = $highlightedDoc->getField('title');
}

Highlighted fields don’t simply return text, however. Instead, they return an array of “snippets” of text. If there are no matches for that particular field – for example, if your search matched on title but not synopsis – then that array will be empty.
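
Because a field's snippet array may be empty, display code usually needs a fallback. The following is a minimal sketch of one way to handle that in plain PHP. The `displayField()` helper is hypothetical (it is not part of Solarium), and plain arrays stand in for what `getField()` would return.

```php
<?php
// Hypothetical helper (not part of Solarium): choose what to display for a field.

/**
 * Join highlighted snippets, or fall back to the escaped stored value
 * when the field produced no snippets.
 */
function displayField(array $snippets, string $fallback, string $glue = ' ... '): string
{
    if (count($snippets) === 0) {
        // No match in this field: show the original text, escaped for HTML.
        return htmlspecialchars($fallback, ENT_QUOTES, 'UTF-8');
    }
    // Snippets already carry the highlight markup, so they are joined as-is.
    return implode($glue, $snippets);
}

// Plain arrays stand in for the result of getField().
echo displayField(['<strong>Star</strong> Trek', 'of the <strong>Star</strong>'], 'Star Trek'), "\n";
echo displayField([], 'Tom & Jerry'), "\n";
```

Escaping only the fallback is a design choice in this sketch: the snippets have to keep their highlight tags, while the stored value is treated as plain text.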

By default, the code above will return a maximum of one snippet per field. To change this behavior, you can use the setSnippets() method:

$hl = $query->getHighlighting();
$hl->setSnippets(5);
// . . . as before . . .

For example, suppose you search for the word “star”. One of the results has a synopsis that reads as follows:

This not to be missed movie theater event will feature one of the most memorable moments in TV history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation Season 3. Set in the 24th century, The Next Generation was created by Gene Roddenberry over 20 years after the original Star Trek series. The Next Generation became the longest running series of the Star Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of Both Worlds is the first opportunity to see The Best of Both Worlds, one of the greatest TV episodes of all time, as a gloriously remastered full-length feature in select movie theaters nationwide.

The highlighted document’s synopsis array will contain three items:

history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation
after the original Star Trek series. The Next Generation became the longest running series of the Star
Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of

One way to display multiple snippets is to implode them, for example:

implode(' ... ', $highlightedDoc->getField('synopsis'))

This results in the following:

history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation … after the original Star Trek series. The Next Generation became the longest running series of the Star … Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of

There are a number of other parameters you can use to modify the behavior of the highlighting component, which are explained in the Solarium documentation for the highlighting component.

Integrating Highlighting into Our Movie Search

Now that we’ve covered how to use highlighting, integrating it into our movie search application should be straightforward.

The first thing to do is modify app/controllers/HomeController.php by adding the following, just before we run the search:

// Get highlighting component, and apply settings
$hl = $query->getHighlighting();
$hl->setSnippets(5);
$hl->setFields(array('title', 'synopsis'));

$hl->setSimplePrefix('<span style="background:yellow;">');
$hl->setSimplePostfix('</span>');

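// Execute the query and return the result
$resultset = $this->client->select($query);

// Get the highlighting results, to be passed to the view as $highlighting
$highlighting = $resultset->getHighlighting();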

Then the search results – which you’ll remember are in app/views/home/index.blade.php – become:

@if (isset($resultset))    
<header>
    <p>Your search yielded <strong>{{ $resultset->getNumFound() }}</strong> results:</p>
    <hr />
</header>

@foreach ($resultset as $document)

    <?php $highlightedDoc = $highlighting->getResult($document->id); ?>

    <h3>{{ (count($highlightedDoc->getField('title'))) ? implode(' ... ', $highlightedDoc->getField('title')) : $document->title }}</h3>

    <dl>
        <dt>Year</dt>
        <dd>{{ $document->year }}</dd>

        @if (is_array($document->cast))
        <dt>Cast</dt>
        <dd>{{ implode(', ', $document->cast) }}</dd>              
        @endif

    </dl>

    {{ (count($highlightedDoc->getField('synopsis'))) ? implode(' ... ', $highlightedDoc->getField('synopsis')) : $document->synopsis }}

@endforeach
@endif

Notice how each search result essentially mixes and matches fields between the search result document, and the highlighted document – the latter is effectively a subset of the former. Depending on your schema, you may have all your fields available in the highlighted version.

Suggester – Adding Autocomplete

The Suggester component is used to suggest query terms based on incomplete query input. Essentially it examines the index on a given field and extracts search terms which match a certain pattern. You can then order those suggestions by frequency to increase the relevance of the search.

To set up the suggester, we need to configure it in the solrconfig.xml file. Open it up and place the following snippet of XML somewhere near the other <searchComponent> declarations:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
        <str name="field">title</str>  <!-- the indexed field to derive suggestions from -->
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

You’ll notice a number of references to “spellcheck”, but this is simply because the Suggester component reuses much of that functionality internally.

The important bit to notice is the <str name="field"> item, which tells the component that we want to use the title field on which to base our suggestions.

Restart SOLR, and you can now try running a suggest query through your web browser:

`http://localhost:8983/solr/suggest?q=ho`

(You may need to alter the port number, depending on how you set up SOLR)

The output should look a little like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <lst name="spellcheck">
        <lst name="suggestions">
            <lst name="ho">
                <int name="numFound">4</int>
                <int name="startOffset">0</int>
                <int name="endOffset">2</int>
                <arr name="suggestion">
                    <str>house</str>
                    <str>houses</str>
                    <str>horror</str>
                    <str>home</str>
                </arr>
            </lst>
            <str name="collation">house</str>
        </lst>
    </lst>
</response>

As you can see, SOLR has returned four possible matches for “ho”: house, houses, horror and home. Although home and horror come before house alphabetically, house appears first by virtue of being one of the most common terms in our index.
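
To make the shape of that response concrete, here is a small sketch that pulls the suggestions out of the raw XML with PHP's SimpleXML extension. It is only an illustration of the response structure: Solarium parses the response for you, and the `extractSuggestions()` function name and the trimmed sample document are assumptions made for this example.

```php
<?php
// Sample response, trimmed from the kind of output the /suggest handler returns.
$xml = <<<'XML'
<response>
    <lst name="spellcheck">
        <lst name="suggestions">
            <lst name="ho">
                <int name="numFound">4</int>
                <arr name="suggestion">
                    <str>house</str>
                    <str>houses</str>
                    <str>horror</str>
                    <str>home</str>
                </arr>
            </lst>
            <str name="collation">house</str>
        </lst>
    </lst>
</response>
XML;

/**
 * Hypothetical helper: return the suggested terms in the order SOLR listed them.
 */
function extractSuggestions(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // not well-formed XML
    }
    // Each <str> inside an <arr name="suggestion"> is one suggested term.
    $nodes = $doc->xpath('//lst[@name="suggestions"]/lst/arr[@name="suggestion"]/str');
    return array_map('strval', $nodes ?: []);
}

print_r(extractSuggestions($xml));
```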

Let’s use this component to create an autocomplete for our search box, which will suggest common search terms as the user types their query.

First, define the controller method that handles the /autocomplete route:

public function getAutocomplete()
{
    // get a suggester query instance from the controller's Solarium client
    $query = $this->client->createSuggester();
    $query->setQuery(Input::get('term'));
    $query->setDictionary('suggest');
    $query->setOnlyMorePopular(true);
    $query->setCount(10);
    $query->setCollate(true);

    // this executes the query and returns the result
    $resultset = $this->client->suggester($query);

    $suggestions = array();

    foreach ($resultset as $term => $termResult) {
        foreach ($termResult as $result) {
            $suggestions[] = $result;
        }
    }

    return Response::json($suggestions);
}
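
The nested loop above flattens suggestions that are grouped per query term. As a hedged sketch of the same idea in isolation, the hypothetical `flattenSuggestions()` helper below works on plain arrays (standing in for the suggester result set), removes duplicates, and caps the list before it is encoded as JSON for the autocomplete widget.

```php
<?php
/**
 * Hypothetical helper: flatten term => suggestions groups into one list.
 *
 * @param array $byTerm e.g. ['ho' => ['house', 'home']]
 * @param int   $limit  maximum number of suggestions to return
 */
function flattenSuggestions(array $byTerm, int $limit = 10): array
{
    $flat = [];
    foreach ($byTerm as $suggestions) {
        foreach ($suggestions as $suggestion) {
            $flat[] = (string) $suggestion;
        }
    }
    // array_unique keeps the first occurrence; array_values reindexes so that
    // json_encode emits a JSON list rather than an object.
    return array_slice(array_values(array_unique($flat)), 0, $limit);
}

// Sample data: a two-word query produces one group per term.
$sample = ['ho' => ['house', 'houses', 'horror', 'home'], 'st' => ['star', 'house']];
echo json_encode(flattenSuggestions($sample, 5)), "\n";
```

De-duplicating is optional, but it avoids showing the same term twice when a multi-word query produces overlapping suggestions.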

Include jQuery UI (and jQuery itself) in your layout:

<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
<script src="//code.jquery.com/ui/1.10.4/jquery-ui.min.js"></script>

Include a jQuery UI theme:

<link rel="stylesheet" type="text/css" href="//code.jquery.com/ui/1.10.4/themes/redmond/jquery-ui.css">

And finally, add some JS to initialize the autocomplete:

$(function () {
    $('input[name="q"]').autocomplete({
        source: '/autocomplete',
        minLength: 2
    });
});

That’s all there is to it – try it out by running a few searches.

Array-based Configuration

If you prefer, you can use an array to set up your query – for example:

$select = array(
  'query'         => Input::get('q'),
  'query_fields'  => array('title', 'cast', 'synopsis'),
  'start'         => 0,
  'rows'          => 100,
  'fields'        => array('*', 'id', 'title', 'synopsis', 'cast', 'score'),      
  'sort'          => array('year' => 'asc'),      
  'filterquery' => array(
      'year' => array(
          'query' => 'year:[1990 TO 1990]'
      ),
  ),
  'component' => array(
    'facetset' => array(
      'facet' => array(        
        array('type' => 'field', 'key' => 'rating', 'field' => 'rating'),
      )
    ),
  ),
);

$query = $this->client->createSelect($select);

Adding Additional Cores

At startup, SOLR traverses the specified home directory looking for cores, which it identifies by locating a file called core.properties. So far we’ve used a core called collection1, and you’ll see that it contains three key items:

The core.properties file. At its most basic, it simply contains the name of the instance.

The conf directory contains the configuration files for the instance. As a minimum, this directory must contain a schema.xml and a solrconfig.xml file.

The data directory holds the indexes. The location of this directory can be overridden, and if it doesn’t exist it’ll be created for you.

So, to create a new instance follow these steps:

Create a new directory in your SOLR home directory – movies in the example application
Create a conf directory inside it
Create or copy a schema.xml file and solrconfig.xml file in the conf directory, and customize accordingly
Create a text file called core.properties in the new directory, with the following contents:

name=instancename

…where instancename is the name of your new directory.
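
The directory layout can also be scripted. The sketch below is one possible way to do it in PHP, under two assumptions: that core.properties lives inside the new core directory (matching where collection1 keeps its own), and that a throwaway temp directory stands in for a real SOLR home. The `createCoreSkeleton()` function is hypothetical, and the schema.xml and solrconfig.xml files still have to be supplied separately.

```php
<?php
/**
 * Hypothetical helper: lay out a bare core directory under a SOLR home.
 * Returns the path of the new core directory.
 */
function createCoreSkeleton(string $solrHome, string $coreName): string
{
    $coreDir = rtrim($solrHome, '/') . '/' . $coreName;
    $confDir = $coreDir . '/conf';

    // Create <home>/<core>/conf in one go (recursive mkdir).
    if (!is_dir($confDir) && !mkdir($confDir, 0755, true)) {
        throw new RuntimeException("Could not create $confDir");
    }

    // schema.xml and solrconfig.xml are NOT created here; copy them into conf/.
    file_put_contents($coreDir . '/core.properties', "name=$coreName\n");

    return $coreDir;
}

// Demonstrated against a temp directory rather than a real SOLR home.
$home = sys_get_temp_dir() . '/solr-home-' . uniqid();
$core = createCoreSkeleton($home, 'movies');
echo file_get_contents($core . '/core.properties');
```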

Note that the schema.xml configuration that ships in the examples directory contains references to a number of text files – for example stopwords.txt, protwords.txt etc – which you may need to copy over as well.

Then restart SOLR.

You can also add a new core via the administrative web interface in your web browser – click Core Admin on the left hand side, then Add Core.

Additional Configuration

There are a few additional configuration files worth a mention.

The stopwords.txt file – or more specifically, the language-specific files such as lang/stopwords_en.txt – contains words which should be ignored by the search indexer, such as “a”, “the” and “at”. In most cases, you probably won’t need to modify this file.

Depending on your application, you may find that you need to add words to protwords.txt. This file contains a list of protected words that aren’t “stemmed”. Stemming reduces words to their basic form; for example “asked” becomes “ask” and “working” becomes “work”. Sometimes stemming attempts to “correct” words, perhaps removing what it thinks are erroneous letters or numbers at the end. You might be dealing with geographical areas and find that “Maine” is stemmed to “main”.
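
The effect of a protected-words list can be illustrated with a deliberately naive stemmer. This is a toy sketch only: the `toyStem()` function and its suffix rules are invented for the example and are not the stemming algorithm SOLR uses.

```php
<?php
// Toy illustration only: this is NOT the stemming algorithm SOLR uses.

/**
 * Strip a few common English suffixes unless the word is protected.
 *
 * @param string[] $protected words that must be left untouched
 */
function toyStem(string $word, array $protected = []): string
{
    if (in_array($word, $protected, true)) {
        return $word; // protected words bypass stemming entirely
    }
    foreach (['ing', 'ed', 'e'] as $suffix) {
        $length = strlen($suffix);
        // Only strip when at least three characters would remain.
        if (strlen($word) >= $length + 3 && substr($word, -$length) === $suffix) {
            return substr($word, 0, -$length);
        }
    }
    return $word;
}

echo toyStem('asked'), "\n";              // suffix "ed" removed
echo toyStem('working'), "\n";            // suffix "ing" removed
echo toyStem('Maine'), "\n";              // trailing "e" removed
echo toyStem('Maine', ['Maine']), "\n";   // left alone because it is protected
```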

You can specify synonyms – words with the same meaning – in synonyms.txt. Separate synonyms with commas on a per-line basis. For example:

GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs

You may also use synonyms.txt to help correct common spelling mistakes using synonym mappings, for example:

assasination => assassination
enviroment => environment
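
To show how the two line formats differ, here is a rough sketch of a parser for them. It is an illustration of the file format rather than how SOLR itself loads the file. The `parseSynonyms()` function is hypothetical, and lowercasing every term is an assumption made to keep lookups simple.

```php
<?php
/**
 * Hypothetical parser for the two synonyms.txt line formats:
 *   a, b, c      every term in the group maps to the whole group
 *   a, b => c    each term on the left is replaced by the term(s) on the right
 */
function parseSynonyms(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        if (strpos($line, '=>') !== false) {
            [$lhs, $rhs] = explode('=>', $line, 2);
            $targets = array_map('strtolower', array_map('trim', explode(',', $rhs)));
            foreach (array_map('trim', explode(',', $lhs)) as $source) {
                $map[strtolower($source)] = $targets;
            }
        } else {
            $group = array_map('strtolower', array_map('trim', explode(',', $line)));
            foreach ($group as $term) {
                $map[$term] = $group;
            }
        }
    }
    return $map;
}

$map = parseSynonyms("# sample\nTelevision, Televisions, TV, TVs\nassasination => assassination\n");
echo implode(', ', $map['tv']), "\n";
echo implode(', ', $map['assasination']), "\n";
```

In this sketch the explicit mapping is one-way: the misspelling on the left resolves to the correct spelling on the right, and the correct spelling gets no entry of its own.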

If you’re using currency fields, you may wish to update and keep an eye on currency.xml, which specifies some example exchange rates – which of course are highly volatile.

Summary

In this series we’ve looked at Apache’s SOLR implementation for search, and used the PHP Solarium library to interact with it. We’ve installed and configured SOLR along with an example schema, and built an application designed to search a set of movies, which demonstrates a number of features of SOLR. We’ve looked at faceted search, highlighting results and the DisMax component. Hopefully this gives you enough of a grounding to start using SOLR for search in your own applications.

For further reading, you may wish to download the SOLR reference guide as a PDF, or consult the Solarium documentation.
