Django: Full-Text Search

???111

Original article: http://www.cnblogs.com/wuwui/p/11174702.html

Introduction

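Whoosh: the search engine (a pure-Python full-text indexing library)
jieba: the Chinese word segmenter
django-haystack: a third-party Django app that provides a common search API on top of engines such as Whoosh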
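Chinese text has no spaces between words, so a whitespace tokenizer treats a whole sentence as one token; that is why a segmenter such as jieba is needed. The sketch below shows the difference using greedy forward maximum matching with a tiny hand-made dictionary. Both the technique and the dictionary are assumptions for illustration: jieba itself uses a much larger dictionary plus a DAG/HMM-based method, not this algorithm.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    that is in vocab; if nothing matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
vocab = {"全文", "检索", "功能", "全文检索"}
sentence = "全文检索功能"

print(sentence.split())                    # ['全文检索功能']: one "word"
print(forward_max_match(sentence, vocab))  # ['全文检索', '功能']
```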

Preparation

pip3 install whoosh
pip3 install jieba
pip3 install django-haystack

Configuration

Add haystack to INSTALLED_APPS:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # other apps ...
    'search_liu',
    'haystack',
]

Then add the following settings:

project/settings.py

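HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'search_liu.whoosh_cn_backend.WhooshEngine',  # use the Whoosh search engine (custom backend)
        'PATH': os.path.join(BASE_DIR, 'whooshindex'),  # requires "import os" at the top of settings.py
    },
}
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 10  # 10 results per page
# Update the index automatically whenever a model object is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'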
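HAYSTACK_SEARCH_RESULTS_PER_PAGE only sets the page size; haystack's generic SearchView hands it to Django's paginator, which does the actual slicing. As a plain-Python illustration of the arithmetic involved (page_bounds is a hypothetical helper, not part of haystack or Django):

```python
import math

RESULTS_PER_PAGE = 10  # mirrors HAYSTACK_SEARCH_RESULTS_PER_PAGE above


def page_bounds(page, total, per_page=RESULTS_PER_PAGE):
    """Return (start, end, num_pages) for a 1-based page number.

    start/end are 0-based result indices (end exclusive). An empty result
    set still has one (empty) page.
    """
    num_pages = max(1, math.ceil(total / per_page))
    if not 1 <= page <= num_pages:
        raise ValueError(f"page must be between 1 and {num_pages}")
    start = (page - 1) * per_page
    end = min(start + per_page, total)
    return start, end, num_pages


print(page_bounds(3, 25))  # (20, 25, 3)
```

With 25 hits, page 3 therefore shows only results 21 to 25.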

'ENGINE': 'search_liu.whoosh_cn_backend.WhooshEngine': this engine does not exist yet; we will create it shortly.

'PATH': where the index files are stored. Here it is the whooshindex folder under the project root BASE_DIR, which is created automatically when the index is built.

Configuring the index

Create a search_indexes.py file in the app and add the following code:

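from haystack import indexes
from web.models import news  # the models live in the web app, as in views.py below


class newsIndex(indexes.SearchIndex, indexes.Indexable):
    # document=True marks the main full-text field; use_template=True fills it
    # from the model's _text.txt template (see below)
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        return news

    def index_queryset(self, using=None):
        # Only index rows with newsState == 2 (restricts what can be found)
        return self.get_model().objects.filter(newsState=2)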

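Because I need to search several tables, I wrote three index classes in search_indexes.py of the search app, each named after its table plus "Index". Building the index then creates index files for all three tables at once.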
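Conceptually, the rebuilt index holds documents from all three tables, each tagged with its model, so a query can either cover everything or be narrowed to one model (which is what SearchQuerySet().models(...) does in the view later on). The toy inverted index below illustrates that idea in plain Python; it is an assumption-level sketch with made-up, pre-tokenized sample data, not how Whoosh actually stores its index.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: token -> set of (model_name, pk) pairs.

    Illustration only; Whoosh's on-disk index is far more elaborate.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, model_name, pk, tokens):
        for token in tokens:
            self.postings[token].add((model_name, pk))

    def search(self, token, model_name=None):
        # .get() avoids creating empty entries for unknown tokens
        hits = self.postings.get(token, set())
        if model_name is not None:
            # Narrow the hits to one model, similar in spirit to .models(...)
            hits = {hit for hit in hits if hit[0] == model_name}
        return sorted(hits)


index = ToyIndex()
index.add("news", 1, ["全文", "检索"])
index.add("announcement", 7, ["检索", "通知"])

print(index.search("检索"))          # [('announcement', 7), ('news', 1)]
print(index.search("检索", "news"))  # [('news', 1)]
```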

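Then list the fields to be searched in templates/search/indexes/<your_app>/<model_name>_text.txt, where <your_app> is the app that defines the model and <model_name> is the lowercase model name. Each table needs its own txt file.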

{{ object.title }}
{{ object.mainBody }}
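With document=True and use_template=True, haystack renders this template for every object and indexes the rendered text as the single text field. The sketch below imitates that in plain Python: index_template_name builds the template path haystack looks up for a field named text (assuming, as views.py suggests, that the models live in the web app), and render_document is a hypothetical simplification of template rendering that just joins the field values line by line.

```python
from types import SimpleNamespace


def index_template_name(app_label, model_name, field_name="text"):
    """Template path looked up for a use_template=True field without template_name."""
    return f"search/indexes/{app_label}/{model_name.lower()}_{field_name}.txt"


def render_document(obj, fields=("title", "mainBody")):
    """Simplified stand-in for rendering the _text.txt template:
    one field value per line, like the template above."""
    return "\n".join(str(getattr(obj, name)) for name in fields)


# Hypothetical record with the two fields used in the template
article = SimpleNamespace(title="全文检索", mainBody="使用 Whoosh 和 jieba")

print(index_template_name("web", "news"))  # search/indexes/web/news_text.txt
print(render_document(article))
```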

Switching the search engine to Chinese word segmentation

In the search app, create a file named ChineseAnalyser.py with the following code:

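import jieba
from whoosh.analysis import Tokenizer, Token


class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        # Reuse one Token object, as Whoosh's built-in tokenizers do
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        # jieba.tokenize(..., mode='search') segments like cut_for_search but also
        # yields (word, start, end) character offsets, so repeated words get their
        # own offsets instead of the first occurrence's
        for pos, (w, start, end) in enumerate(jieba.tokenize(value, mode='search')):
            t.original = t.text = w
            t.boost = 1.0
            if positions:
                t.pos = start_pos + pos  # token ordinal, not a character offset
            if chars:
                t.startchar = start_char + start
                t.endchar = start_char + end
            yield t


def chinese_analyzer():
    # The tokenizer is used directly as the field's analyzer
    return ChineseTokenizer()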
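A word's character offset cannot be recovered reliably with value.find(w): str.find returns the first occurrence, so every repetition of a word is given the offset of the first one. The sketch below shows the problem and, for a non-overlapping, in-order segmentation (for example the output of jieba.cut, not cut_for_search, which emits overlapping sub-words), a cursor-based alternative. The sample string and word list are made up for illustration.

```python
def offsets_with_find(value, words):
    """value.find(w) always returns the FIRST match of w."""
    return [(w, value.find(w)) for w in words]


def offsets_with_cursor(value, words):
    """Search from the end of the previous word, so repeats get distinct offsets.

    Assumes words is a non-overlapping, in-order segmentation of value.
    """
    result = []
    cursor = 0
    for w in words:
        start = value.find(w, cursor)
        result.append((w, start))
        cursor = start + len(w)
    return result


value = "检索功能检索"
words = ["检索", "功能", "检索"]  # hypothetical segmentation

print(offsets_with_find(value, words))    # [('检索', 0), ('功能', 2), ('检索', 0)]
print(offsets_with_cursor(value, words))  # [('检索', 0), ('功能', 2), ('检索', 4)]
```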

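Find whoosh_backend.py in the haystack/backends directory under your Python site-packages (on Windows: Lib\site-packages\haystack\backends), copy it into the search app and rename it to whoosh_cn_backend.py. Its module path must match the ENGINE value in the settings above.

Add the following import to it (the package name must be your app's name; the settings above call it search_liu):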

from search import ChineseAnalyser

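Then find the statement that creates the TEXT field (in build_schema) and change its analyzer to the Chinese analyzer, so that it reads: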

schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyser.chinese_analyzer(), field_boost=field_class.boost, sortable=True)
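Since this edit has to be redone whenever the backend file is copied again (for example after upgrading haystack), it can be scripted. The sketch below assumes the copied file contains exactly `analyzer=StemmingAnalyzer()` in that TEXT(...) call; check your copy, because the backend's code differs between haystack versions, which is why the function refuses to patch unless it finds that expression exactly once. The function names are hypothetical, and the import line still has to be added by hand.

```python
from pathlib import Path

OLD_ANALYZER = "analyzer=StemmingAnalyzer()"
NEW_ANALYZER = "analyzer=ChineseAnalyser.chinese_analyzer()"


def patch_backend_source(source):
    """Swap the stemming analyzer for the Chinese one in the backend's source text."""
    count = source.count(OLD_ANALYZER)
    if count != 1:
        raise ValueError(f"expected 1 occurrence of {OLD_ANALYZER!r}, found {count}")
    return source.replace(OLD_ANALYZER, NEW_ANALYZER)


def patch_file(path):
    """Rewrite a copied whoosh_cn_backend.py in place."""
    p = Path(path)
    p.write_text(patch_backend_source(p.read_text(encoding="utf-8")), encoding="utf-8")
```

For example, patch_file("search/whoosh_cn_backend.py") applies the change to the copy made above.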

Finally, run python3 manage.py rebuild_index to build the index files.

Creating the search form

<div class="input-group" style="width:370px">
    <div style="float:right">
        <form action="" id="search_form" method="get" onsubmit='return sub_search_form()'>
            <!-- Do not change name='q': haystack reads the query from the q parameter -->
            <input type="text" class="form-control" style="width:229px;float:left;" name="q" placeholder="&nbsp;&nbsp;Enter keywords">
            <span class="input-group-btn">
                <button class="btn btn-info" id="search" style="width:60px;height:34px;background-color:purple;border-color:purple" type="submit"><i class="glyphicon glyphicon-search"></i></button>
            </span>
        </form>
    </div>
    <!-- Do not put the select element inside the form -->
    <select id="option" class="form-control" style="height:32px;width:77px;">
        <option value="0">All</option>
        <option value="1">News</option>
        <option value="2">Announcements</option>
        <option value="3">Theses</option>
    </select>
</div>

Back-end handling

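The form above sends its request to the back end through JavaScript, which sets the form's target URL according to the selected category just before submission: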

function sub_search_form(){
    // 0: all  1: news  2: announcements  3: theses
    var obj = document.getElementById('option');
    var form = document.getElementById('search_form');
    var value = obj.value;
    // Point the form at the URL for the selected category
    switch (value){
        case '0': form.action = '/search/';
            break;
        case '1': form.action = '/search/news/';
            break;
        case '2': form.action = '/search/announcement/';
            break;
        case '3': form.action = '/search/thesis_information/';
            break;
        default: break;
    }
}
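The switch is effectively a lookup table from the select's value to a URL, and the path segment after /search/ is exactly a key of model_map in views.py. Written as a Python dict (a hypothetical mirror for illustration; the page itself runs the JavaScript), the mapping looks like this:

```python
# Hypothetical Python mirror of sub_search_form(): select value -> form action
SEARCH_ACTIONS = {
    "0": "/search/",                     # all tables
    "1": "/search/news/",
    "2": "/search/announcement/",
    "3": "/search/thesis_information/",
}


def form_action(option_value, current=""):
    """Return the URL the form is submitted to; unknown values keep the
    current action, like the switch's default branch."""
    return SEARCH_ACTIONS.get(option_value, current)


print(form_action("3"))  # /search/thesis_information/
```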

The contents of search/views.py:

from django.http import Http404
from haystack.generic_views import SearchView
from haystack.query import SearchQuerySet

from web.models import news, announcement, thesis_information

# URL segment -> model class
model_map = {'news': news, 'announcement': announcement, 'thesis_information': thesis_information}


class VisitorSearchView(SearchView):
    def get_queryset(self):
        queryset = super().get_queryset()
        self.context_object_name = 'search_list'
        # Model name captured by the <str:model> URL pattern (absent for /search/)
        model_name = self.kwargs.get('model')

        if model_name:
            # Search a single table; unknown names get a 404 instead of a KeyError
            model = model_map.get(model_name)
            if model is None:
                raise Http404('Unknown search target')
            queryset = SearchQuerySet().models(model)
            if model_name == 'thesis_information':
                self.context_object_name = 'search_thesis_list'
        else:
            # Search all tables
            self.template_name = 'search/search_all.html'
        return queryset

search/urls.py

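from django.urls import path
from search.views import VisitorSearchView

# Assumed to be included from the project's urls.py under the 'search/' prefix,
# matching the /search/... form actions above
urlpatterns = [
    path('<str:model>/', VisitorSearchView.as_view()),  # one table, e.g. /search/news/
    path('', VisitorSearchView.as_view()),              # all tables: /search/
]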
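Taken together, the URL patterns and the view turn each request path into a model filter, a context variable name and a template. The plain-Python sketch below restates that decision table without Django so it can be checked in isolation. resolve_search and MODEL_NAMES are hypothetical names, the '/search/' prefix is assumed from the form actions in the JavaScript, and 'search/search.html' is the default template of haystack's generic SearchView, which the view keeps when a single table is searched.

```python
MODEL_NAMES = {"news", "announcement", "thesis_information"}  # keys of model_map


def resolve_search(path, prefix="/search/"):
    """Stand-in for the URL patterns plus VisitorSearchView's branching.

    Returns (model_name, context_object_name, template_name); model_name is
    None when all tables are searched. Raises LookupError for paths the
    patterns or the view would reject.
    """
    if not path.startswith(prefix):
        raise LookupError(f"not a search URL: {path}")
    rest = path[len(prefix):]
    if rest == "":
        return None, "search_list", "search/search_all.html"
    model_name = rest.rstrip("/")
    if not rest.endswith("/") or "/" in model_name or model_name not in MODEL_NAMES:
        raise LookupError(f"no matching search target: {path}")
    context = "search_thesis_list" if model_name == "thesis_information" else "search_list"
    return model_name, context, "search/search.html"


print(resolve_search("/search/thesis_information/"))
# ('thesis_information', 'search_thesis_list', 'search/search.html')
```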