Using the full-text search frameworks haystack and drf_haystack with django-rest-framework


smartwu_sir


Article link: https://blog.csdn.net/smartwu_sir/article/details/80209907


1. Preparation (the same whether you use plain Django or django-rest-framework)

Installation

pip install whoosh
pip install jieba
pip install django-haystack
pip install drf_haystack

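Project configuration

# settings.py (requires `import os` at the top of the file)
# Add haystack to INSTALLED_APPS (at the end)
INSTALLED_APPS = [
    ...
    'haystack',
    ...
]
# Add the search engine configuration (solr, whoosh and elasticsearch are available); whoosh is used here
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
# Update the index automatically whenever a model instance is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
# Page size for haystack's own search result paginator (not used with rest-framework)
# HAYSTACK_SEARCH_RESULTS_PER_PAGE = 10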

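2. Using drf_haystack

Specify the fields to index

# Create the file templates/search/indexes/myapp/note_text.txt
# myapp is the app that owns the model being indexed; note is that model's name in lowercase
{{ object.title }}
{{ object.content }}
# This adds the title and content fields to the index document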
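
The naming rule for that template file can be sketched in a few lines. This is only an illustration: the helper name index_template_path is made up here, and it simply formats the search/indexes/<app>/<model>_<field>.txt path that Haystack looks up for a field declared with use_template=True. For the community app and Topic model used later in this article, the file would be topic_text.txt.

```python
def index_template_path(app_label, model_name, field_name="text"):
    """Return the template path Haystack looks up for a use_template field.

    Hypothetical helper for illustration only; Haystack computes this
    path internally from the model's app label and lowercase model name.
    """
    return "search/indexes/{}/{}_{}.txt".format(
        app_label, model_name.lower(), field_name
    )


# The index defined below lives in the "community" app and indexes Topic.
print(index_template_path("community", "Topic"))
```

The path is relative to a template directory, so it must sit under a directory that Django's template loaders actually search.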

Basic usage (mainly these four files)

# community/search_indexes.py
# First write a SearchIndex in search_indexes.py
from haystack import indexes
from community.models import Topic

class TopicIndex(indexes.SearchIndex, indexes.Indexable):
    """topic index"""
    text = indexes.CharField(document=True, use_template=True)
    content = indexes.CharField(model_attr="content")
    # The two fields below are not part of the template above; they are declared
    # because the filter and order_by used later need them.
    # Note: after changing anything here, rebuild the index for it to take effect:
    # python manage.py rebuild_index
    date_added = indexes.DateTimeField(model_attr="date_added")
    type = indexes.IntegerField(model_attr="type")

    def get_model(self):
        return Topic

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

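# community/serializers.py
# Write the search serializer in serializers.py
from drf_haystack import serializers as HSER
from community.models import Topic
# ReviewIndex is defined like TopicIndex; it is not shown in this article
from community.search_indexes import TopicIndex, ReviewIndex

# TopicSerializer and ReviewListSerializer are ordinary rest-framework
# serializers, assumed to be defined earlier in this file.


class TopicIndexSer(HSER.HaystackSerializer):

    def to_representation(self, instance):
        # Note: instance.object is the model object that was found
        # (instance itself would only be the model object if the view's queryset
        # were a regular rest-framework queryset)
        ret = super(TopicIndexSer, self).to_representation(instance)
        if isinstance(instance.object, Topic):
            ret["data"] = TopicSerializer(instance=instance.object).data
        else:
            ret["data"] = ReviewListSerializer(instance=instance.object).data
        return ret

    class Meta:
        # This uses the SearchIndex defined above
        index_classes = [TopicIndex]
        fields = ["content", "date_added", "type"]
        # ignore_fields can also be set here to have specific fields ignored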
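
The instance.object indirection in to_representation is easy to trip over, so here is a minimal sketch using plain stand-in classes instead of Django models. Everything in it (Topic, Review, FakeSearchResult, nested_data) is hypothetical and only mimics the shape of the real objects: a search hit that wraps the model instance in an object attribute.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Topic:
    """Stand-in for community.models.Topic."""
    title: str


@dataclass
class Review:
    """Stand-in for community.models.Review."""
    body: str


@dataclass
class FakeSearchResult:
    """Stand-in for a search hit: the model instance sits in .object."""
    object: Any


def nested_data(result):
    """Choose the nested payload from the type of result.object."""
    obj = result.object
    if isinstance(obj, Topic):
        return {"kind": "topic", "title": obj.title}
    return {"kind": "review", "body": obj.body}


print(nested_data(FakeSearchResult(Topic("hello"))))
print(nested_data(FakeSearchResult(Review("nice post"))))
```

Checking the type of the wrapped object, not of the hit itself, is what lets one serializer handle results from several indexed models.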

# community/views.py
# Write the class-based view in views.py
from drf_haystack.filters import HaystackFilter
from drf_haystack.viewsets import HaystackViewSet
# Review is assumed to live in community.models alongside Topic
from community.models import Topic, Review
from community.serializers import TopicIndexSer
from community.utils import MyPagination


class ContentSearchViewSet(HaystackViewSet):
    # A custom pagination class based on PageNumberPagination; it works here as usual
    pagination_class = MyPagination
    # Several models can be listed here; the serializer can likewise list several index_classes
    index_models = [Topic]
    serializer_class = TopicIndexSer
    # The filter; this is where the indexed type field is used
    filter_backends = [HaystackFilter]
    filter_fields = ("type",)

    def get_queryset(self, index_models=[]):
        queryset = self.object_class()._clone()
        # Overridden get_queryset that sorts by date_added.
        # This only works because date_added and type were declared on the SearchIndex above.
        queryset = queryset.models(*self.get_index_models()).order_by("-date_added")
        # queryset = queryset.models(*self.index_models).order_by("-date_added")
        return queryset

    def get_index_models(self):
        # Custom helper: an optional ?model= query parameter restricts the search
        # to one model; it is used by get_queryset above
        model = self.request.query_params.get("model", None)
        di = {
            None: self.index_models,
            "topic": [Topic],
            "review": [Review]
        }
        return di.get(model, self.index_models)
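
The lookup in get_index_models can be tried in isolation. The sketch below is a simplified stand-in, not the view's code: resolve_index_models is a hypothetical function, and the model classes are replaced by strings so it runs without Django. It also shows that the explicit None key is not needed, because dict.get already falls back to the default.

```python
def resolve_index_models(model_param, default_models):
    """Map the ?model= query parameter to the list of models to search.

    Strings stand in for the real model classes (Topic, Review).
    A missing or unknown parameter falls back to default_models.
    """
    by_name = {
        "topic": ["Topic"],
        "review": ["Review"],
    }
    return by_name.get(model_param, default_models)


defaults = ["Topic"]
print(resolve_index_models("review", defaults))  # restricted to Review
print(resolve_index_models(None, defaults))      # no parameter: defaults
print(resolve_index_models("bogus", defaults))   # unknown value: defaults
```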

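# community/urls.py
# Configure the URL
from rest_framework.routers import DefaultRouter
from community.views import ContentSearchViewSet

router = DefaultRouter()
# base_name was the keyword when this was written; newer rest-framework releases renamed it to basename
router.register(r"search", ContentSearchViewSet, base_name="search")
# urlpatterns is this module's existing list of URL patterns
urlpatterns += router.urls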

3. Configuration done; test it (verified to work)

Build the index first: python manage.py rebuild_index
127.0.0.1:8000/community/search?content__contains=你好
127.0.0.1:8000/community/search?content__contains=你好&page=2
127.0.0.1:8000/community/search?content__contains=你好&type=2
127.0.0.1:8000/community/search?content__contains=你好&model=topic&page=2

Note: besides contains, the lookups not__contains, startswith, endswith and exact match (=) work as well; in addition to page there is also a page_size option.
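
When calling these URLs from a script rather than a browser, the Chinese query text has to be percent-encoded. A minimal sketch with the standard library follows; the search_url helper and the BASE constant are made up for illustration, and the host and path are the ones from the examples above.

```python
from urllib.parse import urlencode

# Host and path taken from the test URLs above
BASE = "http://127.0.0.1:8000/community/search"


def search_url(**params):
    """Build a search URL; urlencode percent-encodes non-ASCII values as UTF-8."""
    return BASE + "?" + urlencode(params)


print(search_url(content__contains="你好", page=2))
print(search_url(content__contains="你好", model="topic", page=2))
```

Keyword arguments keep their order, so the parameters appear in the URL in the order they were passed.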

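4. Use jieba for tokenization instead of whoosh's built-in analyzer

Copy the haystack/backends/whoosh_backend.py file to somewhere in your project and rename the copy whoosh_cn_backend.py.
Add this import to the copy: from jieba.analyse import ChineseAnalyzer
Replace every use of StemmingAnalyzer with ChineseAnalyzer.
Update the engine path in HAYSTACK_CONNECTIONS to point at the copy.

After the change, mine looks like this: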

HAYSTACK_CONNECTIONS = {
    'default': {
        # 'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'ENGINE': 'extra_apps.whoosh_cn_backend.WhooshEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
    },
}
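
The manual edit to the copied backend can also be expressed as a small text transformation. This is a rough sketch, not a tool from haystack or jieba: patch_backend_source is a hypothetical helper, and it assumes the backend imports the analyzer with a line starting "from whoosh.analysis import" and instantiates it as StemmingAnalyzer(). Only the calls are rewritten, so the original whoosh import line stays valid.

```python
JIEBA_IMPORT = "from jieba.analyse import ChineseAnalyzer\n"


def patch_backend_source(source):
    """Return backend source text with StemmingAnalyzer() calls swapped out.

    Assumption: the analyzer import line starts with
    'from whoosh.analysis import'; the jieba import is added right after it.
    """
    patched_lines = []
    for line in source.splitlines(keepends=True):
        patched_lines.append(
            line.replace("StemmingAnalyzer()", "ChineseAnalyzer()")
        )
        if line.startswith("from whoosh.analysis import"):
            patched_lines.append(JIEBA_IMPORT)
    return "".join(patched_lines)


# A two-line sample standing in for the real backend file
sample = (
    "from whoosh.analysis import StemmingAnalyzer\n"
    "field = TEXT(analyzer=StemmingAnalyzer())\n"
)
print(patch_backend_source(sample))
```

To apply it for real, you would read whoosh_backend.py, pass its text through the function, and write the result to whoosh_cn_backend.py; check the output by hand, since the backend's exact contents vary between haystack versions.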

Note: this article does not cover search templates or the highlighting syntax.

References:
http://drf-haystack.readthedocs.io/en/latest/01_intro.html
https://blog.csdn.net/ac_hell/article/details/52875927