Building a Django Blog, News Module 6: News Search (Django + Haystack + Elasticsearch)


68岁爱用飘柔


Category: Django博客搭建 (Building a Django Blog). Tags: search engine, database, elasticsearch, python, django

Article link: https://blog.csdn.net/jiangSummer/article/details/113815780


Blog Project: News Module


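A note up front: from Elasticsearch 6 onward an index can hold only one mapping type, so Haystack cannot support Elasticsearch 6 and above (the official explanation is quoted below). To use the Elasticsearch + Haystack + Django configuration described here, stay close to Elasticsearch 2.4, Haystack 2.4, and Django 2.1, and keep the Python client packages on matching 2.x versions; otherwise this article may be more of a burden than a help. In the next few articles I will cover using Elasticsearch itself and updating this setup for Django 3.1 and Elasticsearch 7.10+.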

Official explanation (from the Elasticsearch documentation on the removal of mapping types):

Why are mapping types being removed?
Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”.

This was a bad analogy that led to incorrect assumptions. In an SQL database, tables are independent of each other. The columns in one table have no bearing on columns with the same name in another table. This is not the case for fields in a mapping type.

In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. In other words, using the example above, the user_name field in the user type is stored in exactly the same field as the user_name field in the tweet type, and both user_name fields must have the same mapping (definition) in both types.

This can lead to frustration when, for example, you want deleted to be a date field in one type and a boolean field in another type in the same index.

On top of that, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene’s ability to compress documents efficiently.

For these reasons, we have decided to remove the concept of mapping types from Elasticsearch.


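1. Requirements Analysis

Search is indispensable for a blog system. The database can do fuzzy matching with LIKE, but this is very inefficient: a pattern with a leading % cannot use an ordinary index, so the query scans the whole table.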

-- Fuzzy matching with LIKE
-- % matches any sequence of zero or more characters
-- _ matches exactly one character
SELECT *
FROM users
WHERE username LIKE '%python%' AND is_delete = 0;

Querying several fields at once with LIKE is also cumbersome, so we use a search engine to implement full-text search instead.
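
To make the multi-field problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `news` table, its columns, and the sample rows are assumptions for illustration only; they are not the project's real schema. Every extra field adds another `OR ... LIKE` clause, and each clause scans the column.

```python
import sqlite3


def like_search(conn, keyword):
    """Return ids of non-deleted rows whose title, digest or content contains keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        """
        SELECT id FROM news
        WHERE is_delete = 0
          AND (title LIKE ? OR digest LIKE ? OR content LIKE ?)
        ORDER BY id
        """,
        (pattern, pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory table, used only for this demonstration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, digest TEXT, "
    "content TEXT, is_delete INTEGER)"
)
conn.executemany(
    "INSERT INTO news VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Learning python", "intro", "basics", 0),
        (2, "Django tips", "uses python", "views", 0),
        (3, "Docker notes", "containers", "images", 0),
        (4, "Old python post", "removed", "gone", 1),
    ],
)

print(like_search(conn, "python"))  # [1, 2]
```

Row 4 matches the keyword but is filtered out by `is_delete`, mirroring the SQL query above.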

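2. How a Search Engine Works

A search engine does not query the database directly.
It preprocesses the data in the database once and builds a separate index structure.
The index works like the lookup pages of a dictionary.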
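
The "separate index structure" is usually an inverted index: a mapping from each word to the documents that contain it. The following is a toy sketch of the idea in plain Python, not Elasticsearch's or Lucene's actual implementation; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical documents standing in for rows of the news table
docs = {
    1: "Django search with Haystack",
    2: "Elasticsearch search engine",
    3: "Django blog project",
}
index = build_inverted_index(docs)

print(search(index, "django"))         # [1, 3]
print(search(index, "search django"))  # [1]
```

A lookup only touches the entries for the query words, which is why it stays fast as the number of documents grows, unlike a LIKE scan.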

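3. Elasticsearch

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It lets you store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine/technology for applications with complex search features and requirements.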

Features

Open source
A leading choice among search engines
Built on the open-source Lucene library
Operated through a REST API


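When a search engine builds an index it has to tokenize the data. Tokenizing means breaking a sentence into individual characters or words, which become the keywords of that sentence. Elasticsearch's built-in analyzers do not segment Chinese into words (the standard analyzer splits it into single characters), so the elasticsearch-analysis-ik plugin is needed to handle Chinese word segmentation.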
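
The difference between character-level and word-level tokenization can be sketched in a few lines of Python. This is only a toy: it uses forward maximum matching against a tiny hand-written dictionary, which is an assumption chosen for illustration and not how elasticsearch-analysis-ik is implemented.

```python
def char_tokens(text):
    """Split text into single characters, roughly what happens without a Chinese analyzer."""
    return [ch for ch in text if not ch.isspace()]


def forward_max_match(text, dictionary, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini dictionary; a real analyzer ships a far larger one
dictionary = {"搜索", "引擎", "搜索引擎", "全文", "检索"}
text = "搜索引擎全文检索"

print(char_tokens(text))
print(forward_max_match(text, dictionary))  # ['搜索引擎', '全文', '检索']
```

Word-level tokens give much more precise matches than single characters, which is the reason for installing a Chinese analyzer.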

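4. Installing Elasticsearch with Docker

a. First, upload the image file and the configuration files to the server with WinSCP. (Why not the latest version? As noted at the start, from ES 6 onward each index supports only one type, which is incompatible with the Haystack setup used here.)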

b. Edit the configuration file

Open elasticsearch/config/elasticsearch.yml and change network.host at line 54. On VirtualBox set it to 0.0.0.0; on a cloud server set it to the server's own private (intranet) address:

# network.host: 172.18.168.123
network.host: 0.0.0.0

c. Load the image

Go back to the home directory and load the image into the local image store with docker load:

docker load -i elasticsearch-ik-2.4.6_docker.tar

Check that it loaded:

summer~$ docker images
REPOSITORY                TAG         IMAGE ID       CREATED       SIZE
delron/elasticsearch-ik   2.4.6-1.0   095b6487fb77   2 years ago   689MB

d. Create the container

docker run -dti --network=host --name=elasticsearch -v /home/summer/elasticsearch-2.4.6/config:/usr/share/elasticsearch/config delron/elasticsearch-ik:2.4.6-1.0

If the container is unstable, create it with this command instead:
docker run -dti --name=elasticsearch -p 9200:9200 delron/elasticsearch-ik:2.4.6-1.0
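
Before moving on it is worth confirming that the container answers on port 9200. The sketch below uses only the Python standard library and assumes Elasticsearch is reachable at http://127.0.0.1:9200/, the same address used in the Haystack settings later; the function names are hypothetical. It relies on the root endpoint returning JSON with a `version.number` field.

```python
import json
from urllib.request import urlopen


def parse_es_version(body):
    """Extract the version string from the JSON returned by the ES root endpoint."""
    info = json.loads(body)
    return info["version"]["number"]


def fetch_es_version(url="http://127.0.0.1:9200/", timeout=3):
    """Query the root endpoint (assumed address) and return the reported version."""
    with urlopen(url, timeout=timeout) as response:
        return parse_es_version(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print("Elasticsearch version:", fetch_es_version())
    except (OSError, ValueError, KeyError) as exc:
        # OSError covers refused connections and timeouts
        print("Elasticsearch is not reachable or returned unexpected data:", exc)
```

If the reported version is not 2.4.x, the rest of this article's configuration may not apply as written.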

e. In the project's virtual environment, install the required packages

# Enter the project's virtual environment
workon Blog_Django

pip install django-haystack
pip install elasticsearch==2.4.1

f. Settings configuration

INSTALLED_APPS = [
    # ... existing apps ...
    'haystack',
]


# Haystack
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',  # Address of the server running Elasticsearch; the default port is 9200
        'INDEX_NAME': 'blog_django',  # Name of the index Elasticsearch creates (index names must be lowercase)
    },
}

# Number of results shown per page
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 5
# Update the index automatically whenever the database changes
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

g. Create the search_indexes.py file

# Create the following class in apps/news/search_indexes.py (the file name must be search_indexes.py)

# -*- coding: utf-8 -*-
# @Author: Summer
from haystack import indexes
from .models import News


# By convention the class is named after the model, followed by "Index"
class NewsIndex(indexes.SearchIndex, indexes.Indexable):
    """
    Search index for the News model.
    """
    # document=True makes this the main search field; use_template=True fills it from a data
    # template (located at template/search/indexes/news/news_text.txt).
    # The template combines the title, digest and content of News, so a keyword sent to the
    # text field is matched against all three.
    text = indexes.CharField(document=True, use_template=True)
    id = indexes.IntegerField(model_attr='id')
    title = indexes.CharField(model_attr='title')
    digest = indexes.CharField(model_attr='digest')
    content = indexes.CharField(model_attr='content')
    image_url = indexes.CharField(model_attr='image_url')
    comments = indexes.IntegerField(model_attr='comments')

    def get_model(self):
        """Return the model class to be indexed."""
        return News

    def index_queryset(self, using=None):
        """Return the queryset of objects to be indexed (deleted news is skipped)."""
        return self.get_model().objects.filter(is_delete=False)

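h. Create a search folder under the template folder

Inside the search folder create an indexes folder, inside that a news folder (the app name), and finally a file named <model name>_text.txt.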

# template/search/indexes/news/news_text.txt
{{ object.title }}
{{ object.digest }}
{{ object.content }}

i. Build the index from the terminal

python manage.py rebuild_index

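5. Backend Implementation

View

from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage
from django.shortcuts import render
from haystack.views import SearchView

from . import models  # assumes HotNews is defined in this app's models.py


class Search(SearchView):
    # Template used for both the search results and the hot-news fallback
    template = "news/search.html"

    def create_response(self):
        # Keyword entered by the user in the front end
        kw = self.request.GET.get("q", "")
        if not kw:
            # No keyword: show the paginated hot-news list instead
            show = True
            hot_news = (
                models.HotNews.objects.select_related('news')
                .only('news_id', 'news__title', 'news__image_url')
                .filter(is_delete=False)
                .order_by('priority')
            )

            paginator = Paginator(hot_news, 5)
            try:
                # Pass the raw value so Paginator itself can raise PageNotAnInteger
                page = paginator.page(self.request.GET.get("page", 1))
            except PageNotAnInteger:
                # Not an integer: fall back to the first page
                page = paginator.page(1)
            except EmptyPage:
                # Out of range: fall back to the last page
                page = paginator.page(paginator.num_pages)
            context = {"show": show, "page": page, "paginator": paginator}
            return render(self.request, self.template, context)
        # A keyword was given: let Haystack run the search and render the template.
        # Its context has no "show" variable, so the template treats it as false.
        return super().create_response()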

URL route

path("search", views.Search(), name="search"),

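Custom pagination bar

Paginating with Paginator can produce many pages, so the template needs a page bar. We can build one as a custom template filter.

First create a templatetags folder inside the app (with an empty __init__.py so that Django can import it), then create a Python file in it:

# templatetags/news_template.py
# -*- coding: utf-8 -*-
# @Author: Summer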
from django import template

register = template.Library()


@register.filter()
def page_bar(page):
	page_list = []
	# Left side: first page, an ellipsis, then the two pages before the current one
	if page.number != 1:
		page_list.append(1)
	if page.number - 3 > 1:
		page_list.append('...')
	if page.number - 2 > 1:
		page_list.append(page.number - 2)
	if page.number - 1 > 1:
		page_list.append(page.number - 1)

	page_list.append(page.number)
	# Right side: the two pages after the current one, an ellipsis, then the last page
	if page.paginator.num_pages > page.number + 1:
		page_list.append(page.number + 1)
	if page.paginator.num_pages > page.number + 2:
		page_list.append(page.number + 2)
	if page.paginator.num_pages > page.number + 3:
		page_list.append('...')
	if page.paginator.num_pages != page.number:
		page_list.append(page.paginator.num_pages)
	return page_list
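
To see what the filter produces without starting Django, the same logic can be restated as a plain function over two integers. This is a sketch for experimentation only; `page_bar_numbers` is a hypothetical name and is not part of the project code.

```python
def page_bar_numbers(number, num_pages):
    """Mirror of the page_bar filter, taking the current page and the page count."""
    bar = []
    # Left side
    if number != 1:
        bar.append(1)
    if number - 3 > 1:
        bar.append('...')
    if number - 2 > 1:
        bar.append(number - 2)
    if number - 1 > 1:
        bar.append(number - 1)

    bar.append(number)
    # Right side
    if num_pages > number + 1:
        bar.append(number + 1)
    if num_pages > number + 2:
        bar.append(number + 2)
    if num_pages > number + 3:
        bar.append('...')
    if num_pages != number:
        bar.append(num_pages)
    return bar


print(page_bar_numbers(1, 10))   # [1, 2, 3, '...', 10]
print(page_bar_numbers(5, 10))   # [1, '...', 3, 4, 5, 6, 7, '...', 10]
print(page_bar_numbers(10, 10))  # [1, '...', 8, 9, 10]
```

The bar always shows the first and last page plus up to two neighbours on each side of the current page, with an ellipsis wherever pages are skipped.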

6. Frontend

{% extends 'base/base.html' %}
{% block title %}Search{% endblock %}
{% load news_template %}
{% block link %}
    <link rel="stylesheet" href="../../static/css/news/search.css">

{% endblock %}

{% block main_contain %}
      <div class="main-contain ">
                   <!-- search-box start -->
                   <div class="search-box">
                       <form action="" style="display: inline-flex;">

                           <input type="search" placeholder="Enter a search term" name="q" class="search-control">


                           <input type="submit" value="Search" class="search-btn">
                       </form>
                       <!-- Could be laid out with float, vertical-align, or flex -->
                   </div>
                   <!-- search-box end -->
                   <!-- content start -->
                   <div class="content">
                   {% if not show %}
                       <!-- search-list start -->
{#                        {% if not show_all %}#}
                          <div class="search-result-list">
                            <h2 class="search-result-title">
                              Search results: <span style="font-weight: 700;color: #ff6620;">{{ paginator.num_pages }}</span> page(s)
                            </h2>
                            <ul class="news-list">
                              {# Load Haystack's built-in highlight tag #}
                              {% load highlight %}
                              {% for one_news in page.object_list %}
                                <li class="news-item clearfix">
                                  <a href="{% url 'news:news_detail' one_news.id %}" class="news-thumbnail" target="_blank">
                                  <img src="{{ one_news.object.image_url }}">
                                  </a>
                                  <div class="news-content">
                                    <h4 class="news-title">
                                      <a href="{% url 'news:news_detail' one_news.id %}">
                                        {% highlight one_news.title with query %}
                                      </a>
                                    </h4>
                                    <p class="news-details">{{ one_news.digest }}</p>
                                    <div class="news-other">
                                      <span class="news-type">{{ one_news.object.tag.name }}</span>
                                      <span class="news-time">{{ one_news.object.update_time }}</span>
                                      <span
                                          class="news-author">{% highlight one_news.object.author.username with query %}

                                      </span>
                                    </div>
                                  </div>
                                </li>
                              {% endfor %}


                            </ul>
                          </div>

                        {% else %}

                          <div class="news-contain">
                            <div class="hot-recommend-list">
                              <h2 class="hot-recommend-title">Hot Recommendations</h2>
                              <ul class="news-list">

                                {% for one_hotnews in page.object_list %}

                                  <li class="news-item clearfix">
                                    <a href="#" class="news-thumbnail">
                                      <img src="{{ one_hotnews.news.image_url }}">
                                    </a>
                                    <div class="news-content">
                                      <h4 class="news-title">
                                        <a href="{% url 'news:news_detail' one_hotnews.news.id %}">{{ one_hotnews.news.title }}</a>
                                      </h4>
                                      <p class="news-details">{{ one_hotnews.news.digest }}</p>
                                      <div class="news-other">
                                        <span class="news-type">{{ one_hotnews.news.tag.name }}</span>
                                        <span class="news-time">{{ one_hotnews.update_time }}</span>
                                        <span class="news-author">{{ one_hotnews.news.author.username }}</span>
                                      </div>
                                    </div>
                                  </li>

                                {% endfor %}


                              </ul>
                            </div>
                          </div>

                        {% endif %}

                       <!-- search-list end -->
                       <!-- news-contain start -->

                    {# Pagination navigation #}
                     <div class="page-box" id="pages">
                       <div class="pagebar" id="pageBar">
                          <a class="a1">{{ page.paginator.count | default:0 }} results</a>
{#                          URL of the previous page #}
                         {% if page.has_previous %}
                           {% if query %}
                             <a href="{% url 'news:search' %}?q={{ query|urlencode }}&amp;page={{ page.previous_page_number }}"
                                class="prev">Previous</a>
                           {% else %}
                             <a href="{% url 'news:search' %}?page={{ page.previous_page_number }}" class="prev">Previous</a>
                           {% endif %}
                         {% endif %}


{#                          List the page-number links #}
                       {% if page.has_previous or page.has_next %}

                        {% for n in page|page_bar %}
                            {% if query %}
                                {% if n == '...' %}
                                    <span class="point">{{ n }}</span>
                                {% else %}
                                    {% if n == page.number %}
                                        <span class="sel">{{ n }}</span>
                                    {% else %}
                                        <a href="{% url 'news:search' %}?page={{ n }}&amp;q={{ query|urlencode }}">{{ n }}</a>
                                    {% endif %}
                                {% endif %}
                            {% else %}
                                {% if n == '...' %}
                                    <span class="point">{{ n }}</span>
                                {% else %}
                                    {% if n == page.number %}
                                        <span class="sel">{{ n }}</span>
                                    {% else %}
                                        <a href="{% url 'news:search' %}?page={{ n }}">{{ n }}</a>
                                    {% endif %}
                                {% endif %}
                            {% endif %}
                        {% endfor %}
                    {% endif %}

{#                       URL of the next page #}
                         {% if page.has_next %}
                           {% if query %}
                             <a href="{% url 'news:search' %}?q={{ query|urlencode }}&amp;page={{ page.next_page_number }}"
                                class="next">Next</a>
                           {% else %}
                             <a href="{% url 'news:search' %}?page={{ page.next_page_number }}" class="next">Next</a>
                           {% endif %}
                         {% endif %}
                       </div>
                     </div>
                     <!-- news-contain end -->
                   </div>
                   <!-- content end -->
               </div>

{% endblock %}

Additional CSS

/* === current index start === */
#pages {
	padding: 32px 0 10px;
}

.page-box {
	text-align: center;
    /*font-size: 14px;*/
}

#pages a.prev, #pages a.next {
	width: 56px;
	padding: 0
}

#pages a {
	display: inline-block;
	height: 26px;
	line-height: 26px;
	background: #fff;
	border: 1px solid #e3e3e3;
	text-align: center;
	color: #333;
	padding: 0 10px
}

#pages .sel {
	display: inline-block;
	height: 26px;
	line-height: 26px;
	background: #0093E9;
	border: 1px solid #0093E9;
	color: #fff;
	text-align: center;
	padding: 0 10px
}
/* === current index end === */

At this point you can run the project and check that the search feature works. To learn more about Elasticsearch, see the official site or videos on bilibili.
