Implementing a Simple Novel-Site Search Feature with Elasticsearch

XY_Noire

Article link: https://blog.csdn.net/XY_Noire/article/details/99702419

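Data source: novel information scraped from the web and stored in MySQL.

Four of its fields are stored in es: id, name, author, and introduction.

The rows are read from MySQL, converted to dictionaries (JSON), and then submitted in batches with the es bulk API.

bulk comes in two forms, the Elasticsearch.bulk client method and the helper imported with from elasticsearch.helpers import bulk; their parameters differ slightly.

The helper returns a tuple: the number of actions that succeeded, followed by either the number of failures or a list of errors.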


from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import pymysql

es = Elasticsearch()

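def getSQLData():
    # Connection settings are the author's local test values
    conn = pymysql.connect(host='localhost',port=3306,user='root',password='a',database='books',charset='utf8')

    try:
        with conn.cursor() as cur:
            sql = 'SELECT nid,novelName,author,introduction FROM books'
            cur.execute(sql)
            return cur.fetchall()
    except Exception as e:
        print( e )
        return ()  # empty result, so the caller can still iterate over it
    finally:
        conn.close()  # release the connection whether or not the query worked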

def save2Dict( data ):
    datalist = []
    for d in data:
        try:
            info = {}
            info["nid"] = d[0]
            info["name"] = d[1]
            info["author"] = d[2]
            info["intro"] = d[3]
            datalist.append( info )
        except Exception as e:
            # Skip rows that cannot be converted, but say why
            print( d, "could not be converted:", e )
            continue
    return datalist

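def save2ES( index, doc_type, data ):
    actions = []
    for d in data:
        action = {
            "_index": index,
            # Mapping types are deprecated in Elasticsearch 7 and removed in 8;
            # drop "_type" when targeting a newer cluster
            "_type": doc_type,
            "_source": d
        }
        actions.append(action)
    # Batch submission; the helper splits the actions into chunks internally
    result = bulk(es, actions, index=index, raise_on_error=True)
    print( result )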
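
The difference between the two bulk entry points mentioned above is mostly a difference in input shape, which the following minimal sketch tries to show without needing a running cluster. The helper takes an iterable of action dictionaries, while the low-level bulk API expects a newline-delimited JSON body in which every document is preceded by an action line. The function names, the sample documents, and the choice to reuse nid as the document _id are assumptions made for illustration; the original script lets es generate the ids.

```python
import json


def helper_actions(index, docs):
    """Yield one action dict per document, the shape helpers.bulk accepts."""
    for doc in docs:
        yield {
            "_index": index,
            # Assumption: reuse the MySQL id so that a re-import overwrites
            # existing documents instead of creating duplicates.
            "_id": str(doc["nid"]),
            "_source": doc,
        }


def raw_bulk_body(index, docs):
    """Build the newline-delimited JSON body used by the low-level bulk API."""
    lines = []
    for doc in docs:
        # Action line first, then the document itself on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["nid"])}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Invented sample rows, not data from the article.
docs = [
    {"nid": 1, "name": "Sample Novel A", "author": "Author A", "intro": "first sample"},
    {"nid": 2, "name": "Sample Novel B", "author": "Author B", "intro": "second sample"},
]

actions = list(helper_actions("test1", docs))
body = raw_bulk_body("test1", docs)
print(actions[0])
print(body)
```

Passing a generator such as helper_actions(...) straight to the helper avoids building the whole action list in memory, which may matter for a large table.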

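After the import has run, the indexed documents can be checked directly in elasticsearch-head...

Next comes the simplest kind of loose-matching query, a multi_match: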

def search( index, query ):
    body = {
        "query":{
            "multi_match":{
                "query":query
            }
        }
    }

    result = es.search(index=index,body=body)
    for i in result['hits']['hits']:
        print( i )

Test it with search( "test1","斗陆")

if __name__ == '__main__':
    # data = getSQLData()
    # info = save2Dict( data )
    # save2ES("test1","books",info)
    search( "test1","斗陆")

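It works. score is the relevance score (the _score field of each hit); the higher it is, the better the match.

This looks much like the results of an everyday search engine, rather than the effect of a plain like %斗陆%. A LIKE pattern only matches rows that contain the exact contiguous substring, whereas the full-text query analyzes the input into terms and can also match text in which those terms are not adjacent.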
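
As a rough sketch of how the query above could be tuned, the multi_match query also accepts a fields list, and a field can be boosted with the ^ suffix so that a match in the title counts more than a match in the introduction. The helper names, the boost value 3, and the hand-written response are assumptions for illustration; only the multi_match body itself follows the query used in the article.

```python
def build_multi_match(query, fields=None):
    """Return a search body; without fields it equals the article's query."""
    multi_match = {"query": query}
    if fields:
        multi_match["fields"] = fields
    return {"query": {"multi_match": multi_match}}


def ranked_names(response):
    """Return (score, name) pairs from a search response, best match first."""
    hits = response["hits"]["hits"]
    pairs = [(hit["_score"], hit["_source"]["name"]) for hit in hits]
    # es already returns hits ordered by score by default; sorting again
    # just makes the ordering explicit for this stand-in response.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# "name^3" weights matches in the name field three times as heavily.
body = build_multi_match("斗陆", ["name^3", "author", "intro"])

# Hand-written stand-in for the output of es.search(...), not real data.
fake_response = {"hits": {"hits": [
    {"_score": 1.2, "_source": {"name": "Sample Novel B"}},
    {"_score": 4.7, "_source": {"name": "Sample Novel A"}},
]}}

for score, name in ranked_names(fake_response):
    print(score, name)
```

The body built here can be passed to es.search(index=index, body=body) in the same way as in the search function above.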

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

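The data used in this part differs from the data above, but that does not affect the result.

es also has a highlight feature. When it is used, each hit in the response gains a new part called highlight, which contains text from the matching fields with the matched words wrapped in <em></em> by default. The markers can be customized.

First, let us look at the search results of a novel site.

The search term is "剑", and the keyword is marked separately wherever it appears in the results.

We can achieve the same effect with es.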

def search( index, query ):
    body = {
      "query": {
        "multi_match": {
          "query": query
        }
      },
      "highlight": {
        "fields": {
          "*":{
            "pre_tags": ["<city style='color:red'>"],
            "post_tags": ["</city>"]
          }
        }
      }
    }

    result = es.search(index=index,body=body)
    return result['hits']["hits"]

"pre_tags": ["opening marker"]
"post_tags": ["closing marker"]

Result of the highlighted search:

A little post-processing then puts the hits into an HTML page.

def showResult( result ):
    # (label, field) pairs; "type" is not among the fields indexed earlier,
    # so a missing field falls back to an empty string instead of a KeyError
    fields = [("Title","name"),("Author","author"),("Introduction","intro"),("Type","type")]
    context = ""
    for data in result:
        highlight = data.get("highlight", {})
        source = data["_source"]
        for label, field in fields:
            # Prefer the highlighted fragment, fall back to the stored value
            value = highlight.get(field, [source.get(field, "")])[0]
            context += "<div>{}: {}</div>".format( label, value )
        context += "<br>"

    htmlContext = """
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="UTF-8">
    <title>Search test</title>
    </head>
    <body>
    {}
    </body>
    </html>
    """.format( context )

    # The context manager closes the file even if the write fails
    with open("text.html","w",encoding="utf8") as f:
        f.write( htmlContext )
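
One thing the page generation above does not cover is escaping: titles or introductions that contain characters such as < or & are written into the HTML unchanged. The sketch below is one possible way to handle that, under two assumptions that are not part of the original query: the highlight section is sent with "encoder": "html", so es escapes the fragment text before inserting the tags, and with "number_of_fragments": 0, so the whole field is returned rather than a short excerpt. The function names, labels, and sample hits are invented for illustration.

```python
import html

# (label, field) pairs to show for every hit.
FIELDS = [("Title", "name"), ("Author", "author"), ("Introduction", "intro")]


def highlight_options(pre_tag, post_tag):
    """Highlight section with escaped fragments and whole-field output."""
    return {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        "encoder": "html",          # escape fragment text, keep the tags
        "number_of_fragments": 0,   # return the entire field, highlighted
        "fields": {"*": {}},
    }


def pick(hit, field):
    """Return the highlighted value if present, else the escaped stored value."""
    fragments = hit.get("highlight", {}).get(field)
    if fragments:
        # Assumed to be already escaped by es because of "encoder": "html".
        return fragments[0]
    return html.escape(str(hit["_source"].get(field, "")))


def render_hits(hits):
    """Build the <div> lines for a list of hits."""
    parts = []
    for hit in hits:
        for label, field in FIELDS:
            parts.append("<div>{}: {}</div>".format(label, pick(hit, field)))
        parts.append("<br>")
    return "\n".join(parts)


opts = highlight_options("<em>", "</em>")

# Hand-written stand-ins for search hits, not real data.
fake_hits = [
    {
        "_source": {"name": "Sword <Legend>", "author": "Author A", "intro": "a sample"},
        "highlight": {"name": ["<em>Sword</em> &lt;Legend&gt;"]},
    },
    {"_source": {"name": "A & B", "author": "Author B"}},
]
print(render_hits(fake_hits))
```

The dictionary returned by highlight_options(...) would take the place of the "highlight" value in the search body; the second sample hit shows the fallback path for a hit without any highlight section.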

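With that, we have a search feature modeled on a novel site. The query capabilities of es are very rich, and only the most basic part is used here; for the details, the official documentation is the place to look.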
