【Boost Search Engine】

Latest recommended article published 2023-09-12 12:19:02

桑榆非晚ᴷ

Column: Hands-on Projects | Tags: search engine

Article link: https://blog.csdn.net/langk_/article/details/129221470

🎉 Hands-on project: Boost search engine

Author's homepage: 桑榆非晚ᴷ

My abilities are limited. If you find any mistakes, please don't hesitate to point them out.

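A word of encouragement to myself: there is no fast lane to success and no highway to happiness. All success comes from tireless effort and running, and all happiness comes from ordinary striving and persistence 🥰🎉✨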

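1. Project Background and Goals

(1) In today's information age, many companies already run their own search engines. The best known are Baidu, Sogou and 360 Search. These engines are huge, with a high technical barrier and a high resource cost, so building one ourselves is out of reach. What we can build is an in-site search engine, like the site search on cplusplus.com that we use all the time. Site search is more vertical, meaning it focuses on a single domain, and it works on much less data.

(2) Boost is C++'s quasi-standard library and is used heavily in C++ code. Yet its official website has no site search, which makes it hard for users to find things quickly.

Our Boost search engine project therefore provides search over the resources of the official Boost library.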

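2. How a Search Engine Works: The Big Picture

(1) A crawler fetches the relevant HTML pages from across the web and stores them on the server's disk.

(2) The HTML files are stripped of their tags and cleaned, keeping only the main information of each page (title, content, url).

(3) An index is built over the cleaned data to make later lookups fast.

(4) The client sends an HTTP request from the browser, and the server looks up the matching pages' main information in the index.

(5) The (title + content + url) information of several pages is assembled into a new HTML page and returned to the user.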

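PS: Crawling is constrained by legal, technical and other factors. For now we therefore only cover the official Boost website, and we get the Boost files through the official download channel. We use version boost_1_81_0.

Boost download link: https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/
This project uses boost_1_81_0.tar.gz.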

3. Tech Stack and Project Environment

Tech stack: C/C++, STL, the quasi-standard library Boost, Jsoncpp, cppjieba, cpp-httplib, HTML5, CSS, JS, jQuery, Ajax
Project environment: CentOS 7 cloud server, vim/gcc(g++)/git/Makefile, VSCode

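4. Forward Index && Inverted Index: The Core Principle of a Search Engine

Document 1: 雷军买了四斤小米 (Lei Jun bought four jin of millet)
Document 2: 雷军发布了小米手机 (Lei Jun launched the Xiaomi phone)

(1) Forward index: find a document's content from its document ID

Document ID	Document content
1	雷军买了四斤小米
2	雷军发布了小米手机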

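(2) Document tokenization: split each target document into words. This makes it easier to build the inverted index and to search it.

Document 1 [雷军买了四斤小米]: 雷军/买/四斤/小米/四斤小米
Document 2 [雷军发布了小米手机]: 雷军/发布/小米/小米手机

PS: Stop words such as "了", "从", "吗", "the" and "a" are left out when we tokenize.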

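(3) Inverted index: tokenize the document contents, collect the distinct keywords, and map each keyword to the IDs of the documents that contain it.

Keyword (unique)	Document IDs (each entry also carries a weight, covered later)
雷军	Document 1, Document 2
买	Document 1
四斤	Document 1
小米	Document 1, Document 2
四斤小米	Document 1
发布	Document 2
小米手机	Document 2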

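(4) Simulating one lookup:

User enters 小米 -> look it up in the inverted index -> extract document IDs {1,2} -> use the forward index to find the document contents -> summarize each title+content+url result -> build the response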
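The lookup just described can be tried out with a small, self-contained C++ sketch. It is not the project's Index class, and it makes several simplifications. The documents are assumed to be tokenized already, so cppjieba is not involved. Stop words are filtered with a tiny hard-coded set. The names ToyIndex, AddDocument and Lookup are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy index:
//   forward  : document ID (vector position) -> document content
//   inverted : word -> IDs of the documents containing that word
struct ToyIndex {
    std::vector<std::string> forward;
    std::unordered_map<std::string, std::vector<uint64_t>> inverted;

    // Adds one pre-tokenized document and returns its ID.
    uint64_t AddDocument(const std::string &content, const std::vector<std::string> &words) {
        static const std::unordered_set<std::string> kStopWords = {"了", "从", "吗", "the", "a"};
        uint64_t id = forward.size();  // the ID is the position in the forward index
        forward.push_back(content);
        std::unordered_set<std::string> seen;  // list a document at most once per word
        for (const std::string &w : words) {
            if (kStopWords.count(w) != 0 || !seen.insert(w).second) continue;
            inverted[w].push_back(id);
        }
        return id;
    }

    // Inverted lookup (word -> IDs), then forward lookup (ID -> content).
    std::vector<std::string> Lookup(const std::string &word) const {
        std::vector<std::string> hits;
        auto it = inverted.find(word);
        if (it == inverted.end()) return hits;
        for (uint64_t id : it->second) hits.push_back(forward[id]);
        return hits;
    }
};
```

Document IDs start at 0 here because they are vector positions, as in the project's forward_index. The tables above number the documents from 1.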

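5. Writing the Tag-Stripping and Data-Cleaning Module `Parser`

(1) Get the Boost resources: download them from the official site https://www.boost.org (we use 1.81.0 as the example). The HTML pages under boost_1_81_0/doc/html go into data/input. ParseUrl below maps that directory back to the official doc/html URLs.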

(2) Skeleton of Parser.cc

#include <iostream>
#include <string>
#include <vector>

// A directory that holds all the HTML pages
const std::string src_path = "data/input";
// Output file for the tag-stripped documents
const std::string output = "data/raw_html/raw.txt";

typedef struct DocInfo
{
    std::string title;   // document title
    std::string content; // document content
    std::string url;     // URL of the document on the official site
}DocInfo_t;

// Parameter conventions:
// const & : input parameter
// *       : output parameter
// &       : input/output parameter

bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list);

bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results);

bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output);


int main()
{ 
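    // Sequential container holding each HTML file name together with its path
    std::vector<std::string> files_list;
    // Step 1: recursively collect every HTML file name (with path) into files_list, so the files can be read one by one later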
    if(!EnumFile(src_path, &files_list))
    {
        std::cerr << "enum file name error!" << std::endl;
        return 1;
    }
        
    // Step 2: read each file listed in files_list and parse its content
    std::vector<DocInfo_t> results;
    if(!ParseHtml(files_list, &results))
    {
        std::cerr << "parse html error!" <<std::endl;
        return 2;
    }

    // Step 3: write the parsed documents to output, one document per line, with \3 separating the fields of a document
    // For example:
    // title\3content\3url\ntitle\3content\3url\ntitle\3content\3url\n
    if(!SaveHtml(results, output))
    {
        std::cerr << "save html error!" << std::endl;
        return 3;
    }

    return 0;
}

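(3) Implementing EnumFile:

EnumFile has to collect the path of every HTML page under the data/input/ directory. We use the interfaces of the Boost library for this.

Keep one distinction in mind: the Boost version whose documentation we provide site search for is 1.81.0, while the Boost library our code is built against is version 1.53.0.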

Install Boost on the cloud server: sudo yum install -y boost-devel

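(4) Using the Boost library: include <boost/filesystem.hpp>, use boost::filesystem::path, exists, is_regular_file and recursive_directory_iterator as EnumFile does below, and link with -lboost_system -lboost_filesystem.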

EnumFile implementation:

bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list)
{
    namespace fs = boost::filesystem;
    // Create a path object
    fs::path root_path(src_path);
    // Check that the path exists; if it does not, there is no point going further
    if(!fs::exists(root_path))
    {
        std::cerr << src_path << " does not exist" << std::endl;
        return false;
    }

    // A default-constructed iterator marks the end of the recursion
    fs::recursive_directory_iterator end;
    for(fs::recursive_directory_iterator iter(root_path); iter != end; iter++)
    {
        // Skip anything that is not a regular file (the HTML pages are all regular files)
        if(!fs::is_regular_file(*iter))
        {
            continue;
        }
        // Skip files whose extension is not .html
        if(iter->path().extension() != ".html")
        {
            continue;
        }
        // std::cout << "debug: " << iter->path().string() << std::endl;
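        // Reaching this point means the path is a regular web-page file ending in .html
        
        // Save every HTML path into files_list for the later text analysis
        // iter->path() is still a path object; we need the file name with its path as a std::string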
        files_list->push_back(iter->path().string());
    }
    return true;
}
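If a C++17 compiler is available, std::filesystem offers the same recursive walk without Boost. This is a swapped-in alternative and not what the project does, since the project uses boost::filesystem from Boost 1.53.0. The function name EnumHtmlFiles is hypothetical.

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace stdfs = std::filesystem;

// Hypothetical C++17 counterpart of EnumFile: recursively collects every
// regular file ending in ".html" under src_path.
bool EnumHtmlFiles(const std::string &src_path, std::vector<std::string> *files_list) {
    stdfs::path root(src_path);
    if (!stdfs::exists(root)) {
        std::cerr << src_path << " does not exist" << std::endl;
        return false;
    }
    for (const stdfs::directory_entry &entry : stdfs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;             // skip directories and special files
        if (entry.path().extension() != ".html") continue;  // keep only .html pages
        files_list->push_back(entry.path().string());
    }
    return true;
}
```

With GCC 8 you may also need to link -lstdc++fs. Newer GCC versions do not need it.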

ParseHtml implementation:

bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results)
{
    // files_list holds the paths of files ending in .html
    for(const std::string &file : files_list)
    {
        // 1. Read the file
        std::string result; // the file's content is stored in result
        if(!ns_util::FileUtil::ReadFile(file, &result))
        {
            // If this .html file cannot be read, skip parsing it and move on to the next one
            continue;
        }
		
        // Reminder: the structure being filled
        // typedef struct DocInfo
        // {
        //     std::string title;   // document title
        //     std::string content; // document content
        //     std::string url;     // URL of the document on the official site
        // }DocInfo_t;
        
        DocInfo_t doc;
        // 2. Parse the file and extract the title
        if(!ParseTitle(result, &doc.title))
        {
            continue;
        }
        // 3. Parse the file and extract the content
        if(!ParseContent(result, &doc.content))
        {
            continue;
        }
        // 4. Build the URL from the file path
        if(!ParseUrl(file, &doc.url))
        {
            continue;
        }

        // done
        results->push_back(std::move(doc)); // push_back(doc) would copy all three strings; C++11 std::move moves them instead
    }

    return true;
}

ReadFile implementation (ns_util::FileUtil in util.hpp):

#pragma once
#include <iostream>
#include <string>
#include <fstream>

namespace ns_util
{
    class FileUtil
    {
    public:
        static bool ReadFile(const std::string &file_path, std::string *out)
        {
            // File I/O in C++
            std::ifstream in(file_path, std::ios::in);
            if(!in.is_open())
            {
                std::cerr << "open file " << file_path << " error" << std::endl;
                return false;
            }

            std::string line;
            while(std::getline(in, line)) // getline returns the stream by reference; its explicit operator bool becomes false once a read fails (e.g. at end of file), which ends the loop
            {
                *out += line;
            }

            in.close();
            return true;
        }
    };
}

ParseTitle implementation:

static bool ParseTitle(const std::string &result, std::string *title)
{
    size_t begin = result.find("<title>");
    if(begin == std::string::npos)
    {
        return false;
    }
    size_t end = result.find("</title>");
    if(end == std::string::npos)
    {
        return false;
    }

    begin += std::string("<title>").size();
    if(begin > end)
    {
        return false;
    }

    *title = result.substr(begin, end - begin);
    return true;
}

ParseContent implementation:

static bool ParseContent(const std::string &result, std::string *content) 
{
   // Strip tags with a simple state machine
   enum status 
   {
        LABLE,
        CONTENT
   };

   enum status s = LABLE;
   for(char c :result)
   {
       switch(s)
       {
           case LABLE:
               {
                    if(c == '>')
                       s = CONTENT;
                    break;
               }
           case CONTENT:
               {
                    if(c == '<')
                        s = LABLE;
                    else 
                    {
                        // Drop the original '\n' characters: '\n' is reserved as the document separator in the parsed output
                        if(c == '\n')
                        {
                            c = ' ';
                        }
                        content->push_back(c);
                    }

                    break;
               }
            default:
               break;
       }
   }
    return true;
}
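The sketch below shows what the two-state machine actually produces on a tiny input. It is the same logic written as a free function. The name StripTags is hypothetical, and the logic mirrors ParseContent, including its starting state.

```cpp
#include <string>

// Same two-state logic as ParseContent: characters inside a tag (LABEL) are
// dropped; characters in text (CONTENT) are kept, with '\n' turned into ' '.
std::string StripTags(const std::string &html) {
    enum class State { LABEL, CONTENT };
    State s = State::LABEL;  // like ParseContent, start as if inside a tag
    std::string out;
    for (char c : html) {
        if (s == State::LABEL) {
            if (c == '>') s = State::CONTENT;  // the current tag just closed
        } else if (c == '<') {
            s = State::LABEL;                  // a new tag starts
        } else {
            out.push_back(c == '\n' ? ' ' : c);
        }
    }
    return out;
}
```

For example, StripTags("<title>Hi</title>\n<p>a<b>b</b></p>") yields "Hi ab". Text in adjacent tags is glued together, and anything before the first '>' is dropped (StripTags("a<b>c") is "c"). This is acceptable for indexing, but worth knowing.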

ParseUrl implementation:

static bool ParseUrl(const std::string &file_path, std::string *url)
{
    // Join the official site's URL prefix with the path relative to src_path
    std::string url_head = "https://www.boost.org/doc/libs/1_81_0/doc/html";
    // src_path = "data/input"
    // file_path = "data/input/*.html"
    std::string url_tail = file_path.substr(src_path.size());

    *url = url_head + url_tail;
    
    return true;
}
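Concretely, the mapping is plain string concatenation: strip the local prefix and append the rest to the official URL. The sketch below differs from the project's code in a few ways. It takes src_path as a parameter instead of reading the global constant, and it returns an empty string for paths outside src_path rather than assuming the prefix matches. LocalPathToUrl is a hypothetical name, and accumulators.html is only an illustrative file name.

```cpp
#include <string>

// Hypothetical stand-alone version of ParseUrl's mapping.
// Returns "" when file_path does not live under src_path.
std::string LocalPathToUrl(const std::string &src_path, const std::string &file_path) {
    const std::string url_head = "https://www.boost.org/doc/libs/1_81_0/doc/html";
    if (file_path.compare(0, src_path.size(), src_path) != 0) {
        return "";  // not under src_path: no sensible URL
    }
    // "data/input/xxx.html" -> "/xxx.html", appended to the official prefix
    return url_head + file_path.substr(src_path.size());
}
```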

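SaveHtml implementation:

SaveHtml writes the parsed (tag-stripped) documents held in std::vector<DocInfo_t> results to the file data/raw_html/raw.txt on disk. Each document goes on its own line in the format title\3content\3url\n.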

bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output)
{
    const char SEP = '\3';
    std::ofstream out(output, std::ios::out | std::ios::binary);
    if(!out.is_open())
    {
        std::cerr << "open " << output << " failed!" << std::endl;
        return false;
    }

    // Now write the documents to the file
    for(const DocInfo_t &item : results)
    {
        std::string out_string;
        out_string = item.title;
        out_string += SEP;
        out_string += item.content;
        out_string += SEP;
        out_string += item.url;
        out_string += '\n';
        
        out.write(out_string.c_str(), out_string.size());
    }
    out.close();
    return true;
}

Each line of the resulting raw.txt holds one document in the form title\3content\3url\n.

Final Parser.cc:

#include <iostream>
#include <string>
#include <vector>
#include <boost/filesystem.hpp>
#include "util.hpp"

// A directory that holds all the HTML pages
const std::string src_path = "data/input";
// Output file for the tag-stripped documents
const std::string output = "data/raw_html/raw.txt";

typedef struct DocInfo
{
    std::string title;   // 文档标题
    std::string content; // 文档内容
    std::string url;     // 该文档在官网中的url
}DocInfo_t;

// const & : input
// *       : output
// &       : input/output

bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list);

bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results);

bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output);


int main()
{ 
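    // Sequential container holding each HTML file name together with its path
    std::vector<std::string> files_list;
    // Step 1: recursively collect every HTML file name (with path) into files_list, so the files can be read one by one later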
    if(!EnumFile(src_path, &files_list))
    {
        std::cerr << "enum file name error!" << std::endl;
        return 1;
    }
    
    
    // Step 2: read each file listed in files_list and parse its content
    std::vector<DocInfo_t> results;
    if(!ParseHtml(files_list, &results))
    {
        std::cerr << "parse html error!" <<std::endl;
        return 2;
    }

    // Step 3: write the parsed documents to output, one per line, with \3 separating the fields of a document
    if(!SaveHtml(results, output))
    {
        std::cerr << "save html error!" << std::endl;
        return 3;
    }

    return 0;
}

bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list)
{
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);
    // Check that the path exists; if it does not, there is no point going further
    if(!fs::exists(root_path))
    {
        std::cerr << src_path << " does not exist" << std::endl;
        return false;
    }

    // A default-constructed iterator marks the end of the recursion
    fs::recursive_directory_iterator end;
    for(fs::recursive_directory_iterator iter(root_path); iter != end; iter++)
    {
        // Skip anything that is not a regular file (the HTML pages are all regular files)
        if(!fs::is_regular_file(*iter))
        {
            continue;
        }
        // Skip files whose extension is not .html
        if(iter->path().extension() != ".html")
        {
            continue;
        }
        // std::cout << "debug: " << iter->path().string() << std::endl;
        // The path is a valid regular web-page file ending in .html
        
        // Save every HTML path into files_list for the later text analysis
        // iter->path() is still a path object; we need a std::string
        files_list->push_back(iter->path().string());
    }
    return true;
}

static bool ParseTitle(const std::string &result, std::string *title)
{
    size_t begin = result.find("<title>");
    if(begin == std::string::npos)
    {
        return false;
    }
    size_t end = result.find("</title>");
    if(end == std::string::npos)
    {
        return false;
    }

    begin += std::string("<title>").size();
    if(begin > end)
    {
        return false;
    }

    *title = result.substr(begin, end - begin);
    return true;
}

static bool ParseContent(const std::string &result, std::string *content) 
{
   // Strip tags with a simple state machine
   enum status 
   {
        LABLE,
        CONTENT
   };

   enum status s = LABLE;
   for(char c :result)
   {
       switch(s)
       {
           case LABLE:
               {
                    if(c == '>')
                       s = CONTENT;
                    break;
               }
           case CONTENT:
               {
                    if(c == '<')
                        s = LABLE;
                    else 
                    {
                        // Drop the original '\n' characters: '\n' is reserved as the document separator in the parsed output
                        if(c == '\n')
                        {
                            c = ' ';
                        }
                        content->push_back(c);
                    }

                    break;
               }
            default:
               break;
       }
   }
    return true;
}

static bool ParseUrl(const std::string &file_path, std::string *url)
{
    std::string url_head = "https://www.boost.org/doc/libs/1_81_0/doc/html";
    std::string url_tail = file_path.substr(src_path.size());

    *url = url_head + url_tail;
    
    return true;
}

void ShowDoc(const DocInfo_t &doc)
{
    std::cout << "title: " << doc.title << std::endl; 
    std::cout << "content: " << doc.content << std::endl; 
    std::cout << "url: " << doc.url << std::endl; 
}

bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results)
{
    for(const std::string &file : files_list)
    {
        // 1. Read the file
        std::string result; // the file's content is stored in result
        if(!ns_util::FileUtil::ReadFile(file, &result))
        {
            continue;
        }

        DocInfo_t doc;
        // 2. Parse the file and extract the title
        if(!ParseTitle(result, &doc.title))
        {
            continue;
        }
        // 3. Parse the file and extract the content
        if(!ParseContent(result, &doc.content))
        {
            continue;
        }
        // 4. Build the URL from the file path
        if(!ParseUrl(file, &doc.url))
        {
            continue;
        }

        // done
        results->push_back(std::move(doc)); // move instead of copying the three strings
        
        // for debug (ShowDoc must run before the push_back above, because doc is moved-from afterwards)
        // ShowDoc(doc);
        // break;
    }

    return true;
}

bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output)
{
    const char SEP = '\3';
    std::ofstream out(output, std::ios::out | std::ios::binary);
    if(!out.is_open())
    {
        std::cerr << "open " << output << " failed!" << std::endl;
        return false;
    }

    // Now write the documents to the file
    for(auto &item : results)
    {
        std::string out_string;
        out_string = item.title;
        out_string += SEP;
        out_string += item.content;
        out_string += SEP;
        out_string += item.url;
        out_string += '\n';
        
        out.write(out_string.c_str(), out_string.size());
    }
    out.close();
    return true;
}

6. Writing the Indexing Module `Index`

(1) Skeleton of Index.hpp

#pragma once 
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <cstdint>

namespace ns_index 
{
    // Basic information stored in the forward index: document ID -> document info
    struct DocInfo
    {
        std::string title;   // document title
        std::string content; // document content
        std::string url;     // URL of the document on the official site
        uint64_t doc_id;     // document ID
    };
    // Basic information stored in the inverted index: keyword -> document ID, keyword weight
    struct InvertedElem 
    {
        uint64_t doc_id; // document ID
        std::string word;// search keyword
        int weight;      // weight, explained in detail later
    };
    
    // Inverted list (posting list)
    typedef std::vector<InvertedElem> InvertedList;
    
    class Index
    {
    public:
        Index(){}
        ~Index(){}
        
        // Build the document index
        // Build the forward and inverted indexes from the tag-stripped, formatted documents
        // input = data/raw_html/raw.txt
        bool BuildIndex(const std::string &input) 
        {
            return true;
        }
        
        // Find a document's content by doc_id
        DocInfo *GetForwardIndex(uint64_t doc_id)
        {
            return nullptr;
        }

        // Get the inverted list for a keyword
        InvertedList *GetInveretList(const std::string &word)
        {
            return nullptr;
        }
    private:
        // The forward index is an array: the array index is the document ID
        std::vector<DocInfo> forward_index;
        // The inverted index maps each keyword to a group of (one or more) InvertedElem
        std::unordered_map<std::string, InvertedList> inverted_index;
    };
}

BuildIndex implementation:

bool BuildIndex(const std::string &input) 
{
    std::ifstream in(input, std::ios::in | std::ios::binary);
    if(!in.is_open())
    {
        std::cerr << "sorry, " << input << " open error" << std::endl;
        return false;
    }
    std::string line;
    int count = 0;
    while(std::getline(in, line))
    {
        DocInfo *doc = BuildForwardIndex(line); // returns the new forward-index entry so its inverted index can be built next
        if(nullptr == doc)
        {
            std::cerr << "build " << line << " error" << std::endl; // for debug
            continue; 
        }

        BuildInvertedIndex(*doc);
        count++;
        if(count % 50 == 0)
            std::cout << "Documents indexed so far: " << count << std::endl;
    }

    in.close();
    return true;
}

BuildForwardIndex implementation:

Splitting strings with boost::split

Example: a quick look at how boost::split splits a string
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

const std::string line = "####################";
int main()
{
    std::string src_str = "we may lose,we oftern lose,howervr,,,we never say die";
    std::vector<std::string> results1, results2, results3;
    std::string sep = ",";

    boost::split(results1, src_str, boost::is_any_of(","));
    for(auto &str : results1)
    {
        std::cout << str << std::endl;
    }
    std::cout << line << std::endl;

    boost::split(results2, src_str, boost::is_any_of(","), boost::token_compress_off);
    for(auto &str : results2)
    {
        std::cout << str << std::endl;
    }
    std::cout << line << std::endl;

    boost::split(results3, src_str, boost::is_any_of(","), boost::token_compress_on);
    for(auto &str : results3)
    {
        std::cout << str << std::endl;
    }
    std::cout << line << std::endl;
    
    return 0;
}

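Running the example shows how the two modes differ. boost::token_compress_off does not merge adjacent separators matched by boost::is_any_of. Where three commas appear in a row, the two empty strings between them are also split out and pushed into the std::vector<std::string>. boost::token_compress_on merges such a run into a single separator, so no empty strings are pushed. Finally, calling boost::split without the fourth argument behaves like boost::token_compress_off, which is the default.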
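The difference between the two modes is easy to reproduce without Boost. The sketch below is a standard-library-only stand-in and not Boost's implementation; SplitAny is a hypothetical name. It splits on any character in delims. When compress is true, it treats a run of adjacent delimiters as one, as token_compress_on does.

```cpp
#include <string>
#include <vector>

// Stand-in for boost::split(out, s, boost::is_any_of(delims), mode).
// compress == true mimics token_compress_on: a run of adjacent delimiters
// causes one split instead of producing empty tokens in between.
std::vector<std::string> SplitAny(const std::string &s, const std::string &delims, bool compress) {
    std::vector<std::string> out;
    std::string cur;
    bool prev_was_delim = false;
    for (char c : s) {
        if (delims.find(c) != std::string::npos) {
            // Emit the current token unless we are inside a compressed delimiter run.
            if (!(compress && prev_was_delim)) {
                out.push_back(cur);
                cur.clear();
            }
            prev_was_delim = true;
        } else {
            cur.push_back(c);
            prev_was_delim = false;
        }
    }
    out.push_back(cur);  // the last token (possibly empty)
    return out;
}
```

The mode matters for the \3-separated records. With compression on, a record with an empty content field such as "title\3\3url" yields two pieces instead of three.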

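Note: boost::split (Boost.StringAlgo) is header-only, so this example needs no extra libraries. Programs that use boost::filesystem, such as Parser.cc, must be linked with -lboost_system -lboost_filesystem.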


DocInfo *BuildForwardIndex(const std::string &line)
{
    // 1. Parse line: split the string
    const std::string sep = "\3";
    std::vector<std::string> results;
    ns_util::StringUtil::Split(line, &results, sep);
    if(3 != results.size())
    {
        return nullptr;
    }
    // 2. Fill the DocInfo with the pieces
    DocInfo doc;
    doc.title = results[0];
    doc.content = results[1];
    doc.url = results[2];
    doc.doc_id = forward_index.size(); // record the id before inserting: it equals doc's index in the vector
    // 3. Append to the forward-index vector
    forward_index.push_back(std::move(doc));

    return &forward_index.back(); // valid only until the next push_back, which may reallocate the vector
}

BuildInvertedIndex implementation:

Installing and using the Jieba library
We get the cppjieba word-segmentation tool from GitHub (link below):
cppjieba download link

bool BuildInvertedIndex(const DocInfo &doc)
{
    // Counts how often a keyword appears in the title and in the content
    struct word_cnt
    {
        int title_cnt = 0;
        int content_cnt = 0;
    };
 	
    // Temporary word-frequency table: keyword -> word_cnt
    std::unordered_map<std::string, word_cnt> word_map; 
 
    // Tokenize the title with jieba; the words go into title_words
    std::vector<std::string> title_words;
    ns_util::JiebaUtil::CutString(doc.title, &title_words); 

    for(std::string s : title_words)
    {
        boost::to_lower(s); // normalize every word to lower case
        word_map[s].title_cnt++; // operator[] returns the existing entry or creates a new one
    }

    // Tokenize the content with jieba; the words go into content_words
    std::vector<std::string> content_words;
    ns_util::JiebaUtil::CutString(doc.content, &content_words);
    
    for(std::string s :content_words)
    {
        boost::to_lower(s); // normalize every word to lower case
        word_map[s].content_cnt++;
    }

    // Relevance weight: each title hit counts X, each content hit counts Y
    const int X = 10;
    const int Y = 1;

    for(auto &word_pair : word_map)
    {
        InvertedElem item;
        item.doc_id = doc.doc_id;
        item.word = word_pair.first;
        item.weight = X * word_pair.second.title_cnt + Y * word_pair.second.content_cnt; // relevance
        InvertedList &inverted_list = inverted_index[word_pair.first]; // operator[] inserts an empty list if the word is not there yet
        inverted_list.push_back(std::move(item));
    }

    return true;
}
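The weighting rule is weight = X * title_cnt + Y * content_cnt with X = 10 and Y = 1, so one hit in the title outweighs nine hits in the content. The sketch below reproduces only that counting step, under three assumptions. The title and content are already tokenized, since the project uses cppjieba, which is not needed here. Lower-casing uses std::tolower instead of boost::to_lower and is therefore ASCII only. ComputeWeights is a hypothetical name.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: word -> weight for one document, using the same rule
// as BuildInvertedIndex (X = 10 per title hit, Y = 1 per content hit).
std::unordered_map<std::string, int> ComputeWeights(const std::vector<std::string> &title_words,
                                                    const std::vector<std::string> &content_words) {
    const int X = 10;
    const int Y = 1;
    struct WordCnt { int title_cnt = 0; int content_cnt = 0; };
    std::unordered_map<std::string, WordCnt> word_map;

    // ASCII-only lower-casing, standing in for boost::to_lower
    auto lower = [](std::string s) {
        for (char &c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    };
    for (const std::string &w : title_words) word_map[lower(w)].title_cnt++;
    for (const std::string &w : content_words) word_map[lower(w)].content_cnt++;

    std::unordered_map<std::string, int> weights;
    for (const auto &kv : word_map)
        weights[kv.first] = X * kv.second.title_cnt + Y * kv.second.content_cnt;
    return weights;
}
```

For a title "Boost Split" and content words split/SPLIT/string, "split" scores 10 * 1 + 2 = 12, "boost" scores 10 and "string" scores 1.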

GetForwardIndex implementation:

DocInfo *GetForwardIndex(uint64_t doc_id)
{
    if(doc_id >= forward_index.size())
    {
        std::cerr << "doc_id out range, error!" << std::endl;
        return nullptr;
    }
    return &forward_index[doc_id];
}

GetInveretList implementation:

InvertedList *GetInveretList(const std::string &word)
{
    auto iter = inverted_index.find(word); 
    if(iter == inverted_index.end())
    {
        std::cerr << word << " have no InvertedList" << std::endl;
        return nullptr;
    }
    return &(iter->second);
}

Final Index.hpp (singleton, lazy initialization):

#pragma once 
#include <iostream>
#include <string>
#include <vector>
#include <mutex>
#include <fstream>
#include <unordered_map>
#include <cstdint>
#include "util.hpp"
namespace ns_index 
{
    struct DocInfo
    {
        std::string title;
        std::string content;
        std::string url;
        uint64_t doc_id; // document ID
    };

    struct InvertedElem 
    {
        uint64_t doc_id;
        std::string word;
        int weight;
    };
    // 倒排拉链
    typedef std::vector<InvertedElem> InvertedList;
    
    class Index
    {
    public:
        ~Index(){}
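        static Index* GetInstance()
        {
            // Double-checked locking: take the lock only while the instance may still be missing
            if(nullptr == instance)
            {
                std::lock_guard<std::mutex> lock(mtx); // unlocked automatically, even if new throws
                if(nullptr == instance)
                {
                    instance = new Index();
                }
            }
            
            return instance;
        }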
        // Find a document's content by doc_id
        DocInfo *GetForwardIndex(uint64_t doc_id)
        {
            if(doc_id >= forward_index.size())
            {
                std::cerr << "doc_id out range, error!" << std::endl;
                return nullptr;
            }
            return &forward_index[doc_id];
        }

        // Get the inverted list for a keyword
        InvertedList *GetInveretList(const std::string &word)
        {
            auto iter = inverted_index.find(word); // find() returns end() when the word is not indexed
            if(iter == inverted_index.end())
            {
                std::cerr << word << " have no InvertedList" << std::endl;
                return nullptr;
            }
            return &(iter->second);
        }

        // Build the document index
        // Build the forward and inverted indexes from the tag-stripped, formatted documents
        // data/raw_html/raw.txt
        bool BuildIndex(const std::string &input) // input: the data produced by Parser
        {
            std::ifstream in(input, std::ios::in | std::ios::binary);
            if(!in.is_open())
            {
                std::cerr << "sorry, " << input << " open error" << std::endl;
                return false;
            }
            std::string line;
            int count = 0;
            while(std::getline(in, line))
            {
                DocInfo *doc = BuildForwardIndex(line);
                if(nullptr == doc)
                {
                    std::cerr << "build " << line << " error" << std::endl; // for debug
                    continue; 
                }

                BuildInvertedIndex(*doc);
                count++;
                if(count % 50 == 0)
                    std::cout << "Documents indexed so far: " << count << std::endl;
            }

            in.close();
            return true;
        }

    private:
        DocInfo *BuildForwardIndex(const std::string &line)
        {
            // 1. Parse line: split the string
            const std::string sep = "\3";
            std::vector<std::string> results;
            ns_util::StringUtil::Split(line, &results, sep);
            if(3 != results.size())
            {
                return nullptr;
            }
            // 2. Fill the DocInfo with the pieces
            DocInfo doc;
            doc.title = results[0];
            doc.content = results[1];
            doc.url = results[2];
            doc.doc_id = forward_index.size(); // record the id before inserting: it equals doc's index in the vector
            // 3. Append to the forward-index vector
            forward_index.push_back(std::move(doc));

            return &forward_index.back();
        }

       bool BuildInvertedIndex(const DocInfo &doc)
       {
           struct word_cnt
           {
               int title_cnt = 0;
               int content_cnt = 0;
           };
           
           std::unordered_map<std::string, word_cnt> word_map; // temporary word-frequency table
           std::vector<std::string> title_words;
           ns_util::JiebaUtil::CutString(doc.title, &title_words); 
           
           for(std::string s : title_words)
           {
               boost::to_lower(s); // normalize every word to lower case
               word_map[s].title_cnt++; // operator[] returns the existing entry or creates a new one
           }

           std::vector<std::string> content_words;
           ns_util::JiebaUtil::CutString(doc.content, &content_words);
           for(std::string s :content_words)
           {
               boost::to_lower(s); // normalize every word to lower case
               word_map[s].content_cnt++;
           }

           // Relevance weight: each title hit counts X, each content hit counts Y
           const int X = 10;
           const int Y = 1;

           for(auto &word_pair : word_map)
           {
               InvertedElem item;
               item.doc_id = doc.doc_id;
               item.word = word_pair.first;
               item.weight = X * word_pair.second.title_cnt + Y * word_pair.second.content_cnt; // relevance
               InvertedList &inverted_list = inverted_index[word_pair.first]; // operator[] inserts an empty list if the word is not there yet
               inverted_list.push_back(std::move(item));
           }
           
           return true;
       }
    private:
        // The forward index is an array: the array index is the document ID
        std::vector<DocInfo> forward_index;
        // The inverted index maps each keyword to a group of (one or more) InvertedElem
        std::unordered_map<std::string, InvertedList> inverted_index;        
    private: 
        Index(){}
        Index(const Index&) = delete;
        Index& operator=(const Index&) = delete;

        static Index* instance;
        static std::mutex mtx;
    };
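    // Static members defined in the header: fine as long as a program has only one .cc file that includes it
    Index* Index::instance = nullptr;
    std::mutex Index::mtx;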
}
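Since C++11, a function-local static is initialized exactly once, even when several threads call the function concurrently. A lazy singleton can therefore be written without a mutex. This form also avoids a problem with the double-checked version above. That version reads instance outside the lock while another thread may be writing it, which the C++ memory model counts as a data race unless the pointer is atomic. The class below is a hypothetical stand-in (LazyIndex) that illustrates the pattern. It is not a drop-in replacement for ns_index::Index.

```cpp
#include <string>
#include <vector>

// Hypothetical lazily created singleton using a function-local static.
class LazyIndex {
public:
    static LazyIndex &GetInstance() {
        static LazyIndex instance;  // constructed on the first call; thread-safe since C++11
        return instance;
    }
    LazyIndex(const LazyIndex &) = delete;
    LazyIndex &operator=(const LazyIndex &) = delete;

    std::vector<std::string> docs;  // placeholder for the real index data

private:
    LazyIndex() = default;
};
```

Every call to LazyIndex::GetInstance() returns the same object, and it is destroyed automatically at program exit.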

7. Writing the Search Module `Searcher`

(1) Skeleton of searcher.hpp

#pragma once 
#include "index.hpp"

namespace ns_searcher 
{
    struct InvertedElemPrint 
    {
        uint64_t doc_id = 0;
        int weight = 0;
        std::vector<std::string> words;

    };
    class Searcher
    {
    public:
        Searcher() {}
        ~Searcher() {}

        void InitSearcher(const std::string & input) 
        {
            // 1. Get (or create) the index object
            
            // 2. Build the indexes through the index object
            
        }
        
        // query: the search keywords
        // json_string: the search results returned to the user's browser
        
        void Search(const std::string &query, std::string *json_string)
        {
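            // 1.[Tokenize]: split the query into words as the searcher requires
            // 2.[Trigger]: look up each word in the index
          
            // 3.[Merge & sort]: aggregate the results and sort them by relevance in descending order
           
            // 4.[Build]: build a JSON string from the results with jsoncpp (jsoncpp handles serialization and deserialization)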
         
        }
    
    private:
        ns_index::Index *index; // the index the system searches
    };
}
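Steps 2 and 3 of this skeleton (trigger, then merge and sort) come down to three actions. For every query word, walk its posting list. Accumulate the weight per doc_id, so that a document hit by several words appears only once. Then sort by total weight in descending order. The sketch below shows only that step, using plain structs. Posting, Hit and MergeAndRank are hypothetical names, and the postings are passed in directly instead of coming from ns_index::Index.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Posting { uint64_t doc_id; int weight; };  // one entry of a posting list
struct Hit { uint64_t doc_id = 0; int weight = 0; std::vector<std::string> words; };

// postings: word -> posting list; words missing from the index are simply absent.
std::vector<Hit> MergeAndRank(const std::vector<std::string> &query_words,
                              const std::unordered_map<std::string, std::vector<Posting>> &postings) {
    std::unordered_map<uint64_t, Hit> by_doc;  // doc_id -> merged hit
    for (const std::string &word : query_words) {
        auto it = postings.find(word);
        if (it == postings.end()) continue;   // word not indexed: nothing to merge
        for (const Posting &p : it->second) {
            Hit &h = by_doc[p.doc_id];
            h.doc_id = p.doc_id;
            h.weight += p.weight;             // sum the weights of all matched words
            h.words.push_back(word);
        }
    }
    std::vector<Hit> ranked;
    ranked.reserve(by_doc.size());
    for (auto &kv : by_doc) ranked.push_back(std::move(kv.second));
    std::sort(ranked.begin(), ranked.end(),
              [](const Hit &a, const Hit &b) { return a.weight > b.weight; });
    return ranked;
}
```

Merging by doc_id before sorting is the reason InvertedElemPrint carries a words vector. A document matching several query words is returned once, with its combined weight.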

Bringing in and using the Jsoncpp library

Installing Jsoncpp: sudo yum install -y jsoncpp-devel

Testing the jsoncpp library:
Since Jsoncpp is a third-party library, it must be named explicitly at link time:

g++ test.cc -ljsoncpp


Final searcher.hpp:

#pragma once 
#include "index.hpp"
#include "util.hpp"
#include <algorithm>
#include "jsoncpp/json/json.h"

namespace ns_searcher 
{
    struct InvertedElemPrint 
    {
        uint64_t doc_id = 0;
        int weight = 0;
        std::vector<std::string> words;

    };
    class Searcher
    {
    public:
        Searcher() {}
        ~Searcher() {}

        void InitSearcher(const std::string & input) 
        {
            // 1. Get (or create) the index object
            index = ns_index::Index::GetInstance();
            std::cout << "Index singleton acquired..." << std::endl;
            // 2. Build the indexes through the index object
            index->BuildIndex(input);
            std::cout << "Forward and inverted indexes built..." << std::endl;
        }
        
        // query: the search keywords
        // json_string: the search results returned to the user's browser
        
        void Search(const std::string &query, std::string *json_string)
        {
            // 1.[Tokenize]: split the query into words the way the searcher requires
            std::vector<std::string> words;
            ns_util::JiebaUtil::CutString(query, &words);
            // 2.[Trigger]: look up each word in the inverted index.
            // Simply concatenating the inverted lists would list a document once per
            // query word it matches, so hits are merged by doc_id instead: weights are
            // summed and the matching words are recorded.
            std::vector<InvertedElemPrint> inverted_list_all;
            std::unordered_map<uint64_t, InvertedElemPrint> token_map;

            for(std::string word : words) // by value: the word is lowercased in place
            {
                boost::to_lower(word); // lowercase before the lookup so the search is case-insensitive

                ns_index::InvertedList *inverted_list = index->GetInveretList(word);
                if(nullptr == inverted_list)
                {
                    continue;
                }
                for(const auto &elem: *inverted_list)
                {
                    auto &item = token_map[elem.doc_id];
                    item.doc_id = elem.doc_id;
                    item.weight += elem.weight;
                    item.words.push_back(elem.word);
                }
            }
            for(auto &item: token_map)
            {
                // non-const reference, so std::move really moves instead of copying
                inverted_list_all.push_back(std::move(item.second));
            }
            // 3.[Merge and sort]: sort the merged results by relevance (weight), descending
            std::sort(inverted_list_all.begin(), inverted_list_all.end(),
                [](const InvertedElemPrint &e1, const InvertedElemPrint &e2)
                    { return e1.weight > e2.weight; });
            // 4.[Build]: serialize the results into a JSON string with jsoncpp
            Json::Value root(Json::arrayValue); // with no hits this still serializes as [] rather than null
            for(auto &item : inverted_list_all)
            {
                ns_index::DocInfo *doc = index->GetForwardIndex(item.doc_id);
                if(nullptr == doc)
                {
                    continue;
                }
                Json::Value elem;
                elem["title"] = doc->title;
                // content is the whole tag-stripped document; only a snippet around the first matched word is sent
                elem["desc"] = GetDesc(doc->content, item.words[0]);
                elem["url"] = doc->url;
                // for debugging
                elem["id"] = static_cast<int>(item.doc_id);
                elem["weight"] = item.weight;
                root.append(elem);
            }

            // Json::StyledWriter writer; // indented output, easier to read while debugging
            Json::FastWriter writer;
            *json_string = writer.write(root);
        }
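    private: 
        // Returns a snippet of content around the first case-insensitive occurrence
        // of word: up to prev_step bytes before it and up to next_step bytes after it.
        std::string GetDesc(const std::string &content, const std::string &word)
        {
            const size_t prev_step = 50;
            const size_t next_step = 100;
            // content.find(word) is not usable here: it is case-sensitive, while the
            // word from the index may differ in case from the original text.
            // The chars are cast to unsigned char because std::tolower is undefined
            // for negative values, which UTF-8 bytes become when char is signed.
            auto iter = std::search(content.begin(), content.end(), word.begin(), word.end(),
                    [](char x, char y)
                    {
                        return std::tolower(static_cast<unsigned char>(x)) ==
                               std::tolower(static_cast<unsigned char>(y));
                    });
            if(iter == content.end()) return "None1";

            size_t pos = std::distance(content.begin(), iter);

            size_t start = 0;
            size_t end = content.size() - 1;

            // Beware of unsigned subtraction: compare first so pos - prev_step cannot wrap around
            if(pos > start + prev_step) start = pos - prev_step;
            if(pos + next_step < end) end = pos + next_step;
            if(start > end) return "None2";

            std::string desc = content.substr(start, end - start);
            desc += "...";
            return desc;
        }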
    private:
        ns_index::Index *index; // the index the system searches
    };
}

8. Writing the search server `http_server.cc`


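Upgrading gcc

CentOS 7 ships gcc 4.8.5, whose `<regex>` implementation is incomplete, and cpp-httplib relies on `std::regex`, so a newer compiler is needed.

# Install SCL
sudo yum install centos-release-scl scl-utils-build
# Install a newer gcc; any gcc 7 or later works
sudo yum install -y devtoolset-7-gcc devtoolset-7-gcc-c++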


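# Start a shell that uses the newer gcc via SCL
scl enable devtoolset-7 bash
# (Optional) switch to the newer gcc automatically at login
vim ~/.bash_profile
# append the line scl enable devtoolset-7 bash to the end of that file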


Installing cpp-httplib

cpp-httplib v0.7.15 download link:
https://gitee.com/liveever/cpp-httplib/tags?page=2

Upload the downloaded cpp-httplib-v0.7.15.zip to the Linux server.

Extract the archive with the unzip command:

unzip cpp-httplib-v0.7.15.zip


Testing the cpp-httplib library

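#include "cpp-httplib/httplib.h"
int main()
{
    httplib::Server svr;
    // GET /test replies with a plain-text message
    svr.Get("/test", [](const httplib::Request &req, httplib::Response &rsp){
        rsp.set_content("Hello, this is a test!", "text/plain; charset=utf-8");
    });
    svr.listen("0.0.0.0", 8081); // blocks and serves on port 8081
    return 0;
}

Compile it with the new gcc and link pthread (cpp-httplib uses threads), e.g. g++ test.cc -std=c++11 -lpthread, then open http://<server-ip>:8081/test in a browser.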


The http_server.cc code

#include <iostream>
#include <string>
#include "cpp-httplib/httplib.h"
#include "searcher.hpp" 

const std::string root_path = "./wwwroot";          // directory of the static front-end files
const std::string input = "data/raw_html/raw.txt";  // input file the indexes are built from
int main()
{
    ns_searcher::Searcher search;
    search.InitSearcher(input);
    
    httplib::Server svr;
    svr.set_base_dir(root_path.c_str()); // serve ./wwwroot as the site root
    svr.Get("/s", [&search](const httplib::Request &req, httplib::Response &rsp)
            {
                if(!req.has_param("word"))
                {
                    rsp.set_content("A search keyword is required!", "text/plain; charset=utf-8");
                    return;
                }
                
                std::string word = req.get_param_value("word");
                std::cout << "User is searching for: " << word << std::endl; 
                std::string json_string;

                search.Search(word, &json_string);
                rsp.set_content(json_string, "application/json");
            });
    
    svr.listen("0.0.0.0", 8081);

    return 0;
}

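Testing the http_server search endpoint: start the server and request /s?word=<keyword> on port 8081 (for example from a browser); the response is the JSON string built by Searcher::Search.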

9. Writing the front-end module

Front-end basics:

The main (back-end) code of our Boost search engine is now complete, so next comes a brief introduction to the front end.

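The three pillars of the front end: HTML, CSS, and JavaScript (js)

HTML: the skeleton of the page, responsible for its structure

CSS: the skin of the page, responsible for its appearance

js: the soul of the page, responsible for dynamic effects and for front-end/back-end interaction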

Recommended site for learning front-end development: http://www.w3school.com.cn

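Choosing and installing an editor for the front-end code
We write the front-end code in VSCode connected to the cloud server; below we install VSCode and set up the connection.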

1. Download VSCode from the official website
https://code.visualstudio.com/

2. Install the relevant extensions (Remote - SSH is the one used below)

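Example:

[1] After installing Remote - SSH, press F1 to open the command palette.

[2] Type remote-ssh and choose the command that connects to a host.

[3] Enter ssh user_name@ip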

3. Writing the HTML page structure

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="http://ajax.aspnetcdn.com/ajax/jQuery/jquery-2.1.1.min.js"></script>
    <title>Boost Search Engine</title>
    <style>
        /* Remove the default margin and padding of every element (HTML box model) */
        * {
            /* outer margin */
            margin: 0;
            /* inner padding */
            padding: 0;
        }

        body {
            height: 100%;
        }
        /* Class selector: a dot followed by the class name */
        .container {
            width: 800px;
            margin: 0px auto;
            margin-top: 15px;
        }
        .container .search {
            width: 100%;
            height: 52px;
        }
        .container .search input {
            float: left;
            width: 600px;
            height: 50px;
            border: 1px solid black;
            border-right: none;
            padding-left: 10px;
            color: #ccc;
            font-size: 15px;
        }
        .container .search button {
            float: left;
            width: 120px;
            height: 52px;
            background: #4e6ef2;
            color: #FFF;
            font-size: 19px;
            font-family: Georgia, 'Times New Roman', Times, serif;
        }
        .container .result {
            width: 100%;
        }
        .container .result .item {
            margin-top: 15px;
        }
        .container .result .item a {
            /* Block-level element, so the title gets its own line */
            display: block;
            /* Remove the title's underline */
            text-decoration: none;
            /* Font size */
            font-size: 20px;
            /* Font color */
            color: #4e6ef2; 
        }
        .container .result .item a:hover {
            /* Underline the title on hover */
            text-decoration: underline;
        }
        .container .result .item p {
            margin-top: 5px;
            font-size: 16px;
            font-family:'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
        }
        .container .result .item i {
            display: block;
            /* Undo the default italics of <i> for the URL */
            font-style: normal;
            color: green;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="search">
            <input type="text" placeholder="Enter search keywords...">
            <button onclick="Search()">Search</button>
        </div>   
        <div class="result">
            <!-- Result items are generated dynamically; each one has this shape: -->

            <!-- <div class="item">
                <a href="#">Title</a>
                <p>Summary summary summary summary summary...</p>
                <i>https://fanyi.baidu.com/translate?aldtype=16047&query=Memory+Management+Unit&keyfrom=baidu&smartresult=dict&lang=auto2zh#pt/zh/param</i>
            </div> -->
        </div>  
    </div>
    <script>
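        function Search()
        {
            // alert("hello js"); pops up a browser dialog, handy for a first test
            // 1. Read the keyword; $ is the usual alias for jQuery
            let query = $(".container .search input").val();
            console.log("query = " + query); // console is the browser's developer console, useful for inspecting js data

            // 2. Send the HTTP request; $.ajax is jQuery's function for exchanging data with the back end
            $.ajax(
            {
                type: "GET", 
                // encode the keyword so characters such as '+', '&' and '#' reach the server intact
                url: "/s?word=" + encodeURIComponent(query),
                success: function(data)
                {
                    console.log(data);
                    BuildHtml(data);
                }
            });
        }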

        function BuildHtml(data)
        {
            // Get the result container in the page
            let result_lable = $(".container .result");
            // Clear the previous search results
            result_lable.empty();
            // The server can answer null when nothing matched; show no results in that case
            if (!Array.isArray(data)) return;
            for( let elem of data)
            {
                console.log(elem.title);
                console.log(elem.url);
                let a_lable = $("<a>",
                {
                    text: elem.title,
                    href: elem.url,
                    target: "_blank" // open the link in a new tab
                });
                let p_lable = $("<p>", 
                {
                    text: elem.desc
                });
                let i_lable = $("<i>", 
                {
                    text: elem.url
                });

                let div_lable = $("<div>", 
                {
                    class: "item"
                });

                a_lable.appendTo(div_lable);
                p_lable.appendTo(div_lable);
                i_lable.appendTo(div_lable);
                div_lable.appendTo(result_lable);
            }
        }
    </script>
</body>
</html>