A Simple Implementation of a Query Suggestion System

Original post, November 18, 2015 08:30:47

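Where the problem comes from:
While browsing Nowcoder (牛客网), I came across this problem. It is the last programming question in Baidu's 2016 R&D Engineer written test (set 5), which had 12 questions to finish within one hour; I got only 5 right, for 25 points. The code below was written offline, on and off over most of a day (in between, the Chinese national football team drew 0:0 away against Hong Kong, which finally killed off any suspense).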

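The original problem:
Design a query suggestion system

Query suggestion is a technique widely used in modern search engines: when the user types a query prefix, the engine recommends a list of related queries. For example, typing "中国" (China) in the search box suggests "中国好声音" (The Voice of China), "中国银行" (Bank of China), "中国联通" (China Unicom), and so on. Try to design a query suggestion system and answer the following questions:
1. Given a set of query terms, what data structure and algorithm would you use to build the most basic suggestion system? Both Chinese characters and pinyin input must work.
2. A prefix may have many candidate queries. How would you rank them so that the queries the user is more likely to choose come first?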

My approach:

Use a header item for each search prefix; it points to a doubly linked list of the search keys that share that prefix.
Class SearchKeySystem provides interfaces for adding a prefix, getting tips for a prefix, simulating a search operation, and dumping the important data.
The basic functionality works; more remains to be done (marked "TBD" in comments).
The most difficult part is designing and implementing the logic that keeps the doubly linked list sorted at all times.

Code for the search key system (SearchKeys.h, included by the program further down):

#pragma once
//Thomas Tang 2015-11-18 @cd

/*
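Query suggestion is a technique widely used in modern search engines: when the user types a query prefix, the engine recommends a list of related queries. For example, typing "中国" in the search box suggests "中国好声音",
"中国银行", "中国联通", and so on. Try to design a query suggestion system and answer the following questions:
1. Given a set of query terms, what data structure and algorithm would you use to build the most basic suggestion system? Both Chinese and pinyin input must work.
2. A prefix may have many candidate queries; how would you rank them so that the queries the user is more likely to choose come first?

Summary: use a header item for each search prefix; it points to a doubly linked list of the search keys that share that prefix
class SearchKeySystem provides interfaces for adding a prefix, getting tips for a prefix, simulating a search operation, and dumping the important data
The basic functionality works; more remains to be done (marked "TBD" in comments)
The most difficult part is designing and implementing the logic that keeps the doubly linked list sorted at all times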
*/

#include <cstddef> //NULL
#include <iostream>
#include <string>
#include <vector>

//TBD: stub that returns the key unchanged; a real version would convert Chinese characters to pinyin
//(still undecided: case sensitive, case insensitive, or only the first letter capitalized)
std::string GetPinYin(const std::string &key)
{
    return key;
}

struct SearchKey; //forward declaration, defined below

//Data for a search prefix (works as the header item of a linked list)
struct SearchPrefex
{
    std::string Prefex; //the search prefix, by which you get the tips
    std::string PrefexPinYin; //case insensitive
    std::string SearchKeys; //all search keys under this prefix separated by ';', always sorted by frequency and kept up to date
    SearchKey* FirstSearchKey; //points to its first <SearchKey> item
    SearchPrefex(const std::string &prefex)
    {
        Prefex = prefex;
        PrefexPinYin = GetPinYin(prefex);
        SearchKeys="";
        FirstSearchKey = NULL;
    }
};

//Data for the search keys
struct SearchKey
{
    std::string Key;  //keys used for actual search
    std::string KeyPinYin; //case insensitive
    unsigned long Frenquncy; //how many times it is used for actual search
    struct SearchKey* Next; //next SearchKey with the same prefix, NULL if it's the last item
    struct SearchKey* Previous; //previous SearchKey with the same prefix, NULL if it's the first item for the prefix
    SearchKey(const std::string &key)
    {
        Key = key;
        KeyPinYin = GetPinYin(key);
        Frenquncy = 1;
        Next = NULL;
        Previous = NULL;
    }
    /*bool operator>(const SearchKey &anotherkey)
    {
    if(Frenquncy >anotherkey.Frenquncy)
    return true;
    else
    return false;
    }*/

};

class SearchKeySystem
{


    //maintains the search prefixes and keys in multiple linked lists;
    //each linked list has a header item (of type SearchPrefex, holding the tip string <SearchKeys>) followed by SearchKey items sorted by frequency in descending order
    //whenever the SearchKey items are updated (frequency changed or new items added), the <SearchKeys> string in the header item is updated too
    //TBD: this vector should always be sorted by prefix length, from the longest to the shortest
    std::vector<SearchPrefex*> AllSearchKeys;

    void UpdateSearchKeyHelper(SearchPrefex *prefexItem, const std::string &key)
    {
        std::cout << "Helper: before update, the prefix is: "<< prefexItem->Prefex <<". The keys are: " <<prefexItem->SearchKeys << std::endl;
        SearchKey *firstkeyitem = prefexItem->FirstSearchKey;
        //header only: no SearchKey items yet
        if(firstkeyitem==NULL)
        {
            prefexItem->FirstSearchKey = new SearchKey(key);
            prefexItem->SearchKeys = key;
            std::cout << "Helper: after update, the prefix is: "<< prefexItem->Prefex <<". The keys are: " <<prefexItem->SearchKeys << std::endl;
        }
        else //has other SearchKey items
        {
            //search for the SearchKey item that equals <key>; stop at the last item if it is not found
            SearchKey *item = firstkeyitem;
            while(item->Key != key && item->Next!= NULL)
                item = item->Next;

            if(item->Key == key) //found the SearchKey item
            {
                item->Frenquncy++;
                //restore the descending order by checking the previous items
                SearchKey *updatedItem = item;

                //walk back over the previous items whose frequency is now smaller than the updated one;
                //they all hold the old frequency (Frenquncy - 1), so swapping with the first of them keeps the list sorted
                //(the first key item needs no special case: the loop stops immediately)
                while(item->Previous!=NULL && item->Previous->Frenquncy < updatedItem->Frenquncy)
                    item = item->Previous;
                if(item!=updatedItem) //found a smaller-frequency item, swap the two
                {
                    SearchKey *Previous_of_updateditem = updatedItem->Previous;
                    SearchKey *Next_of_updateditem = updatedItem->Next;

                    SearchKey *Previous_of_item = item->Previous;
                    SearchKey *Next_of_item = item->Next;
                    if(Next_of_item == updatedItem) //two adjacent items to be swapped
                    {
                        updatedItem->Previous = Previous_of_item; //could be null
                        if(Previous_of_item==NULL) 
                            prefexItem->FirstSearchKey = updatedItem;
                        else
                            Previous_of_item->Next = updatedItem;
                        updatedItem->Next = item;

                        item->Previous = updatedItem;
                        item->Next = Next_of_updateditem;
                        if(Next_of_updateditem!=NULL) Next_of_updateditem->Previous = item;
                    }                   
                    else //non-adjacent items: swap them and leave the items in between in place
                    {
                        updatedItem->Previous = Previous_of_item;
                        if(Previous_of_item==NULL) 
                            prefexItem->FirstSearchKey = updatedItem;
                        else
                            Previous_of_item->Next = updatedItem;
                        updatedItem->Next = Next_of_item;
                        if(Next_of_item!=NULL) Next_of_item->Previous = updatedItem;


                        Previous_of_updateditem->Next = item;
                        item->Previous = Previous_of_updateditem;
                        item->Next = Next_of_updateditem;                       
                        if(Next_of_updateditem!=NULL) Next_of_updateditem->Previous = item;
                    }

                    //rebuild the tip string to reflect the new order
                    SearchKey *keyItem = prefexItem->FirstSearchKey;
                    std::string tempstr = "";
                    while(keyItem != NULL)
                    {
                        tempstr = (tempstr==""? keyItem->Key: tempstr+";"+keyItem->Key);
                        keyItem = keyItem->Next;
                    }
                    prefexItem->SearchKeys = tempstr;
                    std::cout << "Helper: after update, the prefix is: "<< prefexItem->Prefex <<". The keys are: " <<prefexItem->SearchKeys << std::endl;
                }

            }
            else if(item->Next == NULL) //not found the search key, append it to the end
            {
                item->Next = new SearchKey(key);
                item->Next->Previous = item;
                //a new key starts at frequency 1, the minimum, so appending it keeps the list sorted; only the tip string needs updating
                prefexItem->SearchKeys = prefexItem->SearchKeys + ";" +key;
                std::cout << "Helper: after update, the prefix is: "<< prefexItem->Prefex <<". The keys are: " <<prefexItem->SearchKeys << std::endl;
            }
        }
    }

public:
    SearchKeySystem()
    {}

    ~SearchKeySystem()
    {
        std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
        for(; it!=AllSearchKeys.end(); it++)
        {
            SearchKey *keyItem = (*it)->FirstSearchKey;
            SearchKey * tempkeyItem = NULL;
            delete *it;
            while(keyItem != NULL)
            {
                tempkeyItem = keyItem->Next;
                delete keyItem;
                keyItem = tempkeyItem;
            }
        }
    }

    //The function dumps all the important data in the search system
    void DumpAllTheData()
    {
        std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
        for(; it!=AllSearchKeys.end(); it++)
        {
            std::cout << (*it)->Prefex << "---" << (*it)->SearchKeys << std::endl;
        }
    }

    //This function simulates one actual search operation for <key>
    //It may trigger changes in the search system (to update a frequency, or to update the data in SearchPrefex)
    void UpdateSearchKey(const std::string &key)
    {
        std::cout << "did a search operation for "<< key << std::endl;
        std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
        for(; it!=AllSearchKeys.end(); it++)
        {
            //the key matches a prefix only if the key starts with that whole prefix
            const std::string &prefex = (*it)->Prefex;
            if(prefex.size() <= key.size() && key.compare(0, prefex.size(), prefex) == 0)
                break;
        }
        //matched a prefix, or matched none (then use the key as both a new prefix and a key)
        if(it!=AllSearchKeys.end())
        {
            UpdateSearchKeyHelper(*it, key);
        }
        else
        {           
            AllSearchKeys.push_back(new SearchPrefex(key));
            UpdateSearchKeyHelper(AllSearchKeys[AllSearchKeys.size()-1], key);
            std::cout << "added a new prefix for "<< key << std::endl;
        }   
    }

    //The function adds a new search <prefex> into the system
    //It is ignored if already in the system; a completely new one is added without any SearchKey items attached
    void AddSearchPrefex(const std::string &prefex)
    {
        std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
        for(; it!=AllSearchKeys.end(); it++)
        {
            if((*it)->Prefex == prefex)
                break; 
        }
        //found it or not found it in the vector of prefex list
        if(it==AllSearchKeys.end())//not found      
        {       
            AllSearchKeys.push_back(new SearchPrefex(prefex));      
            std::cout << "added a new prefix: "<< prefex << std::endl;
        }   
    }

    //The function accepts a search <prefex> (typed, but without hitting ENTER to do the search)
    //Returns the tips: an empty string if no tips are available, otherwise a ';'-separated string of the available tips
    //An unknown <prefex> is registered as a new prefix, so tips can be collected for it later
    //<prefex> could be Chinese characters or pinyin (pinyin is not supported yet, but minor changes can add it)
    //The tips are sorted by their frequency in real searches
    std::string GetTips(const std::string &prefex)
    {
        std::cout << "get tips for "<< prefex << std::endl;
        std::vector<SearchPrefex*>::iterator it= AllSearchKeys.begin();
        for(; it!=AllSearchKeys.end(); it++)
        {
            if((*it)->Prefex == prefex)
                break; 
        }
        //found it or not found it in the vector of prefex list
        if(it!=AllSearchKeys.end())
        {
            return ((*it)->SearchKeys);         
        }
        else
        {           
            AllSearchKeys.push_back(new SearchPrefex(prefex));
            std::cout << "added a new prefix: "<< prefex << std::endl;
            return "";
        }       
    }

};

Code for using the search key system:

#include <stdio.h>
#include <iostream>
#include "SearchKeys.h"

int main(int argc, char* argv[])
{
    SearchKeySystem *pSearchKeySystem = new SearchKeySystem();

    char ch = '0';

    while(ch != '5')
    {
        std::cout << "1. Add new search prefix\n"
            "2. Display tips for an input search prefix\n"
            "3. To simulate the search operation\n"
            "4. Dump the collected data\n"
            "5. Exit\n";
        if(!(std::cin >> ch)) //stop at the end of input
            break;
        std::cin.get(); //consume the newline after the menu choice

        switch(ch)
        {
        case '1': 
            {
                std::cout << "Input the prefixes you want to add (one per line); type RET to return to the upper menu\n";
                std::string prefex="";
                do
                {
                    //stop at RET or at the end of input
                    if(!std::getline(std::cin, prefex) || prefex == "RET")
                        break;
                    else
                    {
                        pSearchKeySystem->AddSearchPrefex(prefex);
                    }
                }
                while(true);
            }
            break;
        case '2':
            {
                std::cout << "Input a prefix and its tips will be displayed; type RET to return to the upper menu\n";
                std::string prefex="";
                do
                {
                    //stop at RET or at the end of input
                    if(!std::getline(std::cin, prefex) || prefex == "RET")
                        break;
                    else
                    {
                        std::cout << pSearchKeySystem->GetTips(prefex) << std::endl;
                    }
                }
                while(true);

            }
            break;
        case '3':
            {
                std::cout << "Input the search key; type RET to return to the upper menu\n";
                std::string key="";
                do
                {
                    //stop at RET or at the end of input
                    if(!std::getline(std::cin, key) || key == "RET")
                        break;
                    else
                    {
                        pSearchKeySystem->UpdateSearchKey(key);
                    }
                }
                while(true);

            }
            break;

        case '4':
            {
                std::cout << "Here is all the data\n";
                pSearchKeySystem->DumpAllTheData();             
            }
            break;
        case '5':
            {
                std::cout << "ByeBye!\n";           
            }
            break;
        default:
            std::cout << "Wrong choice! Please try again!\n";
            break;
        }
    }

    delete pSearchKeySystem;
    return 0;
}

More to be done:

The search prefix list is not sorted yet in the code:

//This vector should always be sorted by prefix length, from the longest to the shortest
std::vector<SearchPrefex*> AllSearchKeys;

Pinyin (拼音) input is not supported yet.

Is it possible to use an API from 金山词霸 (Kingsoft PowerWord) or some other source to get the pinyin for each Chinese character? Interesting! I will look into that later.
