SuggestTree: a data structure for rank-ordered autocomplete suggestions

 


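This data structure answers the following query: given a string typed by the user, return the k highest-ranked strings that have it as a prefix.

This is the same feature as the keyword suggestions offered by today's search engines.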
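
To make the query concrete, here is a brute-force baseline that computes the same answer by scanning every entry. It is only a reference point, not part of SuggestTree. The name `naiveTopK` and its parameters are hypothetical, and the sketch assumes that a larger int value means a better rank, which the post does not state.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Order pairs by rank, best first (assumption: larger value = better rank).
static bool betterRank(const std::pair<std::string, int>& a,
                       const std::pair<std::string, int>& b) {
    return a.second > b.second;
}

// Brute-force reference: keep every key that starts with `prefix`,
// sort the hits by rank, and return at most k of them.
std::vector<std::string> naiveTopK(const std::map<std::string, int>& ranked,
                                   const std::string& prefix, std::size_t k) {
    std::vector<std::pair<std::string, int> > hits;
    for (std::map<std::string, int>::const_iterator it = ranked.begin();
         it != ranked.end(); ++it) {
        // compare() on the first prefix.size() characters is a starts-with test.
        if (it->first.compare(0, prefix.size(), prefix) == 0) hits.push_back(*it);
    }
    std::stable_sort(hits.begin(), hits.end(), betterRank);
    std::vector<std::string> out;
    for (std::size_t i = 0; i < hits.size() && i < k; ++i) out.push_back(hits[i].first);
    return out;
}
```

Every call to this baseline touches all entries. A SuggestTree instead stores the answer list in the node that a prefix lookup ends at.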

It is built on top of a Ternary Search Tree (TST).

For background on what a Ternary Search Tree is, see the earlier post: http://blog.csdn.net/suwei19870312/article/details/7467522

 

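The SuggestTree node

class Node
{
public:
    vector<string> list;      // suggestions stored under this node; they all agree on characters [0, end), i.e. they share a common prefix
    unsigned int count;       // number of strings in list (during construction it temporarily counts the strings inserted through this node)
    unsigned int end;         // the strings in list share their first end characters
    Node *left, *mid, *right; // the three child pointers of the ternary search tree
};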
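
The post does not show the constructors that the later code calls. The sketch below is one possible completion, inferred from the calls `new Node(suggestion)` and `new Node(*lpn)`; both constructor bodies are assumptions. The two-argument constructor `Node(suggestion, list)` is left out because its behaviour cannot be recovered from the post.

```cpp
#include <string>
#include <vector>

// Hypothetical completion of Node; the constructors are assumptions.
class Node {
public:
    std::vector<std::string> list;
    unsigned int count;
    unsigned int end;
    Node* left;
    Node* mid;
    Node* right;

    // Leaf for a single suggestion: the whole string is the node's prefix.
    explicit Node(const std::string& suggestion)
        : list(1, suggestion), count(1),
          end(static_cast<unsigned int>(suggestion.length())),
          left(nullptr), mid(nullptr), right(nullptr) {}

    // Copy used when a node is split: the copy becomes the middle child,
    // so it keeps list, count, end and mid but must not inherit the
    // left/right siblings of the node it was copied from.
    Node(const Node& other)
        : list(other.list), count(other.count), end(other.end),
          left(nullptr), mid(other.mid), right(nullptr) {}
};
```

The explicit copy constructor matters: a compiler-generated one would also copy `left` and `right`, and the split node's middle child would then point at subtrees that belong to a different character position.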

 

Building the SuggestTree:

Building a SuggestTree takes two steps:

1. Build a TST from the given set of strings.

2. Once the TST is built, add the strings to the lists of the individual nodes.

void Build(map<string, int>& iMap)
{
    root = NULL;
    vectorPairType lMapVector;
    lMapVector.insert(lMapVector.begin(), iMap.begin(), iMap.end());

    // Sort by key so the TST can be built in a balanced way.
    sort(lMapVector.begin(), lMapVector.end(), ComparePairKey());
    // Cast before subtracting so an empty map yields max = -1 instead of wrapping around.
    hBuildTST(lMapVector, 0, static_cast<int>(lMapVector.size()) - 1);

    // Sort by value (rank) so the best suggestions are added to the node lists first.
    sort(lMapVector.begin(), lMapVector.end(), ComparePairValue());
    vectorPairType::iterator ivter = lMapVector.begin();
    for (; ivter != lMapVector.end(); ++ivter)
    {
        addToList(ivter->first);
    }
}

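The input is a map of <keyword, rank> pairs.

First the pairs are sorted by keyword, and the TST is built recursively in that order to keep it balanced. Otherwise, depending on the order in which the strings are inserted, the TST can degenerate into a one-sided tree, which hurts lookup performance.

Next the pairs are sorted by rank, so that the k highest-ranked strings are the ones placed into each TST node.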
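
Build relies on `vectorPairType`, `ComparePairKey` and `ComparePairValue`, none of which the post defines. A minimal sketch of what they might look like follows. The definitions, the helper names `byKey` and `byRank`, and the choice that a larger value means a better rank are all assumptions.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> PairType;   // <keyword, rank>
typedef std::vector<PairType> vectorPairType;

// Orders pairs by keyword, ascending.
struct ComparePairKey {
    bool operator()(const PairType& a, const PairType& b) const {
        return a.first < b.first;
    }
};

// Orders pairs by rank, best first (assumption: larger value = better rank).
// Ties fall back to the keyword so the order is deterministic.
struct ComparePairValue {
    bool operator()(const PairType& a, const PairType& b) const {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    }
};

// Small helpers (hypothetical) that return a sorted copy.
vectorPairType byKey(vectorPairType v) {
    std::sort(v.begin(), v.end(), ComparePairKey());
    return v;
}

vectorPairType byRank(vectorPairType v) {
    std::sort(v.begin(), v.end(), ComparePairValue());
    return v;
}
```

If `ComparePairKey` is plain keyword order as assumed here, the first sort in Build is redundant but harmless, because a `std::map<string, int>` already iterates in ascending key order.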

 

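Building the TST:

void hBuildTST(vectorPairType& irVP, int min, int max)
{
    if (min <= max)
    {
        int mid = min + (max - min) / 2; // middle of [min, max], written to avoid overflow of min + max
        insert(irVP[mid].first);
        hBuildTST(irVP, min, mid - 1);
        hBuildTST(irVP, mid + 1, max);
    }
}

As noted above, the strings used to build the TST form a list sorted by keyword. The TST is built recursively: each call takes the middle keyword of the range [min, max], inserts it into the TST, and then recurses into the left and right halves.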
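
The effect of the recursion is easiest to see by recording the insertion order it produces. The sketch below does only that. `medianFirst` is a hypothetical helper that mirrors hBuildTST but appends to a vector instead of calling insert.

```cpp
#include <string>
#include <vector>

// Appends the elements of sorted[lo..hi] to `out` in median-first order:
// the middle element, then the left half, then the right half.
void medianFirst(const std::vector<std::string>& sorted, int lo, int hi,
                 std::vector<std::string>& out) {
    if (lo > hi) return;
    int mid = lo + (hi - lo) / 2;
    out.push_back(sorted[mid]);
    medianFirst(sorted, lo, mid - 1, out);
    medianFirst(sorted, mid + 1, hi, out);
}
```

For the sorted input a, b, c, d, e, f, g this yields d, b, a, c, f, e, g, so each newly inserted key splits the remaining range roughly in half, which is what keeps the left/right branches of the TST shallow.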

 

Inserting a node:

void insert(const string& suggestion)
{
    if (root == NULL)
    {
        root = new Node(suggestion);
        return;
    }

    Node* lpn = root;
    size_t i = 0;
    while (true)
    {
        const string& s = lpn->list[0];
        if (s.at(i) > suggestion.at(i))
        {
            if (lpn->left == NULL)
            {
                lpn->left = new Node(suggestion);
                return;
            }
            lpn = lpn->left;
        }
        else if (s.at(i) < suggestion.at(i))
        {
            if (lpn->right == NULL)
            {
                lpn->right = new Node(suggestion);
                return;
            }
            lpn = lpn->right;
        }
        else
        {
            // The characters match at i: keep comparing up to this node's end.
            while (++i < lpn->end)
            {
                if (i == suggestion.length() || s.at(i) != suggestion.at(i))
                {
                    // Split the node at i: the copy keeps the old end and becomes the middle child.
                    // Note: Node's copy constructor must not carry over left/right.
                    lpn->mid = new Node(*lpn);
                    lpn->end = i;
                    break;
                }
            }
            lpn->count++; // one more string passes through this node
            if (i == suggestion.length())
                return;

            if (lpn->mid == NULL)
            {
                lpn->mid = new Node(suggestion, lpn->list);
                return;
            }
            lpn = lpn->mid;
        }
    }
}

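The workflow of insert is very similar to building an ordinary TST:

The suggestion string is compared against list[0] of the nodes already in the tree.

1. If suggestion[i] is less than list[0][i], move to the node's left child. If the left child is null, build a node from suggestion and make it the left child of the current node.

2. If suggestion[i] is greater than list[0][i], move to the node's right child. If the right child is null, build a node from suggestion and make it the right child of the current node.

3. If suggestion[i] == list[0][i], keep comparing up to the node's end. If the two strings diverge before end, or suggestion runs out first, the node is split at that position: a copy of the node becomes its middle child and the node's end is shortened. If suggestion is a prefix of list[0], return at this point. If list[0][0, end) is a prefix of suggestion, move to the middle child; if the middle child is null, build a new node from suggestion and the current node's list and make it the middle child of the current node.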
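
The position at which case 3 splits a node is simply the length of the common prefix of the two strings. The helper below, `commonPrefixLength`, is a hypothetical illustration of that computation and does not appear in the post.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first position at which a and b differ,
// or the length of the shorter string when one is a prefix of the other.
std::size_t commonPrefixLength(const std::string& a, const std::string& b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) ++i;
    return i;
}
```

For example, inserting "care" into a node whose list[0] is "card" with end 4 finds a common prefix of length 3. The node's end becomes 3, its copy (still with end 4) becomes the middle child, and "care" is then attached next to that copy because 'e' and 'd' differ at index 3.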

 

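Adding a keyword to the node lists:

void addToList(const string& suggestion)
{
    Node* lpn = root;
    size_t i = 0;
    while (true)
    {
        // Copied on purpose: list[0] may be overwritten below.
        string s = lpn->list[0];
        if (s.at(i) > suggestion.at(i))
            lpn = lpn->left;
        else if (s.at(i) < suggestion.at(i))
            lpn = lpn->right;
        else
        {
            if (lpn->count > lpn->list.size())
            {
                // First visit: count still holds the number of strings inserted through this node.
                lpn->list.resize(min(lpn->count, k));
                lpn->list[0] = suggestion;
                lpn->count = 1;
            }
            else if (lpn->count < lpn->list.size())
            {
                lpn->list[lpn->count++] = suggestion;
            }
            i = lpn->end;
            if (i == suggestion.length())
                return;
            lpn = lpn->mid;
        }
    }
}

addToList follows the same path through the tree as insert(). The difference is that it does not change the structure of the TST; it only fills in the data stored in the nodes.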
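
The way addToList reuses count is easy to miss, so here is the same bookkeeping isolated from the tree walk. `Slot` and `offer` are hypothetical names. The sketch assumes k is at least 1 and that suggestions are offered in rank order, best first, as Build does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the list/count pair of a Node.
struct Slot {
    std::vector<std::string> list;  // starts with one placeholder string
    unsigned int count;             // starts as the number of strings under the node
};

// Offers one suggestion to a slot; requires k >= 1.
void offer(Slot& slot, const std::string& suggestion, unsigned int k) {
    if (slot.count > slot.list.size()) {
        // First visit: size the list to min(count, k), then restart count
        // as the number of filled entries.
        slot.list.resize(std::min(slot.count, k));
        slot.list[0] = suggestion;
        slot.count = 1;
    } else if (slot.count < slot.list.size()) {
        // Later visits: append while there is room.
        slot.list[slot.count++] = suggestion;
    }
    // Otherwise the list is already full, or the node holds exactly one
    // string that is already in place, so nothing changes.
}
```

Because the best-ranked suggestions arrive first, a full list never needs to be revised: anything offered later ranks no higher than what is already stored.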

 

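Looking up a given prefix in the SuggestTree:

Node* hgetBestSuggesttions(const string& prefix)
{
    if (prefix.length() == 0)
        return NULL;

    Node* lpn = root;
    size_t i = 0;
    while (lpn != NULL)
    {
        const string& s = lpn->list[0];

        if (s.at(i) > prefix.at(i))
            lpn = lpn->left;
        else if (s.at(i) < prefix.at(i))
            lpn = lpn->right;
        else
        {
            while (++i < lpn->end)
            {
                if (i == prefix.length())
                    return lpn;          // prefix ends inside this node
                else if (s.at(i) != prefix.at(i))
                    return NULL;         // mismatch inside this node
            }
            if (i == prefix.length())
                return lpn;              // prefix ends exactly at this node's end
            lpn = lpn->mid;
        }
    }
    return NULL; // ran off the tree: no stored string has this prefix
}

The prefix is compared against list[0][0, end) of each node along the path. Once a node whose list[0][0, end) matches the prefix is found, that node is returned; its list is the desired set of top-ranked strings sharing the prefix.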
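
The lookup returns a node pointer, or NULL when nothing matches, so a caller still has to turn that into a list of strings. A minimal sketch of that last step follows. `NodeView` is a reduced, hypothetical stand-in for Node that keeps only the two fields needed here, and `toSuggestions` is not a function from the post.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reduced stand-in for Node: only the fields a caller reads after a lookup.
struct NodeView {
    std::vector<std::string> list;
    unsigned int count;  // number of filled entries once the tree is built
};

// Copies the filled entries of the node's list; returns an empty
// vector when the lookup found no matching node.
std::vector<std::string> toSuggestions(const NodeView* node) {
    std::vector<std::string> out;
    if (node == nullptr) return out;
    for (std::size_t i = 0; i < node->count && i < node->list.size(); ++i) {
        out.push_back(node->list[i]);
    }
    return out;
}
```

No sorting is needed at query time, because the list was filled in rank order while the tree was built.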


 


