How Search Engines Build Their Indexes


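<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 18pt;"><span style="">Suppose we have a table of business information like the one below. What algorithms and data structures (that is, what indexing techniques) would let us search it quickly and conveniently?</span></p>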
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sid</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Subject</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Type</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Price</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">id</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Description</span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">PostTime</span></span></p>
</td>
</tr>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">0</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sell apple</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-family: Times New Roman;"><span style="font-size: 9pt;" lang="EN-US">Sale</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">50</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">101</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US">We are one of the largest …</span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2002-10-02</span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">1</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy Digital Camera</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">200</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">102</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US">Our company mobile-point is situated in the …</span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-02-10</span></span></p>
</td>
</tr>
<tr style="height: 13.5pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Sell dog shoes</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-family: Times New Roman;"><span style="font-size: 9pt;" lang="EN-US">Sale</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">45</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">103</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Dog Shoes Model Number: 02GLPS074 Place of …</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-12-18</span></span></p>
</td>
</tr>
<tr style="height: 17.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">3</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 108pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="144" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy Toys And Jewelry</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">Buy</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 36pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="48" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">1000</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">104</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">I am interested in items that can be ordered in small …</span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: 9pt;" lang="EN-US"><span style="font-family: Times New Roman;">2003-08-20</span></span></p>
</td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="">(Business information table)</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">Query requirements generally fall into the following two categories:</span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;">a.<span style='font: 7pt "Times New Roman";'><span style="font-size: small;"> </span></span></span></span></span><span style="">Find the required information by a keyword (keyword search)</span></p>
<p class="MsoNormal" style=""><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;">b.<span style='font: 7pt "Times New Roman";'><span style="font-size: small;"> </span></span></span></span></span><span style="">Search by individual fields such as </span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Type, Price, PostTime, id</span></span><span style=""> (single-field retrieval)</span></p>
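<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">To make the two requirement types concrete, here is a minimal Python sketch that answers both with a naive linear scan over the sample table above. The <code>ROWS</code> list and the helper names <code>keyword_search</code> and <code>field_search</code> are illustrative assumptions, not part of any particular engine. The scan touches every row on every query, which is the slow baseline that an index is meant to replace.</p>

```python
# Rows copied from the sample business information table.
# Description is omitted because the table shows it truncated.
ROWS = [
    {"Sid": 0, "Subject": "Sell apple", "Type": "Sale", "Price": 50, "id": 101, "PostTime": "2002-10-02"},
    {"Sid": 1, "Subject": "Buy Digital Camera", "Type": "Buy", "Price": 200, "id": 102, "PostTime": "2003-02-10"},
    {"Sid": 2, "Subject": "Sell dog shoes", "Type": "Sale", "Price": 45, "id": 103, "PostTime": "2003-12-18"},
    {"Sid": 3, "Subject": "Buy Toys And Jewelry", "Type": "Buy", "Price": 1000, "id": 104, "PostTime": "2003-08-20"},
]


def keyword_search(rows, keyword):
    """Requirement a: Sid of every row whose Subject contains the keyword as a whole word."""
    wanted = keyword.lower()
    return [row["Sid"] for row in rows if wanted in row["Subject"].lower().split()]


def field_search(rows, field_name, value):
    """Requirement b: Sid of every row whose named field equals the value."""
    return [row["Sid"] for row in rows if row[field_name] == value]


if __name__ == "__main__":
    print(keyword_search(ROWS, "sell"))
    print(field_search(ROWS, "Type", "Buy"))
```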
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">1</span></span></strong><strong style=""><span style="">. Building the index</span></strong></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="">For requirement </span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">a</span></span><span style="">:</span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">We use a hashtable for fast lookup: each keyword that might be used in a query becomes a key of the hashtable, and the record information associated with that key (which we call TokenInfo) becomes the key's value. Keys are drawn from the text of the fields that take part in keyword queries (these fields are generally required to be of a text type). A single piece of text usually yields many keys. We call this decomposition word segmentation, and each resulting key a Token. A Token is the smallest unit of a keyword query. A TokenInfo typically records the ID of the document that contains the Token (Doc Number), the Token's positions in the text (Prox), the number of times it occurs in the text (Freq), the field it appears in, and so on. The detailed structure is as follows:</span></span></p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; width: 576px; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15.45pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 91.35pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="122" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token Size</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 34.65pt; padding-top: 0cm; border-bottom: #ece9d8; height: 15.45pt; background-color: transparent;" rowspan="2" width="46" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small; font-family: Times New Roman;"></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">DocFreq</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 81pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="108" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">DocList (linked list)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 126.2pt; padding-top: 0cm; height: 15.45pt; background-color: transparent;" colspan="2" width="168" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">ProxDelta (file position pointer)</span></span></span></p>
</td>
</tr>
<tr style="height: 12pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 12pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 12pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 306.2pt; padding-top: 0cm; height: 12pt; background-color: transparent;" colspan="7" width="408" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(TokenInfo)</span></span></span></p>
</td>
</tr>
<tr style="height: 18.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 18.75pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 18.75pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="background-color: transparent; border: #ece9d8; padding: 0cm;" colspan="8" width="454">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small; font-family: Times New Roman;"></span></p>
</td>
</tr>
<tr style="height: 14.8pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 34.65pt; padding-top: 0cm; border-bottom: #ece9d8; height: 14.8pt; background-color: transparent;" rowspan="2" width="46" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small; font-family: Times New Roman;"></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 72pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="96" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Doc Number</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 63pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token Freq</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" colspan="2" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Field Bit</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 117.2pt; padding-top: 0cm; height: 14.8pt; background-color: transparent;" width="156" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: left;" align="left"><span style="font-size: small;"><span style="">Contents of the next node</span><span style=""><span style="font-family: Times New Roman;"> <span lang="EN-US">…</span></span></span></span></p>
</td>
</tr>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 36.95pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="49" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Token</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54.4pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="73" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">TokenInfo</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 306.2pt; padding-top: 0cm; height: 15pt; background-color: transparent;" colspan="7" width="408" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(DocList linked list)</span></span></span></p>
</td>
</tr>
<tr height="0">
<td style="background-color: transparent; border: #ece9d8;" width="49"></td>
<td style="background-color: transparent; border: #ece9d8;" width="73"></td>
<td style="background-color: transparent; border: #ece9d8;" width="46"></td>
<td style="background-color: transparent; border: #ece9d8;" width="60"></td>
<td style="background-color: transparent; border: #ece9d8;" width="36"></td>
<td style="background-color: transparent; border: #ece9d8;" width="36"></td>
<td style="background-color: transparent; border: #ece9d8;" width="48"></td>
<td style="background-color: transparent; border: #ece9d8;" width="60"></td>
<td style="background-color: transparent; border: #ece9d8;" width="12"></td>
<td style="background-color: transparent; border: #ece9d8;" width="156"></td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(Token HashTable)</span></span></p>
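<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">The Token HashTable structure above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the layout of any specific engine: a plain <code>dict</code> stands in for the hashtable, a Python list stands in for the DocList linked list, lowercasing plus whitespace splitting stands in for real word segmentation, and positions are kept in memory rather than behind a file position pointer. The names <code>Posting</code>, <code>TokenInfo</code>, <code>FIELD_BITS</code>, and <code>build_index</code> are hypothetical, and treating Field Bit as a bitmask with one bit per indexed field is also an assumption.</p>

```python
from dataclasses import dataclass, field

# Assumption: Field Bit is a bitmask, one bit per searchable text field.
FIELD_BITS = {"Subject": 1, "Description": 2}


@dataclass
class Posting:
    """One DocList node: how a token occurs within a single document."""
    doc_number: int
    token_freq: int = 0   # occurrences of the token in this document
    field_bits: int = 0   # which fields contain the token
    # Word offsets (Prox). Simplification: offsets restart at 0 in each field.
    positions: list = field(default_factory=list)


@dataclass
class TokenInfo:
    """Value stored under each token key in the hashtable."""
    doc_list: list = field(default_factory=list)  # Postings in document order

    @property
    def doc_freq(self):
        # Number of documents that contain the token.
        return len(self.doc_list)


def tokenize(text):
    # Stand-in for real word segmentation.
    return text.lower().split()


def build_index(docs):
    """docs: iterable of (doc_number, {field_name: text}), ascending by doc_number."""
    index = {}
    for doc_number, fields in docs:
        for field_name, text in fields.items():
            for position, token in enumerate(tokenize(text)):
                info = index.setdefault(token, TokenInfo())
                # Append a new node unless the last one is for this document.
                if not info.doc_list or info.doc_list[-1].doc_number != doc_number:
                    info.doc_list.append(Posting(doc_number))
                posting = info.doc_list[-1]
                posting.token_freq += 1
                posting.field_bits |= FIELD_BITS[field_name]
                posting.positions.append(position)
    return index


if __name__ == "__main__":
    # Subjects come from the sample table; the Description text is only the
    # leading words of the (truncated) sample row, used here for illustration.
    docs = [
        (0, {"Subject": "Sell apple"}),
        (2, {"Subject": "Sell dog shoes", "Description": "Dog Shoes Model Number"}),
    ]
    index = build_index(docs)
    sell = index["sell"]
    print(sell.doc_freq, [p.doc_number for p in sell.doc_list])
    dog = index["dog"].doc_list[0]
    print(dog.token_freq, dog.field_bits, dog.positions)
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">A keyword query then becomes a single dictionary lookup, <code>index.get(token)</code>, followed by a walk over that token's <code>doc_list</code>, instead of a scan over every record.</p>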
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">ProxDelta (file position pointer)</span></span></p>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; width: 576px; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 13.5pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">N0</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 38.55pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="51" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,0)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 43.95pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="59" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 25.5pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="34" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 54pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(0,n0-1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="36" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">N1</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 38.55pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="51" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,0)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 27.75pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="37" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 59.7pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="80" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">A(1,n1-1)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 13.5pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
</tr>
<tr style="height: 17.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 189pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" colspan="5" width="252" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Positions at which this key occurs in the 1st doc</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 198pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" colspan="5" width="264" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="font-size: small;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Positions at which this key occurs in the 2nd doc</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 45pt; padding-top: 0cm; height: 17.25pt; background-color: transparent;" width="60" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… …</span></span></span></p>
</td>
</tr>
</tbody></table>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">(Detailed HashTable structure diagram)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">At query time we use the same word segmentation algorithm to break the keywords entered by the user into multiple Tokens. Each Token is looked up in this HashTable, and the result sets found for the individual Tokens are then merged and returned to the user.</span></span></p>
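<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The lookup-and-merge step can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the whitespace <code>tokenize</code> function is only a stand-in for a real word segmenter, the sample documents are made up, and "merging" is assumed here to mean intersecting the per-Token doc sets (a union would be an equally valid reading).</span></span></p>

```python
from collections import defaultdict

def tokenize(text):
    # Stand-in tokenizer (assumption): lowercase and split on whitespace.
    # A real system would plug in its word segmentation algorithm here.
    return text.lower().split()

def build_index(docs):
    """Map each token to {doc_id: [positions]}, like the key -> (doc, positions) table."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(pos)
    return index

def search(index, query):
    """Look up every query token and intersect the per-token doc sets."""
    result = None
    for token in tokenize(query):
        doc_ids = set(index.get(token, {}))
        result = doc_ids if result is None else result & doc_ids
    return sorted(result or [])

# Hypothetical sample documents, keyed by doc id.
docs = {
    1: "sell used laptop",
    2: "buy laptop battery",
    3: "sell laptop battery charger",
}
index = build_index(docs)
hits = search(index, "laptop battery")
print(hits)  # [2, 3]
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The query is tokenized with the same function used at indexing time, which is the property the paragraph above relies on.</span></span></p>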
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">For requirement b:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Fields generally come in the following types: TEXT, STRING, ENUM, RANGE, NUMBER, BIT, DATE, and so on. Fields of type TEXT, STRING, and NUMBER are usually indexed with the HashTable method described above; they differ only in how they are tokenized. A TEXT field is run through word segmentation. A STRING field is not segmented and is treated as a single Token as a whole. A NUMBER field is converted to a number that serves as the Token, which saves space. Fields of type ENUM, RANGE, and BIT are indexed with a BitMap. The BitMap index structure is described below, using the Type field as an example:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<div>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 7.2pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 7.2pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Type Value</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 7.2pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Bit Map</span></span></span></p>
</td>
</tr>
<tr style="height: 15.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 15.25pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Buy</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 15.25pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 1010</span></span></span></p>
</td>
</tr>
<tr style="height: 14.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 14.75pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="font-size: small;"><span style="font-family: Times New Roman;"><span style="" lang="EN-US">Sale</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 14.75pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0101</span></span></span></p>
</td>
</tr>
<tr style="height: 15.75pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 63pt; padding-top: 0cm; height: 15.75pt; background-color: transparent;" width="84" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">…</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 225pt; padding-top: 0cm; height: 15.75pt; background-color: transparent;" width="300" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … … …</span></span></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span><span style=""></span>^doc#16<span style=""> </span>^doc#1</span></span></p>
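<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">To make the bit layout concrete, here is a small sketch of how such a bitmap could be read back into doc numbers. It assumes, as the markers under the table indicate, that the lowest bit stands for doc#1. The helper name <code>docs_in_bitmap</code> is made up for this illustration, and only the low four bits shown in the table are used.</span></span></p>

```python
def docs_in_bitmap(bitmap):
    """Return the 1-based doc numbers whose bits are set; bit 0 is doc#1."""
    docs = []
    doc_no = 1
    while bitmap:
        if bitmap & 1:
            docs.append(doc_no)
        bitmap >>= 1
        doc_no += 1
    return docs

# Low four bits of the two bitmaps shown in the table above.
type_bitmaps = {"Buy": 0b1010, "Sale": 0b0101}

buy_docs = docs_in_bitmap(type_bitmaps["Buy"])
sale_docs = docs_in_bitmap(type_bitmaps["Sale"])
print(buy_docs, sale_docs)  # [2, 4] [1, 3]
```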
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Here is another example (expressed as a struct):</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Suppose we have two enum-type fields, each with a handful of possible values, and 20 records in total:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Field:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Country - China, Hong Kong, Japan, Korea, USA</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""></span><span style=""> </span>Color - Blue, Red, White</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>Records:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-blue, Hong Kong-blue, Japan-red, China-red, China-white,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>0-0<span style=""> </span><span style=""></span>1-0<span style=""> </span>2-1<span style=""> </span>0-1<span style=""> </span><span style=""></span><span style=""></span>0-2</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>USA-red, Korea-white, China-white, China-white, USA-blue,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>4-1<span style=""> </span>3-2<span style=""> </span>0-2<span style=""> </span>0-2<span style=""> </span>4-0</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-red, USA-red, USA-blue, Hong Kong-white, Japan-Red,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>0-1<span style=""> </span>4-1<span style=""> </span>4-0<span style=""> </span>1-2<span style=""> </span><span style=""></span><span style=""></span>2-1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>China-red, China-Red, China-BLUE, China-white, China-RED,</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>0-1<span style=""> </span><span style=""></span><span style=""></span>0-1<span style=""> </span>0-0<span style=""> </span>0-2<span style=""> </span>0-1</span></span></p>
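<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;"> </span></span></span><span style="" lang="EN-US"><span style="font-family: Times New Roman;">This data can then be represented with the following struct:</span></span></p>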
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>SEnumDesc {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfFields = 2;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pField[0] = "Country", pField[1] = "Color";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pValues[0] = {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfValues = 5;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[0] = "china", pVal[1] = "hong kong", pVal[2] = "japan",</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[3] = "korea", pVal[4] = "usa";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>},</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pValues[1] = {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfValues = 3;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pVal[0] = "blue", pVal[1] = "red", pVal[2] = "white";</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>};</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>iNumOfDocs = 20;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pBitmap[0] {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[0][0] = xxxx xxxx xxxx 1111 1000 0101 1001 1001;<span style=""> </span>// china</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[1][0] = xxxx xxxx xxxx 0000 0010 0000 0000 0010;<span style=""> </span>// hong kong</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[2][0] = xxxx xxxx xxxx 0000 0100 0000 0000 0100;<span style=""> </span>// japan</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[3][0] = xxxx xxxx xxxx 0000 0000 0000 0100 0000;<span style=""> </span>// korea</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>pMap[4][0] = xxxx xxxx xxxx 0000 0001 1010 0010 0000;<span style=""> </span>// usa</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span><span style=""></span>//<span style=""> </span>^doc#20<span style=""> </span>^doc#1</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pBitmap[1] {</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[0][0] = xxxx xxxx xxxx 0010 0001 0010 0000 0011;<span style=""> </span>// blue</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[1][0] = xxxx xxxx xxxx 1001 1100 1100 0010 1100;<span style=""> </span>// red</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>pMap[2][0] = xxxx xxxx xxxx 0100 0010 0001 1101 0000;<span style=""> </span>// white</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt 21pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">//<span style=""> </span>^doc#20<span style=""> </span>^doc#1</span></span></p>
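<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>}</span></span></p>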
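<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">As a cross-check, the bitmaps in the struct above can be rebuilt from the 20 records with a few lines of code. The sketch below is only an illustration: it stores each bitmap as a plain integer instead of an array of 32-bit words, and the function name <code>build_bitmaps</code> is an assumption, not part of the original design. It also shows why result-set logic is cheap: "China AND red" is a single bitwise AND.</span></span></p>

```python
COUNTRIES = ["china", "hong kong", "japan", "korea", "usa"]
COLORS = ["blue", "red", "white"]

# (country index, color index) for doc#1..doc#20, copied from the record list above.
RECORDS = [
    (0, 0), (1, 0), (2, 1), (0, 1), (0, 2),
    (4, 1), (3, 2), (0, 2), (0, 2), (4, 0),
    (0, 1), (4, 1), (4, 0), (1, 2), (2, 1),
    (0, 1), (0, 1), (0, 0), (0, 2), (0, 1),
]

def build_bitmaps(records, field, num_values):
    """One integer bitmap per enum value; bit (doc_no - 1) is set when the doc has that value."""
    maps = [0] * num_values
    for i, rec in enumerate(records):
        maps[rec[field]] |= 1 << i
    return maps

country_maps = build_bitmaps(RECORDS, 0, len(COUNTRIES))
color_maps = build_bitmaps(RECORDS, 1, len(COLORS))

# "china AND red" is one bitwise AND of the two bitmaps.
china_red = country_maps[0] & color_maps[1]
china_red_docs = [i + 1 for i in range(len(RECORDS)) if (china_red >> i) & 1]

print(format(country_maps[0], "020b"))  # 11111000010110011001
print(china_red_docs)                   # [4, 11, 16, 17, 20]
```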
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 18pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Some fields are of the Range type, or are searched by range. The price field is an example: every record has a single price value, but searches usually specify a price range. For such fields we also build the index with a BitMap structure. We first define a few ranges ourselves; then, for each record, we set to 1 the bit at that record's position in the bitmap of the range its value falls into. Take the price field as an example.</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"></span></span></p>
<div>
<table class="MsoNormalTable" style="margin: auto auto auto 5.4pt; border-collapse: collapse;" border="1" cellspacing="0" cellpadding="0"><tbody>
<tr style="height: 15pt;">
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Range</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 15pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">Bit Map</span></span></span></p>
</td>
</tr>
<tr style="height: 6.65pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 6.65pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">&lt;50</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 6.65pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0100</span></span></span></p>
</td>
</tr>
<tr style="height: 13.95pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 13.95pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">[50,100)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 13.95pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0001</span></span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">[100,500)</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 0010</span></span></span></p>
</td>
</tr>
<tr style="height: 14.25pt;">
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; width: 54pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="72" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">>=500</span></span></span></p>
</td>
<td style="padding-right: 5.4pt; border-top: #ece9d8; padding-left: 5.4pt; padding-bottom: 0cm; border-left: #ece9d8; width: 232.45pt; padding-top: 0cm; height: 14.25pt; background-color: transparent;" width="310" valign="top">
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-align: center;" align="center"><span style="" lang="EN-US"><span style="font-size: small;"><span style="font-family: Times New Roman;">… … xxxx xxxx xxxx xxxx 0000 0000 0000 1000</span></span></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;"><span style=""> </span>^doc#16<span style=""> </span>^doc#1</span></span></p>
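<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">The range bitmaps can be sketched in the same way. The four prices below are hypothetical values chosen only so that the resulting bitmaps match the table above; the bucket boundaries come from the table. A query whose limits coincide with bucket boundaries is then answered by OR-ing the bitmaps of the buckets it covers.</span></span></p>

```python
from bisect import bisect_right

BOUNDS = [50, 100, 500]  # buckets: <50, [50,100), [100,500), >=500

def bucket_of(price):
    """Index of the bucket a price falls into (0 = '<50', 3 = '>=500')."""
    return bisect_right(BOUNDS, price)

# Hypothetical prices for doc#1..doc#4, picked to reproduce the table above.
prices = [80, 120, 30, 900]

range_maps = [0] * (len(BOUNDS) + 1)
for i, price in enumerate(prices):
    range_maps[bucket_of(price)] |= 1 << i  # bit 0 is doc#1

# "price < 100" covers the first two buckets, so OR their bitmaps together.
under_100 = range_maps[0] | range_maps[1]
under_100_docs = [i + 1 for i in range(len(prices)) if (under_100 >> i) & 1]
print(under_100_docs)  # [1, 3]
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; text-indent: 21pt;"><span style="" lang="EN-US"><span style="font-family: Times New Roman;">A query limit that falls inside a bucket cannot be answered from these bitmaps alone, which is why the ranges have to be chosen to match the ranges users actually search by.</span></span></p>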
<p class="MsoNormal" style="margin: 0cm 0cm 0pt;"><span style="" lang="EN-US"><span style=""><span style="font-family: Times New Roman;"> </span></span></span><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">Bit</span></span></strong><span style="" lang="EN-US"><span style="font-family: Times New Roman;">-type fields use the same </span></span><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">BitMap</span></span></strong><span style="" lang="EN-US"><span style="font-family: Times New Roman;"> structure, so it is not described again here. The advantages of the BitMap structure are that it saves space and that logical operations on result sets are simple and fast. However, not every Enum-type field uses a BitMap: when a field has more than </span></span><strong style=""><span style="" lang="EN-US"><span style="font-family: Times New Roman;">32</span></span></strong><span style="" lang="EN-US"><span style="font-family: Times New Roman;"> enumeration values, a BitMap becomes inconvenient, and in that case we build the index with a HashTable structure instead.</span></span></p>
<p></p>