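BUAA Spring 2021 Data Structures, Comprehensive Assignment: Text Summary Generation (Hash implementation + SIMD optimization, fastest in the final test)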


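Problem Statement

Problem Description

In natural language processing, one way to analyze a text and automatically extract its main ideas (commonly used for generating text summaries) works as follows: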

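1. First, count how often each non-stop-word occurs in the text;

2. For every sentence, compute the sum of the frequencies of its non-stop-words. If a non-stop-word occurs several times in one sentence, every occurrence is counted;

3. Output the top N sentences in descending order of that sum.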

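Notes:

- A word is a sequence consisting only of letters. Words containing upper-case letters must be converted to lower case before their frequencies are counted.

- A sentence is a passage delimited by any of the following symbols: period (.), question mark (?) and exclamation mark (!).

- In natural language processing, stop words are words that provide no additional meaning for text analysis; for example, the English words a, an, he and you are stop words.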

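Input Format

Using the stop-word file "stopwords.txt" in the current directory, open the file "article.txt" in the current directory, and read from standard input the number N of sentences to write to the output file. Analyze the text as described above and extract its main ideas.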

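Output Format

On standard output, print the 5 sentences with the highest frequency sums, in descending order. For each one, print the sum first, then a single space, then the whole sentence, with a newline after each sentence. In addition, write the top N sentences, in descending order of frequency sum and in the same format, to the file "results.txt". If two sentences have the same sum, output them in the order in which they appear in the original text.

Note: print each sentence starting from its first non-whitespace character; every part of the sentence (including letter case and spacing) must be identical to the original text.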


Sample Input

200

(The sample files "article.txt" and "stopwords.txt" can be downloaded from the course download area and used locally as program input.)


Sample Output

After the program runs, the screen shows:

50286 James's eyes were hazel, his nose was slightly longer than Harry's and there was no scar on his forehead, but they had the same thin face, same mouth, same eyebrows; James's hair stuck up at the back exactly as Harry's did, his hands could have been Harry's and Harry could tell that, when James stood up, they would be within an inch of each other in height.

48188 I didn't practise, I didn't bother, I could've stopped myself having those dreams, Hermione kept telling me to do it, if I had he'd never have been able to show me where to go, and-Sirius wouldn't-Sirius wouldn't-'Something was erupting inside Harry's head: a need to justify himself, to explain-I tried to check he'd really taken Sirius, I went to Umbridge's office, I spoke to Kreacher in the fire and he said Sirius wasn't there, he said he'd gone!

39986 Little Ginny's been writing in it for months and months, telling me all her pitiful worries and woes - how her brothers tease her, how she had to come to school with secondhand robes and books, how - Riddle's eyes glinted - how she didn't think famous, good, great Harry Potter would ever like herAll the time he spoke, Riddle's eyes never left Harry's face.

39455 I mean, it was really great of you and everything,' said Hermione quickly, looking positively petrified at the look on Harry's face, everyone thought it was a wonderful thing to do-'That's funny,' said Harry through gritted teeth, because I definitely remember Ron saying I'd wasted time acting the hero .

39438 Harry, Ginny and Neville and each of the Death Eaters turned in spite of themselves to watch the top of the tank as a brain burst from the green liquid like a leaping fish: for a moment it seemed suspended in midair, then it soared towards Ron, spinning as it came, and what looked like ribbons of moving images flew from it, unravelling like rolls of film-Ha ha ha, Harry, look at it-' said Ron, watching it disgorge its gaudy innards, Harry, come and touch it; bet it's weird-'RON, NO!

The generated file "results.txt" must be identical to the sample file "results(example).txt" in the download area.


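Sample Explanation

When the program runs, it analyzes the given text file article.txt as required and extracts its main ideas.

In this sample, following the output format, the screen shows the 5 sentences with the highest frequency sums in descending order (i.e., the first 5 sentences of results.txt). results.txt contains 200 sentences in descending order of frequency sum, each with its sum and its text.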

The non-stop-words in the highest-ranked sentence and their frequencies are: (james:78)(s:4750)(eyes:480)(hazel:1)(nose:115)(slightly:164)(longer:59)(harry:5732)(s:4750)(scar:90)(forehead:42)(face:521)(mouth:182)(eyebrows:45)(james:78)(s:4750)(hair:166)(stuck:35)(exactly:101)(harry:5732)(s:4750)(did:667)(hands:175)(harry:5732)(s:4750)(harry:5732)(tell:332)(james:78)(stood:156)(inch:28)(height:15). These frequencies add up to 50286.
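As a quick check of the rule that a repeated word counts every time, the list above can be grouped by word. The possessive 's in "James's" and "Harry's" is split off as the word s, so s contributes five times. The remaining 19 words each appear once: eyes, hazel, nose, slightly, longer, scar, forehead, face, mouth, eyebrows, hair, stuck, exactly, did, hands, tell, stood, inch and height. Their frequencies add up to 3374.

```latex
\begin{aligned}
\text{sum} &= 3\cdot 78_{\,\text{james}} + 5\cdot 4750_{\,\text{s}} + 4\cdot 5732_{\,\text{harry}} + 3374_{\,\text{19 words seen once}} \\
           &= 234 + 23750 + 22928 + 3374 = 50286
\end{aligned}
```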

The non-stop-words in the second-ranked sentence and their frequencies are: (didn:400)(t:2308)(practise:20)(didn:400)(t:2308)(bother:16)(ve:693)(stopped:120)(having:103)(dreams:33)(hermione:1615)(kept:113)(telling:93)(d:621)(able:126)(sirius:647)(wouldn:113)(t:2308)(sirius:647)(wouldn:113)(t:2308)(erupting:2)(inside:198)(harry:5732)(s:4750)(head:490)(need:183)(justify:2)(explain:38)(tried:128)(check:35)(d:621)(really:288)(taken:79)(sirius:647)(went:198)(umbridge:571)(s:4750)(office:170)(spoke:65)(kreacher:118)(said:5086)(sirius:647)(wasn:141)(t:2308)(said:5086)(d:621)(gone:129). These frequencies add up to 48188.

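Grading

This is a comprehensive performance test. The fastest of all submitted programs gets full marks, and every other program's score is computed from its running time relative to the fastest one. A program that produces no output or wrong output gets zero.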

 

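Approach

        Setting implementation details and performance aside for a moment, the problem really only asks for three things:

                1. Splitting the text into words and sentences

                2. Counting word frequencies and computing each sentence's non-stop-word frequency sum

                3. Sorting the sentences by that sum and printing them as required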

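        In the program itself, the flow is roughly:

                1. Read in the article

                2. Split it into words and count their frequencies; split it into sentences (recording sentence information)

                3. Read in the stop-word list and set the frequency of each stop word to 0 (equivalent to considering only non-stop-words)

                4. Sort the sentences with the frequency sum as the primary key and the original order as the secondary key

                5. Print the results as required

        These parts are fairly independent of one another, so the five steps can be implemented one at a time, which also makes debugging easier. Below are some useful ideas for each part.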

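        The I/O part looks like the simplest part of the program, and many people will write it as a loop around fscanf/fprintf or fgetc/fputc. In practice this turns out to be rather slow. My own understanding was that doing I/O directly on the article, which lives on disk, amounts to a long series of non-contiguous reads and writes. Strictly speaking, stdio is buffered, so most of the cost comes from per-call overhead (function calls, stream locking, format parsing) rather than from separate disk accesses. Either way, it is much slower than reading the file into memory in one contiguous block and working on it there. The part that looks simplest is often where performance gets overlooked, so I recommend using fread and fwrite: first read the whole file into a large array, then do everything else on that array, since working in memory is far faster than going through the file. On Linux (the grader runs Linux) you can also use mmap to avoid the extra memory copy made by fread/fwrite, which gains a little more. fread, fwrite and mmap are not complicated; please look them up yourself.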
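Here is a minimal sketch of the "read everything with one fread" advice, in standard C only. The function name read_whole_file is hypothetical and does not come from the reference code, which instead freads into a fixed-size global buffer. An mmap-based variant would replace the malloc/fread pair on Linux but is not shown here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd, NUL-terminated buffer using a single
 * fread call. Returns NULL on failure; *len receives the number of bytes. */
char *read_whole_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);              /* file size in bytes */
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t got = fread(buf, 1, (size_t)size, fp);  /* one contiguous read */
    fclose(fp);
    buf[got] = '\0';  /* lets later scanning loops stop on 0 */
    *len = got;
    return buf;
}
```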

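        Word splitting and frequency counting are the hot spot of this problem and take the most time. The splitting idea is simple: non-letter characters act as separators, and whatever lies between two separators is a word. Once you have a word, you can copy it into a scratch area for some preprocessing. The problem is case-insensitive, so the first step is to normalize the case (e.g., convert everything to lower case). This can be done with if checks or a C library function, but there is also a small trick. Looking at the ASCII table, lower-case a is 0x61 and upper-case A is 0x41, i.e. 0b0110 0001 (a) and 0b0100 0001 (A). They differ in exactly one bit, 0x20 (0b0010 0000). Splitting has already been done, so only letters remain and no other characters need to be considered. Normalizing to lower case is therefore a single OR with 0x20. This removes conditional branches, which helps a little (admittedly very little). More importantly, a branch-free operation is easy to parallelize: with SIMD it can be applied to many bytes at once, for a several-fold speedup.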
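Below is a sketch of the 0x20 trick. It assumes the bytes passed in are already known to be letters; the function name lower_letters is illustrative. When the compiler targets SSE2, which is always the case on x86-64, the main loop ORs 16 bytes per instruction. Otherwise only the scalar loop runs. Reference code 2 below lower-cases differently: it ANDs a pcmpistrm range mask with 0x20 and adds the result to the block. So this sketch shows the idea from the paragraph, not the author's exact code.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Lower-case a run of bytes that are known to be ASCII letters.
 * 'A' (0x41) and 'a' (0x61) differ only in bit 0x20, so OR-ing 0x20
 * maps both cases to lower case without a branch. Letters only! */
void lower_letters(char *s, size_t len)
{
    size_t i = 0;
#ifdef __SSE2__
    const __m128i bit = _mm_set1_epi8(0x20);
    for (; i + 16 <= len; i += 16) {           /* 16 bytes per iteration */
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_storeu_si128((__m128i *)(s + i), _mm_or_si128(v, bit));
    }
#endif
    for (; i < len; i++)                       /* scalar tail, or everything */
        s[i] |= 0x20;
}
```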

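        After normalization comes frequency counting. In essence, counting is a lookup, or mapping: given a word, return the unique memory region that holds that word's record. The lookup algorithm largely determines the speed of this step, and hence the final performance. When I first wrote this problem I used a trie, which suits it very well: even the most ordinary implementation already runs very fast, and a double-array trie, or a trie with somewhat optimized pointer storage, can reach very high performance too. Other students have posted such implementations on CSDN for reference.

        Before using hashing, it is worth understanding what a hash is. A hash is essentially a function that maps data of arbitrary length to a fixed-width number (32 bits, 64 bits, ...). Word counting needs a mapping from a word (arbitrary-length data) to a unique address. A hash, if we ignore collisions, maps arbitrary-length data to a fixed number, so all that remains is a mapping from that number to a unique address. The brute-force approach uses the hash value directly as an array index, with each element holding the data we need. The obvious problem is that the hash value is too wide: even if each element stores only 8 bits, a 32-bit hash needs a 4 GiB (2^32-byte) table, which is not realistic. We therefore need storage that grows dynamically, such as linked lists or red-black trees, to avoid wasting large areas of memory. One option is to put the hashes into a red-black tree, which gives O(log N) insertion and lookup with minimal memory use. That is a good approach, but memory is not that tight here, and trading some space for more speed is well worth it. The full hash cannot serve as an array index, but a limited number of its bits can: each array entry then points to the head of a linked list that stores the full hash. Lookup and insertion then take constant time with high probability, and memory use stays acceptable, so this is the scheme I finally adopted.

        In the final program, the low 18 bits of the 64-bit hash index an array of list-head pointers (2^18 = 262,144 buckets), and each list holds the hashes that share those low 18 bits. The list node stores the 64-bit hash XOR-ed with the word length. Other information could be used instead; the point is to make use of the low 18 bits, which are redundant within a bucket. Under the no-collision assumption, each list node corresponds to exactly one word and is that word's unique memory region. Since we are counting frequencies, the node stores a count n: n is set to 1 when the node is inserted and incremented on every later lookup. During the first pass over the article (splitting and counting), you can also fill a separate pointer array with the node pointer of every word, in text order. This avoids a second splitting pass over the whole text, which is slow.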
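The bucket-plus-chain scheme can be sketched as follows. Everything here is an illustrative assumption rather than the author's code:

- FNV-1a stands in for XXH3, which needs the external xxhash.h.
- Nodes live in an index-addressed pool instead of at raw offsets into memRoot.
- New nodes are pushed at the chain head instead of appended.

As in the final program, no copy of the word is kept, so a 64-bit collision would silently merge two words. Every node in a bucket shares the low 18 hash bits, so XOR-ing in the length makes the stored key effectively encode the upper 46 hash bits plus the length.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUCKET_BITS 18                    /* low 18 hash bits select the bucket */
#define BUCKET_COUNT (1u << BUCKET_BITS)

typedef struct {
    uint64_t key;   /* hash ^ length; the word itself is not stored */
    uint32_t next;  /* index of the next node in this chain, 0 = end */
    uint32_t n;     /* occurrence count */
} Node;

typedef struct {
    uint32_t *bucket;  /* BUCKET_COUNT chain heads, 0 = empty */
    Node *node;        /* node pool; index 0 is reserved as "null" */
    uint32_t used, cap;
} Table;

/* Stand-in 64-bit hash (FNV-1a); the post uses XXH3_64bits. */
static uint64_t fnv1a64(const char *s, size_t len)
{
    uint64_t h = 1469598103934665603ull;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ull;
    }
    return h;
}

int table_init(Table *t, uint32_t cap)
{
    t->bucket = calloc(BUCKET_COUNT, sizeof *t->bucket);
    t->node = calloc((size_t)cap + 1, sizeof *t->node);
    t->used = 1;                          /* skip the reserved index 0 */
    t->cap = cap + 1;
    return t->bucket != NULL && t->node != NULL;
}

/* Count one occurrence of a lower-cased word. Returns its node index so the
 * caller can record it in text order; returns 0 if the pool is full. */
uint32_t table_add(Table *t, const char *w, size_t len)
{
    uint64_t h = fnv1a64(w, len);
    uint32_t b = (uint32_t)(h & (BUCKET_COUNT - 1));
    uint64_t key = h ^ (uint64_t)len;     /* reuse the redundant low bits */
    for (uint32_t i = t->bucket[b]; i != 0; i = t->node[i].next)
        if (t->node[i].key == key) {      /* no string compare: assumed unique */
            t->node[i].n++;
            return i;
        }
    if (t->used == t->cap)
        return 0;
    uint32_t i = t->used++;
    t->node[i].key = key;
    t->node[i].n = 1;
    t->node[i].next = t->bucket[b];       /* push at the chain head */
    t->bucket[b] = i;
    return i;
}

void table_free(Table *t)
{
    free(t->bucket);
    free(t->node);
}
```

Because table_add returns the node index, the caller can append it to an in-order array (the role wordRecord plays in reference code 2). The text then never has to be split a second time.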

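        A few more words on the no-collision assumption. With a reasonably chosen hash function, a collision is practically impossible in this problem. My program uses XXH3, the high-performance hash from the open-source xxHash library, in its 64-bit output variant. I tested it beforehand on an Oxford word list covering essentially all English vocabulary and found no collisions. I also ran exhaustive tests over all short letter strings (longer ones would not fit even in my 64 GB of RAM) and again found none. As for the reasoning, a case-insensitive letter needs at most 5 bits (actually less, since 26 < 32). So for words of the same length, a 64-bit hash has room in principle to be collision-free up to 12 letters (12 × 5 = 60 ≤ 64 bits), although a general-purpose hash such as XXH3 does not guarantee this. Words longer than 12 letters are probably rare, so the collision rate is low enough. The final implementation therefore drops collision checking entirely, which removes a lot of string-comparison time. In my tests, the version with collision checking ran in about 70-80 ms, whether the strings were compared with the standard library or with hand-written SIMD assembly. The version without it ran in 50-60 ms. For something less likely than winning the lottery, dropping the check for speed is the natural choice. Of course, if the TAs had deliberately constructed colliding test data, I would simply have had to accept the loss; fortunately, they did not.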
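The standard birthday-bound approximation puts a rough number on "less likely than winning the lottery". It treats XXH3 as a uniformly random 64-bit function, which is an idealization. The value n = 10^5 distinct words is an assumed, illustrative vocabulary size, not a figure measured on article.txt. The final scheme also compares the length (XOR-ed into the stored key), so two words only collide if both their hashes and their lengths match. This can only lower the estimate.

```latex
\begin{aligned}
P(\text{any collision among } n \text{ distinct words})
  &\approx 1 - e^{-n(n-1)/2^{65}} \;\le\; \frac{n(n-1)}{2^{65}} \\
n = 10^{5}:\quad \frac{10^{5}\,(10^{5}-1)}{2^{65}}
  &\approx \frac{10^{10}}{3.69\times 10^{19}} \approx 2.7\times 10^{-10}
\end{aligned}
```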

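        There is not much to say about sentence splitting: read the problem carefully and find the correct start and end of each sentence as required. For each sentence you can record how many words it contains. Later, when summing frequencies, these counts and the word-pointer array recorded during splitting make the summation efficient. Sentence splitting and word splitting can happen in the same pass, so the article is traversed only once and a second traversal is avoided.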
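Below is a single-pass sketch of word and sentence splitting, with hypothetical names (Word, Sentence, split). It follows the rules from the problem statement and the behaviour of reference code 1:

- Words are maximal runs of letters.
- A sentence begins at the first character that is not a space or a newline, and ends at '.', '?' or '!'.
- A sentence that contains no word is dropped.

Each sentence stores a half-open range into the word array, which turns the later per-sentence sum into a simple loop. Text after the last terminator is ignored, which is an assumption of this sketch.

```c
#include <stddef.h>

typedef struct { size_t off, len; } Word;   /* word = offset + length in text */
typedef struct {
    size_t begin, end;            /* first non-blank char .. terminator, inclusive */
    size_t word_start, word_end;  /* half-open range into the Word array */
} Sentence;

/* Letter test with the 0x20 trick: c | 0x20 folds 'A'..'Z' onto 'a'..'z'. */
static int is_letter(char c)
{
    char l = (char)(c | 0x20);
    return l >= 'a' && l <= 'z';
}

/* One pass over text[0..len). Returns the number of sentences found;
 * *nwords receives the number of words. */
size_t split(const char *text, size_t len, Word *words, size_t *nwords,
             Sentence *sent)
{
    size_t nw = 0, ns = 0, i = 0;
    int open = 0;                 /* is a sentence currently open? */
    while (i < len) {
        char c = text[i];
        if (c == '.' || c == '?' || c == '!') {
            if (open && nw > sent[ns].word_start) {  /* keep non-empty ones */
                sent[ns].end = i;
                sent[ns].word_end = nw;
                ns++;
            }
            open = 0;
            i++;
            continue;
        }
        if (!open && c != ' ' && c != '\n') {        /* sentence starts here */
            open = 1;
            sent[ns].begin = i;
            sent[ns].word_start = nw;
        }
        if (is_letter(c)) {                          /* consume one word */
            size_t start = i;
            while (i < len && is_letter(text[i]))
                i++;
            words[nw].off = start;
            words[nw].len = i - start;
            nw++;
        } else {
            i++;
        }
    }
    *nwords = nw;
    return ns;
}
```

The apostrophe in "James's" is not a letter, so it yields the two words james and s, which matches the (s:4750) entries in the sample explanation.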

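        Next, read in the stop-word list and set the frequency stored in each stop word's node to 0. A stop word that does not occur in the article does not need to be inserted at all.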
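Here is a sketch of the stop-word pass. To keep it self-contained, a tiny linearly searched dictionary (Dict and dict_find, both hypothetical) stands in for the hash-table lookup. Only the control flow matters: stop words are looked up, never inserted, and a hit just zeroes the count. The sketch assumes the stop-word file is lower case, with words separated by non-letters such as newlines.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in dictionary: parallel arrays searched linearly. In the real
 * program this lookup is a probe of the word hash table. */
typedef struct {
    const char **word;  /* lower-case words that occur in the article */
    unsigned *n;        /* their occurrence counts */
    size_t size;
} Dict;

static unsigned *dict_find(const Dict *d, const char *w, size_t len)
{
    for (size_t i = 0; i < d->size; i++)
        if (strlen(d->word[i]) == len && memcmp(d->word[i], w, len) == 0)
            return &d->n[i];
    return NULL;  /* this stop word never occurs in the article */
}

static int is_lower(char c) { return c >= 'a' && c <= 'z'; }

/* Zero the count of every stop word in buf[0..len) that occurs in the
 * article. Stop words that do not occur are skipped, never inserted. */
void zero_stop_words(Dict *d, const char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        while (i < len && !is_lower(buf[i]))   /* skip separators */
            i++;
        size_t start = i;
        while (i < len && is_lower(buf[i]))    /* one stop word */
            i++;
        if (i > start) {
            unsigned *n = dict_find(d, buf + start, i - start);
            if (n != NULL)
                *n = 0;
        }
    }
}
```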

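        Computing each sentence's frequency sum is then simple. The traversal gives the number of words in each sentence, in original order, and the node pointer of every word, also in order; just add them up. One suggestion: to make element swaps during sorting faster, store the sums in a separate array rather than in the original sentence-information struct. Smaller elements swap faster, and in practice this alone saved tens to hundreds of milliseconds. qsort is not a stable sort, so to use it, it is also important to include the sentence order (id) in the record. My sentence record and comparison function are shown below.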

struct nRecord
{
    uint32_t id;
    uint32_t n;
} sentenceN[SentenceSize] = {0};

int cmp(const void *a, const void *b)
{
    return (int)(((*((long long *)a) - *((long long *)b)) >> 32) | 1);
}

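        Here id stores the bitwise NOT of the sentence's index (its position counted from the start of the text), which saves one comparison. You might ask: since id comes before n in the struct, won't the first 32 bits of a hold id, making id the primary key? This is a matter of endianness: x86 is little-endian. Put simply, a value we write as 0xAABBCCDD is stored in memory as the bytes 0xDD 0xCC 0xBB 0xAA. The same applies to the struct. Although id comes first, when the whole struct is read as a long long, n occupies the high 32 bits and id the low 32 bits, so n is the primary key and id only the secondary one. Because id is bit-inverted, an earlier sentence has a larger id and therefore wins ties, which is exactly the original-order rule. In cmp, the arithmetic right shift by 32 keeps the sign of the difference, and | 1 makes the result odd, so it is never 0. This works as long as every sum stays below 2^31, so that the keys are non-negative as long long.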
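For comparison, here is a portable sketch of the same two-key ordering. It builds the 64-bit key with shifts instead of reinterpreting the struct, so it does not depend on endianness or struct layout. It also includes the per-sentence summation described earlier. The names and the array-based inputs (count, word_node, start, end) are illustrative assumptions. Unlike the original cmp, which sorts in ascending order, this version sorts best-first.

```c
#include <stdint.h>
#include <stdlib.h>

/* Key = sum in the high 32 bits, bitwise NOT of the sentence index in the
 * low 32 bits. A larger key means a larger sum, or an equal sum and an
 * earlier sentence. Built with shifts, so layout and endianness don't matter. */
static uint64_t make_key(uint32_t sum, uint32_t index)
{
    return ((uint64_t)sum << 32) | (uint32_t)~index;
}

uint32_t key_sum(uint64_t key)   { return (uint32_t)(key >> 32); }
uint32_t key_index(uint64_t key) { return ~(uint32_t)key; }

/* Best-first (descending) comparator; never overflows. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/* count[node] is each word's frequency, already 0 for stop words;
 * word_node[k] is the node of the k-th word of the text; sentence s
 * covers words [start[s], end[s]). Fills key[0..nsent) best-first. */
void rank_sentences(const uint32_t *count, const uint32_t *word_node,
                    const uint32_t *start, const uint32_t *end,
                    uint32_t nsent, uint64_t *key)
{
    for (uint32_t s = 0; s < nsent; s++) {
        uint32_t sum = 0;
        for (uint32_t k = start[s]; k < end[s]; k++)
            sum += count[word_node[k]];
        key[s] = make_key(sum, s);
    }
    qsort(key, nsent, sizeof key[0], cmp_desc);
}
```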

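        Back to the sorting itself. Once this index is built, qsort is already quite fast, but there is still some room for improvement. The problem does not ask for all sentences, only for a given number K of top sentences. For K = 1, for example, we only need to find the sentence with the highest sum; there is no need to sort every sentence of the article by frequency. So we can use a heap-based approach and maintain a min-heap holding the K sentences with the highest sums seen so far. Because it is a min-heap, its root is the K-th highest among the elements traversed so far. We then go through the remaining sentences. Whenever one is larger than the root, we swap it into the root position and restore the min-heap property. After all sentence records have been traversed, the heap holds the K highest sentences. This takes O(n log K) time, which for n >> K is much faster than qsort's O(n log n).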

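        A complete heap-sort implementation also needs a step that builds the initial min-heap and a step that turns the heap into an ascending or descending sequence, not just heap maintenance. Please look up heap sort yourself for a full treatment. Since performance was good enough, my implementation skips both of these steps and uses qsort for each of them instead: an array sorted in ascending order is already a valid min-heap. So this is not a standard heap sort, and there is still some room for improvement; please keep that in mind.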
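Here is a sketch of the standard version that the paragraph above says was skipped. It covers heap construction, the replace-the-root scan and the heap-to-sequence extraction, all on bare 64-bit keys such as (sum << 32) | ~index. The name top_k and its in-place interface are assumptions. Afterwards, a[k..n) holds the displaced keys in no particular order.

```c
#include <stddef.h>
#include <stdint.h>

/* Restore the min-heap property of h[0..n) below position x. */
static void sift_down(uint64_t *h, size_t n, size_t x)
{
    for (;;) {
        size_t l = 2 * x + 1, r = l + 1, m = x;
        if (l < n && h[l] < h[m]) m = l;
        if (r < n && h[r] < h[m]) m = r;
        if (m == x) return;
        uint64_t t = h[x]; h[x] = h[m]; h[m] = t;
        x = m;
    }
}

/* Move the k largest keys of a[0..n) into a[0..k), largest first.
 * O(n log k): heapify a[0..k) as a min-heap, let each later key that
 * beats the root replace it, then pop the heap from the back. */
void top_k(uint64_t *a, size_t n, size_t k)
{
    if (k > n) k = n;
    if (k == 0) return;
    for (size_t i = k / 2; i-- > 0; )          /* build the min-heap */
        sift_down(a, k, i);
    for (size_t i = k; i < n; i++)
        if (a[i] > a[0]) {                     /* better than the k-th best */
            uint64_t t = a[0]; a[0] = a[i]; a[i] = t;
            sift_down(a, k, 0);
        }
    for (size_t end = k - 1; end > 0; end--) { /* heap -> descending order */
        uint64_t t = a[0]; a[0] = a[end]; a[end] = t;
        sift_down(a, end, 0);
    }
}
```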

        There is not much to say about the output part: fprintf combined with fwrite is enough.
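Here is a sketch of the output step, using the same key layout: the sum in the high 32 bits and ~index in the low 32 bits. The name write_ranked and the inclusive begin/end offset arrays are assumptions. Each sentence is emitted with a single fwrite instead of a putchar loop.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the first `count` ranked sentences as "<sum> <sentence>\n".
 * key[] is sorted best-first; begin[]/end[] are inclusive byte offsets
 * of each sentence in text, indexed by original sentence number. */
void write_ranked(FILE *out, const char *text, const size_t *begin,
                  const size_t *end, const uint64_t *key, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t idx = ~(uint32_t)key[i];             /* undo the NOT */
        fprintf(out, "%u ", (unsigned)(key[i] >> 32)); /* frequency sum */
        fwrite(text + begin[idx], 1, end[idx] - begin[idx] + 1, out);
        fputc('\n', out);
    }
}
```

It would be called once with stdout and count = 5, and once with the opened results.txt and count = N. In both cases count is capped at the number of sentences.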

        That covers the hash-based approach and the main points to watch out for. Next up is SIMD optimization, but first, the code.

Reference Code

        Reference code 1: trie version (used to help debug the later versions)

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <x86intrin.h>

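#define InputSize (24 * 1024 * 1024) //24M input buffer
#define StopWordSize (1024 * 1024)   //1M stop-word buffer
#define DictionSize (610 * 1024)     //trie node pool
#define WordSize (5120 * 1024)       //one node pointer per word of the article
#define SentenceSize (5120 * 1024)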
#define DEBUG 0
#define DEBUGN 20000
#define inRange(x, a, b) ((x >= a) && (x <= b))
#define tosmall(c)                \
    {                             \
        if (inRange(c, 'A', 'Z')) \
        {                         \
            c = c - 'A' + 'a';    \
        }                         \
    }
#define swap(a, b) \
    do             \
    {              \
        a ^= b;    \
        b ^= a;    \
        a ^= b;    \
    } while (0);

char inBuf[InputSize] = {0}; //24M input mem
char stopBuf[StopWordSize];  //1M stop-word mem

#if (DEBUG == 1)
long long timeBuf;
#define show_run_time()                                                                                                   \
    do                                                                                                                    \
    {                                                                                                                     \
        long long time = clock();                                                                                         \
        timeBuf = time - timeBuf;                                                                                         \
        printf("from begin %lld ms, from last %lld ms\n", time * 1000 / CLOCKS_PER_SEC, timeBuf * 1000 / CLOCKS_PER_SEC); \
        timeBuf = time;                                                                                                   \
    } while (0);
#else
#define show_run_time() \
    do                  \
    {                   \
        ;               \
    } while (0);
#endif

typedef struct TreeSturct
{
    int n;                       //counter
    struct TreeSturct *pSon[26]; //pointer to son
} tree_s;

typedef struct
{
    char *begin;
    char *end; //storage from start to end
    tree_s **pBegin;
    tree_s **pEnd;
} sentence_s;

struct nRecord
{
    uint32_t id;
    uint32_t n;
} sentenceN[SentenceSize];

tree_s treeMem[DictionSize];          //trie node pool
tree_s *wordMem[WordSize];            //trie node of every word, in text order
sentence_s sentenceMem[SentenceSize]; //sentence records

tree_s *unusedTree = treeMem;
tree_s **unusedWordRecord = wordMem;
sentence_s *unusedSentence = sentenceMem;

tree_s father;

int getDebugN(char *search)
{
    tree_s *pNow = &father;
    while (*search)
    {
        if (pNow->pSon[(*search) - 'a'] != NULL)
        {
            pNow = pNow->pSon[(*search) - 'a'];
            search++;
        }
        else
        {
            return 0;
        }
    }
    return pNow->n;
}

void processStopWord()
{
    int i = 0;
    register tree_s *pTreeNow = &father;
    register char temp = stopBuf[i];
    while (temp != 0)
    {
        if (inRange(temp, 'a', 'z'))
        {
            temp -= 'a';
            if (pTreeNow->pSon[temp] == NULL)
            {
                temp = stopBuf[++i];
                //printf("NULL handler %c \n", stopBuf[i]);
                while (temp != '\n' && temp != 0)
                    temp = stopBuf[++i];
                pTreeNow = &father;
                continue;
            }
            pTreeNow = pTreeNow->pSon[temp];
            temp = stopBuf[++i];
        }
        else
        {
            pTreeNow->n = 0;
            temp = stopBuf[++i];
            pTreeNow = &father;
            continue;
        }
    }
    pTreeNow->n = 0;
}

void processWord()
{
    int i = 0;
    int flagSentenceBegin = 0, flagSentenceBeginHalf = 0;

    register tree_s *pTreeNow;
    register sentence_s *pSentenceNow;
    //printf("a char : %02x \n", inBuf[i]);
    char temp = inBuf[i];
    //printf("a char : %c \n", temp);
    while (temp)
    {
        //printf("i %d \n", i);
        tosmall(temp);
        //printf("%d : %c %x\n", i, temp, temp);
        if ((temp == '.' || temp == '?' || temp == '!'))
        {
            if (flagSentenceBegin)
            {
                pSentenceNow->pEnd = unusedWordRecord;
                pSentenceNow->end = &inBuf[i];
                flagSentenceBegin = 0;
                flagSentenceBeginHalf = 0;
                //printf("a sentence end!\n");
            }
            flagSentenceBegin = 0;
            flagSentenceBeginHalf = 0;
        }
        else if ((temp != ' ' && temp != '\n') || flagSentenceBegin)
        {
            pTreeNow = &father;

            if (!(flagSentenceBegin || flagSentenceBeginHalf))
            {
                pSentenceNow = unusedSentence;
                pSentenceNow->begin = &inBuf[i];
                flagSentenceBeginHalf = 1;
            }

            if (!inRange(temp, 'a', 'z'))
            {
                temp = inBuf[++i];
                continue;
            }

            while (inRange(temp, 'a', 'z'))
            {

                uint8_t index = temp - 'a';
                if (pTreeNow->pSon[index] == NULL)
                {
                    pTreeNow->pSon[index] = unusedTree++;
                }
                pTreeNow = pTreeNow->pSon[index];

                temp = inBuf[++i];
                tosmall(temp);
            }

            pTreeNow->n += 1;
            *unusedWordRecord = pTreeNow;

            if (!flagSentenceBegin)
            {
                pSentenceNow->pBegin = unusedWordRecord;
                flagSentenceBegin = 1;
                unusedSentence++;
            }
            unusedWordRecord++;
            continue;
        }
        temp = inBuf[++i];
    }
}

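/* Records are compared as 64-bit integers: on little-endian x86, n is in the
 * high 32 bits and ~index in the low 32 bits. ">> 32" keeps the sign of the
 * difference and "| 1" makes the result odd, so it is never 0. */
int cmp(const void *a, const void *b)
{
    return (int)(((*((long long *)a) - *((long long *)b)) >> 32) | 1);
}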

int main()
{
    // get_file_part
    show_run_time();
    FILE *fp = fopen("article.txt", "rb");
    fread(inBuf, sizeof(char), InputSize, fp);
    fclose(fp);
    fp = fopen("stopwords.txt", "rb");
    fread(stopBuf, sizeof(char), StopWordSize, fp);
    fclose(fp);
#if (DEBUG == 1)
    {
        printf("article size : %zu \n", strlen(inBuf));
        printf("stop-word size : %zu \n", strlen(stopBuf));
        /*for (int i = 0; i < 100; i++)
        {
            printf("%02x ", inBuf[i]);
        }*/
    }
#endif

    // file_get_end
    show_run_time();

    //start to analyse input
    processWord();
    processStopWord();
    show_run_time();

    //sum up all frequence
    int sentenceNum = unusedSentence - sentenceMem;
    for (int i = 0; i < sentenceNum; i++)
    {
        int tempN = 0;
        for (tree_s **j = sentenceMem[i].pBegin; j < sentenceMem[i].pEnd; j++)
        {
            tempN += j[0]->n;
        }
        sentenceN[i].n = tempN;
        sentenceN[i].id = ~i;
        //printf("Sentence %d : %d\n", i + 1, tempN);
    }
    tree_s **end = sentenceMem[sentenceNum - 1].pEnd;
    printf("word num is %d\n", (int)(end - wordMem));
    printf("sentence num is %d\n", sentenceNum);
    show_run_time();

    /* //debug
    char debugBuf[1000];
    while (scanf("%s", debugBuf) != EOF)
    {
        printf("The %s has frequence : %d\n", debugBuf, getDebugN(debugBuf));
    }*/

    //ranking: top-K selection with a min-heap
    int outputN2, outputN;
#if (DEBUG == 0)
    scanf("%d", &outputN2);
#else
    outputN2 = DEBUGN;
#endif
    if (outputN2 < 5)
    {
        outputN = 5;
    }
    else
    {
        outputN = outputN2;
    }
    qsort(sentenceN, outputN, sizeof(long long), cmp);
    uint64_t *sentenceTopRecord = (uint64_t *)&sentenceN[0];
    /*int32_t outputNLargestBit = outputN;
    
    {
        uint8_t movePos = 0;
        while (outputNLargestBit >> movePos != 1)
        {
            movePos++;
        }
        outputNLargestBit = (1 << movePos);
    }
    //see the first k element of sentenceN as a minimal heap; and maintain it, so the first element is the minimal element
    */
    //seem to be useless

    //keep the largest outputN keys; sentenceTopRecord[0] is the smallest of them
    // maintain part begin
    for (int i = outputN; i < sentenceNum; i++)
    {
        //printf("%llu vs %llu\n", sentenceTopRecord[i], sentenceTopRecord[0]);
        if (sentenceTopRecord[i] > sentenceTopRecord[0])
        {
            //printf("swap!\n");
            swap(sentenceTopRecord[i], sentenceTopRecord[0]);
            int x = 0;
            while (1)
            {
                int l = 2 * x + 1, r = l + 1;
                if (l >= outputN)
                {
                    break;
                }
                if (r >= outputN)
                {
                    if (sentenceTopRecord[l] < sentenceTopRecord[x])
                    {
                        swap(sentenceTopRecord[l], sentenceTopRecord[x]);
                    }
                    break;
                }
                if (sentenceTopRecord[l] < sentenceTopRecord[x])
                {
                    if (sentenceTopRecord[l] < sentenceTopRecord[r])
                    {
                        swap(sentenceTopRecord[l], sentenceTopRecord[x]);
                        x = l;
                    }
                    else
                    {
                        swap(sentenceTopRecord[r], sentenceTopRecord[x]);
                        x = r;
                    }
                    continue;
                }
                if (sentenceTopRecord[r] < sentenceTopRecord[x])
                {
                    swap(sentenceTopRecord[r], sentenceTopRecord[x]);
                    x = r;
                    continue;
                }
                break;
            }
        }
    }
    // maintain part end

    qsort(sentenceN, outputN, sizeof(long long), cmp);
    show_run_time();

    for (int i = outputN - 1; i > outputN - 1 - 5; i--)
    {
        printf("%d ", sentenceN[i].n);
        /*for (char *j = sentenceMem[sentenceN[i].id].begin; j <= sentenceMem[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        int id = ~sentenceN[i].id;
        fwrite(sentenceMem[id].begin, sizeof(char), sentenceMem[id].end - sentenceMem[id].begin + 1, stdout);
        putchar('\n');
    }

    freopen("results.txt", "w", stdout);
    for (int i = outputN - 1; i >= 0; i--)
    {
        int id = ~sentenceN[i].id;
        printf("%u %d ", sentenceN[i].n, (int)(sentenceMem[id].pEnd - sentenceMem[id].pBegin));
        /*for (char *j = sentenceMem[sentenceN[i].id].begin; j <= sentenceMem[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        fwrite(sentenceMem[id].begin, sizeof(char), sentenceMem[id].end - sentenceMem[id].begin + 1, stdout);
        putchar('\n');
        for (tree_s **k = sentenceMem[id].pBegin; k < sentenceMem[id].pEnd; k++)
        {
            printf("%d ", k[0]->n);
        }
        putchar('\n');
    }
    return 0;
}

        Reference code 2: hash version, with SIMD optimization (SIMD loop not unrolled) and collision checking (contains a lot of leftover debugging code)

#define XXH_INLINE_ALL
#define DEBUG 0
#define DEBUGN 30000
#define PULLOVER
#define sizeOfMem 128l * 1024l * 1024l
#define WordSize 4096 * 1024
#define SentenceSize 512 * 1024
#include "xxhash.h"
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <x86intrin.h>

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

uint8_t memRoot[sizeOfMem] __attribute__((aligned(16))) = {0};
uint8_t *mem = memRoot, *passageBuf, *stopWordBuf, *outBuf;
#define swap(a, b) \
    do             \
    {              \
        a ^= b;    \
        b ^= a;    \
        a ^= b;    \
    } while (0);
struct hashLinkList
{
    uint64_t hashNum; //full 64-bit hash of the word
    uint32_t next;    //location relative to memRoot
    uint32_t word;    //location relative to memRoot
    uint32_t n;
} * pHash;

uint32_t hashRecord[256l * 1024l] = {0}; //location relative to memRoot
uint32_t wordRecord[WordSize] = {0};

struct sentenceRecord
{
    uint32_t sentenceBegin;
    uint32_t sentenceEnd; //storage from start to end
    uint32_t wordStart;
    uint32_t wordEnd;
} sentenceRecord[SentenceSize] = {0};

struct nRecord
{
    uint32_t id;
    uint32_t n;
} sentenceN[SentenceSize] = {0};

uint32_t collitionTime = 0;
#if (DEBUG)
long long timeBuf;
#define showRunTime()                                                                                                     \
    do                                                                                                                    \
    {                                                                                                                     \
        long long time = clock();                                                                                         \
        timeBuf = time - timeBuf;                                                                                         \
        printf("from begin %lld ms, from last %lld ms\n", time * 1000 / CLOCKS_PER_SEC, timeBuf * 1000 / CLOCKS_PER_SEC); \
        timeBuf = time;                                                                                                   \
    } while (0);
#else
#define showRunTime() \
    do                \
    {                 \
        ;             \
    } while (0);
#endif

int cmp(const void *a, const void *b)
{
    return (int)(((*((long long *)a) - *((long long *)b)) >> 32) | 1);
}

static inline uint16_t Mystrcmp(char *a, char *b, uint32_t len)
{
#ifndef IGNOREHASHCOLLICTION
    if (a[len] != 0 || b[len] != 0)
        return 1; //different lengths: the words differ
    register uint64_t index = 0xFFFF;
    for (uint8_t i = 0; i < len; i += 16)
    {
        __asm__ __volatile__("movdqu (%1), %%xmm0\nmovdqu (%2), %%xmm1\npcmpistrm $0x08, %%xmm0, %%xmm1\npextrw $0, %%xmm0, %%r13\nandq %%r13, %0\n"
                             : "+r"(index)
                             : "r"(a + i),
                               "r"(b + i)
                             : "%xmm0", "%xmm1", "%r13");
        if (index != 0xFFFF)
            break;
    }
    /*if(ret != strcmp(a,b))
        {
            printf("%s %s %d %d dont match\n",a,b,ret,strcmp(a,b));
        }*/
    return ~(uint16_t)index;
#else
    return strcmp(a, b);
#endif
}

static inline void printBin(uint16_t Oin)
{
    uint16_t in = Oin;
    for (uint8_t y = 0; y < 16; y++)
    {
        putchar('0' + (((1 << y) & in) ? 1 : 0));
    }
    putchar('\n');
}

static inline void declaremMem(int size)
{
    mem += size;
}

static inline void memAligned(int N, int position)
{
    declaremMem((N + position - ((uintptr_t)mem % N)) % N);
}

uint32_t passageBufLenth, stopWordBufLength, outBufLength;
uint32_t wordNum = 0, wordNumMem[16] = {0}, wordBefore = 0, sentenceLeft = 0, sentenceNum = 0;

void processWord()
{
    register struct
    {
        uint16_t sentenceBeginMask;
        uint16_t gapMask;
        uint16_t sentenceMask;
        uint16_t wordMask;
    } mask, maskCopy;

    //mask allocation

    char upperMaskList[16] __attribute__((aligned(16))) = "AZ";
    char lowerMaskList[16] __attribute__((aligned(16))) = "az";
    char andMaskList[16] __attribute__((aligned(16))) = {
        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20};
    char endSentenceMaskList[16] __attribute__((aligned(16))) = ".!?";
    char gapMaskList[16] __attribute__((aligned(16))) = {0x20, '\n'};

    register char *wordTemp = mem;
    declaremMem(WordSize);

    pHash = NULL;

    register char *inputPassage = passageBuf;
    register char *endOfPassage = passageBuf + passageBufLenth;
    memAligned(16, 0);
    outBuf = mem;
    char *pNowOutput = mem;
    declaremMem(16);
    register struct
    {
        uint8_t wordReadState;
        uint8_t sentenceState;
        uint8_t wordShift;
        uint8_t sentenceShift;
        uint8_t cpyLeft;
        uint8_t temp;
        uint8_t sentenceSiginalPosition;
        uint8_t extractedWordPosition;
    } flag = {0, 0, 0, 0, 0, 0, 0, 0};
    for (; inputPassage < endOfPassage; inputPassage += 16)
    {
        __asm__ __volatile__("pcmpistrm $0x44, %2, %3\n" // cmd, passage, range ->save mask result to xmm0
                             "pand  %4, %%xmm0\n"        // behind was the result
                             "paddb %%xmm0, %2\n"        // behind was the result
                             "movaps %2, %1\n"           // save the lower output

                             "movq $0, %0\n"              //reset
                             "pcmpistrm $0x04, %2, %5\n"  // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n" // extra data from xmm0's first word and save to r15w register
                             "orq %%r15, %0\n"            // save data to "Mask"
                             "salq $16, %0\n"             // do a shift

                             "pcmpistrm $0x00, %2, %6\n"  // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n" // extra data from xmm0's first word
                             "orq %%r15, %0\n"            // save data to "Mask"
                             "salq $16, %0\n"             // do a shift

                             "pcmpistrm $0x00, %2, %7\n"  // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n" // extra data from xmm0's first word
                             "orq %%r15, %0\n"            // save data to "Mask"
                             "salq $16, %0\n"             // do a shift

                             : "=g"(mask)                            //0
                             : "rm"(pNowOutput[0]),                  //1
                               "x"(*(__m128i *)inputPassage),        //2
                               "x"(*(__m128i *)upperMaskList),       //3
                               "x"(*(__m128i *)andMaskList),         //4
                               "x"(*(__m128i *)lowerMaskList),       //5
                               "x"(*(__m128i *)endSentenceMaskList), //6
                               "x"(*(__m128i *)gapMaskList)          //7
                             : "%xmm0", "%r15");
        mask.sentenceBeginMask = ~(mask.gapMask | mask.sentenceMask);
        for (int i = 0; i < 16; i++)
        {
            //putchar(pNowOutput[i]);
        }
        /*putchar('\n');
        printBin(mask.wordMask);
        printBin(mask.sentenceMask);
        printBin(mask.sentenceBeginMask);
        printBin(mask.gapMask);//*/

        //word loop start, get the hash and store it in mem.
        maskCopy = mask;
        flag.cpyLeft = 0; //0 for outside a word, 1 for inside a word
        flag.sentenceShift = 0;
        flag.wordShift = 0;
        flag.sentenceSiginalPosition = __builtin_ctz(maskCopy.sentenceMask | 0xFFF00000);
        //word loop
        register uint32_t shiftOfWord = (uint8_t *)inputPassage - (uint8_t *)memRoot;
        for (uint8_t i = 0; i < 16;)
        {
            while (i > flag.sentenceSiginalPosition)
            {
                wordNumMem[flag.sentenceSiginalPosition] = wordNum;
                maskCopy.sentenceMask >>= flag.sentenceSiginalPosition - flag.sentenceShift + 1;
                flag.sentenceShift = flag.sentenceSiginalPosition + 1;
                flag.sentenceSiginalPosition = flag.sentenceShift + __builtin_ctz(maskCopy.sentenceMask | 0xFFF00000);
            }
            if (flag.wordReadState)
            {
                flag.temp = __builtin_ctz((~maskCopy.wordMask) | 0xFFF00000);
                i = flag.wordShift + flag.temp;
                if (i < 16)
                {
                    //printf("%s i: %d\n", outBuf, i);
                    memcpy(&wordTemp[flag.extractedWordPosition], &pNowOutput[flag.cpyLeft], i - flag.cpyLeft);
                    flag.extractedWordPosition += i - flag.cpyLeft;
                    wordTemp[flag.extractedWordPosition] = 0;
                    uint64_t hashTemp = XXH3_64bits(wordTemp, flag.extractedWordPosition);
                    //printf("\"%s\" 's hash is %llu\n", wordTemp, hashTemp);
                    uint32_t id = hashTemp & 0x03FFFF;
                    //futher hash process
                    if (hashRecord[id] == 0)
                    {
                        hashRecord[id] = (uint32_t)(mem - memRoot);
                        declaremMem(sizeof(struct hashLinkList));
                        pHash = (struct hashLinkList *)(hashRecord[id] + memRoot);
                        pHash->word = (uint32_t)(wordTemp - (char *)memRoot);
                        pHash->hashNum = hashTemp;
                        pHash->next = 0;
                        pHash->n = 1;
                        wordTemp += flag.extractedWordPosition + 1;

                        wordRecord[wordNum++] = (uint8_t *)pHash - (uint8_t *)memRoot;

                        flag.extractedWordPosition = 0;
                        flag.cpyLeft = 0;
                        flag.wordReadState = 0;
                        maskCopy.wordMask >>= flag.temp;
                        flag.wordShift = i;

                        continue;
                    }
                    pHash = (struct hashLinkList *)(hashRecord[id] + memRoot);
                    while (1)
                    {
                        if (likely(pHash->hashNum == hashTemp))
                        {
                            if (likely(Mystrcmp(pHash->word + memRoot, wordTemp, flag.extractedWordPosition) == 0))
                            {
                                pHash->n += 1;
                                wordRecord[wordNum++] = (uint8_t *)pHash - (uint8_t *)memRoot;
                                /*if (hashTemp == (XXH3_64bits("ter", 3)))
                                {
                                    //fwrite(pNowOutput, 1, 16, stdout);
                                    //putchar('\n');
                                    //fwrite(wordTemp, 1, 3, stdout);
                                    printf("%d sentence\n find ter %d in %p!\n", shiftOfWord + i, pHash->n, pNowOutput);
                                }*/
                                break;
                            }
                            collitionTime++;
                            //printf("collition between %s %s %x %x!\n",pHash->word + memRoot,wordTemp,pHash->hashNum, hashTemp);
                        }
                        if (pHash->next != 0)
                        {
                            pHash = (struct hashLinkList *)(pHash->next + memRoot);
                            continue;
                        }
                        pHash->next = (uint32_t)(mem - memRoot);
                        pHash = (struct hashLinkList *)mem;
                        declaremMem(sizeof(struct hashLinkList));
                        pHash->word = (uint32_t)(wordTemp - (char *)memRoot);
                        wordTemp += flag.extractedWordPosition + 1;
                        pHash->n = 1;
                        pHash->hashNum = hashTemp;
                        pHash->next = 0;
                        wordRecord[wordNum++] = (uint8_t *)pHash - (uint8_t *)memRoot;
                        break;
                    }
                    //process end here
                    flag.extractedWordPosition = 0;
                    flag.cpyLeft = 0;
                    flag.wordReadState = 0;
                    maskCopy.wordMask >>= flag.temp;
                    flag.wordShift = i;
                    continue;
                }
                else
                {
                    //putchar('1');
                    memcpy(&wordTemp[flag.extractedWordPosition], &pNowOutput[flag.cpyLeft], 16 - flag.cpyLeft);
                    flag.extractedWordPosition += 16 - flag.cpyLeft;
                    flag.cpyLeft = 0;
                    break;
                }
            }
            else
            {
                flag.temp = __builtin_ctz((uint32_t)maskCopy.wordMask | 0xFFF00000);
                //printf("mask is %x\n", maskCopy.wordMask);
                i = flag.wordShift + flag.temp;
                if (i < 16)
                {
                    flag.cpyLeft = i;
                    flag.wordReadState = 1;
                    maskCopy.wordMask >>= flag.temp;
                    flag.wordShift += flag.temp;
                }
                else
                {
                    break;
                }
            }
        }
        while (16 >= flag.sentenceSiginalPosition)
        {
            wordNumMem[flag.sentenceSiginalPosition] = wordNum;
            maskCopy.sentenceMask >>= flag.sentenceSiginalPosition - flag.sentenceShift + 1;
            flag.sentenceShift = flag.sentenceSiginalPosition + 1;
            flag.sentenceSiginalPosition = flag.sentenceShift + __builtin_ctz(maskCopy.sentenceMask | 0xFFF00000);
        }

        //sentence loop
        flag.sentenceShift = 0;
        for (uint8_t i = 0; i < 16;)
        {
            if (flag.sentenceState)
            {
                flag.temp = __builtin_ctz(mask.sentenceMask | 0xFFF00000);
                i = flag.sentenceShift + flag.temp;
                if (i < 16)
                {
                    //putchar(*(i + (uint8_t *)inputPassage));
                    if (wordNumMem[i] != wordBefore)
                    {
                        sentenceRecord[sentenceNum].wordStart = wordBefore;
                        sentenceRecord[sentenceNum].wordEnd = wordNumMem[i];
                        sentenceRecord[sentenceNum].sentenceBegin = sentenceLeft;
                        sentenceRecord[sentenceNum].sentenceEnd = i + shiftOfWord;
                        sentenceNum++;
                    }
                    wordBefore = wordNumMem[i];
                    //printf("wordNum is %d \n", wordBefore);
                    flag.sentenceState = 0;
                    i++;
                    mask.sentenceMask >>= flag.temp + 1;
                    mask.sentenceBeginMask >>= flag.temp + 1;
                    flag.sentenceShift += flag.temp + 1;
                    continue;
                }
                else
                {
                    break;
                }
            }
            else //outside a sentence
            {
                flag.temp = __builtin_ctz(mask.sentenceBeginMask | 0xFFF00000);
                i = flag.sentenceShift + flag.temp;

                if (i < 16)
                {
                    //printf("%s\n", pNowOutput);
                    //printBin(mask.sentenceBeginMask);
                    //printf("i:%d %c\n", i, *(i + (uint8_t *)inputPassage));
                    sentenceLeft = i + shiftOfWord; //performance
                    mask.sentenceBeginMask >>= flag.temp;
                    mask.sentenceMask >>= flag.temp;
                    flag.sentenceShift += flag.temp;
                    flag.sentenceState = 1;
                }
                else
                {
                    break;
                }
            }
        }
    }
    outBufLength = (uint8_t *)pNowOutput - (uint8_t *)outBuf;
    declaremMem(outBufLength);
}

void processStopWord()
{
    uint32_t lenth = 0;
    for (int i = 0; i < stopWordBufLength; i++)
    {
        while (isalpha(stopWordBuf[i]))
        {
            lenth++;
            i++;
        }
        if (lenth)
        {
            uint64_t hashTemp = XXH3_64bits(&stopWordBuf[i - lenth], lenth);
            uint32_t id = hashTemp & 0x03FFFF;
            if (hashRecord[id] == 0)
            {
                lenth = 0;
                continue;
            }
            pHash = hashRecord[id] + memRoot;
            while (1)
            {
                if (likely(hashTemp == pHash->hashNum))
                {
                    stopWordBuf[i] = 0;
                    //printf("match! 0x%x 0x%x\n", &stopWordBuf[i - lenth], pHash->word + memRoot);
                    if (likely(Mystrcmp((char *)(&stopWordBuf[i - lenth]), pHash->word + (char *)memRoot, lenth) == 0))
                    {
                        pHash->n = 0;
                        //printf("Delete %s\n", pHash->word + memRoot,&stopWordBuf[i +1]);
                        break;
                    }
                }
                if (pHash->next == 0)
                {
                    break;
                }
                pHash = pHash->next + memRoot;
            }
        }
        lenth = 0;
    }
}

int main()
{
    memAligned(16, 0);
    showRunTime();
    FILE *fp;
    fp = fopen("article.txt", "rb");
    if (fp == NULL)
    {
        perror("article.txt");
        return 1;
    }
    passageBuf = mem;
    uint8_t *readNow = passageBuf;
    int inByte = 0;
    while ((inByte = fread(readNow, sizeof(char), 4 * 1024 * 1024, fp)) != 0)
    {
        readNow += inByte;
    }
    passageBufLenth = readNow - passageBuf;
    passageBuf[passageBufLenth] = 0;
    passageBufLenth += 1;
    declaremMem(passageBufLenth);
    memAligned(16, 0);
    fclose(fp);
    fp = fopen("stopwords.txt", "rb");
    if (fp == NULL)
    {
        perror("stopwords.txt");
        return 1;
    }
    stopWordBuf = mem;
    readNow = stopWordBuf;
    inByte = 0;
    while ((inByte = fread(readNow, sizeof(char), 4 * 1024 * 1024, fp)) != 0)
    {
        readNow += inByte;
    }
    stopWordBufLength = readNow - stopWordBuf;
    stopWordBuf[stopWordBufLength] = 0;
    stopWordBufLength += 1;
    declaremMem(stopWordBufLength);
    fclose(fp);
    showRunTime();
    processWord();
    processStopWord();
#if (DEBUG)
    printf("length of passageBuf = %u\n", passageBufLenth);
    printf("length of stopWordBuf = %u\n", stopWordBufLength);
    printf("collitionTime = %d\n", collitionTime);
    printf("num of sentence = %d\n", sentenceNum);
    showRunTime();
#endif
    //outputpart start

    for (uint32_t i = 0; i < sentenceNum; i++)
    {
        int tempN = 0;
        for (uint32_t k = sentenceRecord[i].wordStart; k < sentenceRecord[i].wordEnd; k++)
        {
            tempN += ((struct hashLinkList *)(wordRecord[k] + memRoot))->n;
        }
        sentenceN[i].n = tempN;
        sentenceN[i].id = ~i;
        //printf("Sentence %d : %d\n", i + 1, tempN);
    }
    /* //debug
    char debugBuf[1000];
    while (scanf("%s", debugBuf) != EOF)
    {
        printf("The %s has frequence : %d\n", debugBuf, getDebugN(debugBuf));
    }*/

    //ranking: keep the top outputN sentences in a min-heap stored in sentenceN[0..outputN-1]
    int outputN2, outputN;
#if (DEBUG == 0)
    scanf("%d", &outputN2);
#else
    outputN2 = DEBUGN;
#endif
    if (outputN2 < 5)
    {
        outputN = 5;
    }
    else
    {
        outputN = outputN2;
    }
    //sorting the first outputN keys ascending already gives a valid min-heap
    qsort(sentenceN, outputN, sizeof(long long), cmp);
    uint64_t *sentenceTopRecord = (uint64_t *)&sentenceN[0];
    /*int32_t outputNLargestBit = outputN;
    
    {
        uint8_t movePos = 0;
        while (outputNLargestBit >> movePos != 1)
        {
            movePos++;
        }
        outputNLargestBit = (1 << movePos);
    }
    //see the first k element of sentenceN as a minimal heap; and maintain it, so the first element is the minimal element
    */
    //(unused; kept for reference)

    //keep the largest outputN keys; sentenceTopRecord[0] is always the smallest of them
    // maintain part begin
    for (int i = outputN; i < sentenceNum; i++)
    {
        //printf("%llu vs %llu\n", sentenceTopRecord[i], sentenceTopRecord[0]);
        if (sentenceTopRecord[i] > sentenceTopRecord[0])
        {
            //printf("swap!\n");
            swap(sentenceTopRecord[i], sentenceTopRecord[0]);
            int x = 0;
            while (1)
            {
                int l = 2 * x + 1, r = l + 1;
                if (l >= outputN)
                {
                    break;
                }
                if (r >= outputN)
                {
                    if (sentenceTopRecord[l] < sentenceTopRecord[x])
                    {
                        swap(sentenceTopRecord[l], sentenceTopRecord[x]);
                    }
                    break;
                }
                if (sentenceTopRecord[l] < sentenceTopRecord[x])
                {
                    if (sentenceTopRecord[l] < sentenceTopRecord[r])
                    {
                        swap(sentenceTopRecord[l], sentenceTopRecord[x]);
                        x = l;
                    }
                    else
                    {
                        swap(sentenceTopRecord[r], sentenceTopRecord[x]);
                        x = r;
                    }
                    continue;
                }
                if (sentenceTopRecord[r] < sentenceTopRecord[x])
                {
                    swap(sentenceTopRecord[r], sentenceTopRecord[x]);
                    x = r;
                    continue;
                }
                break;
            }
        }
    }
    // maintain part end

    qsort(sentenceN, outputN, sizeof(long long), cmp);
    showRunTime();

    for (int i = outputN - 1; i > outputN - 1 - 5; i--)
    {
        printf("%d ", sentenceN[i].n);
        /*for (char *j = sentenceRecord[sentenceN[i].id].begin; j <= sentenceRecord[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        int id = ~sentenceN[i].id;
        fwrite(sentenceRecord[id].sentenceBegin + memRoot, sizeof(char), sentenceRecord[id].sentenceEnd - sentenceRecord[id].sentenceBegin + 1, stdout);
        putchar('\n');
    }

    int air = 0; //number of padding lines when the article has fewer than outputN sentences
    if (outputN > sentenceNum)
    {
        air = outputN - sentenceNum;
    }
#if (DEBUG)
    freopen("lowerArticle.txt", "wb", stdout);
    for (int i = outputN - 1; i >= 0; i--)
    {
        int id = ~sentenceN[i].id;
        printf("%d %d ", sentenceN[i].n, sentenceRecord[id].wordEnd - sentenceRecord[id].wordStart);
        /*for (char *j = sentenceRecord[sentenceN[i].id].begin; j <= sentenceRecord[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        fwrite(sentenceRecord[id].sentenceBegin + memRoot, sizeof(char), sentenceRecord[id].sentenceEnd - sentenceRecord[id].sentenceBegin + 1, stdout);
        putchar('\n');
        for (uint32_t k = sentenceRecord[id].wordStart; k < sentenceRecord[id].wordEnd; k++)
        {
            printf("%s:%d ", (((struct hashLinkList *)(wordRecord[k] + memRoot))->word + memRoot), (((struct hashLinkList *)(wordRecord[k] + memRoot))->n));
        }
        putchar('\n');
    }
#else
    freopen("results.txt", "wb", stdout);
    for (int i = outputN - 1; i >= air; i--)
    {
        int id = ~sentenceN[i].id;
        printf("%d ", sentenceN[i].n);
        /*for (char *j = sentenceRecord[sentenceN[i].id].begin; j <= sentenceRecord[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        fwrite(sentenceRecord[id].sentenceBegin + memRoot, sizeof(char), sentenceRecord[id].sentenceEnd - sentenceRecord[id].sentenceBegin + 1, stdout);
        putchar('\n');
    }
#endif
    for (int i = 0; i < air; i++)
    {
        putchar('0');
        putchar('\n');
    }
    showRunTime();
    return 0;
}
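The ranking step above packs each sentence's score and index into one 64-bit key and keeps the top outputN keys in a min-heap. The author sorts the first outputN keys with qsort, which works because an ascending array is already a valid min-heap. After that, each later key replaces the root only if it is larger. The sketch below is a minimal portable restatement of that idea, not the author's code. The names pack_key, top_k, sift_down and sort_desc are hypothetical. It uses bottom-up heapify and an insertion sort in place of qsort. Like the original, it stores ~index in the low half, so for equal scores the earlier sentence compares larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers. Score goes in the high 32 bits and ~index in the low
 * 32 bits, so a plain integer comparison orders by score first and, for equal
 * scores, ranks the earlier sentence (smaller index, larger ~index) higher. */
static uint64_t pack_key(uint32_t score, uint32_t index)
{
    return ((uint64_t)score << 32) | (uint32_t)~index;
}

static uint32_t key_score(uint64_t key) { return (uint32_t)(key >> 32); }
static uint32_t key_index(uint64_t key) { return ~(uint32_t)key; }

/* Restore the min-heap property of heap[0..k) starting at position x. */
static void sift_down(uint64_t *heap, size_t k, size_t x)
{
    for (;;)
    {
        size_t l = 2 * x + 1, r = l + 1, smallest = x;
        if (l < k && heap[l] < heap[smallest])
            smallest = l;
        if (r < k && heap[r] < heap[smallest])
            smallest = r;
        if (smallest == x)
            return;
        uint64_t t = heap[x];
        heap[x] = heap[smallest];
        heap[smallest] = t;
        x = smallest;
    }
}

/* Leave the k largest of keys[0..n) in heap[0..k); heap[0] is their minimum. */
static void top_k(const uint64_t *keys, size_t n, uint64_t *heap, size_t k)
{
    assert(k > 0 && k <= n);
    for (size_t i = 0; i < k; i++)
        heap[i] = keys[i];
    for (size_t i = k / 2; i-- > 0;) /* bottom-up heapify */
        sift_down(heap, k, i);
    for (size_t i = k; i < n; i++)
    {
        if (keys[i] > heap[0]) /* beats the smallest kept key: replace the root */
        {
            heap[0] = keys[i];
            sift_down(heap, k, 0);
        }
    }
}

/* Insertion sort, largest key first (k is assumed to be small). */
static void sort_desc(uint64_t *a, size_t k)
{
    for (size_t i = 1; i < k; i++)
    {
        uint64_t v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] < v)
        {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}
```

Calling top_k over all sentence keys and then sort_desc yields the output order directly, in O(n log k) time instead of sorting all n sentences. Unlike the original, this sketch requires k <= n. It does not pad with zero entries when the article has fewer than N sentences.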

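        Reference code 3: hash version with SIMD optimization. The SIMD part is unrolled and part of the logic is rewritten, and there is no collision detection. This version is rather ugly (final submitted version).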
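Each 16-byte step of the inline assembly in this version does four things. It builds a letter mask from the ranges "AZaz", a sentence-end mask from ".!?", and a gap mask from space and newline. It also stores a copy of the block ORed with 0x20. The scalar sketch below approximates one such step and can help when reading or testing the assembly. It is not the author's code. BlockMasks, scan_block and word_starts are hypothetical names, and stopping at the first NUL byte is meant to mimic pcmpistrm's implicit-length handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks for one 16-byte block; bit i describes block[i]. */
typedef struct
{
    uint16_t word;     /* ASCII letter: 'A'-'Z' or 'a'-'z' (the "AZaz" ranges) */
    uint16_t sentence; /* sentence end: '.', '!' or '?' */
    uint16_t gap;      /* gap: ' ' or '\n' */
} BlockMasks;

/* Scalar approximation of one unrolled asm step (hypothetical helper). */
static BlockMasks scan_block(const unsigned char block[16], unsigned char lowered[16])
{
    assert(block != NULL && lowered != NULL);
    BlockMasks m = {0, 0, 0};
    int ended = 0; /* set at the first NUL, like pcmpistrm's implicit length */
    for (int i = 0; i < 16; i++)
    {
        unsigned char c = block[i];
        lowered[i] = (unsigned char)(c | 0x20); /* same OR-with-0x20 as the por instruction */
        if (c == 0)
            ended = 1;
        if (ended)
            continue; /* no mask bits past the end of the string */
        uint16_t bit = (uint16_t)(1u << i);
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            m.word |= bit;
        else if (c == '.' || c == '!' || c == '?')
            m.sentence |= bit;
        else if (c == ' ' || c == '\n')
            m.gap |= bit;
    }
    return m;
}

/* Bits where a run of letters begins inside this block. */
static uint16_t word_starts(uint16_t word)
{
    return (uint16_t)(word & ~(word << 1));
}
```

word_starts marks the first letter of each run of letters within a single block. A word that continues from the previous block still needs carry-over handling across blocks, which the real loop does with its shift and state flags.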

#define XXH_INLINE_ALL
#define DEBUG 0
#define DEBUGN 30000
#define PULLOVER
#define sizeOfMem (128L * 1024L * 1024L)
#define WordSize (4096 * 1024)
#define SentenceSize (512 * 1024)
#include "xxhash.h"
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <x86intrin.h>

uint8_t memRoot[sizeOfMem] __attribute__((aligned(16))) = {0};
uint8_t *mem = memRoot, *passageBuf, *stopWordBuf, *outBuf;
#define swap(a, b) \
    do             \
    {              \
        a ^= b;    \
        b ^= a;    \
        a ^= b;    \
    } while (0);
struct hashLinkList
{
    uint64_t hashNum; //last part of 64bits hash
    uint32_t next;    //location relative to memRoot
    uint32_t n;
} * pHash;

uint32_t hashRecord[256l * 1024l] = {0}; //location relative to memRoot
uint32_t wordRecord[WordSize] = {0};

struct sentenceRecord
{
    uint32_t sentenceBegin;
    uint32_t sentenceEnd; //storage from start to end
    uint32_t wordStart;
    uint32_t wordEnd;
} sentenceRecord[SentenceSize] = {0};

struct nRecord
{
    uint32_t id;
    uint32_t n;
} sentenceN[SentenceSize] = {0};

uint32_t collitionTime = 0;
#if (DEBUG)
long long timeBuf;
#define showRunTime()                                                                                                     \
    do                                                                                                                    \
    {                                                                                                                     \
        long long time = clock();                                                                                         \
        timeBuf = time - timeBuf;                                                                                         \
        printf("from begin %lld ms, from last %lld ms\n", time * 1000 / CLOCKS_PER_SEC, timeBuf * 1000 / CLOCKS_PER_SEC); \
        timeBuf = time;                                                                                                   \
    } while (0);
#else
#define showRunTime() \
    do                \
    {                 \
        ;             \
    } while (0);
#endif

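// Compares sentenceN entries as 64-bit keys. On little-endian x86, struct nRecord {id; n;}
// puts n in the high 32 bits and id (= ~index) in the low 32 bits, so a larger n ranks higher
// and, for equal n, the earlier sentence ranks higher. With GCC the >> 32 is an arithmetic
// shift that keeps the sign of the difference, and | 1 turns a small non-negative difference
// (0 after the shift) into +1, so the result is never 0.
int cmp(const void *a, const void *b)
{
    return (int)(((*((long long *)a) - *((long long *)b)) >> 32) | 1);
}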



static inline uint16_t Mystrcmp(char *a, char *b, uint32_t len)
{
#ifndef IGNOREHASHCOLLICTION
    if (a[len] != 0 || b[len] != 0)
        return 0;
    register uint64_t index = 0xFFFF;
    for (uint8_t i = 0; i < len; i += 16)
    {
        __asm__ __volatile__("movdqu (%1), %%xmm0\nmovdqu (%2), %%xmm1\npcmpistrm $0x08, %%xmm0, %%xmm1\npextrw $0, %%xmm0, %%r13\nandq %%r13, %0\n"
                             : "+r"(index)
                             : "r"(a + i),
                               "r"(b + i)
                             : "%xmm0", "%xmm1", "%r13");
        if (index != 0xFFFF)
            break;
    }
    /*if(ret != strcmp(a,b))
        {
            printf("%s %s %d %d dont match\n",a,b,ret,strcmp(a,b));
        }*/
    if ((uint16_t)~index)
        printf("collision!\n");
    return (uint16_t)~index;
#else
    return strcmp(a, b);
#endif
}

void printBin(uint64_t Oin)
{
    uint64_t in = Oin;
    for (int8_t y = 0; y < 64; y++)
    {
        putchar('0' + (((1ull << y) & in) ? 1 : 0));
    }
    putchar('\n');
}

static inline void declaremMem(uint32_t size)
{
    mem += size;
}

static inline void memAligned(uint32_t N, uint32_t position)
{
    declaremMem((N + position - ((uint64_t)mem % N)) % N);
}

uint32_t passageBufLenth, stopWordBufLength, outBufLength;
uint32_t wordNum = 0, wordNumMem[64] = {0}, wordBefore = 0, sentenceLeft = 0, sentenceNum = 0;
void processWord()
{
    struct Mask
    {
        uint64_t wordMask;
        uint64_t sentenceMask;
        uint64_t gapMask;
        uint64_t sentenceBeginMask;
    } maskList[8];
    struct Mask maskCopy;

    //mask allocation

    //char upperMaskList[16] __attribute__((aligned(16))) = "AZ";
    char lowerMaskList[16] __attribute__((aligned(16))) = "AZaz";
    char andMaskList[16] __attribute__((aligned(16))) = {
        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
        0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20}; //actually an OR mask (setting bit 0x20 lowercases ASCII letters); it should really be called orMaskList
    char endSentenceMaskList[16] __attribute__((aligned(16))) = ".!?";
    char gapMaskList[16] __attribute__((aligned(16))) = {0x20, '\n'};

    pHash = NULL;
    register char *inputPassage = (char *)passageBuf;
    register char *endOfPassage = (char *)passageBuf + passageBufLenth;
    memAligned(16, 0);
    outBuf = mem;
    char *pNow = (char *)mem;
    char *pNowL = pNow;
    char *newestWordL = NULL;
    char *pWordL = NULL;
    declaremMem(4096);
    struct
    {
        uint16_t wordReadState;
        uint16_t sentenceState;
        uint16_t wordShift;
        uint16_t sentenceShift;
        uint16_t temp;
        uint16_t sentenceSiginalPosition;
        uint32_t unused;
    } flag = {0, 0, 0, 0, 0, 0, 0};
    while (inputPassage < endOfPassage)
    {
        __asm__ __volatile__("pinsrq $0, (%4), %%xmm5\n"
                             "pinsrq $0, (%5), %%xmm6\n"
                             "pinsrq $0, (%6), %%xmm7\n" // load 8 bytes so the " \n" set is NUL-terminated at byte 2

                             "movq $0, %%r14\n" //reset
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 0(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 0(%1)\n"           // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 16(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 16(%1)\n"          // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $16, %%r15\n"                 // move it to bits 16-31
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $16, %%r15\n"                 // move it to bits 16-31
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 32(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 32(%1)\n"          // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $32, %%r15\n"                 // move it to bits 32-47
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $32, %%r15\n"                 // move it to bits 32-47
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $32, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 48(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 48(%1)\n"          // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $48, %%r15\n"                 // move it to bits 48-63
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $48, %%r15\n"                 // move it to bits 48-63
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $48, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "movq %%r14, 0(%0)\n"
                             "movq %%r13, 8(%0)\n"
                             "movq %%r12, 16(%0)\n"

                             // blocks 5-8 (input bytes 64-127)
                             "movq $0, %%r14\n" //reset
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 64(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 64(%1)\n"          // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n" // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 80(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 80(%1)\n"          // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $16, %%r15\n"                 // move it to bits 16-31
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $16, %%r15\n"                 // move it to bits 16-31
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 96(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 96(%1)\n"          // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $32, %%r15\n"                 // move it to bits 32-47
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $32, %%r15\n"                 // move it to bits 32-47
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $32, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 112(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 112(%1)\n"         // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $48, %%r15\n"                 // move it to bits 48-63
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $48, %%r15\n"                 // move it to bits 48-63
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $48, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "movq %%r14, 32(%0)\n"
                             "movq %%r13, 40(%0)\n"
                             "movq %%r12, 48(%0)\n"

                             "movq $0, %%r14\n" //reset
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 128(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 128(%1)\n"         // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             //"salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n" // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 144(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | 0x20 (lowercases letters)
                             "vmovaps %%xmm3, 144(%1)\n"         // store the lowercased block
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0 into r15
                             "salq $16, %%r15\n"                 // move it to bits 16-31
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $16, %%r15\n"                 // move it to bits 16-31
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extract the 16-bit mask from xmm0
                             "salq $16, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 160(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // behind was the result
                             "vmovaps %%xmm3, 160(%1)\n"         // save the lower output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extra data from xmm0's first word and save to r15w register
                             "salq $32, %%r15\n"                 // do a shift
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extra data from xmm0's first word
                             "salq $32, %%r15\n"                 // do a shift
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extra data from xmm0's first word
                             "salq $32, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 176(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // behind was the result
                             "vmovaps %%xmm3, 176(%1)\n"         // save the lower output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // generate character mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extra data from xmm0's first word and save to r15w register
                             "salq $48, %%r15\n"                 // do a shift
                             "orq %%r15, %%r14\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // generate sentence mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extra data from xmm0's first word
                             "salq $48, %%r15\n"                 // do a shift
                             "orq %%r15, %%r13\n"                // save data to "Mask"
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // generate gap mask in xmm0's first word
                             "pextrw $0, %%xmm0, %%r15\n"        // extra data from xmm0's first word
                             "salq $48, %%r15\n"                 // do a shift
                             "orq %%r15, %%r12\n"                // save data to "Mask"

                             "movq %%r14, 64(%0)\n"
                             "movq %%r13, 72(%0)\n"
                             "movq %%r12, 80(%0)\n"
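                             // How the masks are laid out, as read from the offsets in this
                             // listing (an inference, not something the author states):
                             // 64-byte group g of the input (bytes 64g..64g+63) is handled as
                             // four 16-byte blocks j = 0..3. Block j's 16-bit results are
                             // shifted into bits 16j..16j+15 of r14 (character), r13 (sentence)
                             // and r12 (gap), which are then stored at maskList byte offsets
                             // 32g, 32g+8 and 32g+16; e.g. the group above (bytes 128..191,
                             // g = 2) was stored at 64, 72 and 80. Bit i of a stored mask
                             // therefore describes input byte 64g+i, assuming no NUL byte ends
                             // a block early (pcmpistrm uses implicit, NUL-terminated length).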

                             // next 64 input bytes: offsets 192..255 (16-byte blocks 12-15)
                             "movq $0, %%r14\n" // reset the mask accumulators
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 192(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 192(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 208(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 208(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 224(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 224(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 240(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 240(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "movq %%r14, 96(%0)\n"
                             "movq %%r13, 104(%0)\n"
                             "movq %%r12, 112(%0)\n"

                             "movq $0, %%r14\n" // reset the mask accumulators
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 256(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 256(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 272(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 272(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 288(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 288(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 304(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 304(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "movq %%r14, 128(%0)\n"
                             "movq %%r13, 136(%0)\n"
                             "movq %%r12, 144(%0)\n"

                             // next 64 input bytes: offsets 320..383 (16-byte blocks 20-23)
                             "movq $0, %%r14\n" // reset the mask accumulators
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 320(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 320(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 336(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 336(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 352(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 352(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 368(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 368(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "movq %%r14, 160(%0)\n"
                             "movq %%r13, 168(%0)\n"
                             "movq %%r12, 176(%0)\n"

                             "movq $0, %%r14\n" // reset the mask accumulators
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 384(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 384(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 400(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 400(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 416(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 416(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 432(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 432(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "movq %%r14, 192(%0)\n"
                             "movq %%r13, 200(%0)\n"
                             "movq %%r12, 208(%0)\n"

                             "movq $0, %%r14\n" // reset the mask accumulators
                             "movq $0, %%r13\n"
                             "movq $0, %%r12\n"

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 448(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 448(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended (bits 0..15, no shift)
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 464(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 464(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $16, %%r15\n"                 // move it to bits 16..31
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 480(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 480(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $32, %%r15\n"                 // move it to bits 32..47
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "vmovaps %3, %%xmm3\n"
                             "vmovaps 496(%2), %%xmm4\n"
                             "por %%xmm4, %%xmm3\n"              // xmm3 = block | %3 (case-folding mask)
                             "vmovaps %%xmm3, 496(%1)\n"         // store the case-folded block to the output
                             "pcmpistrm $0x04, %%xmm4, %%xmm5\n" // character mask: bytes inside the ranges in xmm5
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r14\n"                // merge into the character mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm6\n" // sentence mask: bytes equal to any char in xmm6
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r13\n"                // merge into the sentence mask
                             "pcmpistrm $0x00, %%xmm4, %%xmm7\n" // gap mask: bytes equal to any char in xmm7
                             "pextrw $0, %%xmm0, %%r15\n"        // r15 = 16-bit mask, zero-extended
                             "salq $48, %%r15\n"                 // move it to bits 48..63
                             "orq %%r15, %%r12\n"                // merge into the gap mask

                             "movq %%r14, 224(%0)\n"
                             "movq %%r13, 232(%0)\n"
                             "movq %%r12, 240(%0)\n"
                             :
                             : "r"(maskList),     //0
                               "r"(pNow),         //1
                               "r"(inputPassage), //2
                               //"r"((__m128i *)upperMaskList),        //3 unused
                               "x"(*(__m128i *)andMaskList),         //3
                               "r"((uint64_t *)lowerMaskList),       //4
                               "r"((uint64_t *)endSentenceMaskList), //5
                               "r"((uint32_t *)gapMaskList)          //6
                             : "%xmm0", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7", "%r12" /*gap*/, "%r13" /*sentence*/, "%r14" /*character*/, "%r15", "memory"); // "memory": the asm writes through the pointer operands
        for (int numOfMask = 0; numOfMask < 8; numOfMask++)
        {
            maskList[numOfMask].sentenceBeginMask = ~(maskList[numOfMask].gapMask | maskList[numOfMask].sentenceMask);

            //word loop start, get the hash and store it in mem.
            maskCopy = maskList[numOfMask];
            flag.sentenceShift = 0;
            flag.wordShift = 0;
            flag.sentenceSiginalPosition = __builtin_ctzll(maskCopy.sentenceMask);
            //word loop
            register uint32_t shiftOfWord = (uint8_t *)inputPassage - (uint8_t *)memRoot;

            /*if (shiftOfWord >= 1522240 && shiftOfWord <= 1522240 + 128)
            {
                fwrite(inputPassage, 1, 64, stdout);
                putchar('\n');
                printBin(maskList[numOfMask].wordMask);
                printBin(maskList[numOfMask].sentenceMask);
                printBin(maskList[numOfMask].sentenceBeginMask);
                printBin(maskList[numOfMask].gapMask); 
            } //*/
            for (int16_t i = 0; i < 64;)
            {
                while (i > flag.sentenceSiginalPosition)
                {
                    wordNumMem[flag.sentenceSiginalPosition] = wordNum;
                    maskCopy.sentenceMask >>= flag.sentenceSiginalPosition - flag.sentenceShift + 1;
                    flag.sentenceShift = flag.sentenceSiginalPosition + 1;
                    flag.sentenceSiginalPosition = flag.sentenceShift + __builtin_ctzll(maskCopy.sentenceMask);
                }
                if (flag.wordReadState)
                {
                    flag.temp = __builtin_ctzll((~maskCopy.wordMask));
                    i = flag.wordShift + flag.temp;
                    /*if (shiftOfWord >= 1522240 && shiftOfWord <= 1522240 + 128)
                    {
                        printf("inWord temp is : %d\ni is %d\n", flag.temp,i);
                    }//*/
                    if (i < 64)
                    {
                        //printf("%s i: %d\n", outBuf, i);
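                        uint32_t length = i + pNow - newestWordL; // word length in bytes
                        uint64_t hashTemp = XXH3_64bits(newestWordL, length);
                        //*(i + pNow) = 0;
                        //printf("\"%s\" 's hash is %llu\n", wordTemp, hashTemp);
                        uint32_t id = hashTemp & 0x03FFFF; // bucket index: low 18 bits of the hash
                        hashTemp ^= length;                // fold the length into the stored fingerprint
                        //look the fingerprint up in the bucket's chain, inserting it if absent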
                        if (hashRecord[id] == 0)
                        {
                            hashRecord[id] = (uint32_t)(mem - memRoot);
                            declaremMem(sizeof(struct hashLinkList));
                            pHash = (struct hashLinkList *)(hashRecord[id] + memRoot);
                            pHash->hashNum = hashTemp;
                            pHash->next = 0;
                            pHash->n = 1;

                            wordRecord[wordNum++] = (uint8_t *)pHash - (uint8_t *)memRoot;

                            flag.wordReadState = 0;
                            maskCopy.wordMask >>= flag.temp;
                            flag.wordShift = i;

                            continue;
                        }
                        pHash = (struct hashLinkList *)(hashRecord[id] + memRoot);
                        while (1)
                        {
                            if (likely(pHash->hashNum == hashTemp))
                            {
                                pHash->n += 1;
                                wordRecord[wordNum++] = (uint8_t *)pHash - (uint8_t *)memRoot;
                                /*if (hashTemp == (XXH3_64bits("ter", 3) ^ 3))
                                {
                                    fwrite(pNow, 1, 64, stdout);
                                    putchar('\n');
                                    fwrite(newestWordL, 1, 3, stdout);
                                    printf("\n%d sentence\n find ter %d in %d %p!\n", shiftOfWord, pHash->n, i, pNow);
                                }//*/
                                break;
                                //printf("collition between %s %s %x %x!\n",pHash->word + memRoot,wordTemp,pHash->hashNum, hashTemp);
                            }
                            if (pHash->next != 0)
                            {
                                pHash = (struct hashLinkList *)(pHash->next + memRoot);
                                continue;
                            }
                            pHash->next = (uint32_t)(mem - memRoot);
                            pHash = (struct hashLinkList *)mem;
                            declaremMem(sizeof(struct hashLinkList));
                            pHash->n = 1;
                            pHash->hashNum = hashTemp;
                            pHash->next = 0;
                            wordRecord[wordNum++] = (uint8_t *)pHash - (uint8_t *)memRoot;
                            break;
                        }
                        //process end here

                        flag.wordReadState = 0;
                        maskCopy.wordMask >>= flag.temp;
                        flag.wordShift = i;
                        continue;
                    }
                    else
                    {
                        break;
                    }
                }
                else
                {
                    flag.temp = __builtin_ctzll((uint64_t)maskCopy.wordMask);
                    //printf("maskList[numOfMask] is %x\n", maskCopy.wordMask);
                    i = flag.wordShift + flag.temp;
                    /*if (shiftOfWord >= 1522240 && shiftOfWord <= 1522240 + 128)
                    {
                        printf("outWord temp is : %d\ni is %d\n", flag.temp,i);
                    }//*/
                    if (i < 64)
                    {
                        newestWordL = i + pNow;
                        pWordL = pNow;
                        flag.wordReadState = 1;
                        maskCopy.wordMask >>= flag.temp;
                        flag.wordShift += flag.temp;
                        continue;
                    }
                    else
                    {
                        break;
                    }
                }
            }

            while (64 >= flag.sentenceSiginalPosition)
            {
                wordNumMem[flag.sentenceSiginalPosition] = wordNum;
                maskCopy.sentenceMask >>= flag.sentenceSiginalPosition - flag.sentenceShift + 1;
                flag.sentenceShift = flag.sentenceSiginalPosition + 1;
                flag.sentenceSiginalPosition = flag.sentenceShift + __builtin_ctzll(maskCopy.sentenceMask);
            }

            //sentence loop
            flag.sentenceShift = 0;
            for (int8_t i = 0; i < 64;)
            {
                if (flag.sentenceState)
                {
                    flag.temp = __builtin_ctzll(maskList[numOfMask].sentenceMask);
                    i = flag.sentenceShift + flag.temp;
                    if (i < 64)
                    {
                        //putchar(*(i + (uint8_t *)inputPassage));
                        if (wordNumMem[i] != wordBefore)
                        {
                            sentenceRecord[sentenceNum].wordStart = wordBefore;
                            sentenceRecord[sentenceNum].wordEnd = wordNumMem[i];
                            sentenceRecord[sentenceNum].sentenceBegin = sentenceLeft;
                            sentenceRecord[sentenceNum].sentenceEnd = i + shiftOfWord;
                            sentenceNum++;
                        }
                        wordBefore = wordNumMem[i];
                        //printf("wordNum is %d \n", wordBefore);
                        flag.sentenceState = 0;
                        i++;
                        maskList[numOfMask].sentenceMask >>= flag.temp + 1;
                        maskList[numOfMask].sentenceBeginMask >>= flag.temp + 1;
                        flag.sentenceShift += flag.temp + 1;
                        continue;
                    }
                    else
                    {
                        break;
                    }
                }
                else //outside a sentence
                {
                    flag.temp = __builtin_ctzll(maskList[numOfMask].sentenceBeginMask);
                    i = flag.sentenceShift + flag.temp;

                    if (i < 64)
                    {
                        //printf("%s\n", pNowOutput);
                        //printBin(maskList[numOfMask].sentenceBeginMask);
                        //printf("i:%d %c\n", i, *(i + (uint8_t *)inputPassage));
                        sentenceLeft = i + shiftOfWord; //performance
                        maskList[numOfMask].sentenceBeginMask >>= flag.temp;
                        maskList[numOfMask].sentenceMask >>= flag.temp;
                        flag.sentenceShift += flag.temp;
                        flag.sentenceState = 1;
                    }
                    else
                    {
                        break;
                    }
                }
            }
            pNow += 64;
            inputPassage += 64;
        }
        //pNow process
        if (flag.wordReadState)
        {
            while ((newestWordL - pNowL) / 2048)
            {
                //printf("shift %d\n", pNow - pWordL);
                memmove(pNowL, pWordL, pNow - pWordL); // source and destination may overlap, so memmove rather than memcpy
                pNow -= pWordL - pNowL;
                newestWordL -= pWordL - pNowL;
                pWordL = pNowL;
            }
        }
        else
        {
            pNow = pNowL;
        }
    }
}

void processStopWord()
{
    // For every stop word, find its node in the word hash table (same hash,
    // bucket index and length-folded fingerprint as in processWord) and zero its count.
    uint32_t length = 0;
    for (int i = 0; i < stopWordBufLength; i++)
    {
        while (likely(isalpha(stopWordBuf[i])))
        {
            length++;
            i++;
        }
        if (likely(length))
        {
            uint64_t hashTemp = XXH3_64bits(&stopWordBuf[i - length], length);
            uint32_t id = hashTemp & 0x03FFFF;
            hashTemp ^= length;
            if (hashRecord[id] == 0)
            {
                length = 0; // this stop word never occurs in the article
                continue;
            }
            pHash = (struct hashLinkList *)(hashRecord[id] + memRoot);
            while (1)
            {
                if (likely(hashTemp == pHash->hashNum))
                {
                    stopWordBuf[i] = 0;
                    //printf("match! 0x%x 0x%x\n", &stopWordBuf[i - lenth], pHash->word + memRoot);
                    pHash->n = 0; // a stop word contributes nothing to sentence sums
                    //printf("Delete %s\n", pHash->word + memRoot,&stopWordBuf[i +1]);
                    break;
                }
                if (pHash->next == 0)
                {
                    break;
                }
                pHash = (struct hashLinkList *)(pHash->next + memRoot);
            }
        }
        length = 0;
    }
}

int main()
{
    memAligned(16, 0);
    showRunTime();
    FILE *fp;
    fp = fopen("article.txt", "rb");
    passageBuf = mem;
    uint8_t *readNow = passageBuf;
    int inByte = 0;
    while ((inByte = fread(readNow, sizeof(char), 4 * 1024 * 1024, fp)) != 0)
    {
        readNow += inByte;
    }
    passageBufLenth = readNow - passageBuf;
    passageBuf[passageBufLenth] = 0;
    passageBufLenth += 1;
    declaremMem(passageBufLenth);
    memAligned(512, 0);
    memset(readNow, 0, mem - readNow);
    fclose(fp);
    fp = fopen("stopwords.txt", "rb");
    stopWordBuf = mem;
    readNow = stopWordBuf;
    inByte = 0;
    while ((inByte = fread(readNow, sizeof(char), 4 * 1024 * 1024, fp)) != 0)
    {
        readNow += inByte;
    }
    stopWordBufLength = readNow - stopWordBuf;
    stopWordBuf[stopWordBufLength] = 0;
    stopWordBufLength += 1;
    declaremMem(stopWordBufLength);
    fclose(fp);
    showRunTime();
    processWord();
    processStopWord();
#if (DEBUG)
    printf("length of passageBuf = %d\n", passageBufLenth);
    printf("length of stopWordBuf = %d\n", stopWordBufLength);
    printf("collision count = %d\n", collitionTime);
    printf("num of sentence = %d\n", sentenceNum);
    printf("num of word = %d\n", sentenceRecord[sentenceNum - 1].wordEnd - sentenceRecord[0].wordStart);
    showRunTime();
#endif
    //outputpart start
    for (uint32_t i = 0; i < sentenceNum; i++)
    {
        int tempN = 0;
        for (uint32_t k = sentenceRecord[i].wordStart; k < sentenceRecord[i].wordEnd; k++)
        {
            tempN += ((struct hashLinkList *)(wordRecord[k] + memRoot))->n;
        }
        sentenceN[i].n = tempN;
        sentenceN[i].id = ~i; // on equal sums, ~i makes the earlier sentence compare larger
        //printf("Sentence %d : %d\n", i + 1, tempN);
    }
    /* //debug
    char debugBuf[1000];
    while (scanf("%s", debugBuf) != EOF)
    {
        printf("The %s has frequence : %d\n", debugBuf, getDebugN(debugBuf));
    }*/

    //ranking: select the top outputN sentences with a min-heap of size outputN
    int outputN2, outputN;
#if (DEBUG == 0)
    scanf("%d", &outputN2);
#else
    outputN2 = DEBUGN;
#endif
    if (outputN2 < 5)
    {
        outputN = 5;
    }
    else
    {
        outputN = outputN2;
    }
    qsort(sentenceN, outputN, sizeof(long long), cmp); // an ascending array is already a valid min-heap
    uint64_t *sentenceTopRecord = (uint64_t *)&sentenceN[0];
    // Treat the first outputN entries as a min-heap: sentenceTopRecord[0] is the
    // smallest key kept so far. Each entry is compared as one uint64_t, which
    // relies on n occupying the upper 32 bits and ~i the lower 32 bits.
    // Every remaining sentence that beats the root replaces it and is sifted down.
    // maintain part begin
    for (int i = outputN; i < sentenceNum; i++)
    {
        //printf("%llu vs %llu\n", sentenceTopRecord[i], sentenceTopRecord[0]);
        if (sentenceTopRecord[i] > sentenceTopRecord[0])
        {
            //printf("swap!\n");
            uint64_t tempSwap = sentenceTopRecord[i];
            sentenceTopRecord[i] = sentenceTopRecord[0];
            sentenceTopRecord[0] = tempSwap;
            int x = 0;
            while (1)
            {
                int l = 2 * x + 1, r = l + 1;
                if (l >= outputN)
                {
                    break;
                }
                if (r >= outputN)
                {
                    if (sentenceTopRecord[l] < sentenceTopRecord[x])
                    {
                        tempSwap = sentenceTopRecord[l];
                        sentenceTopRecord[l] = sentenceTopRecord[x];
                        sentenceTopRecord[x] = tempSwap;
                    }
                    break;
                }
                if (sentenceTopRecord[l] < sentenceTopRecord[x])
                {
                    if (sentenceTopRecord[l] < sentenceTopRecord[r])
                    {
                        tempSwap = sentenceTopRecord[l];
                        sentenceTopRecord[l] = sentenceTopRecord[x];
                        sentenceTopRecord[x] = tempSwap;
                        x = l;
                    }
                    else
                    {
                        tempSwap = sentenceTopRecord[r];
                        sentenceTopRecord[r] = sentenceTopRecord[x];
                        sentenceTopRecord[x] = tempSwap;
                        x = r;
                    }
                    continue;
                }
                if (sentenceTopRecord[r] < sentenceTopRecord[x])
                {
                    tempSwap = sentenceTopRecord[r];
                    sentenceTopRecord[r] = sentenceTopRecord[x];
                    sentenceTopRecord[x] = tempSwap;
                    x = r;
                    continue;
                }
                break;
            }
        }
    }
    // maintain part end

    qsort(sentenceN, outputN, sizeof(long long), cmp); // ascending: the best sentence ends up at index outputN - 1
    showRunTime();

    // top 5 sentences to stdout, highest sum first
    for (int i = outputN - 1; i > outputN - 1 - 5; i--)
    {
        printf("%d ", sentenceN[i].n);
        /*for (char *j = sentenceRecord[sentenceN[i].id].begin; j <= sentenceRecord[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        int id = ~sentenceN[i].id;
        fwrite(sentenceRecord[id].sentenceBegin + memRoot, sizeof(char), sentenceRecord[id].sentenceEnd - sentenceRecord[id].sentenceBegin + 1, stdout);
        putchar('\n');
    }

    // air = requested slots (N) with no sentence left to fill them
    int air = 0;
    if (outputN > sentenceNum)
    {
        air = outputN - sentenceNum;
    }
#if (DEBUG)
    freopen("lowerArticle.txt", "wb", stdout);
    for (int i = outputN - 1; i >= 0; i--)
    {
        int id = ~sentenceN[i].id;
        printf("%d %d ", sentenceN[i].n, sentenceRecord[id].wordEnd - sentenceRecord[id].wordStart);
        /*for (char *j = sentenceRecord[sentenceN[i].id].begin; j <= sentenceRecord[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        fwrite(sentenceRecord[id].sentenceBegin + memRoot, sizeof(char), sentenceRecord[id].sentenceEnd - sentenceRecord[id].sentenceBegin + 1, stdout);
        putchar('\n');
        for (uint32_t k = sentenceRecord[id].wordStart; k < sentenceRecord[id].wordEnd; k++)
        {
            printf("%d ", (((struct hashLinkList *)(wordRecord[k] + memRoot))->n));
        }
        putchar('\n');
    }
#else
    freopen("results.txt", "wb", stdout);
    for (int i = outputN - 1; i >= air; i--)
    {
        int id = ~sentenceN[i].id;
        printf("%d ", sentenceN[i].n);
        /*for (char *j = sentenceRecord[sentenceN[i].id].begin; j <= sentenceRecord[sentenceN[i].id].end; j++)
        {
            putchar(*j);
        }*/
        fwrite(sentenceRecord[id].sentenceBegin + memRoot, sizeof(char), sentenceRecord[id].sentenceEnd - sentenceRecord[id].sentenceBegin + 1, stdout);
        putchar('\n');
    }
#endif
    for (int i = 0; i < air; i++)
    {
        putchar('0');
        putchar('\n');
    }
    showRunTime();
    return 0;
}
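
The counting done by processWord and processStopWord boils down to a chained hash table with 2^18 buckets: the low 18 bits of a word's 64-bit hash pick the bucket, the hash XOR the word length is stored as the fingerprint, and a stop word just has its count set to 0. The sketch below isolates that idea. It is a minimal sketch under stated assumptions, not the author's code: FNV-1a stands in for XXH3_64bits so it has no dependencies, and the node arena (`arena`, `MAX_NODES`) and the function names are hypothetical. Like the original, it treats equal 64-bit fingerprints as the same word and never compares the strings themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_BITS 18
#define BUCKET_MASK ((1u << BUCKET_BITS) - 1u) /* 0x03FFFF, as in the post */
#define MAX_NODES 4096                        /* hypothetical arena capacity */

struct node {             /* plays the role of the post's hashLinkList */
    uint64_t fingerprint; /* hash ^ length */
    uint32_t n;           /* occurrence count; 0 for stop words */
    uint32_t next;        /* arena index + 1 of the next node, 0 = end of chain */
};

static uint32_t buckets[1u << BUCKET_BITS]; /* arena index + 1, 0 = empty bucket */
static struct node arena[MAX_NODES];
static uint32_t used;

/* 64-bit FNV-1a, standing in for XXH3_64bits (assumption). */
uint64_t fnv1a64(const char *s, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Return the node for a word; if absent, append one with n = 0 when create is set. */
struct node *lookup(const char *word, size_t len, int create) {
    uint64_t h = fnv1a64(word, len);
    uint32_t *link = &buckets[h & BUCKET_MASK]; /* bucket from the low 18 bits */
    uint64_t fp = h ^ len;                       /* length folded in, like hashTemp ^= length */
    while (*link != 0) {
        struct node *p = &arena[*link - 1];
        if (p->fingerprint == fp)
            return p; /* equal fingerprints are treated as the same word */
        link = &p->next;
    }
    if (!create)
        return NULL;
    assert(used < MAX_NODES);
    arena[used] = (struct node){fp, 0, 0};
    *link = ++used; /* append at the tail of the chain */
    return &arena[used - 1];
}

/* Count one occurrence of a (lowercased) word; returns the new count. */
uint32_t add_word(const char *word, size_t len) {
    return ++lookup(word, len, 1)->n;
}

/* Stop words keep their node but contribute 0 to every sentence sum. */
void mark_stop_word(const char *word, size_t len) {
    struct node *p = lookup(word, len, 0);
    if (p != NULL)
        p->n = 0;
}

uint32_t count_of(const char *word, size_t len) {
    struct node *p = lookup(word, len, 0);
    return p != NULL ? p->n : 0;
}
```

Because only fingerprints are compared, two different words with the same fingerprint would be merged; a 64-bit hash makes this unlikely but not impossible.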

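The ranking in main keeps the best outputN sentences in a min-heap. Sorting the first outputN entries ascending yields a valid min-heap. Every later entry that beats the root replaces it and is sifted down, and a final sort puts the best sentence last. Packing the frequency sum into the upper 32 bits and the bitwise-inverted sentence index into the lower 32 bits turns the required tie-break (earlier sentence first) into a plain integer comparison. Here is a minimal sketch of the same selection. The names (`pack_key`, `top_k`, and so on) are hypothetical, and the packing is done explicitly instead of relying on the post's struct layout:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Larger key = larger sum; on equal sums, the smaller index (larger ~index) wins. */
uint64_t pack_key(uint32_t sum, uint32_t index) {
    return ((uint64_t)sum << 32) | (uint32_t)~index;
}
uint32_t key_sum(uint64_t key) { return (uint32_t)(key >> 32); }
uint32_t key_index(uint64_t key) { return ~(uint32_t)key; }

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Restore the min-heap property below position x in heap[0..k). */
static void sift_down(uint64_t *heap, size_t k, size_t x) {
    for (;;) {
        size_t l = 2 * x + 1, r = l + 1, m = x;
        if (l < k && heap[l] < heap[m]) m = l;
        if (r < k && heap[r] < heap[m]) m = r;
        if (m == x) return;
        uint64_t t = heap[m]; heap[m] = heap[x]; heap[x] = t;
        x = m;
    }
}

/* Leave the k largest keys in keys[0..k), sorted ascending (best one last). */
void top_k(uint64_t *keys, size_t n, size_t k) {
    assert(k > 0 && k <= n);
    qsort(keys, k, sizeof keys[0], cmp_u64); /* ascending array = valid min-heap */
    for (size_t i = k; i < n; i++) {
        if (keys[i] > keys[0]) {             /* beats the weakest kept entry */
            uint64_t t = keys[i]; keys[i] = keys[0]; keys[0] = t;
            sift_down(keys, k, 0);
        }
    }
    qsort(keys, k, sizeof keys[0], cmp_u64);
}
```

This takes O(n log k) time rather than the O(n log n) needed to sort every sentence.
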
About SIMD

To be continued...
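
The SIMD part of processWord can be cross-checked against a plain scalar version. Each pcmpistrm classifies 16 input bytes and leaves a 16-bit mask in xmm0. Four of these masks, shifted by 0, 16, 32 and 48 bits and OR'd together, give one 64-bit mask per category for a 64-byte block. The sketch below computes the same masks one byte at a time. The character sets are assumptions, because lowerMaskList, endSentenceMaskList and gapMaskList are defined outside this excerpt: ASCII letters, `.?!`, and space/tab/CR/LF. The lowercase copy assumes andMaskList holds 0x20 in every byte. Unlike the asm's por, the sketch changes only letters. The names `scan_block` and `block_masks` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct block_masks {
    uint64_t word;           /* letters (pcmpistrm mode 0x04: ranges) */
    uint64_t sentence;       /* '.', '?', '!' (mode 0x00: equal any) */
    uint64_t gap;            /* whitespace (mode 0x00: equal any) */
    uint64_t sentence_begin; /* ~(gap | sentence), as in the post */
};

static int is_letter(uint8_t c) { return (uint8_t)((c | 0x20) - 'a') < 26; }

/* Bit i of each mask describes in[i]; lower[] receives the lowercased block. */
struct block_masks scan_block(const uint8_t in[64], uint8_t lower[64]) {
    struct block_masks m = {0, 0, 0, 0};
    int ended = 0; /* pcmpistrm is implicit-length: bytes after a NUL never match */
    for (int i = 0; i < 64; i++) {
        uint8_t c = in[i];
        uint64_t bit = (uint64_t)1 << i;
        if (c == 0) ended = 1; /* with the post's zero padding, NULs only appear at the end */
        lower[i] = is_letter(c) ? (uint8_t)(c | 0x20) : c; /* OR 0x20 lowercases ASCII letters */
        if (ended) continue;
        if (is_letter(c)) m.word |= bit;
        if (c == '.' || c == '?' || c == '!') m.sentence |= bit;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') m.gap |= bit;
    }
    m.sentence_begin = ~(m.gap | m.sentence);
    assert((m.word & m.gap) == 0); /* a byte cannot be both a letter and a gap */
    return m;
}
```
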

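The word and sentence loops never test bytes one at a time. Instead, they jump between bit boundaries with __builtin_ctzll: ctz(mask) finds where a word starts, ctz(~mask) finds where it ends, and the consumed bits are then shifted out. Below is a self-contained sketch of that run extraction. The function name `mask_runs` is hypothetical, and `__builtin_ctzll` is a GCC/Clang builtin. The sketch explicitly guards the two undefined cases, `__builtin_ctzll(0)` and a shift by 64.

```c
#include <assert.h>
#include <stdint.h>

/* Write the runs of 1 bits in mask as half-open [start, end) intervals and
 * return how many there are. A run touching bit 63 gets end == 64; the post
 * instead keeps such a word open (wordReadState) and finishes it in the next block. */
int mask_runs(uint64_t mask, uint8_t starts[32], uint8_t ends[32]) {
    int count = 0;
    unsigned pos = 0;                   /* absolute position of mask's current bit 0 */
    while (mask != 0) {                 /* __builtin_ctzll(0) is undefined */
        unsigned skip = (unsigned)__builtin_ctzll(mask);
        pos += skip;
        mask >>= skip;                  /* bit 0 is now the first bit of a run */
        uint64_t inv = ~mask;
        unsigned len = inv != 0 ? (unsigned)__builtin_ctzll(inv) : 64 - pos;
        starts[count] = (uint8_t)pos;
        ends[count] = (uint8_t)(pos + len);
        count++;
        pos += len;
        mask = len < 64 ? mask >> len : 0; /* shifting by 64 is undefined */
    }
    assert(count <= 32); /* 64 bits hold at most 32 separate runs */
    return count;
}
```

The post runs this pattern twice per block: once over wordMask to delimit words, and once over sentenceMask/sentenceBeginMask to delimit sentences.
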