XVII Open Cup named after E.V. Pankratiev. Eastern Grand Prix. Problem G. Gmoogle: Simulation, String Processing, Text Search

ProLightsfxjh

Article link: https://blog.csdn.net/ProLightsfxjh/article/details/78750987

XVII Open Cup named after E.V. Pankratiev. Eastern Grand Prix.

Problem G. Gmoogle
Input file: standard input
Output file: standard output
Time limit: 1 second
Memory limit: 256 megabytes

You are hired to create the alpha version of a new search engine named GMoogle. The alpha version should
work with content represented as a database of sentences:
• Content is merged into a line S, consisting of characters 'a'-'z', 'A'-'Z', spaces, notation marks (".!?")
(quotes are not counted) and decimal digits.
• If one of the characters ".!?" is present in S, then it denotes the end of a sentence, except for
one special case: if the first non-space character after '.' is a lowercase English letter, then it is an
abbreviation sign and not the end of the sentence; for example, the string
"I like tea in a 500 ml. cup" contains one sentence, but the strings "Cup is 500 ml. I want it"
and "Cup is 500 ml. 500 ml is great for me" contain two sentences.
• The first non-space character after the end of a sentence is considered the first character of the new
sentence.
• A word is a contiguous sequence of characters 'a'-'z', 'A'-'Z', delimited by spaces, notation marks or
the beginning/end of the sentence/string. It is guaranteed that digits cannot be neighbors of
letters, i.e. sequences like "10ml" or "R2D2" are illegal.
• S may contain sentences containing no words. It is guaranteed that S does not contain two or
more characters ".!?" in a row.
After the content is indexed, users make requests. Each request can be represented as a string q, consisting
of one or more words (the definition of a word is given above). Words are separated by an arbitrary number
of spaces (1 or more); leading and trailing spaces are possible.
Your program has to print all sentences from S in which all words from q are present (in any order).
Words are considered equal if all the letters at the corresponding positions are the same (case insensitive,
i.e. 'B' and 'b' are considered the same).

Input
The first line of the input contains a non-empty line S, consisting of no more than 1000 characters. The next line
contains one integer n (1 ≤ n ≤ 100), the number of requests. Then n requests q1, ..., qn follow, each
on a separate line in the format described above. Note that in S and qi leading and trailing spaces are
allowed.

Output
For each request q1, q2, ..., qn print the request on a separate line. Then print the list of found sentences
in the same order they appear in S, one sentence per line. Requests and answers are printed in quotes;
answers are preceded by a single '-' and a single space; leading and trailing spaces must be eliminated.
See the sample for clarification.

Example

standard input
Hello everyone. I want 2 coffee if you have it. I like coffee very much.
4
HELLO
Coffee
much coffee
VoDka

standard output
Search results for "HELLO":
- "Hello everyone."
Search results for "Coffee":
- "I want 2 coffee if you have it."
- "I like coffee very much."
Search results for "much coffee":
- "I like coffee very much."
Search results for "VoDka":

Source

XVII Open Cup named after E.V. Pankratiev. Eastern Grand Prix.

My Solution

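Problem: simulate a search engine. A text is given, and each query consists of several words; for every query, output all sentences that contain every queried word.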

Simulation, string processing, text search

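First preprocess the text into individual sentences numbered 0, 1, 2, ..., and use a map<string, vector<int>> to build a mapping from each word to the sentences that contain it.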
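The word-to-sentence mapping can be sketched roughly as follows. This is only an illustration under the assumption that the sentences have already been split; the helper names splitWords and buildIndex are made up for this example and do not appear in the solution further down.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase words (maximal runs of letters).
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            words.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) words.push_back(cur);
    return words;
}

// Hypothetical helper: map each lowercase word to the ids of the sentences containing it.
// Ids are appended in increasing order, so checking back() is enough to avoid duplicates.
std::map<std::string, std::vector<int>> buildIndex(const std::vector<std::string>& sentences) {
    std::map<std::string, std::vector<int>> index;
    for (int id = 0; id < static_cast<int>(sentences.size()); ++id) {
        for (const std::string& w : splitWords(sentences[id])) {
            std::vector<int>& ids = index[w];
            if (ids.empty() || ids.back() != id) ids.push_back(id);
        }
    }
    return index;
}
```

For the three sentences of the sample, buildIndex would map "coffee" to sentences 1 and 2 and "hello" to sentence 0; the digit 2 is not a word and gets no entry.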

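Then each word of a query has its own set of sentences, and the answer is the intersection of these sets.

The intersection is computed with a map<int, int> check that counts how many of these sets each sentence appears in. After one pass over check, the sentences whose count equals the number of words in the query form the required intersection.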
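The counting trick for the intersection might look like the sketch below. The function name matchAll and its parameters are assumptions made for this example; it relies on each word's id list being free of duplicates, which is what the first note below guarantees.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: ids of the sentences that contain every word of the query.
// `index` maps a lowercase word to a duplicate-free list of sentence ids.
std::vector<int> matchAll(const std::map<std::string, std::vector<int>>& index,
                          const std::vector<std::string>& queryWords) {
    std::map<int, int> check;  // sentence id -> number of query words found in it
    for (const std::string& w : queryWords) {
        auto it = index.find(w);
        if (it == index.end()) continue;  // unknown word: no sentence gets a hit
        for (int id : it->second) ++check[id];
    }
    std::vector<int> result;  // std::map iterates in key order, so ids come out sorted
    for (const auto& entry : check) {
        if (entry.second == static_cast<int>(queryWords.size())) result.push_back(entry.first);
    }
    return result;
}
```

Because every query word adds at most one hit per sentence, a sentence reaches the full count only if it contains all of the words, even when the query repeats a word.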

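Notes: 1. A sentence may contain the same word several times; when building the mapping, each word is mapped to that sentence only once.

2. When the first non-space character after '.' is a lowercase letter, the '.' does not end the sentence.

3. The last sentence of the text may have no punctuation mark and may be followed by many spaces; this case needs to be handled.

4. The text is deliberately processed sentence by sentence: first accumulate a single sentence, and only once it is confirmed that the sentence ends at this position, build the mapping from the words of this sentence to the sentence.

5. Both when building the mapping and when answering queries, all words are converted to lowercase with isupper and tolower from cctype before they are compared.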
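Notes 2 and 3 can be illustrated with a small standalone splitter. This is a sketch, not the code of the solution below: the name splitSentences is invented for the example, and unlike the solution it also keeps a sentence that consists of a lone punctuation mark.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the text S into sentences following the rules of the statement.
std::vector<std::string> splitSentences(const std::string& s) {
    std::vector<std::string> sentences;
    std::string cur;
    std::size_t n = s.size();
    while (n > 0 && s[n - 1] == ' ') --n;  // drop trailing spaces of the text
    for (std::size_t i = 0; i < n; ++i) {
        char c = s[i];
        bool ends = (c == '!' || c == '?');
        if (c == '.') {
            std::size_t j = i + 1;
            while (j < n && s[j] == ' ') ++j;
            // '.' is an abbreviation sign if the next non-space character is a lowercase letter.
            ends = !(j < n && std::islower(static_cast<unsigned char>(s[j])));
        }
        if (cur.empty() && c == ' ') continue;  // skip spaces before a sentence starts
        cur += c;
        if (ends) {
            sentences.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) sentences.push_back(cur);  // last sentence without a closing mark
    return sentences;
}
```

On the examples from the statement, "I like tea in a 500 ml. cup" stays a single sentence, while "Cup is 500 ml. I want it" is split into "Cup is 500 ml." and "I want it".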

Time complexity: O(nlogn + k*qlogn)

Space complexity: O(n)

#include <bits/stdc++.h>
using namespace std;
string s, word, line;
vector<string> senc;          // sentences in order of appearance
map<string, vector<int>> mp;  // lowercase word -> ids of the sentences containing it
map<int, int> check;          // sentence id -> number of query words found in it

// Map the current word to sentence id; a word is stored only once per sentence.
void addWord(int id) {
    if (word.empty()) return;
    if (mp[word].empty() || mp[word].back() != id)
        mp[word].push_back(id);
    word.clear();
}

// Split the current sentence (line) into lowercase words and map them to sentence id.
void indexSentence(int id) {
    for (char c : line) {
        if (isalpha((unsigned char)c)) word += (char)tolower((unsigned char)c);
        else addWord(id);
    }
    addWord(id);
}

int main () {
    #ifdef LOCAL
    freopen("g.txt", "r", stdin);
    #endif // LOCAL

    getline(cin, s);
    int n, sz = s.size(), i, j, cnt = 0;
    // Ignore trailing spaces of the text (the sz > 0 guard avoids reading s[-1]).
    while (sz > 0 && s[sz - 1] == ' ') {
        sz--;
    }
    for (i = 0; i < sz; i++) {
        if (s[i] == '.' || s[i] == '!' || s[i] == '?') {
            bool flag = true;  // true if s[i] really ends the sentence
            if (s[i] == '.') {
                // '.' followed (after spaces) by a lowercase letter is an abbreviation sign.
                for (j = i + 1; j < sz && s[j] == ' '; j++);
                if (j < sz && islower((unsigned char)s[j])) flag = false;
            }
            if (!flag) {
                line += s[i];
                continue;
            }
            if (!line.empty()) {
                // The sentence is known to end here, so its words can be indexed now.
                indexSentence(cnt);
                line += s[i];
                senc.push_back(line);
                line.clear();
                cnt++;
            }
        }
        else if (!(line.empty() && s[i] == ' ')) {
            // Spaces before the first character of a sentence are skipped.
            line += s[i];
        }
    }
    // The last sentence may have no closing punctuation mark.
    if (!line.empty()) {
        indexSentence(cnt);
        senc.push_back(line);
        line.clear();
        cnt++;
    }

    cin >> n;
    // Skip the rest of the line holding n (getchar() would consume only one character).
    cin.ignore(numeric_limits<streamsize>::max(), '\n');
    for (i = 0; i < n; i++) {
        getline(cin, s);
        cout << "Search results for \"" << s << "\":\n";
        cnt = 0;  // number of words in the request
        // Count one hit for every sentence containing the current word.
        auto countWord = [&]() {
            if (word.empty()) return;
            auto it = mp.find(word);
            if (it != mp.end())
                for (int id : it->second) check[id]++;
            cnt++;
            word.clear();
        };
        for (char c : s) {
            if (isalpha((unsigned char)c)) word += (char)tolower((unsigned char)c);
            else countWord();
        }
        countWord();

        // A sentence hit by all cnt words belongs to the intersection.
        for (auto x = check.begin(); x != check.end(); x++) {
            if ((x->second) == cnt) {
                cout << "- \"" << senc[x->first] << "\"\n";
            }
        }
        check.clear();
    }
}

Thank you!

------from ProLights
